Basic experiment data structure

How experiment data is stored on a technical level.

Two files are created when a participant completes an experiment in our lab: expt.mat and data.mat. expt.mat is a setup or configuration file which contains information about the experiment: what day it was run, the participant's ID, which experiment was run, etc. data.mat contains the actual experiment data; for speech experiments, the most important piece is the signal recorded from the participant's microphone.

More information about each of these files is below.

Both expt.mat and data.mat contain a single variable: expt or data, respectively. These variables are structures, AKA "structs."
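To make the "one file, one variable" idea concrete, here is a minimal sketch that saves a toy expt variable and loads it back. The temporary folder and the field values are made up for illustration; a real expt.mat would sit wherever your experiment saved it.

```matlab
% Create a toy expt variable and save it to a file named expt.mat.
expt = struct('name', 'simonSingleWord', 'ntrials', 99);   % illustrative values
dataPath = tempname;      % hypothetical folder standing in for a participant folder
mkdir(dataPath);
save(fullfile(dataPath, 'expt.mat'), 'expt');

% Loading into an output variable returns a struct with one field
% per variable stored in the file. Here there is only one: expt.
loaded = load(fullfile(dataPath, 'expt.mat'));
varNames = fieldnames(loaded);
exptLoaded = loaded.expt;
```

Calling load without an output variable would instead place expt directly in the workspace.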

What's a struct?

Structures ("structs") are like filing cabinets. A filing cabinet doesn't just have loose papers in it; instead, it has folders inside of it. A folder could either have loose papers in it, or it could have multiple other folders, which themselves have papers (or more folders).

In Matlab, the folders inside of a filing cabinet are called "fields" and a piece of paper is the value of a variable, like 2 or 'Hello world'. So if you see something that says expt.name = 'simonSingleWord', that means: there is a structure expt which has a field called name. The value of the field name is 'simonSingleWord'.
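The filing cabinet analogy can be sketched in a few lines. The session field and its contents below are invented for the example and are not fields that our experiment code necessarily creates.

```matlab
% Build a struct field by field.
expt = struct();
expt.name = 'simonSingleWord';     % a "paper": a plain value stored in the field name
expt.ntrials = 99;

% A field can itself be a struct: a folder inside a folder.
expt.session.date = '2024-01-15';  % made-up field names and values
expt.session.room = 'booth1';

topFields = fieldnames(expt);          % names of the top-level "folders"
hasName = isfield(expt, 'name');       % is there a field called name?
isNested = isstruct(expt.session);     % is session itself a struct?
```

Typing expt at the command line (without a semicolon) prints every field and its value, which is a quick way to look through the cabinet.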

expt.mat

expt is a setup or configuration variable. It contains information about the experiment.

expt is automatically generated by the Matlab code which runs our experiments, and is typically set up right before we start recording any trials with the participant. This means that no information about the recordings themselves is in expt. (That all goes in the data variable.)

expt has several important fields:

X, allX, listX

In some cases, there are three fields associated with one another. For example: expt.words, expt.allWords, expt.listWords. I'll refer to these as expt.X, expt.allX, and expt.listX.

expt.X is the complete set of possible values for that thing. So if we had an experiment which, for 99 trials, cycles through displaying "bed," "head," or "ted" to the participant, expt.words would contain {'bed', 'head', 'ted'}.

expt.allX and expt.listX are similar, because they both tell you what is happening on each trial of the experiment. They are both arrays with a length equal to expt.ntrials, meaning that in a 99-trial experiment they would each be 99 values long.

expt.allX is a numerical vector of indexes into expt.X, one for each trial. So if expt.allX(1) = 2, that means that on trial 1, the 2nd value in expt.X would be used. In our words example, expt.allWords(1) = 2 means that 'head' would be displayed on trial 1.

expt.listX is the cell-array counterpart of expt.allX. Instead of holding the numbered index into expt.X, it holds the actual value used on each trial, such that expt.listX = expt.X(expt.allX). In our words example, expt.listWords(1) = {'head'}.
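Here is a small sketch of how the three fields relate, using the words example. The trial order is a simple repeating cycle chosen so that trial 1 is 'head', matching the example above; a real experiment would likely use some other (often randomized) order.

```matlab
expt.ntrials = 99;
expt.words = {'bed', 'head', 'ted'};

% One index into expt.words per trial. This toy order cycles 2, 3, 1, 2, 3, 1, ...
expt.allWords = mod(1:expt.ntrials, numel(expt.words)) + 1;

% Indexing the cell array with the index vector gives the word for every trial.
expt.listWords = expt.words(expt.allWords);

trial1Index = expt.allWords(1);     % index into expt.words for trial 1
trial1Word = expt.listWords{1};     % the word itself for trial 1
```

Note the difference between expt.listWords(1), which is a 1x1 cell, and expt.listWords{1}, which is the character array inside it.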

This same format of X, allX, and listX is used for multiple types of information:

Lastly, the field expt.inds is a struct which catalogs the trial indexes for each value of these fields. In our words example, expt.inds.words would contain the fields .bed, .head, and .ted, and expt.inds.words.bed would be an array of the 33 trials (out of 99) on which 'bed' was the word used, for example expt.inds.words.bed = [2 6 8 11 15 16 20 ...]. This is just handy for some data analysis scripts.
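One way such an index catalog could be built from expt.allWords is sketched below. This is an illustration of the idea only, not necessarily how our experiment code generates expt.inds, and the trial order is a toy one.

```matlab
expt.words = {'bed', 'head', 'ted'};
expt.allWords = mod(1:99, 3) + 1;              % toy trial order
expt.listWords = expt.words(expt.allWords);

% For each word, collect the trial numbers on which it was used.
% The parentheses in .(word) create a field named after the word.
for w = 1:numel(expt.words)
    word = expt.words{w};
    expt.inds.words.(word) = find(expt.allWords == w);
end

bedTrials = expt.inds.words.bed;               % trials where 'bed' was shown
bedWords = expt.listWords(bedTrials);          % should all be 'bed'
```

An analysis script can then pull out every trial for one word with a single index, for example data(expt.inds.words.bed).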

Other fields in expt important for Audapter experiments

The bare minimum

The required fields in expt to complete data analysis are expt.ntrials, expt.words, expt.allWords, and expt.listWords.
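As a rough sketch, a script could check for these required fields before starting an analysis. The toy expt below has 6 trials with a made-up order, and the variable names requiredFields, missingFields, and lengthsOk are just local names chosen for the example.

```matlab
% Toy expt with only the required fields.
expt.ntrials = 6;
expt.words = {'bed', 'head', 'ted'};
expt.allWords = [2 1 3 3 1 2];
expt.listWords = expt.words(expt.allWords);

requiredFields = {'ntrials', 'words', 'allWords', 'listWords'};
hasField = isfield(expt, requiredFields);     % one logical per required field
missingFields = requiredFields(~hasField);    % empty when nothing is missing

% The per-trial fields should have one entry per trial.
lengthsOk = numel(expt.allWords) == expt.ntrials && ...
            numel(expt.listWords) == expt.ntrials;
```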

Here is an example of the minimum expt.mat file: https://drive.google.com/drive/folders/1qsVp8iYUyD0AfnV5JD-TaecadsA1WE2v?usp=sharing

data.mat

data is a struct array (a vector of structs), where each element of data represents a trial. For example, data(11) gives you all the information about trial 11.
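The sketch below builds a toy 3-trial data variable to show how indexing into a struct array works. The signals are random noise standing in for recordings, and only the two fields needed for the example are included.

```matlab
% Build a toy 3-trial struct array with made-up signals.
sr = 16000;
data = repmat(struct('signalIn', [], 'params', struct('sr', sr)), 1, 3);
for t = 1:numel(data)
    data(t).signalIn = 0.1 * randn(sr, 1);   % 1 s of noise standing in for speech
end

nTrials = numel(data);               % number of trials
trial2 = data(2);                    % everything about trial 2, as a 1x1 struct
trial2Signal = data(2).signalIn;     % just the signal for trial 2

% Without an index, data.signalIn expands to one value per trial,
% which can be gathered into a cell array.
allSignals = {data.signalIn};
```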

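The two most important fields in data are data.signalIn and data.signalOut. signalIn is the signal recorded from the participant's microphone, and signalOut is the signal that Audapter sent back out to the participant's headphones.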

The length of signalIn and signalOut is determined by the duration of the recording and the sampling rate of the signal. Most Audapter experiments record at a hardware sampling rate of 48 kHz, but downsample by a factor of 3, for an effective sampling rate of 16 kHz. The sampling rate and downsampling factor can be found in data(1).params.sr and data(1).params.downFact.
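The arithmetic can be sketched as follows. The 2.5-second trial is made up, and the signal is just zeros; the point is how the number of samples, the sampling rate, and the duration relate to each other.

```matlab
hardwareSr = 48000;               % hardware sampling rate in Hz
downFact = 3;                     % downsampling factor
sr = hardwareSr / downFact;       % effective sampling rate in Hz

% Made-up trial: 2.5 seconds of silence at the effective sampling rate.
trialDur = 2.5;
signalIn = zeros(round(trialDur * sr), 1);

nSamples = numel(signalIn);       % samples in the trial
durFromSignal = nSamples / sr;    % duration in seconds, recovered from the signal
tAxis = (0:nSamples-1)' / sr;     % time in seconds of each sample, for plotting
```

With real data, the same calculation would use data(1).signalIn and data(1).params.sr in place of the toy values.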

Like a .wav file, data.signalIn and data.signalOut store the signal as a series of values in the range [-1 1], representing the position of the mic diaphragm over time. In fact, these fields can be converted to .wav files very easily: audiowrite('trial1_exported.wav', data(1).signalIn, data(1).params.sr)

Other fields in data

Many other fields in data are the output of Audapter, and are measured in units of frames (aka sampling windows) rather than individual samples. Audapter evaluates things like RMS and formant values once per frame, to improve accuracy and reduce noise. An Audapter frame is typically 2 milliseconds long.
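To relate frames back to time and samples, here is a small sketch assuming a 2 ms frame at a 16 kHz sampling rate. The variable names (frameDur, frameLen, frameValues) are local to this example, and the per-frame vector is a placeholder rather than real Audapter output.

```matlab
sr = 16000;                          % effective sampling rate in Hz
frameDur = 0.002;                    % assumed frame duration: 2 ms
frameLen = round(frameDur * sr);     % samples per frame under that assumption

% Placeholder per-frame measurement for a 1-second trial (one value per frame).
nFrames = floor(sr / frameLen);
frameValues = zeros(nFrames, 1);

% Start time of each frame in seconds, and the samples covered by frame 100.
frameTimes = (0:nFrames-1)' * frameDur;
frame100Samples = (100 - 1) * frameLen + (1:frameLen);
```

So a field with one value per frame is much shorter than signalIn: in this example, 500 values per second instead of 16000.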

Other important fields in data:

The bare minimum

The required fields in data to complete data analysis are data.signalIn and data.params.sr, which is the post-downsampling sampling rate in Hz. Given the typical settings described above, data(1).params.sr is often 16000.

Here is an example of that bare minimum data.mat file: https://drive.google.com/drive/folders/1qsVp8iYUyD0AfnV5JD-TaecadsA1WE2v?usp=sharing