How experiment data is stored on a technical level.
Two files are created when a participant completes an experiment in our lab: expt.mat and data.mat. expt.mat is a setup or configuration file which contains information about the experiment: what day it was run, the participant's ID, which experiment was run, etc. data.mat contains the actual experiment data; for speech experiments, the most important piece is the signal recorded from the participant's microphone.
More information about each of these files is below.
Both expt.mat and data.mat are files which contain a single variable: either expt or data. These variables are structures, AKA "structs."
Structures ("structs") are like filing cabinets. A filing cabinet doesn't just have loose papers in it; instead, it has folders inside of it. A folder could either have loose papers in it, or it could have multiple other folders, which themselves have papers (or more folders).
In Matlab, the folders inside of a filing cabinet are called "fields" and a piece of paper is the value of a variable, like 2 or 'Hello world'. So if you see something that says expt.name = 'simonSingleWord', that means: there is a structure expt which has a field called name. The value of the field name is 'simonSingleWord'.
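The filing-cabinet picture can be tried out directly at the Matlab prompt. The sketch below is only an illustration: apart from expt.name, the field values (the trial count and the trial numbers) are made up, and a real expt has many more fields.

```matlab
% Build a small struct by hand. Only expt.name comes from the text above;
% the other values are invented for illustration.
expt = struct();
expt.name = 'simonSingleWord';    % a field holding text
expt.ntrials = 99;                % a field holding a number

% A field can itself be a struct: a folder inside a folder.
expt.inds = struct();
expt.inds.words = struct();
expt.inds.words.bed = [2 6 8];    % hypothetical trial numbers

% Read values back with dot notation.
experimentName = expt.name;
bedTrials = expt.inds.words.bed;

% List the top-level "folders" and check whether one exists.
topLevelFields = fieldnames(expt);
hasName = isfield(expt, 'name');

disp(experimentName)
disp(topLevelFields)
```

Dot notation chains as deep as the nesting goes, so expt.inds.words.bed reads as "open expt, then inds, then words, then take out bed."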
expt is a setup or configuration variable. It contains information about the experiment.
expt is automatically generated by the Matlab code which runs our experiments, and is typically set up right before we start recording any trials with the participant. This means that no information about the recordings themselves is in expt. (That all goes in the data variable.)
expt has several important fields:
In some cases, there are three fields associated with one another. For example: expt.words, expt.allWords, expt.listWords. I'll refer to these as expt.X, expt.allX, and expt.listX.
expt.X is the complete set of possible values for that thing. So if we had an experiment which for 99 trials cycles through displaying "bed", "head", or "ted" to the participant, expt.words would contain {'bed', 'head', 'ted'}.
expt.allX and expt.listX are similar, because they both tell you what is happening on each trial of the experiment. They are both arrays with a length equal to expt.ntrials, meaning in a 99-trial experiment they would be 99 values long.
expt.allX is a numerical vector of indexes into expt.X for each trial. So if expt.allX(1) = 2, that means that on trial 1, the 2nd-indexed value in expt.X would be used. (In our words example, expt.allWords(1) = 2 means that 'head' would display.)
expt.listX is an extension of expt.allX but is a cell array. Instead of holding the numbered index into expt.X, it holds the actual value for each trial, such that expt.listX = expt.X(expt.allX). In our words example, expt.listWords(1) = {'head'}.
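The relationship between the three fields can be checked with a toy example. This is a sketch with a made-up, shortened trial order (6 trials instead of 99); only the field names and the three words come from the text above.

```matlab
% expt.X: the complete set of possible values.
expt = struct();
expt.ntrials = 6;                       % shortened for readability
expt.words = {'bed', 'head', 'ted'};

% expt.allX: one index into expt.X per trial (made-up order).
expt.allWords = [2 1 3 1 2 3];

% expt.listX: the actual value per trial, built by indexing X with allX.
expt.listWords = expt.words(expt.allWords);

% Look up what happened on trial 1.
trial = 1;
wordIndex = expt.allWords(trial);       % numeric index into expt.words
wordAsCell = expt.listWords(trial);     % parentheses return a 1x1 cell
wordAsText = expt.listWords{trial};     % curly braces return the text itself

fprintf('Trial %d: index %d, word %s\n', trial, wordIndex, wordAsText);

% Both per-trial fields should be exactly ntrials long.
lengthsMatch = numel(expt.allWords) == expt.ntrials && ...
               numel(expt.listWords) == expt.ntrials;
```

Note the difference between parentheses and curly braces when reading a cell array: expt.listWords(1) is the cell {'head'}, while expt.listWords{1} is the text 'head'.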
This same format of X, allX, and listX is used for multiple types of information:
expt.conds = {'baseline' 'ramp' 'hold' 'washout'}
expt.vowels = {'AE' 'EH' 'IH'}
expt.words = {'bed' 'head'}, whereas expt.stimulusText = {'I spend my mornings reading in BED', 'HEAD over when you feel like it'}
expt.shiftNames = {'noShift' 'shiftUp' 'shiftDown'}
expt.shifts = {0 1 -1}. Note that expt.allShifts and expt.allShiftNames need to be identical for this to work.
expt.colors = {'blue' 'red' 'green'}
Lastly, the field expt.inds is a struct which catalogs the indexes of these various fields. So expt.inds.words would contain the fields .bed, .head, and .ted. For example, expt.inds.words.bed is an array of the 33 trials on which 'bed' was the word used: expt.inds.words.bed = [2 6 8 11 15 16 20 ...].
This is just handy for some data analysis scripts.
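To make the layout of expt.inds concrete, here is one way such a catalog could be built from expt.words and expt.allWords. This is a sketch, not the lab's actual setup code: the 9-trial order is made up, and the loop assumes every word is a valid Matlab field name.

```matlab
expt = struct();
expt.words = {'bed', 'head', 'ted'};
expt.allWords = [1 2 3 2 1 3 1 3 2];    % made-up 9-trial order

% For each word, collect the trial numbers on which it was used.
expt.inds = struct();
expt.inds.words = struct();
for w = 1:numel(expt.words)
    thisWord = expt.words{w};
    % Dynamic field name: creates expt.inds.words.bed, .head, .ted
    expt.inds.words.(thisWord) = find(expt.allWords == w);
end

bedTrials = expt.inds.words.bed;
disp(bedTrials)
```

An analysis script can then pull out every trial for one word in a single step, for example by indexing a per-trial array with bedTrials.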
expt fields important for Audapter experiments
The required fields in expt to complete data analysis are expt.ntrials, expt.words, expt.allWords, and expt.listWords.
Here is an example of the minimum expt.mat file: bare minimum expt.mat
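For orientation, a struct containing only the four required fields could be assembled and saved roughly as follows. This is a sketch under assumptions, not the contents of the linked file: the trial count, word order, and output location (the system temp folder) are all made up.

```matlab
% Only the four required fields; values are invented for illustration.
expt = struct();
expt.ntrials = 4;
expt.words = {'bed', 'head'};
expt.allWords = [1 2 2 1];
expt.listWords = expt.words(expt.allWords);

% Save the single variable "expt" into a file named expt.mat.
outFile = fullfile(tempdir, 'expt.mat');
save(outFile, 'expt');

% Loading into an output variable returns a struct whose one field is expt.
loaded = load(outFile);
requiredFields = {'ntrials', 'words', 'allWords', 'listWords'};
hasAllRequired = all(isfield(loaded.expt, requiredFields));
disp(hasAllRequired)
```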
data is a struct array (a vectorized structure), where each element of data represents a trial. For example, data(11) gives you all the information about trial 11.
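Indexing into data works like indexing into any other array. The sketch below builds a 3-trial stand-in with placeholder signals (all zeros, made-up lengths) just to show the access pattern; real data comes from a recording session.

```matlab
% Create a 1x3 struct array with empty fields, then fill in each trial.
ntrials = 3;
data = struct('signalIn', cell(1, ntrials), 'signalOut', cell(1, ntrials));
for t = 1:ntrials
    data(t).signalIn = zeros(100 * t, 1);    % placeholder samples
    data(t).signalOut = zeros(100 * t, 1);
end

% Everything about trial 2 is in data(2).
trial2 = data(2);
trial2Length = numel(trial2.signalIn);

% Apply the same operation to every trial.
samplesPerTrial = arrayfun(@(d) numel(d.signalIn), data);
disp(samplesPerTrial)
```

Because data is an array, a field has to be read one trial at a time (data(2).signalIn) or with a helper such as arrayfun; data.signalIn without an index returns a comma-separated list rather than a single value.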
The two most important fields in data are data.signalIn and data.signalOut.
(Both come from the Audapter('getData') call.)
The length of signalIn and signalOut is determined by the sampling rate of the signal. Most Audapter experiments record at a hardware sampling rate of 48 kHz, but downsample by a factor of 3, for an effective sampling rate of 16 kHz. The sampling rate and downsampling factor can be found in data(1).params.sr and data(1).params.downFact.
Like a .wav file, data.signalIn and data.signalOut record the position of the mic diaphragm as values in the range [-1 1]. In fact, these fields can be converted to .wav files very easily: audiowrite('trial1_exported.wav', data(1).signalIn, data(1).params.sr).
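The sampling-rate arithmetic and the .wav export can be tried on a synthetic trial. This is a sketch: the 220 Hz tone stands in for a real recording, the output path in the temp folder is made up, and multiplying params.sr by params.downFact to recover the hardware rate is inferred from the 48 kHz / 3 = 16 kHz figures above.

```matlab
% Stand-in for one recorded trial: 1 second of a 220 Hz tone at 16 kHz.
data = struct();
data(1).params.sr = 16000;        % post-downsampling rate in Hz
data(1).params.downFact = 3;
t = (0:data(1).params.sr - 1)' / data(1).params.sr;
data(1).signalIn = 0.5 * sin(2 * pi * 220 * t);   % stays within [-1 1]

% Hardware rate implied by the effective rate and downsampling factor.
hardwareSr = data(1).params.sr * data(1).params.downFact;

% Trial duration follows from the number of samples and the sampling rate.
durationSec = numel(data(1).signalIn) / data(1).params.sr;

% Export to .wav and read it back to confirm the rate survived.
outFile = fullfile(tempdir, 'trial1_exported.wav');
audiowrite(outFile, data(1).signalIn, data(1).params.sr);
[y, fs] = audioread(outFile);

fprintf('%.1f s at %d Hz (hardware rate %d Hz)\n', durationSec, fs, hardwareSr);
```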
Many other fields in data are the output of Audapter, and are measured in units of frames (aka sampling windows). Audapter evaluates things like RMS and formant values in frames, to improve accuracy and reduce noise. An Audapter frame is typically 2 milliseconds.
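To see how frames relate to samples, the sketch below cuts a signal into 2 ms frames and computes one RMS value per frame. It assumes the typical values quoted above (16 kHz, 2 ms) and a synthetic tone; it illustrates the idea of per-frame measurement and is not Audapter's own RMS calculation.

```matlab
sr = 16000;                                  % effective sampling rate in Hz
frameDur = 0.002;                            % typical frame length: 2 ms
samplesPerFrame = round(frameDur * sr);      % samples in one frame

% Synthetic 1-second signal standing in for data(n).signalIn.
t = (0:sr - 1)' / sr;
signalIn = 0.5 * sin(2 * pi * 220 * t);

% Cut the signal into whole frames: one column per frame.
nFrames = floor(numel(signalIn) / samplesPerFrame);
frames = reshape(signalIn(1:nFrames * samplesPerFrame), samplesPerFrame, nFrames);

% One RMS value per frame, plus the start time of each frame in seconds.
rmsPerFrame = sqrt(mean(frames .^ 2, 1));
frameTimes = (0:nFrames - 1) * frameDur;

fprintf('%d samples per frame, %d frames\n', samplesPerFrame, nFrames);
```

A frame-based field is therefore much shorter than signalIn: with these values, one second of audio is 16000 samples but only 500 frames.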
Other important fields in data:
data. They are stored in separate files called trial files, discussed in the "What is a trial file?" section of this KB doc.
The required fields in data to complete data analysis are data.signalIn and data.params.sr, which is the post-downsampling sampling rate in Hz. Thus, often data.params.sr = 16000.
Here is an example of that bare minimum data.mat file: bare minimum data.mat
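As a rough guide, a data variable holding only the two required fields could be put together and checked like this. It is a sketch, not the contents of the linked file: the signals are silent placeholders, the trial count is made up, and the file is written to the system temp folder.

```matlab
sr = 16000;      % post-downsampling rate in Hz
ntrials = 2;     % made-up trial count

% One struct-array element per trial, with only the required fields.
data = struct('signalIn', cell(1, ntrials), 'params', cell(1, ntrials));
for t = 1:ntrials
    data(t).signalIn = zeros(sr, 1);       % placeholder: 1 s of silence
    data(t).params = struct('sr', sr);
end

% Save the single variable "data" into a file named data.mat.
outFile = fullfile(tempdir, 'data.mat');
save(outFile, 'data');

% Reload and confirm every trial has a signal and a sampling rate.
loaded = load(outFile);
hasSignal = isfield(loaded.data, 'signalIn');
hasRate = all(arrayfun(@(d) isfield(d.params, 'sr'), loaded.data));
disp(hasSignal && hasRate)
```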