Basic experiment data structure

How experiment data is stored on a technical level.

Two files are created when a participant completes an experiment in our lab: expt.mat and data.mat. Expt.mat is a setup or configuration file that contains information about the experiment: what day it was run, what the participant's ID is, which experiment was run, etc. Data.mat contains the actual experiment data; for speech experiments, the most important piece is the signal recorded from the participant's microphone.

More information about each of these files is below.

Each of these files contains a single variable: expt.mat contains expt, and data.mat contains data. Both variables are structures, AKA "structs."

What's a struct?

Structures ("structs") are like filing cabinets. A filing cabinet doesn't just have loose papers in it; instead, it has folders inside of it. A folder can contain either loose papers or more folders, which themselves contain papers (or still more folders).

In Matlab, the folders inside a filing cabinet are called "fields," and a piece of paper is a value, like 2 or 'Hello world'. So expt.name = 'simonSingleWord' means that there is a structure expt with a field called name, and the value of that field is 'simonSingleWord'.
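To make the analogy concrete, here is a small sketch you can paste into the Matlab command window. The field values are illustrative only, not taken from a real expt.mat. It shows values stored in fields, plus a sub-struct (a folder within a folder), which is how expt.timing works below.

```matlab
% A filing cabinet (struct) with folders (fields) holding papers (values)
expt = struct();                    % start with an empty struct
expt.name = 'simonSingleWord';      % field "name" holds a character array
expt.ntrials = 99;                  % field "ntrials" holds a number
expt.timing.stimdur = 2;            % a folder inside a folder: timing is a sub-struct

disp(expt.name)                     % prints simonSingleWord
disp(fieldnames(expt))              % lists the top-level fields: name, ntrials, timing
isstruct(expt.timing)               % true, because timing is itself a struct
```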

expt.mat

expt is a setup or configuration variable. It contains information about the experiment.

expt is automatically generated by the Matlab code that runs our experiments, and is typically set up right before we start recording trials with the participant. This means expt contains no information about the recordings themselves; that all goes in the data variable.

expt has several important fields:

  • name - the name of the experiment
  • snum - the participant's ID
  • dataPath - where the data was originally saved
  • ntrials - the number of trials in the whole experiment
  • date - when the experiment was run
  • timing - expt.timing is a sub-struct containing fields related to the timing of stimulus presentation
    • timing.stimdur - "stimulus duration," or how long the stimulus is on-screen in a single trial. Typically around 2 seconds.
    • timing.interstimdur - "interstimulus duration," the baseline amount of time between trials. Typically around 1.25 seconds.
    • timing.interstimjitter - "interstimulus jitter," a variable amount of extra time to wait between trials, on top of expt.timing.interstimdur. Typically around 0.75 seconds, which means that the jitter lasts anywhere from 0 to 0.75 seconds.
  • inds - discussed below
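As a rough sketch of how the timing fields combine, the code below uses the typical values listed above to bound the length of one trial and of a whole 99-trial run. It assumes a trial is just the stimulus period plus the wait between trials, and it ignores any other overhead (loading stimuli, breaks), so treat the totals as estimates.

```matlab
% Typical timing values described above (illustrative, not from a real expt.mat)
expt = struct();
expt.ntrials = 99;
expt.timing.stimdur = 2;             % seconds the stimulus is on screen
expt.timing.interstimdur = 1.25;     % baseline seconds between trials
expt.timing.interstimjitter = 0.75;  % extra 0 to 0.75 s added between trials

t = expt.timing;
minTrialDur = t.stimdur + t.interstimdur;        % jitter at its minimum (0 s)
maxTrialDur = minTrialDur + t.interstimjitter;   % jitter at its maximum

% Rough bounds on total run time, in minutes, ignoring any other overhead
minTotalMin = expt.ntrials * minTrialDur / 60;
maxTotalMin = expt.ntrials * maxTrialDur / 60;
fprintf('Each trial: %.2f to %.2f s\n', minTrialDur, maxTrialDur);
fprintf('Whole run: %.1f to %.1f min\n', minTotalMin, maxTotalMin);
```

With these values, each trial lasts 3.25 to 4.00 s, and the whole run takes roughly 5.4 to 6.6 minutes.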

X, allX, listX

In some cases, there are three fields associated with one another. For example: expt.words, expt.allWords, expt.listWords. I'll refer to these as expt.X, expt.allX, and expt.listX.

expt.X is the complete set of possible values for that thing. So if a 99-trial experiment cycled through displaying "bed," "head," or "ted" to the participant, expt.words would contain {'bed', 'head', 'ted'}.

expt.allX and expt.listX are similar because both tell you what happens on each trial of the experiment. They are both arrays with a length equal to expt.ntrials, so in a 99-trial experiment each would be 99 values long.

expt.allX is a numeric vector of indexes into expt.X, one per trial. So if expt.allX(1) = 2, then on trial 1 the 2nd value in expt.X is used. In our words example, expt.allWords(1) = 2 means that 'head' would display.

expt.listX holds the same information as expt.allX, but as a cell array. Instead of storing indexes into expt.X, it holds the actual values, such that expt.listX = expt.X(expt.allX). In our words example, expt.listWords(1) = {'head'}.
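The relationship between the three fields can be checked directly in Matlab. This sketch uses the bed/head/ted example with a made-up allWords covering only the first six trials. It also shows the difference between () and {} indexing on a cell array, and how ismember recovers allX from listX.

```matlab
% The words example: three possible words, one index per trial
expt = struct();
expt.words = {'bed', 'head', 'ted'};
expt.allWords = [2 1 3 3 2 1];                % first 6 trials only, made up for brevity
expt.listWords = expt.words(expt.allWords);   % cell array, one word per trial

expt.listWords(1)    % parentheses return a 1x1 cell: {'head'}
expt.listWords{1}    % braces return the contents: 'head'

% Going the other way: recover the indexes from the words themselves
[~, recovered] = ismember(expt.listWords, expt.words);
isequal(recovered, expt.allWords)             % true
```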

This same format of X, allX, and listX is used for multiple types of information:

  • words - discussed above
  • conds - aka "conditions." An experiment with baseline, ramp, hold, and washout phases would use expt.conds = {'baseline' 'ramp' 'hold' 'washout'}
  • vowels - similar to expt.words, but with just the ARPABET code for the vowel in the first syllable. For example, expt.vowels = {'AE' 'EH' 'IH'}
  • stimulusText - the actual text displayed on-screen. This is different from expt.words for experiments with sentences (vsaSentence, taimComp), or where expt.words conveys something other than just the word itself (simonToneMatch, simonToneLex). For example, expt.words = {'bed' 'head'} whereas expt.stimulusText = {'I spend my mornings reading in BED', 'HEAD over when you feel like it'}
  • shiftNames - the direction of the F1 perturbation, as a string. For example, expt.shiftNames = {'noShift' 'shiftUp' 'shiftDown'}
  • shifts - similar to shiftNames, but with the actual perturbation value that would be passed into Audapter. For example, expt.shifts = {0 1 -1}. Note that expt.allShifts and expt.allShiftNames need to be identical for this to work, so that each trial's shift value matches its shift name.
  • colors - for Stroop experiments, the color of the text the word was printed in. For example, expt.colors = {'blue' 'red' 'green'}
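Because every family follows the same naming pattern, one loop can sanity-check all of them. This is a hypothetical check, not a lab script. It assumes the fields are named X, allX, and listX with the first letter of X capitalized (e.g., shiftNames, allShiftNames, listShiftNames), and it skips any family the experiment doesn't use. The small expt it checks is made up.

```matlab
% Illustrative expt using two of the X / allX / listX families
expt = struct();
expt.ntrials = 4;
expt.words = {'bed', 'head'};
expt.allWords = [1 2 2 1];
expt.listWords = expt.words(expt.allWords);
expt.shiftNames = {'noShift', 'shiftUp', 'shiftDown'};
expt.allShiftNames = [1 2 3 1];
expt.listShiftNames = expt.shiftNames(expt.allShiftNames);
expt.shifts = {0 1 -1};
expt.allShifts = expt.allShiftNames;          % must match allShiftNames
expt.listShifts = expt.shifts(expt.allShifts);

% ASSUMPTION: field names follow the X / allX / listX pattern
families = {'words', 'conds', 'vowels', 'stimulusText', 'shiftNames', 'shifts', 'colors'};
for i = 1:numel(families)
    X = families{i};
    if ~isfield(expt, X)
        continue                              % this experiment doesn't use X
    end
    suffix = [upper(X(1)) X(2:end)];          % 'shiftNames' -> 'ShiftNames'
    allX = expt.(['all' suffix]);
    listX = expt.(['list' suffix]);
    vals = expt.(X);
    assert(numel(allX) == expt.ntrials, 'all%s has the wrong length', suffix);
    assert(isequal(listX, vals(allX)), 'list%s does not match %s(all%s)', suffix, X, suffix);
    fprintf('%s: OK\n', X);
end

% The shifts and shiftNames indexes must be identical trial by trial
if isfield(expt, 'allShifts') && isfield(expt, 'allShiftNames')
    assert(isequal(expt.allShifts, expt.allShiftNames), 'allShifts and allShiftNames differ');
end
```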

Lastly, the field expt.inds is a struct that catalogs the trial indexes for each value of these fields. For example, expt.inds.words contains the fields bed, head, and ted, and expt.inds.words.bed is an array of the trials on which 'bed' was the word used (33 of them in our 99-trial example, if each word appears equally often): expt.inds.words.bed = [2 6 8 11 15 16 20 ...]. This is handy for some data analysis scripts.
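If an expt is missing inds, it can be rebuilt from expt.X and expt.allX. The sketch below does this for words, assuming each word is a valid Matlab field name (no spaces or leading digits). The 99-trial allWords that simply cycles bed, head, ted is made up for illustration, and the lab's own code may build inds differently.

```matlab
% Hypothetical sketch: rebuild expt.inds.words from expt.words and expt.allWords
expt = struct();
expt.words = {'bed', 'head', 'ted'};
expt.allWords = repmat([1 2 3], 1, 33);      % 99 trials cycling bed, head, ted
expt.inds = struct();
expt.inds.words = struct();
for i = 1:numel(expt.words)
    w = expt.words{i};                       % ASSUMPTION: w is a valid field name
    expt.inds.words.(w) = find(expt.allWords == i);   % trials where word i was used
end

numel(expt.inds.words.bed)    % 33 trials
expt.inds.words.head(1:3)     % [2 5 8]
```

With inds in place, selecting the trials for one word is a single lookup, e.g. data(expt.inds.words.bed).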

Other fields in expt important for Audapter experiments

  • trackingFileName - if this was an experiment that used Audapter, this is the name of the OST file used for status tracking
  • audapterParams - the parameters provided to Audapter for this participant. Includes things like sampling rate, LPC order, forgetting factor, perturbation amplitude (often configured elsewhere if it changes from trial to trial), and whether the experiment was 1D or 2D
  • shiftMags - the amount of perturbation applied to the signal on each trial (if changed from trial to trial)
  • subjOstParams - if custom OSTs were configured for this participant, this stores those custom OST values that would be loaded into the OST file

The bare minimum

The required fields in expt to complete data analysis are expt.ntrials, expt.words, expt.allWords, and expt.listWords.
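Here is a sketch of building a minimal expt by hand (for example, when importing recordings made outside our usual setup) and checking it before saving. The words and trial order are illustrative, and the file is written to tempdir rather than to a real data folder.

```matlab
% A minimal expt with only the required fields (illustrative values)
expt = struct();
expt.words = {'bed', 'head', 'ted'};
expt.allWords = [1 2 3 3 1 2];
expt.ntrials = numel(expt.allWords);
expt.listWords = expt.words(expt.allWords);

% Sanity checks before analysis
required = {'ntrials', 'words', 'allWords', 'listWords'};
assert(all(isfield(expt, required)), 'expt is missing a required field');
assert(numel(expt.allWords) == expt.ntrials && numel(expt.listWords) == expt.ntrials, ...
    'allWords and listWords must have one entry per trial');

% Save as a file holding one variable named expt, like the lab's expt.mat
outFile = fullfile(tempdir, 'expt.mat');
save(outFile, 'expt');
S = load(outFile);    % S.expt is the struct that was saved
```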

Here is an example of the minimum expt.mat file: https://drive.google.com/drive/folders/1qsVp8iYUyD0AfnV5JD-TaecadsA1WE2v?usp=sharing

data.mat

data is a struct array (a "vectorized" structure), where each element of data represents one trial. For example, data(11) gives you all the information about trial 11.
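A quick sketch of working with a struct array. It fills three trials with placeholder noise instead of real recordings; arrayfun is one way to pull a single value out of every trial at once.

```matlab
% Illustrative data struct array with 3 trials (noise stands in for audio)
nTrials = 3;
sr = 16000;
data = struct('signalIn', {}, 'params', {});   % empty struct array with two fields
for t = 1:nTrials
    data(t).signalIn = 0.01 * randn(2*sr, 1);  % 2 s of placeholder audio, not real data
    data(t).params = struct('sr', sr);
end

numel(data)              % number of trials: 3
trial2 = data(2);        % everything about trial 2, as a 1x1 struct
size(trial2.signalIn)    % [32000 1]

% Collect one value per trial across the whole array
nSamples = arrayfun(@(d) numel(d.signalIn), data);   % 1x3 vector
```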

The two most important fields in data are data.signalIn and data.signalOut. 

  • signalIn - what the experiment microphone recorded the participant saying. 
  • signalOut - the signal presented to the participant's headphones. On perturbation trials, this will be the perturbed signal.
    • Note that for experiments which use masking noise, signalOut will include the masking noise baked in. (This isn't an intentional choice; it's just how Audapter sends information back via Audapter('getData'))

The length of signalIn and signalOut depends on how long the trial was recorded and on the sampling rate of the signal. Most Audapter experiments record at a hardware sampling rate of 48 kHz but downsample by a factor of 3, for an effective sampling rate of 16 kHz. The sampling rate and downsampling factor can be found in data(1).params.sr and data(1).params.downFact.
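The arithmetic behind those numbers, as a sketch: the effective rate is the hardware rate divided by the downsampling factor, and the number of samples is the duration times the effective rate. The 2.5-second recording length is a made-up example.

```matlab
% Sampling-rate arithmetic for a typical Audapter setup (values from the text)
hwRate = 48000;              % hardware sampling rate, Hz
downFact = 3;                % downsampling factor (data(1).params.downFact)
sr = hwRate / downFact;      % effective rate (data(1).params.sr): 16000 Hz

% Number of samples <-> duration
recDur = 2.5;                        % hypothetical recording length, seconds
nSamples = round(recDur * sr);       % 40000 samples in signalIn
durFromSamples = nSamples / sr;      % back to 2.5 s

% With a real data struct, the same calculation per trial k would be:
% dur = numel(data(k).signalIn) / data(k).params.sr;
```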

Like the samples in a .wav file, the values in data.signalIn and data.signalOut fall in the range [-1 1] and represent the position of the mic diaphragm. In fact, these fields can be converted to .wav files very easily: audiowrite('trial1_exported.wav', data(1).signalIn, data(1).params.sr)
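To export every trial rather than just the first, loop over the struct array. This sketch builds a two-trial stand-in for data (a 220 Hz tone) so it runs on its own; with real files you would load('data.mat') instead. The output folder and file names are hypothetical. Note that audiowrite clips any floating-point samples outside [-1, 1].

```matlab
% Hypothetical stand-in for a loaded data.mat (with real files: load('data.mat'))
sr = 16000;
data = struct('signalIn', {}, 'signalOut', {}, 'params', {});
for k = 1:2
    data(k).signalIn = 0.1 * sin(2*pi*220*(0:sr-1)'/sr);   % 1 s, 220 Hz tone
    data(k).signalOut = data(k).signalIn;
    data(k).params = struct('sr', sr);
end

outDir = tempname;    % a fresh temporary folder for the exported files
mkdir(outDir);
for k = 1:numel(data)
    fs = data(k).params.sr;
    audiowrite(fullfile(outDir, sprintf('trial%d_in.wav', k)), data(k).signalIn, fs);
    audiowrite(fullfile(outDir, sprintf('trial%d_out.wav', k)), data(k).signalOut, fs);
end

% Read one file back to confirm the export
[y, fsRead] = audioread(fullfile(outDir, 'trial1_in.wav'));
```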

Other fields in data

Many other fields in data are outputs of Audapter and are measured in units of frames (i.e., short analysis windows), with one row per frame. Audapter evaluates things like RMS and formant values frame by frame, to improve accuracy and reduce noise. An Audapter frame is typically 2 milliseconds.
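To line frame-based fields up with time, multiply frame indexes by the frame duration. This sketch assumes a 16 kHz effective rate and a 32-sample frame (32 / 16000 s = 2 ms). It also assumes the frame length in samples is available in the trial's Audapter parameters (Audapter calls this frameLen, but check data(k).params in your own files before relying on it).

```matlab
% Illustrative values: 16 kHz effective rate and a 32-sample frame (2 ms)
sr = 16000;
frameLen = 32;                  % ASSUMPTION: frame length in samples, e.g. params.frameLen
frameDur = frameLen / sr;       % 0.002 s = 2 ms per frame

nFrames = 1000;                 % with real data, e.g. size(data(k).fmts, 1)
frameTimes = (0:nFrames-1)' * frameDur;   % start time of each frame, in seconds

% Converting a time of interest to a (1-based) frame index
tOfInterest = 0.5;                                  % seconds
frameIdx = floor(tOfInterest * sr / frameLen) + 1;  % frame 251 starts at 0.5 s
```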

Other important fields in data:

  • rms - Audapter's root mean squared (RMS) intensity value.
    • Note that for data analysis, our lab typically runs files through Praat and uses Praat's more accurate and configurable RMS and formant values. These Praat values DO NOT overwrite the values in data. They are stored in separate files called trial files, discussed in the "What is a trial file?" section of this KB doc.
    • The first column is short-time RMS values. The second column is short-time RMS values with a high-pass filter. The third column is non-smoothed, non-filtered short-time RMS values.
  • fmts - Audapter's formant values for F1, F2, F3, and F4, in columns 1-4, respectively
  • dfmts - derivative of the formant values in data.fmts
  • ost_stat - the OST status number on that frame
  • params - a sub-struct with all the Audapter parameters used on that trial
  • pitchHz and shiftedPitchHz - if doing pitch shifting, the F0 value from signalIn and signalOut, respectively
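As a sketch of using these fields together, the code below keeps only frames whose short-time RMS (column 1 of rms) is above a threshold and averages F1 and F2 over them. The trial is synthetic and the 0.02 threshold is hypothetical; for real analyses, remember that the lab normally uses Praat-derived values from trial files instead.

```matlab
% Synthetic stand-in for one trial's Audapter outputs (not real data)
nFrames = 200;
rmsVals = [zeros(50,1); 0.05*ones(100,1); zeros(50,1)];  % silence, vowel, silence
data = struct();
data.rms = [rmsVals, rmsVals, rmsVals];                  % three RMS columns
data.fmts = repmat([700 1700 2600 3500], nFrames, 1);    % F1-F4 in columns 1-4

% Keep frames where short-time RMS (column 1) exceeds a threshold
rmsThresh = 0.02;                       % HYPOTHETICAL threshold; tune per setup
voiced = data.rms(:,1) > rmsThresh;     % logical mask, one entry per frame
meanF1 = mean(data.fmts(voiced, 1));    % F1 is column 1
meanF2 = mean(data.fmts(voiced, 2));    % F2 is column 2
fprintf('%d frames above threshold; mean F1 = %.0f Hz, F2 = %.0f Hz\n', ...
    nnz(voiced), meanF1, meanF2);
```

On this synthetic trial it prints "100 frames above threshold; mean F1 = 700 Hz, F2 = 1700 Hz".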

The bare minimum

The required fields in data to complete data analysis are data.signalIn and data.params.sr, which is the post-downsampling sampling rate in Hz. Thus, data.params.sr is often 16000.
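A sketch of building a minimal data.mat from scratch, with silence standing in for real recordings. The check that every trial has a positive params.sr is illustrative, not a lab requirement beyond the two fields named above, and the file goes to tempdir.

```matlab
% Build a minimal data struct array: signalIn plus params.sr for each trial
sr = 16000;                          % post-downsampling sampling rate, Hz
nTrials = 3;
data = struct('signalIn', {}, 'params', {});
for k = 1:nTrials
    data(k).signalIn = zeros(2*sr, 1);    % placeholder: 2 s of silence
    data(k).params = struct('sr', sr);
end

% Check that every trial has what analysis needs
hasSr = arrayfun(@(d) isfield(d.params, 'sr') && d.params.sr > 0, data);
assert(all(hasSr), 'every trial needs data(k).params.sr');

% Save as a file holding one variable named data, like the lab's data.mat
outFile = fullfile(tempdir, 'data.mat');
save(outFile, 'data');
S = load(outFile);
```

To wrap existing recordings instead, you could fill data(k).signalIn with the first output of audioread and data(k).params.sr with its second output. If those files aren't at 16000 Hz, check whether the analysis scripts accept other rates before relying on them.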

Here is an example of that bare minimum data.mat file: https://drive.google.com/drive/folders/1qsVp8iYUyD0AfnV5JD-TaecadsA1WE2v?usp=sharing



Keywords:
data, expt, data.mat, expt.mat, structure, matlab 
Doc ID:
117641
Owned by:
Chris N. in SMNG Lab Manual
Created:
2022-03-29
Updated:
2023-12-18
Sites:
Speech Motor Neuroscience Group