How to run: taimComp

How to run taimComp. Part of cerebellar battery (2022-2023).

Special circumstances: part of battery

This experiment is part of the cerebellar battery run in 2022-2023. 

  • Patients (UW and UCSF): Session 3
  • Controls (UW): Session 2

In this battery, participants come in for multiple sessions and do multiple experiments in a row. As such, this is a bare bones document on how to run the experiment. Procedures for consent, hearing screening, awareness surveys, general equipment set up, and payment are not included in this document. See the documents below for how these procedures are implemented in this multi-study session: 

  1. Protocol for cerebellar battery: controls
  2. For patients

What's special about this experiment

This experiment uses formant clamping to simulate acceleration, deceleration, undershoot, and overshoot of the vowel /ai/. 

This manipulation relies heavily on accurate OST tracking from Audapter. For this, we individualize OST parameters for each participant using an in-house GUI called audapter_viewer. Here is a video guide for how to use audapter_viewer. If you would like more information about the particular heuristics that are used for OST tracking, see this guide.

Note: You MUST use UW's version of Audapter (and accompanying Matlab code) for this!! Other versions do not have formant clamping. The experiment code does a hard check for the formant clamping before starting so you will find out quickly if your Audapter is not set up right. 

Prepping for participant

Before running the participant, determine if they are a speaker with monophthongization of the target vowel or not. Speakers with monophthongization cannot participate in this experiment because it renders the manipulations null!

  1. Monophthongization of /ai/ is a typical feature of Southern American English and Black English, though not all speakers of these dialects will necessarily have it (depending on their other linguistic experiences)

  2. Monophthongization means the vowel in “buy” or “guide” will sound more like “bah” or “gahd” 

  3. If you cannot hear this specifically without looking at a spectrogram, you will get the opportunity to do that during the LPC order check.

Pre-experiment instructions

Tell the participant: “This experiment has three shorter sections and then one long section. There will be breaks between sections while I set up the next part.” 

  1. Type run_taimComp_expt into the command window and hit enter. 
  2. You will be asked for the participant number. It is important to use the right kind of prefix so that the trials are the right duration (for patients, they are longer/slower with more time between trials).
    1. UW:

      1. if control, spXXX

      2. If patient, caXXX

    2. UCSF, UC-Berkeley: 

      1. Currently, the code looks for the substring ‘ca’ to identify patients. This can be changed to look for an additional condition if you have some other identifier in your own system

  3. You will then be asked about the participant’s height. This is how we determine the starting value for LPC order.
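The prefix check described in step 2 can be sketched as follows. This is a minimal Python illustration, not the lab's Matlab code: the function name is hypothetical, and it assumes the check is a plain, case-sensitive substring test on the participant ID.

```python
def is_patient(participant_id: str, patient_marker: str = "ca") -> bool:
    """Return True if the ID contains the patient marker substring.

    Assumption: the marker may appear anywhere in the ID and the
    comparison is case-sensitive. The real experiment code may differ.
    """
    return patient_marker in participant_id


for pid in ["sp123", "ca123"]:
    group = "patient" if is_patient(pid) else "control"
    print(pid, "->", group)
```

Under these assumptions, any ID scheme in which "ca" shows up by accident would be treated as a patient ID, which is worth keeping in mind if your site uses its own identifiers.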

Preparation phase 1: LPC order

  1. In this phase, participants will see words on the screen and say them out loud. 

  2. Tell the participant: “For this first section, you will see one word at a time appear on the screen. When you see the word on the screen, read it out loud, just like you would normally say it. You will be speaking into the microphone on the desk, and you will hear your own voice and some noise played back through the headphones. Do you have any questions?"

    "Please put on your headphones now."

  3. The participant will complete 30 trials, 10 trials per word (bod, bead, bide). 

    1. If you have not yet determined if the speaker has monophthongization, look at the formant trajectories in “bide” as they show up on the control screen. 

    2. Examine the Audapter-tracked formants as they are coming up on the control screen. Note if the tracking seems to indicate that the LPC order should be changed. Indications that something might be off: 
      1. F2 transition from a to i in /ai/ might be extremely jumpy or jittery, in a way that does not follow the underlying spectrogram (some speakers may be smoother or more stable than others, so be sure to look at how the colored formant track corresponds with the spectrogram, not just the characteristics of the colored line alone). 
      2. F2 for /a/, especially near the /b/ transition, is questionable
      3. F2 for /i/ might jump down and back up 
  4. The check_audapterLPC GUI will then come up. Use the GUI to find an appropriate LPC order for the participant (if you want more general information, see this primer on LPC order, with explanations and examples). Some general guidelines for this experiment: 

    1. If you are still unsure about monophthongization, you can use this time to take a good look at those formant trajectories. Monophthongization should look very obvious: formants will not move at all across the entire vowel (see example below, this is someone from Duck Dynasty saying "sideline" [saidlain] with F1 and F2 highlighted)

      sideline with monophthongization. F1 and F2 do not move

    2. You do NOT need to include/exclude trials. This button is used to pick out vowels that should be used to calculate a mean F1/F2 value for a particular vowel. We do not use that information in this study. 

    3. Look at several different vowels by clicking on different points in the vowel map to be sure that you are getting an overall impression of how formant tracking is going, not just looking at a single trial. Since people have some natural variability in their productions from trial to trial, you need to make sure that their general formant range is tracking well. 

    4. The most important vowel to check for accuracy is the /ai/ in "bide", because this is the vowel that is actually used in the experiment. It is also the one with the most movement over the course of the formants, which can interact with LPC order to produce a formant track that jumps away from the actual resonances. 
    5. If the formant trajectory seems to be pushed lower than where the actual formant is (e.g., the formant track is overall too low, or there are small patches or spikes where the formant is being pushed down), this is an indication that the LPC order is probably too high (the formant tracking algorithm is assuming that the person has a longer vocal tract than they do, and thus that formants should be lower/closer together). LOWER the LPC order (e.g., from 15 to 13).

    6. If the formant trajectory seems to be pushed higher than where the actual formant is, this is an indication that the LPC order is too low (the formant tracking algorithm is assuming that the person has a shorter vocal tract than they do, and thus that formants should be higher/further apart). RAISE the LPC order (e.g., from 15 to 17). 

    7. If you've done another experiment with this participant that checked LPC order, the value you used there may suggest what to use in this experiment. However, it may not be exactly the same, since different vowels are being used, and thus different formant values.

    8. This is the first chance you have in this experiment to get to know the participant's vocal tract, which will come into play in the upcoming OST setting. People with a shorter vocal tract (LPC order = 15 or lower) will likely have higher RMS ratio values overall. People with a longer vocal tract (LPC order = 17 or higher) will likely have lower RMS ratio values overall. This is because vocal tract length affects how much energy there is in higher frequencies, and the amount of energy in higher frequencies affects Audapter's calculation of RMS ratio. 
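The LPC order guidelines above (items 5 and 6) boil down to a simple decision rule, sketched here in Python. The function name and the labels "too_low", "too_high", and "ok" are hypothetical, and the step of 2 is taken only from the examples in the text (15 to 13, 15 to 17). The judgment of whether the track sits too low or too high still has to come from looking at the spectrogram.

```python
def suggest_lpc_order(current: int, track_position: str, step: int = 2) -> int:
    """Suggest the next LPC order to try.

    track_position describes where the tracked formant sits relative to
    the visible formant in the spectrogram:
      "too_low"  -> LPC order is probably too high, so lower it
      "too_high" -> LPC order is probably too low, so raise it
      "ok"       -> keep the current order
    """
    if track_position == "too_low":
        return current - step
    if track_position == "too_high":
        return current + step
    if track_position == "ok":
        return current
    raise ValueError(f"unknown track_position: {track_position!r}")


print(suggest_lpc_order(15, "too_low"))   # 13
print(suggest_lpc_order(15, "too_high"))  # 17
```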

Preparation phase 2: OST setting for "BUY donuts" and "GUIDE voters"

  1. Tell the participant: “For this section, you will see a phrase and read it out loud. Try to read it in a clear voice, putting emphasis on the capitalized word, like this: BUY donuts now. Can you say those phrases for me?”
    1. You should coach them until they say the phrase in the right way: [sound examples of good productions: buy donuts example; guide tutors example]

      1. It is important to use focus (emphasis) on the capitalized word so that it is long enough without being a very unnatural speech rate. 

      2. Many people can produce prosody more accurately to the target when you give them conversational examples where the target prosody would be used. For example, "pretend that you are an elementary school teacher and you are trying to calmly correct a young student. For example, they might ask you, should we make donuts now?? And you would say, no, BUY donuts now." 
      3. They should NOT put pauses between words, because they will be confounding the experimental conditions (and making it difficult to automatically track the segments). It should be a smooth, slow-ish speech rate. People are more likely to put pauses in "guide voters" (likely due to the /dv/ sequence) than in "buy donuts" but there is a tendency for both to happen if they are really trying to emphasize BUY or GUIDE. 

      4. Although duration will not be tracked in this phase, you should try to get them to say it at a speech rate similar to what will be used in the full experiment. That way, the landmarks for the vowels will be consistent (people may have different proportions of [a] to [i] at different speech rates), and the OST setting will be valid for the rest of the experiment. 
  2. When they have gotten comfortable with saying the phrases, press the space bar to advance to the screen that gives them the general instructions. Tell them: “Okay, you can start whenever you are ready.” 

  3. They will read each phrase 9 times in random order

  4. When they finish, tell them: “I am just going to make some measurements, so you can relax for a few minutes.” 

  5. After they have finished, audapter_viewer will open. Use audapter_viewer to set the OST parameters for the participant. (See section below for taimComp-specific guidance on OSTs)

  6. When you are satisfied with the parameters:

    1. Click "Recalculate ALL trials" using the set of parameters that you are satisfied with. This makes sure that there is no conflicting information about what the OST parameters will be going forward. 
    2. Click “Continue and Exit”.

    3. Click “Save and Exit”. 

    4. Verify the folder you would like to save into 

  7. If you had to change anything from the default, it is HIGHLY RECOMMENDED to run the OST setting phase again to make sure that they work with new data (and thus that they can generalize to the participant’s speech)

  8. If you have to repeat, tell the participant: “We’re just going to do that one more time so I can make sure everything is set up correctly.” 

taimComp-specific guidance for OST setting

For more general guidance on setting OSTs, see: guide on Audapter’s OST capabilities or how to use audapter_viewer

  1. Status 2: onset of /ai/ 
    1. This is the most important status! This is the status that finds beginning of the target vowel and thus, the beginning of the perturbation

    2. This status should be rather robustly tracking the very beginning of the vowel, but if you need it to be a touch late to avoid accidental triggers at other points, that is okay. It should not be more than 50 ms late or so, however. 

    3. The default heuristic is INTENSITY_AND_RATIO_ABOVE_THRESH, which allows more precise tracking of vowel onset separately from prevoiced /b/ onset, which is VERY common in the older population. 

      1. This heuristic is by far the most reliable to detect vowels after stops (prevoiced or otherwise), but if for some reason it is not working due to really weird ratio values (likely a voice quality issue), INTENSITY_RISE_HOLD_POS_SLOPE is an okay substitute. 

    4. Values for the first parameter, which is RMS intensity, will likely be in the 0.035 range. However, this will vary from person to person, or from trial to trial. Factors that may affect the threshold you use are: 

      1. How loud the person is talking.
        1. Loud talkers will reach 0.035 very quickly
        2. Quiet talkers will reach 0.035 slowly/further into the vowel; 

      2. How close they are to the mic (affects recorded loudness, so same effects as loudness). 

      3. Which mic you are using (we chose this threshold based on our setup at UW, so your mileage may vary)

    5. The second parameter, which is RMS ratio, defaults to 0.17. Any adjustment will likely be in the 0.15 - 0.22 range. Factors that may affect the threshold you use are:

      1. Vocal tract length (take note from LPC order):
        1. People with longer vocal tracts will tend to be fine with the default or lower values.
        2. People with shorter vocal tracts may need higher values

      2. Loudness/effort and voice quality 
        1. Being very loud and forceful can increase the energy in higher frequencies, and thus boost RMS ratio, such that you reach the threshold early. You might not have to address the threshold in this case, unless they are triggering the status before the vowel (in /b/ or /g/). 
        2. Being very quiet or breathy can decrease the energy in higher frequencies, and lower RMS ratio, such that you don't reach the threshold until too late. In this case you may need to lower the threshold so that the vowel can be detected in time. 
    6. The third parameter, which is time, defaults to 0.008 (8 ms). This is largely to make sure that the speaker is well and truly beyond the thresholds, and not just randomly shimmering.

  2. Status 4: start of /d/ in “guide” or “donuts”

    1. This status is just as important as status 2! This is the status that finds the end of the vowel and thus the end of the perturbation.

    2. You should try to get this status as close to the end of the vowel as possible, since /d/ usually has enough voicing such that Audapter tries to track formants through it. You should NOT cut off the vowel. 

    3. The default heuristic is INTENSITY_AND_RATIO_BELOW_THRESH, which allows more precise tracking of vowel offset without including a fully voiced /d/. This is a new heuristic introduced by UW on 4/17/2023 so you will need to update your Mex file! 

      1. If this heuristic is not working for some reason, you can try NEG_INTENSITY_SLOPE_STRETCH_SPAN or INTENSITY_FALL, but both of these will be extremely prone to error if there are dips in RMS in the middle of the vowel (which is very common).
    4. Similarly to status 2, values for the first parameter, which is RMS intensity, will likely be in the 0.035 range, with some variation from person to person, or from trial to trial. The same factors will affect this threshold as for status 2, but with opposite effects (because you are trying to be below the threshold now, not above): 

      1. Loudness:
        1. Louder talkers might get back down below 0.035 too late (in the /d/), or potentially not even at all; in this case you might have to increase the threshold to, say, 0.05. 
        2. If there is a REALLY quiet talker, they might dip below 0.035 in the middle of the vowel so you may need to lower the threshold so they don't trigger it early. (However, usually speakers still don't satisfy the ratio requirement in the middle, so RMS dips alone do not mean you'll need to change threshold)

      2. Voicing/loudness during /d/ can make the RMS take a long time to go down, so you may need to raise the threshold 

    5. Values for the second parameter, which is RMS ratio, will likely be around 0.15-0.22, with the same effects from speaker variation as status 2. 

      1. If a speaker has a naturally higher ratio (due to short vocal tract, extra loud voice/lots of energy in high frequencies), they may not go below the default ratio threshold in time. In that case, you'll have to raise this parameter.

    6. The third parameter, which is time, defaults to 0.008 (8 ms). This is largely to make sure that the speaker is well and truly beyond the thresholds, and not just randomly shimmering.

      1. This can be particularly helpful for the end of the vowel because there is frequently some short-scale change in loudness in the middle of the vowel that you don’t want to let trigger this status, particularly if they are a quiet talker. 

  3. Note on the symmetry of heuristics: because the heuristics for Status 2 and Status 4 are mirror images of each other, if you make adjustments to 2, you will likely have to make similar adjustments to 4 or risk status 4 happening immediately after status 2. This is because the heuristics only check for the values being ABOVE or BELOW the thresholds; they do not check for direction of change (rise vs. fall). 

    1. Example: if you set the thresholds for status 2 very low, say 0.01 for RMS ratio, the speaker will quickly move above those thresholds, say to 0.02. 

    2. Then, if the threshold for status 4 is higher, say 0.05, the speaker will immediately satisfy the requirement to be below status 4 thresholds as well (because 0.02 is already below the threshold, even though it is rising). 

  4. Note about trial-to-trial variability: When you are setting the OSTs, keep in mind that you are trying to set them for the entire experiment.

    1. Don’t fixate on a single trial that isn’t perfect. The OST section gives you 18 trials to work with; you want it to be good for at least 15 of them, with the remaining ones pretty close. If the remaining three are egregious (not triggering at all, triggering far too late or far too early), you should try to find a better set. 

    2. Don’t “overfit” the data. Sometimes, you may try so hard to get the alignment perfect that you choose thresholds that just BARELY trigger the statuses on exactly this set of productions. You want to find thresholds that are robust to natural variability that occurs in a speaker when they produce 200 trials of these two phrases. One good way to make sure that your statuses are robust is to choose to redo the OST setting phase after you change anything and see if you still get the same success. 
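To make the threshold logic and the symmetry problem concrete, here is a simplified Python sketch of the two default heuristics. It is not Audapter's implementation: the function names, the 2 ms frame duration, the strict inequalities, and the toy RMS and ratio traces are all assumptions made for illustration. The only values taken from the text are the defaults (0.035, 0.17, 0.008) and the numbers in the symmetry example.

```python
from typing import Optional, Sequence

FRAME_DUR = 0.002  # seconds per frame (assumed for this sketch)


def first_sustained(flags: Sequence[bool], min_dur: float, start: int = 0) -> Optional[int]:
    """Return the frame at which flags have been True for min_dur seconds,
    searching from frame `start`. Returns None if that never happens."""
    needed = max(1, round(min_dur / FRAME_DUR))
    run = 0
    for i in range(start, len(flags)):
        run = run + 1 if flags[i] else 0
        if run >= needed:
            return i
    return None


def above_thresh(rms, ratio, rms_thresh=0.035, ratio_thresh=0.17, min_dur=0.008, start=0):
    """Status 2 style check: intensity AND ratio above their thresholds."""
    flags = [a > rms_thresh and r > ratio_thresh for a, r in zip(rms, ratio)]
    return first_sustained(flags, min_dur, start)


def below_thresh(rms, ratio, rms_thresh=0.035, ratio_thresh=0.17, min_dur=0.008, start=0):
    """Status 4 style check: intensity AND ratio below their thresholds."""
    flags = [a < rms_thresh and r < ratio_thresh for a, r in zip(rms, ratio)]
    return first_sustained(flags, min_dur, start)


# Toy trial: silence (frames 0-4), prevoicing (5-14), vowel (15-34), /d/ (35-44)
rms = [0.005] * 5 + [0.02] * 10 + [0.06] * 20 + [0.02] * 10
ratio = [0.005] * 5 + [0.02] * 10 + [0.25] * 20 + [0.05] * 10

# Default thresholds: status 2 lands in the vowel, status 4 in the /d/
onset = above_thresh(rms, ratio)
offset = below_thresh(rms, ratio, start=onset + 1)

# Symmetry problem: very low status 2 thresholds, higher status 4 thresholds
early_onset = above_thresh(rms, ratio, rms_thresh=0.01, ratio_thresh=0.01)
early_offset = below_thresh(rms, ratio, rms_thresh=0.05, ratio_thresh=0.05,
                            start=early_onset + 1)

print(onset, offset)              # 18 38
print(early_onset, early_offset)  # 8 12
```

In the second case both statuses fire during the prevoicing, before the vowel starts at frame 15, because values of 0.02 satisfy "above 0.01" and "below 0.05" at the same time. This is the failure described in the note on symmetry.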

Segmentation

If everything was okay, audioGUI will then pop up for you to hand-correct four landmarks on all 18 trials. [See example of how to segment: buy donuts; guide tutors --- the full phrases are slightly different than the current version, but the segmentation of /ai/ is the same.] 

Note: the segmentation can take a while, so if you are comfortable with multitasking and the setup allows it (e.g., you are in the same room as the participant), you can make chitchat with them while you make adjustments.

  1. aiStart: beginning of vowel 

    1. Move this event to the beginning of the /ai/ in “buy” 

  2. a2iStart

    1. Move this event to when F2 starts moving up towards the second quality in /ai/ in earnest. 

  3. iPlateauStart

    1. Move this event to where F2 starts to reach the plateau (do not mark the peak—mark where the F2 trajectory starts to flatten out) 

  4. dStart

    1. Move this event to where the /d/ closure starts. This should be where formant energy reduces; some voicing will almost certainly still be there. 
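As a quick sanity check on a segmented trial, the four landmarks should fall in the order listed above. The following Python sketch shows one way to express that check. It is not part of audioGUI: the dictionary format and the example times (in seconds) are hypothetical.

```python
LANDMARK_ORDER = ["aiStart", "a2iStart", "iPlateauStart", "dStart"]


def landmarks_in_order(times: dict) -> bool:
    """Return True if the four landmark times are strictly increasing."""
    values = [times[name] for name in LANDMARK_ORDER]
    return all(earlier < later for earlier, later in zip(values, values[1:]))


trial = {"aiStart": 0.31, "a2iStart": 0.42, "iPlateauStart": 0.55, "dStart": 0.63}
print(landmarks_in_order(trial))  # True
```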

Preparation phase 3: Duration training

  1. Tell the participant: 

    1. “In this section, you will practice saying the phrases at a good speed. When you say each phrase, you will get some feedback about how fast you were talking. If you see a BLUE circle, it’ll tell you to speak a little faster. If you see a YELLOW circle, it will tell you to speak a little slower. If you see a GREEN circle, that means you were speaking at a good speed.” 

    2. Pause to confirm 

    3. “So if you are told to speak a little slower or a little faster, you don’t have to really change how you are speaking drastically. Keep speaking smoothly and clearly, and just adjust a little. So like if you said [speak quickly] “BUY donuts now” and have to slow down, you can just say [speak more slowly] “BUY donuts now”, you don’t have to put any extra pauses in or anything.”

    4. Pause to confirm 

  2. They will do 10 trials (5 of each phrase). 

    1. Keep general track of how they do (usually too fast, usually too slow, usually good, etc.) 

    2. Keep an eye on the OSTs. The duration feedback is based on the OST values, so if they are not tracking correctly, the feedback will be off. 

  3. You will be given the option to repeat. 

    1. If you need to adjust the OSTs, you can do that, and then run again 

    2. Give general guidance on how fast to speak to the participant if necessary (referring to if they were generally fast/slow) 
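The colored-circle feedback can be thought of as a comparison between the OST-derived vowel duration and a minimum and maximum target duration. The Python sketch below illustrates that idea only. The function name is hypothetical, the boundary values in the example are made up, and the real experiment code may compute duration differently.

```python
def duration_feedback(vowel_start: float, vowel_end: float,
                      min_dur: float, max_dur: float) -> str:
    """Map a vowel duration (seconds) to a feedback color.

    vowel_start and vowel_end stand in for the times at which the
    vowel-onset and vowel-offset OST statuses were triggered.
    """
    duration = vowel_end - vowel_start
    if duration < min_dur:
        return "yellow"  # too fast: speak a little slower
    if duration > max_dur:
        return "blue"    # too slow: speak a little faster
    return "green"       # good speed


# Hypothetical boundaries of 0.25 s and 0.45 s, for illustration only
print(duration_feedback(0.10, 0.30, 0.25, 0.45))  # yellow
print(duration_feedback(0.10, 0.45, 0.25, 0.45))  # green
print(duration_feedback(0.10, 0.70, 0.25, 0.45))  # blue
```

Because the duration comes from the OST statuses, a status that triggers late or early shifts the measured duration, which is why mistracked OSTs produce misleading feedback.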

Main experiment

  1. Tell the participant: “This is the last section. It will be just like the section you just did, but will last longer, about 20 minutes. There will be breaks every 20 trials. If you need to pause at another time, like to cough or to drink water, you can press p on the keyboard. Do you have any questions?” 

  2. During the experiment: 

    1. Keep an eye on their OST tracking. You can adjust mid-experiment if necessary by pressing ‘a’. Common triggers of adjustment:  
      1. Changes in loudness: Many people will talk more quietly as the experiment wears on, with accompanying changes in RMS and RMS ratio. You can sometimes address RMS changes by adjusting the gain, but if the ratio gets really out of whack, you may have to adjust the ratio threshold. 

      2. Changes in proximity to microphone: Some speakers may sit back in the chair after a while, which mostly affects RMS. This again can be helped by changing the gain or by asking them to sit forward again, but if that isn't possible you may need to adjust. 

    2. There are a few other settings that you can adjust mid-experiment by pressing 'e' 
      1. trial duration: If participants (particularly patients) are having a hard time completing the utterances before the end of the trial, you can increase the trial duration. 

      2. target vowel duration boundaries: if participants (particularly patients) are having a hard time getting the right duration feedback (not due to OST issues), you can loosen the boundaries for what is considered to be a good duration. Ideally you should only increase the maximum; participants must speak slowly enough to be able to react to the perturbation.

      3. LPC Order: if you notice that the formants are not tracking the way they should be, the LPC order may be off. Formants that are not being tracked well will be overly jittery, and may have sudden dips or spikes. Specifically, these dips and spikes do not follow the underlying formant structure---some people (particularly patients) may have less smooth formants. An example of a trial with LPC order too high (that is, assuming a too-long vocal tract) is below, with the suspect areas underlined in red. Very bad LPC orders may even have whole patches where the shape of the formant is preserved, but is out of line with the rest of the formant (similar to octave doubling/halving in pitch tracking). 
        1. Sudden downward dips below where the actual formant is (such as in the example below) indicate that the formant tracking algorithm is preferring candidates in frequencies that are lower than the actual formant---in this case it is also interacting with the rapid change in formants, basically penalizing candidate formant values for being much higher than the previous values. In this case, you should LOWER the LPC order (e.g., from 15 to 13). This will ultimately give a little more preference to higher formant candidates.  
        2. Sudden upward spikes above where the actual formant is indicate that the formant tracking algorithm is preferring candidates in frequencies that are higher than the actual formant. In this case, you should RAISE the LPC order (e.g., from 15 to 17). This will ultimately give a little more preference to lower formant candidates. 

          Illustration of dips in F2 and overall jittery formant tracking. LPC order too high.
      4. RMS ratio threshold: this is the parameter that limits when formants can be tracked relative to how much energy there is in the high frequencies. This is most useful for avoiding tracking formants during sibilants. The default value is 1.7, which should be pretty permissive. Higher values are more restrictive (i.e., if you increase it enough, even vowels won't track anymore), lower values will allow tracking through higher-ratio segments (like /s/). 
        1. Problems with the RMS ratio threshold will result in full dropouts in BOTH formants at the same time. That is, both formants will drop to 0 at the same time. This is because Audapter simply isn't tracking the formants there and essentially providing no value for any formant. 
        2. NOTE 1: It is not uncommon for people with shorter vocal tracts (i.e., higher formants) to need the threshold to be adjusted downward. If you see dropouts, that is an indication that the RMS ratio threshold should be adjusted DOWN. An example of such dropouts is below (threshold set to 3.7; should be lower than 3.2 for this speaker). There are multiple dropouts in the formant tracks that are in the red circle---note the gap between the first and second halves of each formant's contour. 
        3. NOTE 2: 2.5 is 1/0.4. Audapter's coding uses 0.4 as the actual ratio value, but has it inverted for the threshold.


          Example of formant dropout due to ratio threshold being too high (in this case, 3.7).
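The signature of a ratio-threshold problem, both formants dropping to 0 in the same frames, is easy to express in code. The Python sketch below is an illustration with hypothetical function names and made-up formant values, not a tool that ships with the experiment. The second function shows the inversion mentioned in the note (2.5 is 1/0.4).

```python
from typing import List, Sequence


def dropout_frames(f1: Sequence[float], f2: Sequence[float]) -> List[int]:
    """Indices of frames where F1 and F2 are both 0 (nothing tracked)."""
    return [i for i, (a, b) in enumerate(zip(f1, f2)) if a == 0 and b == 0]


def inverted(value: float) -> float:
    """Convert between a ratio value and its inverted threshold form."""
    return 1.0 / value


# Made-up formant tracks in Hz; frame 1 has only F2 at 0, so it is not counted
f1 = [700, 690, 0, 0, 520, 450]
f2 = [1200, 0, 0, 0, 1900, 2100]
print(dropout_frames(f1, f2))  # [2, 3]
print(inverted(0.4))           # 2.5
```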

If Matlab crashes during the experiment

To restart taimComp in the event of a crash: 

  1. Type in run_taimComp_expt and hit enter
  2. Type in simple/hard (depending on what version you have been doing) 
  3. Type in the participant code
  4. You will then be asked if you want to load in their expt file (which should exist already from the first attempt at running). Type y
  5. You will be asked if you want to OVERWRITE their expt file. Click CANCEL
  6. The script will then look for which modules have already been done. If a data file already exists for a module (LPC order, OST testing, duration training), it will let you know and ask if you want to redo that phase anyway. If there is NOT a data file for one of those modules, that means that you didn't complete that module, and the script will automatically redo it.
    1. Note: if you didn't get to segmentation in OST pretest, you should redo it anyway 
  7. If you were in the middle of the main experiment, it will start you back where you were 
    1. Note: if you didn't get to the first trial of the perturbation phase, it will start over from trial 1. 
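The module check in step 6 follows a simple rule, sketched here in Python. This is a schematic of the decision only, not the Matlab restart code: the function name, the module labels, and the way completed modules are passed in are all hypothetical.

```python
MODULES = ["LPC order", "OST testing", "duration training"]


def plan_restart(modules_with_data, redo_anyway=()):
    """Decide what happens to each preparation module after a crash.

    modules_with_data: modules for which a data file already exists.
    redo_anyway: modules the experimenter chooses to repeat when asked.
    """
    plan = {}
    for module in MODULES:
        if module not in modules_with_data:
            plan[module] = "redo (no data file found)"
        elif module in redo_anyway:
            plan[module] = "redo (experimenter's choice)"
        else:
            plan[module] = "skip"
    return plan


# Crash during duration training; OST pretest repeated because
# segmentation was never reached
plan = plan_restart({"LPC order", "OST testing"}, redo_anyway={"OST testing"})
for module, action in plan.items():
    print(module, "->", action)
```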

 



Keywords:
how to run, cerebellar, time, OST, taimComp, compensation 
Doc ID:
122062
Owned by:
Robin K. in SMNG Lab Manual
Created:
2022-10-21
Updated:
2024-06-28
Sites:
Speech Motor Neuroscience Group