Guide to OST heuristics (UW-Madison Audapter)

A guide to the OST capabilities available in UW-Madison's version of Audapter.

Guide to Audapter Online Status Tracking (OST) capabilities

The following table lists the OST heuristics, the parameters they operate on, a description of what they actually do, and examples of what they are good for. It includes both the legacy heuristics and new heuristics introduced by Chris Naber at UW-Madison (marked with *), which are available in our fork of the Audapter repo (https://github.com/blab-lab/audapter_mex). Some of the functionality may differ from what the Audapter manual describes, because the manual is not wholly accurate.

Some heuristics are more reliable than others. The most reliable ones are highlighted in green. Others can be used but may not have the effect you intend, for confusing reasons; these are highlighted in yellow. Heuristics that are buggy are shown in dark red font.

A table of OST heuristics and their uses

Heuristic name | Param 1 | Param 2 | Param 3 | Function | Use case
--- | --- | --- | --- | --- | ---
ELAPSED_TIME | duration | NaN | -- | Waits for the given amount of time, then advances. Note: this heuristic increments the OST status value by only 1. | Setting a trigger after a fixed time, rather than in relation to a speech event.
INTENSITY_RISE_HOLD | RMS | duration | -- | Checks whether RMS is above threshold for the given duration. Does NOT check for a rise. | Onsets of vowels.
INTENSITY_RISE_HOLD_POS_SLOPE | RMS | duration | -- | Checks whether RMS is above threshold for the given duration. DOES check for a rise (positive slope). | Onsets of vowels.
POS_INTENSITY_SLOPE_STRETCH | frames | delta | -- | Checks whether RMS is rising above threshold over a number of frames and covers a certain distance (delta). | Onsets of vowels. Somewhat unintuitive.
NEG_INTENSITY_SLOPE_STRETCH_SPAN | frames | delta | -- | Checks for a negative slope over a number of frames and a certain distance covered (delta). | Offsets of vowels/onsets of stops after vowels. Somewhat unintuitive.
*INTENSITY_SLOPE_BELOW_THRESH | RMS slope | duration | -- | Checks that the RMS slope is below a threshold for the given duration. | Can detect slow increases in RMS or rapid decreases in RMS, such as a vowel-to-stop transition.
INTENSITY_FALL | RMS | duration | -- | Checks for RMS below threshold for the given duration. Does NOT check for a fall. | Offsets of vowels/onsets of stops after vowels.
*INTENSITY_BELOW_THRESH_NEG_SLOPE | RMS | duration | -- | Checks for RMS below threshold, with decreasing RMS (negative slope), for the given duration. DOES check for a fall. | Offsets of vowels/onsets of stops after vowels.
INTENSITY_RATIO_RISE | ratio | duration | -- | Checks for a high ratio of high-frequency to low-frequency energy (above the given ratio threshold) for the given duration. | Onsets of fricatives. Can also be used to separate vowels from nasals, especially front vowels with high F2.
*INTENSITY_RATIO_ABOVE_THRESH_WITH_RMS_FLOOR | ratio | duration | (fixed RMS floor = 0.0003) | Checks that the ratio is above a threshold for the given duration, and requires that RMS itself be above 0.0003. | Onsets of fricatives, particularly if they are the first segment in an utterance.
*INTENSITY_AND_RATIO_ABOVE_THRESH | RMS | ratio | duration | Checks that both RMS and ratio are above individually specified thresholds for the given duration. | Onsets of fricatives, particularly if they are the first segment in an utterance. Also good for marking the onset of a vowel after a nasal, with added security because vowels are louder than nasals. A more sophisticated version of the fixed RMS floor, since the RMS threshold is user-specified.
INTENSITY_RATIO_FALL_HOLD | ratio | duration | -- | Checks for a ratio below threshold for the given duration. Does NOT check for a fall in ratio, only the threshold. | Offsets of fricatives.
*INTENSITY_RATIO_SLOPE_ABOVE_THRESH | ratio-slope | duration | -- | Checks that the ratio slope (i.e., change in ratio) is above a given threshold for the given duration. | Onsets of fricatives or front vowels, with attention to how quickly the segment begins.
*INTENSITY_RATIO_SLOPE_BELOW_THRESH | ratio-slope | duration | -- | Checks that the ratio slope (i.e., change in ratio) is below a given threshold for the given duration. | Offsets of fricatives or front vowels, with attention to how quickly the segment ends.
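To make the difference between threshold-only and slope-checking heuristics concrete, here is a minimal Python sketch that models four of the RMS heuristics as frame-by-frame checks on an RMS trace. It is not Audapter's implementation: the function names are hypothetical, the frame duration (FRAME_DUR) is an assumed value, the slope is assumed to be the simple difference between consecutive frames, and the slope-aware versions are assumed to require the slope condition on every held frame. In this model, INTENSITY_FALL fires during leading silence, while *INTENSITY_BELOW_THRESH_NEG_SLOPE waits for an actual fall.

```python
from typing import Optional, Sequence

# Illustrative model only; NOT Audapter's C++ implementation.
# Function names mirror the heuristic names but are hypothetical.

FRAME_DUR = 0.002  # assumed seconds per frame (depends on your Audapter config)


def frames_needed(duration: float, frame_dur: float = FRAME_DUR) -> int:
    """Convert a hold duration in seconds to a number of consecutive frames."""
    return max(1, round(duration / frame_dur))


def first_hold(cond: Sequence[bool], n: int) -> Optional[int]:
    """Return the frame index at which `cond` has been true for n consecutive frames."""
    run = 0
    for i, ok in enumerate(cond):
        run = run + 1 if ok else 0
        if run >= n:
            return i
    return None  # heuristic never fires on this trace


def slope_ok(rms: Sequence[float], i: int, rising: bool) -> bool:
    """Assumed slope: difference between frame i and frame i - 1."""
    if i == 0:
        return False
    return rms[i] > rms[i - 1] if rising else rms[i] < rms[i - 1]


def intensity_rise_hold(rms, thresh, duration):
    # Threshold only: does NOT check that RMS is rising.
    return first_hold([r > thresh for r in rms], frames_needed(duration))


def intensity_rise_hold_pos_slope(rms, thresh, duration):
    # Threshold AND positive slope on every held frame (assumption).
    cond = [r > thresh and slope_ok(rms, i, rising=True) for i, r in enumerate(rms)]
    return first_hold(cond, frames_needed(duration))


def intensity_fall(rms, thresh, duration):
    # Threshold only: does NOT check that RMS is falling.
    return first_hold([r < thresh for r in rms], frames_needed(duration))


def intensity_below_thresh_neg_slope(rms, thresh, duration):
    # Threshold AND negative slope on every held frame (assumption).
    cond = [r < thresh and slope_ok(rms, i, rising=False) for i, r in enumerate(rms)]
    return first_hold(cond, frames_needed(duration))


# Made-up RMS trace: leading silence, a vowel, then a fall into a stop closure.
rms = [0.001, 0.001, 0.001, 0.001, 0.01, 0.03, 0.05, 0.06,
       0.06, 0.05, 0.03, 0.01, 0.005, 0.002]

print(intensity_fall(rms, 0.02, 0.006))                    # 2: fires in the leading silence
print(intensity_below_thresh_neg_slope(rms, 0.02, 0.006))  # 13: fires only after the vowel falls
```

The same distinction applies to onsets: a trace that starts mid-vowel with flat or falling RMS, such as [0.06, 0.06, 0.05, 0.05, 0.04], satisfies intensity_rise_hold at frame 2 but never satisfies intensity_rise_hold_pos_slope in this model.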

General notes

  1. Using "ratio" heuristics
    1. These are good for:
      1. Fricative boundaries, particularly sibilants 
      2. Distinguishing vowels (which have more energy in higher formants) from nasals (which have nasal antiformants) 
    2. Reasonable values
      1. With UW's setup, we tend to use values like 0.04 to find fricative boundaries. Sibilants easily reach values above 1.
    3. Precautions
      1. Most of the ratio heuristics, particularly the legacy heuristics, do not check the direction in which the threshold is crossed. As a result, careless use of two ratio-based heuristics in a row can cause the tracker to blow through OST statuses (advance before the intended event has occurred).
      2. Vowel ratios are less reliable than fricative ratios, due to vocal tract variability.
      3. The Audapter manual does not describe ratio correctly. Segments with a lot of energy in high frequencies have a HIGH ratio. 
  2. Using RMS heuristics
    1. These are good for: 
      1. Finding vowels, particularly when framed by stops (or silence)
    2. Reasonable values
      1. With UW's setup, we tend to use values like 0.02 to find vowel onsets and offsets. You can, of course, choose more aggressive values to fit the constraints of your particular target phrase.
    3. Precautions
      1. Many of the legacy heuristic names are not descriptive; e.g., INTENSITY_FALL does not check for a fall.
      2. The stretch/span heuristics are somewhat unintuitive but can be managed with audapter_viewer. They are somewhat reliable but seem susceptible to natural variability and voice quality issues: if the voice quality changes in the middle of a vowel, the slope can become strongly negative and stay that way for a while, even though RMS remains relatively high. The heuristics that take both slope and RMS into account are more reliable in this respect.
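The precaution about chaining ratio heuristics can be illustrated with a small simulation. The Python sketch below is a hypothetical, simplified sequential tracker, not Audapter's code: each step is a per-frame condition plus a hold length in frames, and the tracker moves to the next step as soon as the current one is satisfied. With a threshold-only "ratio below 0.5" step after a "ratio above 0.04" step (loosely mimicking INTENSITY_RATIO_RISE followed by INTENSITY_RATIO_FALL_HOLD), the second step fires while the fricative's ratio is still climbing. Adding a falling-ratio requirement (a hypothetical directional check, in the spirit of the slope-aware heuristics) moves the second event to the actual fricative offset. The ratio trace and thresholds are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical, simplified OST-style tracker; NOT Audapter's implementation.
# A condition looks at the ratio trace and the current frame index.
Condition = Callable[[Sequence[float], int], bool]
Step = Tuple[Condition, int]  # (per-frame condition, hold length in frames)


def above(thresh: float) -> Condition:
    # Threshold only, like the legacy ratio heuristics.
    return lambda r, i: r[i] > thresh


def below(thresh: float) -> Condition:
    # Threshold only: also true while the ratio is still RISING toward thresh.
    return lambda r, i: r[i] < thresh


def below_and_falling(thresh: float) -> Condition:
    # Hypothetical directional check: below thresh AND lower than the previous frame.
    return lambda r, i: i > 0 and r[i] < thresh and r[i] < r[i - 1]


def track(ratio: Sequence[float], steps: List[Step]) -> List[int]:
    """Return the frame index at which each step fired, in order."""
    fired: List[int] = []
    step, run = 0, 0
    for i in range(len(ratio)):
        if step == len(steps):
            break  # all steps done
        cond, hold = steps[step]
        run = run + 1 if cond(ratio, i) else 0
        if run >= hold:
            fired.append(i)  # the next step starts evaluating on the next frame
            step, run = step + 1, 0
    return fired


# Made-up ratio trace for a sibilant: ramp up, plateau above 1, ramp down.
ratio = [0.01, 0.05, 0.1, 0.2, 0.3, 0.8, 1.5, 2.0,
         2.0, 1.5, 0.6, 0.2, 0.05, 0.02, 0.02]

legacy = [(above(0.04), 2), (below(0.5), 2)]
directional = [(above(0.04), 2), (below_and_falling(0.5), 2)]

print(track(ratio, legacy))       # [2, 4]: the "offset" fires during the rise
print(track(ratio, directional))  # [2, 12]: the offset fires as the ratio falls
```

Stepping through the trace frame by frame like this, or viewing real data in audapter_viewer, is a quick way to check whether a second heuristic's condition is already satisfied at the moment the first one fires.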

Last updated 10/14/2022 by RPK




Keywords: audapter, guide, OST, tracking, segmentation, time
Doc ID: 121883
Owner: Robin K.
Group: Speech Motor Neuroscience Group
Created: 2022-10-14 21:05 CST
Updated: 2022-12-12 08:42 CST
Sites: Speech Motor Neuroscience Group