Primer on LPC order
Why should I care about this?
For most of our speech production experiments, we use the GUI check_audapterLPC, which lets us check the LPC order for a participant so that their vowel formants can be tracked well. Although Audapter (our main experimental software) provides default values for male vs. female speakers,[a] some speakers fall outside these defaults and would benefit from a different LPC order. In addition, when doing data analysis, we personalize the LPC order for each participant in order to best track their vowels.
The GUIs we have are both very good for experimenting with different LPC orders, such that you can try out a couple of different LPC orders and see which one is best. You can use these GUIs without knowing how LPC orders work. However, for some experiments, it can be beneficial to have a basic understanding of what LPC order is doing, and how it may relate to other characteristics of the participant's voice that can bear on how the experiment runs.
What is LPC order? (Intuitive)
LPC stands for Linear Predictive Coding. In our lab, we use LPC to extract formant values from a complex wave signal (the speech waveform).
Why do we need LPC orders anyway?
We can model the speech waveform as a combination of a source (vibration of the vocal folds) and a filter (the rest of the vocal tract). When we measure the rate of vibration of the vocal folds, we mostly care about f0 (fundamental frequency), which is the lowest frequency of vibration. However, the vocal folds also produce smaller vibrations at other rates, which are known as harmonics. Harmonics are at integer multiples of f0. So, for example, a vocal fold source with f0 100 Hz will also have vibrations at 200 Hz, 300 Hz, 400 Hz, etc.
Generally speaking, the amplitude of each of these vibrations will decrease as the frequency gets higher. So, f0 is the loudest, followed by a quieter H2 (second harmonic), followed by an even quieter H3, and so on. However, the vocal tract will then filter this complex source. When we make different articulations with our mouths, we make different chambers of air that the source has to go through. These chambers of air have preferred frequencies that they like to vibrate at, which are based on their size and shape. In general, larger chambers like lower frequencies, while smaller chambers like higher frequencies. So when the source reaches these chambers, those preferred frequencies will be extra amplified compared to the frequencies around them---the chambers "filter out" the frequencies that they don't like, and intensify the ones that they do like.
When these preferred frequencies are associated with vowels, we call them formants. When we do speech analysis, we often want to measure formants. One way to do this is to measure the size and shape of the air chambers that the source went through to produce a particular vowel---that is, the size and shape of someone's mouth as they were talking. Obviously, that is very impractical! Instead, what we have is the waveform that we recorded with a microphone, which is the combined result of the source and the filter. The math that uses LPC order is an attempt to separate the information about the filter from the information about the source.
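The source-and-filter picture above can be sketched numerically. This is only an illustration under stated assumptions: the 12 dB-per-octave drop in source amplitude, the 80 Hz bandwidth, the three schwa-like resonances, and the function name resonance_gain_db are all choices made for this sketch, not values taken from our experiments.

```python
import numpy as np

f0 = 100.0                                # fundamental frequency from the example above
harmonics = f0 * np.arange(1, 41)         # 100, 200, ..., 4000 Hz

# Source (assumption): amplitude drops 12 dB for every doubling of frequency.
source_db = -12.0 * np.log2(harmonics / f0)

def resonance_gain_db(freq_hz, center_hz, bandwidth_hz=80.0):
    """Gain in dB of one idealized resonance (a simple second-order peak)."""
    num = center_hz ** 2
    den = np.sqrt((center_hz ** 2 - freq_hz ** 2) ** 2
                  + (bandwidth_hz * freq_hz) ** 2)
    return 20.0 * np.log10(num / den)

# Filter (assumption): three resonances, roughly a schwa.
resonances = [500.0, 1500.0, 2500.0]
filter_db = sum(resonance_gain_db(harmonics, fc) for fc in resonances)

# What the microphone records is source and filter combined (dB values add).
output_db = source_db + filter_db

# Harmonics that end up louder than both of their neighbors after filtering.
is_peak = (output_db[1:-1] > output_db[:-2]) & (output_db[1:-1] > output_db[2:])
peak_harmonics = harmonics[1:-1][is_peak]
print(peak_harmonics)
```

Before filtering, every harmonic is quieter than the one below it. After filtering, the harmonics nearest the resonances stand out from their neighbors, and those bumps in the spectrum are what a formant tracker is trying to find.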
What is an LPC order?
In the most intuitive sense, the LPC order refers to how many formants you expect to exist below a given frequency (transformed by some fancy math---if you want to learn more about the math, see "What is LPC order? (Math)" below). The number of formants that you expect in a signal will depend on how far apart the chambers' preferred resonances are. This, in turn, is related to how long the vocal tract is.
You may be thinking that we only ever measure F1 and F2, so shouldn't we always expect two formants, no matter how long someone's vocal tract is? Although for our purposes in the lab we typically only care about F1 and F2, this is just because vowel identity can be almost entirely determined by F1 and F2 alone.[b] However, much like how the vocal folds vibrate at f0 as well as integer-multiple harmonics, the chambers formed by the mouth also have multiple frequencies that they prefer. For example, in a schwa, the whole vocal tract forms one resonating chamber. Its lowest preferred frequency may be 500 Hz, which we call F1 (the first, or lowest, preferred frequency). However, it will also prefer 1500 Hz (F2---the next lowest preferred frequency), 2500 Hz (F3), 3500 Hz (F4), and so on. Note that these frequencies are 1000 Hz apart, which is 2x F1.[c] Thus, for a vocal tract of the size that produces these resonances, we would expect 5 formants below 5,000 Hz:
F1) 500 Hz;
F2) 1500 Hz;
F3) 2500 Hz;
F4) 3500 Hz;
F5) 4500 Hz
The next formant would be beyond 5,000 Hz (F6 = 5500 Hz).
In contrast, someone with a slightly shorter vocal tract may produce a schwa with higher formants, which are spaced further apart (recall that the spacing is 2x F1).
F1) 600 Hz;
F2) 1800 Hz;
F3) 3000 Hz;
F4) 4200 Hz.
The next formant would be beyond 5,000 Hz (F5 = 5400 Hz). So, for this speaker, we would only expect 4 formants below 5,000 Hz.
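The counting in the two examples above can be reproduced with a few lines. This is a minimal sketch for an idealized schwa only. The function names are made up for illustration, and the tube-length helper assumes a uniform tube closed at one end and a speed of sound of about 35,000 cm/s, neither of which comes from our software.

```python
def formants_below(f1_hz, ceiling_hz):
    """Resonances of an idealized schwa, F_n = (2n - 1) * F1, below ceiling_hz."""
    formants = []
    n = 1
    while (2 * n - 1) * f1_hz < ceiling_hz:
        formants.append((2 * n - 1) * f1_hz)
        n += 1
    return formants

def tube_length_cm(f1_hz, speed_of_sound_cm_s=35000.0):
    """Length of a uniform tube, closed at one end, whose lowest resonance is f1_hz."""
    return speed_of_sound_cm_s / (4.0 * f1_hz)

longer = formants_below(500, 5000)    # [500, 1500, 2500, 3500, 4500]
shorter = formants_below(600, 5000)   # [600, 1800, 3000, 4200]
print(len(longer), len(shorter))      # 5 4

# Under the tube assumption, the two speakers differ by about 3 cm.
print(tube_length_cm(500))            # 17.5
print(round(tube_length_cm(600), 1))  # 14.6
```

Real vowels are not uniform tubes, so the formants will not be evenly spaced, but the count below a given ceiling still tends to go down as the vocal tract gets shorter.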
Thus, the LPC order will be higher for people with a longer vocal tract, and lower for people with a shorter vocal tract. Generally speaking, men tend to have longer vocal tracts, and women tend to have shorter vocal tracts. This is why Audapter's default LPC order is 17 for men, and 15 for women (recall that there is some math to get to these numbers---we do not expect 17 or 15 formants!). If we ever recorded child speech, the LPC order would be much lower, because children (young children especially) have very short vocal tracts.
How does an LPC order help us extract formants?
Another way of stating this question is: how does knowing the number of formants help us determine the values of the formants? Essentially, LPC uses a linear regression to predict each sample of the sound signal from the samples that came just before it, using a certain number of coefficients (the order).
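Written out, the prediction looks roughly like the following. The symbols here (s[n] for the signal, a_k for the coefficients, p for the order, e[n] for the prediction error) are notation chosen for this sketch, not names used by Audapter or Praat.

```latex
\hat{s}[n] = \sum_{k=1}^{p} a_k \, s[n-k],
\qquad
e[n] = s[n] - \hat{s}[n]
```

The coefficients a_k are chosen to make the total squared error (the sum of e[n]^2 over the analysis window) as small as possible, and p is the LPC order. Each formant takes up a pair of coefficients, which is why the order is roughly twice the number of formants you expect, often with a little extra to absorb the overall slope of the spectrum.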
As an analogy, you can think about trying to recreate an ink color that was made by combining 4 different basic ink colors together (CMYK---cyan, magenta, yellow, black). If you only allow someone to use 2 different inks to recreate the original, they will not be able to match that color---and moreover, the proportions of the two colors they could use would likely not be correct. If you forced them to use all 4 + 1 additional ink, they still wouldn't be able to match the color, and in trying to use that fifth color, the proportions of the other four colors might be affected as well.
A similar thing happens if you provide the LPC analysis with the wrong order (number of formants):
- If you tell it that there should only be three formants below 5,000 Hz when there should really be five, it may ignore some formant frequencies in favor of spreading out the three that it is allowed ("missing inks"), or combine two formants into one "super" formant frequency ("misattribute some part of the color to a permissible ink").
- If you tell it that there should be seven formants below 5,000 Hz when there should really be five, it may find formants where none really exist ("be forced to use an additional ink color"), or move real formants down to make room for the extra ones ("change the ink proportions").
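The first failure mode can be seen in a small simulation. This is a sketch under several assumptions, not how Audapter or Praat are implemented. The "vowel" is a synthetic signal with five known resonances, a 10 kHz sampling rate, and 80 Hz bandwidths. The LPC fit uses the textbook autocorrelation method. The function names synth_vowel and lpc_formants are made up for illustration.

```python
import numpy as np

def synth_vowel(formants_hz, bandwidth_hz, fs, n_samples):
    """Response of an all-pole filter (one resonance per formant) to a single pulse."""
    a = np.array([1.0])
    for f in formants_hz:
        r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius from bandwidth
        theta = 2 * np.pi * f / fs               # pole angle from frequency
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r ** 2])
    x = np.zeros(n_samples)
    x[0] = 1.0                                   # one glottal "pulse"
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = x[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def lpc_formants(signal, order, fs):
    """Fit `order` prediction coefficients and return the resonances they imply (Hz)."""
    # Autocorrelation of the signal at lags 0..order.
    r = np.array([np.dot(signal[:len(signal) - k], signal[k:])
                  for k in range(order + 1)])
    # Normal equations of the least-squares prediction problem.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, r[1:])
    # Resonances are the complex roots of the prediction polynomial.
    roots = np.roots(np.concatenate(([1.0], -coeffs)))
    roots = roots[np.imag(roots) > 0]            # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

fs = 10000
true_formants = [500, 1500, 2500, 3500, 4500]    # five formants below 5,000 Hz
signal = synth_vowel(true_formants, bandwidth_hz=80.0, fs=fs, n_samples=2000)

est_good = lpc_formants(signal, order=10, fs=fs)    # two coefficients per formant
est_too_low = lpc_formants(signal, order=6, fs=fs)  # room for only three formants

print(np.round(est_good))       # close to 500, 1500, 2500, 3500, 4500
print(np.round(est_too_low))    # at most three values, which cannot cover all five
```

With the matching order, the fit recovers the resonances that were built into the signal. With too few coefficients, the model has to describe five peaks with at most three, so the estimates it returns are compromises rather than the true formants. Real speech is noisier than this synthetic signal, which is part of why an order that is too high can also cause trouble.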
What is LPC order? (Math)
Examples of good and bad LPC orders
Below are some examples from a short female speaker saying the word "bide" (vowel: /ai/) that illustrate some of the principles discussed above. Note: in the audioGUI pictures, the spectrogram limit is set to 3500 Hz so that you can more easily see F1 and F2 (instead of them being squished together at the bottom).
Figure 0: showing bare spectrogram with dotted line trace of approximately where the formants should be found
Note that for the [a] part of /ai/, F1 and F2 are very close together so it can be hard to see them separately. As the vowel progresses into the [i] portion, F2 goes up and F1 goes down.
Figure 1: LPC order = 7
For this speaker, LPC order = 7 is far too low. That is, the analysis "ran out of formants" and is trying to space them out. So F1 starts out kind of where F2 should be, and F2 actually starts kind of above where F3 should be (the next gray band above F1 and F2). There are skips and jumps as this analysis gets better and worse---you can see that at one point about 25% of the way through the vowel, F2 jumps down to where it should be (F1 jumps down a bit too low). At about the halfway point, F1 jumps down to where it should be, and F2 starts following F3.
Figure 2: LPC order = 9
For this speaker, LPC order = 9 is okay. There is a bit of wobbliness when F2 gets high and gets close to F3---remember that with a lower LPC order, the formants want to be a little more spaced out. It is difficult to see, but F1 does look a little too low near the beginning (you can see the separation of F1 and F2 a little better near the beginning of the vowel).
Figure 3: LPC order = 11
This is probably the correct LPC order for this participant. The formant tracks are comparatively stable, and follow the body of the formant bands well throughout.
Figure 4: LPC order = 13
Here, you can see that the formant tracks are jumping away from the actual formant bands in the opposite direction of when the LPC order was too low. That is, now that the math has to fit more formants into the same amount of space, it wants to pack them all together. The algorithm particularly does not like how high F2 gets at the end of the vowel.
Okay, but I don't see "extra formants". Show me the extra formants!
The above figures are taken from audioGUI, which does not show formant estimates beyond F1 and F2, so we can't see the "invention" of formants. However, since Praat will trace more formants for you, we can see what happens when you ask for more formants than actually exist. Formants in Praat are traced with red speckles. In this figure, the spectrogram goes up to 8000 Hz so you can see more formants.
As a note: LPC order is based on the number of formants desired, but it is not actually the number of formants due to fancy math reasons. When Praat asks you for the LPC order, it literally asks you for the number of formants that you expect to find, and then does the math on the back end. audioGUI is asking you for the mathed-up value. So, the good-looking formant tracks will not use the same number in the audioGUI figures vs. the Praat figure.
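For the Praat side of this, the conversion is simple. As far as the Praat manual describes its Burg formant analysis, the number of poles (LPC coefficients) is twice the "Maximum number of formants" setting. The helper below only illustrates that relationship; its name is made up, and it says nothing about how Audapter or audioGUI arrive at their own values.

```python
def praat_pole_count(max_formants):
    """Poles used by Praat's Burg analysis for a given 'Maximum number of formants'."""
    return int(round(2 * max_formants))

print(praat_pole_count(7))     # 14
print(praat_pole_count(10))    # 20
print(praat_pole_count(5.5))   # 11 (Praat accepts half formants)
```

The Praat and audioGUI figures also cover different frequency ranges, so the numbers are not expected to line up even after this conversion.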
Figure 5: Praat, asking for 7 formants below 8000 Hz
Here, you can see that the formant tracks follow the grey bands quite well, and you can see all the way up to F6. The formant tracks are relatively smooth, especially F1 and F2, where there is naturally more energy and thus the prediction is a little easier.
Figure 6: Praat, asking for 10 formants below 8000 Hz
Here, you can see that Praat tried its darndest to find you 10 formants, but to do so, it had to make many bad tracks. For example, in the blue circled areas, it split a formant into two different formants. In the green area, there is general confusion about how many formants there should be as some split, merge, and then split again in a different spot.
Because LPC order is related to vocal tract length, and vocal tract length is related to formant values, knowing someone's LPC order can give you some information about a speaker that could be helpful in another part of the experiment.
- [a] Vocal tract length is the distance between the vocal folds and the lips. Since it is a physical phenomenon, it is typically correlated with biological sex and height. You can also imagine some other sources of variability, such as neck length, larynx height, length between pharynx and lips, etc.
Recall also that source (pitch) and filter (vocal tract) are not the same, though they also tend to be correlated for the same physical reasons---habitual pitch is based on the size of the vocal folds themselves. However, just because someone has a high voice does not necessarily mean that they have a short vocal tract (or vice versa), so your impression of a speaker's pitch is not necessarily going to be informative regarding their LPC order. Because of this disconnect, some voice therapies for transgender people include a focus on the resonance (filter) of their voice in addition to the pitch (source).
- [b] F3 is often used to distinguish rhoticized vowels, like the vowel in "her" (rhotic schwa/syllabic r). Rhotics tend to make F3 go down compared to a similar, non-rhoticized vowel (e.g. last syllable of "Becca" vs. "Becker").
- [c] This is due to the fact that the vocal tract can be modeled as a tube that is closed at one end and open at the other. If you would like more information on resonant frequencies and different configurations of tubes, you can check out this site or this book.