GMM-based classification of vowels
Tasks to do:
- Simplified classification of vowels based on GMM
Compute the mean values and variances of the 1st and 2nd formants for the 5 basic vowels, whose formant frequencies are saved in the following file.
(Formants 1-4, computed from 3 realizations of each vowel, are saved in the variables Fa1, Fa2, Fa3, etc.) Compute the mean values from the 1st realizations only, i.e. from Fa1, Fe1, Fi1, Fo1 and Fu1.
- 1st checked result:
- Display the distribution of the 1st and 2nd formants (the formant triangle) for each realization of the vowels separately (i.e. 3 different figures),
- add the mean formant values of all 5 vowels to the previously created figures.
- Define the formula of the two-dimensional (n-dimensional) Gaussian function (i.e. the probability density function) for classifying vowels based on the first 2 (4) formants, and implement this function in MATLAB.
- Display the two-dimensional Gaussian probability density functions of the individual vowels.
- 2nd checked result:
- Compute the likelihood values given by the Gaussian probability density function of each vowel. Use one short-time frame from the 2nd or 3rd realization of a vowel (e.g. the 5th row of Fa2). Find the highest likelihood and check that it corresponds to the analyzed sound. Repeat for the other vowels.
- Repeat the same task using the functions available in MATLAB, i.e. gmdistribution.fit and pdf, and compare the results.
- 3rd checked result:
- Perform the classification with the MATLAB function pdf for all short-time frames of the 2nd or 3rd realization of the vowels, using all available GMM models. Display the results graphically.
- Also try using logarithmic probabilities.
- On-line identification of vowels based on formants or cepstra
- Compute the parameters of the GMM models from all available realizations of your vowels saved in the database zreratdb.
- You can access the recorded signals directly in the classrooms at CTU FEE, in the directory "H:\VYUKA\ZRE\signaly\zreratdb".
- To work outside CTU FEE, you can download the following archive of signals resampled to 16 kHz: zrerat_blocken_2018_cs0.zip.
- Classify a vowel recorded on-line.
- Use all 4 formant frequencies or 12 MFCC cepstral coefficients (c-c) as speech features.
- 4th checked result: For the vowel recorded on-line, display:
- the waveform, the spectrogram and the time course of the formants (or MFCCs),
- the formants or cepstra of the voice-active short-time frames only (use a power-based VAD),
- all log-likelihood values and their averages for all 5 GMM models of the basic vowels.
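For the first task (mean values and variances of F1 and F2 from the 1st realizations), the computation can be sketched in Python as below. The sketch assumes that each of Fa1, Fe1, ... has been loaded as a list of rows [F1, F2, F3, F4], one row per short-time frame, which is consistent with "the 5th row of Fa2" being a single frame. The helper names are hypothetical. statistics.variance uses N-1 normalization, which matches the default of MATLAB's var.

```python
from statistics import fmean, variance

def formant_stats(frames):
    """Return ((mean_F1, mean_F2), (var_F1, var_F2)) over all short-time frames.

    `frames` is assumed to be a list of rows [F1, F2, F3, F4], one per frame.
    """
    f1 = [row[0] for row in frames]
    f2 = [row[1] for row in frames]
    m1, m2 = fmean(f1), fmean(f2)
    # Sample variance (N-1 normalization), like MATLAB's var() default
    return (m1, m2), (variance(f1, m1), variance(f2, m2))

def stats_per_vowel(first_realizations):
    """Map a vowel label ('a', 'e', 'i', 'o', 'u') to its formant statistics.

    `first_realizations` is a hypothetical dict such as {'a': Fa1, 'e': Fe1, ...}.
    """
    return {vowel: formant_stats(rows) for vowel, rows in first_realizations.items()}
```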
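The n-dimensional Gaussian density in the formula task is p(x) = exp(-(1/2) (x - mu)^T Sigma^(-1) (x - mu)) / ((2 pi)^(n/2) |Sigma|^(1/2)), where x is the feature vector (for example [F1, F2]), mu is the vowel's mean vector and Sigma is its covariance matrix. The Python sketch below implements the two-dimensional case and picks the vowel with the highest likelihood for one frame, as in the single-frame likelihood task. It assumes a full 2x2 covariance and writes out the 2x2 inverse by hand to stay dependency-free. If only the per-vowel variances are used, Sigma is diagonal. For 4 formants a general matrix inverse is needed; in MATLAB, mvnpdf or the pdf of a gmdistribution object handles this. The model layout is a hypothetical choice.

```python
import math

def gauss2d_pdf(x, mean, cov):
    """Bivariate Gaussian density N(x; mean, cov) for a full 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)^T Sigma^-1 (x - mu), using inv([[a, b], [c, d]]) = [[d, -b], [-c, a]] / det
    mahal = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    # Normalization (2*pi)^(n/2) * |Sigma|^(1/2) with n = 2
    return math.exp(-0.5 * mahal) / (2.0 * math.pi * math.sqrt(det))

def classify_frame(x, models):
    """Evaluate every vowel model on one frame [F1, F2] and pick the most likely.

    `models` maps a vowel label to (mean, cov); returns (best_label, likelihoods).
    """
    likelihoods = {label: gauss2d_pdf(x, m, c) for label, (m, c) in models.items()}
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods
```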
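The frame-by-frame classification with logarithmic probabilities, and the log-likelihood averages requested in the 4th checked result, can be sketched in Python as follows. The sketch assumes that each vowel is a GMM with diagonal covariances, stored as a hypothetical list of (weight, mean, variances) components. A single-Gaussian vowel model is the one-component case. Summing or averaging log-likelihoods over frames avoids the underflow that multiplying many small densities would cause. The mixture sum uses the log-sum-exp trick for the same reason.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """log N(x; mean, diag(var)) for an n-dimensional feature vector."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_logpdf(x, components):
    """log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    terms = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify_frames(frames, models):
    """Per-frame log-likelihoods, their averages, and the winning vowel.

    `frames` is a list of feature vectors (e.g. [F1, F2, F3, F4] or 12 MFCCs);
    `models` maps a vowel label to its list of GMM components.
    """
    loglik = {label: [gmm_logpdf(f, comps) for f in frames]
              for label, comps in models.items()}
    averages = {label: sum(vals) / len(vals) for label, vals in loglik.items()}
    best = max(averages, key=averages.get)
    return best, averages, loglik
```

Plotting each list in `loglik` against the frame index gives the graphical view asked for in the 3rd checked result.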
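In MATLAB, the GMM parameters for the zreratdb realizations are estimated with gmdistribution.fit (fitgmdist in newer releases). The Python sketch below shows what such a fit does by running the EM algorithm for a diagonal-covariance GMM. Several settings are illustrative assumptions, not what MATLAB uses: the initialization (evenly spaced training vectors as means, the global variance for every component, uniform weights), the fixed iteration count and the variance floor. `k` must not exceed the number of training vectors.

```python
import math

def _log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def fit_gmm_diag(data, k, n_iter=50, var_floor=1e-3):
    """Fit a k-component diagonal-covariance GMM to `data` with EM.

    Returns (weights, means, variances). Initialization is a simple assumption.
    """
    n, dim = len(data), len(data[0])
    g_mean = [sum(x[d] for x in data) / n for d in range(dim)]
    g_var = [max(sum((x[d] - g_mean[d]) ** 2 for x in data) / n, var_floor)
             for d in range(dim)]
    means = [list(data[i * n // k]) for i in range(k)]
    variances = [list(g_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | x_i), normalized in the log domain
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + _log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # tiny term keeps weights > 0
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj, var_floor)
                            for d in range(dim)]
    return weights, means, variances
```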
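The power-based VAD from the 4th checked result can be sketched in Python as shown below. The signal is split into short-time frames, and the mean power of each frame is expressed in dB. A frame is marked as voice-active if its power lies within a threshold of the loudest frame. This relative threshold is one simple choice among several. The 25 ms / 10 ms framing at 16 kHz (400 / 160 samples) and the -30 dB threshold are hypothetical defaults that need tuning.

```python
import math

def frame_powers_db(signal, frame_len, frame_shift):
    """Mean power of each short-time frame in dB (a small floor avoids log(0))."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        p = sum(s * s for s in frame) / frame_len
        powers.append(10.0 * math.log10(p + 1e-12))
    return powers

def power_vad(signal, frame_len=400, frame_shift=160, threshold_db=-30.0):
    """Return one boolean per frame: True if its power is within `threshold_db` of the maximum."""
    powers = frame_powers_db(signal, frame_len, frame_shift)
    if not powers:
        return []
    ref = max(powers)
    return [p >= ref + threshold_db for p in powers]
```

The returned flags can then select which formant or cepstral frames are passed to the classifier.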