BE2M31ZRE seminar
GMM-based classification of vowels
Tasks to do:
HOME PREPARATION:
Compute MFCC cepstra (12+1) for all your own realizations of the basic vowels from the database zreratdb. The data are available in the directory "K:\ZRE\data\zreratdb" in classroom 802 at CTU FEE, or they can be downloaded as the archive zrerat_blocken_2023_cs0.zip, which contains *.CS0 signals resampled to 16 kHz.
Use the following setup of MFCC computation:
- apply preemphasis with the coefficient m=0.97,
- short-time frame length of 25 ms, frame shift of 5 ms (to increase the number of realizations), Hamming weighting window,
- number of filter-bank bands and frequency range boundaries: M=30, fmin=100 Hz, fmax=6500 Hz,
- use the function vmfcc.m (together with the functions it calls: melbf.m, mel.m, melinv.m).
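The framing part of this setup can be sketched as follows. This is only an illustration of what the parameters mean in samples at 16 kHz; the call signature of vmfcc.m is not given here, so the sketch does not call it, and vmfcc.m may well perform some of these steps internally. The test signal and all variable names are placeholders.

```matlab
% Toy signal standing in for one vowel recording (replace with real data).
fs = 16000;                       % sampling frequency of the resampled signals [Hz]
t  = (0:fs-1)'/fs;                % time axis for 1 s of samples
x  = sin(2*pi*200*t);             % placeholder signal, NOT real speech

% Preemphasis y[n] = x[n] - m*x[n-1] with m = 0.97.
m = 0.97;
y = filter([1 -m], 1, x);

% Framing: 25 ms frames with a 5 ms shift.
wlen  = round(0.025*fs);          % 400 samples per frame
wstep = round(0.005*fs);          % 80 samples between frame starts
nfr   = floor((length(y) - wlen)/wstep) + 1;   % number of complete frames

% Hamming window written out explicitly to avoid a toolbox dependency.
w = 0.54 - 0.46*cos(2*pi*(0:wlen-1)'/(wlen-1));

% Each column of "frames" holds one windowed short-time frame.
frames = zeros(wlen, nfr);
for k = 1:nfr
    idx = (k-1)*wstep + (1:wlen)';
    frames(:,k) = y(idx) .* w;
end
fprintf('%d frames of %d samples\n', nfr, wlen);
```

With a 5 ms shift, one second of signal yields roughly five times more frames than the more common 25 ms shift would, which is why the short shift helps when only a few recordings are available.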
Save the computed MFCC coefficients in separate variables, one for each available realization of each vowel.
Observe how the cepstral coefficients cluster in the first 8 dimensions.
Result: for all your own realizations of vowels, display the dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6] and c[7]-c[8] of the MFCC cepstrum (i.e. 4 subplots in one figure; use a different color for each vowel).
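One possible layout of such a figure is sketched below. The MFCC matrices here are random placeholders, and the assumed layout (one row per frame, column 1 holding c[0]) is only an assumption; adapt the indexing to what vmfcc.m actually returns.

```matlab
% Placeholder MFCC matrices (frames x 13, column 1 assumed to hold c[0]).
rng(0);
vowels = {'a','e','i','o','u'};
cols   = {'r','g','b','m','k'};
mfcc   = cell(1,5);
for v = 1:5
    mfcc{v} = randn(50,13) + v;   % toy data only, replace with your own MFCCs
end

figure;
for p = 1:4
    subplot(2,2,p); hold on;
    i1 = 2*p - 1;                 % cepstral index on the x axis: 1, 3, 5, 7
    i2 = 2*p;                     % cepstral index on the y axis: 2, 4, 6, 8
    for v = 1:5
        % +1 because column 1 is assumed to hold c[0]
        plot(mfcc{v}(:,i1+1), mfcc{v}(:,i2+1), '.', 'Color', cols{v});
    end
    xlabel(sprintf('c[%d]', i1)); ylabel(sprintf('c[%d]', i2));
    legend(vowels); grid on;
end
```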
Classification of vowels based on formants
Observe the distribution of the 1st and 2nd formants (the formant triangle) for the individual realizations of vowels saved in the file formants_vowels.mat. Formants 1-4 are computed from 3 realizations of each vowel and saved in the variables Fa1, Fa2, Fa3, etc. Display 3 separate figures of the formant triangle, one for each of the 3 available realizations.
Compute the mean values and variances of the formant frequencies for the 1st realization of the 5 basic vowels; the values are saved in the variables Fa1, Fe1, Fi1, Fo1 and Fu1.
Result:
Add mean values of formants for all 5 vowels into previously created figures.
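A minimal sketch of this step follows. It uses toy matrices instead of the contents of formants_vowels.mat, with one row per frame and one column per formant (consistent with "the 5th row of Fa2" being one frame). The formant centres are illustrative numbers only, not measured values.

```matlab
% Toy stand-ins for Fa1, Fe1, Fi1, Fo1, Fu1 (frames x 4 formants, in Hz).
% In the seminar these matrices would come from load('formants_vowels.mat').
rng(1);
names  = {'a','e','i','o','u'};
centre = [800 1300; 500 1900; 300 2300; 500 900; 350 700];  % illustrative only
F = cell(1,5);
for v = 1:5
    F{v} = repmat([centre(v,:) 2800 3600], 40, 1) + 50*randn(40,4);
end

mu = zeros(5,2);                  % mean of F1 and F2 per vowel
s2 = zeros(5,2);                  % variance of F1 and F2 per vowel
figure; hold on;
for v = 1:5
    mu(v,:) = mean(F{v}(:,1:2));
    s2(v,:) = var(F{v}(:,1:2));
    plot(F{v}(:,1), F{v}(:,2), '.');          % individual frames
end
% Add the mean values on top of the scatter plot.
plot(mu(:,1), mu(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
text(mu(:,1) + 20, mu(:,2), names);
xlabel('F1 [Hz]'); ylabel('F2 [Hz]'); grid on;
```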
Write down the formula for the n-dimensional Gaussian function (i.e. the probability density function) and prepare your own implementation of this function in MATLAB.
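For reference, the standard form of the n-dimensional Gaussian density and its logarithm is given below. The symbols are a notation chosen here, not taken from the course materials: x is the feature vector, \mu the mean vector and \Sigma the covariance matrix.

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{T}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)

\ln p(\mathbf{x}) = -\tfrac{n}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\boldsymbol{\Sigma}|
- \tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{T}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})
```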
Result:
Compute the likelihood and log-likelihood given by the Gaussian probability density function of each vowel. Use one short-time frame from the second or third realization of a vowel (e.g. the 5th row of Fa2). Find the highest likelihood and check that it corresponds to the analyzed sound. Repeat for the other vowels.
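The evaluation of one frame against all five vowel models might look like the sketch below. It is one possible implementation, not a reference solution: the training data are toy values, only F1 and F2 are used, and the names gausspdf, centre and x are placeholders.

```matlab
% Multivariate Gaussian pdf for a row vector x (1 x n), mean mu (1 x n)
% and covariance matrix Sigma (n x n).
gausspdf = @(x, mu, Sigma) ...
    exp(-0.5 * ((x - mu) / Sigma) * (x - mu)') / ...
    sqrt((2*pi)^numel(mu) * det(Sigma));

% Toy training data per vowel (F1, F2 in Hz), illustrative values only.
rng(2);
names  = {'a','e','i','o','u'};
centre = [800 1300; 500 1900; 300 2300; 500 900; 350 700];
mu = cell(1,5); Sigma = cell(1,5);
for v = 1:5
    Xtrain   = repmat(centre(v,:), 40, 1) + 50*randn(40,2);
    mu{v}    = mean(Xtrain);      % model mean
    Sigma{v} = cov(Xtrain);       % model covariance
end

% One toy test frame lying close to the "a" cluster.
x = centre(1,:) + [10 -20];

lik = zeros(1,5);
for v = 1:5
    lik(v) = gausspdf(x, mu{v}, Sigma{v});
end
loglik = log(lik);
[~, best] = max(loglik);          % the most likely model wins
fprintf('Frame classified as "%s"\n', names{best});
```

For features with more dimensions, or frames far from every model, the likelihood can underflow to zero; evaluating the log-likelihood formula directly avoids that.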
Repeat the same task using the functions available in MATLAB, i.e. gmdistribution.fit and pdf, and compare the results with your own implementation.
Result:
Perform the classification using the MATLAB function pdf for all short-time frames of the 2nd or 3rd realization of the vowels, for all available GMM models. Display the results graphically.
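A frame-by-frame classification with the toolbox functions can be sketched as follows. The sketch assumes the Statistics and Machine Learning Toolbox, uses fitgmdist with a single component per vowel so that the result is comparable with a plain Gaussian, and works on toy F1/F2 data rather than the contents of formants_vowels.mat.

```matlab
% Requires the Statistics and Machine Learning Toolbox (fitgmdist, pdf).
rng(3);
names  = {'a','e','i','o','u'};
centre = [800 1300; 500 1900; 300 2300; 500 900; 350 700];  % illustrative only

% Train one single-component GMM per vowel on toy data.
gm = cell(1,5);
for v = 1:5
    Xtrain = repmat(centre(v,:), 60, 1) + 50*randn(60,2);
    gm{v}  = fitgmdist(Xtrain, 1);
end

% Toy test frames of the vowel "e" (30 frames x 2 features).
Xtest = repmat(centre(2,:), 30, 1) + 50*randn(30,2);

% Log-likelihood of every frame under every model (frames x models).
L = zeros(size(Xtest,1), 5);
for v = 1:5
    L(:,v) = log(pdf(gm{v}, Xtest));
end
[~, decision] = max(L, [], 2);    % index of the winning model per frame

figure;
subplot(2,1,1); plot(L); legend(names);
xlabel('frame'); ylabel('log-likelihood');
subplot(2,1,2); stem(decision); ylim([0 6]);
set(gca, 'YTick', 1:5, 'YTickLabel', names);
xlabel('frame'); ylabel('decision');
```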
Identification of vowels based on cepstral features
Repeat the previous steps and perform vowel identification based on the MFCC features computed in the home preparation. Compute the parameters of the GMM models with the MATLAB function fitgmdist and the probabilities with the function pdf.
Perform the classification for the 3rd realization of your vowels.
Result: for a selected realization of the vowel "E", display:
all values of log-likelihood and their averages for all 5 GMM models of the basic vowels,
repeat this for the other vowels as well.
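The averaging over frames can be sketched as below. Everything specific in it is an assumption made for illustration: the 13-dimensional toy features, the choice of 2 diagonal-covariance components per model and the regularization value. The Statistics and Machine Learning Toolbox is required.

```matlab
% Requires the Statistics and Machine Learning Toolbox (fitgmdist, pdf).
rng(4);
names = {'a','e','i','o','u'};
D = 13;                                    % 12+1 cepstral coefficients

% Train one GMM per vowel on toy "MFCC" frames (frames x D).
gm = cell(1,5);
for v = 1:5
    Xtrain = randn(200, D) + v;            % toy data only
    gm{v}  = fitgmdist(Xtrain, 2, ...
        'CovarianceType', 'diagonal', 'RegularizationValue', 1e-3);
end

% Toy test realization drawn from the same distribution as vowel "e".
Xtest = randn(80, D) + 2;

L = zeros(size(Xtest,1), 5);               % frames x models
for v = 1:5
    L(:,v) = log(pdf(gm{v}, Xtest));
end
avgL = mean(L, 1);                         % average log-likelihood per model
[~, best] = max(avgL);
fprintf('Recognized vowel: %s\n', names{best});

figure;
subplot(2,1,1); plot(L); legend(names);
xlabel('frame'); ylabel('log-likelihood');
subplot(2,1,2); bar(avgL);
set(gca, 'XTick', 1:5, 'XTickLabel', names);
ylabel('average log-likelihood');
```

Averaging the log-likelihood over all frames gives one decision for the whole realization and is less sensitive to single outlying frames than per-frame decisions.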
You can use pre-trained GMM models of vowels trained on a larger set from the database zreratdb; see cv8_vowel_gmms.mat.
Classify a vowel recorded on-line.
Using VAD based on short-time power in dB with a fixed threshold, remove the non-speech frames and the very quiet frames from the on-line recorded vowel.
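A power-based VAD of this kind can be sketched as follows. The recording is simulated, the threshold of -40 dB is an assumed value for a signal scaled to the range ±1, and the variable names are placeholders. The framing must match the one used for the MFCC computation so that the mask lines up with the MFCC frames.

```matlab
fs = 16000;
rng(5);
% Toy recording: 0.5 s of weak noise, 1 s of a louder tone, 0.5 s of weak noise.
x = [1e-3*randn(fs/2,1); 0.5*sin(2*pi*150*(0:fs-1)'/fs); 1e-3*randn(fs/2,1)];

% Same framing as for the MFCC computation: 25 ms frames, 5 ms shift.
wlen  = round(0.025*fs);
wstep = round(0.005*fs);
nfr   = floor((length(x) - wlen)/wstep) + 1;

% Short-time power of each frame in dB (eps guards against log of zero).
PdB = zeros(nfr,1);
for k = 1:nfr
    seg    = x((k-1)*wstep + (1:wlen));
    PdB(k) = 10*log10(mean(seg.^2) + eps);
end

thr = -40;                % assumed fixed threshold [dB], tune for real recordings
vad = PdB > thr;          % logical mask: true for frames kept for classification

% Hypothetical use: keep only the rows of a frame-wise MFCC matrix.
% mfcc_vad = mfcc(vad, :);

figure;
plot(PdB); hold on; plot([1 nfr], [thr thr], 'r--');
xlabel('frame'); ylabel('short-time power [dB]');
fprintf('%d of %d frames kept\n', sum(vad), nfr);
```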
Result:
observe waveform and spectrogram without application of VAD,
observe MFCC time dependency without VAD,
observe MFCC time dependency after application of VAD for purposes of further classification,
all values of emitted log-likelihood and their averages for all 5 GMM models of particular basic vowels.
OPTIONAL HOMEWORK
Training of general speaker-independent GMM models of vowels
Compute the parameters of a general GMM model from all available realizations of the basic Czech vowels in the database zreratdb, pronounced by Czech speakers.
Data are available in the classroom 802 at CTU FEE
in the directory "K:\ZRE\data\zreratdb".
For work outside the FEE classrooms, use the archive of signals resampled to a sampling frequency of 16 kHz, zrerat_block200_2023_cs0.zip (Czech speakers), or the small archive of all vowels, zrerat_vowels_all_cs0.zip.