Demonstration of K-means-based clustering
Observe K-means-based clustering using the script kmeans_example.m.
The data used represent clusters in 2-dimensional space with an adjustable
standard deviation of additive Gaussian noise.
Observe the results of clustering in the following cases:
- varying standard deviation of additive Gaussian noise (variable sigma),
- varying number of realizations of clustered data (variable xlen),
- varying number of clusters (variable clusternum).
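The core of the demo can be sketched as follows. The variable names sigma, xlen and clusternum follow the script; the way the data are generated here is an assumption, as kmeans_example.m may construct its clusters differently:

```matlab
sigma      = 0.3;   % std. dev. of additive Gaussian noise
xlen       = 100;   % number of realizations per cluster
clusternum = 4;     % number of clusters

centers = 3*randn(clusternum, 2);                       % random 2-D cluster centers
x = kron(centers, ones(xlen,1)) ...                     % repeat each center xlen times
    + sigma*randn(clusternum*xlen, 2);                  % add Gaussian noise

[idx, c] = kmeans(x, clusternum);                       % K-means clustering
gscatter(x(:,1), x(:,2), idx); hold on;                 % points colored by cluster
plot(c(:,1), c(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2); hold off;
```

Increasing sigma makes the clusters overlap and the assignment less reliable; changing clusternum away from the true number of clusters shows how K-means splits or merges them.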
Cepstral space of one speaker
Create training data for a selected speaker (optimally your own voice), i.e. for the signals S0-S9 compute the matrix of MFCC cepstral coefficients for all available short-time frames (e.g. "TEN*S1.CS0").
Use the signals you recorded at the first seminar:
Your signals resampled to 16 kHz (files *.CS0) are available in the FEL
classroom in the directory
"H:\VYUKA\ZRE\signaly\zreratdb".
For work outside the FEL network the database is available
at http://noel.feld.cvut.cz/vyu/data/zreratdb. This
location contains both the original signals sampled at
48 kHz (*.BIN files) and the data resampled to 16 kHz (*.CS0 files).
For work outside the FEL network you can also download the
archive zrerat_blocken_2022_cs0.zip.
Use the function vmfcc.m
(and the functions it calls: melbf.m, mel.m, melinv.m)
with the following setup of the MFCC computation:
apply preemphasis to the processed signal with the coefficient m = 0.97;
number of filter-bank bands M = 30; frequency range fmin = 100 Hz, fmax = 6500 Hz;
number of cepstral coefficients cp = 12 (i.e. 12 + 1).
Apply VAD based on short-time power in
dB with a fixed threshold set
at approximately 40-50% of the dynamic range (for the VAD use the
functions speechpwr.m
and thr_fixed.m
already used at previous seminars).
Join the computed cepstra without c[0] into one matrix
and observe the distribution of the cepstrum in the space of coefficients
c[1] - c[8].
Result: for your utterances S0 - S9
observe:
the dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6] and c[7]-c[8] of the MFCC cepstrum.
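The steps above can be sketched as follows. Note that vmfcc.m, speechpwr.m and thr_fixed.m are course-provided functions, so their argument order and return values below are assumptions (check the actual headers); the raw 16-bit reading of the *.CS0 files also follows the convention of earlier seminars:

```matlab
files = dir('TEN*S*.CS0');                  % all utterances of the chosen speaker
Ctrain = [];
for k = 1:numel(files)
    fid = fopen(files(k).name, 'r');
    s = fread(fid, inf, 'int16');           % assumed: raw 16-bit samples, fs = 16 kHz
    fclose(fid);
    C = vmfcc(s, 0.97, 30, 100, 6500, 12);  % assumed argument order of vmfcc.m
    p = speechpwr(s);                       % assumed: short-time power in dB
    vad = thr_fixed(p, 0.45);               % assumed: threshold at ~45 % of dynamics
    C = C(2:end, vad > 0);                  % drop c[0] and non-speech frames
    Ctrain = [Ctrain, C];                   % join into one matrix, frames in columns
end
plot(Ctrain(1,:), Ctrain(2,:), '.');        % e.g. the c[1]-c[2] dependency
xlabel('c[1]'); ylabel('c[2]');
```

The orientation of C (coefficients in rows, frames in columns) is an assumption as well; transpose accordingly if vmfcc.m returns frames in rows.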
VQ of MFCC cepstrum distribution
Compute the codebook for the given training data set (your voice), i.e. perform vector quantization of the given cepstral space. Use the same training and test sets as before, i.e. the cepstral matrices with non-speech frames removed computed in the previous task.
For the training (i.e. the creation of the codebook), use the K-means
algorithm (function kmeans, as used in the demo kmeans_example.m):
- the number of clusters should be 50-200, depending on the amount
of training data.
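A minimal sketch of the codebook training, assuming Ctrain is the 12 x N matrix of training cepstra (c[1]..c[12], speech frames only) from the previous task:

```matlab
K = 100;                          % 50-200 clusters, depending on the data amount
[~, CB] = kmeans(Ctrain.', K);    % MATLAB kmeans expects observations in rows
% CB is a K x 12 codebook: one code vector per row
```

The second output of kmeans holds the cluster centroids, which here directly form the codebook.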
Checked result (1 point):
Display the computed codebook of your voice as the dependencies
c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] (optimally in 4 subplots) together with all short-time realizations of your training set.
Display, in the same way, your codebook together with all short-time realizations for:
- testing data created from your utterances,
- testing data created from another speaker.
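The requested display can be sketched as follows, assuming a cepstral matrix C (coefficients in rows, as computed in the previous task) and a K x 12 codebook CB from kmeans:

```matlab
pairs = [1 2; 3 4; 5 6; 7 8];               % the four requested dependencies
for i = 1:4
    subplot(2, 2, i);
    plot(C(pairs(i,1),:), C(pairs(i,2),:), '.'); hold on;   % short-time realizations
    plot(CB(:,pairs(i,1)), CB(:,pairs(i,2)), 'rx', ...
         'MarkerSize', 8, 'LineWidth', 1.5); hold off;       % codebook vectors
    xlabel(sprintf('c[%d]', pairs(i,1)));
    ylabel(sprintf('c[%d]', pairs(i,2)));
end
```

Running this once with the training matrix and again with each testing matrix (your utterances, another speaker) shows how well the codebook covers each cepstral space.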
Speaker identification based on K-means Vector
Quantization (VQ)
Compare the testing utterances of your voice and the testing utterances of another speaker with the codebook created for your voice (from the training utterances). Use the
function vq_encode.m for this purpose.
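A sketch of the comparison; vq_encode.m is a course-provided function, so the call below assumes it takes the codebook and a cepstral matrix and returns a per-frame distance to the nearest code vector (check its actual interface):

```matlab
% CB: your codebook, Ctest: cepstral matrix of one testing utterance
[codes, d] = vq_encode(CB, Ctest);              % assumed interface of vq_encode.m
fprintf('mean cepstral distance: %.3f\n', mean(d));
```

Repeating this for every testing utterance of your voice and of the other speaker gives the per-utterance values and mean values required below.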
Checked result (1 point):
Display:
all available values and the MEAN VALUE of the measured cepstral distance for all your testing utterances,
all values as well as the MEAN VALUE of the measured cepstral distance for the utterances of the OTHER SPEAKER against YOUR CODEBOOK.
GMM-based speaker verification
Compute GMM models of the cepstrum variability of your own voice using the function fitgmdist and the same training data as for the codebook computation. Use m = 4-6 mixtures and a full covariance
matrix (option 'CovarianceType').
Perform the verification based on the log-likelihoods of all short-time frames of the analyzed speaker using the MATLAB function pdf.
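Both steps can be sketched as follows, assuming Ctrain and Ctest are cepstral matrices (coefficients in rows) from the VQ task; the regularization term is an added practical safeguard against singular covariance estimates, not a requirement of the assignment:

```matlab
gm = fitgmdist(Ctrain.', 5, ...                 % m = 4-6 mixtures
               'CovarianceType', 'full', ...    % full covariance matrices
               'RegularizationValue', 1e-6);    % assumed safeguard (see above)

ll = log(pdf(gm, Ctest.'));     % log-likelihood of each short-time frame
score = mean(ll);               % mean score of the utterance
```

A higher mean log-likelihood indicates the test cepstra fit the model of your voice; utterances of another speaker should score visibly lower.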
Results: Display:
all short-time values as well as the mean of the score computed
from the emitted log-likelihood for all test
utterances of your own voice,
the short-time values as well as the mean of the score for the test
utterances of the other speaker,
the result of the verification for the other speakers.
ON-LINE speaker verification based on GMM
Based on the mean values of the distances computed for your voice and for the voices of other speakers, define the verification threshold empirically and perform the verification of an on-line recorded utterance.
Checked result (1 point):
ON-LINE VERIFICATION OF YOUR VOICE
Recommended outputs: ACCEPTANCE/REJECTION of the supposed identity + computed MEAN distance + verification threshold.
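The on-line decision can be sketched as follows. The threshold thr is chosen empirically between the mean distances of the two speaker groups; the interface of vq_encode.m is assumed as above, and the MFCC/VAD step for the recorded signal is elided since it matches the training procedure:

```matlab
rec = audiorecorder(16000, 16, 1);      % 16 kHz, 16 bit, mono
recordblocking(rec, 3);                 % record 3 s of speech
s = getaudiodata(rec);

% ... compute the cepstral matrix Ctest of s as in the training step ...

[~, d] = vq_encode(CB, Ctest);          % assumed interface of vq_encode.m
md = mean(d);
if md < thr
    fprintf('ACCEPTED: mean distance %.3f < threshold %.3f\n', md, thr);
else
    fprintf('REJECTED: mean distance %.3f >= threshold %.3f\n', md, thr);
end
```

For the GMM-based variant the same structure applies, with the mean log-likelihood score and a reversed comparison (accept when the score exceeds the threshold).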