BE2M31ZRE seminar
VQ-based speaker identification
Guidelines:
- Demonstration of clustering on the basis of K-means
- Observe K-means clustering using the script
kmeans_example.m.
The data represent clusters in 2-dimensional space, corrupted by additive
Gaussian noise with adjustable standard deviation.
- Observe the results of clustering in the following cases:
- varying standard deviation of additive Gaussian noise (variable sigma)
- varying number of realizations of clustered data (variable xlen)
- varying number of clusters (variable clusternum)
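The experiment above can also be reproduced outside MATLAB. The following is a minimal Python sketch, not the course script kmeans_example.m: the names sigma, xlen and clusternum mirror the variables mentioned above, while the placement of the true centres on a circle of radius 5, the plain Lloyd iteration and the helper names are assumptions made for illustration only.

```python
import math
import random

def make_clusters(clusternum, xlen, sigma, rng):
    """Return xlen noisy 2-D points around each of clusternum centres (assumed on a circle)."""
    centres = [(5.0 * math.cos(2 * math.pi * k / clusternum),
                5.0 * math.sin(2 * math.pi * k / clusternum))
               for k in range(clusternum)]
    points = [(cx + rng.gauss(0.0, sigma), cy + rng.gauss(0.0, sigma))
              for cx, cy in centres for _ in range(xlen)]
    return centres, points

def kmeans(points, k, rng, iterations=50):
    """Plain Lloyd iteration: assign each point to its nearest centroid, then recompute means."""
    centroids = rng.sample(points, k)          # random initialisation from the data
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:               # converged
            break
        centroids = updated
    return centroids

def mean_distortion(points, centroids):
    """Mean Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)

rng = random.Random(0)
for sigma in (0.2, 1.0, 3.0):
    _, pts = make_clusters(clusternum=4, xlen=200, sigma=sigma, rng=rng)
    codebook = kmeans(pts, 4, rng)
    print(f"sigma={sigma}: mean distance to nearest centroid = "
          f"{mean_distortion(pts, codebook):.3f}")
```

Changing sigma, xlen or clusternum in the loop corresponds to the three cases listed above; because the initialisation is random, a single run may end in a poor local optimum.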
- Cepstral space of one speaker
- Use the signals you recorded at the first seminar:
- Your signals resampled to 16 kHz (files *.CS0) are available in the FEL
classroom in the directory
"H:\VYUKA\ZRE\signaly\zreratdb".
- For work outside the FEL network, the database is available
at http://noel.feld.cvut.cz/vyu/data/zreratdb. This
location contains both the original signals sampled at
48 kHz (*.BIN files) and the data resampled to 16 kHz (*.CS0 files).
- For work outside the FEL network, you can also download the
following archive:
zrerat_blocken_2018_cs0.zip.
- For signals S0-S9, compute the matrix of MFCC cepstral coefficients for
short-time frames (e.g. "TEN*S1.CS0").
- Use the function vmfcc.m
(and the functions it calls:
melbf.m, mel.m, melinv.m),
with the following setup of the MFCC computation:
- frame length 25 ms, frame shift 5 ms, Hamming window
- number of filter-bank bands M=30; frequency range fmin=100Hz, fmax=6500Hz,
- number of cepstral coefficients cp=12 (i.e. 12 + 1 when c[0] is counted).
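To see what this setup means in samples and in filter-bank band positions, the following Python sketch translates the parameters above for the 16 kHz *.CS0 data. It is not the course code: the mel formula 2595*log10(1 + f/700) is the commonly used one and is only assumed to match mel.m and melinv.m, the symmetric Hamming definition is likewise an assumption, and all function names are hypothetical.

```python
import math

FS = 16000                        # sampling rate of the *.CS0 files (Hz)
FRAME_LEN = round(0.025 * FS)     # 25 ms frame  -> 400 samples
FRAME_SHIFT = round(0.005 * FS)   # 5 ms shift   -> 80 samples
M, FMIN, FMAX = 30, 100.0, 6500.0 # number of bands and frequency range (Hz)

def hz_to_mel(f):
    """Common mel-scale formula (assumed, not taken from mel.m)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hamming(n):
    """Symmetric Hamming window of n samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_count(num_samples):
    """Number of complete frames that fit into a signal of num_samples samples."""
    if num_samples < FRAME_LEN:
        return 0
    return 1 + (num_samples - FRAME_LEN) // FRAME_SHIFT

def band_edges():
    """M + 2 frequencies equally spaced on the mel scale between FMIN and FMAX.

    Triangular band k (k = 1..M) would span edges[k-1]..edges[k+1].
    """
    lo, hi = hz_to_mel(FMIN), hz_to_mel(FMAX)
    return [mel_to_hz(lo + (hi - lo) * i / (M + 1)) for i in range(M + 2)]

print("frame length:", FRAME_LEN, "samples, frame shift:", FRAME_SHIFT, "samples")
print("frames in 1 s of signal:", frame_count(FS))
print("first band edges (Hz):", [round(f, 1) for f in band_edges()[:4]])
```

With a 5 ms shift, one second of speech already yields close to 200 cepstral vectors, which is why a few short utterances are enough to populate the cepstral space.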
- Join the computed cepstra (without c[0]) into one matrix
and observe the distribution of the cepstrum in the space of coefficients
c[1] - c[8].
- 1st checked result: for your utterances S0 - S9,
observe:
- dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6] and c[7]-c[8] for the MFCC cepstrum.
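The joining and pairing of coefficients described above can be sketched as follows. This is an illustration in Python, not the expected MATLAB solution: it assumes each cepstral matrix is stored with one frame per row and c[0] in the first column (the orientation returned by vmfcc.m is not stated here), and both helper names are hypothetical.

```python
def join_without_c0(matrices):
    """Stack the frames of all utterances, dropping c[0] (column 0) from every frame."""
    return [frame[1:] for matrix in matrices for frame in matrix]

def coefficient_pair(frames, i, j):
    """Return (c[i], c[j]) points for one scatter plot.

    After c[0] is removed, c[1] sits at index 0, hence the "- 1".
    """
    return [(f[i - 1], f[j - 1]) for f in frames]

# The four dependencies to be displayed, one per subplot.
PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8)]
```

Each entry of PAIRS then gives the point set for one subplot, for example coefficient_pair(frames, 1, 2) for the c[1]-c[2] dependency.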
- VQ of MFCC cepstrum distribution
- Replace the GMM-based classification realized in Task 8a with vector-quantization-based classification. This means you should use the same training and test sets, i.e. the cepstral matrices with non-speech frames removed, computed in the previous task.
- For the training (i.e. the creation of the codebook), use the K-means
algorithm (function kmeans, as used in the demo kmeans_example.m):
- the number of clusters should be 50-200, depending on the amount
of training data.
- 2nd checked result:
- Display the computed codebook of your voice as dependencies
c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] (ideally in 4 subplots) together with all short-time realizations from your training set.
- 3rd checked result:
- Display your codebook in the same way, together with all short-time realizations, for:
- testing data created from your utterances,
- testing data created from another speaker.
- Speaker identification based on K-means Vector
Quantization (VQ)
- Compare the testing utterances of your voice and the testing utterances of another speaker with the codebook created for your voice (from the training utterances). Use the
function vq_encode.m for this purpose.
- 4th checked result: Display:
- all individual values and the MEAN VALUE of the measured cepstral distance for all your testing utterances,
- all individual values and the MEAN VALUE of the measured cepstral distance between the utterances of the OTHER SPEAKER and YOUR CODEBOOK.
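The distance measure requested above can be illustrated with a short Python sketch. It does not reproduce vq_encode.m, whose interface is not described here: it assumes that the cepstral distance is the Euclidean distance between a cepstral vector and its nearest codeword, and both function names are hypothetical.

```python
import math

def vq_distances(frames, codebook):
    """Distance of every cepstral vector to its nearest codeword (Euclidean, assumed)."""
    return [min(math.dist(f, c) for c in codebook) for f in frames]

def mean_vq_distance(frames, codebook):
    """Mean nearest-codeword distance of one utterance against one codebook."""
    distances = vq_distances(frames, codebook)
    return sum(distances) / len(distances)
```

Calling mean_vq_distance once per testing utterance gives the individual values, and averaging those gives the MEAN VALUE for a speaker; utterances of the codebook's owner would be expected to give smaller values than those of another speaker.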
- ON-LINE speaker verification
- On the basis of the mean distances computed for your voice and for the voices of other speakers, empirically define the verification threshold and perform the verification of an on-line recorded utterance.
- Homework:
- ON-LINE VERIFICATION OF YOUR VOICE
- recommended outputs: ACCEPTANCE/REJECTION of the claimed identity + computed MEAN distance + verification threshold.
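The verification decision and the recommended outputs can be sketched as below. This is only one possible rule, written in Python rather than MATLAB: placing the threshold midway between the two mean distances is an assumption (the threshold is meant to be chosen empirically), the function names are hypothetical, and the numbers in the example are placeholders, not measured results.

```python
def choose_threshold(own_means, other_means):
    """Midpoint between the average own-voice and other-voice mean distances (assumed rule)."""
    own = sum(own_means) / len(own_means)
    other = sum(other_means) / len(other_means)
    return 0.5 * (own + other)

def verify(mean_distance, threshold):
    """Accept the claimed identity when the utterance is close enough to the codebook."""
    return "ACCEPTANCE" if mean_distance <= threshold else "REJECTION"

# Placeholder values for illustration only; replace them with your measured distances.
threshold = choose_threshold(own_means=[1.0, 1.2], other_means=[2.0, 2.4])
recorded_mean = 1.3
print(f"{verify(recorded_mean, threshold)}: mean distance = {recorded_mean:.2f}, "
      f"threshold = {threshold:.2f}")
```

A threshold closer to the own-voice mean makes false acceptance less likely at the cost of more false rejections, so the midpoint is only a starting point for the empirical tuning.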