AE2M31ZRE seminar - Task No. 8
VQ-based speaker identification
Guidelines:
- Demonstration of clustering on the basis of K-means
- Observe K-means clustering using the script kmeans_example.m.
The data used form clusters in 2-dimensional space, with an adjustable
standard deviation of the additive Gaussian noise.
- Observe the results of clustering in the following cases:
- varying standard deviation of the additive Gaussian noise (variable sigma)
- varying number of realizations of clustered data (variable xlen)
- varying number of clusters (variable clusternum)
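As a rough illustration of what the demo explores, the following Python sketch (standard library only; it is not the course script kmeans_example.m) generates noisy 2-D clusters and groups them with a plain K-means loop. The parameter names sigma, xlen and clusternum follow the variables named above. The placement of the true centers on a circle, the farthest-point initialization (chosen only to keep the sketch deterministic) and all function names are assumptions made for this example.

```python
import math
import random

def make_clusters(clusternum, xlen, sigma, seed=0):
    """Generate xlen 2-D points around each of clusternum centers, with
    additive Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    # Assumption: true centers are evenly spaced on a circle of radius 5.
    centers = [(5 * math.cos(2 * math.pi * k / clusternum),
                5 * math.sin(2 * math.pi * k / clusternum))
               for k in range(clusternum)]
    points = [(cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma))
              for cx, cy in centers for _ in range(xlen)]
    return centers, points

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))

def init_farthest(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=50):
    """Plain K-means: alternate nearest-centroid assignment and mean update."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: the update no longer moves any centroid
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

centers, points = make_clusters(clusternum=3, xlen=100, sigma=0.5)
centroids, labels = kmeans(points, k=3)
for c in centroids:
    print("centroid: (%.2f, %.2f)" % c)
```

Increasing sigma in this sketch makes the clusters overlap, so the estimated centroids drift away from the true centers; decreasing xlen makes the centroid estimates noisier.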
- VQ of MFCC cepstrum distribution
- Replace the GMM-based classification implemented in Task 8a with classification based on vector quantization. Use the same training and test sets, i.e. the cepstral matrices with non-speech frames removed that were computed in the previous task.
- For training (i.e. creating the codebook), use the K-means
algorithm (function kmeans, as used in the demo kmeans_example.m):
- the number of clusters should be 50-200, depending on the amount
of training data.
- 1st checked result:
- Display the computed codebook of your voice as the dependencies
c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] (ideally in 4 subplots), together with all short-time realizations from your training set.
- 2nd checked result:
- In the same way as above, display your codebook together with all short-time realizations for:
- testing data created from your utterances,
- testing data created from another speaker's utterances.
- Speaker identification based on K-means Vector
Quantization (VQ)
- Compare the testing utterances of your voice and the testing utterances of another speaker with the codebook created for your voice (from the training utterances). Use the
function vq_encode.m for this purpose.
- 3rd checked result: Display:
- all available values and the MEAN VALUE of the measured cepstral distance for all your testing utterances,
- all values as well as the MEAN VALUE of the measured cepstral distance between the utterances of the OTHER SPEAKER and YOUR CODEBOOK.
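The interface of vq_encode.m is not described here, so the following Python sketch (standard library only) shows just one common way to define the measured quantity: for each short-time cepstral vector, take the distance to the nearest codeword, then average over all frames of the utterance. The function name mean_vq_distance, the use of plain (not squared) Euclidean distance and the toy 2-D numbers are assumptions for illustration; the course function may differ.

```python
import math

def mean_vq_distance(frames, codebook):
    """Average, over all frames, of the Euclidean distance from each frame
    to its nearest codeword in the codebook."""
    if not frames:
        raise ValueError("no frames to score")
    total = sum(min(math.dist(f, c) for c in codebook) for f in frames)
    return total / len(frames)

# Toy 2-D example with hypothetical numbers (not real cepstra).
codebook = [(0.0, 0.0), (4.0, 0.0)]
own_frames = [(0.0, 1.0), (4.0, -1.0)]     # each lies 1.0 from a codeword
other_frames = [(2.0, 3.0), (2.0, -3.0)]   # each lies sqrt(13) from both codewords

print(mean_vq_distance(own_frames, codebook))    # 1.0
print(mean_vq_distance(other_frames, codebook))  # about 3.606
```

Frames that match the codebook well give a small mean distance, which is why the mean value for your own testing utterances is expected to be lower than for the other speaker.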
- ON-LINE speaker verification
- Based on the mean values of the distances computed for your voice and for the voices of other speakers, empirically set the verification threshold and perform verification of an utterance recorded on-line.
- 4th checked result:
- ON-LINE VERIFICATION OF YOUR VOICE
- recommended outputs: ACCEPTANCE/REJECTION of the claimed identity + computed MEAN distance + verification threshold.
- If you are interested and have spare time, try to implement the verification using GMM modelling (based on the solution used in Task 8a).
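The threshold-based decision for the VQ verification above can be sketched in Python as follows. Placing the threshold at the midpoint between the average own-voice distance and the average other-speaker distance is only one simple empirical choice, not a prescribed rule; the function names and the numbers are hypothetical.

```python
def choose_threshold(own_means, other_means):
    """Midpoint between the average own-voice mean distance and the
    average other-speaker mean distance (an assumed, simple heuristic)."""
    own = sum(own_means) / len(own_means)
    other = sum(other_means) / len(other_means)
    if own >= other:
        raise ValueError("own-voice distances are not smaller than "
                         "other-speaker distances; no usable threshold")
    return (own + other) / 2

def verify(mean_distance, threshold):
    """Accept the claimed identity when the mean distance of the recorded
    utterance to the codebook does not exceed the threshold."""
    return "ACCEPTANCE" if mean_distance <= threshold else "REJECTION"

# Hypothetical per-utterance mean distances measured against your codebook.
own_means = [1.0, 1.5, 2.0]
other_means = [3.0, 3.5, 4.0]
threshold = choose_threshold(own_means, other_means)

recorded = 1.8  # hypothetical mean distance of the on-line utterance
print(verify(recorded, threshold), "mean distance:", recorded,
      "threshold:", threshold)
```

Moving the threshold toward the own-voice mean lowers the chance of accepting another speaker at the cost of rejecting your own voice more often, so the final value is best checked against all available testing utterances.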