AE2M31ZRE - seminars - TASK No. 9
DTW based speech recognition
Tasks to do:
- Selection of signals for processing
- Work with the signals recorded at the first seminar.
- Signals recorded and resampled to 16 kHz (files *.CS0) are
available in the CTU FEL computer rooms in the directory
"H:\VYUKA\ZRE\signaly\zreratdb".
- To work outside FEL, the database is available at http://noel.feld.cvut.cz/vyu/data/zreratdb or in the archive zrerat_blocken_2018_cs0.zip.
- Processed utterances:
REFERENCE UTTERANCE:
- 1st realization of numeral - sc0
(numeral "zero" - signal "*C0.CS0")
UNKNOWN (RECOGNIZED) UTTERANCE:
1) the identical utterance - sc0
2) 2nd realization with the same content - sd0
(numeral "zero" - signal "*D0.CS0")
3) another utterance with different phonetic
content - sd5 (numeral "five" - signal "*D5.CS0" or "*C5.CS0")
- Computation of features
- Before computing the cepstrum, apply preemphasis with the coefficient
0.97. Observe the spectrum of the signal with and without preemphasis.
- Compute short-time cepstra of the signals used, with the following parameters:
- MFCC, frame length 25 ms, frame step 10 ms, Hamming window,
fmin=100 Hz, fmax=6500 Hz, 30 filter bands.
- Use the functions from Task No. 4.
- 1st checked result: Display:
- for the utterance sc0 with the numeral
"zero", spectrograms without and with preemphasis,
- spectrograms of two utterances with different content, i.e. sc0 and
sd5 ("zero" and "five"),
- in one figure, the MFCC dependencies c[1]-c[2], c[3]-c[4],
c[5]-c[6] and c[7]-c[8] for the utterances sc0, sd0 and sd5,
- the time dependencies of the cepstral coefficients c[1],
..., c[8] for the utterances sc0, sd0 and sd5.
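The preemphasis and framing steps above can be tried out with a minimal sketch like the following. It is only an illustration under stated assumptions: the input is a synthetic two-tone signal standing in for a real utterance such as sc0, the FFT length of 512 and all variable names are arbitrary choices, and the MFCC computation itself is left to the functions from Task No. 4.

```matlab
fs = 16000;                        % sampling frequency of the *.CS0 signals [Hz]
t  = (0:fs-1)'/fs;                 % 1 s synthetic signal (assumed stand-in for an utterance)
x  = sin(2*pi*200*t) + 0.1*sin(2*pi*4000*t);

y  = filter([1 -0.97], 1, x);      % preemphasis: y[n] = x[n] - 0.97*x[n-1]

% Frame parameters from the task: 25 ms length, 10 ms step
wlen  = round(0.025*fs);           % samples per frame
wstep = round(0.010*fs);           % samples per frame step
nfr   = floor((length(x) - wlen)/wstep) + 1;          % number of complete frames
w     = 0.54 - 0.46*cos(2*pi*(0:wlen-1)'/(wlen-1));   % Hamming window

% Magnitude spectrum of one frame without and with preemphasis
k    = 10;                         % arbitrary frame index
ix   = (k-1)*wstep + (1:wlen)';    % sample indices of frame k
nfft = 512;                        % assumed FFT length
X    = abs(fft(x(ix).*w, nfft));
Y    = abs(fft(y(ix).*w, nfft));
f    = (0:nfft/2)'*fs/nfft;        % frequency axis up to fs/2

plot(f, 20*log10(X(1:nfft/2+1) + eps), f, 20*log10(Y(1:nfft/2+1) + eps));
xlabel('f [Hz]'); ylabel('magnitude [dB]');
legend('without preemphasis', 'with preemphasis');
```

With preemphasis the 200 Hz component should be visibly attenuated relative to the 4 kHz component, which is the effect to look for in the spectrograms of real utterances.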
- Compute cepstral distances between two utterances
- Compute the distance between all pairs of short-time frames and save
the results in a matrix
- Use the distance
- cd1.m (distance without c[0])
- Display the matrix as a 2-dimensional graph (function pcolor)
- 2nd checked result: Display the computed
distances for:
- same (identical) utterances: sc0 - sc0
- utterances with same content but different realization: sc0 - sd0
- different utterances: sc0 - sd5 (sc5)
- As homework, try the same procedure with the other methods
of cepstrum and cepstral distance computation, i.e. vrceps.m, vaceps.m
or cd0.m, cd2.m, etc.
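The frame-to-frame distance matrix described above can be sketched as follows. This is an assumption-laden illustration: the feature matrices A and B are synthetic stand-ins for the cepstra of two utterances, and a plain Euclidean distance over c[1] and higher is used in place of cd1.m, whose exact definition and calling convention may differ.

```matlab
% Assumed layout: one row per frame, columns hold c[0], c[1], ..., c[12]
A = sin((1:40)' * (1:13) / 7);     % 40 frames (stand-in for the reference utterance)
B = cos((1:55)' * (1:13) / 9);     % 55 frames (stand-in for the unknown utterance)

na = size(A, 1);
nb = size(B, 1);
D  = zeros(na, nb);                % D(i,j) = distance of frame i of A and frame j of B
for i = 1:na
  for j = 1:nb
    d = A(i, 2:end) - B(j, 2:end); % skip the first column, i.e. leave out c[0]
    D(i, j) = sqrt(sum(d.^2));     % Euclidean cepstral distance (assumption)
  end
end

pcolor(D); shading flat; colorbar;
xlabel('frame of B'); ylabel('frame of A');
```

For two identical utterances the main diagonal of D is zero, so a dark diagonal line in the plot is a quick sanity check of the sc0 - sc0 case.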
- Computation of cumulative distance using the DTW algorithm
- From the distance matrix obtained above, compute the matrix of
cumulative distances of the DTW algorithm
- Observe this matrix and compare the final target value (the value in the
bottom right corner).
- Normalize the computed distance with respect to the lengths of both
utterances.
- 3rd checked result: Observe the matrices of
cumulative distance for:
- same (identical) utterances: sc0 - sc0
- utterances with same content but different realization: sc0 - sd0
- different utterances: sc0 - sd5 (sc5)
- Display the target normalized distance (the value in the
bottom right corner) for all possible reference patterns and one unknown
(recognized) utterance. Use data from the recorded database
(i.e. recorded patterns
"*C0.BIN"-"*C9.BIN", "*W0.BIN", "*W1.BIN" vs. one selected utterance from
"*D0.BIN"-"*D9.BIN", "*V0.BIN", "*V1.BIN").
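The accumulation step can be sketched as below. It assumes the common symmetric local constraint with three allowed predecessors (vertical, horizontal, diagonal) and unit weights, and normalization by the sum of both lengths; the constraint and normalization used in the course may differ. The small distance matrix is built from two hypothetical 1-D sequences purely for illustration.

```matlab
a = [1 2 3 4 5];                   % hypothetical reference sequence
b = [1 2 2 3 4 6];                 % hypothetical unknown sequence
D = abs(a' * ones(1, numel(b)) - ones(numel(a), 1) * b);   % local distances

[I, J] = size(D);
G = inf(I, J);                     % matrix of cumulative distances
G(1, 1) = D(1, 1);                 % the path starts in the top left corner
for i = 1:I
  for j = 1:J
    if i == 1 && j == 1, continue; end
    best = inf;                    % cheapest allowed predecessor
    if i > 1,          best = min(best, G(i-1, j));   end
    if j > 1,          best = min(best, G(i, j-1));   end
    if i > 1 && j > 1, best = min(best, G(i-1, j-1)); end
    G(i, j) = D(i, j) + best;
  end
end

dtw_dist = G(I, J) / (I + J);      % target value normalized by both lengths
```

Here the two sequences differ only in their last element, so the whole cumulative distance comes from the final cell of the path.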
- Phone dialing recognizer
- Create the patterns of commands (digits) for telephone dialing -
use the utterances recorded at the 1st exercise, i.e. isolated digits
"zero"-"nine" and commands
"star" and "hash".
- Create a demonstration script which records an utterance to
be recognized after a key-press. Compare the recorded utterance with
all patterns using the DTW algorithm and find the pattern with the
minimal cumulative distance from the recorded utterance.
- Cyclic interactive recording of an utterance to be recognized in
MATLAB can be implemented as in recording_signals_in_loop.m
- 4th checked result:
- Functional on-line recognizer for the telephone dialing.
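The decision step of such a recognizer can be sketched as follows. Everything here is illustrative: the patterns and the unknown utterance are hypothetical 1-D feature sequences instead of MFCC matrices, only three of the twelve commands are included, recording is omitted, and the DTW uses the same assumed symmetric local constraint with normalization by the sum of both lengths.

```matlab
labels   = {'0', '5', '*'};                             % subset of the dialing commands
patterns = {[1 3 5 3 1], [2 2 6 6 2 2], [5 4 3 2 1]};   % hypothetical reference features
unknown  = [1 3 3 5 3 1];                               % stand-in for the recorded utterance

score = zeros(1, numel(patterns));
for p = 1:numel(patterns)
  r = patterns{p};
  % Local distances between every reference frame and every unknown frame
  D = abs(r' * ones(1, numel(unknown)) - ones(numel(r), 1) * unknown);
  [I, J] = size(D);
  G = inf(I + 1, J + 1);           % padded with Inf so no border checks are needed
  G(1, 1) = 0;
  for i = 1:I
    for j = 1:J
      G(i+1, j+1) = D(i, j) + min([G(i, j+1), G(i+1, j), G(i, j)]);
    end
  end
  score(p) = G(I+1, J+1) / (I + J);   % normalized cumulative distance
end

[dmin, best] = min(score);         % pattern with the minimal distance wins
fprintf('Recognized command: %s (normalized distance %.3f)\n', labels{best}, dmin);
```

In a full recognizer the loop body stays the same, while the local distance would be computed between cepstral frames and the unknown utterance would come from the recording loop.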