Processed utterances:
REFERENCE UTTERANCE:
- 1st realization of numeral - sc0
(numeral "zero" - signal "*C0.CS0")
UNKNOWN (RECOGNIZED) UTTERANCE:
1) the identical utterance - sc0
2) 2nd realization with the same content - sd0
(numeral "zero" - signal "*D0.CS0")
3) another utterance with different phonetic
content - sd9 (numeral "nine" - signal "*D9.CS0" or "*C9.CS0")
Computation of cepstral features
Before computing the cepstrum, apply preemphasis with the coefficient
0.97. (Observe the spectrum of the signal with and without preemphasis.)
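The preemphasis step can be sketched as a first-order difference filter, y[n] = x[n] - 0.97 x[n-1]. The sketch below is written in plain Python rather than MATLAB and is not part of the course toolbox; the function name preemphasis and the handling of the first sample are assumptions made for illustration.

```python
def preemphasis(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1] to a list of samples."""
    if not signal:
        return []
    # Assumption: the sample before the start of the signal is taken as 0,
    # so the first output sample equals the first input sample.
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out
```

A constant signal is almost entirely suppressed by this filter, which is the expected high-pass behaviour: low frequencies are attenuated and high frequencies are emphasized.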
Compute the short-time cepstra of the signals used, with the following parameters:
- MFCC, frame length 25 ms, frame step 10 ms, Hamming window,
fmin = 100 Hz, fmax = 6500 Hz, 30 filter bands.
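To see what the frame length and frame step mean in samples, the segmentation and windowing can be sketched as follows. This is a plain Python illustration, not the toolbox implementation. The sampling frequency of 16 kHz is an assumption (the assignment does not state it), and the function names hamming and split_into_frames are hypothetical. The mel filter bank and the cepstrum computation itself are left out.

```python
import math

def hamming(n_samples):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def split_into_frames(signal, fs=16000, frame_ms=25, step_ms=10):
    """Cut the signal into overlapping, Hamming-weighted short-time frames."""
    frame_len = int(round(fs * frame_ms / 1000))  # 400 samples at the assumed 16 kHz
    step = int(round(fs * step_ms / 1000))        # 160 samples at the assumed 16 kHz
    window = hamming(frame_len)
    frames = []
    # Only complete frames are kept; trailing samples that do not fill
    # a whole frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, step):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames
```

With these settings, neighbouring frames overlap by 15 ms, so each utterance yields roughly 100 cepstral vectors per second.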
Display spectrograms (preemphasized) of two utterances with different contents, i.e. sc0 and
sd9 ("zero" and "nine"), or of sc0 and
sd0 ("zero" in two different realizations).
For the above-mentioned signal pairs, also display the time dependencies of the MFCC.
Cepstral distances between two utterances
Compute the distance between all pairs of short-time frames of the two utterances and save the
results in a matrix. Use the Euclidean distance cde.m and compute it including c[0].
Display the matrix as a 2-dimensional graph (function pcolor).
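The frame-by-frame distance matrix can be sketched as below. This is a plain Python illustration and does not reproduce cde.m. The data layout is an assumption: each utterance is a list of frames, and each frame is a list of cepstral coefficients starting with c[0]. The function names are hypothetical.

```python
import math

def euclidean_distance(c1, c2):
    """Euclidean distance between two cepstral vectors, c[0] included."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def distance_matrix(ceps_a, ceps_b):
    """Element [i][j] is the distance between frame i of A and frame j of B."""
    return [[euclidean_distance(fa, fb) for fb in ceps_b] for fa in ceps_a]
```

For two identical utterances the main diagonal of this matrix is zero, which is a quick sanity check before plotting.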
Result: Display the computed
distances for:
same (identical) utterances: sc0 - sc0
utterances with same content but different realization: sc0 - sd0
different utterances: sc0 - sd9 (sc9)
In your free time, also compare other utterances recorded in our database
(i.e. digits 0-9 + "star" and "hash").
If you have further free time, repeat the same procedure with other approaches
to cepstrum and cepstral distance computation, e.g. vrceps.m, vaceps.m
or cd0.m, cd2.m, etc.
Cumulative distance using the DTW algorithm
From the matrix obtained above, compute the matrix of normalized cumulative
distance based on the DTW algorithm.
Observe this matrix and compare the final target value (the value at the
bottom right corner).
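A minimal sketch of the accumulation step, in plain Python, is given below. The assignment fixes neither the local path constraints nor the normalization, so two assumptions are made here: each cell may be reached from its left, lower, or diagonal neighbour with equal weight, and every value is divided by the sum of the two utterance lengths (in frames). The function name cumulative_distance is hypothetical.

```python
def cumulative_distance(dist):
    """Normalized cumulative DTW distance matrix from a local distance matrix."""
    n_rows, n_cols = len(dist), len(dist[0])
    inf = float("inf")
    acc = [[inf] * n_cols for _ in range(n_rows)]
    acc[0][0] = dist[0][0]
    for i in range(n_rows):
        for j in range(n_cols):
            if i == 0 and j == 0:
                continue
            # Assumed step pattern: (i-1, j), (i, j-1), (i-1, j-1), all weight 1.
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = dist[i][j] + best_prev
    # Assumed normalization: divide by the sum of both lengths.
    norm = n_rows + n_cols
    return [[value / norm for value in row] for row in acc]
```

The last element of the last row is the target value to compare between utterance pairs. For an utterance compared with itself it is exactly zero under these assumptions.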
Result: Observe the matrices of
cumulative distance for:
same (identical) utterances: sc0 - sc0
utterances with same content but different realization: sc0 - sd0
different utterances: sc0 - sd9 (sc9)
Create a function dtwdist for the computation of the cumulative DTW distance; normalize the computed distance with respect to the lengths of both
utterances. As input, the function takes two cepstral matrices containing the cepstra of the short-time frames of the two compared utterances; the output should be the target normalized cumulative distance.
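One possible shape of such a function is sketched below in plain Python (the assignment itself expects a MATLAB function). Since only the final value is needed, the sketch keeps just two rows of the cumulative matrix in memory. The Euclidean local distance including c[0], the equal-weight step pattern, the normalization by the sum of both lengths, and the list-of-frames input layout are all assumptions. Both inputs are assumed to be non-empty.

```python
import math

def dtwdist(ceps_a, ceps_b):
    """Normalized cumulative DTW distance between two sequences of cepstral frames."""
    inf = float("inf")
    prev = None  # previous row of the cumulative matrix
    for i, frame_a in enumerate(ceps_a):
        curr = []
        for j, frame_b in enumerate(ceps_b):
            # Local Euclidean distance between the two frames.
            local = math.sqrt(sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)))
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    prev[j] if i > 0 else inf,                  # step from (i-1, j)
                    curr[j - 1] if j > 0 else inf,              # step from (i, j-1)
                    prev[j - 1] if i > 0 and j > 0 else inf,    # step from (i-1, j-1)
                )
            curr.append(local + best_prev)
        prev = curr
    # Value at the far corner, normalized by the sum of both lengths.
    return prev[-1] / (len(ceps_a) + len(ceps_b))
```

Keeping only two rows is sufficient for recognition, where only the target distance matters; the full matrix is needed only when it is to be displayed or when the warping path is traced back.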
Result:
Display the target normalized distance (the value at
bottom right corner) for all possible reference patterns and one unknown
(recognized) utterance. Use data from recorded database
(i.e. recorded patterns
"*C0.BIN"-"*C9.BIN", "*W0.BIN", "*W1.BIN" vs. one selected utterance from
"*D0.BIN"-"*D9.BIN", "*V0.BIN", "*V1.BIN").
As homework, try to extend the cepstral features with dynamic features (delta parameters) or acceleration features (delta-delta parameters). To compute delta or delta-delta features, use the function diffceps.m with M = 3 (order of derivative estimation).
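For orientation, delta features are commonly estimated with a regression formula over neighbouring frames. The plain Python sketch below uses that common formula; it is not taken from diffceps.m, whose exact definition may differ. Two assumptions are made: M is interpreted as the number of frames used on each side, and frames beyond the start or end of the utterance are replaced by the first or last frame. The function name delta_features is hypothetical.

```python
def delta_features(ceps, M=3):
    """Regression-based delta features for a list of cepstral frames."""
    n_frames = len(ceps)
    denom = 2 * sum(m * m for m in range(1, M + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for k in range(len(ceps[t])):
            num = 0.0
            for m in range(1, M + 1):
                # Assumed edge handling: clamp indices to the valid range.
                ahead = ceps[min(t + m, n_frames - 1)][k]
                behind = ceps[max(t - m, 0)][k]
                num += m * (ahead - behind)
            row.append(num / denom)
        deltas.append(row)
    return deltas
```

Delta-delta features follow by applying the same function to the delta features. The extended feature vector for each frame is then the concatenation of the static, delta, and delta-delta coefficients.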
Phone dialing recognizer
Create the patterns of commands (digits) for telephone dialing:
use the utterances recorded in the 1st exercise, i.e. the isolated digits
"zero"-"nine" and the commands
"star" and "hash".
Create a demonstration script which, after a key-press, records an utterance to
be recognized. Using the DTW algorithm, compare the recorded utterance
with all patterns and find the one with the
minimal cumulative distance from the recorded utterance.
- cyclic interactive recording of an utterance to be recognized in
MATLAB can be realized as in recording_signals_in_loop.m
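The decision step of the recognizer, choosing the pattern with the minimal distance, can be sketched as follows in plain Python. Recording and feature extraction are left out. The function name recognize, the dictionary of labelled patterns, and the distance argument are hypothetical; in the real recognizer the distance would be the normalized DTW distance between cepstral matrices.

```python
def recognize(unknown, patterns, distance):
    """Return the label of the closest pattern and all computed distances."""
    scores = {label: distance(ref, unknown) for label, ref in patterns.items()}
    best_label = min(scores, key=scores.get)
    return best_label, scores

# Toy usage with a stand-in distance on plain numbers (not real cepstra).
toy_patterns = {"zero": 0.0, "nine": 9.0, "star": 20.0}
label, scores = recognize(8.2, toy_patterns, lambda ref, x: abs(ref - x))
print(label)  # nine
```

Returning all distances alongside the winner makes it easy to display how clearly the best pattern is separated from the others.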
Result:
Functional on-line recognizer for the telephone dialing.