### BE2M31ZRE seminar: Cepstrum, cepstral distance, voice activity detection

• Cepstrum of one speech frame
• Compute the cepstrum of a single short-time frame using the following techniques:
- DFT-based real cepstrum - function rceps
- LPC cepstrum - functions lpc, a2c.m, a2c0.m, use LPC order p = 16
- MFCC (mel-frequency cepstral coefficients) - computed as the DCT of the logarithm of the mel spectrum, which is obtained from a mel-frequency filter bank using the functions melbf.m, mel.m, melinv.m
• Result: for the short-time frame muz1-AA-frame.CS0, display:
- coefficients c[0]-c[12] for real, LPC and MFCC cepstrum
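
The DFT-based real cepstrum is the inverse DFT of the log magnitude spectrum, which is the same chain MATLAB's rceps evaluates. A minimal Python sketch follows, using only the standard library. It is an illustration, not the course code: the test frame is a synthetic decaying exponential a^n (an assumption, not muz1-AA-frame.CS0), chosen because its real cepstrum is known analytically as c[0] = 0 and c[n] = a^n / (2n) for n ≥ 1.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for one short frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # eps guards against log(0) for spectral bins that are exactly zero.
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    # log_mag is real and even, so its inverse DFT is real; keep the real part.
    return [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


# Synthetic stand-in for a speech frame (assumption): x[n] = a^n, n = 0..63.
a = 0.5
frame = [a ** n for n in range(64)]
c = real_cepstrum(frame)

# Compare the first coefficients with the analytic values a^n / (2n).
for n in range(1, 4):
    print(n, round(c[n], 6), round(a ** n / (2 * n), 6))
```

For a real speech frame, only the low-order part c[0]-c[12] would be kept, as the task requires.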

• Cepstrum of longer utterance
• Compute cepstral coefficients for all short-time frames of the signal SA176S01.CS0 using the following parameters of short-time analysis:
- frame length 25 ms, frame step 10 ms, Hamming window weighting
- LPC order p = 16
- number of bands in the mel filter bank M = 30, frequency band boundaries fmin = 100 Hz, fmax = 6500 Hz
- always observe coefficients c[0] - c[12] or c[1] - c[12] (i.e. without the coefficient c[0])
• Use the following functions:
- short-time real cepstrum of a long signal - vrceps.m
- short-time LPC cepstrum of a long signal - vaceps.m, aceps.m, burg.m, a2c.m, a2c0.m
- short-time mel-cepstrum of a long signal (MFCC) - vmfcc.m, melbf.m, mel.m, melinv.m
• Result: for the longer utterance SA176S01.CS0, display:
- coefficients c[1]-c[12] for real, LPC and MFCC cepstrum
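
To make the analysis parameters above concrete, the following Python sketch converts them into sample counts, splits a signal into Hamming-weighted frames, and places the edges of M = 30 mel bands between fmin and fmax. Several things here are assumptions rather than facts from the task: the sampling rate of 16 kHz, the mel formula 2595·log10(1 + f/700) (mel.m may use a different variant), and the use of M + 2 equally spaced mel points as band edges (melbf.m may construct the bank differently).

```python
import math

FS = 16000                     # ASSUMPTION: sampling rate in Hz
FRAME_MS, STEP_MS = 25, 10     # frame length and step from the task
M, FMIN, FMAX = 30, 100.0, 6500.0


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal, frame_len, step):
    """Overlapping Hamming-weighted frames; an incomplete tail is dropped."""
    w = hamming(frame_len)
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // step
    return [
        [signal[i * step + j] * w[j] for j in range(frame_len)]
        for i in range(count)
    ]


def hz_to_mel(f):
    # ASSUMPTION: common mel formula, not necessarily the one in mel.m.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(m_bands, fmin, fmax):
    """m_bands + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    return [mel_to_hz(lo + (hi - lo) * i / (m_bands + 1))
            for i in range(m_bands + 2)]


frame_len = FS * FRAME_MS // 1000   # 400 samples at 16 kHz
step = FS * STEP_MS // 1000         # 160 samples at 16 kHz
edges = mel_band_edges(M, FMIN, FMAX)

one_second = [0.0] * FS             # placeholder signal, not SA176S01.CS0
segmented = frames(one_second, frame_len, step)
print(frame_len, step, len(segmented), len(edges))
```

Each resulting frame would then be passed to the chosen cepstrum computation, giving one coefficient vector per 10 ms step.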

• Further signals for possible processing
- mc20bc116016.ils_a - raw data, fs = 44100 Hz
- a30650b1.wav - wav format
- your own signal recorded on-line with a sampling frequency of 8 kHz, 16 kHz, or 44.1 kHz

• Cepstral distance from a background and cepstral voice activity detector
• Compute the average cepstrum from the initial non-speech part of the processed utterance SA176S01.CS0 (see the 2nd checked result), i.e. from 10-20 segments, using the MFCC cepstrum.
• Compute the Euclidean distance between the cepstrum of the current frame and the background average cepstrum (computed in the previous step) and observe its dependence on time.
• Use the following approaches to cepstral distance computation:
- cd0.m (distance in dB with c[0])
- cd1.m (distance in dB without c[0])
• Implement speech activity detection from the computed distance using a fixed threshold and an adaptive threshold based on dynamics, i.e. using the functions thr_fixed.m and thr_adapt_dyn.m.
• Result: for the utterance SA176S01.CS0, display:
- result of VAD using MFCC cepstrum and adaptive threshold on the basis of dynamics.
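
The detector described above can be sketched in a few lines of Python: average the cepstra of the initial background frames, measure the Euclidean distance of every frame from that average, and threshold the distance. This is only an illustration under stated assumptions. The toy cepstra are invented, the distance is plain Euclidean without the dB scaling of cd0.m and cd1.m, and the threshold placed at half of the dynamic range of the distance is a hypothetical reading of "thresholding on the basis of dynamics", not the logic of thr_adapt_dyn.m.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally long coefficient vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cepstral_distance(c, ref, skip_c0=False):
    """Euclidean distance between two cepstral vectors (no dB scaling)."""
    start = 1 if skip_c0 else 0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c[start:], ref[start:])))


def dynamics_threshold(values, fraction=0.5):
    """HYPOTHETICAL rule: a fixed fraction of the dynamic range above the minimum."""
    lo, hi = min(values), max(values)
    return lo + fraction * (hi - lo)


def vad(values, threshold):
    """1 where the value exceeds the threshold (speech), else 0."""
    return [1 if v > threshold else 0 for v in values]


# Toy data (invented): 10 background frames followed by 5 speech-like frames,
# each with coefficients c[0], c[1], c[2].
background = [[1.0, 0.1, 0.0] for _ in range(10)]
speech = [[3.0, 0.9, -0.4] for _ in range(5)]
cepstra = background + speech

ref = mean_vector(cepstra[:10])                       # background average
dist = [cepstral_distance(c, ref, skip_c0=True) for c in cepstra]
decision = vad(dist, dynamics_threshold(dist))
print(decision)
```

Setting skip_c0=False includes c[0] in the distance, which corresponds to the cd0.m variant in spirit; with skip_c0=True the frame energy carried by c[0] is ignored, as in cd1.m.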

• Compare the results of the cepstral VAD with an energy-based one, i.e. apply analogous thresholding to the short-time power of the signal estimated in the same short-time frames. You can use your own functions created at the 2nd seminar or the function speechpwr.m.
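
For the energy-based comparison, a minimal Python sketch of short-time power in dB with the same kind of thresholding is given here. It is not speechpwr.m: the test signal (a quiet tone followed by a louder one), the 8 kHz sampling rate, and the threshold at half of the dynamic range are all assumptions made for illustration.

```python
import math


def short_time_power_db(signal, frame_len, step, floor=1e-12):
    """Mean-square power of each frame in dB; floor avoids log10(0)."""
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // step
    out = []
    for i in range(count):
        seg = signal[i * step: i * step + frame_len]
        power = sum(s * s for s in seg) / frame_len
        out.append(10.0 * math.log10(power + floor))
    return out


FS = 8000                           # ASSUMPTION: sampling rate in Hz
frame_len = FS * 25 // 1000         # 25 ms -> 200 samples
step = FS * 10 // 1000              # 10 ms -> 80 samples

# Invented test signal: 0.5 s of a very quiet 200 Hz tone ("background"),
# then 0.5 s of the same tone at amplitude 0.5 ("speech").
quiet = [0.001 * math.sin(2 * math.pi * 200 * t / FS) for t in range(FS // 2)]
loud = [0.5 * math.sin(2 * math.pi * 200 * t / FS) for t in range(FS // 2)]
signal = quiet + loud

power_db = short_time_power_db(signal, frame_len, step)

# HYPOTHETICAL threshold: halfway between the minimum and maximum power in dB.
threshold = min(power_db) + 0.5 * (max(power_db) - min(power_db))
energy_vad = [1 if p > threshold else 0 for p in power_db]
print(len(power_db), round(power_db[0], 1), round(power_db[-1], 1))
```

Because the frames share the length and step of the cepstral analysis, the two decision sequences can be plotted over each other frame by frame.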