Load the signal muz1-AA-frame.CS0 into the MATLAB workspace. The file
contains raw data without a header, fs=16000 Hz. To load it into MATLAB,
use the function loadbin.m.
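For orientation, a headerless raw file can also be read with the built-in fopen and fread. The sketch below is not the course function loadbin.m, whose interface is not given here. It assumes 16-bit signed little-endian samples, which is an assumption about the .CS0 format, and it writes a small temporary file so that it runs without the real data.

```matlab
% Sketch: read a headerless raw file with built-in I/O.
% ASSUMPTION: samples are 16-bit signed integers, little-endian.
fs = 16000;                                   % sampling frequency [Hz]

% Create a tiny temporary raw file so the example is self-contained.
fname = [tempname '.raw'];
fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, [0 100 -100 32767 -32768], 'int16');
fclose(fid);

% Read all samples; fread returns a column vector of doubles.
fid = fopen(fname, 'r', 'ieee-le');
x = fread(fid, inf, 'int16');
fclose(fid);
delete(fname);

t = (0:length(x)-1)' / fs;                    % time axis [s]
```

If the loaded waveform looks like noise, the sample type or byte order assumed here is probably wrong for the file.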
Compute the cepstrum of this frame using the following techniques:
- DFT-based real cepstrum - function rceps,
- LPC cepstrum - functions lpc, a2c.m, a2c0.m, use LPC order p = 16,
- MFCC (mel-frequency cepstral coefficients) - the DCT of the logarithm of the mel spectrum, computed with a mel-frequency filter bank using the functions melbf.m, mel.m, melinv.m.
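The first two techniques can be sketched with base functions only, which may help when checking the output of rceps, lpc and a2c.m. The frame below is a synthetic placeholder, the frame length N is hypothetical, and the choice c[0] = ln(G) with G^2 equal to the prediction error energy is an assumed convention; a2c0.m may define c[0] differently.

```matlab
% Sketch: real cepstrum and LPC cepstrum of one frame, base functions only.
N = 400;                                   % hypothetical frame length
p = 16;                                    % LPC order from the assignment
x = filter(1, [1 -1.2 0.81], randn(N,1));  % placeholder frame; replace with the loaded one

% Real cepstrum: inverse DFT of the log magnitude spectrum.
c_real = real(ifft(log(abs(fft(x)) + eps)));   % c_real(1) is c[0]

% LPC by the autocorrelation method: solve the normal equations R*alpha = -r.
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));
end
alpha = toeplitz(r(1:p)) \ (-r(2:p+1));
a = [1; alpha];                            % A(z) = 1 + a(2) z^-1 + ... + a(p+1) z^-p
E = r(1) + r(2:p+1)' * alpha;              % prediction error energy

% Recursion from a[k] to the cepstrum of G/A(z); this form is valid for n <= p.
ncep = 12;
c_lpc = zeros(ncep+1, 1);
c_lpc(1) = 0.5*log(E);                     % ASSUMPTION: c[0] = ln(G), G^2 = E
for n = 1:ncep
    k = (1:n-1)';
    c_lpc(n+1) = -a(n+1) - sum(k .* c_lpc(k+1) .* a(n-k+1)) / n;
end

disp([c_real(1:13), c_lpc]);               % columns: real cepstrum, LPC cepstrum
```

The LPC cepstrum describes only the smoothed spectral envelope, so its values are expected to be close to, but not identical with, the low-order real cepstrum.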
Result: for one short-time frame muz1-AA-frame.CS0 display:
- coefficients c[0]-c[12] for real, LPC and MFCC cepstrum
Cepstrum of longer utterance
Compute cepstral coefficients for all short-time frames of
the signal SA176S01.CS0 using the following parameters of short-time analysis:
- frame length 25 ms, frame step 10 ms, Hamming window weighting,
- LPC order p=16,
- number of bands in the mel filter bank M=30, frequency band boundaries fmin=100 Hz, fmax=6500 Hz,
- always observe coefficients c[0] - c[12] or c[1] - c[12] (i.e. without the coefficient c[0])
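The short-time analysis with these parameters can be sketched as follows for the MFCC case. This is not the course implementation: the helpers f2m and m2f are hypothetical local stand-ins for mel.m and melinv.m, the FFT length nfft = 512 is an assumption, the mel formula 2595*log10(1 + f/700) is one common choice, and the unnormalized DCT-II used here is scaled differently from MATLAB's dct. The input is a placeholder noise signal.

```matlab
% Sketch: short-time MFCC with the parameters listed above, base functions only.
fs = 16000;
x  = randn(fs, 1);                         % placeholder 1 s signal; replace with the utterance
wlen  = round(0.025*fs);                   % frame length: 400 samples
wstep = round(0.010*fs);                   % frame step: 160 samples
nfr   = floor((length(x) - wlen)/wstep) + 1;

% Cut the signal into overlapping frames (one per column), apply a Hamming window.
idx    = bsxfun(@plus, (1:wlen)', (0:nfr-1)*wstep);
w      = 0.54 - 0.46*cos(2*pi*(0:wlen-1)'/(wlen-1));
frames = bsxfun(@times, x(idx), w);

% Triangular mel filter bank; f2m and m2f are hypothetical local helpers.
M = 30; fmin = 100; fmax = 6500;
nfft = 512;                                % ASSUMPTION: FFT length
f2m = @(f) 2595*log10(1 + f/700);
m2f = @(m) 700*(10.^(m/2595) - 1);
edges = m2f(linspace(f2m(fmin), f2m(fmax), M+2));
fk = (0:nfft/2) * fs/nfft;                 % frequencies of the FFT bins [Hz]
H = zeros(M, nfft/2+1);
for m = 1:M
    up   = (fk - edges(m))   / (edges(m+1) - edges(m));
    down = (edges(m+2) - fk) / (edges(m+2) - edges(m+1));
    H(m,:) = max(0, min(up, down));
end

% Power spectrum -> mel spectrum -> logarithm -> DCT-II, keeping c[0]..c[12].
X     = abs(fft(frames, nfft)).^2;
melsp = H * X(1:nfft/2+1, :);
D     = cos(pi * (0:12)' * ((1:M) - 0.5) / M);
mfcc  = D * log(melsp + eps);              % 13 x nfr, row 1 is c[0]
```

Keeping one frame per column means the same frames matrix can be reused for the real and LPC cepstra, so all three representations refer to identical frames.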
Result: for the longer utterance SA176S01.CS0 display:
- coefficients c[1]-c[12] for real, LPC and MFCC cepstrum
Further signals for possible processing:
- mc20bc116016.ils_a - raw data, fs=44100 Hz,
- a30650b1.wav - wav format,
- your own signal recorded on-line using a sampling frequency of 8 kHz, 16 kHz or 44.1 kHz
Cepstral distance from a background and cepstral voice activity detector
Compute the average cepstrum from the initial non-speech part of the processed utterance SA176S01.CS0 (see the 2nd checked result), i.e. from 10-20 segments, using the MFCC cepstrum.
Compute the Euclidean distance between the current frame cepstrum and
the background average cepstrum (computed in the previous step)
and observe its time dependency.
Use the following approaches to cepstral distance computation:
- cd0.m (distance in dBs with c[0])
- cd1.m (distance in dBs without c[0])
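The background averaging and the two distance variants can be sketched as below. The matrix C is placeholder data, the number of background frames nbg is a hypothetical choice within the 10-20 range, and the dB formulas are a commonly used definition of cepstral distance; the scaling actually used by cd0.m and cd1.m is not given here and may differ.

```matlab
% Sketch: cepstral distance of each frame from an averaged background cepstrum.
% C holds c[0]..c[12] in rows, one frame per column (placeholder data below).
C = zeros(13, 200);
C(:, 51:end) = 1;                          % placeholder "speech" frames

nbg = 15;                                  % hypothetical number of background frames
cbg = mean(C(:, 1:nbg), 2);                % average background cepstrum
dC  = bsxfun(@minus, C, cbg);              % deviation of each frame from the background

d_euc = sqrt(sum(dC.^2, 1));               % plain Euclidean distance

% ASSUMPTION: common dB definitions; cd0.m and cd1.m may be scaled differently.
k_dB  = 10/log(10);                        % natural-log units to dB
d_cd0 = k_dB * sqrt(dC(1,:).^2 + 2*sum(dC(2:end,:).^2, 1));   % with c[0]
d_cd1 = k_dB * sqrt(2*sum(dC(2:end,:).^2, 1));                % without c[0]
```

Since c[0] mainly reflects frame energy, the variant with c[0] reacts to level changes as well as to changes in spectral shape, while the variant without it depends on spectral shape only.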
Implement speech activity detection based on the computed distance, using fixed thresholding and adaptive thresholding based on dynamics, i.e. using the functions thr_fixed.m and thr_adapt_dyn.m.
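The two thresholding strategies might look as follows. The algorithms inside thr_fixed.m and thr_adapt_dyn.m are not specified here, so this is only one plausible reading: the adaptive threshold is placed at a fraction q of the dynamic range observed so far. The distance vector d, the fixed threshold thr_f and the fraction q are all hypothetical values.

```matlab
% Sketch: fixed and dynamics-based adaptive thresholding of a distance track.
d = [ones(1,20), 20*ones(1,30), ones(1,20)];   % placeholder distance per frame [dB]

% Fixed threshold: one constant for the whole utterance.
thr_f = 10;                                % hypothetical value
vad_f = d > thr_f;

% Adaptive threshold: follows the running minimum and maximum of the distance.
q     = 0.3;                               % hypothetical fraction of the dynamic range
dmin  = cummin(d);                         % lowest distance seen so far
dmax  = cummax(d);                         % highest distance seen so far
thr_a = dmin + q*(dmax - dmin);
vad_a = d > thr_a;                         % 1 = speech, 0 = non-speech
```

A fixed threshold has to be retuned whenever the recording level or noise changes, whereas a threshold tied to the observed dynamics adjusts itself, at the cost of being unreliable until some speech has been seen.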
Result: for the utterance SA176S01.CS0 display:
- result of VAD using MFCC cepstrum and adaptive threshold on the basis of dynamics.
Compare the results of the cepstral VAD with the energy-based one, i.e. apply analogous thresholding to the short-time power of the signal estimated in the same short-time frames. You can use your functions created at the 2nd seminar or the function speechpwr.m.
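A minimal sketch of the short-time power estimate in the same 25 ms / 10 ms frames is given below. It does not reproduce speechpwr.m, whose interface is not given here. The test signal is a placeholder tone that is weak in the first half and strong in the second, and the power is computed without window weighting, which is a simplifying assumption.

```matlab
% Sketch: short-time power in dB, framed as in the cepstral analysis.
fs  = 16000;
n   = (0:fs-1)';
amp = [0.001*ones(fs/2,1); ones(fs/2,1)];  % placeholder: quiet half, loud half
x   = amp .* sin(2*pi*200*n/fs);

wlen  = round(0.025*fs);                   % 400 samples
wstep = round(0.010*fs);                   % 160 samples
nfr   = floor((length(x) - wlen)/wstep) + 1;
idx   = bsxfun(@plus, (1:wlen)', (0:nfr-1)*wstep);

% Mean squared value per frame, converted to dB (eps avoids log of zero).
P_dB = 10*log10(mean(x(idx).^2, 1) + eps);
```

The vector P_dB has one value per frame, so the same fixed or adaptive thresholding applied to the cepstral distance can be applied to it directly and the two detections compared frame by frame.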