Archives of Acoustics, 40, 1, pp. 25–31, 2015
10.1515/aoa-2015-0004

Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition

Sayf A. MAJEED
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, 43600 UKM Bangi, Selangor, Malaysia

Hafizah HUSAIN
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, 43600 UKM Bangi, Selangor, Malaysia

Salina Abd. SAMAD
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, 43600 UKM Bangi, Selangor, Malaysia

In this paper, a new feature-extraction method is proposed to improve the noise robustness of speech recognition systems. The method combines the benefits of phase autocorrelation (PAC) with those of the bark wavelet transform. PAC measures correlation by an angle instead of by the traditional autocorrelation measure, whereas the bark wavelet transform is a wavelet transform designed specifically for speech signals. The features extracted by this combined method are called phase autocorrelation bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated on the TI-Digits database under different types and levels of noise and compared with that of the conventional mel-frequency cepstral coefficient (MFCC) features. The database is divided into male and female data. The results show that, for noisy male data (white noise at 0 dB SNR), the word recognition rate is 60% with the PACWT features, compared with 41.35% with the MFCC features under identical conditions.
Keywords: speech recognition; feature extraction; phase autocorrelation; wavelet transform.
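The angle-based correlation that PAC uses in place of the ordinary autocorrelation can be sketched in a few lines. This is a minimal Python sketch assuming the definition given by Ikbal et al. (2012), in which each lag k compares a frame with a circularly shifted copy of itself; because a circular shift preserves the vector norm, r(k) = r(0) cos θ_k, and the PAC value is θ_k = arccos(r(k)/r(0)). The function names are hypothetical, and the later PACWT stages that combine PAC with the bark wavelet transform are not reproduced, since this page does not specify them.

```python
import math
from typing import List, Sequence


def circular_autocorrelation(frame: Sequence[float], lag: int) -> float:
    """Dot product of the frame with a copy of itself circularly shifted by `lag`."""
    n = len(frame)
    return sum(frame[i] * frame[(i + lag) % n] for i in range(n))


def phase_autocorrelation(frame: Sequence[float]) -> List[float]:
    """Angle (radians) between the frame vector and each of its circular shifts.

    Hypothetical helper illustrating the PAC idea: since a circular shift keeps
    the norm unchanged, r(k) = r(0) * cos(theta_k), so theta_k = acos(r(k) / r(0)).
    """
    energy = circular_autocorrelation(frame, 0)
    if energy == 0.0:
        raise ValueError("PAC is undefined for an all-zero frame")
    angles = []
    for lag in range(len(frame)):
        ratio = circular_autocorrelation(frame, lag) / energy
        # Clamp to [-1, 1] to guard against floating-point round-off before acos.
        angles.append(math.acos(max(-1.0, min(1.0, ratio))))
    return angles


if __name__ == "__main__":
    print(phase_autocorrelation([1.0, 0.0, -1.0, 0.0]))
```

For the toy frame [1, 0, -1, 0] the angles are 0, π/2, π and π/2 for lags 0 to 3. Scaling the frame by a constant gain leaves every angle unchanged, since the ratio r(k)/r(0) is unaffected.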
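The evaluation condition quoted in the abstract (white noise at 0 dB SNR) means that the noise power equals the speech power. The sketch below shows one common way to construct such a test signal: draw white Gaussian noise and scale it so that 10 log10(P_signal / P_noise) equals the target SNR over the whole utterance. This is an assumption for illustration only; the abstract does not describe how the noisy TI-Digits data were generated, and `add_white_noise` is a hypothetical helper.

```python
import math
import random
from typing import List, Sequence


def add_white_noise(signal: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return `signal` plus white Gaussian noise scaled to the requested SNR (dB).

    Hypothetical helper: the SNR is measured over the whole signal, i.e.
    10 * log10(mean(signal**2) / mean(noise**2)) == snr_db.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    rng = random.Random(seed)  # seeded so the noisy signal is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    if signal_power == 0.0:
        raise ValueError("SNR is undefined for a silent signal")
    # Choose the gain so that signal_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(signal, noise)]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    noisy = add_white_noise(tone, snr_db=0.0)
    print(len(noisy))
```

Measuring the SNR over the whole utterance, rather than over speech-active segments only, is the simplest choice; a segmental or activity-weighted SNR would give a different noise level for the same nominal dB value.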

References

Boll S. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27, 2, 113-120.

Chang C. C., Lin C. J. (2011), LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST), 2, 3, 27.

Davis S., Mermelstein P. (1980), Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, 28, 4, 357-366.

Ikbal S., Misra H., Hermansky H., Magimai-Doss M. (2012), Phase AutoCorrelation (PAC) features for noise robust speech recognition, Speech Communication, 54, 7, 867-880.

Jie Y., Zhenli W. (2009), On the application of variable-step adaptive noise cancelling for improving the robustness of speech recognition, ISECS International Colloquium on Computing, Communication, Control, and Management (CCCM 2009), IEEE.

Jolliffe I. (2005), Principal component analysis, Wiley Online Library.

Leonard R. (1984), A database for speaker-independent digit recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '84, IEEE.

Liu F. H., Stern R. M., Huang X., Acero A. (1993), Efficient cepstral normalization for robust speech recognition, Proceedings of the workshop on Human Language Technology, Association for Computational Linguistics.

Majeed S., Husain H., Samad S., Hussain A. (2012), Hierarchical K-Means Algorithm Applied On Isolated Malay Digit Speech Recognition, International Proceedings of Computer Science & Information Technology, 34, 33-37.

Mansour D., Juang B. H. (1989), A family of distortion measures based upon projection operation for robust speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 11, 1659-1671.

Nasersharif B., Akbari A. (2007), SNR-dependent compression of enhanced Mel sub-band energies for compensation of noise effects on MFCC features, Pattern Recognition Letters, 28, 11, 1320-1326.

Nehe N. S., Holambe R. S. (2009), Isolated Word Recognition Using Normalized Teager Energy Cepstral Features, International Conference on Advances in Computing, Control, & Telecommunication Technologies. ACT '09.

Paliwal K., Basu A. (1987), A speech enhancement method based on Kalman filtering, IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE ICASSP'87.

Rabiner L., Juang B. H. (1993), Fundamentals of speech recognition, PTR Prentice-Hall, Englewood Cliffs, New Jersey, USA.

Sambur M. (1978), Adaptive noise canceling for speech signals, IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 5, 419-423.

Shannon B. J., Paliwal K. K. (2006), Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition, Speech Communication, 48, 11, 1458-1485.

Traunmüller H. (1990), Analytical expressions for the tonotopic sensory scale, The Journal of the Acoustical Society of America, 88, 1, 97-100.

Tufekci Z., Gowdy J. (2000), Feature extraction using discrete wavelet transform for speech recognition, Proceedings of IEEE SoutheastCon 2000.

Vaseghi S. V. (2008), Advanced digital signal processing and noise reduction, Wiley.

Yapanel U., Hansen J. H., Sarikaya R., Pellom B. (2001), Robust digit recognition in noise: an evaluation using the AURORA Corpus, Proc. Eurospeech.

Zhang X., Jiao Z., Zhao Z. (2005), The speech recognition based on the bark wavelet front-end processing, Fuzzy Systems and Knowledge Discovery, Springer, 302-305.

Zhang X., Bai J., Liang W. (2006), The speech recognition system based on bark wavelet MFCC, 8th International Conference on Signal Processing, IEEE.

Zhu D., Paliwal K. K. (2004), Product of power spectrum and group delay function for speech recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04).





Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)