Application of Teager Energy Operator on Linear and Mel Scales for Whispered Speech Recognition

Branko R MARKOVIĆ; Jovan GALIĆ; Miomir MIJIĆ

doi:10.24425/118075

Authors

Branko R MARKOVIĆ School of Electrical Engineering, Serbia
Jovan GALIĆ School of Electrical Engineering, Serbia
Miomir MIJIĆ School of Electrical Engineering, Serbia

Abstract

This paper presents experimental results on whispered speech recognition using the Teager Energy Operator (TEO) applied to linear and mel cepstral coefficients, combined with the Cepstral Mean Subtraction (CMS) normalization technique. Four feature vectors are considered: Linear Frequency Cepstral Coefficients (LFCC), Teager Energy based LFCC, Mel Frequency Cepstral Coefficients (MFCC), and Teager Energy based MFCC. The experiments use a speaker-dependent scenario, and recognition is performed with Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). The results show a respectable improvement in whispered speech recognition when the Teager Energy Operator is used together with Cepstral Mean Subtraction.
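The building blocks named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the discrete Teager Energy Operator, per-utterance cepstral mean subtraction, and a basic DTW distance. The function names, the Euclidean frame distance, and the DTW step pattern are assumptions made for illustration; the abstract does not specify the authors' actual feature extraction pipeline or recognizer configuration, and this is not a reconstruction of it.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    lack one of the two neighbours the operator needs.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def cepstral_mean_subtraction(features):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    `features` is expected to have shape (num_frames, num_coefficients).
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, coefficients).

    Assumed for illustration: Euclidean distance between frames and the
    basic step pattern (diagonal, vertical, horizontal) with no slope limits.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]


if __name__ == "__main__":
    # For a pure tone A*cos(omega*n), the operator yields the constant A^2 * sin(omega)^2.
    n = np.arange(200)
    tone = 2.0 * np.cos(0.3 * n)
    print(teager_energy(tone)[:3])

    # Hypothetical cepstral matrix: 50 frames, 12 coefficients, with a constant offset.
    rng = np.random.default_rng(0)
    cepstra = rng.normal(size=(50, 12)) + 5.0
    print(np.abs(cepstral_mean_subtraction(cepstra).mean(axis=0)).max())
```

For a pure tone the operator output depends on both amplitude and frequency, which is the property that Teager-energy-based cepstral features exploit. Cepstral mean subtraction removes any offset that is constant across the frames of an utterance, such as a fixed channel effect in the cepstral domain.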

Keywords:

Teager energy operator, cepstral mean subtraction, whispered speech recognition, linear scale, mel scale, dynamic time warping, hidden Markov models

