Speech Emotion Recognition Based on Voice Fundamental Frequency

Teodora DIMITROVA-GREKOW; Aneta KLIS; Magdalena IGRAS-CYBULSKA

doi:10.24425/aoa.2019.128491

Authors

Teodora DIMITROVA-GREKOW Bialystok University of Technology, Poland
Aneta KLIS Bialystok University of Technology, Poland
Magdalena IGRAS-CYBULSKA AGH University of Science and Technology, Poland

Abstract

The human voice is one of the basic means of communication and readily conveys a speaker's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used; this database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, both all together and in various combinations. The Fast Fourier Transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information on the emotional state of the speaker by applying different AI methods. The outcome data were analysed with classifiers from the WEKA collection of data mining algorithms: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method. The results show that the fundamental frequency is a promising choice for further experiments.
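
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the frame length, window, frequency search band, or which statistics were computed, so the Hann window, the 60–400 Hz band, the frame and hop sizes, and the feature set below are all assumptions, and the function names are hypothetical. Note also that picking the strongest magnitude-spectrum peak may select a harmonic rather than the true fundamental in real speech.

```python
import cmath
import math
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Frequency (Hz) of the strongest magnitude-spectrum bin in [f_min, f_max].

    The band limits are assumed values, not taken from the paper.
    """
    n = len(frame)
    # Hann window (an assumption) to reduce spectral leakage.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    # Magnitude spectrum of the non-negative frequencies only.
    magnitude = [abs(c) for c in fft(windowed)[: n // 2]]
    lo = max(1, math.ceil(f_min * n / sample_rate))
    hi = min(n // 2 - 1, math.floor(f_max * n / sample_rate))
    peak = max(range(lo, hi + 1), key=lambda k: magnitude[k])
    return peak * sample_rate / n


def f0_features(signal, sample_rate, frame_len=1024, hop=512):
    """Frame-wise F0 track summarised by a few illustrative statistics."""
    track = [
        estimate_f0(signal[i:i + frame_len], sample_rate)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
    return {
        "mean": statistics.fmean(track),
        "median": statistics.median(track),
        "stdev": statistics.pstdev(track),
        "min": min(track),
        "max": max(track),
        "range": max(track) - min(track),
    }


# Usage: a synthetic 250 Hz tone sampled at 8 kHz stands in for a voiced segment.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 250 * t / sample_rate) for t in range(4096)]
features = f0_features(tone, sample_rate)
```

A feature dictionary of this kind, computed per utterance, is the sort of input that could then be passed to a classifier; the frequency resolution is `sample_rate / frame_len`, so longer frames give finer F0 estimates at the cost of time resolution.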

Keywords:

emotion recognition, speech signal analysis, voice analysis, fundamental frequency, speech corpora

