Archives of Acoustics, 44, 2, pp. 277–286, 2019
10.24425/aoa.2019.128491

Speech Emotion Recognition Based on Voice Fundamental Frequency

Teodora DIMITROVA-GREKOW
Bialystok University of Technology
Poland

Aneta KLIS
Bialystok University of Technology
Poland

Magdalena IGRAS-CYBULSKA
AGH University of Science and Technology
Poland

The human voice is one of the basic means of communication, and it also readily conveys the speaker's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used; this database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We examined phrases representing all the emotions, both all together and in various combinations. The Fast Fourier Transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we applied different AI methods to study whether these features carry information about the emotional state of the speaker. The resulting data were analysed with the following classifiers from the WEKA collection of data mining algorithms: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method. The results indicate that the fundamental frequency is a promising choice for further experiments.
Keywords: emotion recognition; speech signal analysis; voice analysis; fundamental frequency; speech corpora
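
The extraction step described in the abstract (FFT, magnitude spectrum, then statistics of the fundamental frequency) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length, the hop size, the 75–500 Hz search band, and the particular set of statistics are all assumptions made for illustration. Taking the strongest spectral peak in the band is also a simplification, since a harmonic can be stronger than the fundamental, and the sketch performs no voiced/unvoiced detection.

```python
import numpy as np


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 of one frame as the strongest magnitude-spectrum peak.

    fmin/fmax are assumed bounds for the speech F0 range, not values
    taken from the paper.
    """
    windowed = frame * np.hanning(len(frame))        # reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))        # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    band = (freqs >= fmin) & (freqs <= fmax)         # restrict to plausible F0
    return float(freqs[band][np.argmax(magnitude[band])])


def f0_statistics(signal, sample_rate, frame_len=2048, hop=512):
    """Compute simple per-utterance statistics of the framewise F0 track.

    The feature set below is a hypothetical example of "statistical
    features of the fundamental frequency".
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    f0 = np.array([estimate_f0(signal[s:s + frame_len], sample_rate)
                   for s in starts])
    return {
        "mean": float(np.mean(f0)),
        "median": float(np.median(f0)),
        "std": float(np.std(f0)),
        "min": float(np.min(f0)),
        "max": float(np.max(f0)),
        "range": float(np.max(f0) - np.min(f0)),
    }


# Usage on a synthetic 200 Hz tone standing in for a voiced speech sample.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)
features = f0_statistics(tone, sample_rate)
```

A feature dictionary of this kind, computed per recording and labelled with the acted emotion, is the sort of input a classifier such as Random Forest or K-Nearest Neighbours would receive. The frequency resolution of the estimate is limited to the FFT bin spacing, here 16000 / 2048 ≈ 7.8 Hz.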





Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)