Archives of Acoustics, 45, 1, pp. 129–140, 2020

Analysis of Features and Classifiers in Emotion Recognition Systems: Case Study of Slavic Languages

University of Belgrade


Today’s human-computer interaction systems have a broad variety of applications in which automatic human emotion recognition is of great interest. The literature describes many such systems, with varying degrees of success. This work attempts to clarify which speech features are the most informative, which classification structure is best suited to this type of task, and the degree to which the results are influenced by database size and quality and by the cultural characteristics of a language. The research is presented as a case study of Slavic languages.
Keywords: emotion recognition; speech processing; classification algorithms
Copyright © The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).


DOI: 10.24425/aoa.2020.132489