Archives of Acoustics, 46, 2, pp. 259–269, 2021

Acoustic Methods in Identifying Symptoms of Emotional States

Zuzanna PIĄTEK
AGH University of Science and Technology

The study investigates the use of the speech signal to recognise speakers' emotional states. The introduction covers the definition and categorization of emotions and the channels through which they are expressed, including facial expressions, speech and physiological signals. For the purpose of this work, a proprietary resource of emotionally marked speech recordings was created. The collected recordings come from the media, including live journalistic broadcasts, which show spontaneous emotional reactions to real-time stimuli. For speech signal analysis, a dedicated script was written in Python. Its algorithm includes the parameterization of speech recordings and the determination of features correlated with emotional content in speech. After the parameterization process, data clustering was performed to group the speakers' feature vectors into larger collections that correspond to specific emotional states. Using Student's t-test for dependent samples, descriptors were identified whose values differ significantly between emotional states. Potential applications of this research are proposed, along with directions for future studies of the topic.
Keywords: emotion recognition; speech signal processing; clustering analysis; Sammon mapping
Copyright © The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
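
The dependent-samples t-test mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's script: the function name `paired_t_statistic`, the choice of mean fundamental frequency as the descriptor, and all numeric values below are hypothetical and serve only to show how one feature, measured for the same speakers in two emotional states, could be compared.

```python
import math
import statistics


def paired_t_statistic(state_a, state_b):
    """Return (t, degrees_of_freedom) for Student's t-test on dependent samples.

    state_a and state_b hold one feature value per speaker, in the same
    speaker order, measured in two different emotional states.
    """
    if len(state_a) != len(state_b):
        raise ValueError("both samples must contain the same speakers")
    if len(state_a) < 2:
        raise ValueError("at least two paired observations are required")

    # The test operates on per-speaker differences, not on the raw values.
    diffs = [a - b for a, b in zip(state_a, state_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical mean F0 values in Hz for five speakers (assumed data).
    neutral = [118.0, 205.0, 131.0, 190.0, 142.0]
    anger = [151.0, 248.0, 160.0, 231.0, 170.0]
    t, df = paired_t_statistic(anger, neutral)
    print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic would be compared with the critical value of the t distribution for the given degrees of freedom at the chosen significance level; a descriptor whose statistic exceeds that value differs significantly between the two states.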


DOI: 10.24425/aoa.2021.136580