10.24425/aoa.2021.136581
Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences
References
Anagnostopoulos C.N., Iliou T., Giannoukos I. (2015), Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011, Artificial Intelligence Review, 43: 155–177, doi: 10.1007/s10462-012-9368-5.
Boersma P. (1993), Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, Proceedings of the Institute of Phonetic Sciences, 17: 97–110.
Boersma P., Weenink D. (2001), Praat, a system for doing phonetics by computer, Glot International, 5(9/10): 341–345.
Breiman L. (2001), Random forests, Machine Learning, 45(1): 5–32, doi: 10.1023/A:1010933404324.
Burkhardt F., Paeschke A., Rolfes M., Sendlmeier W., Weiss B. (2005), A database of German emotional speech, 9th European Conference on Speech Communication and Technology, 5: 1517–1520.
Chang C.-C., Lin C.-J. (2011), LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2: 27:1–27:27, doi: 10.1145/1961189.1961199.
Davis S., Mermelstein P. (1980), Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4): 357–366, doi: 10.1109/TASSP.1980.1163420.
Eyben F. (2016), Real-time speech and music classification by large audio feature space extraction, Springer, Cham, doi: 10.1007/978-3-319-27299-3.
Feraru S.M., Zbancioc M.D. (2013), Emotion recognition in Romanian language using LPC features, [In:] 2013 E-Health and Bioengineering Conference (EHB), pp. 1–4, doi: 10.1109/EHB.2013.6707314.
Hao M., Tianhao Y., Fei Y. (2019), The SVM based on SMO optimization for speech emotion recognition, [In:] 2019 Chinese Control Conference (CCC), pp. 7884–7888, doi: 10.23919/ChiCC.2019.8866463.
Kathiresan T., Dellwo V. (2019), Cepstral derivatives in MFCCs for emotion recognition, [In:] 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), pp. 56–60, doi: 10.1109/SIPROCESS.2019.8868573.
Kuan T.-W., Tsai A.-C., Sung P.-H., Wang J.-F., Kuo H.-S. (2016), A robust BFCC feature extraction for ASR system, Artificial Intelligence Research, 5(2): 14–23, doi: 10.5430/air.v5n2p14.
Lee K.H., Choi H.K., Jang B.T., Kim D.H. (2019), A study on speech emotion recognition using a deep neural network, [In:] 2019 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1162–1165, doi: 10.1109/ICTC46691.2019.8939830.
Markel J.D., Gray A.H. Jr. (1976), Linear Prediction of Speech, New York: Springer-Verlag.
Meng H., Yan T., Yuan F., Wei H. (2019), Speech emotion recognition from 3D log-mel spectrograms with deep learning network, IEEE Access, 7: 125868–125881, doi: 10.1109/ACCESS.2019.2938007.
Mitrovic D., Zeppelzauer M., Breiteneder C. (2010), Features for content-based audio retrieval, Advances in Computers, 78: 71–150, doi: 10.1016/S0065-2458(10)78003-7.
Pedregosa F. et al. (2011), Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12: 2825–2830, doi: 10.5555/1953048.2078195.
Rajak R., Mall R. (2019), Emotion recognition from audio, dimensional and discrete categorization using CNNs, [In:] TENCON 2019 – 2019 IEEE Region 10 Conference (TENCON), pp. 301–305, doi: 10.1109/TENCON.2019.8929459.
Rao K.S., Reddy V.R., Maity S. (2015), Language Identification Using Spectral and Prosodic Features, Springer Publishing Company, Incorporated.
Slot K., Cichosz J., Bronakowski L. (2009), Application of voiced-speech variability descriptors to emotion recognition, [In:] 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–5, doi: 10.1109/CISDA.2009.5356537.
Swain M., Routray A., Kabisatpathy P. (2018), Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology, 21: 93–120, doi: 10.1007/s10772-018-9491-z.
Ververidis D., Kotropoulos C. (2006), Emotional speech recognition: Resources, features, and methods, Speech Communication, 48: 1162–1181, doi: 10.1016/j.specom.2006.04.003.
Zhang H. (2004), The optimality of naive Bayes, [In:] Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2004.
Zhu C., Ahmad W. (2019), Emotion recognition from speech to improve human-robot interaction, [In:] 2019 IEEE International Conference on Dependable, Autonomic and Secure Computing, pp. 370–375, doi: 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00076.