Archives of Acoustics, 46, 1, pp. 41–53, 2021
10.24425/aoa.2021.136559

Heart Rate Detection and Classification from Speech Spectral Features Using Machine Learning

Mohammed USMAN
King Khalid University
Saudi Arabia

Mohammed ZUBAIR
King Khalid University
Saudi Arabia

Zeeshan AHMAD
King Khalid University
Saudi Arabia

Monji ZAIDI
King Khalid University
Saudi Arabia

Thafasal IJYAS
King Khalid University
Saudi Arabia

Muneer PARAYANGAT
King Khalid University
Saudi Arabia

Mohd WAJID
Aligarh Muslim University
India

Mohammad SHIBLEE
Taif University
Saudi Arabia

Syed Jaffar ALI
King Khalid University
Saudi Arabia

Vital signs such as heart rate, blood pressure, body temperature and respiratory rate are an important part of diagnosing medical conditions and are usually measured with medical equipment. In this paper, we propose to estimate one important vital sign, heart rate, from speech signals using machine learning algorithms. Existing literature, observation and experience suggest a correlation between speech characteristics and physiological, psychological and emotional conditions. We estimate the heart rate of individuals by applying machine learning based regression algorithms to Mel-frequency cepstral coefficients (MFCCs), which represent speech features in the spectral domain as well as the temporal variation of those features. The estimated heart rate is compared with the measurement made using a conventional medical device at the time the speech was recorded. The estimated values agree with the measured heart rate values with an accuracy close to 94%. Binary classification of heart rate as ‘normal’ or ‘abnormal’ is also achieved with 100% accuracy. A comparison of machine learning algorithms in terms of heart rate estimation and classification accuracy is also presented. Heart rate measurement from speech has applications in the remote monitoring of patients and professional athletes and can facilitate telemedicine.
Keywords: heart rate from speech; machine learning; MFCC; regression and classification; speech as a biomedical signal
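
The abstract names three computational ingredients: the temporal variation of spectral features, an accuracy figure comparing estimated and measured heart rate, and a binary normal/abnormal label. The sketch below illustrates one plausible form of each and is not the authors' implementation. It assumes that temporal variation is captured with the common regression-style delta formula, that accuracy means 100 × (1 − mean relative absolute error), and that "normal" means 60 to 100 beats per minute; none of these definitions is given in the abstract. MFCC extraction and the regression model are not shown, and all numbers are invented toy values.

```python
from statistics import mean


def delta(frames, width=2):
    """Delta (temporal difference) of per-frame feature vectors.

    Assumed formula (a common convention, not taken from the paper):
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond either edge are replaced by the nearest edge frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def percent_accuracy(estimated, measured):
    """100 * (1 - mean relative absolute error); an assumed definition."""
    rel_err = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * (1.0 - mean(rel_err))


def classify(bpm, low=60.0, high=100.0):
    """Label a heart rate; the 60-100 bpm band is an assumed threshold."""
    return "normal" if low <= bpm <= high else "abnormal"


# Toy values invented for illustration; they are not data from the paper.
frames = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
deltas = delta(frames)  # first coefficient rises steadily, second is constant

estimated = [72.0, 90.0, 110.0]  # hypothetical regression outputs (bpm)
measured = [80.0, 100.0, 100.0]  # hypothetical device readings (bpm)
accuracy = percent_accuracy(estimated, measured)
labels = [classify(b) for b in estimated]
```

In a full pipeline, the static coefficients and their deltas would typically be concatenated per frame and summarised per recording before being passed to a regression model; the choice of summary and model is left open here.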
Copyright © The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
