Heart Rate Detection and Classification from Speech Spectral Features Using Machine Learning

Mohammed USMAN; Mohammed ZUBAIR; Zeeshan AHMAD; Monji ZAIDI; Thafasal IJYAS; Muneer PARAYANGAT; Mohd WAJID; Mohammad SHIBLEE; Syed Jaffar ALI

doi:10.24425/aoa.2021.136559

Authors

Mohammed USMAN, King Khalid University, Saudi Arabia
Mohammed ZUBAIR, King Khalid University, Saudi Arabia
Zeeshan AHMAD, King Khalid University, Saudi Arabia
Monji ZAIDI, King Khalid University, Saudi Arabia
Thafasal IJYAS, King Khalid University, Saudi Arabia
Muneer PARAYANGAT, King Khalid University, Saudi Arabia
Mohd WAJID, Aligarh Muslim University, India
Mohammad SHIBLEE, Taif University, Saudi Arabia
Syed Jaffar ALI, King Khalid University, Saudi Arabia

Abstract

Measuring vital signs such as heart rate, blood pressure, body temperature and respiratory rate is an important part of diagnosing medical conditions, and these signs are usually measured with medical equipment. In this paper, we propose estimating one important vital sign, heart rate, from speech signals using machine learning algorithms. Existing literature, observation and experience suggest a correlation between speech characteristics and physiological, psychological and emotional conditions. In this work, we estimate the heart rate of individuals by applying machine learning based regression algorithms to Mel frequency cepstral coefficients (MFCCs), which represent speech features in the spectral domain as well as the temporal variation of spectral features. The estimated heart rate is compared with the actual heart rate measured with a conventional medical device at the time the speech was recorded. The estimated values agree with the measured values with an accuracy close to 94%. Binary classification of heart rate as ‘normal’ or ‘abnormal’ is achieved with 100% accuracy. A comparison of machine learning algorithms in terms of heart rate estimation and classification accuracy is also presented. Heart rate measurement from speech has applications in remote monitoring of patients and professional athletes, and can facilitate telemedicine.
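To make the feature description in the abstract more concrete, the following is a minimal sketch of how MFCCs and their temporal variation might be computed for short speech frames. It is not the authors' implementation. The frame length, sample rate, number of filters, number of coefficients, the synthetic test signal and the use of simple delta features to stand for "temporal variation" are all assumptions made for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window a frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge of the triangle
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: log mel filterbank energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so that log() never sees zero.
    log_energy = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                  for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for c in range(n_coeffs)]

def deltas(frames):
    """Central-difference delta features, with the first and last frame clamped."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

# Hypothetical input: a synthetic chirp standing in for recorded speech.
SAMPLE_RATE = 8000
FRAME_LEN = 128
signal = [math.sin(2.0 * math.pi * (200.0 + 0.05 * t) * t / SAMPLE_RATE)
          for t in range(4 * FRAME_LEN)]
frames = [signal[i:i + FRAME_LEN] for i in range(0, len(signal), FRAME_LEN)]
mfccs = [mfcc_frame(f, SAMPLE_RATE) for f in frames]
# Each feature vector holds the static coefficients followed by their deltas.
features = [c + d for c, d in zip(mfccs, deltas(mfccs))]
print(len(features), len(features[0]))  # 4 frames, 26 values each
```

Feature vectors of this kind, paired with heart rate readings taken at recording time, could then serve as the inputs to a regression model. The naive DFT keeps the sketch dependency-free; a real pipeline would more likely use an FFT routine from a numerical library.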

Keywords:

heart rate from speech, machine learning, MFCC, regression and classification, speech as a biomedical signal
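The abstract reports two figures: an estimation accuracy and a binary classification accuracy. The sketch below shows one way such figures could be computed. The abstract does not define either metric, so both definitions are assumptions: estimation accuracy is taken as 100 minus the mean absolute percentage error, and 'normal' is taken as the commonly cited adult resting range of 60 to 100 beats per minute. The heart rate values are made up for illustration and are not data from the paper.

```python
def estimation_accuracy(estimated, measured):
    """Assumed metric: 100 * (1 - mean absolute percentage error)."""
    errors = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * (1.0 - sum(errors) / len(errors))

def classify(bpm, low=60.0, high=100.0):
    """Label a heart rate 'normal' inside the assumed range [low, high] bpm."""
    return "normal" if low <= bpm <= high else "abnormal"

def classification_accuracy(estimated, measured):
    """Percentage of samples whose estimated label matches the measured label."""
    hits = sum(classify(e) == classify(m) for e, m in zip(estimated, measured))
    return 100.0 * hits / len(measured)

# Illustrative values only (beats per minute), not taken from the paper.
measured = [72.0, 88.0, 110.0, 64.0]
estimated = [70.0, 92.0, 104.0, 66.0]

print(round(estimation_accuracy(estimated, measured), 2))
print(classification_accuracy(estimated, measured))
```

Under these definitions, classification accuracy can reach 100% even when the regression estimates are imperfect, because an estimate only has to fall on the same side of the thresholds as the measured value.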

