Archives of Acoustics, 48, 3, pp. 289–315, 2023
10.24425/aoa.2023.146640

Speech Analysis as a Tool for Detection and Monitoring of Medical Conditions: A review

Magdalena IGRAS-CYBULSKA
ORCID ID 0000-0001-5621-7901
1) Techmo sp. z o.o. 2) AGH University of Science and Technology
Poland

Daria HEMMERLING
ORCID ID 0000-0002-2193-7690
1) Techmo sp. z o.o. 2) AGH University of Science and Technology
Poland

Mariusz ZIÓŁKO
ORCID ID 0000-0001-6260-7850
Techmo sp. z o.o.
Poland

Wojciech DATKA
1) Medical University of Bialystok 2) Jagiellonian University
Poland

Ewa STOGOWSKA
Medical University of Bialystok
Poland

Michał KUCHARSKI
Techmo sp. z o.o.
Poland

Rafał RZEPKA
Hokkaido University

Bartosz ZIÓŁKO
ORCID ID 0000-0001-5485-8879
1) Techmo sp. z o.o. 2) Hokkaido University
Poland

The goal of this article is to present and compare recent approaches that use speech and voice analysis as biomarkers for screening and monitoring of selected diseases. The article covers metabolic, respiratory, cardiovascular, endocrine, and nervous system disorders. Articles were selected to identify studies that quantitatively assess voice features in these disorders through acoustic and linguistic voice analysis. Information was extracted from each paper to compare datasets, speech parameters, analysis methods, and obtained results. In total, 110 research papers were reviewed and 47 databases were summarized. Speech analysis is a promising method for early diagnosis of certain disorders. Advanced computer voice analysis with machine learning algorithms, combined with the widespread availability of smartphones, allows diagnostic analysis to be conducted during the patient’s visit to the doctor or at the patient’s home during a telephone conversation. Speech analysis is a simple, low-cost, non-invasive, and easy-to-provide method of medical diagnosis. These are remarkable advantages, but there are also disadvantages: the reported effectiveness of disease diagnosis varies from 65% to 99%. For that reason, speech analysis should be treated as a medical screening test that indicates the need for classic medical tests.
Keywords: speech analysis; speech features; acoustic parameters; linguistic analysis; voice biomarkers; screening tests
Copyright © 2023 The Author(s). This work is licensed under the Creative Commons Attribution 4.0 International CC BY 4.0.




