Archives of Acoustics, 43, 3, pp. 465–475, 2018

Teaching Machines on Snoring: A Benchmark on Computer Audition for Snore Sound Excitation Localisation

Technical University of Munich, University of Passau

Christoph JANOTT
Technical University of Munich

Zixing ZHANG
University of Passau


University of Passau

Clemens HEISER
Technical University of Munich

Clinic for ENT Medicine, Head and Neck Surgery, Alfried Krupp Krankenhaus, Essen, Germany

Michael HERZOG
Clinic for ENT Medicine, Head and Neck Surgery, Cottbus, Germany

Technical University of Munich

University of Passau, Imperial College London, audEERING GmbH

This paper presents a comprehensive study on machine listening for localising the excitation of snore sounds. We investigate the effects of different frame sizes and overlaps of the analysed audio chunks used to extract low-level descriptors. In addition, we explore the performance of each kind of feature when it is fed into different classifiers, including support vector machines, $k$-nearest neighbours, linear discriminant analysis, random forests, extreme learning machines, kernel-based extreme learning machines, multilayer perceptrons, and deep neural networks. Experimental results demonstrate that wavelet packet transform energy can outperform most other features. A deep neural network trained with subband energy ratios achieves the best performance, with an unweighted average recall of 72.8% over four types of snoring.
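Two quantities in the abstract are easy to pin down in code. The first is splitting a recording into analysis frames of a given size and overlap before low-level descriptors are computed. The second is the unweighted average recall (UAR), the mean of the per-class recalls, in which each class counts equally regardless of how many samples it has. The sketch below is a minimal plain-Python illustration and not the authors' extraction or evaluation pipeline. The function names `frame_signal` and `unweighted_average_recall`, the policy of dropping trailing samples that do not fill a frame, and the toy labels (the V, O, T, E classes of the VOTE scheme) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def frame_signal(x: Sequence[float], frame_size: int, overlap: float) -> List[List[float]]:
    """Split x into frames of frame_size samples.

    overlap is the fraction of a frame shared by consecutive frames (0 <= overlap < 1).
    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    if frame_size <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("need frame_size > 0 and 0 <= overlap < 1")
    # Hop size in samples; at least 1 so the loop always advances.
    hop = max(1, int(round(frame_size * (1.0 - overlap))))
    return [list(x[start:start + frame_size])
            for start in range(0, len(x) - frame_size + 1, hop)]


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of the per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Hypothetical toy signal: 10 samples, frames of 4 samples with 50% overlap.
    frames = frame_signal(list(range(10)), frame_size=4, overlap=0.5)
    print(len(frames))  # prints 4

    # Hypothetical toy labels for the four snore classes.
    y_true = ["V", "V", "V", "V", "O", "O", "T", "E"]
    y_pred = ["V", "V", "V", "O", "O", "T", "T", "E"]
    print(unweighted_average_recall(y_true, y_pred))  # prints 0.8125
```

In the toy example, plain accuracy is 6/8 = 0.75, whereas UAR is 0.8125 because the single T and E samples, both classified correctly, weigh as much as the four V samples. This is why UAR is preferred when the snore classes are unevenly represented.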
Keywords: snore sound; obstructive sleep apnea; acoustic features; machine learning

DOI: 10.24425/123918

Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)