Archives of Acoustics, 43, 4, pp. 593–602, 2018

Speech and Music - Nonlinear Acoustical Decoding in Neurocognitive Scenario

Deepa Ghosh Research Foundation

Speech and music signals are multifractal phenomena. The time displacement profiles of speech and music signals show strikingly different scaling behaviour. However, a full complexity analysis of their frequency and amplitude has not been made so far. We propose a novel complex-network-based approach (Visibility Graph) to study the scaling behaviour of the frequency-wise amplitude variation of speech and music signals over time, and then extract their PSVG (Power of Scale-freeness of Visibility Graph). This analysis shows that the scaling behaviour of the amplitude profile of music varies considerably from frequency to frequency, whereas it is almost consistent for the speech signal. The left auditory cortical areas are proposed to be neurocognitively specialised for speech perception and the right ones for music. Hence we conclude that the human brain might have adapted to the distinctly different scaling behaviour of speech and music signals and developed different decoding mechanisms, as if following the so-called Fractal Darwinism. With this method, the non-stationary aspects of the acoustic properties of the source signal can be captured in fine detail, which is of considerable neurocognitive significance. Further, we propose a novel non-invasive application to detect neurological illness (here autism spectrum disorder, ASD), using the quantitative parameters deduced from the variation of scaling behaviour for speech and music.
Keywords: speech signal; multifractality; Visibility Graph; Fractal Darwinism; neurocognitive disorders.
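
As an illustration of the Visibility Graph and PSVG idea named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used natural visibility criterion (two samples are linked if the straight line between them passes strictly above every sample in between) and estimates the PSVG as the negative slope of a least-squares fit to the degree distribution on log-log axes. The function names `visibility_degrees` and `psvg` and the fitting procedure are assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter


def visibility_degrees(series):
    """Degree of each node in the natural visibility graph of `series`.

    Samples are assumed to be equally spaced in time (index = time).
    """
    n = len(series)
    degrees = [0] * n
    for a in range(n - 1):
        max_slope = -math.inf  # steepest slope from a to any sample seen so far
        for b in range(a + 1, n):
            slope = (series[b] - series[a]) / (b - a)
            # b is visible from a only if its slope exceeds the slope to
            # every intermediate sample (strict inequality).
            if slope > max_slope:
                degrees[a] += 1
                degrees[b] += 1
                max_slope = slope
    return degrees


def psvg(degrees):
    """Estimate lambda in P(k) ~ k**(-lambda) by a log-log least-squares fit.

    Assumption: an ordinary least-squares fit over all observed degrees;
    other fitting choices are possible.
    """
    counts = Counter(degrees)
    if len(counts) < 2:
        raise ValueError("need at least two distinct degree values to fit a slope")
    total = len(degrees)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den


if __name__ == "__main__":
    # Hypothetical amplitude profile of one frequency band over time.
    profile = [1.0, 3.0, 2.0, 4.0, 1.5, 2.5, 0.5, 3.5]
    degs = visibility_degrees(profile)
    print("degrees:", degs)
    print("PSVG estimate:", psvg(degs))
```

In the setting described in the abstract, one such series would be the amplitude of a single frequency component tracked over time, and the PSVG would be computed per frequency so that its variation across frequencies can be compared between speech and music. How the frequency-wise amplitude profiles are obtained is not covered by this sketch.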
Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN).


DOI: 10.24425/aoa.2018.125153