Archives of Acoustics, 42, 2, pp. 223–233, 2017
10.1515/aoa-2017-0025

Music Performers Classification by Using Multifractal Features: A Case Study

Natasa RELJIN
University of Connecticut
United States

David POKRAJAC
Delaware State University
United States

In this paper, we investigate whether different performers can be classified when they play the same melodies in the same manner, so that the performances are subjectively quite similar and very difficult to distinguish even for musically skilled persons. To address this problem, we propose the use of multifractal (MF) analysis, which has proven to be an efficient method for describing and quantifying complex natural structures, phenomena, and signals. We found experimentally that parameters associated with certain characteristic points of the MF spectrum can be used as music descriptors, permitting accurate discrimination of music performers. Our approach was tested on a dataset containing the same songs performed by the music group ABBA and by the actors in the movie Mamma Mia. Support vector machines were used as the classifier, and classification performance was evaluated using four-fold cross-validation. The results of the proposed method were compared with those obtained using mel-frequency cepstral coefficients (MFCCs) as descriptors. For the considered two-class problem, the MF descriptors yielded an overall accuracy and F-measure above 98%, considerably better than the MFCC descriptors, whose best results were below 77%.
Keywords: music classification; multifractal analysis; support vector machines; cross-validation; mel-frequency cepstral coefficients
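
The evaluation protocol summarized in the abstract (four-fold cross-validation on a two-class problem, scored by overall accuracy and F-measure) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the stratified fold assignment, the toy two-dimensional feature vectors, and the nearest-centroid classifier are all assumptions made for illustration; in particular, the nearest-centroid rule is only a dependency-free stand-in for the support vector machine used in the paper.

```python
import random

def four_fold_indices(labels, k=4, seed=0):
    """Assign sample indices to k folds, spreading each class evenly (stratified)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def accuracy_and_f_measure(y_true, y_pred, positive=1):
    """Overall accuracy and F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    return accuracy, (2 * tp / denom if denom else 0.0)

def cross_validate(features, labels, train_fn, predict_fn, k=4, seed=0):
    """Train on k-1 folds, predict the held-out fold, pool predictions over all folds."""
    y_true, y_pred = [], []
    for held_out in four_fold_indices(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(predict_fn(model, features[i]))
    return accuracy_and_f_measure(y_true, y_pred)

# Stand-in classifier (assumption): nearest class centroid, NOT the paper's SVM.
def train_centroids(X, y):
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        model[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict_centroid(model, x):
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

# Toy, well-separated descriptors (made up; not data from the paper).
X = [[0.01 * i, 1.0] for i in range(8)] + [[5.0 + 0.01 * i, 3.0] for i in range(8)]
y = [0] * 8 + [1] * 8

acc, f = cross_validate(X, y, train_centroids, predict_centroid)
print(acc, f)  # 1.0 1.0 on this trivially separable toy data
```

Pooling the held-out predictions from all four folds before scoring is one common convention; averaging per-fold scores is another, and the abstract does not state which was used.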






Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)