Archives of Acoustics, 44, 3, pp. 561–573, 2019

Prediction of Psychoacoustic Metrics Using Combination of Wavelet Packet Transform and an Optimized Artificial Neural Network

Isfahan University of Technology
Iran, Islamic Republic of


In this paper, a modified sound quality evaluation (SQE) model is developed based on a combination of an optimized artificial neural network (ANN) and the wavelet packet transform (WPT). The presented SQE model is a signal processing technique that can be implemented in current microphones to predict sound quality. The proposed method extracts objective psychoacoustic metrics, including loudness, sharpness, roughness, and tonality, from sound samples by using a special selection of multi-level WPT nodes combined with a trained ANN. The model is optimized using the particle swarm optimization (PSO) and back propagation (BP) algorithms. The results show that, among the models and software compared, the proposed model achieves the lowest mean square error and the highest correlation with human perception at the lowest computational cost.
Keywords: sound quality measurement; psychoacoustic metrics; wavelet packet transform; optimized artificial neural network
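
To make the feature-extraction step in the abstract more concrete, the following is a minimal sketch of how wavelet packet node energies could be computed as ANN inputs. It is not the authors' implementation: the Haar wavelet, the use of every node at a single decomposition level (rather than the paper's special multi-level node selection), and the function names haar_step, wpt_nodes, and node_energies are all assumptions made for illustration.

```python
import math


def haar_step(x):
    """Split an even-length sequence into Haar approximation and detail halves.

    Assumption: the orthonormal Haar wavelet is used only for simplicity;
    the wavelet used in the paper is not stated in this abstract.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wpt_nodes(x, level):
    """Return the 2**level nodes of a full wavelet packet tree at `level`.

    Unlike the plain discrete wavelet transform, BOTH the approximation and
    the detail branch are split again at every level. Nodes come back in
    natural (filter-path) order, which is not the same as frequency order.
    The signal length must be divisible by 2**level.
    """
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def node_energies(x, level):
    """Energy (sum of squared coefficients) of each node at `level`.

    Assumption: node energies serve as the feature vector fed to the ANN.
    """
    return [sum(c * c for c in node) for node in wpt_nodes(x, level)]


# Usage: an 8-element feature vector from a 64-sample test tone.
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = node_energies(tone, 3)
```

Because the Haar filters are orthonormal, the node energies sum to the energy of the input signal, which gives a simple sanity check on the decomposition. A feature vector of this kind would then be passed to a network whose weights are tuned by PSO and refined by BP, as the abstract describes.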
Copyright © The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).



DOI: 10.24425/aoa.2019.129271