10.24425/aoa.2019.129259
Speech Enhancement Using Sliding Window Empirical Mode Decomposition and Hurst-based Technique
iterations is determined by decomposing many signal windows with the standard algorithm and calculating the average number of sifting steps for each module. With this approach, the time complexity is reduced significantly while the decomposition quality remains adequate. Finally, the experimental results show considerable improvements in speech enhancement in non-stationary noise environments.
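The calibration idea described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. It reads "module" as the IMF index. It takes the standard algorithm to be sifting with a normalized SD stopping criterion (threshold 0.2, at most 50 steps). Envelopes are cubic splines pinned to the window end samples, and each average is rounded to a whole number of steps. The window length, number of calibration windows, number of IMFs, and the helper names `sift`, `emd` and `calibrate_iterations` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(x):
    """Indices of interior local maxima and minima."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] >= x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] <= x[i + 1])]
    return maxima, minima


def _envelope_mean(x):
    """Mean of the upper and lower cubic-spline envelopes of x."""
    n = len(x)
    t = np.arange(n)
    maxima, minima = _extrema(x)
    # Pin both envelopes to the end samples (a crude, assumed boundary treatment).
    up = np.concatenate(([0], maxima, [n - 1]))
    lo = np.concatenate(([0], minima, [n - 1]))
    upper = CubicSpline(up.astype(float), x[up])(t)
    lower = CubicSpline(lo.astype(float), x[lo])(t)
    return 0.5 * (upper + lower)


def sift(x, n_iter=None, sd_tol=0.2, max_iter=50):
    """Extract one IMF; return (imf, number_of_sifting_steps).

    n_iter=None: stop on a normalized SD criterion (assumed 'standard' EMD).
    n_iter=k:    run exactly k sifting steps (fixed-iteration variant).
    """
    h = np.asarray(x, dtype=float).copy()
    for step in range(1, max_iter + 1):
        h_new = h - _envelope_mean(h)
        if n_iter is not None:
            done = step >= n_iter
        else:
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            done = sd < sd_tol
        h = h_new
        if done:
            return h, step
    return h, max_iter


def emd(x, n_imfs, iters=None):
    """Decompose x into at most n_imfs IMFs plus a residue.

    iters: optional per-IMF sifting counts; None means criterion-based sifting.
    """
    residue = np.asarray(x, dtype=float).copy()
    imfs, steps = [], []
    for k in range(n_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) + len(minima) < 3:  # (nearly) monotonic residue: stop
            break
        imf, s = sift(residue, n_iter=None if iters is None else iters[k])
        imfs.append(imf)
        steps.append(s)
        residue = residue - imf
    return imfs, residue, steps


def calibrate_iterations(windows, n_imfs):
    """Average the criterion-based sifting counts per IMF index over windows."""
    counts = [[] for _ in range(n_imfs)]
    for w in windows:
        _, _, steps = emd(w, n_imfs)
        for k, s in enumerate(steps):
            counts[k].append(s)
    # Round each average to a whole number of steps (at least one).
    return [max(1, int(round(np.mean(c)))) if c else 1 for c in counts]


# Usage on a synthetic noisy signal (all values here are illustrative).
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
signal = (np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)
          + 0.1 * rng.standard_normal(t.size))
win = 256
windows = [signal[i:i + win] for i in range(0, signal.size - win + 1, win)]

iters = calibrate_iterations(windows[:10], n_imfs=4)             # calibration pass
imfs, residue, steps = emd(windows[10], n_imfs=4, iters=iters)  # fixed-count pass
```

In this sketch, every window after calibration costs a fixed, known number of envelope computations per IMF instead of an open-ended loop with a convergence test, which is one plausible source of the time saving. The trade-off is that a window whose IMFs would have converged faster or slower than average is sifted a non-optimal number of times.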