Speech Enhancement Using Sliding Window Empirical Mode Decomposition and Hurst-based Technique

Selvaraj POOVARASAN; Eswaran CHANDRA

doi:10.24425/aoa.2019.129259

Authors

Selvaraj POOVARASAN, Bharathiar University, India
Eswaran CHANDRA, Bharathiar University, India

Abstract

The most challenging task in speech enhancement is tracking non-stationary noise over long speech segments and at low Signal-to-Noise Ratio (SNR). Various speech enhancement techniques have been proposed, but they track highly non-stationary noise inaccurately. To address this, the Empirical Mode Decomposition and Hurst-based (EMDH) approach was proposed to enhance signals corrupted by non-stationary acoustic noise. It uses Hurst exponent statistics to identify and select the set of Intrinsic Mode Functions (IMFs) most affected by the noise components, and then reconstructs the speech signal from the least corrupted IMFs. Although EMDH increases the SNR, its time and resource consumption are high, and it still needs significant improvement under non-stationary noise. Hence, in this article, the EMDH approach is enhanced with a Sliding Window (SW) technique. In the resulting SWEMDH approach, EMD is computed on a small window that slides along the time axis. The sliding window depends on the frequency band of the signal. Possible discontinuities in the IMFs between windows are prevented by fixing the total number of modes and the number of sifting iterations a priori. For each mode, the number of sifting iterations is determined by decomposing many signal windows with the standard algorithm and averaging the number of sifting steps obtained for that mode. This approach reduces the time complexity significantly while keeping a suitable decomposition quality. Finally, experimental results show considerable improvements in speech enhancement under non-stationary noise environments.
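The three building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window and hop parameters, and the layout of the sifting-count table are assumptions made for illustration. The abstract does not say which Hurst estimator is used, so a generic rescaled-range (R/S) estimator is swapped in here. The EMD sifting itself is omitted.

```python
import math
from statistics import mean


def sliding_windows(n_samples, win_len, hop):
    """Yield (start, stop) index pairs for windows sliding along the time axis."""
    for start in range(0, n_samples - win_len + 1, hop):
        yield start, start + win_len


def average_sifting_counts(counts_per_window):
    """Average, mode by mode, the sifting steps seen on calibration windows.

    counts_per_window[w][m] is assumed to hold the number of sifting steps
    that a standard EMD needed for mode m on window w (hypothetical layout).
    Returns one fixed iteration count per mode.
    """
    n_modes = min(len(counts) for counts in counts_per_window)
    return [max(1, round(mean(counts[m] for counts in counts_per_window)))
            for m in range(n_modes)]


def hurst_rs(x, min_chunk=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis.

    Generic estimator chosen for this sketch, not necessarily the paper's.
    """
    log_sizes, log_rs = [], []
    size = min_chunk
    while size <= len(x) // 2:
        ratios = []
        for start in range(0, len(x) - size + 1, size):
            chunk = x[start:start + size]
            mu = sum(chunk) / size
            cum, lo, hi, sq = 0.0, 0.0, 0.0, 0.0
            for v in chunk:
                cum += v - mu                  # cumulative deviation from the mean
                lo, hi = min(lo, cum), max(hi, cum)
                sq += (v - mu) ** 2
            std = math.sqrt(sq / size)
            if std > 0:
                ratios.append((hi - lo) / std)  # range rescaled by the deviation
        if ratios:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(mean(ratios)))
        size *= 2
    # The least-squares slope of log(R/S) against log(size) is the estimate.
    mx, my = mean(log_sizes), mean(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_sizes, log_rs))
    den = sum((a - mx) ** 2 for a in log_sizes)
    return num / den
```

In a pipeline of this shape, `average_sifting_counts` would be run once on a set of calibration windows, the resulting per-mode counts would be reused for every window produced by `sliding_windows`, and `hurst_rs` would be applied to each IMF to help decide which modes are dominated by noise. The selection threshold is left out because the abstract does not state one.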

Keywords:

Speech Enhancement, Empirical Mode Decomposition, Intrinsic Mode Functions, Hurst exponent, Sliding Window EMD
