Speech Enhancement Based on Discrete Wavelet Packet Transform and Itakura-Saito Nonnegative Matrix Factorisation

Houguang LIU; Wenbo WANG; Lin XUE; Jianhua YANG; Zhihua WANG; Chunli HUA

doi:10.24425/aoa.2020.134072

Authors

Houguang LIU China University of Mining and Technology, China
Wenbo WANG China University of Mining and Technology, China
Lin XUE China University of Mining and Technology, China
Jianhua YANG China University of Mining and Technology, China
Zhihua WANG China University of Mining and Technology, China
Chunli HUA China University of Mining and Technology, China

Abstract

Nonnegative matrix factorisation (NMF) is one of the most popular machine learning tools for speech enhancement (SE). However, two problems limit the performance of traditional NMF-based SE algorithms. One is the overlap-and-add operation used in signal reconstruction based on the short-time Fourier transform (STFT); the other is the Euclidean distance commonly used as the objective function. Both can introduce distortion into the enhanced speech. To overcome these shortcomings, we propose a novel joint SE framework that combines the discrete wavelet packet transform (DWPT) with Itakura-Saito nonnegative matrix factorisation (ISNMF). In this approach, the speech signal is first split into a series of subband signals using the DWPT. Then, ISNMF is applied to enhance each subband signal. Finally, the inverse DWPT (IDWPT) is used to reconstruct the speech from the enhanced subband signals. The experimental results show that the proposed joint framework effectively improves speech enhancement performance and outperforms traditional NMF methods when the noise is unseen.
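The three-stage structure described in the abstract (DWPT analysis, independent enhancement of each subband, inverse DWPT synthesis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the mother wavelet or the decomposition depth, so a Haar wavelet packet with a full binary tree is assumed, the signal length must be divisible by 2**level, and `enhance_subband` is a hypothetical placeholder for the per-subband ISNMF step. What the sketch shows is that synthesis is an exact inverse of analysis (up to floating-point rounding), so no STFT-style overlap-add is needed.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half as long as x."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_merge(a, d):
    """Exact inverse of haar_split: rebuild and interleave even/odd samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / SQRT2
    x[1::2] = (a - d) / SQRT2
    return x


def dwpt(x, level):
    """Full wavelet packet tree (assumed Haar): split every node -> 2**level subbands.

    Subbands are returned in natural (Paley) order, not sorted by frequency;
    siblings stay adjacent, which is what idwpt relies on.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    bands = [x]
    for _ in range(level):
        bands = [half for band in bands for half in haar_split(band)]
    return bands


def idwpt(bands):
    """Merge sibling subbands pairwise, level by level, back into one signal."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


def enhance(noisy, level, enhance_subband):
    """Split into subbands, enhance each independently, then resynthesise.

    enhance_subband (hypothetical hook) must return an array of the same length.
    """
    bands = dwpt(noisy, level)
    return idwpt([enhance_subband(b) for b in bands])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    # With an identity "enhancer", analysis followed by synthesis returns the input.
    y = enhance(x, level=3, enhance_subband=lambda b: b)
    print(len(dwpt(x, 3)), np.max(np.abs(x - y)))
```

Because the Haar packet is orthonormal, subband energies also sum to the signal energy, so any distortion in the output comes from the per-subband processing rather than from the reconstruction itself.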

Keywords:

speech enhancement, discrete wavelet packet transform, nonnegative matrix factorisation, Itakura-Saito divergence
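The Itakura-Saito divergence named in the keywords is d_IS(x | y) = x/y - log(x/y) - 1. Unlike the squared Euclidean distance, it is scale-invariant, d_IS(λx | λy) = d_IS(x | y), so low-energy entries of a power representation count as much as high-energy ones; this is the usual argument for preferring it over the Euclidean objective criticised in the abstract. The sketch below is illustrative and not the authors' code: it assumes the multiplicative updates for β = 0 with exponent 1/2 (the convergence-guaranteed form discussed in reference 22), and `wiener_gain` is a hypothetical supervised use in which fixed speech and noise dictionaries are concatenated, only the activations are estimated, and a Wiener-style gain is formed from the speech part.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries (both inputs > 0)."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, W, H, n_iter=200, update_W=True, eps=1e-12):
    """NMF V ~ W @ H under the IS divergence (beta = 0).

    Multiplicative updates with exponent 1/2, under which the IS objective
    does not increase. W and H must be strictly positive; they are modified
    in place and also returned.
    """
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat ** 2)) / (W.T @ (1.0 / V_hat)))
        if update_W:
            V_hat = W @ H + eps
            W *= np.sqrt(((V / V_hat ** 2) @ H.T) / ((1.0 / V_hat) @ H.T))
    return W, H


def wiener_gain(V, W_speech, W_noise, n_iter=200, seed=0):
    """Hypothetical supervised use: fixed dictionaries, estimated activations.

    Returns speech / (speech + noise) model power, a gain strictly between 0 and 1.
    """
    rng = np.random.default_rng(seed)
    W = np.hstack([W_speech, W_noise])
    H = rng.uniform(0.5, 1.5, size=(W.shape[1], V.shape[1]))
    W, H = is_nmf(V, W, H, n_iter=n_iter, update_W=False)
    k = W_speech.shape[1]
    speech = W[:, :k] @ H[:k]
    return speech / (W @ H)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.uniform(0.1, 1.0, size=(16, 40))  # toy nonnegative "power" matrix
    W0 = rng.uniform(0.5, 1.5, size=(16, 4))
    H0 = rng.uniform(0.5, 1.5, size=(4, 40))
    before = is_divergence(V, W0 @ H0)
    W, H = is_nmf(V, W0.copy(), H0.copy())
    print(before, is_divergence(V, W @ H))
    # Scale invariance: scaling both arguments leaves the divergence unchanged.
    print(is_divergence(100 * V, 100 * (W @ H)))
```

The exponent 1/2 is a deliberate choice: the plain exponent-1 heuristic update is widely used for IS-NMF but lacks the same monotonicity guarantee.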

References

1. Bavkar S., Sahare S. (2013), PCA based single channel speech enhancement method for highly noisy environment, Proceedings of 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1103–1107, Mysore, https://doi.org/10.1109/ICACCI.2013.6637331

2. Boll S. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2): 113–120, https://doi.org/10.1109/TASSP.1979.1163209

3. Bouzid A., Ellouze N. (2016), Speech enhancement based on wavelet packet of an improved principal component analysis, Computer Speech & Language, 35: 58–72, https://doi.org/10.1016/j.csl.2015.06.001

4. Chien J.T., Yang P.K. (2015), Bayesian factorization and learning for monaural source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(1): 185–195, https://doi.org/10.1109/TASLP.2015.2502141

5. Coifman R.R., Wickerhauser M.V. (1992), Entropy-based algorithms for best basis selection, IEEE Transactions on Information Theory, 38(2): 713–718, https://doi.org/10.1109/18.119732

6. Févotte C., Le Roux J., Hershey J.R. (2013), Non-negative dynamical system with application to speech and audio, Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3158–3162, Vancouver, https://doi.org/10.1109/ICASSP.2013.6638240

7. Gokhale M., Khanduja D.K. (2010), Time domain signal analysis using wavelet packet decomposition approach, International Journal of Communications, Network and System Sciences, 3(3): 321–329, https://doi.org/10.4236/ijcns.2010.33041

8. Grancharov V., Samuelsson J., Kleijn B. (2006), On causal algorithms for speech enhancement, IEEE Transactions on Speech & Audio Processing, 14(3): 764–773, https://doi.org/10.1109/TSA.2005.857802

9. Hansen J.H., Pellom B.L. (1998), An effective quality evaluation protocol for speech enhancement algorithms, Proceedings of Fifth International Conference on Spoken Language Processing, pp. 917–921, Sydney.

10. Islam M.S., Al Mahmud T.H., Khan W.U., Ye Z. (2019), Supervised single channel speech enhancement based on dual-tree complex wavelet transforms and nonnegative matrix factorization using the joint learning process and subband smooth ratio mask, Electronics, 8(3): 353–371, https://doi.org/10.3390/electronics8030353

11. Krawczyk-Becker M., Gerkmann T. (2016), An evaluation of the perceptual quality of phase-aware single-channel speech enhancement, Journal of the Acoustical Society of America, 140(4): EL364–EL369, https://doi.org/10.1121/1.4965288

12. Lai Y.-H., Chen F., Wang S.-S., Lu X., Tsao Y., Lee C.-H. (2016), A deep denoising autoencoder approach to improving the intelligibility of vocoded speech in cochlear implant simulation, IEEE Transactions on Biomedical Engineering, 64(7): 1568–1578, https://doi.org/10.1109/TBME.2016.2613960

13. Lee D.D., Seung H.S. (1999), Learning the parts of objects by non-negative matrix factorization, Nature, 401(6755): 788–791, https://doi.org/10.1038/44565

14. Lee S., Han D.K., Ko H. (2017), Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities, Applied Acoustics, 117: 257–262, https://doi.org/10.1016/j.apacoust.2016.04.024

15. Li J., Sakamoto S., Hongo S., Akagi M., Suzuki Y.I. (2011), Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication, Speech Communication, 53(5): 677–689, https://doi.org/10.1016/j.specom.2010.04.009

16. Li Y., Zhang X., Sun M. (2017), Robust non-negative matrix factorization with β-divergence for speech separation, ETRI Journal, 39(1): 21–29, https://doi.org/10.4218/etrij.17.0115.0122

17. Luts H. et al. (2010), Multicenter evaluation of signal enhancement algorithms for hearing aids, Journal of the Acoustical Society of America, 127(3): 1491–1505, https://doi.org/10.1121/1.3299168

18. Magron P., Virtanen T. (2018), Expectation-maximization algorithms for Itakura-Saito nonnegative matrix factorization, Proceedings of 2018 Conference of the International Speech Communication Association (INTERSPEECH), pp. 856–860, Graz, https://doi.org/10.21437/Interspeech.2018-1840

19. Mavaddaty S., Ahadi S.M., Seyedin S. (2017), Speech enhancement using sparse dictionary learning in wavelet packet transform domain, Computer Speech & Language, 44: 22–47, https://doi.org/10.1016/j.csl.2017.01.009

20. Mohammadiha N., Smaragdis P., Leijon A. (2013), Supervised and unsupervised speech enhancement using nonnegative matrix factorization, IEEE Transactions on Audio, Speech, and Language Processing, 21(10): 2140–2151, https://doi.org/10.1109/TASL.2013.2270369

21. Mowlaee P., Saeidi R. (2014), Time-frequency constraints for phase estimation in single-channel speech enhancement, Proceedings of 2014 14th International Workshop on Acoustic Signal Enhancement, pp. 337–341, Juan-les-Pins, https://doi.org/10.1109/IWAENC.2014.6954314

22. Nakano M., Kameoka H., Le Roux J., Kitano Y., Ono N., Sagayama S. (2010), Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence, Proceedings of 2010 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 283–288, Kittila, https://doi.org/10.1109/MLSP.2010.5589233

23. Nie S., Liang S., Liu W., Zhang X., Tao J. (2018), Deep learning based speech separation via NMF-style reconstructions, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(11): 2043–2055, https://doi.org/10.1109/TASLP.2018.2851151

24. Panfili L.M., Haywood J., McCloy D.R., Souza P.E., Wright R.A. (2017), The UW/NU Corpus, Version 2.0, https://depts.washington.edu/phonlab/projects/uw-nu.php

25. Rix A.W., Beerends J.G., Hollier M.P., Hekstra A.P. (2001), Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 749–752, Salt Lake City, https://doi.org/10.1109/ICASSP.2001.941023

26. Saleem N., Khattak M.I., Ali M.Y., Shafi M. (2019), Deep neural network for supervised single-channel speech enhancement, Archives of Acoustics, 44(1): 3–12, https://doi.org/10.24425/aoa.2019.126347

27. Saleem N., Khattak M.I., Shafi M. (2018), Unsupervised speech enhancement in low SNR environments via sparseness and temporal gradient regularization, Applied Acoustics, 141: 333–347, https://doi.org/10.1016/j.apacoust.2018.07.027

28. Scalart P., Filho J.V. (1996), Speech enhancement based on a priori signal to noise estimation, Proceedings of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 629–632, Atlanta, https://doi.org/10.1109/ICASSP.1996.543199

29. Sun D.L., Févotte C. (2014), Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence, Proceedings of 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 6201–6205, Florence, https://doi.org/10.1109/ICASSP.2014.6854796

30. Sun P., Qin J. (2016), Wavelet packet transform based speech enhancement via two-dimensional SPP estimator with generalized gamma priors, Archives of Acoustics, 41(3): 579–590, https://doi.org/10.1515/aoa-2016-0056

31. Taal C.H., Hendriks R.C., Heusdens R., Jensen J. (2011), An algorithm for intelligibility prediction of time–frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, 19(7): 2125–2136, https://doi.org/10.1109/TASL.2011.2114881

32. Varga A., Steeneken H.J. (1993), Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Communication, 12(3): 247–251, https://doi.org/10.1016/0167-6393(93)90095-3

33. Varshney Y.V., Abbasi Z.A., Abidi M.R., Farooq O. (2017), Frequency selection based separation of speech signals with reduced computational time using sparse NMF, Archives of Acoustics, 42(2): 287–295, https://doi.org/10.1515/aoa-2017-0031

34. Veisi H., Sameti H., Aroudi A. (2015), Hidden Markov model-based speech enhancement using multivariate Laplace and Gaussian distributions, IET Signal Processing, 9(2): 177–185, https://doi.org/10.1049/iet-spr.2014.0032

35. Wang D., Jiang M., Niu F., Cao Y., Zhou C. (2018), Speech enhancement control design algorithm for dual-microphone systems using β-NMF in a complex environment, Complexity, 2018, Article ID 6153451, https://doi.org/10.1155/2018/6153451

36. Wang D., Chen J. (2018), Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10): 1702–1726, https://doi.org/10.1109/TASLP.2018.2842159

37. Wang D., Hansen J.H.L. (2018), Speech enhancement for cochlear implant recipients, Journal of the Acoustical Society of America, 143(4): 2244–2254, https://doi.org/10.1121/1.5031112

38. Wang M., Zhang E., Tang Z. (2018), Speech enhancement based on NMF under electric vehicle noise condition, IEEE Access, 6: 9147–9159, https://doi.org/10.1109/ACCESS.2018.2797165

39. Wang S.S., Chern A., Tsao Y., Hung J.W., Lai Y.H., Su B. (2016), Wavelet speech enhancement based on nonnegative matrix factorization, IEEE Signal Processing Letters, 23(8): 1101–1105, https://doi.org/10.1109/LSP.2016.2571727

40. Wang S.S. et al. (2015), Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm, Proceedings of 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 365–369, Hong Kong, https://doi.org/10.1109/APSIPA.2015.7415295
