Speech Enhancement Based on Constrained Low-rank Sparse Matrix Decomposition Integrated with Temporal Continuity Regularisation

Chengli SUN; Conglin YUAN

doi:10.24425/aoa.2019.129724

Authors

Chengli SUN Nanchang Hangkong University, China
Conglin YUAN Nanchang Hangkong University, China

Abstract

Speech enhancement under strong noise conditions is a challenging problem. Low-rank and sparse matrix decomposition (LSMD) theory has recently been applied to speech enhancement with good performance. Existing LSMD algorithms treat each frame as an individual observation. However, real-world speech usually has a temporal structure, and its acoustic characteristics vary slowly over time. In this paper, we propose a speech enhancement method based on temporal continuity constrained low-rank sparse matrix decomposition (TCCLSMD). In this method, speech separation is formulated as a TCCLSMD problem, and temporal continuity constraints are imposed on the LSMD process. We develop an alternating optimisation algorithm for decomposing the noisy spectrogram. With TCCLSMD, the recovered speech spectrogram is more consistent with the structure of the clean speech spectrogram, which leads to more stable and reasonable results than the existing LSMD algorithm. Experiments with various types of noise show that the proposed algorithm outperforms traditional speech enhancement algorithms, yielding less residual noise and lower speech distortion.
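
To make the idea in the abstract concrete, the following is a minimal sketch of a low-rank plus sparse spectrogram decomposition with a temporal smoothness term. It is not the authors' TCCLSMD algorithm, whose exact formulation is not given on this page. It assumes, as is common in LSMD-based enhancement, that the noise is modelled by the low-rank part and the speech by the sparse part, and it assumes a quadratic penalty on differences between adjacent frames. The function names (`temporal_smoother`, `keep_largest`, `tc_lsmd`) and the parameters `rank`, `card`, and `lam` are hypothetical.

```python
import numpy as np


def temporal_smoother(n_frames, lam):
    """Return (I + lam * D D^T)^-1, where D differences adjacent frames."""
    D = np.zeros((n_frames, n_frames - 1))
    idx = np.arange(n_frames - 1)
    D[idx, idx] = -1.0       # (X @ D)[:, t] = X[:, t + 1] - X[:, t]
    D[idx + 1, idx] = 1.0
    return np.linalg.inv(np.eye(n_frames) + lam * D @ D.T)


def keep_largest(X, k):
    """Hard-threshold X: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros(X.shape)
    k = min(k, X.size)
    if k > 0:
        flat = np.abs(X).ravel()
        keep = np.argpartition(flat, flat.size - k)[flat.size - k:]
        out.flat[keep] = X.flat[keep]
    return out


def tc_lsmd(Y, rank, card, lam=0.5, n_iter=30):
    """Alternately estimate a low-rank part L and a sparse part S of Y.

    Y is a magnitude spectrogram (frequency bins x frames). This is a
    heuristic sketch under the assumptions stated above, not a published
    algorithm.
    """
    Y = np.asarray(Y, dtype=float)
    smoother = temporal_smoother(Y.shape[1], lam)
    L = np.zeros(Y.shape)
    S = np.zeros(Y.shape)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of Y - S (truncated SVD).
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: smooth the residual along time, which is the closed-form
        # minimiser of 0.5*||R - S||^2 + 0.5*lam*||S D||^2, then keep `card` entries.
        S = keep_largest((Y - L) @ smoother, card)
    return L, S
```

In this sketch, setting `lam=0` switches the smoothing off and leaves an alternation between a truncated SVD and hard thresholding, so each frame is again treated independently. Larger values of `lam` pull neighbouring frames of the sparse part towards each other before thresholding.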

Keywords:

speech enhancement, temporal continuity, low-rank and sparse decomposition
