Archives of Acoustics, 44, 4, pp. 681–692, 2019

Speech Enhancement Based on Constrained Low-rank Sparse Matrix Decomposition Integrated with Temporal Continuity Regularisation

Chengli SUN
Nanchang Hangkong University

Conglin YUAN
Nanchang Hangkong University

Speech enhancement under strong noise conditions is a challenging problem. Low-rank and sparse matrix decomposition (LSMD) theory has recently been applied to speech enhancement with good results. Existing LSMD algorithms treat each frame as an independent observation. However, real-world speech usually has a temporal structure, and its acoustic characteristics vary slowly over time. In this paper, we propose a speech enhancement method based on temporal continuity constrained low-rank sparse matrix decomposition (TCCLSMD). In this method, speech separation is formulated as a TCCLSMD problem in which temporal continuity constraints are imposed on the LSMD process. We develop an alternating optimisation algorithm for decomposing the noisy spectrogram. With TCCLSMD, the recovered speech spectrogram is more consistent with the structure of the clean speech spectrogram, which leads to more stable and reasonable results than the existing LSMD algorithm. Experiments with various types of noise show that the proposed algorithm outperforms traditional speech enhancement algorithms, yielding less residual noise and lower speech distortion.
Keywords: speech enhancement; temporal continuity; low-rank and sparse decomposition
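
The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a relaxed objective 0.5*||Y - L - S||_F^2 + tau*||L||_* + lam*||S||_1 + 0.5*gamma*sum_t ||S[:, t] - S[:, t-1]||^2, in which the low-rank part L stands for noise and the sparse part S for speech (a common LSMD convention, assumed here). The function names, the parameters tau, lam and gamma, and the update rules are hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    # Element-wise shrinkage: proximal operator of t * ||x||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svt(x, t):
    # Singular value thresholding: proximal operator of t * nuclear norm.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, t)) @ vt

def temporal_grad(s):
    # Gradient of 0.5 * sum_t ||s[:, t+1] - s[:, t]||^2 with respect to s.
    d = np.diff(s, axis=1)
    g = np.zeros_like(s)
    g[:, :-1] -= d
    g[:, 1:] += d
    return g

def objective(y, low, sp, tau, lam, gamma):
    # Assumed relaxed objective (illustrative, not the paper's formulation).
    fit = 0.5 * np.sum((y - low - sp) ** 2)
    smooth = 0.5 * gamma * np.sum(np.diff(sp, axis=1) ** 2)
    return fit + tau * np.linalg.norm(low, "nuc") + lam * np.abs(sp).sum() + smooth

def decompose(y, tau=1.0, lam=0.1, gamma=0.5, n_iter=50):
    # Alternate an exact low-rank update with one proximal-gradient
    # step on the sparse, temporally smoothed component.
    low = np.zeros_like(y)
    sp = np.zeros_like(y)
    # 1 + 4 * gamma bounds the Lipschitz constant of the smooth terms in sp.
    step = 1.0 / (1.0 + 4.0 * gamma)
    for _ in range(n_iter):
        low = svt(y - sp, tau)
        grad = (low + sp - y) + gamma * temporal_grad(sp)
        sp = soft_threshold(sp - step * grad, step * lam)
    return low, sp
```

Here y would be a non-negative magnitude spectrogram of shape (frequency bins, frames). The temporal term penalises frame-to-frame jumps in sp, which is one simple way to express the assumption that speech characteristics vary slowly over time.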



DOI: 10.24425/aoa.2019.129724

Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)