Archives of Acoustics, 41, 2, pp. 245–254, 2016
10.1515/aoa-2016-0024

A Signal Subspace Speech Enhancement Approach Based on Joint Low-Rank and Sparse Matrix Decomposition

Chengli SUN
Nanchang Hangkong University
China

Jianxiao XIE
Nanchang Hangkong University
China

Yan LENG
Shandong Normal University
China

Subspace-based methods have been used effectively to estimate enhanced speech from noisy speech samples. In traditional subspace approaches, a critical step is separating the two invariant subspaces associated with the signal and the noise via subspace decomposition, which is usually performed by singular value decomposition or eigenvalue decomposition. However, these decomposition algorithms are highly sensitive to large corruptions, which leaves a large amount of residual noise in the enhanced speech at low signal-to-noise ratios (SNRs). In this paper, a subspace method based on joint low-rank and sparse matrix decomposition (JLSMD) is proposed for speech enhancement. In the proposed method, we first arrange the corrupted data as a Toeplitz matrix and estimate the effective rank of the underlying clean speech matrix. The subspace decomposition is then performed by means of JLSMD, where the low-rank part corresponds to the enhanced speech and the sparse part to the noise. An extensive set of experiments has been carried out with both white Gaussian noise and real-world noise. Experimental results show that the proposed method outperforms conventional methods under many types of strong noise, yielding less residual noise and lower speech distortion.
Keywords: subspace speech enhancement; singular value decomposition; joint low-rank and sparse matrix decomposition.
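The abstract outlines three steps: arrange a noisy frame as a Toeplitz matrix, estimate an effective rank, and split that matrix into a low-rank part (kept as speech) and a sparse part (discarded as noise). The paper's exact JLSMD solver and rank estimator are not given here, so what follows is only a minimal NumPy sketch under stated assumptions. The energy-fraction rank rule, the GoDec-style alternating minimization (after Zhou and Tao, 2011) with a soft-thresholded sparse term, the diagonal-averaging step that maps the low-rank matrix back to a waveform, and all parameter values (`n_rows`, `energy`, `lam`, `n_iter`) are illustrative choices, and the test frame is a synthetic pair of sinusoids rather than speech.

```python
import numpy as np


def toeplitz_embed(x, n_rows):
    """Arrange a 1-D frame x of length N as an n_rows x K Toeplitz matrix,
    K = N - n_rows + 1, with T[i, j] = x[K - 1 + i - j] (constant along diagonals).
    Also returns the index matrix so the mapping can be inverted later."""
    x = np.asarray(x, dtype=float)
    n_cols = x.size - n_rows + 1
    idx = (n_cols - 1) + np.arange(n_rows)[:, None] - np.arange(n_cols)[None, :]
    return x[idx], idx


def effective_rank(mat, energy=0.9):
    """Hypothetical rank rule (an assumption, not the paper's estimator):
    smallest r whose leading singular values hold `energy` of sum(s**2)."""
    s = np.linalg.svd(mat, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(cum, energy)) + 1, s.size)


def soft_threshold(a, lam):
    """Entrywise shrinkage: the proximal operator of lam * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def low_rank_sparse(X, rank, lam, n_iter=50):
    """GoDec-style alternating minimization of
        0.5 * ||X - L - S||_F^2 + lam * ||S||_1   subject to rank(L) <= rank.
    A stand-in for JLSMD, not the authors' exact solver."""
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # L-step: best rank-`rank` approximation of X - S (truncated SVD).
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # S-step: for fixed L, soft thresholding the residual is the exact minimizer.
        S = soft_threshold(X - L, lam)
    return L, S


def diagonal_average(mat, idx, length):
    """Map a (low-rank) matrix back to a waveform by averaging all entries
    that share the same Toeplitz index, i.e. the same sample."""
    sums = np.bincount(idx.ravel(), weights=mat.ravel(), minlength=length)
    counts = np.bincount(idx.ravel(), minlength=length)
    return sums / counts


def enhance_frame(noisy, n_rows=32, energy=0.9, lam=0.05):
    """Toeplitz embedding -> rank estimate -> low-rank + sparse split -> waveform.
    The low-rank part is kept as speech and the sparse part is discarded as noise."""
    T, idx = toeplitz_embed(noisy, n_rows)
    r = effective_rank(T, energy)
    L, _ = low_rank_sparse(T, r, lam)
    return diagonal_average(L, idx, len(noisy))


# Usage on a synthetic frame (two sinusoids in white Gaussian noise), not speech.
rng = np.random.default_rng(0)
n = np.arange(256)
clean = np.sin(2 * np.pi * 0.03 * n) + 0.5 * np.sin(2 * np.pi * 0.11 * n)
noisy = clean + 0.3 * rng.standard_normal(n.size)
enhanced = enhance_frame(noisy)
print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
print("MSE enhanced:", np.mean((enhanced - clean) ** 2))
```

On the design choice: for a fixed sparse part the truncated SVD is the best rank-r fit (Eckart-Young), and for a fixed low-rank part entrywise soft thresholding exactly minimizes the l1-penalized least-squares term, so no sweep can increase the objective. A real speech system would apply this frame by frame with overlap-add, which the sketch omits.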


Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)