Frequency Selection Based Separation of Speech Signals with Reduced Computational Time Using Sparse NMF

Yash Vardhan VARSHNEY; Zia Ahmad ABBASI; Musiur Raza ABIDI; Omar FAROOQ

doi:10.1515/aoa-2017-0031

Authors

Yash Vardhan VARSHNEY Aligarh Muslim University, India
Zia Ahmad ABBASI Aligarh Muslim University, India
Musiur Raza ABIDI Aligarh Muslim University, India
Omar FAROOQ Aligarh Muslim University, India

Abstract

This paper describes the application of wavelet decomposition to speed up the separation of mixed speech signals using non-negative matrix factorisation (NMF). It is assumed that the basis vectors of the training data of the individual speakers have been recorded in advance. The spectrogram magnitude of the mixed signal is factorised with NMF, taking the sparseness of speech signals into account. The high-frequency components of the signal contain only a very small amount of the signal energy. Rejecting these components reduces the size of the input signal, which in turn reduces the computational time of the matrix factorisation. The lower-energy part of the signal is separated out using wavelet decomposition. The work is carried out for wideband microphone speech signals and for standard audio signals from digital video equipment. Compared with an existing model, the proposed model improves the separation capability in terms of the correlation between the separated and original signals. The obtained signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) are also larger than those of the existing model. The proposed model also reduces the computational time, which results in faster operation.
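
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the wavelet family, the NMF cost function or the sparsity weight, so everything specific here is an assumption made for illustration: a one-level Haar decomposition stands in for the wavelet step, the factorisation uses a Euclidean cost with an L1 penalty on the activations, and the function names (haar_lowpass, sparse_activations, separate) and the toy matrices are hypothetical. The bases W are treated as pre-trained and held fixed, and each speaker is recovered with a soft mask.

```python
import math
import random


def haar_lowpass(x):
    """One Haar decomposition level: keep the approximation (low band) only."""
    n = len(x) - len(x) % 2  # drop a trailing odd sample
    return [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def sparse_activations(V, W, sparsity=0.01, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for H >= 0 that reduce
    0.5 * ||V - W H||^2 + sparsity * sum(H), with W held fixed (assumed cost)."""
    rng = random.Random(seed)
    Wt = [list(r) for r in zip(*W)]
    WtV, WtW = matmul(Wt, V), matmul(Wt, W)
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in Wt]
    for _ in range(iters):
        WtWH = matmul(WtW, H)
        H = [[h * num / (den + sparsity + eps)
              for h, num, den in zip(h_row, n_row, d_row)]
             for h_row, n_row, d_row in zip(H, WtV, WtWH)]
    return H


def separate(V, W, speaker_of_basis, **kwargs):
    """Split magnitude spectrogram V into per-speaker estimates via soft masks."""
    H = sparse_activations(V, W, **kwargs)
    total = matmul(W, H)
    estimates = {}
    for spk in set(speaker_of_basis):
        keep = [j for j, s in enumerate(speaker_of_basis) if s == spk]
        W_s = [[row[j] for j in keep] for row in W]
        H_s = [H[j] for j in keep]
        part = matmul(W_s, H_s)
        estimates[spk] = [[v * p / (t + 1e-9) for v, p, t in zip(vr, pr, tr)]
                          for vr, pr, tr in zip(V, part, total)]
    return estimates


# Toy data (made up): 4 frequency bins, 3 frames, one fixed basis per speaker.
W = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]]
H_true = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.5]]
V = matmul(W, H_true)
est = separate(V, W, ["A", "B"])
low = haar_lowpass([1.0, 1.0, 2.0, 2.0])  # 4 samples -> 2 coefficients
```

Keeping only the approximation band halves the number of samples at each decomposition level, so the spectrogram passed to the factorisation is smaller; this is the source of the time saving the abstract refers to. The soft mask makes the per-speaker estimates sum back to the mixture magnitude, which is a common design choice and not necessarily the reconstruction used by the authors.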

Keywords:

sparse NMF, mixed speech recognition, machine learning

