Wavelet Packet Transform based Speech Enhancement via Two-Dimensional SPP Estimator with Generalized Gamma Priors

Pengfei SUN; Jun QIN

doi:10.1515/aoa-2016-0056

Authors

Pengfei SUN Southern Illinois University Carbondale, United States
Jun QIN Southern Illinois University Carbondale, United States

Abstract

Although various speech enhancement techniques have been developed for different applications, existing methods perform poorly in environments with high ambient noise levels. Speech presence probability (SPP) estimation is a speech enhancement technique that reduces speech distortion, especially in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a new SPP estimator, improved by two-dimensional (2D) Teager energy operators (TEOs), for speech enhancement in the time-frequency (T-F) domain. The wavelet packet transform (WPT), a multiband decomposition technique, is used to concentrate the energy distribution of speech components. A minimum mean-square error (MMSE) estimator is derived from a generalized gamma distribution speech model in the WPT domain. Speech samples corrupted by environmental and occupational noise (i.e., machine shop, factory, and station) at different input SNRs are used to validate the proposed algorithm. Results suggest that the proposed method achieves a significant improvement in perceptual quality compared with four conventional speech enhancement algorithms (i.e., MMSE-84, MMSE-04, Wiener-96, and BTW).
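The abstract does not define the Teager energy operator, so the following is only a minimal sketch of the standard one-dimensional discrete Teager-Kaiser operator, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and not the 2D formulation proposed in the paper. The function name `teager_energy` and the tone parameters are illustrative assumptions. For a pure tone A·cos(Ωn), the operator returns the constant A²·sin²(Ω), which is why it is often used as an amplitude- and frequency-sensitive energy measure.

```python
import math


def teager_energy(x):
    """Standard 1D discrete Teager-Kaiser energy (not the paper's 2D variant).

    psi[n] = x[n]^2 - x[n-1] * x[n+1], evaluated at interior samples only,
    so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative values only; they are assumptions, not taken from the paper.
amplitude = 0.5
omega = 0.3  # angular frequency in radians per sample

tone = [amplitude * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)

# For a pure tone the operator output is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
max_error = max(abs(p - expected) for p in psi)
print(f"expected = {expected:.6f}, max deviation = {max_error:.2e}")
```

In a WPT-based scheme such an operator would typically be applied to subband coefficients rather than to the raw waveform; how the paper extends it to two dimensions is not stated on this page.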

Keywords:

speech enhancement, speech presence probability, wavelet packet transform, two-dimensional Teager energy operator.

References

1. AudioMiCro, Free Industrial and Machinery Sound Effects, Retrieved November 29th, 2015, from http://www.audiomicro.com/free-sound-effects/free-industrial-and-machinery/

2. Bahoura M., Rouat J. (2006), Wavelet speech enhancement based on time-scale adaptation, Speech Communication, 48, 12, 1620–1637.

3. Bahoura M., Rouat J. (2001), Wavelet speech enhancement based on the Teager energy operator, Signal Processing Letters, IEEE, 8, 1, 10–12.

4. Boll S.F. (1979), Suppression of acoustic noise in speech using spectral subtraction, Acoustics, Speech and Signal Processing, IEEE Transactions on, 27, 2, 113–120.

5. Bovik A.C., Maragos P., Quatieri T.F. (1993), AM-FM energy detection and separation in noise using multiband energy operators, Signal Processing, IEEE Transactions on, 41, 12, 3245–3265.

6. Chang S.G., Yu B., Vetterli M. (2000), Adaptive wavelet thresholding for image denoising and compression, Image Processing, IEEE Transactions on, 9, 9, 1532–1546.

7. Cohen I., Berdugo B. (2001), Speech enhancement for non-stationary noise environments, Signal processing, 81, 11, 2403–2418.

8. Cohen I. (2003), Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging, Speech and Audio Processing, IEEE Transactions on, 11, 5, 466–475.

9. Cohen I. (2004), Speech enhancement using a noncausal a priori SNR estimator, Signal Processing Letters, IEEE, 11, 9, 725–728.

10. Dunn R.B., Quatieri T.F., Kaiser J.F. (1993), Detection of transient signals using the energy operator, Acoustics, Speech, and Signal Processing, ICASSP., 1993 IEEE International Conference on, pp. 145–148.

11. Ephraim Y., Malah D. (1984), Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, Acoustics, Speech and Signal Processing, IEEE Transactions on, 32, 6, 1109–1121.

12. Ephraim Y., Van Trees H.L. (1995), A signal subspace approach for speech enhancement, Speech and Audio Processing, IEEE Transactions on, 3, 4, 251–266.

13. Erkelens J.S., Hendriks R.C., Heusdens R., Jensen J. (2007), Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors, Audio, Speech, and Language Processing, IEEE Transactions on, 15, 6, 1741–1752.

14. Fisher E., Tabrikian J., Dubnov S. (2006), Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, Audio, Speech, and Language Processing, IEEE Transactions on, 14, 2, 502–510.

15. Gerkmann T., Breithaupt C., Martin R. (2008), Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors, Audio, Speech, and Language Processing, IEEE Transactions on, 16, 5, 910–919.

16. Ghanbari Y., Karami-Mollaei M. R. (2006), A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets, Speech communication, 48, 8, 927–940.

17. Hendriks R.C., Gerkmann T., Jensen J. (2013), DFT-domain based single-microphone noise reduction for speech enhancement: a survey of the state of the art, Synthesis Lectures on Speech and Audio Processing, 9, 1, 80–84.

18. Hu Y., Loizou P.C. (2004), Speech enhancement based on wavelet thresholding the multitaper spectrum, Speech and Audio Processing, IEEE Transactions on, 12, 1, 59–67.

19. Hu Y., Loizou P.C. (2007), Subjective comparison and evaluation of speech enhancement algorithms, Speech communication, 49, 7, 588–601.

20. Johnson M.T., Yuan X., Ren Y. (2007), Speech signal enhancement through adaptive wavelet thresholding, Speech Communication, 49, 2, 123–133.

21. Kaiser J.F. (1993), Some useful properties of Teager's energy operators, Acoustics, Speech, and Signal Processing, ICASSP-93, IEEE International Conference on, pp. 149–152.

22. Kandia V., Stylianou Y. (2006), Detection of sperm whale clicks based on the Teager-Kaiser energy operator, Applied Acoustics, 67, 11, 1144–1163.

23. Langner B., Black A.W. (2004), Creating a database of speech in noise for unit selection synthesis, Fifth ISCA Workshop on Speech Synthesis, 229–230.

24. Loizou P.C. (2013), Speech enhancement: theory and practice, CRC Press.

25. Martin R. (2002), Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors, Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference, pp. 253–256.

26. Martin R. (2005), Speech enhancement based on minimum mean-square error estimation and supergaussian priors, Speech and Audio Processing, IEEE Transactions on, 13, 5, 845–856.

27. Mohammadiha N., Martin R., Leijon A. (2013), Spectral domain speech enhancement using HMM state-dependent super-Gaussian priors, Signal Processing Letters, IEEE, 20, 3, 253–256.

28. Park J., Kim J.-W., Chang J.-H., Jin Y.G., Kim N.S. (2015), Estimation of speech absence uncertainty based on multiple linear regression analysis for speech enhancement, Applied Acoustics, 87, 205–211.

29. Sanam T. F., Shahnaz C. (2013), Noisy speech enhancement based on an adaptive threshold and a modified hard thresholding function in wavelet packet domain, Digital Signal Processing, 23, 3, 941–951.

30. Scalart P. (1996), Speech enhancement based on a priori signal to noise estimation, Acoustics, Speech, and Signal Processing, ICASSP Conference Proceedings, IEEE International Conference on, pp. 629–632.

31. Simoncelli E.P., Adelson E.H. (1996), Noise removal via Bayesian wavelet coring, Image Processing Proceedings., International Conference on, pp. 379–382.

32. Tasmaz H., Ercelebi E. (2008), Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments, Digital Signal Processing, 18, 5, 797–812.

33. Weickert T., Benjaminsen C., Kiencke U. (2008), Analytic complex wavelet packets for speech enhancement, Acoustics, Speech and Signal Processing, ICASSP 2008. IEEE International Conference, pp. 3269–3272.

34. Ying G., Mitchell C., Jamieson L. (1993), Endpoint detection of isolated utterances based on a modified Teager energy measurement, Acoustics, Speech, and Signal Processing, ICASSP-93, IEEE International Conference on, pp. 732–735.
