Archives of Acoustics, 39, 3, pp. 411-420, 2014
DOI: 10.2478/aoa-2014-0045

Two-Microphone Dereverberation for Automatic Speech Recognition of Polish

Mikolaj KUNDEGORSKI
School of Engineering and Computing Sciences, Durham University, Durham, UK

Philip J.B. JACKSON
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK

Bartosz ZIÓŁKO
Department of Electronics, AGH University of Science and Technology, Kraków, Poland

Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates a novel combination of precedence, binaural and statistical-independence cues for enhancing reverberant speech prior to ASR, under adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and recognition accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other cue combinations and the spectral-subtraction baseline.
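The spectral-subtraction baseline referred to above (Boll, 1979) can be sketched in a few lines: subtract an estimate of the noise magnitude spectrum from each short-time frame and resynthesise with the noisy phase. The function name, parameters and noise-estimation strategy below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def spectral_subtraction(x, frame_len=512, hop=256, noise_frames=5,
                         alpha=2.0, floor=0.01):
    """Basic magnitude spectral subtraction (in the spirit of Boll, 1979).

    The noise spectrum is estimated from the first `noise_frames` frames,
    which are assumed to contain no speech (an illustrative assumption).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.zeros(len(x))
    # Average noise magnitude spectrum from the leading (speech-free) frames.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
         for i in range(noise_frames)], axis=0)
    for i in range(n_frames):
        seg = window * x[i * hop:i * hop + frame_len]
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        # Over-subtract the noise estimate, keeping a spectral floor to
        # limit musical-noise artefacts.
        clean = np.maximum(mag - alpha * noise_mag, floor * mag)
        # Resynthesise with the noisy phase and overlap-add.
        out[i * hop:i * hop + frame_len] += np.fft.irfft(
            clean * np.exp(1j * np.angle(spec)), frame_len)
    return out
```

Single-channel subtraction of this kind addresses additive noise rather than reverberant smearing, which is why the paper treats it only as a baseline against the two-microphone cues.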
Keywords: speech enhancement; reverberation; ASR; Polish.
Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN).

References

Alinaghi, A., Wang, W., Jackson, P. J. B., 2011. Integrating binaural cues and blind source separation method for separating reverberant speech mixtures. In: Proc. of ICASSP, Prague. pp. 209–212.

Blauert, J., 1997. Spatial Hearing: The Psychophysics of Human Sound Localization, 2nd Edition. MIT Press.

Boll, S., 1979. Suppression of acoustic noise in speech using spectral subtraction. Acoustics, Speech and Signal Processing, IEEE Trans. on 27 (2), 113–120.

Chien, J.-T., Lai, P.-Y., 2005. Car speech enhancement using a microphone array. Int. Journal of Speech Technology 8, 79–91.

Drgas, S., Kociński, J., Sęk, A., 2008. Logatom articulation index evaluation of speech enhanced by blind source separation and single-channel noise reduction. Archives of Acoustics 33 (4).

Fukumori, T., Nakayama, M., Nishiura, T., Yamashita, Y., Oct 2013. Estimation of speech recognition performance in noisy and reverberant environments using PESQ score and acoustic parameters. In: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific. pp. 1–4.

Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., Zue, V., 1993. TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium, Philadelphia.

Gomez, R., Kawahara, T., 2010. Robust speech recognition based on derever- beration parameter optimization using acoustic model likelihood. Audio, Speech and Language Processing, IEEE Trans. on 18 (7), 1708–1716.

Grocholewski, S., 1998. First database for spoken Polish. In: Proc. of International Conference on Language Resources and Evaluation, Granada. pp. 1059–1062.

Hartmann, W. M., 1999. How we localize sound. Physics Today 52 (11), 24–29.

Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B., 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29 (6), 82.

Hummersone, C., Mason, R., Brookes, T., 2010. Dynamic precedence effect modeling for source separation in reverberant environments. Audio, Speech, and Language Processing, IEEE Trans. on 18 (7), 1867–1871.

Jeub, M., Schafer, M., Esch, T., Vary, P., 2010. Model-based dereverberation preserving binaural cues. Audio, Speech, and Language Processing, IEEE Trans. on 18 (7), 1732–1745.

Krishnamoorthy, P., Prasanna, S., 2009. Reverberant speech enhancement by temporal and spectral processing. Audio, Speech, and Language Processing, IEEE Trans. on 17 (2), 253–266.

Leonard, R., Doddington, G., 1993. TIDIGITS. Linguistic Data Consortium, Philadelphia.

Li, K., Guo, Y., Fu, Q., Yan, Y., Jan 2012. A two microphone-based approach for speech enhancement in adverse environments. In: Consumer Electronics (ICCE), 2012 IEEE International Conference on. pp. 41–42.

Litovsky, R., Colburn, H., Yost, W., Guzman, S., Oct. 1999. The precedence effect. J. Acoust. Soc. Am. 106, 1633–1654.

Mandel, M., Weiss, R., Ellis, D., 2010. Model-based expectation-maximization source separation and localization. Audio, Speech, and Language Processing, IEEE Trans. on 18 (2), 382–394.

Nakatani, T., Kinoshita, K., Miyoshi, M., 2007. Harmonicity-based blind dereverberation for single-channel speech signals. Audio, Speech, and Language Processing, IEEE Trans. on 15 (1), 80–95.

Naylor, P. A., Gaubitch, N. D., 2005. Speech dereverberation. In: Proc. of Int. Workshop Acoust. Echo Noise Control, Eindhoven.

Palomaki, K. J., Brown, G. J., Wang, D., 2004. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Communication 43 (4), 361–378.

Pearce, D., Hirsch, H., 2000. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ISCA ITRW ASR. pp. 29–32.

Pearson, J., Lin, Q., Che, C., Yuk, D.-S., Jin, L., de Vries, B., Flanagan, J., 1996. Robust distant-talking speech recognition. In: Proc. of ICASSP, Atlanta. Vol. 1. pp. 21–24.

Sawada, H., Araki, S., Makino, S., 2007. A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures. In: Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. pp. 139–142.

Seltzer, M. L., Raj, B., Stern, R. M., 2004. Likelihood-maximizing beamforming for robust hands-free speech recognition. Speech and Audio Processing, IEEE Trans. on 12, 489–498.

Shi, G., Aarabi, P., 2003. Robust digit recognition using phase-dependent time-frequency masking. In: Proc. of ICASSP, Hong Kong. pp. 684–687.

Vincent, E., Gribonval, R., Fevotte, C., 2006. Performance measurement in blind audio source separation. Audio, Speech, and Language Processing, IEEE Trans. on 14 (4), 1462–1469.

Ward, D., Kennedy, R., Williamson, R., 2001. Constant directivity beamforming. In: Microphone Arrays. Springer-Verlag.

Wu, M., Wang, D., 2006. A two-stage algorithm for one-microphone reverberant speech enhancement. Audio, Speech, and Language Processing, IEEE Trans. on 14, 774–784.

Young, S. J., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., 2006. The HTK Book Version 3.4. Cambridge University Press.

Ziółko, B., Manandhar, S., Wilson, R., Ziółko, M., Gałka, J., 2008. Application of HTK to the Polish language. In: Proc. of International Conference on Audio, Language and Image Processing, Shanghai.

Ziółko, M., Gałka, J., Ziółko, B., Jadczyk, T., Skurzok, D., Mąsior, M., 2011. Automatic speech recognition system dedicated for Polish. In: Proc. of Interspeech, Florence.
