10.24425/aoa.2020.134058
A Study on the Impact of Lombard Effect on Recognition of Hindi Syllabic Units Using CNN Based Multimodal ASR Systems
References
Abdel-Hamid O., Mohamed A., Jiang H., Penn G. (2012), Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, [in:] 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4277–4280, doi: 10.1109/ICASSP.2012.6288864.
Alexanderson S., Beskow J. (2014), Animated Lombard speech: motion capture, facial animation and visual intelligibility of speech produced in adverse conditions, Computer Speech & Language, 28(2): 607–618, doi: 10.1016/j.csl.2013.02.005.
Boril H. (2008), Robust speech recognition: Analysis and equalization of Lombard effect in Czech corpora, Ph.D. thesis, Czech Technical University in Prague, Czech Rep., https://personal.utdallas.edu/_hynek/.
Boril H., Hansen J.H. (2010), Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments, IEEE Transactions on Audio, Speech, and Language Processing, 18(6): 1379–1393, doi: 10.1109/TASL.2009.2034770.
Bou-Ghazale S.E., Hansen J.H. (1994), Duration and spectral based stress token generation for HMM speech recognition under stress, [in:] 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94, Vol. 1, pp. I/413–I/416, doi: 10.1109/ICASSP.1994.389268.
Davis C., Kim J., Grauwinkel K., Mixdorff H. (2006), Lombard speech: auditory (A), visual (V) and AV effects, [in:] Proceedings of the Third International Conference on Speech Prosody, Citeseer, pp. 248–252.
Drugman T., Dutoit T. (2010), Glottal-based analysis of the Lombard effect, [in:] Interspeech, pp. 2610–2613.
Garnier M., Henrich N. (2014), Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?, Computer Speech & Language, 28(2): 580–597, doi: 10.1016/j.csl.2013.07.005.
Garnier M., Henrich N., Dubois D. (2010), Influence of sound immersion and communicative interaction on the Lombard effect, Journal of Speech, Language, and Hearing Research, 53(3): 588–608, doi: 10.1044/1092-4388(2009/08-0138).
Graciarena M., Franco H., Sonmez K., Bratt H. (2003), Combining standard and throat microphones for robust speech recognition, IEEE Signal Processing Letters, 10(3): 72–74, doi: 10.1109/LSP.2003.808549.
Hansen J.H. (1994), Morphological constrained feature enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect, IEEE Transactions on Speech and Audio Processing, 2(4): 598–614, doi: 10.1109/89.326618.
Hansen J.H., Bria O.N. (1990), Lombard effect compensation for robust automatic speech recognition in noise, [in:] First International Conference on Spoken Language Processing, pp. 1125–1128, https://www.isca-speech.org/archive/icslp_1990/i90_1125.html.
Hansen J.H., Varadarajan V. (2009), Analysis and compensation of Lombard speech across noise type and levels with application to in-set/out-of-set speaker recognition, IEEE Transactions on Audio, Speech, and Language Processing, 17(2): 366–378, doi: 10.1109/TASL.2008.2009019.
Heracleous P., Ishi C.T., Sato M., Ishiguro H., Hagita N. (2013), Analysis of the visual Lombard effect and automatic recognition experiments, Computer Speech & Language, 27(1): 288–300, doi: 10.1016/j.csl.2012.06.003.
Hinton G. et al. (2012), Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 29(6): 82–97, doi: 10.1109/MSP.2012.2205597.
Jou S.-C., Schultz T., Waibel A. (2004), Adaptation for soft whisper recognition using a throat microphone, [in:] Eighth International Conference on Spoken Language Processing, pp. 1493–1496, https://www.isca-speech.org/archive/interspeech_2004/i04_1493.html.
Junqua J.-C., Anglade Y. (1990), Acoustic and perceptual studies of Lombard speech: application to isolated-words automatic speech recognition, [in:] International Conference on Acoustics, Speech, and Signal Processing, ICASSP-90, Vol. 2, pp. 841–844, doi: 10.1109/ICASSP.1990.115969.
Khan A.N., Gangashetty S.V., Yegnanarayana B. (2003), Syllabic properties of three Indian languages: implications for speech recognition and language identification, [in:] International Conference on Natural Language Processing, pp. 125–134.
Lane H., Tranel B. (1971), The Lombard sign and the role of hearing in speech, Journal of Speech, Language, and Hearing Research, 14(4): 677–709, doi: 10.1044/jshr.1404.677.
Lombard E. (1911), The sign of the elevation of the voice [in French: Le signe de l’élévation de la voix], Annales des Maladies de l’Oreille, du Larynx, du Nez et du Pharynx, 37(2): 101–119.
Marxer R., Barker J., Alghamdi N., Maddock S. (2018), The impact of the Lombard effect on audio and visual speech recognition systems, Speech Communication, 100: 58–68, doi: 10.1016/j.specom.2018.04.006.
Palaz D., Collobert R., Magimai-Doss M. (2013), Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, CoRR, Vol. abs/1304.1018, online, http://arxiv.org/abs/1304.1018.
Pisoni D., Bernacki R., Nusbaum H., Yuchtman M. (1985), Some acoustic-phonetic correlates of speech produced in noise, [in:] IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’85, Vol. 10, pp. 1581–1584, doi: 10.1109/ICASSP.1985.1168217.
Rajasekaran P., Doddington G., Picone J. (1986), Recognition of speech under stress and in noise, [in:] IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’86, Vol. 11, pp. 733–736, doi: 10.1109/ICASSP.1986.1169207.
Roucos S., Viswanathan V., Henry C., Schwartz R. (1986), Word recognition using multisensor speech input in high ambient noise, [in:] IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’86, pp. 737–740, doi: 10.1109/ICASSP.1986.1169208.
Sainath T.N., Mohamed A., Kingsbury B., Ramabhadran B. (2013), Deep convolutional neural networks for LVCSR, [in:] 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8614–8618, doi: 10.1109/ICASSP.2013.6639347.
Shahina A. (2007), Processing throat microphone speech, Ph.D. thesis, Indian Institute of Technology, Madras.
Shahina A., Yegnanarayana B. (2007), Mapping speech spectra from throat microphone to close-speaking microphone: a neural network approach, EURASIP Journal on Advances in Signal Processing, 2007: 087219, doi: 10.1155/2007/87219.
Sadasivam U.M., Shahina A., Khan A.N., Divya J. (2015), Spectral transformation of Lombard speech to normal speech for speaker recognition systems, [in:] International Conference on Soft Computing Systems.