Archives of Acoustics, 31, 3, pp. 275-288, 2006

Vowel recognition based on acoustic and visual features

P. DALKA
Gdańsk University of Technology, Multimedia Systems Department

B. KOSTEK
Gdańsk University of Technology, International Center of Hearing and Speech

A. CZYŻEWSKI
Gdańsk University of Technology, Multimedia Systems Department

The aim of the research presented is to demonstrate a system that may facilitate speech training for hearing-impaired people. The system combines acoustic and visual modules for vowel data acquisition and analysis. Acoustic feature extraction is based on mel-cepstral analysis. The Active Shape Model method is used to extract visual speech features from the shape and movement of the lips. Artificial Neural Networks (ANNs) are used as the classifier; the extracted feature vectors combine both modalities of human speech. The system is validated with recordings of speakers who were used neither for creating the lip model nor for training the ANN. Additional experiments with degraded acoustic information are carried out to test the robustness of the system against various distortions affecting speech utterances.
Keywords: bi-modal automatic speech utterance recognition, phoneme visual and acoustic features, artificial neural networks, Active Shape Model method

Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)