Archives of Acoustics, 36, 1, pp. 29–47, 2011

Phoneme Segmentation Based on Wavelet Spectra Analysis

Bartosz ZIÓŁKO
AGH University of Science and Technology, Department of Electronics

Suresh MANANDHAR
University of York, Department of Computer Science

Richard C. WILSON
University of York, Department of Computer Science

Mariusz ZIÓŁKO
AGH University of Science and Technology, Department of Electronics

A phoneme segmentation method based on the analysis of discrete wavelet transform
spectra is described. The localization of phoneme boundaries is particularly
useful in speech recognition. It enables one to use more accurate acoustic models
since the length of phonemes provides more information for parametrization.
Our method relies on the values of power envelopes and their first derivatives for
six frequency subbands. Specific scenarios that are typical for phoneme boundaries
are searched for. Discrete times with such events are noted and graded using
a distribution-like event function, which represents the change of the energy distribution
in the frequency domain. The exact definition of this method is described in
the paper. The final decision on localization of boundaries is taken by analysis of
the event function. Boundaries are, therefore, extracted using information from all
subbands. The method was developed on a small set of hand-segmented Polish words
and tested on another large corpus containing 16 425 utterances. A recall and precision
measure specifically designed to measure the quality of speech segmentation
was adapted by using fuzzy sets. With this measure, the method achieved an F-score
of 72.49%.
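
The quantities named above (subband power envelopes, their first derivatives, and the F-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Haar wavelet, the moving-average smoothing, the window length, and all function names are assumptions made for illustration, since the abstract does not specify the wavelet or the exact form of the event function. The F-score helper is the standard harmonic mean and takes precision and recall as given, without reproducing the fuzzy-set adaptation.

```python
import numpy as np


def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:  # pad to even length by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def subband_power_envelopes(signal, levels=6, smooth=5):
    """Smoothed power of the detail coefficients at each decomposition level.

    Assumption: the envelope is a moving average of squared coefficients.
    Each level halves the time resolution, so envelopes differ in length.
    """
    kernel = np.ones(smooth) / smooth
    approx = np.asarray(signal, dtype=float)
    envelopes = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        envelopes.append(np.convolve(detail ** 2, kernel, mode="same"))
    return envelopes


def first_derivatives(envelopes):
    """First-order differences of each envelope (same length as the input)."""
    return [np.diff(e, prepend=e[0]) for e in envelopes]


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy signal: silence followed by a rapidly alternating segment.
toy = np.concatenate([np.zeros(512), np.tile([1.0, -1.0], 256)])
envelopes = subband_power_envelopes(toy)
derivatives = first_derivatives(envelopes)
```

On the toy signal, the finest subband's envelope rises where the alternating segment begins, so its derivative peaks near the middle of that subband's time axis. A boundary detector in the spirit of the abstract would combine such events across all six subbands rather than rely on one.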
Keywords: speech recognition; speech segmentation; discrete wavelet transform
Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN).