Archives of Acoustics,
36, 1, pp. 29–47, 2011
Phoneme Segmentation Based on Wavelet Spectra Analysis
A phoneme segmentation method based on the analysis of discrete wavelet transform
spectra is described. The localization of phoneme boundaries is particularly
useful in speech recognition. It enables one to use more accurate acoustic models
since the length of phonemes provides more information for parametrization.
Our method relies on the values of power envelopes and their first derivatives for
six frequency subbands. Specific scenarios that are typical for phoneme boundaries
are searched for. Discrete times with such events are noted and graded using
a distribution-like event function, which represents the change of the energy distribution
in the frequency domain. The exact definition of this method is described in
the paper. The final decision on the location of boundaries is made by analyzing
the event function. Boundaries are, therefore, extracted using information from all
subbands. The method was developed on a small set of hand-segmented Polish words
and tested on a separate, large corpus containing 16 425 utterances. A recall and precision
measure designed specifically to assess the quality of speech segmentation
was adapted using fuzzy sets. With this measure, an F-score of 72.49%
was obtained.
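
The idea of scoring boundary candidates from subband power changes can be illustrated with a minimal sketch. It is not the method defined in the paper: the Haar wavelet, the frame length of 2**6 samples, the relative threshold, and the function names haar_step, subband_power and event_function are all assumptions made for illustration. The sketch computes the power in six wavelet detail subbands per frame, takes the first difference of each power envelope, and counts how many subbands change sharply at each frame transition.

```python
import math


def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    n = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(n)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(n)]
    return approx, detail


def subband_power(x, levels=6):
    """Power of each detail subband, one value per frame of 2**levels samples."""
    frame_len = 2 ** levels
    n_frames = len(x) // frame_len
    approx = list(x[: n_frames * frame_len])  # drop the incomplete last frame
    bands = []
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        per_frame = 2 ** (levels - level)  # detail coefficients per frame
        bands.append([
            sum(d * d for d in detail[f * per_frame:(f + 1) * per_frame])
            for f in range(n_frames)
        ])
    return bands


def event_function(bands, rel_threshold=0.5):
    """Hypothetical event score: number of subbands whose power jumps
    by more than rel_threshold * (band peak) between adjacent frames."""
    n_frames = len(bands[0])
    events = [0] * max(n_frames - 1, 0)
    for band in bands:
        peak = max(band) if band else 0.0
        if peak == 0.0:
            continue  # silent subband carries no evidence
        for t in range(n_frames - 1):
            if abs(band[t + 1] - band[t]) > rel_threshold * peak:
                events[t] += 1
    return events


# Toy signal: silence followed by a rapidly alternating segment.
signal = [0.0] * 128 + [1.0 if i % 2 == 0 else -1.0 for i in range(128)]
events = event_function(subband_power(signal))
```

In this toy case only the highest-frequency subband carries energy, so the single nonzero event score marks the transition from silence to the alternating segment. A practical implementation would more likely use a longer wavelet filter and smoothed envelopes.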
Keywords:
speech recognition; speech segmentation; discrete wavelet transform
Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN).