Prosody annotation for unit selection TTS synthesis

This paper concerns prosody annotation and intonation modeling, particularly as applied to corpus-based speech synthesis. To establish rules for automatic intonation modeling, a four-hour, fully annotated speech database was analyzed acoustically and perceptually. The speech material included different text types, dialogs, and prosodically rich phrases.
These analyses yielded a basic prosodic annotation scheme that distinguishes 6 pitch accent types and 5 types of prosodic phrases. They also made it possible to define rules for semi-automatic stylization and parameterization of intonation contours for use in text-to-speech and speech recognition systems. The assumptions behind the stylization method are discussed, together with the results of a quantitative and qualitative evaluation of stylization accuracy. The evaluation was based on speech material of ca. 1000 phrases from a literary text read by female and male speakers. Finally, a classification of pitch accents and boundary tones based on the parameterization is presented.
Keywords: speech synthesis and recognition, segmental and suprasegmental (prosodic) annotation, intonation modeling, intonation stylization, pitch accents, boundary tones
