Archives of Acoustics, 41, 2, pp. 233–243, 2016

Detection of Sentence Boundaries in Polish Based on Acoustic Cues

Magdalena IGRAS
AGH University of Science and Technology in Krakow

Bartosz ZIÓŁKO
AGH University of Science and Technology in Krakow

This article presents experiments on annotating sentence boundaries in Polish speech using acoustic cues as the source of information. The main result is an algorithm that detects syntactic boundaries at the positions where punctuation marks occur. In the first stage, the algorithm detects pauses and divides the speech signal into segments. In the second stage, it examines the configuration of acoustic features and hypothesizes the positions of punctuation marks. Classification uses parameters describing phone duration and energy, speaking rate, fundamental frequency contours, and frequency bands. The best results were achieved with a Naive Bayes classifier. The algorithm reaches 52% precision and 98% recall. Another significant outcome of the research is a set of statistical models of the acoustic cues correlated with punctuation in spoken Polish.
Keywords: punctuation, sentence boundary, spoken language, prosody, Polish
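
The first stage described in the abstract (pause detection followed by segmentation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how pauses are detected, so a simple short-time energy threshold is assumed here, and the function names, the 10 ms frame length, the -40 dB threshold relative to the loudest frame, and the 200 ms minimum pause duration are all hypothetical choices made for illustration.

```python
import math


def frame_energies_db(samples, frame_len):
    """Short-time energy in dB for consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small offset avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def split_on_pauses(samples, rate, frame_ms=10, threshold_db=-40.0,
                    min_pause_ms=200):
    """Return (start_s, end_s) speech segments separated by detected pauses.

    Assumption (not from the paper): a pause is a run of frames whose
    energy is more than |threshold_db| below the loudest frame and that
    lasts at least min_pause_ms.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    peak = max(energies)
    silent = [e < peak + threshold_db for e in energies]
    min_frames = max(1, int(min_pause_ms // frame_ms))

    # Collect silent runs that are long enough to count as pauses.
    pauses = []
    run_start = None
    for i, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i))
            run_start = None

    # Speech segments are the stretches between consecutive pauses.
    segments = []
    cursor = 0
    for pause_start, pause_end in pauses:
        if pause_start > cursor:
            segments.append((cursor, pause_start))
        cursor = pause_end
    if cursor < len(silent):
        segments.append((cursor, len(silent)))

    frame_s = frame_len / rate
    return [(s * frame_s, e * frame_s) for s, e in segments]


if __name__ == "__main__":
    # Synthetic check: 0.5 s tone, 0.3 s silence, 0.5 s tone at 8 kHz.
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(4000)]
    signal = tone + [0.0] * 2400 + tone
    print(split_on_pauses(signal, rate))
```

In a full system, each returned segment boundary would then be passed to the second stage, where duration, energy, speaking-rate, and fundamental-frequency features are extracted and a classifier decides whether the boundary corresponds to a punctuation mark.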

DOI: 10.1515/aoa-2016-0023

Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)