Voiceless Stop Consonant Modelling and Synthesis Framework Based on MISO Dynamic System

Gražina KORVEL; Bożena KOSTEK

doi:10.1515/aoa-2017-0039

Authors

Gražina KORVEL Vilnius University, Lithuania
Bożena KOSTEK Audio Acoustics Lab., Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland

Abstract

A framework for modelling and synthesising voiceless stop consonant phonemes is proposed, in which the low-frequency range and the high-frequency range of a phoneme are modelled separately. The phoneme signal is decomposed into a sum of simpler basic components and described as the output of a linear multiple-input and single-output (MISO) system. The impulse response of each channel is a third-order quasipolynomial. Within this framework, the limit between the two frequency ranges is determined, and a new three-step algorithm for finding this limit point is given in this paper. The input of the low-frequency component is equal to one, and the impulse response generates the whole component. The high-frequency component appears when the system is excited by semi-periodic impulses; the filter impulse response of this component model is single-period and decays after three periods. Application of the proposed modelling framework to voiceless stop consonant phonemes has shown that the quality of the model is sufficiently good.
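The abstract describes the phoneme as the output of a linear MISO system, y(n) = Σ_k (u_k * h_k)(n), where channel k has input u_k and impulse response h_k. The Python sketch below illustrates that structure under several assumptions that are not taken from the paper: each third-order quasipolynomial impulse response is assumed to have the form exp(-λt)(c0 + c1·t + c2·t² + c3·t³)·sin(2πft + φ); the low-frequency input "equal to one" is read as a single unit impulse at the onset, so the impulse response alone generates that component; and the semi-periodic excitation of the high-frequency component is represented by a hand-picked impulse train with small timing offsets, with its impulse response truncated to three nominal periods. The function names quasipolynomial_ir, channel_output and miso_output and all numerical values are hypothetical, and the three-step limit-point search is not modelled.

```python
import math

def quasipolynomial_ir(n_samples, fs, terms):
    """Sampled impulse response h[n] = sum over terms of
    exp(-damping * t) * (c0 + c1*t + c2*t**2 + c3*t**3) * sin(2*pi*freq*t + phase),
    with t = n / fs. `terms` is a list of (damping, (c0, c1, c2, c3), freq, phase)
    tuples; this parameterisation is an assumption, not the paper's exact model."""
    h = []
    for n in range(n_samples):
        t = n / fs
        value = 0.0
        for damping, coeffs, freq, phase in terms:
            poly = sum(c * t ** i for i, c in enumerate(coeffs))  # third-order polynomial in t
            value += math.exp(-damping * t) * poly * math.sin(2 * math.pi * freq * t + phase)
        h.append(value)
    return h

def channel_output(h, impulse_times, length):
    """Response of one channel to unit impulses at the given sample indices
    (discrete convolution of an impulse train with h), truncated to `length`."""
    y = [0.0] * length
    for t0 in impulse_times:
        for i, v in enumerate(h):
            if t0 + i >= length:
                break
            y[t0 + i] += v
    return y

def miso_output(channels, length):
    """Single output of the MISO system: the sum of all channel responses."""
    y = [0.0] * length
    for h, impulse_times in channels:
        for i, v in enumerate(channel_output(h, impulse_times, length)):
            y[i] += v
    return y

# Hypothetical example values (not taken from the paper).
fs = 16000      # sampling rate in Hz
length = 800    # 50 ms of signal
period = 40     # nominal spacing of the high-frequency excitation, in samples

# Low-frequency channel: a single unit impulse at n = 0, so the channel
# output is the impulse response itself.
h_low = quasipolynomial_ir(length, fs, [(300.0, (0.0, 1.0, 0.0, 0.0), 500.0, 0.0)])

# High-frequency channel: a short impulse response (three nominal periods)
# driven by a semi-periodic impulse train with small, fixed timing offsets.
h_high = quasipolynomial_ir(3 * period, fs, [(2000.0, (1.0, 0.0, 0.0, 0.0), 4000.0, 0.0)])
offsets = [0, 2, -1, 3, 0, -2, 1, 0, 2, -1]
high_times = [k * period + d for k, d in enumerate(offsets)]

phoneme = miso_output([(h_low, [0]), (h_high, high_times)], length)
```

Because the system is linear, each channel can be fitted and inspected on its own and the synthesised phoneme is simply the sum of the channel outputs; a practical implementation would replace the explicit loops with a vectorised convolution.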

Keywords:

speech synthesis, consonant phonemes, phoneme modelling framework, MISO system
