Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results

Gražina KORVEL; Olga KURASOVA; Bożena KOSTEK

doi:10.24425/aoa.2019.129725

Authors

Gražina KORVEL, Vilnius University, Lithuania
Olga KURASOVA, Vilnius University, Lithuania
Bożena KOSTEK, Gdansk University of Technology, Poland

Abstract

The goal of this research is to find a set of acoustic parameters that capture the differences between Polish and Lithuanian consonants. To identify these differences, an acoustic analysis is performed in which phoneme sounds are described as vectors of acoustic parameters. The parameters, both time- and frequency-domain descriptors, include those known from the speech domain as well as those from the music information retrieval area. English is used as an auxiliary language in the experiments. In the first part of the experiments, Lithuanian and Polish language samples are analyzed, features are extracted, and the most discriminating ones are determined. In the second part, automatic classification of Lithuanian/English, Polish/English, and Lithuanian/Polish phonemes is performed.
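
As a rough illustration of what describing a sound segment as a vector of time- and frequency-domain descriptors can look like, the sketch below computes three common descriptors for a single frame. It is not the parameter set used in the paper: the choice of zero-crossing rate, RMS energy, and spectral centroid, all function names, and the synthetic 1 kHz test tone are assumptions made only for this example.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Time-domain descriptor: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Time-domain descriptor: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def spectral_centroid(frame, sample_rate):
    """Frequency-domain descriptor: magnitude-weighted mean frequency in Hz.

    Uses a naive DFT over the non-negative frequency bins, which is slow
    but keeps the example free of external dependencies.
    """
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def feature_vector(frame, sample_rate):
    """Describe one frame as a vector of acoustic parameters (hypothetical set)."""
    return [
        zero_crossing_rate(frame),
        rms_energy(frame),
        spectral_centroid(frame, sample_rate),
    ]


# Synthetic stand-in for a speech frame: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
features = feature_vector(frame, sample_rate)
```

In a setting like the one the abstract describes, vectors of this kind, computed per phoneme segment, would be the input both to the search for the most discriminating features and to the classifier.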

Keywords:

acoustic analysis, consonant phonemes, acoustic parameters, machine learning methods

