Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results

Authors

  • Gražina KORVEL, Vilnius University, Lithuania
  • Olga KURASOVA, Vilnius University, Lithuania
  • Bożena KOSTEK, Gdansk University of Technology, Poland

Abstract

The goal of this research is to find a set of acoustic parameters that capture the differences between Polish and Lithuanian consonants. To identify these differences, an acoustic analysis is performed in which each phoneme sound is described as a vector of acoustic parameters. The parameters comprise time- and frequency-domain descriptors, drawn both from the speech domain and from music information retrieval. English is used as an auxiliary language in the experiments. In the first part of the experiments, Lithuanian and Polish language samples are analyzed, features are extracted, and the most discriminating ones are determined. In the second part, automatic classification of Lithuanian/English, Polish/English, and Lithuanian/Polish phonemes is performed.
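
The general idea of describing a sound as a vector of time- and frequency-domain descriptors and then ranking the descriptors by how well they separate two groups can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two descriptors (zero-crossing rate and spectral centroid), the Fisher ratio as the discrimination score, the function names, and the synthetic tones that stand in for recorded phonemes.

```python
import numpy as np


def zero_crossing_rate(x):
    """Time-domain descriptor: fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(x)
    return float(np.mean(signs[1:] != signs[:-1]))


def spectral_centroid(x, sample_rate):
    """Frequency-domain descriptor: magnitude-weighted mean frequency in Hz."""
    magnitudes = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    total = magnitudes.sum()
    return float((freqs * magnitudes).sum() / total) if total > 0 else 0.0


def feature_vector(x, sample_rate):
    """Describe one sound segment as a vector of acoustic parameters."""
    return np.array([zero_crossing_rate(x), spectral_centroid(x, sample_rate)])


def fisher_ratio(a, b):
    """Per-feature score: squared difference of class means over summed class variances.

    a and b are arrays of shape (n_samples, n_features), one per class.
    """
    numerator = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    denominator = a.var(axis=0) + b.var(axis=0)
    return numerator / np.maximum(denominator, 1e-12)


def demo():
    """Rank the two descriptors on synthetic stand-in data (not real speech)."""
    rng = np.random.default_rng(0)
    sample_rate = 16000
    t = np.arange(1024) / sample_rate

    def noisy_tone(freq_hz):
        # Hypothetical stand-in for a phoneme segment: a tone plus white noise.
        return np.sin(2 * np.pi * freq_hz * t) + 0.3 * rng.standard_normal(t.size)

    group_a = np.array([feature_vector(noisy_tone(300.0), sample_rate) for _ in range(20)])
    group_b = np.array([feature_vector(noisy_tone(3000.0), sample_rate) for _ in range(20)])

    scores = fisher_ratio(group_a, group_b)
    names = ["zero_crossing_rate", "spectral_centroid"]
    for index in np.argsort(scores)[::-1]:
        print(names[index], scores[index])


if __name__ == "__main__":
    demo()
```

A higher score means the class means lie further apart relative to the within-class spread. A real study would compute many more descriptors on labelled recordings and would typically validate the selected ones with a classifier.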

Keywords:

acoustic analysis, consonant phonemes, acoustic parameters, machine learning methods
