Conditional Random Fields Applied to Arabic Orthographic-Phonetic Transcription

El-Hadi CHERIFI; Mhania GUERTI

doi:10.24425/aoa.2021.136574

Authors

El-Hadi CHERIFI National Polytechnic School, Algeria
Mhania GUERTI National Polytechnic School, Algeria

Abstract

Orthographic-To-Phonetic (O2P) transcription is the process of learning the relationship between a written word and its phonetic transcription. It is a necessary part of Text-To-Speech (TTS) systems and plays an important role in handling Out-Of-Vocabulary (OOV) words in Automatic Speech Recognition systems. O2P is a complex task because, for many languages, the correspondence between orthography and phonetic transcription is not completely consistent. Over time, the techniques used to tackle this problem have evolved from early rule-based systems to the current, more sophisticated machine learning approaches. In this paper, we propose an approach to Arabic O2P conversion based on a probabilistic method: Conditional Random Fields (CRF). We discuss the experiments and results of this method applied to a pronunciation dictionary of the Most Commonly used Arabic Words, a database that we call MCAW-Dic. MCAW-Dic contains over 35 000 words in Modern Standard Arabic (MSA) and their pronunciations; we developed it ourselves with the assistance of phoneticians and linguists from the University of Tlemcen. The results are very satisfactory and point the way towards future innovations: in all our tests, the Phoneme Error Rate (PER) was between 11% and 15%. This result could be improved by including a larger context, but in that case we encountered memory limitations and computational difficulties.
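The abstract reports results as a Phoneme Error Rate but does not say how it is computed. The sketch below assumes the common definition: the total Levenshtein (edit) distance between reference and predicted phoneme sequences, divided by the total number of reference phonemes. The function names and the two toy word pairs are hypothetical illustrations and are not taken from MCAW-Dic.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) phoneme-sequence pairs.

    Assumed definition: summed edit distance divided by the summed
    length of the reference sequences.
    """
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference phonemes to score against")
    return errors / total


# Hypothetical toy data: phoneme symbols are illustrative only.
toy_pairs = [
    (["k", "i", "t", "a:", "b"], ["k", "i", "t", "a", "b"]),  # one substitution
    (["q", "a", "l", "a", "m"], ["q", "a", "l", "a", "m"]),   # exact match
]
toy_per = phoneme_error_rate(toy_pairs)
```

Under this definition, a PER between 11% and 15% would mean roughly 11 to 15 phoneme-level edits per 100 reference phonemes.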

Keywords:

Orthographic-To-Phonetic Transcription, Conditional Random Fields, text-to-speech, Arabic speech synthesis, Modern Standard Arabic

