A Symmetric Approach in the Three-Dimensional Digital Waveguide Modeling of the Vocal Tract
Abstract
Simulating wave propagation in three-dimensional (3D) models of the vocal tract has shown significant promise for improving the accuracy of speech production modeling. Recent 3D waveguide models of the vocal tract achieve better accuracy but are computationally expensive, which motivates work on reducing the computational cost while retaining accuracy and performance. In the current work, we divide the geometry of the vocal tract into four equal symmetric parts by introducing two perpendicular axial planes, and the simulation is performed on only one part. A novel strategy is defined to implement the symmetry conditions in the mesh. The complete standard 3D digital waveguide model serves as the benchmark. The proposed model is compared with the benchmark model in terms of formant frequencies and efficiency. For demonstration, the vowels /O/, /i/, /E/, /A/, and /u/ are simulated. According to the results, the benchmark and proposed models are nearly identical in terms of frequency profiles and formant frequencies, yet the proposed model is three times more efficient than the benchmark model.
Keywords:
symmetric, digital waveguide, vocal tract, delay lines, rectilinear uniform grid
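The quarter-domain idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a uniform rectilinear 3D mesh written in its equivalent finite-difference form, a simple box-shaped domain instead of a vocal tract geometry, and pressure-release outer boundaries (zero ghost nodes); the names `step` and `run` are hypothetical. The two symmetry planes are imposed by copying the first layer of nodes into the ghost layer, so the quarter mesh sees the mirror image it would see in the full mesh.

```python
import numpy as np

def step(p_now, p_prev, symmetric_low_xy):
    """One update of a rectilinear 3D mesh in finite-difference form:
    p(n+1) = (1/3) * sum of the six neighbours at n, minus p(n-1)."""
    g = np.pad(p_now, 1)  # zero ghost layer: pressure-release walls (assumption)
    if symmetric_low_xy:
        g[0, :, :] = g[1, :, :]  # mirror across the symmetry plane normal to x
        g[:, 0, :] = g[:, 1, :]  # mirror across the symmetry plane normal to y
    neighbours = (
        g[2:, 1:-1, 1:-1] + g[:-2, 1:-1, 1:-1]
        + g[1:-1, 2:, 1:-1] + g[1:-1, :-2, 1:-1]
        + g[1:-1, 1:-1, 2:] + g[1:-1, 1:-1, :-2]
    )
    return neighbours / 3.0 - p_prev

def run(p0, n_steps, symmetric_low_xy):
    """Advance the mesh n_steps from initial pressure p0 (previous step = 0)."""
    p_prev = np.zeros_like(p0)
    p_now = p0.copy()
    for _ in range(n_steps):
        p_now, p_prev = step(p_now, p_prev, symmetric_low_xy), p_now
    return p_now

rng = np.random.default_rng(0)
M, L = 4, 12                                  # quarter cross-section M x M, length L
quarter0 = rng.standard_normal((M, M, L))     # arbitrary initial pressure on one quarter

# Build the full mesh by mirroring the quarter across both symmetry planes.
half = np.concatenate([quarter0[::-1], quarter0], axis=0)
full0 = np.concatenate([half[:, ::-1], half], axis=1)

full = run(full0, 40, symmetric_low_xy=False)       # benchmark: whole domain
quarter = run(quarter0, 40, symmetric_low_xy=True)  # one quarter with symmetry planes

match = np.allclose(full[M:, M:, :], quarter)
print(match, full.size // quarter.size)
```

In this sketch the quarter mesh updates one quarter of the nodes and reproduces the corresponding quadrant of the full mesh up to rounding, provided the excitation is itself symmetric about both planes. The node-count ratio is not a measured speed-up; actual timing depends on the boundary handling and implementation.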

