Non-uniform Rectilinear Grid in the Waveguide Modeling of the Vocal Tract

Authors

  • Tahir Mushtaq QURESHI COMSATS University Islamabad, Pakistan
  • Khalid Saifullah SYED Bahauddin Zakariya University, Pakistan
  • Asim ZAFAR COMSATS University Islamabad, Pakistan

Abstract

For many years, digital waveguide models have been used to simulate sound propagation in the vocal tract, using a structured, uniform mesh of scattering junctions connected by identical delay lines. The layout of such a mesh, called its topology, can take many forms. This work presents a two-dimensional digital waveguide model of sound propagation in the vocal tract built on a structured, non-uniform rectilinear grid. The model uses two types of delay lines: smaller-delay lines and larger-delay lines, where each larger-delay line has twice the delay of a smaller-delay line. Combining the two types generates the non-uniform rectilinear two-dimensional waveguide mesh. The advantage of this approach is that the transfer function can be obtained without fractional delays. This eliminates the need for interpolation to approximate fractional delays and gives an efficient simulation of sound wave propagation in the two-dimensional waveguide model of the vocal tract. Simulations were performed for the vowels /ɔ/, /a/, /i/ and /u/. The standard two-dimensional waveguide model with a uniform mesh, run at the same sampling frequency, serves as the benchmark. The results and efficiency of the proposed model are compared with those of this benchmark model.
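
The integer-delay idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `DelayLine`, `node_positions` and `impulse_response` are hypothetical, the smaller delay is assumed to be one sample and the larger delay two samples, and grid spacing is assumed to be proportional to delay. Because both delays are whole numbers of samples, each line is a plain FIFO buffer and no interpolation filter is required.

```python
from collections import deque


class DelayLine:
    """Integer-sample delay line implemented as a fixed-length FIFO.

    Assumption for this sketch: a smaller-delay line is 1 sample long and
    a larger-delay line is 2 samples long (twice the smaller delay).
    """

    def __init__(self, n_samples):
        if n_samples not in (1, 2):
            raise ValueError("this sketch only models 1- or 2-sample delays")
        # Buffer starts silent; maxlen makes append() discard the oldest value.
        self.buf = deque([0.0] * n_samples, maxlen=n_samples)

    def tick(self, x):
        """Push one input sample and return the sample delayed by n_samples."""
        y = self.buf[0]      # oldest stored sample leaves the line
        self.buf.append(x)   # newest sample enters the line
        return y


def node_positions(delays, d_small=1.0):
    """Node coordinates along one axis of a non-uniform rectilinear grid.

    `delays` lists the delay (1 or 2 samples) of each line between adjacent
    junctions; spacing is assumed proportional to delay (hypothetical helper).
    """
    positions = [0.0]
    for n in delays:
        if n not in (1, 2):
            raise ValueError("delays must be 1 (smaller) or 2 (larger)")
        positions.append(positions[-1] + n * d_small)
    return positions


def impulse_response(line, length=4):
    """Feed a unit impulse into a delay line and collect its output."""
    return [line.tick(1.0 if k == 0 else 0.0) for k in range(length)]


if __name__ == "__main__":
    print(impulse_response(DelayLine(1)))  # impulse emerges after 1 sample
    print(impulse_response(DelayLine(2)))  # impulse emerges after 2 samples
    print(node_positions([1, 2, 2, 1]))    # unevenly spaced junctions
```

A line whose length was not a whole multiple of the smaller delay (1.5 samples, say) could not be represented by such a buffer and would need a fractional-delay interpolator, which is the cost the non-uniform grid with doubled delay lines is designed to avoid.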

Keywords:

non-linear mesh, waveguide, delay lines
