Abstract
For many years, digital waveguide models have been used to simulate sound propagation in the vocal tract. These models typically use a structured, uniform mesh of scattering junctions connected by delay lines of equal length. The mesh grid can be laid out in many different ways, and these layouts are called topologies. The present work develops a two-dimensional digital waveguide mesh model of sound propagation in the vocal tract on a structured, non-uniform rectilinear grid. Two types of delay line are used: smaller-delay lines and larger-delay lines, where the larger delay is double the smaller delay. Combining the two types produces a non-uniform rectilinear two-dimensional waveguide mesh. The advantage of this approach is that the resulting transfer function contains no fractional delays. This removes the need for interpolation to approximate fractional delays and yields an efficient simulation of sound wave propagation in the two-dimensional waveguide model of the vocal tract. Simulations were performed for the vowels /ɔ/, /a/, /i/ and /u/. The benchmark is the standard two-dimensional waveguide model with a uniform mesh, run at the same sampling frequency. The results and efficiency of the proposed model are compared with those of the benchmark model.
Keywords:
non-linear mesh, waveguide, delay lines


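The abstract describes a two-dimensional rectilinear waveguide mesh in which some branches carry a smaller delay and others a delay twice as long, so that every delay is a whole number of samples. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes:

- equal-impedance four-port scattering junctions, where the junction pressure is (2/N) times the sum of the incoming waves, with N = 4;
- a smaller delay of one sample and a larger delay of two samples;
- a hypothetical layout in which only the horizontal branches between columns 2 and 3 are larger-delay lines;
- edge ports terminated by a simplified one-sample reflecting loop with coefficient `r_wall`.

The names `build_mesh`, `step`, `energy` and `branch_delay` are illustrative only.

```python
from collections import deque

# Port order at every junction: 0 = north, 1 = east, 2 = south, 3 = west.
OPPOSITE = (2, 3, 0, 1)
OFFSET = ((-1, 0), (0, 1), (1, 0), (0, -1))


def build_mesh(rows, cols, branch_delay, r_wall=1.0):
    """Create one delay line per directed branch, keyed by its source port.

    branch_delay(a, b) must be symmetric and return a whole number of samples.
    Ports on the grid edge get a one-sample line that reflects back into the
    same port, scaled by r_wall (a simplified, assumed boundary).
    """
    lines = {}
    for i in range(rows):
        for j in range(cols):
            for port, (di, dj) in enumerate(OFFSET):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    delay = branch_delay((i, j), (ni, nj))
                    dest, gain = (ni, nj, OPPOSITE[port]), 1.0
                else:
                    delay, dest, gain = 1, (i, j, port), r_wall
                if not isinstance(delay, int) or delay < 1:
                    raise ValueError("delays must be whole samples (no fractional delay)")
                lines[(i, j, port)] = (deque([0.0] * delay), dest, gain)
    return lines


def step(lines, rows, cols):
    """Advance the mesh by one sample; return the pressure at each junction."""
    # Each delay line delivers its oldest sample to the input port it feeds.
    incoming = {dest: buf.popleft() for buf, dest, _ in lines.values()}
    pressure = {}
    for i in range(rows):
        for j in range(cols):
            p_in = [incoming[(i, j, port)] for port in range(4)]
            p_j = 0.5 * sum(p_in)  # equal impedances: (2 / N) * sum, N = 4
            pressure[(i, j)] = p_j
            for port in range(4):
                buf, _, gain = lines[(i, j, port)]
                buf.append(gain * (p_j - p_in[port]))  # scattered outgoing wave
    return pressure


def energy(lines):
    """Sum of squared travelling-wave samples stored in all delay lines."""
    return sum(x * x for buf, _, _ in lines.values() for x in buf)


def branch_delay(a, b):
    # Hypothetical layout: horizontal branches between columns 2 and 3 are
    # larger-delay lines (2 samples); all other branches are smaller (1 sample).
    is_horizontal = a[0] == b[0]
    return 2 if is_horizontal and min(a[1], b[1]) == 2 else 1


ROWS, COLS = 4, 6
mesh = build_mesh(ROWS, COLS, branch_delay)
mesh[(1, 0, 1)][0][-1] = 1.0  # unit impulse leaving junction (1, 0) eastwards
e0 = energy(mesh)
trace = [step(mesh, ROWS, COLS)[(1, COLS - 1)] for _ in range(200)]
e1 = energy(mesh)
print(f"energy before: {e0:.6f}, after 200 steps: {e1:.6f}")
```

Every line has an integer length, so each update is a plain buffer read and write with no interpolation filter. This is the property the abstract attributes to the non-uniform mesh. With `r_wall = 1.0` the sketch is lossless, so the total energy stored in the delay lines should stay constant. That gives a quick sanity check on the scattering rule.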