Archives of Acoustics, 45, 4, pp. 585–600, 2020
10.24425/aoa.2020.135247

Non-uniform Rectilinear Grid in the Waveguide Modeling of the Vocal Tract

Tahir Mushtaq QURESHI
COMSATS University Islamabad
Pakistan

Khalid Saifullah SYED
Bahauddin Zakariya University
Pakistan

Asim ZAFAR
COMSATS University Islamabad
Pakistan

For many years, digital waveguide models have been used to simulate sound propagation in the vocal tract, using a structured, uniform mesh of scattering junctions connected by identical delay lines. The mesh grid can be formed and laid out in many ways, called topologies. This work presents a two-dimensional digital waveguide model of sound propagation in the vocal tract built on a structured, non-uniform rectilinear grid. The mesh uses two types of delay lines: smaller-delay lines and larger-delay lines, where each larger-delay line has twice the delay of a smaller-delay line. Combining both types generates the non-uniform rectilinear two-dimensional waveguide mesh. The advantage of this approach is that the transfer function can be obtained without fractional delays. This eliminates the need for interpolation to approximate fractional delays and gives an efficient simulation of sound wave propagation in two-dimensional waveguide modeling of the vocal tract. Simulations were performed for the vowels /ɔ/, /a/, /i/ and /u/. The standard two-dimensional waveguide model with a uniform mesh, run at the same sampling frequency, serves as the benchmark model. The results and efficiency of the proposed model are compared with those of the benchmark model.
Keywords: non-linear mesh; waveguide; delay lines
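
The integer-delay property described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the example spacings are hypothetical, and the sketch assumes only that a smaller-delay line corresponds to exactly one sample and that delay scales linearly with line length.

```python
from fractions import Fraction


def delay_in_samples(line_length, unit_length):
    """Delay of one line in samples, assuming a line of unit_length
    takes exactly one sample (an assumption of this sketch)."""
    return Fraction(line_length) / Fraction(unit_length)


def needs_fractional_delay(line_lengths, unit_length):
    """True if any line's delay is not a whole number of samples."""
    return any(
        delay_in_samples(length, unit_length).denominator != 1
        for length in line_lengths
    )


# Hypothetical line lengths along one grid axis, in units of the
# smaller-delay line.
doubled = [1, 2, 2, 1]               # larger lines are exactly twice the smaller
arbitrary = [1, Fraction(3, 2), 1]   # a 1.5x line would need a fractional delay

delays_doubled = [delay_in_samples(length, 1) for length in doubled]

print(delays_doubled)
print(needs_fractional_delay(doubled, 1))
print(needs_fractional_delay(arbitrary, 1))
```

Because every line in the doubled layout maps to a whole number of samples, each delay can be realized as a plain sample buffer; the 1.5x layout would instead require an interpolating fractional-delay filter.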


Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)