Archives of Acoustics, 44, 2, pp. 287–300, 2019

Real-Time Vocal Tract Model for Elongation of Segment Lengths in a Waveguide Model

Tahir Mushtaq QURESHI
COMSATS University Islamabad

Muhammad ISHAQ
COMSATS University Islamabad

A vocal tract model based on a digital waveguide is presented in which the vocal tract is decomposed into uniform cylindrical segments of variable lengths. We present a model for the real-time numerical solution of the digital waveguide equations in a uniform tube with a temporally varying cross section. Because the uniform cylindrical segments of the vocal tract may have different lengths, the times taken by the sound wave to propagate axially through the segments may not be integer multiples of one another. In such a case, the axial delay is necessarily a fractional delay. Lagrange interpolation is used in the current model to approximate the fractional-delay filters. The variable length of the individual segments of the vocal tract enables the model to produce realistic results, which are validated against an accurate benchmark model. The proposed model can elongate or shorten any arbitrary cylindrical segment by a suitable scaling factor. The model uses a single algorithm, and there is no need to partition the segments into sections in order to elongate or shorten intermediate segments. The proposed model is about 23% more efficient than the previous model.
Keywords: digital waveguide; vocal tract; elongation of cylindrical segment
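
As an illustration of the fractional-delay idea described in the abstract, the following minimal sketch computes the propagation delay of a segment in samples and the coefficients of a Lagrange-interpolation FIR filter that approximates that delay. It uses the standard textbook Lagrange formula and is not taken from the paper's implementation. The function names, the sampling rate of 44100 Hz, and the sound speed of 350 m/s are assumptions chosen for the example.

```python
def segment_delay_samples(length_m, fs=44100.0, c=350.0):
    """Propagation time through a segment, expressed in samples.

    length_m: segment length in metres.
    fs, c: sampling rate (Hz) and speed of sound (m/s); both are
    illustrative assumptions, not values from the paper.
    """
    return length_m * fs / c


def lagrange_fd_coeffs(delay, order=3):
    """FIR coefficients of a Lagrange fractional-delay filter.

    Standard formula: h[n] = prod over k != n of (delay - k) / (n - k),
    for n = 0..order. The approximation is most accurate when `delay`
    lies near the middle of the interval [0, order].
    """
    coeffs = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay - k) / (n - k)
        coeffs.append(h)
    return coeffs


if __name__ == "__main__":
    # A hypothetical 1 cm segment: its delay is generally not a whole
    # number of samples, so a fractional-delay filter is needed.
    d = segment_delay_samples(0.01)
    print("delay in samples:", d)
    print("Lagrange coefficients:", lagrange_fd_coeffs(d, order=3))
```

With a first-order filter the formula reduces to linear interpolation between two neighbouring samples, and for an integer delay within the filter span the coefficients collapse to a pure unit delay, which is a quick sanity check on any implementation.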
Copyright © The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).


DOI: 10.24425/aoa.2019.128492