Head-Related Transfer Function Selection Using Neural Networks

Shu-Nung YAO; Tim COLLINS; Chaoyun LIANG

doi:10.1515/aoa-2017-0038

Authors

Shu-Nung YAO National Taipei University, Taiwan
Tim COLLINS Manchester Metropolitan University, United Kingdom
Chaoyun LIANG National Taiwan University, Taiwan

Abstract

In binaural audio systems, an optimal virtual acoustic space requires a set of head-related transfer functions (HRTFs) that closely matches the listener’s own. This study aims to select the most appropriate HRTF dataset for a user from a large database without the need for extensive listening tests. Currently, there is no reliable way to reduce the number of candidate datasets to a smaller, more manageable number without risking discarding potentially good matches. A neural network is proposed that estimates the appropriateness of HRTF datasets from input vectors of anthropometric measurements. The shapes and sizes of listeners’ heads and pinnas were measured using digital photography; the measured anthropometric parameters form the feature vectors used by the neural network. A graphical user interface (GUI) was developed for participants to listen to music transformed using different HRTFs and to evaluate the fitness of each HRTF dataset. The recorded listening scores were the target outputs used to train the neural networks. The aim was to learn a mapping between anthropometric parameters and listeners’ perception scores. Experimental validation was performed on 30 subjects. It is demonstrated that the proposed system produces a much more reliable HRTF selection than previously used methods.
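The mapping described in the abstract, from anthropometric feature vectors to per-dataset listening scores, can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not state the architecture, the number of features, or the number of HRTF datasets, so the layer sizes, the six features, the five candidate datasets, and the training settings below are all assumptions, and the data are synthetic placeholders rather than measurements. Only the subject count of 30 comes from the abstract.

```python
import numpy as np

def train_mlp(X, Y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, linear outputs)
    by batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        P = H @ W2 + b2
        err = P - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (gradients of the mean squared error)
        gP = 2.0 * err / err.size
        gW2 = H.T @ gP
        gb2 = gP.sum(axis=0)
        gZ = (gP @ W2.T) * (1.0 - H ** 2)
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        # Gradient descent update
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Predicted listening score for every candidate HRTF dataset."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def rank_datasets(params, x):
    """Indices of the HRTF datasets, best predicted score first."""
    scores = predict(params, x[None, :])[0]
    return np.argsort(scores)[::-1]

# Synthetic placeholder data (NOT the study's measurements).
rng = np.random.default_rng(42)
n_subjects = 30   # subject count stated in the abstract
n_features = 6    # hypothetical number of anthropometric parameters
n_datasets = 5    # hypothetical number of candidate HRTF datasets

X = rng.normal(size=(n_subjects, n_features))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each feature
A = rng.normal(size=(n_features, n_datasets))
Y = np.tanh(X @ A) + 0.05 * rng.normal(size=(n_subjects, n_datasets))

params, losses = train_mlp(X, Y)
ranking = rank_datasets(params, X[0])
print("training MSE: first epoch", losses[0], "last epoch", losses[-1])
print("predicted dataset ranking for subject 0:", ranking)
```

In this sketch each output unit corresponds to one candidate HRTF dataset, so a single forward pass yields a predicted score for every dataset, and sorting those scores gives a shortlist for a new listener. A real evaluation would hold out subjects rather than report training error.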

Keywords:

head-related transfer function, neural networks, localization, music, audio, anthropometry, pinna
