Multi-label Bird Species Classification Using Transfer Learning Network


Authors

  • Xue HAN School of Physics and Optoelectronics, South China University of Technology, China
  • Jianxin PENG School of Physics and Optoelectronics, South China University of Technology, China

Abstract

Bird sounds recorded in the field often contain several birds of different species vocalizing at the same time, and these overlapping vocalizations make species recognition difficult. Extracting effective acoustic features is therefore critical for multi-label bird species classification. This work extends an efficient transfer learning technique to label and classify multiple bird species from audio recordings, laying a foundation for conservation planning. A synthetic dataset was created by randomly mixing single-species bird recordings from the Cornell Macaulay Library. The final dataset consists of 28 000 audio clips, each 5 s long, containing overlapping vocalizations of two or three of 11 bird species. Several pre-trained convolutional neural networks (CNNs), including InceptionV3, ResNet50, VGG16, and VGG19, were evaluated as extractors of deep features from mel spectrograms of the audio signals, and a long short-term memory (LSTM) network was then used to extract temporal features. For multi-label bird species classification, the InceptionV3+LSTM model achieved an absolute matching rate of 98.25 %, an accuracy of 99.32 %, a recall of 99.41 %, a precision of 99.90 %, and an F1-score of 99.57 %, with a minimum Hamming loss of 0.0062. These results indicate that the proposed method performs well and is suitable for multi-label bird species classification.
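The abstract reports six multi-label metrics without stating their formulas. The following is a minimal sketch that assumes the common example-based definitions from the multi-label learning literature, and assumes that the "absolute matching rate" is the exact match ratio (the fraction of clips whose predicted species set equals the true set). The function names `to_label_set` and `multilabel_metrics`, the 0.5 decision threshold, and the empty-set conventions are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence, Set


def to_label_set(scores: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Hypothetical helper: keep every species index whose score reaches the threshold."""
    return {i for i, s in enumerate(scores) if s >= threshold}


def multilabel_metrics(
    y_true: Sequence[Set[int]],
    y_pred: Sequence[Set[int]],
    n_labels: int,
) -> Dict[str, float]:
    """Example-based multi-label metrics, averaged over all clips.

    Assumed definitions for the true set T and predicted set P of one clip:
      exact match (absolute matching rate): 1 if P == T else 0
      accuracy:     |T & P| / |T | P|
      precision:    |T & P| / |P|
      recall:       |T & P| / |T|
      F1:           2 |T & P| / (|T| + |P|)
      Hamming loss: |T ^ P| / n_labels
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = dict.fromkeys(
        ["exact_match", "accuracy", "precision", "recall", "f1", "hamming_loss"], 0.0
    )
    for t, p in zip(y_true, y_pred):
        inter = len(t & p)
        union = len(t | p)
        totals["exact_match"] += float(t == p)
        # Empty-set convention (assumption): a ratio with a zero denominator
        # counts as 1.0 only when both sets are empty, otherwise 0.0.
        totals["accuracy"] += inter / union if union else 1.0
        totals["precision"] += inter / len(p) if p else float(not t)
        totals["recall"] += inter / len(t) if t else float(not p)
        totals["f1"] += 2 * inter / (len(t) + len(p)) if union else 1.0
        # Fraction of the n_labels species whose presence was mispredicted.
        totals["hamming_loss"] += len(t ^ p) / n_labels
    n = len(y_true)
    return {name: value / n for name, value in totals.items()}


if __name__ == "__main__":
    # Three toy clips over 11 species (indices 0-10); labels are made up.
    y_true = [{0, 3}, {1, 4, 7}, {2, 5}]
    y_pred = [
        to_label_set([0.9, 0.1, 0.0, 0.8, 0, 0, 0, 0, 0, 0, 0]),
        {1, 4},
        {2, 6},
    ]
    for name, value in multilabel_metrics(y_true, y_pred, n_labels=11).items():
        print(f"{name}: {value:.4f}")
```

Under example-based averaging, the mean F1 is in general not the harmonic mean of the mean precision and the mean recall, because each clip's F1 is computed before averaging.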

Keywords:

transfer learning, multi-label bird species classification, InceptionV3, LSTM

