Multi-label Bird Species Classification Using Transfer Learning Network

Xue HAN; Jianxin PENG

doi:10.24425/aoa.2025.154812

Authors

Xue HAN, School of Physics and Optoelectronics, South China University of Technology, China
Jianxin PENG, School of Physics and Optoelectronics, South China University of Technology, China

Abstract

Bird sounds recorded in the field often contain several species vocalizing at the same time, and these overlapping sounds make species recognition difficult. Extracting effective acoustic features is critical to the multi-label bird species classification task. This work extends an efficient transfer learning technique to label and classify multiple bird species in audio recordings, laying a foundation for conservation planning. A synthetic dataset was created by randomly mixing original single-species bird audio recordings from the Cornell Macaulay Library. The final dataset consists of 28 000 audio clips, each 5 s long, containing overlapping vocalizations of two or three bird species drawn from 11 species. Several pre-trained convolutional neural networks (CNNs), including InceptionV3, ResNet50, VGG16, and VGG19, were evaluated for extracting deep features from audio signals represented as mel spectrograms. A long short-term memory (LSTM) network was further employed to extract temporal features, and multi-label bird species classification was investigated. The InceptionV3+LSTM model achieves an absolute matching rate, accuracy, recall, precision, and F1-score of 98.25 %, 99.32 %, 99.41 %, 99.90 %, and 99.57 %, respectively, with a minimum Hamming loss of 0.0062. The results show that the proposed method performs well and can be used for multi-label bird species classification.
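The abstract reports six multi-label metrics without defining them. The sketch below shows one common way to compute them, assuming the example-based definitions often used in the multi-label literature (per-clip scores averaged over all clips) and assuming that "absolute matching rate" means the fraction of clips whose predicted label set is exactly right. The function name `multilabel_metrics`, the toy data, and the conventions for empty label sets are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence


def multilabel_metrics(y_true: Sequence[Sequence[int]],
                       y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Example-based multi-label metrics, averaged over clips.

    Each row is a 0/1 indicator vector with one entry per species.
    Assumption: every clip has at least one true label (as in clips that
    mix two or three species); empty sets fall back to 0 or 1 below.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_clips = len(y_true)
    n_labels = len(y_true[0])
    totals = {"exact_match": 0.0, "hamming_loss": 0.0, "accuracy": 0.0,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for t, p in zip(y_true, y_pred):
        hits = sum(1 for a, b in zip(t, p) if a and b)   # correctly predicted species
        n_true, n_pred = sum(t), sum(p)
        union = n_true + n_pred - hits                   # species in either set
        wrong = sum(1 for a, b in zip(t, p) if a != b)   # mismatched label slots
        totals["exact_match"] += 1.0 if wrong == 0 else 0.0
        totals["hamming_loss"] += wrong / n_labels
        totals["accuracy"] += hits / union if union else 1.0
        totals["precision"] += hits / n_pred if n_pred else 0.0
        totals["recall"] += hits / n_true if n_true else 0.0
        totals["f1"] += (2 * hits / (n_true + n_pred)) if (n_true + n_pred) else 1.0
    return {name: total / n_clips for name, total in totals.items()}


# Toy data: 3 clips, 4 hypothetical species; one species is missed in clip 2.
y_true = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1]]
y_pred = [[1, 1, 0, 0], [0, 1, 0, 0], [1, 0, 1, 1]]
scores = multilabel_metrics(y_true, y_pred)
```

In this toy case a single missed species makes the whole clip count as wrong for the exact-match score, while the Hamming loss only registers 1 wrong slot out of 12. Under these assumed definitions, that is why an absolute matching rate tends to be the strictest of the reported figures.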

Keywords:

transfer learning, multi-label bird species classification, InceptionV3, LSTM

