CAPSE-ViT: A Lightweight Framework for Underwater Acoustic Vessel Classification Using Coherent Spectral Estimation and Modified Vision Transformer

Authors

  • Najamuddin NAJAMUDDIN Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Usman Ullah SHEIKH Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Ahmad Zuri SHA’AMERI Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia

Abstract

Underwater acoustic target classification has become a key research area for marine vessel identification, where machine learning (ML) models are leveraged to recognize targets automatically. The major challenge is embedding domain-specific knowledge into ML frameworks to extract features that effectively distinguish between vessel types. In this study, we propose a model based on the coherently averaged power spectral estimation (CAPSE) algorithm. Vessel frequency spectra are first computed through CAPSE analysis, capturing key machinery characteristics. These features are then processed by a vision transformer (ViT) network, whose self-attention mechanisms capture global dependencies across the entire input, enabling the model to learn more complex relationships and patterns within the data and thereby improve classification performance. Evaluated on the standard DeepShip and ShipsEar datasets, the proposed model achieves classification accuracies of 97.98% and 99.19%, respectively, while using only 1.90 million parameters, outperforming models such as ResNet18 and UATR-Transformer in both accuracy and computational efficiency. This work contributes to the development of efficient marine vessel classification systems for underwater acoustic applications, demonstrating that high performance can be achieved with reduced computational complexity.
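
To make the two-stage pipeline described above concrete, the Python sketch below shows a simplified coherent spectral average (complex segment spectra are averaged before the magnitude is taken, in the spirit of CAPSE; the published estimator of Lan et al. (2020) includes additional phase handling that is omitted here) feeding a small transformer-encoder classifier over spectral patches. All function names, layer sizes, and parameter values (segment length, patch size, embedding dimension, class count) are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative sketch only: names and hyperparameters are hypothetical.
import numpy as np
import torch
import torch.nn as nn

def coherent_avg_spectrum(x, seg_len=4096, hop=2048):
    """Average complex FFT segments before taking the magnitude.

    Simplified illustration of coherent (complex) averaging as opposed to
    Welch-style incoherent averaging of magnitude spectra; the CAPSE
    estimator of Lan et al. (2020) adds phase alignment not shown here.
    """
    window = np.hanning(seg_len)
    segments = [np.fft.rfft(x[s:s + seg_len] * window)
                for s in range(0, len(x) - seg_len + 1, hop)]
    avg = np.mean(segments, axis=0)          # coherent (complex) average
    return np.abs(avg) ** 2                  # power spectrum of the average

class TinyViTClassifier(nn.Module):
    """Minimal transformer encoder over 1-D spectral patches (illustrative)."""
    def __init__(self, n_bins=2048, patch=128, dim=64, n_classes=5):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, dim)                # patch embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # class token
        self.pos = nn.Parameter(torch.zeros(1, n_bins // patch + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                  # spec: (batch, n_bins)
        b = spec.shape[0]
        tokens = self.embed(spec.view(b, -1, self.patch))  # split into patches
        tokens = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        tokens = self.encoder(tokens + self.pos)           # self-attention
        return self.head(tokens[:, 0])                     # classify via class token

# Example usage on a surrogate signal (random noise stands in for real audio):
rng = np.random.default_rng(0)
x = rng.standard_normal(5 * 32_000)          # 5 s at an assumed 32 kHz rate
spec = coherent_avg_spectrum(x)[:2048]       # truncate to a patchable length
logits = TinyViTClassifier()(torch.tensor(spec, dtype=torch.float32)[None])
print(logits.shape)                          # (1, 5) class scores
```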

Keywords:

underwater acoustic targets, CAPSE, vision transformer, CNN, LOFAR gram

References


  1. Aslam M.A. et al. (2024), Underwater sound classification using learning based methods: A review, Expert Systems with Applications, 255(Part 1): 124498, https://doi.org/10.1016/j.eswa.2024.124498

  2. Bianco M.J. et al. (2019), Machine learning in acoustics: Theory and applications, The Journal of the Acoustical Society of America, 146(5): 3590–3628, https://doi.org/10.1121/1.5133944

  3. Bjorno L. (2017), Underwater acoustic measurements and their applications, [in:] Applied Underwater Acoustics, Neighbors T.H., III, Bradley D. [Eds.], pp. 889–947, Elsevier, https://doi.org/10.1016/B978-0-12-811240-3.00014-X

  4. Cao X., Togneri R., Zhang X., Yu Y. (2019), Convolutional neural network with second-order pooling for underwater target classification, IEEE Sensors Journal, 19(8): 3058–3066, https://doi.org/10.1109/JSEN.2018.2886368

  5. Chen J., Han B., Ma X., Zhang J. (2021), Underwater target recognition based on multi-decision LOFAR spectrum enhancement: A deep-learning approach, Future Internet, 13(10): 265, https://doi.org/10.3390/fi13100265

  6. Chen L., Luo X., Zhou H. (2024), A ship-radiated noise classification method based on domain knowledge embedding and attention mechanism, Engineering Applications of Artificial Intelligence, 127(Part B): 107320, https://doi.org/10.1016/j.engappai.2023.107320

  7. Cinelli L.P., Chaves G.S., Lima M.V.S. (2018), Vessel classification through convolutional neural networks using passive sonar spectrogram images, [in:] Proceedings of the Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT 2018), pp. 21–25, https://doi.org/10.14209/sbrt.2018.340

  8. de Carvalho H.T., Avila F.R., Biscainho L.W.P. (2021), Bayesian restoration of audio degraded by low-frequency pulses modeled via Gaussian process, IEEE Journal of Selected Topics in Signal Processing, 15(1): 90–103, https://doi.org/10.1109/JSTSP.2020.3033410

  9. de Moura N.N., de Seixas J.M. (2016), Novelty detection in passive SONAR systems using support vector machines, 2015 Latin-America Congress on Computational Intelligence (LA-CCI), https://doi.org/10.1109/LA-CCI.2015.7435957

  10. Domingos L.C.F., Santos P.E., Skelton P.S.M., Brinkworth R.S.A., Sammut K. (2022), A survey of underwater acoustic data classification methods using deep learning for shoreline surveillance, Sensors, 22(6): 2181, https://doi.org/10.3390/s22062181

  11. Dosovitskiy A. et al. (2020), An image is worth 16x16 words: Transformers for image recognition at scale, arXiv, https://doi.org/10.48550/arXiv.2010.11929

  12. Feng S., Jiang K., Kong X. (2021), A line spectrum detector based on improved coherent power spectrum estimation, Journal of Physics: Conference Series, 1971(1): 012006, https://doi.org/10.1088/1742-6596/1971/1/012006

  13. Feng S., Zhu X. (2022), A transformer-based deep learning network for underwater acoustic target recognition, IEEE Geoscience and Remote Sensing Letters, 19: 1–5, https://doi.org/10.1109/LGRS.2022.3201396

  14. Hegazy A.E., Makhlouf M.A., El-Tawel G.S. (2020), Improved salp swarm algorithm for feature selection, Journal of King Saud University – Computer and Information Sciences, 32(3): 335–344, https://doi.org/10.1016/j.jksuci.2018.06.003

  15. Hong F., Liu C., Guo L., Chen F., Feng H. (2021), Underwater acoustic target recognition with ResNet18 on ShipsEar dataset, 2021 IEEE 4th International Conference on Electronics Technology (ICET), pp. 1240–1244, https://doi.org/10.1109/ICET51757.2021.9451099

  16. Hu G., Wang K., Liu L. (2021), Underwater acoustic target recognition based on depthwise separable convolution neural networks, Sensors, 21(4): 1429, https://doi.org/10.3390/s21041429

  17. Ikpekha O.W., Eltayeb A., Pandya A., Daniels S. (2018), Operational noise associated with underwater sound emitting vessels and potential effect of oceanographic conditions: A Dublin Bay port area study, Journal of Marine Science and Technology, 23: 228–235, https://doi.org/10.1007/s00773-017-0468-4

  18. Irfan M., Jiangbin Z., Ali S., Iqbal M., Masood Z., Hamid U. (2021), DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification, Expert Systems with Applications, 183: 115270, https://doi.org/10.1016/j.eswa.2021.115270

  19. Khishe M., Mohammadi H. (2019), Passive sonar target classification using multi-layer perceptron trained by salp swarm algorithm, Ocean Engineering, 181: 98–108, https://doi.org/10.1016/j.oceaneng.2019.04.013

  20. Kim K.-I., Pak M.-I., Chon B.-P., Ri C.-H. (2021), A method for underwater acoustic signal classification using convolutional neural network combined with discrete wavelet transform, International Journal of Wavelets, Multiresolution and Information Processing, 19(04): 2050092, https://doi.org/10.1142/S0219691320500927

  21. Lampert T.A., O’Keefe S.E.M. (2013), On the detection of tracks in spectrogram images, Pattern Recognition, 46(5): 1396–1408, https://doi.org/10.1016/j.patcog.2012.11.009

  22. Lan H., White P.R., Li N., Li J., Sun D. (2020), Coherently averaged power spectral estimate for signal detection, Signal Processing, 169: 107414, https://doi.org/10.1016/j.sigpro.2019.107414

  23. Li X., Wang D., Tian Y., Kong X. (2023), A method for extracting interference striations in lofargram based on decomposition and clustering, IET Image Processing, 17(6): 1951–1958, https://doi.org/10.1049/ipr2.12768

  24. Lim T., Bae K., Hwang C., Lee H. (2007), Classification of underwater transient signals using MFCC feature vector, 2007 9th International Symposium on Signal Processing and Its Applications, ISSPA 2007, Proceedings, pp. 1–4, https://doi.org/10.1109/ISSPA.2007.4555521

  25. Luo X., Chen L., Zhou H., Cao H. (2023), A survey of underwater acoustic target recognition methods based on machine learning, Journal of Marine Science and Engineering, 11(2): 384, https://doi.org/10.3390/jmse11020384

  26. Luo X., Zhang M., Liu T., Huang M., Xu X. (2021), An underwater acoustic target recognition method based on spectrograms with different resolutions, Journal of Marine Science and Engineering, 9(11): 1246, https://doi.org/10.3390/jmse9111246

  27. McKenna M.F. et al. (2024), Understanding vessel noise across a network of marine protected areas, Environmental Monitoring and Assessment, 196(4): 369, https://doi.org/10.1007/s10661-024-12497-2

  28. Müller N., Reermann J., Meisen T. (2024), Navigating the depths: A comprehensive survey of deep learning for passive underwater acoustic target recognition, IEEE Access, 12: 154092–154118, https://doi.org/10.1109/ACCESS.2024.3480788

  29. Noumida A., Rajan R. (2022), Multi-label bird species classification from audio recordings using attention framework, Applied Acoustics, 197: 108901, https://doi.org/10.1016/j.apacoust.2022.108901

  30. Pang D., Wang H., Ma J., Liang D. (2023), DCTN: A dense parallel network combining CNN and transformer for identifying plant disease in field, Soft Computing, 27(21): 15549–15561, https://doi.org/10.1007/s00500-023-09071-2

  31. Park J., Jung D.-J. (2021), Deep convolutional neural network architectures for tonal frequency identification in a lofargram, International Journal of Control, Automation and Systems, 19(2): 1103–1112, https://doi.org/10.1007/s12555-019-1014-4

  32. Raffel C. et al. (2020), Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research, 21(140): 1–67.

  33. Santos-Domínguez D., Torres-Guijarro S., Cardenal-López A., Pena-Giménez A. (2016), ShipsEar: An underwater vessel noise database, Applied Acoustics, 113: 64–69, https://doi.org/10.1016/j.apacoust.2016.06.008

  34. Sharma G., Umapathy K., Krishnan S. (2020), Trends in audio signal feature extraction methods, Applied Acoustics, 158: 107020, https://doi.org/10.1016/j.apacoust.2019.107020

  35. Sherin B.M., Supriya M.H. (2015), Selection and parameter optimization of SVM kernel function for underwater target classification, [in:] 2015 IEEE Underwater Technology (UT), pp. 1–5, https://doi.org/10.1109/UT.2015.7108260

  36. Siddagangaiah S., Li Y., Guo X., Chen X., Zhang Q., Yang K., Yang Y. (2016), A complexity-based approach for the detection of weak signals in ocean ambient noise, Entropy, 18(3): 101, https://doi.org/10.3390/e18030101

  37. Singh P., Saha G., Sahidullah M. (2021), Non-linear frequency warping using constant-Q transformation for speech emotion recognition, [in:] 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–6, https://doi.org/10.1109/ICCCI50826.2021.9402569

  38. Song G., Guo X., Wang W., Ren Q., Li J., Ma L. (2021), A machine learning-based underwater noise classification method, Applied Acoustics, 184: 108333, https://doi.org/10.1016/j.apacoust.2021.108333

  39. Thomas M., Martin B., Kowarski K., Gaudet B., Matwin S. (2020), Marine mammal species classification using convolutional neural networks and a novel acoustic representation, [in:] Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, 11908: 290–305, https://doi.org/10.1007/978-3-030-46133-1_18

  40. Yang Y., Yao Q., Wang Y. (2024), Underwater acoustic target recognition method based on feature fusion and residual CNN, IEEE Sensors Journal, 24(22): 37342–37357, https://doi.org/10.1109/JSEN.2024.3464754

  41. Yuan F., Ke X., Cheng E. (2019), Joint representation and recognition for ship-radiated noise based on multimodal deep learning, Journal of Marine Science and Engineering, 7(11): 380, https://doi.org/10.3390/jmse7110380

  42. Zeng Y., Zhang M., Han F., Gong Y., Zhang J. (2019), Spectrum analysis and convolutional neural network for automatic modulation recognition, IEEE Wireless Communications Letters, 8(3): 929–932, https://doi.org/10.1109/LWC.2019.2900247