CAPSE-ViT: A Lightweight Framework for Underwater Acoustic Vessel Classification Using Coherent Spectral Estimation and Modified Vision Transformer

Authors

  • Najamuddin NAJAMUDDIN Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Usman Ullah SHEIKH Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Ahmad Zuri SHA’AMERI Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia

Abstract

Underwater acoustic target classification has become a key research area for marine vessel identification, where machine learning (ML) models are leveraged to recognize targets automatically. The major challenge is embedding domain-specific knowledge into ML frameworks to extract features that effectively distinguish between vessel types. In this study, we propose a model based on the coherently averaged power spectral estimation (CAPSE) algorithm. Vessel frequency spectra are first computed through CAPSE analysis, capturing key machinery characteristics. These features are then processed by a vision transformer (ViT) network, whose self-attention mechanisms capture global dependencies across the entire input, enabling the model to learn more complex relationships and patterns within the data and thereby improve classification performance. Evaluated on the standard DeepShip and ShipsEar datasets, the proposed model achieves classification accuracies of 97.98% and 99.19%, respectively, while using only 1.90 million parameters, outperforming models such as ResNet18 and UATR-Transformer in both accuracy and computational efficiency. This work contributes to the development of efficient marine vessel classification systems for underwater acoustic applications, demonstrating that high performance can be achieved with reduced computational complexity.
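
To make the two-stage pipeline described above concrete, the Python sketch below shows a simplified coherent spectral average (complex segment spectra are averaged before the magnitude is taken, in the spirit of CAPSE; the published estimator of Lan et al. (2020) includes additional phase handling that is omitted here) feeding a small transformer-encoder classifier over spectral patches. All function names, layer sizes, and parameter values (segment length, patch size, embedding dimension, class count) are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative sketch only: names and hyperparameters are hypothetical.
import numpy as np
import torch
import torch.nn as nn

def coherent_avg_spectrum(x, seg_len=4096, hop=2048):
    """Average complex FFT segments before taking the magnitude.

    Simplified illustration of coherent (complex) averaging as opposed to
    Welch-style incoherent averaging of magnitude spectra; the CAPSE
    estimator of Lan et al. (2020) adds phase alignment not shown here.
    """
    window = np.hanning(seg_len)
    segments = [np.fft.rfft(x[s:s + seg_len] * window)
                for s in range(0, len(x) - seg_len + 1, hop)]
    avg = np.mean(segments, axis=0)          # coherent (complex) average
    return np.abs(avg) ** 2                  # power spectrum of the average

class TinyViTClassifier(nn.Module):
    """Minimal transformer encoder over 1-D spectral patches (illustrative)."""
    def __init__(self, n_bins=2048, patch=128, dim=64, n_classes=5):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, dim)                # patch embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # class token
        self.pos = nn.Parameter(torch.zeros(1, n_bins // patch + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                  # spec: (batch, n_bins)
        b = spec.shape[0]
        tokens = self.embed(spec.view(b, -1, self.patch))  # split into patches
        tokens = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        tokens = self.encoder(tokens + self.pos)           # self-attention
        return self.head(tokens[:, 0])                     # classify via class token

# Example usage on a surrogate signal (random noise stands in for real audio):
rng = np.random.default_rng(0)
x = rng.standard_normal(5 * 32_000)          # 5 s at an assumed 32 kHz rate
spec = coherent_avg_spectrum(x)[:2048]       # truncate to a patchable length
logits = TinyViTClassifier()(torch.tensor(spec, dtype=torch.float32)[None])
print(logits.shape)                          # (1, 5) class scores
```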

Keywords:

underwater acoustic targets, CAPSE, vision transformer, CNN, LOFAR gram

References


  1. Aslam M.A. et al. (2024), Underwater sound classification using learning based methods: A review, Expert Systems with Applications, 255(Part 1): 124498, https://doi.org/10.1016/j.eswa.2024.124498

  2. Bianco M.J. et al. (2019), Machine learning in acoustics: Theory and applications, The Journal of the Acoustical Society of America, 146(5): 3590–3628, https://doi.org/10.1121/1.5133944

  3. Bjorno L. (2017), Underwater acoustic measurements and their applications, [in:] Applied Underwater Acoustics, Neighbors T.H., III, Bradley D. [Eds.], pp. 889–947, Elsevier, https://doi.org/10.1016/B978-0-12-811240-3.00014-X

  4. Cao X., Togneri R., Zhang X., Yu Y. (2019), Convolutional neural network with second-order pooling for underwater target classification, IEEE Sensors Journal, 19(8): 3058–3066, https://doi.org/10.1109/JSEN.2018.2886368

  5. Chen J., Han B., Ma X., Zhang J. (2021), Underwater target recognition based on multi-decision LOFAR spectrum enhancement: A deep-learning approach, Future Internet, 13(10): 265, https://doi.org/10.3390/fi13100265

  6. Chen L., Luo X., Zhou H. (2024), A ship-radiated noise classification method based on domain knowledge embedding and attention mechanism, Engineering Applications of Artificial Intelligence, 127(Part B): 107320, https://doi.org/10.1016/j.engappai.2023.107320

  7. Cinelli L.P., Chaves G.S., Lima M.V.S. (2018), Vessel classification through convolutional neural networks using passive sonar spectrogram images, [in:] Proceedings of the Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT 2018), pp. 21–25, https://doi.org/10.14209/sbrt.2018.340

  8. de Carvalho H.T., Avila F.R., Biscainho L.W.P. (2021), Bayesian restoration of audio degraded by low-frequency pulses modeled via Gaussian process, IEEE Journal of Selected Topics in Signal Processing, 15(1): 90–103, https://doi.org/10.1109/JSTSP.2020.3033410

  9. de Moura N.N., de Seixas J.M. (2016), Novelty detection in passive SONAR systems using support vector machines, 2015 Latin-America Congress on Computational Intelligence (LA-CCI), https://doi.org/10.1109/LA-CCI.2015.7435957

  10. Domingos L.C.F., Santos P.E., Skelton P.S.M., Brinkworth R.S.A., Sammut K. (2022), A survey of underwater acoustic data classification methods using deep learning for shoreline surveillance, Sensors, 22(6): 2181, https://doi.org/10.3390/s22062181

  11. Dosovitskiy A. et al. (2020), An image is worth 16x16 words: Transformers for image recognition at scale, arXiv, https://doi.org/10.48550/arXiv.2010.11929

  12. Feng S., Jiang K., Kong X. (2021), A line spectrum detector based on improved coherent power spectrum estimation, Journal of Physics: Conference Series, 1971(1): 012006, https://doi.org/10.1088/1742-6596/1971/1/012006

  13. Feng S., Zhu X. (2022), A transformer-based deep learning network for underwater acoustic target recognition, IEEE Geoscience and Remote Sensing Letters, 19: 1–5, https://doi.org/10.1109/LGRS.2022.3201396

  14. Hegazy A.E., Makhlouf M.A., El-Tawel G.S. (2020), Improved salp swarm algorithm for feature selection, Journal of King Saud University – Computer and Information Sciences, 32(3): 335–344, https://doi.org/10.1016/j.jksuci.2018.06.003

  15. Hong F., Liu C., Guo L., Chen F., Feng H. (2021), Underwater acoustic target recognition with ResNet18 on ShipsEar dataset, 2021 IEEE 4th International Conference on Electronics Technology (ICET), pp. 1240–1244, https://doi.org/10.1109/ICET51757.2021.9451099

  16. Hu G., Wang K., Liu L. (2021), Underwater acoustic target recognition based on depthwise separable convolution neural networks, Sensors, 21(4): 1429, https://doi.org/10.3390/s21041429

  17. Ikpekha O.W., Eltayeb A., Pandya A., Daniels S. (2018), Operational noise associated with underwater sound emitting vessels and potential effect of oceanographic conditions: A Dublin Bay port area study, Journal of Marine Science and Technology, 23: 228–235, https://doi.org/10.1007/s00773-017-0468-4

  18. Irfan M., Jiangbin Z., Ali S., Iqbal M., Masood Z., Hamid U. (2021), DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification, Expert Systems with Applications, 183: 115270, https://doi.org/10.1016/j.eswa.2021.115270

  19. Khishe M., Mohammadi H. (2019), Passive sonar target classification using multi-layer perceptron trained by salp swarm algorithm, Ocean Engineering, 181: 98–108, https://doi.org/10.1016/j.oceaneng.2019.04.013

  20. Kim K.-I., Pak M.-I., Chon B.-P., Ri C.-H. (2021), A method for underwater acoustic signal classification using convolutional neural network combined with discrete wavelet transform, International Journal of Wavelets, Multiresolution and Information Processing, 19(04): 2050092, https://doi.org/10.1142/S0219691320500927

  21. Lampert T.A., O’Keefe S.E.M. (2013), On the detection of tracks in spectrogram images, Pattern Recognition, 46(5): 1396–1408, https://doi.org/10.1016/j.patcog.2012.11.009

  22. Lan H., White P.R., Li N., Li J., Sun D. (2020), Coherently averaged power spectral estimate for signal detection, Signal Processing, 169: 107414, https://doi.org/10.1016/j.sigpro.2019.107414

  23. Li X., Wang D., Tian Y., Kong X. (2023), A method for extracting interference striations in lofargram based on decomposition and clustering, IET Image Processing, 17(6): 1951–1958, https://doi.org/10.1049/ipr2.12768

  24. Lim T., Bae K., Hwang C., Lee H. (2007), Classification of underwater transient signals using MFCC feature vector, 2007 9th International Symposium on Signal Processing and Its Applications, ISSPA 2007, Proceedings, pp. 1–4, https://doi.org/10.1109/ISSPA.2007.4555521

  25. Luo X., Chen L., Zhou H., Cao H. (2023), A survey of underwater acoustic target recognition methods based on machine learning, Journal of Marine Science and Engineering, 11(2): 384, https://doi.org/10.3390/jmse11020384

  26. Luo X., Zhang M., Liu T., Huang M., Xu X. (2021), An underwater acoustic target recognition method based on spectrograms with different resolutions, Journal of Marine Science and Engineering, 9(11): 1246, https://doi.org/10.3390/jmse9111246

  27. McKenna M.F. et al. (2024), Understanding vessel noise across a network of marine protected areas, Environmental Monitoring and Assessment, 196(4): 369, https://doi.org/10.1007/s10661-024-12497-2

  28. Müller N., Reermann J., Meisen T. (2024), Navigating the depths: A comprehensive survey of deep learning for passive underwater acoustic target recognition, IEEE Access, 12: 154092–154118, https://doi.org/10.1109/ACCESS.2024.3480788

  29. Noumida A., Rajan R. (2022), Multi-label bird species classification from audio recordings using attention framework, Applied Acoustics, 197: 108901, https://doi.org/10.1016/j.apacoust.2022.108901

  30. Pang D., Wang H., Ma J., Liang D. (2023), DCTN: A dense parallel network combining CNN and transformer for identifying plant disease in field, Soft Computing, 27(21): 15549–15561, https://doi.org/10.1007/s00500-023-09071-2

  31. Park J., Jung D.-J. (2021), Deep convolutional neural network architectures for tonal frequency identification in a lofargram, International Journal of Control, Automation and Systems, 19(2): 1103–1112, https://doi.org/10.1007/s12555-019-1014-4

  32. Raffel C. et al. (2020), Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research, 21(140): 1–67.

  33. Santos-Domínguez D., Torres-Guijarro S., Cardenal-López A., Pena-Giménez A. (2016), ShipsEar: An underwater vessel noise database, Applied Acoustics, 113: 64–69, https://doi.org/10.1016/j.apacoust.2016.06.008

  34. Sharma G., Umapathy K., Krishnan S. (2020), Trends in audio signal feature extraction methods, Applied Acoustics, 158: 107020, https://doi.org/10.1016/j.apacoust.2019.107020

  35. Sherin B.M., Supriya M.H. (2015), Selection and parameter optimization of SVM kernel function for underwater target classification, [in:] 2015 IEEE Underwater Technology (UT), pp. 1–5, https://doi.org/10.1109/UT.2015.7108260

  36. Siddagangaiah S., Li Y., Guo X., Chen X., Zhang Q., Yang K., Yang Y. (2016), A complexity-based approach for the detection of weak signals in ocean ambient noise, Entropy, 18(3): 101, https://doi.org/10.3390/e18030101

  37. Singh P., Saha G., Sahidullah M. (2021), Non-linear frequency warping using constant-Q transformation for speech emotion recognition, [in:] 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–6, https://doi.org/10.1109/ICCCI50826.2021.9402569

  38. Song G., Guo X., Wang W., Ren Q., Li J., Ma L. (2021), A machine learning-based underwater noise classification method, Applied Acoustics, 184: 108333, https://doi.org/10.1016/j.apacoust.2021.108333

  39. Thomas M., Martin B., Kowarski K., Gaudet B., Matwin S. (2020), Marine mammal species classification using convolutional neural networks and a novel acoustic representation, [in:] Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, 11908: 290–305, https://doi.org/10.1007/978-3-030-46133-1_18

  40. Yang Y., Yao Q., Wang Y. (2024), Underwater acoustic target recognition method based on feature fusion and residual CNN, IEEE Sensors Journal, 24(22): 37342–37357, https://doi.org/10.1109/JSEN.2024.3464754

  41. Yuan F., Ke X., Cheng E. (2019), Joint representation and recognition for ship-radiated noise based on multimodal deep learning, Journal of Marine Science and Engineering, 7(11): 380, https://doi.org/10.3390/jmse7110380

  42. Zeng Y., Zhang M., Han F., Gong Y., Zhang J. (2019), Spectrum analysis and convolutional neural network for automatic modulation recognition, IEEE Wireless Communications Letters, 8(3): 929–932, https://doi.org/10.1109/LWC.2019.2900247