From Speech to Underwater Acoustics: A Transfer Learning Framework for Real-Time Passive Diver Detection Using Keyword Spotting Models

Authors

  • Osama Deeb, Higher Institute for Applied Sciences and Technology, Syria (ORCID: 0009-0008-5915-1192)
  • Saier Mahmoud, Higher Institute for Applied Sciences and Technology, Syria
  • Louay Saleh, Higher Institute for Applied Sciences and Technology, Syria
  • Assef Jafar, Higher Institute for Applied Sciences and Technology, Syria (ORCID: 0000-0002-7868-8621)
  • Oumayma Al Dakkak, Higher Institute for Applied Sciences and Technology, Syria (ORCID: 0000-0002-8842-0979)
  • Ibrahim Chouaib, Higher Institute for Applied Sciences and Technology, Syria

Abstract

Passive acoustic detection of divers is hampered by low signal-to-noise ratios (SNRs), scarce training data, and the latency of conventional methods. This paper proposes Keyword Spotting for Diver Detection (KWS-DD), a transfer learning framework that repurposes speech-oriented keyword spotting (KWS) models for data-efficient diver detection. Diver inhalation signatures are treated as acoustic "keywords," so that the transformer-based HuBERT architecture, pre-trained on speech, can be adapted to identify quasi-periodic respiratory events in underwater audio. The core contribution is showing that this state-of-the-art speech model can detect divers accurately from non-speech inhalation acoustics. By detecting individual inhalations, the approach eliminates the need to accumulate respiratory cycles, enabling real-time detection with minimal domain-specific data (120 inhalation samples). Deployed in diverse marine conditions, the solution achieved 94.4% accuracy and a 94.6% F1-score on inhalation sounds. It also extended detection range by more than 50% over conventional methods, which proved unreliable beyond 10 meters in low-SNR environments. The framework reduces false alarms caused by boat noise and generalizes to external datasets, supporting its cross-domain transferability. This work bridges AI-based speech processing and passive sonar signal processing, offering a resource-efficient solution for real-time underwater surveillance.
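The keyword-spotting view of detection can be illustrated with a minimal sketch: each short audio window is scored on its own, and a detection is reported as soon as one window crosses a threshold, with no need to observe several breathing cycles first. Everything in the block is an assumption made for illustration. The function names, the window and hop sizes, the threshold, and above all the energy-based `rms_score`, which is only a hypothetical stand-in for a fine-tuned speech model such as HuBERT, are not taken from the paper.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(
    signal: Sequence[float], window: int, hop: int
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start_index, samples) for every full window in the signal."""
    for start in range(0, len(signal) - window + 1, hop):
        yield start, signal[start:start + window]


def rms_score(samples: Sequence[float]) -> float:
    """Hypothetical scorer: RMS energy squashed into [0, 1).

    This is a placeholder for a learned classifier, not the paper's model.
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return rms / (1.0 + rms)


def spot_events(
    signal: Sequence[float],
    window: int,
    hop: int,
    score_fn: Callable[[Sequence[float]], float],
    threshold: float,
) -> List[int]:
    """Return start indices of windows whose score reaches the threshold.

    Each window is judged independently, so an event can be flagged
    without accumulating evidence over multiple respiratory cycles.
    """
    return [
        start
        for start, samples in sliding_windows(signal, window, hop)
        if score_fn(samples) >= threshold
    ]


# Toy signal: silence, one short burst standing in for an inhalation, silence.
toy_signal = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
detections = spot_events(toy_signal, window=4, hop=4,
                         score_fn=rms_score, threshold=0.4)
print(detections)
```

In a real system the scorer would be replaced by the adapted model's per-window output, and the window length, hop, and threshold would be tuned on recorded data; the values above are arbitrary.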