Few-Shot Transfer for Speech Enhancement Using SEGAN with Stability Guardrails


Authors

  • Rubi Sharma, Rajiv Gandhi University, India
  • Firos A., Rajiv Gandhi University, India

Abstract

High-quality speech communication is often degraded by background noise, which reduces both intelligibility and perceived quality. We investigate data-efficient few-shot transfer of a Speech Enhancement Generative Adversarial Network (SEGAN) to a new noise domain. Starting from a generator pretrained on VoiceBank–DEMAND, we adapt the model to MiniLibriMix using only 300 paired noisy–clean examples. To mitigate overfitting and catastrophic forgetting, we introduce SAFE (Stable Adversarial Few-shot Enhancement), a three-part stabilisation strategy comprising (i) exponential-moving-average (EMA) weight averaging, (ii) L2-SP weight anchoring to the source-domain parameters, and (iii) a teacher–student consistency loss. SAFE maintains source-domain performance on VoiceBank (PESQ ≈ 1.84; STOI ≈ 90 %). After an optional perceptual fine-tuning stage using multi-resolution STFT (MR-STFT) and adversarial losses, it yields substantial target-domain gains on MiniLibriMix (PESQ 1.11 → 1.26; STOI 71.5 % → 81.5 %) at the cost of only a minor drop in source-domain STOI. Ablation experiments show that EMA provides the strongest stabilising effect, while L2-SP and consistency regularisation offer complementary benefits. These results suggest that stable few-shot adaptation can make lightweight time-domain speech enhancers practical for rapid deployment in novel acoustic environments.
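The three SAFE stabilisers can be illustrated with a minimal Python sketch. This is not the authors' implementation. Parameters and model outputs are assumed to be flat lists of floats, and the consistency term is assumed to be a mean squared error. The function names (`ema_update`, `l2_sp_penalty`, `consistency_loss`, `safe_regularised_loss`) are hypothetical, and the coefficients `decay`, `alpha` and `lam` are placeholders rather than the paper's settings. The L2-SP term uses its usual form, (α/2)‖θ − θ_src‖², which pulls the adapted weights back toward the pretrained source weights.

```python
"""Minimal sketch of the three SAFE stabilisers on flat parameter lists.

Assumptions (not taken from the paper): parameters and outputs are plain lists
of floats, the consistency term is a mean squared error, and the default
coefficients (decay, alpha, lam) are illustrative placeholders.
"""
from typing import List, Sequence

Vector = List[float]


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # zip() silently truncates, so guard against mismatched shapes explicitly.
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")


def ema_update(ema: Sequence[float], student: Sequence[float], decay: float = 0.999) -> Vector:
    """(i) EMA weight averaging: theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    _check_same_length(ema, student)
    return [decay * e + (1.0 - decay) * s for e, s in zip(ema, student)]


def l2_sp_penalty(params: Sequence[float], source: Sequence[float], alpha: float) -> float:
    """(ii) L2-SP anchoring: (alpha / 2) * ||theta - theta_source||^2."""
    _check_same_length(params, source)
    return 0.5 * alpha * sum((p - s) ** 2 for p, s in zip(params, source))


def consistency_loss(student_out: Sequence[float], teacher_out: Sequence[float]) -> float:
    """(iii) Teacher-student consistency: mean squared error to the teacher's output.

    The teacher output is a fixed target; in an autodiff framework it would be
    detached so that no gradient flows into the teacher.
    """
    _check_same_length(student_out, teacher_out)
    if not student_out:
        raise ValueError("empty output")
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def safe_regularised_loss(
    task_loss: float,
    params: Sequence[float],
    source: Sequence[float],
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    alpha: float = 1e-3,
    lam: float = 1.0,
) -> float:
    """Task loss plus the L2-SP and consistency terms (EMA is applied after each step)."""
    return (
        task_loss
        + l2_sp_penalty(params, source, alpha)
        + lam * consistency_loss(student_out, teacher_out)
    )


if __name__ == "__main__":
    source = [0.0, 0.0]   # hypothetical pretrained (VoiceBank-DEMAND) weights
    adapted = [1.0, 2.0]  # hypothetical weights after a few target-domain steps
    print(ema_update(source, adapted, decay=0.9))
    print(safe_regularised_loss(0.5, adapted, source, [1.0, 3.0], [1.0, 1.0],
                                alpha=2.0, lam=0.5))  # 6.5
```

In a real training loop the regularised loss would be minimised with respect to the student generator, and `ema_update` would be called after each optimiser step. The teacher could be either the frozen source generator or the EMA copy. The abstract does not say which, so the sketch leaves that choice to the caller.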

Keywords:

speech enhancement, generative adversarial networks, few-shot learning, transfer learning, domain adaptation, stability regularization