Abstract
High-quality speech communication is often compromised by background noise, reducing intelligibility and perceived quality. We investigate data-efficient few-shot transfer of a Speech Enhancement Generative Adversarial Network (SEGAN) to a new noise domain. Starting from a generator pretrained on VoiceBank–DEMAND, we adapt the model to MiniLibriMix using only 300 paired noisy–clean examples. To prevent overfitting and catastrophic forgetting, we introduce SAFE (Stable Adversarial Few-shot Enhancement), a three-fold stabilisation strategy combining (i) an exponential moving average (EMA) of the generator weights, (ii) L2-SP weight anchoring to the source-domain parameters, and (iii) a teacher–student consistency loss. SAFE maintains VoiceBank performance (PESQ ≈ 1.84; STOI ≈ 90 %) and, after an optional perceptual fine-tuning stage (MR-STFT + adversarial), yields substantial target-domain gains on MiniLibriMix (PESQ 1.11 → 1.26, STOI 71.5 % → 81.5 %) at the cost of only a minor source-domain trade-off in STOI. Ablation experiments demonstrate that EMA provides the strongest stabilising effect, while L2-SP and consistency regularisation offer complementary benefits. These results suggest that stable few-shot adaptation can make lightweight time-domain speech enhancers practical for rapid deployment in novel acoustic environments.
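The three stabilisers named above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the function names, the EMA decay of 0.999, and the anchoring weight `alpha` are illustrative assumptions.

```python
import numpy as np

def ema_update(ema_w, w, decay=0.999):
    """(i) EMA of the generator weights: the slowly moving 'teacher' copy.
    decay=0.999 is an assumed value, not taken from the paper."""
    return {k: decay * ema_w[k] + (1.0 - decay) * w[k] for k in w}

def l2_sp_penalty(w, w_src, alpha=1e-3):
    """(ii) L2-SP anchoring: penalise drift of the adapted weights w away
    from the pretrained source-domain weights w_src."""
    return alpha * float(sum(np.sum((w[k] - w_src[k]) ** 2) for k in w))

def consistency_loss(student_out, teacher_out):
    """(iii) Teacher-student consistency: the adapting student's enhanced
    output should stay close to the EMA teacher's output (MSE here)."""
    return float(np.mean((np.asarray(student_out) - np.asarray(teacher_out)) ** 2))
```

In training, the student generator would be updated on the 300 adaptation pairs with the L2-SP and consistency terms added to its loss, while `ema_update` refreshes the teacher after each step.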

