aleXiehta/AD-FlowTSE


AD-FlowTSE: Adaptive Discriminative Flow-Matching Target Speaker Extraction

ICASSP 2026 Submission


Overview

Generative target-speaker extraction (TSE) methods often produce more natural outputs than predictive models. Recent diffusion- or flow-matching-based approaches typically rely on a fixed number of reverse steps with uniform step size.

We introduce Adaptive Discriminative Flow Matching TSE (AD-FlowTSE) — a generative framework that extracts target speech with an adaptive step size.

Unlike prior flow-matching (FM)-based speech enhancement and TSE methods, which transport between the mixture (or a Gaussian prior) and the clean-speech distribution, AD-FlowTSE defines the flow between the background and the source. The path is governed by the mixing ratio (MR), i.e., the proportions of source and background that form the mixture.
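To make the geometry concrete, here is a small NumPy sketch. It assumes a straight-line path x_t = (1 - t)·b + t·s from the background b (t = 0) to the source s (t = 1), and a mixture model y = (1 - α)·b + α·s with MR α. Both the time convention and the exact mixture scaling are assumptions, not taken from the paper. Under them, the mixture is itself a point on the path (at t = α), and the path velocity is constant.

```python
import numpy as np


def path_point(background: np.ndarray, source: np.ndarray, t: float) -> np.ndarray:
    """Point at time t on the (assumed) straight background->source path."""
    return (1.0 - t) * background + t * source


rng = np.random.default_rng(0)
background = rng.standard_normal(16000)  # stand-in background, 1 s at 16 kHz
source = rng.standard_normal(16000)      # stand-in target-speaker signal
alpha = 0.7                              # assumed mixing ratio (MR)

# Assumed mixture model: a convex combination weighted by the MR.
mixture = (1.0 - alpha) * background + alpha * source

# The mixture coincides with the path point at t = alpha ...
assert np.allclose(mixture, path_point(background, source, alpha))

# ... and the velocity d/dt x_t = source - background is the same at every t.
velocity = source - background
dt = 1e-3
finite_diff = (path_point(background, source, 0.3 + dt) - path_point(background, source, 0.3)) / dt
assert np.allclose(finite_diff, velocity)
```

Because the mixture already sits at t = α, extraction only needs to cover the remaining stretch of the path from α to 1.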

This design enables MR-aware initialization, where the model starts at an adaptive point along the background–source trajectory rather than using a fixed reverse schedule across all noise levels.
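A minimal sketch of MR-aware initialization, under the assumption that t = 0 is pure background, t = 1 is the source, and the mixture sits at t = MR. The Euler integrator starts at `t_start` (e.g. a predicted MR) and splits the remaining interval into equal steps, so the step size adapts to each mixture. `euler_extract` and `oracle_velocity` are hypothetical names; the oracle uses the true source purely for illustration, where a trained network would estimate the velocity from the mixture and the reference utterance.

```python
import numpy as np


def euler_extract(mixture, velocity_fn, t_start, num_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t = t_start to t = 1 with equal Euler steps."""
    x = np.asarray(mixture, dtype=float)
    t = float(t_start)
    if t >= 1.0:
        return x  # already at the source end of the path: nothing to integrate
    dt = (1.0 - t) / num_steps  # step size adapts to the starting point
    for _ in range(num_steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x


rng = np.random.default_rng(1)
background, source = rng.standard_normal((2, 8000))
alpha = 0.6
mixture = (1.0 - alpha) * background + alpha * source


def oracle_velocity(x, t):
    # Illustration only: uses the true source, which a trained network would not see.
    return (source - x) / (1.0 - t)


one_step = euler_extract(mixture, oracle_velocity, t_start=alpha, num_steps=1)
print(np.allclose(one_step, source))  # True
```

With an exact velocity and the correct MR, a single step lands on the source because the path is straight. In practice both the velocity and the MR are estimates, so the one-step output is only approximate.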

💡 Experiments show that AD-FlowTSE delivers efficient and accurate TSE, achieving strong performance even with a single reverse step. Performance is further improved by auxiliary MR estimation, path alignment with the mixture composition, and noise-adaptive step sizes.

Dataset Preparation

Follow the official data-preparation pipeline from SpeakerBeam. After preparation, ensure your dataset follows the same directory structure (mixture, clean, and reference files).

Pre-trained Checkpoints

Pre-trained AD-FlowTSE models and mixing-ratio predictors are available here.

Training Instructions

Train the Mixing-Ratio Predictor

python train_t_predicter.py \
  --config config/<config_FlowTSE_alpha.yaml | config_FlowTSE_alpha_noisy.yaml>
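The README does not describe the predictor's internals. The `ECAPAMLP` option listed under Evaluation suggests a speaker-embedding front end followed by an MLP. As a hypothetical illustration of the regression part only, the NumPy sketch below fits a single sigmoid output layer that maps a fixed-size embedding to an MR in (0, 1), using gradient descent on mean squared error. The embeddings and targets are synthetic, and `train_mr_head` is not part of this repository.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mr_head(features, mr_targets, lr=0.5, epochs=200):
    """Fit a hypothetical head mr = sigmoid(features @ w + b) with MSE loss."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(epochs):
        p = sigmoid(features @ w + b)
        err = p - mr_targets
        losses.append(float(np.mean(err ** 2)))
        grad_z = 2.0 * err * p * (1.0 - p) / n  # dLoss/dz through the sigmoid
        w -= lr * (features.T @ grad_z)
        b -= lr * float(grad_z.sum())
    return w, b, losses


rng = np.random.default_rng(0)
emb = rng.standard_normal((256, 32))                    # stand-in utterance embeddings
true_mr = sigmoid(emb @ rng.standard_normal(32) * 0.3)  # synthetic MR targets in (0, 1)
w, b, losses = train_mr_head(emb, true_mr)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The sigmoid keeps every prediction inside (0, 1), which matches the range of a start time along the path.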

Train AD-FlowTSE

python train.py \
  --config config/<config_FlowTSE_large.yaml | config_FlowTSE_large_noisy.yaml>
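The README does not spell out the training objective. A common flow-matching recipe, applied to the background-to-source path from the Overview, samples a time t, forms x_t = (1 - t)·b + t·s, and regresses the network output onto the path velocity s - b. The sketch below assumes exactly that recipe. `flow_matching_loss`, `velocity_model`, and the batch layout are hypothetical stand-ins, and `train.py` may use a different parameterization (for example, predicting the source directly).

```python
import numpy as np


def flow_matching_loss(velocity_model, background, source, reference, rng):
    """Monte-Carlo estimate of an (assumed) linear-path flow-matching loss.

    background, source: arrays of shape (batch, samples)
    velocity_model(x_t, t, reference) must return an array shaped like x_t.
    """
    batch = source.shape[0]
    t = rng.uniform(0.0, 1.0, size=(batch, 1))  # one random time per example
    x_t = (1.0 - t) * background + t * source    # point on the background->source path
    target = source - background                  # velocity of the straight path
    pred = velocity_model(x_t, t, reference)
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
background, source = rng.standard_normal((2, 4, 16000))  # batch of 4 stand-in signals
reference = rng.standard_normal((4, 16000))              # stand-in enrollment utterances


def zero_model(x_t, t, ref):
    # Baseline that ignores its inputs and always predicts zero velocity.
    return np.zeros_like(x_t)


print(flow_matching_loss(zero_model, background, source, reference, rng))
```

The zero baseline pays the full squared path velocity. A real model would condition on `reference` to decide which speaker counts as the source.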

Evaluation

Run evaluation with different MR-predictor variants:

python eval.py \
  --config config/<config_FlowTSE_large.yaml | config_FlowTSE_large_noisy.yaml> \
  --t_predicter <ECAPAMLP | GT | ZERO | ONE | RAND>
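The README lists the `--t_predicter` options without defining them. Judging only from their names, they plausibly choose where integration starts along the background-to-source path. The mapping below is an assumption, not code taken from `eval.py`. It uses a convention in which t = 0 is pure background and t = 1 is the source, and `start_time` is a hypothetical helper.

```python
import numpy as np


def start_time(variant, predicted_mr=None, true_mr=None, rng=None):
    """Hypothetical mapping from a --t_predicter option to the start time t0."""
    if variant == "ECAPAMLP":
        return float(predicted_mr)  # learned MR estimate from the predictor
    if variant == "GT":
        return float(true_mr)       # oracle MR from the known source/background
    if variant == "ZERO":
        return 0.0                  # ignore the MR and traverse the whole path
    if variant == "ONE":
        return 1.0                  # source end: no integration, output equals input
    if variant == "RAND":
        rng = rng if rng is not None else np.random.default_rng()
        return float(rng.uniform(0.0, 1.0))  # random start as an ablation
    raise ValueError(f"unknown t_predicter variant: {variant}")


print(start_time("GT", true_mr=0.4), start_time("ZERO"), start_time("ONE"))
```

Under this reading, `GT`, `ZERO`, `ONE`, and `RAND` act as reference points for judging how much an accurate MR estimate (`ECAPAMLP`) contributes.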

Credits

Our UDiT backbone is ported and modified from SoloAudio. We thank the authors for releasing their high-quality implementation.

Citation

If you find this work helpful, please cite:

@inproceedings{hsieh2026adflowtse,
  title     = {Adaptive Discriminative Flow Matching for Target Speaker Extraction},
  author    = {Tsun-An Hsieh and Minje Kim},
  booktitle = {submitted to Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2026},
}
