AD-FlowTSE: Adaptive Discriminative Flow-Matching Target Speaker Extraction

ICASSP 2026 Submission

Overview

Generative target-speaker extraction (TSE) methods often produce more natural outputs than predictive models. Recent diffusion- or flow-matching-based approaches typically rely on a fixed number of reverse steps with uniform step size.

We introduce Adaptive Discriminative Flow Matching TSE (AD-FlowTSE), a generative framework that extracts target speech with an adaptive step size.

Unlike prior flow-matching (FM) speech enhancement and TSE methods, which transport between the mixture (or a normal prior) and the clean-speech distribution, AD-FlowTSE defines the flow between the background and the source. The position of a mixture along this flow is governed by the mixing ratio (MR) of the source and background that form it.

This design enables MR-aware initialization, where the model starts at an adaptive point along the background–source trajectory rather than using a fixed reverse schedule across all noise levels.
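MR-aware initialization can be illustrated with a toy sketch. It assumes a linear path x_t = (1 - t) * background + t * source, so that a mixture with mixing ratio mr sits at t = mr. This parameterization, the function names, and the oracle velocity standing in for the trained network are illustrative assumptions, not the repository's implementation.

```python
import random


def mix(source, background, mr):
    """Place the mixture at t = mr on the assumed linear background -> source path."""
    return [(1.0 - mr) * b + mr * s for s, b in zip(source, background)]


def euler_step(x, velocity, step):
    """Advance x along the given velocity by one Euler step of size `step`."""
    return [xi + step * vi for xi, vi in zip(x, velocity)]


random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(8)]
background = [random.gauss(0.0, 1.0) for _ in range(8)]
mr = 0.3  # hypothetical mixing ratio; in practice it would be estimated

mixture = mix(source, background, mr)

# Oracle velocity of the linear path (source - background). A trained model
# would instead predict this from the mixture and the reference speech.
velocity = [s - b for s, b in zip(source, background)]

# MR-aware initialization: start at t = mr and take one step of size (1 - mr).
estimate = euler_step(mixture, velocity, 1.0 - mr)
max_error = max(abs(e - s) for e, s in zip(estimate, source))
print(f"max abs error after one step: {max_error:.2e}")
```

With an exact velocity, the single step recovers the source up to floating-point error. With a learned velocity, the result depends on both the model and the accuracy of the MR estimate.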

💡 Experiments show that AD-FlowTSE is both efficient and accurate: it achieves strong performance even with a single reverse step, and improves further with auxiliary MR estimation, path alignment with the mixture composition, and noise-adaptive step sizes.
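One way to read "noise-adaptive step sizes" is that the integration interval shrinks as the mixture moves closer to the source. The sketch below assumes time runs from 0 (background) to 1 (source) and that the reverse process starts at t = mr. The function name `adaptive_schedule` and the even spacing are hypothetical choices for illustration only.

```python
def adaptive_schedule(mr, n_steps):
    """Return n_steps + 1 time points from t = mr to t = 1.

    The step size is (1 - mr) / n_steps, so mixtures that are already
    close to the source (mr near 1) get smaller steps.
    """
    if not 0.0 <= mr <= 1.0:
        raise ValueError("mr must lie in [0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    step = (1.0 - mr) / n_steps
    return [mr + k * step for k in range(n_steps)] + [1.0]


for mr in (0.2, 0.5, 0.8):
    times = adaptive_schedule(mr, n_steps=4)
    print(f"mr={mr}: step size {(1.0 - mr) / 4:.3f}, {len(times)} time points")
```

A fixed schedule would instead integrate over the same interval for every mixture, regardless of how much background it contains.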

Dataset Preparation

Follow the official data-preparation pipeline from SpeakerBeam. After preparation, ensure your dataset follows the same directory structure (mixture, clean, and reference files).

Pre-trained Checkpoints

Pre-trained AD-FlowTSE models and mixing-ratio predictors are available here.

Training Instructions

Train the Mixing-Ratio Predictor

python train_t_predicter.py \
  --config config/<config_FlowTSE_alpha.yaml | config_FlowTSE_alpha_noisy.yaml>

Train AD-FlowTSE

python train.py \
  --config config/<config_FlowTSE_large.yaml | config_FlowTSE_large_noisy.yaml>

Evaluation

Run evaluation with different MR-predictor variants:

python eval.py \
  --config config/<config_FlowTSE_large.yaml | config_FlowTSE_large_noisy.yaml> \
  --t_predicter <ECAPAMLP | GT | ZERO | ONE | RAND>
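To compare all MR-predictor variants in one pass, the command above can be generated for each option. The helper below is a convenience sketch, not part of the repository: `build_eval_commands` is a hypothetical name, and it uses only the script, flags, and option values listed above.

```python
import subprocess  # only needed if the commands are actually executed

# Option values as listed for --t_predicter.
T_PREDICTERS = ["ECAPAMLP", "GT", "ZERO", "ONE", "RAND"]


def build_eval_commands(config_path, predictors=T_PREDICTERS):
    """Build one eval.py argument list per MR-predictor variant."""
    return [
        ["python", "eval.py", "--config", config_path, "--t_predicter", name]
        for name in predictors
    ]


commands = build_eval_commands("config/config_FlowTSE_large.yaml")
for cmd in commands:
    print(" ".join(cmd))
    # Uncomment to run each evaluation from the repository root:
    # subprocess.run(cmd, check=True)
```

Swap in `config/config_FlowTSE_large_noisy.yaml` to evaluate the noisy configuration.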

Credits

Our UDiT backbone is ported and modified from SoloAudio. We thank the authors for releasing their high-quality implementation.

Citation

If you find this work helpful, please cite:

@inproceedings{hsieh2026adflowtse,
  title     = {Adaptive Discriminative Flow Matching for Target Speaker Extraction},
  author    = {Tsun-An Hsieh and Minje Kim},
  booktitle = {submitted to Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2026},
}
