Release Notes

v1.0: Upgraded the SED decoder to the DPC (Dense Prediction Cell) architecture.
v1.1: Enhanced the training objective and PIT (Permutation Invariant Training) logic.
- Transitioned from reference-channel-based PIT to PIT using Negative SA-SDR Loss across all channels.
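The v1.1 change can be illustrated with a minimal, dependency-free sketch. It assumes the usual SA-SDR definition (signal and error energies are aggregated over all sources before the ratio is taken) and reads "across all channels" as also summing those energies over every channel. The function names `neg_sa_sdr` and `pit_neg_sa_sdr` are hypothetical, and the repository's own implementation may differ.

```python
import itertools
import math


def neg_sa_sdr(est, ref, eps=1e-8):
    """Negative SA-SDR in dB for nested lists indexed [source][channel][sample].

    Energies are summed over sources, channels, and samples before the ratio,
    so every channel contributes to the loss (assumption, see lead-in).
    """
    signal = 0.0
    error = 0.0
    for est_src, ref_src in zip(est, ref):
        for est_ch, ref_ch in zip(est_src, ref_src):
            for e, r in zip(est_ch, ref_ch):
                signal += r * r
                error += (r - e) ** 2
    return -10.0 * math.log10((signal + eps) / (error + eps))


def pit_neg_sa_sdr(est, ref):
    """Try every source permutation and keep the one with the lowest loss."""
    best_loss, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(ref))):
        permuted = [est[i] for i in perm]
        loss = neg_sa_sdr(permuted, ref)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: 2 sources, 2 channels, 4 samples; estimates arrive in swapped order.
ref = [
    [[1.0, 0.0, -1.0, 0.0], [0.5, 0.0, -0.5, 0.0]],
    [[0.0, 1.0, 0.0, -1.0], [0.0, 0.5, 0.0, -0.5]],
]
est = [
    [[0.9 * x for x in ch] for ch in ref[1]],
    [[0.9 * x for x in ch] for ch in ref[0]],
]
loss, perm = pit_neg_sa_sdr(est, ref)
print("best permutation:", perm, "loss (dB):", loss)
```

The exhaustive search over permutations is only practical for a small number of sources. A tensor-based version would vectorize the energy sums rather than loop over samples.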

DeepASA: An object-oriented multi-purpose network for auditory scene analysis

Official implementation of "DeepASA: An object-oriented multi-purpose network for auditory scene analysis (NeurIPS 2025)".

We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation (DoAE) within a unified framework.

1. Setup

Clone repository

git clone https://github.com/donghoney0416/DeepASA.git
cd DeepASA

Install requirements

pip install -r requirements_.txt

We use FlashAttention and Mamba, so CUDA 11.6 or later is required. If your CUDA version is older than 11.6, we recommend updating your NVIDIA driver first.
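A quick way to check the requirement before installing is sketched below. It assumes CUDA 11.6 itself is acceptable (adjust the comparison if a strictly newer version is needed), and the helper name `cuda_meets_minimum` is hypothetical. `torch.version.cuda` reports the CUDA version PyTorch was built against, which may not match the system toolkit.

```python
def cuda_meets_minimum(version, minimum=(11, 6)):
    """Return True if a CUDA version string such as '11.8' is at least `minimum`."""
    if version is None:  # CPU-only PyTorch builds report no CUDA version
        return False
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


try:
    import torch  # optional: only used to read the build's CUDA version

    detected = torch.version.cuda
except ImportError:
    detected = None

print("CUDA version seen by PyTorch:", detected)
print("Meets minimum:", cuda_meets_minimum(detected))
```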

2. Details

Dataset

We constructed a new dataset, Auditory Scene Analysis V2 (ASA2), for multichannel universal sound separation (USS) and polyphonic audio classification. The dataset is designed to reflect varied conditions, including moving sources with temporal onsets and offsets. For foreground sound sources, signals from 13 audio classes were selected from open-source databases (Pixabay, FSD50K, LibriSpeech, MUSDB18, VocalSound). Details and download instructions are available at the Hugging Face link below.

ASA2 dataset link

Training

You can train DeepASA with the following command:

python SharedTrainer.py fit --config=configs/DeepASA.yaml --configs/dataset/AuditorySceneAnalysis.yaml --data.batch_size=[2,2] --trainer.devices=[0,1,2,3] --trainer.max_epochs=100

Pretrained ATST

Inference

You can evaluate a trained model by adapting the command below to your own log directory, checkpoint, and devices:

python SharedTrainer.py test --config=configs/logs/DeepASA/version_0/config.yaml --checkpoints=configs/logs/DeepASA/version_0/checkpoints/last.ckpt --data.batch_size=[2,2] --trainer.devices=[0,1,2,3]
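The parts of the command that usually need changing are the run directory (`version_0`) and the device list. The sketch below assembles the same command for a given run. The helper name `build_test_command` and its parameters are hypothetical, and the flags are copied verbatim from the command above rather than checked against the trainer's CLI.

```python
from pathlib import PurePosixPath


def build_test_command(version, devices, log_root="configs/logs/DeepASA", batch_size=(2, 2)):
    """Assemble the test command for one training run (flags taken from the README)."""
    run_dir = PurePosixPath(log_root) / f"version_{version}"
    return [
        "python", "SharedTrainer.py", "test",
        f"--config={run_dir / 'config.yaml'}",
        f"--checkpoints={run_dir / 'checkpoints' / 'last.ckpt'}",
        f"--data.batch_size=[{','.join(str(b) for b in batch_size)}]",
        f"--trainer.devices=[{','.join(str(d) for d in devices)}]",
    ]


cmd = build_test_command(version=0, devices=[0, 1, 2, 3])
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run` if the evaluation should be launched from a script.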

Citations

@article{lee2025deepasa,
  title={DeepASA: An object-oriented multi-purpose network for auditory scene analysis},
  author={Lee, Dongheon and Kwon, Younghoo and Choi, Jung-Woo},
  journal={in Proc. Conference on Neural Information Processing Systems},
  year={2025}
}
