sslsv

sslsv is a PyTorch-based deep learning toolkit consisting of a collection of Self-Supervised Learning (SSL) frameworks for learning speaker representations, applicable to various speaker-related downstream tasks, notably Speaker Verification (SV).

Its main objectives are to: (1) provide implementations of state-of-the-art SSL frameworks by adapting algorithms from the computer vision domain; and (2) evaluate them within a consistent and comparable environment.

An overview of the general training and evaluation framework is provided in the figure below.

News

June 2025 – 👏 Release of results and checkpoints (v2.0).
June 2025 – 🔖 Support for Python 3.13 and PyTorch 2.7.
December 2024 – 🧪 Implementation of SimCLR MultiViews and MoCo Margins.
November 2024 – 💡 Implementation of Self-Supervised Positive Sampling (SSPS).
July 2024 – 🌱 Implementation of more losses for SimCLR Margins (SphereFace, CurricularFace, MagFace, AdaFace).
May 2024 – 📚 Documentation of the complete codebase.
April 2024 – 🛠️ Complete refactoring, including typing, tests, and coding style (v2.0).
January 2024 – 🚀 Implementation of the W-MSE framework.
July 2023 – ⚡ Support for PyTorch Distributed Data Parallel (DDP).
June 2023 – 🧠 Evaluation on language, emotion, age, and gender recognition tasks.
April 2023 – 📊 Additional benchmarks (SITW, VOiCES) and metrics (CLLR, ActDCF, AvgRPrec).
March 2023 – 📏 Support for cosine scoring normalizations and PLDA evaluations.
January 2023 – 🧪 Implementation of SimCLR Margins (CosFace and ArcFace).
December 2022 – 🚀 Implementation of SSL frameworks: LIM, CPC, SimCLR, MoCo, Barlow Twins, VICReg, VIbCReg, DeepCluster, SwAV, SimSiam, BYOL, and DINO.
June 2022 – 🌠 First release of sslsv (v1.0).

Features

General

Data:
- Supervised and Self-supervised datasets (siamese and DINO sampling)
- Audio augmentation (noise and reverberation)
Training:
- CPU, single-GPU and multi-GPU (DataParallel and DistributedDataParallel)
- Checkpointing, resuming, early stopping and logging
- Tensorboard and wandb
Evaluation:
- Speaker verification
  - Backend: Cosine scoring and PLDA
  - Metrics: EER, MinDCF, ActDCF, CLLR, AvgRPrec
- Classification (emotion, language, ...)
Notebooks: DET curve, scores distribution, t-SNE on embeddings, ...
Misc: scalable config, typing, documentation and tests
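
To make the speaker verification evaluation listed above more concrete, here is a minimal pure-Python sketch of cosine scoring followed by an approximate EER computation. It is not the sslsv implementation: the function names `cosine_score` and `equal_error_rate` are hypothetical, and the EER is approximated by sweeping thresholds over the observed scores rather than interpolating the DET curve.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embeddings given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate the EER (as a fraction) from trial scores and 0/1 labels.

    A trial is accepted when its score is >= the threshold. The threshold
    where false acceptance and false rejection rates are closest is kept,
    and the mean of the two rates at that point is returned.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        frr = sum(s < threshold for s in targets) / len(targets)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A threshold sweep like this is quadratic in the number of trials, so it is only suitable for illustration on small trial lists.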

Encoders

TDNN (sslsv.encoders.TDNN)
X-vectors: Robust dnn embeddings for speaker recognition [PDF]
David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, Sanjeev Khudanpur
Simple Audio CNN (sslsv.encoders.SimpleAudioCNN)
Representation Learning with Contrastive Predictive Coding [PDF]
Aaron van den Oord, Yazhe Li, Oriol Vinyals
ResNet-34 (sslsv.encoders.ResNet34)
VoxCeleb2: Deep Speaker Recognition [PDF]
Joon Son Chung, Arsha Nagrani, Andrew Zisserman
ECAPA-TDNN (sslsv.encoders.ECAPATDNN)
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification [PDF]
Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck
S3PRL (sslsv.encoders.S3PRL)
Pre-trained speech foundation models (e.g., WavLM, HuBERT, wav2vec 2.0) can be used as encoders using the s3prl toolkit

Frameworks

LIM (sslsv.methods.LIM)
Learning Speaker Representations with Mutual Information [PDF]
Mirco Ravanelli, Yoshua Bengio
CPC (sslsv.methods.CPC)
Representation Learning with Contrastive Predictive Coding [PDF]
Aaron van den Oord, Yazhe Li, Oriol Vinyals
SimCLR (sslsv.methods.SimCLR)
A Simple Framework for Contrastive Learning of Visual Representations [PDF]
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
MoCo v2+ (sslsv.methods.MoCo)
Improved Baselines with Momentum Contrastive Learning [PDF]
Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
DeepCluster v2 (sslsv.methods.DeepCluster)
Deep Clustering for Unsupervised Learning of Visual Features [PDF]
Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
SwAV (sslsv.methods.SwAV)
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [PDF]
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
W-MSE (sslsv.methods.WMSE)
Whitening for Self-Supervised Representation Learning [PDF]
Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe
Barlow Twins (sslsv.methods.BarlowTwins)
Barlow Twins: Self-Supervised Learning via Redundancy Reduction [PDF]
Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny
VICReg (sslsv.methods.VICReg)
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning [PDF]
Adrien Bardes, Jean Ponce, Yann LeCun
VIbCReg (sslsv.methods.VIbCReg)
Computer Vision Self-supervised Learning Methods on Time Series [PDF]
Daesoo Lee, Erlend Aune
BYOL (sslsv.methods.BYOL)
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning [PDF]
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
SimSiam (sslsv.methods.SimSiam)
Exploring Simple Siamese Representation Learning [PDF]
Xinlei Chen, Kaiming He
DINO (sslsv.methods.DINO)
Emerging Properties in Self-Supervised Vision Transformers [PDF]
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

Methods (contributions)

Combiner (sslsv.methods.Combiner)
Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning [PDF] [Ref]
Theo Lepage, Reda Dehak
Margins (sslsv.methods.SimCLRMargins, sslsv.methods.MoCoMargins)
Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations [PDF] [Ref]
Theo Lepage, Reda Dehak
SSPS (sslsv.methods._SSPS)
Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling [PDF] [Ref]
Theo Lepage, Reda Dehak

Requirements

sslsv runs on Python 3.13.3 with the following dependencies.

Module	Versions
torch	2.7.1
torchaudio	2.7.1
numpy	*
pandas	*
soundfile	*
scikit-learn	*
speechbrain	*
tensorboard	*
wandb	*
ruamel.yaml	*
dacite	*
prettyprinter	*
tqdm	*

Note: developers will also need pytest, pre-commit and twine to work on this project.
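
As a quick sanity check of the pinned versions in the table above, the following sketch compares them against what is installed in the current environment. The helper names `installed_version` and `check_pinned` are hypothetical and not part of sslsv; only the two pinned packages from the table are checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements table above.
PINNED = {"torch": "2.7.1", "torchaudio": "2.7.1"}


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pinned(pinned=PINNED):
    """Map each mismatching package to an (expected, installed) pair."""
    mismatches = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        # Local build suffixes (anything after "+") are ignored here.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pinned().items():
        print(f"{package}: expected {expected}, found {found}")
```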

Datasets

Speaker recognition:

VoxCeleb1 (train and test)
VoxCeleb2 (train)
SITW (test)
VOiCES (test)

Language recognition:

VoxLingua107

Emotion recognition:

CREMA-D

Data-augmentation:

Data used for main experiments (conducted on VoxCeleb1 and VoxCeleb2 + data-augmentation) can be automatically downloaded, extracted and prepared using the following scripts.

python tools/prepare_data/prepare_voxceleb.py data/
python tools/prepare_data/prepare_augmentation.py data/

The resulting data folder should have the structure presented below.

data
├── musan_split/
├── simulated_rirs/
├── voxceleb1/
├── voxceleb2/
├── voxceleb1_test_O
├── voxceleb1_test_H
├── voxceleb1_test_E
├── voxsrc2021_val
├── voxceleb1_train.csv
└── voxceleb2_train.csv
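
After running the preparation scripts, the layout above can be checked with a short script such as the sketch below. The entry names are copied from the tree; the function name `missing_entries` is hypothetical and not provided by sslsv.

```python
from pathlib import Path

# Entries expected directly under the data folder, as listed in the tree above.
EXPECTED_ENTRIES = [
    "musan_split",
    "simulated_rirs",
    "voxceleb1",
    "voxceleb2",
    "voxceleb1_test_O",
    "voxceleb1_test_H",
    "voxceleb1_test_E",
    "voxsrc2021_val",
    "voxceleb1_train.csv",
    "voxceleb2_train.csv",
]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("data")
    print("All entries found." if not missing else f"Missing: {missing}")
```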

Other datasets have to be manually downloaded and extracted but their train and trials files can be created using the corresponding scripts from the tools/prepare_data/ folder.

Example format of a train file (voxceleb1_train.csv)

File,Speaker
voxceleb1/id10001/1zcIwhmdeo4/00001.wav,id10001
...
voxceleb1/id11251/s4R4hvqrhFw/00009.wav,id11251
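
A train file in this format can be read with the standard `csv` module. The sketch below counts utterances per speaker. The function name `utterances_per_speaker` is hypothetical, and the only assumption about the file is the `File,Speaker` header shown above.

```python
import csv
from collections import Counter


def utterances_per_speaker(lines):
    """Count rows per speaker from an iterable of CSV lines with a header."""
    reader = csv.DictReader(lines)
    return Counter(row["Speaker"] for row in reader)


if __name__ == "__main__":
    # Inline sample using the two rows shown above; for a real file, pass
    # an open file object, e.g. open("data/voxceleb1_train.csv", newline="").
    sample = [
        "File,Speaker",
        "voxceleb1/id10001/1zcIwhmdeo4/00001.wav,id10001",
        "voxceleb1/id11251/s4R4hvqrhFw/00009.wav,id11251",
    ]
    print(utterances_per_speaker(sample))
```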

Example format of a trials file (voxceleb1_test_O)

1 voxceleb1/id10270/x6uYqmx31kE/00001.wav voxceleb1/id10270/8jEAjG6SegY/00008.wav
...
0 voxceleb1/id10309/0cYFdtyWVds/00005.wav voxceleb1/id10296/Y-qKARMSO7k/00001.wav
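
Each trial line holds a label followed by two audio paths, separated by whitespace. In the example above, label 1 pairs two files from the same speaker and label 0 pairs files from different speakers. A minimal parsing sketch follows. The `Trial` type, its field names (`enrollment`, `test`) and `parse_trial` are hypothetical labels chosen for illustration, not names defined by sslsv.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    is_target: bool  # True when the label is 1 (same speaker)
    enrollment: str  # first audio path on the line
    test: str        # second audio path on the line


def parse_trial(line):
    """Parse one '<label> <path> <path>' line into a Trial."""
    label, enrollment, test = line.split()
    if label not in ("0", "1"):
        raise ValueError(f"Unexpected trial label: {label!r}")
    return Trial(label == "1", enrollment, test)
```

Paths containing spaces would break the whitespace split; the examples above contain none.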

Installation

Clone this repository: git clone https://github.com/theolepage/sslsv.git
Install dependencies: pip install -r requirements.txt

Note: sslsv can also be installed as a standalone package via pip with pip install sslsv or with pip install . (in the project root folder) to get the latest version.

Usage

Start a training (2 GPUs): ./train_ddp.sh 2 <config_path>
Evaluate your model (2 GPUs): ./evaluate_ddp.sh 2 <config_path>

Note: use sslsv/bin/train.py and sslsv/bin/evaluate.py for non-distributed mode to run with a CPU, a single GPU or multiple GPUs (DataParallel).

Tensorboard

You can visualize your experiments with tensorboard --logdir models/your_model/.

wandb

Use wandb online and wandb offline to toggle wandb. To log your experiments you first need to provide your API key with wandb login API_KEY.

Documentation

Documentation is currently being developed...

Results

SSL frameworks

Configs: models/ssl/voxceleb2/
Train set: VoxCeleb2
Evaluation: VoxCeleb1-O (Original)
Encoder: Fast ResNet-34 and ECAPA-TDNN

Fast ResNet-34

Method	Model	EER (%)	minDCF (p=0.01)	Checkpoint
LIM	`lim/lim_loss-NCE_proj-2048-BN-R-2048-BN-R-512`	16.13	0.9015
CPC	`cpc/cpc_t-4_agg-GRU-1-256`	12.77	0.8033
SimCLR	`simclr/simclr_proj-none_t-0.03`	9.05	0.6364	🔗
MoCo	`moco/moco_proj-none_Q-32768_t-0.03_m-0.999`	8.49	0.5990	🔗
DeepCluster	`deepcluster/deepcluster_proj-2048-BN-R-2048-BN-R-512_K-3000-3000-3000_t-0.1`	15.16	0.8193
SwAV	`swav/swav_proj-2048-BN-R-2048-BN-R-512_K-6000_t-0.1`	11.82	0.7177	🔗
W-MSE	`wmse/wmse_proj-1024-BN-R-64_ws-128`	14.62	0.8506
Barlow Twins	`barlowtwins/barlowtwins_proj-2048-BN-R-2048-BN-R-512_lambda-0.005`	13.22	0.7658
VICReg	`vicreg/vicreg_proj-2048-BN-R-2048-BN-R-512_inv-1.0_var-1.0_cov-0.1`	11.33	0.6658	🔗
BYOL	`byol/byol_proj-2048-BN-R-2048-BN-R-512_pred-4096-BN-R-256_m-0.996-sched`	13.99	0.7509
SimSiam	`simsiam/simsiam_proj-2048-BN-R-2048-BN-R-512-BN_pred-512-BN-R-2048`	28.94	0.9984
DINO	`dino/dino_proj-2048-BN-G-2048-BN-G-256-L2-65536_G-2x4_L-4x2_t-0.04`	6.04	0.4526	🔗
Supervised	`supervised/supervised_loss-AAM_s-30_m-0.2`	2.95	0.3122	🔗

ECAPA-TDNN

Method	Model	EER (%)	minDCF (p=0.01)	Checkpoint
SimCLR	`simclr/simclr_enc-ECAPATDNN-1024_proj-none_t-0.03`	6.41	0.5160	🔗
MoCo	`moco/moco_enc-ECAPATDNN-1024_proj-none_Q-32768_t-0.03_m-0.999`	6.48	0.5372	🔗
SwAV	`swav/swav_enc-ECAPATDNN-1024_proj-2048-BN-R-2048-BN-R-512_K-6000_t-0.1`	8.12	0.6148	🔗
VICReg	`vicreg/vicreg_enc-ECAPATDNN-1024_proj-2048-BN-R-2048-BN-R-512_inv-1.0_var-1.0_cov-0.1`	7.42	0.5659	🔗
DINO	`dino/dino_enc-ECAPATDNN-1024_proj-2048-BN-G-2048-BN-G-256-L2-65536_G-2x4_L-4x2_t-0.04`	2.82	0.3463	🔗
Supervised	`supervised/supervised_enc-ECAPATDNN-1024_loss-AAM_s-30_m-0.2`	1.34	0.1521	🔗

SSPS

Configs: models/ssps/voxceleb2/
Train set: VoxCeleb2
Evaluation: VoxCeleb1-O (Original)
Encoder: ECAPA-TDNN

Method	Model	EER (%)	minDCF (p=0.01)	Checkpoint
SimCLR	`simclr_e-ecapa/ssps_kmeans_25k_uni-1`	2.57	0.3033	🔗
DINO	`dino_e-ecapa/ssps_kmeans_25k_uni-1`	2.53	0.2843	🔗

Acknowledgements

sslsv contains third-party components and code adapted from other open-source projects, including: voxceleb_trainer, voxceleb_unsupervised and solo-learn.

Citations

If you use sslsv, please consider starring this repository on GitHub and citing one of the following papers.

@Article{lepage2025SLSRReview,
  title   = {Self-Supervised Learning for Speaker Recognition: A study and review},
  author  = {Lepage, Theo and Dehak, Reda},
  year    = {2026},
  journal = {Speech Communication},
  volume  = {176},
  pages   = {103333},
  doi     = {10.1016/j.specom.2025.103333},
  url     = {https://arxiv.org/pdf/2602.10829}
}

@InProceedings{lepage2025SSPS,
  title     = {SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification},
  author    = {Lepage, Theo and Dehak, Reda},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {1098--1102},
  doi       = {10.21437/Interspeech.2025-183},
  url       = {https://www.isca-archive.org/interspeech_2025/lepage25_interspeech.pdf}
}

@Article{lepage2025BootstrappedPositiveSampling,
  title     = {Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling},
  author    = {Lepage, Theo and Dehak, Reda},
  year      = {2025},
  journal   = {IEEE Transactions on Audio, Speech and Language Processing},
  volume    = {33},
  pages     = {2932--2945},
  doi       = {10.1109/TASLPRO.2025.3587462},
  url       = {https://arxiv.org/pdf/2501.17772}
}

License

This project is released under the MIT License.
