Starred repositories
Training, validation, and inference code for various SSL approaches and architectures.
mimbres / audio-dataset
Forked from LAION-AI/audio-dataset. Audio Dataset for training CLAP and other models
mimbres / MidiTok
Forked from Natooz/MidiTok. MIDI / symbolic music tokenizers for Deep Learning models 🎶
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
A library for efficient similarity search and clustering of dense vectors.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models [NAACL 2025]
LLaQo, a Large Language Query-based Coach in the domain of expressive performance
Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input fr…
A benchmark for evaluating audio encoders on various audio tasks.
Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (Vi…
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
JEPAs for audio representation learning
EVAR ~ Evaluation package for Audio Representations
PyTorch code and models for V-JEPA self-supervised learning from video.
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
kaldi-asr/kaldi is the official location of the Kaldi project.
🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.