Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
ConceptAttention: A method for interpreting multi-modal diffusion transformers.
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Simultaneous speech-to-text model
Official implementation of YingMusic-SVC.
The ArtificialSongGenerator automatically composes and compiles the Artificial Audio Multitrack dataset (AAM).
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Multilingual Voice Understanding Model
Official repository for the paper: Scaling Self-Supervised Representation Learning for Symbolic Piano Performance (ISMIR 2025)
Open-source code for our paper, Steering Autoregressive Music Generation with Recursive Feature Machines (Zhao et al., 2025), a.k.a. MusicRFM
HuwCheston / Panako-SampleID
Forked from JorenSix/Panako
Wrapper around Panako for Spotify Sample ID internship work
An automatic sample identification (ASID) system using a contrastively trained GNN encoder.
Implementation of the experiments for "Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Latent Chord Labels and Features"
Chordify Annotator Subjectivity Dataset - A chord-label harmony dataset with multiple reference annotations per song
Official implementation of the paper "BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music"
Companion resources for the paper 'Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music'
Deep learning based dependency parsing for music sequences
"Joint Transcription of Acoustic Guitar Strumming Directions and Chords" - ISMIR2025