Starred repositories
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
Towards Scalable Pre-training of Visual Tokenizers for Generation
Official implementation for "What Matters for Representation Alignment: Global Information or Spatial Structure?"
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Advanced Signal Processing Notebooks and Tutorials
Suggestions for those interested in developing machine-learning applications for audio
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
A simple yet effective Audio-to-MIDI Automatic Piano Transcription system
UniSpeech - Large Scale Self-Supervised Learning for Speech
Variational Autoencoder in the mel-spectrogram domain for one-shot audio synthesis
PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"
Official implementation of the paper "Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training"
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Official repository for “DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation”