Lists (27)
algro
am
Annotation
BIG
books
cv
data_process
dataset
diffusion_models
expressive_tts
frontend
fun
Go
mos-predict
multilingual
nlp
others
separate
sing
star
TODO
toy
tts_data_process
tts_framework
ttsing
vae
vocoder
Starred repositories
VITA-QINYU: Expressive Spoken Language Model for Role-Playing and Singing
High-Quality Voice Cloning TTS for 600+ Languages
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic
Open Multi-Agent Interactive Classroom — Get an immersive, multi-agent learning experience in just one click
🔬 Harness Vibe Research with Self-evolving AI Scientists
AI agents running research on single-GPU nanochat training automatically
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Your faithful, impartial partner for audio evaluation: honest evaluation, so you know yourself and know your rivals.
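[LLMs Nine-Story Demon Tower] Shares hands-on practice and experience with LLMs in natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, multimodal, and other areas (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.).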
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[ICASSP 2026] Official code for "Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum"
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Tools for merging pretrained large language models.
Implementing DeepSeek R1's GRPO algorithm from scratch
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
Soprano: Instant, Ultra-Realistic Text-to-Speech
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
SkyRL: A Modular Full-stack RL Library for LLMs
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Andr…
Fast audio super-resolution from 16 kHz to 48 kHz.