A paper and project list about the cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting…

483 34 Updated Sep 28, 2022

CalvinXKY / InfraTech

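Sharing AI Infra knowledge & code exercises: PyTorch/vLLM/SGLang framework introductions⚡️, performance acceleration🚀, large-model fundamentals🧠, AI software & hardware🔧, and more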

Jupyter Notebook 2,636 235 Updated May 30, 2026

WKQ9411 / Mini-LLM

This project aims to replicate mainstream open-source model architectures with limited computational resources, implementing mini models with 100-200M parameters.

Python 276 29 Updated Jun 14, 2026

Norman-bury / research-writing-skill

Research Writing Assistant

Python 2,420 163 Updated Jun 10, 2026

OpenMOSS / MOSS-Audio

MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.

Python 574 41 Updated Jun 2, 2026

alibaba / ROCK

A construction kit for reinforcement learning environment management.

Python 456 67 Updated Jun 18, 2026

ultraworkers / claw-code

An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 194,024 109,950 Updated Jun 8, 2026

agentica-project / rllm

Jupyter Notebook 393 32 Updated Sep 17, 2025

vegaction / nanorllm

mini project for nanorllm

Python 59 7 Updated Mar 31, 2026

nex-agi / NexGAP

Nex General Agentic Data Pipeline, an end-to-end pipeline for generating high-quality agentic training data.

Python 37 3 Updated Nov 19, 2025

datawhalechina / musiclm-universe

Music Language Model Generation, Optimization, and Practice

Jupyter Notebook 61 11 Updated Apr 20, 2026

aivolcano / CiteScan

Scans academic papers for hallucinated citations and converts second-hand citations to their official versions.

Python 232 36 Updated Apr 1, 2026

OpenMOSS / MOVA

MOVA: Towards Scalable and Synchronized Video–Audio Generation

Python 1,046 87 Updated Jun 18, 2026

feifeibear / long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 673 80 Updated May 21, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,938 297 Updated Jan 30, 2026

kyutai-labs / tts_longeval

Python 30 1 Updated Apr 29, 2026

EmbodiedForge / Inspire-cli

A tool for better use of Inspire platform (Beta: Codeberg version is more up-to-date)

Python 26 5 Updated Apr 2, 2026

Cr-Fish / WESR

Official implementation of ACL'26 (findings) paper WESR (Word-level Event-Speech Recognition): A comprehensive benchmark and baseline for detecting and localizing non-verbal vocal events in speech.

Python 30 1 Updated Jan 30, 2026


Chen Yang Benioh


Stars

ddlBoJack / MMAE

XD111ds / sii-beamer-theme

lmcggg / graduate-thesis-polish-and-write-skill

THUDM / slime

earendil-works / pi

moonlarry / awesome-llm-paper-wiki

OpenMOSS / Awesome-WAM

Benioh / Plume

BBuf / AI-Infra-Auto-Driven-SKILLS

wuuucy / speech-to-prompt

guan-yuan / Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion