- Tsinghua University (2019-2022), WeNet Community (2021-now)
- Beijing, China
- xingchensong.github.io
- https://blog.csdn.net/zongza
- https://scholar.google.com/citations?user=65eIdn4AAAAJ&hl=zh-CN
Lists (22)
agent
AIsys
app
full-stack Apps
benchmark
cpp
crawler
data is all we need
cuda
CV
datapipe
dataset
open-sourced data collections
just for fun
long context
music
notes
happy study happy life
omni
paperlist
quant
rl
speech
tokenizer
Training & Inference
TTS
Stars
Audio-Oscar is a multi-agent framework for generating long-form, controllable audio from complex audio scene descriptions.
HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
Unofficial fairseq-free PyTorch implementation of UTMOS (v1, 2022), matching the original system.
🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sa…
MOSS-Music is an open-source music understanding model targeting musical captioning, lyrics ASR, structural analysis, chord / key / tempo reasoning, and long-form musical question answering.
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
Generative models for conditional audio generation
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.
A batch scoring tool for speaker similarity evaluation.
Public release of the Sound Effect Foundation model by Sony AI.
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocations such as NCCL, ...)
OmX - Oh My codeX: Your codex is not alone. Add hooks, agent teams, HUDs, and so much more.
From Early Internet Design Patterns to AI Agent Implementation — A Deep Dive into Claude Code for Developers
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
Find slow PyTorch training bottlenecks: DataLoader stalls, low GPU utilization, rank stragglers, memory creep, and run regressions.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
Pure Rust + CUDA LLM inference engine