Shanghai Jiao Tong University & Shanghai Innovation Institute
Shanghai
https://zhikangniu.github.io/
Lists (28)
ASR
Awesome List
Bench
Chinese LLM
Codec
CV
Dataset/Tools/Course
Diffusion
emotion
Framework
front
LLM
Music Generation
nano
nlp
other
pipeline
Podcast
PyTorch
RLHF
s2st
speaker diarization
T2V
TTS
tutorial
unify
V2A
Vocoder
Stars
High-Quality Voice Cloning TTS for 600+ Languages
💥 Blazing fast terminal file manager written in Rust, based on async I/O.
Parameter-efficient text-to-audio generation for edge and low-memory deployment.
Vibecoding tutorial series: from environment setup to multi-agent collaboration, covering MCP, Skills, and agent role division and governance
[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny 300M model!
Podcast Downloader - Download all podcasts / episodes from an RSS-feed
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivering SOTA …
The open agent skills tool - npx skills
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
Collection of common code that's shared among different research projects in FAIR computer vision team.
Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
HeartMuLa Official Repo: The Most Powerful Open-Source Music Generation Model of 2026
State-of-the-art continuous audio tokenization
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
A SOTA industrial-grade voice activity detection and audio event detection model, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
Music Language Model Generation, Optimization, and Practice
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
An agentic skills framework & software development methodology that works.
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and residual connection, Mixture-of-Experts (MoE), Sel…
Text to speech alignment using CTC forced alignment
A general purpose scientific writer
Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".
Elevate your AI research writing, no more tedious polishing ✨