Lists (12)
Sort Name ascending (A-Z)
Stars
Miso TTS is an 8 billion, highly emotive text-to-speech model
💫 Toolkit to help you get started with Spec-Driven Development
将博导十年科研经验炼化为可直接调用的 AI 技能。从 Idea 构思到论文投稿,你的 AI 科研副导师。
A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
337 Claude Code skills & agent skills & plugins (30+ Agents, 70+ custom commands, 330+ skills, customizable references, scripts)for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents …
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. Based on Microsoft's VibeVoice (AR + diffusion archit…
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autogressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s Multimodal Intelligence team.
Reimplementation of CC-G2PnP: Streaming Conformer-CTC based Japanese Grapheme-to-Phoneme and Prosody model (arXiv:2602.17157)
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
Translate the video from one language to another and embed dubbing & subtitles.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
AI video translation & dubbing tool for humans and AI Agents, powered by LLMs. Full pipeline: download, transcribe, translate, TTS dub, reformat, cover generation. 100+ languages, optimized for You…
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
A TTS that fits in your CPU (and pocket)
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
PyTorch Implementation of TCSinger 2(ACL 2025): Customizable Multilingual Zero-shot Singing Voice Synthesis
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…
Real-time voice assistant — WebRTC streaming, faster-whisper ASR, local LLM, Vui Nano (300M) TTS. OpenAI Realtime API compatible. Voice cloning, barge-in, ~9× realtime on a 4090. Apache 2.0.