- Beijing
Starred repositories
Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
Self-hosted, real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory, tools, RAG, and optional digital-human video.
Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
High-quality speech synthesis with LoRA fine-tuning on index-tts, enhancing prosody and naturalness for single and multi-speaker voices.
Text Normalization & Inverse Text Normalization
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
Chinese text normalization for speech processing
High-Quality Voice Cloning TTS for 600+ Languages
Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of pre-trained models (primarily from UVR)
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
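Automatic live-stream recording and uploading tool that also re-uploads Twitch and YouTube channels. A command-line upload (Bilibili) and video download tool that offers multiple login methods and supports multi-part videos.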
🚀🎬 Flexible, efficient, and scalable toolbox for editing and dubbing, unleashing creative potential
[ECCV 2026] Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"