cacard

Starred repositories

Python 21 6 Updated Jun 17, 2025

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Python 198 19 Updated Jun 17, 2026

Generate audiobooks from e-books

Python 7,656 654 Updated Feb 27, 2026

SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.

Python 1,349 127 Updated May 21, 2026

Self-hosted, real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory, tools, RAG, and optional digital-human video.

Python 1,230 172 Updated Jun 17, 2026

Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Python 1,121 106 Updated Jun 15, 2026

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

Java 25,304 2,391 Updated Jun 18, 2026

High-quality speech synthesis with LoRA fine-tuning on index-tts, enhancing prosody and naturalness for single and multi-speaker voices.

Python 306 25 Updated Mar 12, 2026

Text Normalization & Inverse Text Normalization

Python 784 111 Updated Jun 15, 2026

MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…

Python 3,516 450 Updated Jun 2, 2026

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 30,642 3,456 Updated Jun 10, 2026
Python 432 34 Updated Mar 25, 2026

Chinese text normalization for speech processing

Python 732 151 Updated Mar 18, 2023

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,576 1,187 Updated Jun 11, 2026
Python 781 71 Updated Jun 1, 2026

Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of pre-trained models (primarily from UVR)

Python 1,248 188 Updated May 18, 2026

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 990 140 Updated Dec 2, 2025

Spark-TTS Inference Code

Python 10,991 1,164 Updated Apr 9, 2025

F5-TTS inference acceleration, roughly 4x faster!

Python 124 17 Updated Jan 6, 2025

TTS with Kokoro and ONNX Runtime

Python 2,586 280 Updated Jan 30, 2026

State-of-the-art TTS model under 25MB 😻

Python 14,134 773 Updated Jun 11, 2026

Open Source Speech Language Model

Jupyter Notebook 995 107 Updated May 11, 2026

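Tool for automatic live-stream recording, uploading, and re-posting Twitch and YouTube channels. Command-line upload (Bilibili) and video download tool, offering multiple login methods and multi-part video support.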

Rust 5,219 635 Updated Jun 11, 2026

PersonaPlex code.

Python 10,052 1,400 Updated Mar 2, 2026

🚀🎬 Flexible, efficient, and scalable toolbox for editing and dubbing, unleashing creative potential

Python 184 23 Updated Jun 18, 2026

LongLive 2.0: Infra - Long Video Gen

Python 2,357 214 Updated Jun 13, 2026

[ECCV 2026] Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"

Python 2,170 243 Updated Jun 18, 2026