Stars
Official SeedVR2 Video Upscaler for ComfyUI
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
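Custom nodes that use the IndexTTS model for high-quality text-to-speech in ComfyUI. Supports Chinese and English text, and can clone voice characteristics from reference audio.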
Text-to-audio and video-to-audio using Sony AI's Woosh foundation model.
The ultimate training toolkit for finetuning diffusion models
ComfyUI custom nodes for Fish Audio S2-Pro TTS — voice clone, multi-speaker, and text-to-speech
The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
[CVPR 2026] PersonaLive! : Expressive Portrait Image Animation for Live Streaming
SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations (CVPR 2026 Findings)
The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
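Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Multilingual libraries are built in.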
Easily train a good VC model with voice data <= 10 mins!
Simultaneous speech-to-text models
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
BillionMail gives you open-source MailServer, NewsLetter, Email Marketing — fully self-hosted, dev-friendly, and free from monthly fees. Join the discord: https://discord.gg/asfXzBUhZr
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a re…
[CVPR2026 🎉] Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
A pipeline parallel training script for diffusion models.
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
Text-audio foundation model from Boson AI
🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.