Stars
FreeLighting: A Next-generation Image Relighting Model with Background Replica from Any Perspective Angle
[CVPR 2026 Oral] Official implementation for ChordEdit: One-Step Low-Energy Transport for Image Editing
AdaRefSR is a novel reference-based one-step diffusion super-resolution framework. The paper was accepted to ICLR 2026.
Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"
[CVPR26 Oral] MagicBokeh is the first unified method specifically designed for high-zoom bokeh rendering.
The best-benchmarked open-source AI memory system. And it's free.
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Try X-Dub to sync any character in a video with any audio you like | Official repository for "From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping"
GPT-Image-2 API and Prompts
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR pipeline that includes data processing, model training, inf…
High-Quality Voice Cloning TTS for 600+ Languages
A service to convert audio to facial blendshapes for lipsyncing and facial performances.
[CVPR 2026] FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning
FireRed-OpenStoryline is an AI video editing agent that transforms manual editing into intention-driven directing through natural language interaction, LLM-powered planning, and precise tool orches…
Caxson / CosyVoice
Forked from FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…
Catalan TTS fine-tune of ZipVoice. Includes model weights and data preparation scripts.
[NeurIPS 2024] Generalizable and Animatable Gaussian Head Avatar
ICLR 2025 paper X-NeMo & Project X-Portrait2
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
[CVPR 2026] PersonaLive!: Expressive Portrait Image Animation for Live Streaming
ARTalk generates realistic 3D head motions (lip sync, blinking, expressions, head poses) from audio in ⚡ real-time ⚡.