Starred repositories
Official Implementation of ReCo: Region-Constraint In-Context Generation for Instructional Video Editing
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6$\times$ acceleration in inference speed.
LiveKit Client SDK for ESP32 series chips. Easily enable real-time audio, video, and data for embedded projects.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Can we build an affordable open source health ring?
Your CrewAI Powered Video Editing Assistant
EMNLP 2025 - "Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs", Official Implementation
PersonaLive! : Expressive Portrait Image Animation for Live Streaming
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using autoregressive diffusion.
Paper Debugger is the best Overleaf companion
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Effortless monitoring and analytics for API frameworks.
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.