Stars
Implementation of Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
Taste-Skill - gives your AI good taste. Stops the AI from generating boring, generic slop.
A modular framework for few-shot visual segmentation using visual prompting techniques. Enables easy experimentation with different algorithms, backbones (SAM, MobileSAM, DinoV2), and pipeline com…
🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sa…
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
Official PyTorch implementation of AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
Helios: Real Real-Time Long Video Generation Model
SoulX-FlashHead: A unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation.
DeepTutor: Agent-native Personalized Tutoring. https://deeptutor.info/.
The API to search, scrape, and interact with the web at scale. 🔥
[CVPR 2026] Official PyTorch implementation of Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
An open-source AI Voice Agent that integrates with Asterisk/FreePBX using AudioSocket/RTP technology
The definitive resource for Agent Skills - modular capabilities revolutionizing AI agent architecture
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
[ECCV 2026] Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
AnyTalker: Scaling Multi-person Talking Video Generation with Interactivity Refinement
Real-Time VLAs via Future-state-aware Asynchronous Inference.