Stars
HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Fast and Simple Face Swap Extension Node for ComfyUI (SFW)
Industry leading face manipulation platform
Powerful & Easy-to-Use Video Face Swapping and Editing Software
Helios: Real Real-Time Long Video Generation Model
FireRed-OpenStoryline is an AI video editing agent that transforms manual editing into intention-driven directing through natural language interaction, LLM-powered planning, and precise tool orches…
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候,对你的期望是很高的。 一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.
The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, e…
Vercel's official collection of agent skills
MoCha: End-to-End Video Character Replacement without Structural Guidance
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Build, run, manage agentic software at scale.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
A PyTorch-native inference engine with hybrid cache acceleration and massive parallelism for DiTs.
MOVA: Towards Scalable and Synchronized Video–Audio Generation
SGLang is a high-performance serving framework for large language models and multimodal models.
CRT-Nodes is a collection of custom nodes for ComfyUI.
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and vi…
A general fine-tuning kit geared toward image/video/audio diffusion models.
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.