Starred repositories
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Official inference repo for FLUX.2 models
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
HyperGen - Optimized inference and fine-tuning framework for diffusion (image & video) models. Up to 3x faster & 80% less VRAM.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"
SQL Native Memory Layer for LLMs, AI Agents & Multi-Agent Systems
This repo aims to record resources on role-playing abilities in LLMs, including datasets, papers, applications, etc.
A powerful tool that translates ComfyUI workflows into executable Python code.
Discomfort: Control ComfyUI with Python
[ACM TIST 2025] GenAI in Fashion: Overview, also includes 🔥latest papers, ⚙️metrics, 👀workshops, 🚀companies & products, ...
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling