Stars
Advancing AI by embracing human-likeness for better AI understanding, human–AI collaboration, and social simulation, bridging technology and genuine human experience.
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
Fine-tuning InternVL-3.5-1B on the MVBench dataset with different strategies: PEFT (LoRA/QLoRA) and full fine-tuning
FlowGram is an extensible workflow development framework with built-in canvas, form, variable, and materials that helps developers build AI workflow platforms faster and simpler.
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
OpenAI Agents Tracing | Open-Source Tracing Dashboard including Costs & Usage
Conveniently control parts of text prompts with a custom UI. The pack includes loaders for txt and csv files, a dynamic text concatenation tool, and an easy-to-use input node.
The world's first open-source multimodal creative assistant. This is a privacy-first substitute for Canva and Manus that can be used locally.
[CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
Synthetic Faces High Quality - Text2Image (SFHQ-T2I) Dataset. 122,726 curated 1024x1024 synthetic face images
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
🔥 [ICCV 2025 Highlight] InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Q-Insight is open-sourced at https://github.com/bytedance/Q-Insight. This repository will not receive further updates.
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Collection of Aesthetics Assessment Papers for Graphic Designs.
comfyui_dagthomas - Advanced Prompt Generation and Image Analysis
Code release for "Semi-supervised learning made simple with self-supervised clustering"
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Wrapper to use DynamiCrafter models in ComfyUI
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis
A ComfyUI node for driving videos using batches of images.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)