Stars
🔥 [AAAI 2026, Official Code] First work on aesthetics assessment of image color temperature.
🔥 [AAAI 2026, Official Code] Regression Over Classification: Assessing Image Aesthetics via Multimodal Large Language Models. Addresses large models' insensitivity to scores during aesthetics assessment.
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
GoatWu / Self-Forcing-Plus
Forked from guandeh17/Self-Forcing. Unofficial extension of Self-Forcing that supports I2V and 14B training.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV 2025)
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Rectified Flow Inversion (RF-Inversion) - ICLR 2025
[CVPR2025] RORem: Training a Robust Object Remover with Human-in-the-Loop
A simple tool to make SVG paths more smooth. Customizable tolerance and download the result.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
Chrome extension to download images with one click, saving time on image dataset creation.
ComfyUI: 163 nodes: Display, manipulate, and edit text, images, videos, LoRAs and more. Manage looping operations, generate randomized content, use logical conditions and work with external AI to…
All my self-trained & released AI upscaling models. After gathering and applying over 600 different upscaling models, I learned how to train my own models, and these are the results.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
Official implementation of the paper "Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance" (AAAI 2025 Oral)
Minimal reproduction of DeepSeek R1-Zero
Official repository of In-Context LoRA for Diffusion Transformers
A study aiming to transfer a single concept using the DiT model's self-attention capability.
Unofficial custom_node for AnyText v1.1: https://github.com/tyxsspa/AnyText and AnyText v2.0: https://github.com/tyxsspa/AnyText2 and Glyph-ByT5: https://github.com/AIGText/Glyph-ByT5 (Test failed …
An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation