Stars
GPT Image 2 prompt gallery, image prompt library, agentic skill, and CLI for OpenAI image generation/editing
Universal skills loader for AI coding agents - npm i -g openskills
Elevate your AI research writing: no more tedious polishing ✨
SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. An online demo is available at suppixel.ai.
A summary of diffusion-based image processing, covering restoration, enhancement, coding, and quality assessment
Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
A Holistic and Scalable Harmonization Method for PlanetScope Constellation Imagery Leveraging a Graph-based Greedy Optimization Strategy
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Witness the aha moment of a VLM for less than $3.
This repository provides a valuable reference for researchers in multimodality; start your exploration of RL-based reasoning MLLMs!
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Awesome Reasoning LLM Tutorial/Survey/Guide
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
Official code for "SRFormer: Permuted Self-Attention for Single Image Super-Resolution" (ICCV 2023) and SRFormerV2
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[ICLR'25] Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[ICML2025] VARSR: Visual Autoregressive Modeling for Image Super-Resolution
Collections of Papers and Projects for Multimodal Reasoning.