Stars
A simple, unified multimodal model training engine. Lean, flexible, and built for hacking at scale.
[CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation modes. The dataset includes extensive contextual desc…
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
[NeurIPS'25] Evaluation code for the paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
This is an official repo for fine-tuning SAM to customized medical images.
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
[ICLR 2025] SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
verl: Volcano Engine Reinforcement Learning for LLMs
The world's simplest facial recognition API for Python and the command line
[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.