Stars
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Fine-Grained GRPO for Precise Preference Alignment in Flow Models
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"
A simple screen parsing tool towards pure vision-based GUI agents
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories. Join the discord: https://discord.gg/gMwThUMeme
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of "Large Model Algorithms"! 💥 (100+ LLM/RL Algorithm Maps)
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
[ECCV 2024] Tokenize Anything via Prompting
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
MAGI-1: Autoregressive Video Generation at Scale
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personali…
verl: Volcano Engine Reinforcement Learning for LLMs