Starred repositories
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
[SIGGRAPH Asia 2024 (Journal Track)] StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal
OmniGen2: Exploration to Advanced Multimodal Generation.
This is the dataset and code release of the OpenRooms Dataset. For more information, please refer to our webpage below. Thanks a lot for your interest in our research!
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
适用于中山大学(SYSU)课程/实验报告的一个简单的 LaTeX 小模板
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
[ICCV'25] Official PyTorch Implementation of "JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers"
A framework for few-shot evaluation of language models.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.
verl: Volcano Engine Reinforcement Learning for LLMs
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
Code and data for Shading Annotations in the Wild
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)
A lightweight implementation of the Qwen-Image-Edit model for inference and LoRA fine-tuning on 8×V100 GPUs
A Survey of Reinforcement Learning for Large Reasoning Models
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering (nvidia)
[CVPR2025] Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
STGAN-based framework for material appearance editing.
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Official Repository for Ouroboros - ICCV 2025
Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
The collection of awesome papers on alignment of diffusion models.
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)