Stars
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
ClawPhD is an agent for research that can turn academic papers into publication-ready diagrams, posters, videos, and more.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[ICCV 2025] A Token-level Text Image Foundation Model for Document Understanding
Q-Insight Family: Q-Insight, VQ-Insight and RALI (NeurIPS 2025 Spotlight, AAAI 2026 Oral, and ICLR 2026 Oral)
SigLIP-based Aesthetic Score Predictor
[CVPR 2025] Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official PyTorch implementation of "Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization" (ECCV 2024)
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
An open-source AI agent that brings the power of Gemini directly into your terminal.
The Universe of Evaluation. All about the evaluation for LLMs.
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Official inference repo for FLUX.1 models
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
[ICML 2025] Official repository for the paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Index of URLs to PDF files all over the internet and scripts
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
A high-performance inference system for large language models, designed for production environments.