Lists (24)
1️⃣ Unified Model
🧰 Agent
🦁 Backbone
🦅 Dataset Distillation
🌲 Eval
💎 generation
😪 Hallucination
🛥️ infra
🐰 KD
💭 latent
☕ LoRA
🔢 math
🌟 MLLM
🍄 MTP
🐤 Open-Vocabulary Detection
⭐ Open-Vocabulary Segmentation
👁️🗨️ Post-Training
🚀 Pretraining
🍦 RAG
🤔 Reasoning LLM
🍰 SAM
✊ SAM+CLIP
✈️ Segmentation
🚘 V2A
Stars
UniRL is a Framework for Unified Multimodal Model Reinforcement Learning
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
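[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AIGC/LLM/AI Agent algorithm engineers. It covers hands-on written-test and interview experience and core knowledge across the AI industry, including AIGC, large language models (LLMs), AI Agents, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, big-data mining, embodied intelligence, the metaverse, and AGI.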
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
Trainable fast and memory-efficient sparse attention
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]
MTRefSeg: An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation
Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives.
Efficient Universal Perception Encoder: a single on-device vision encoder with versatile representations that match or exceed specialized experts across multiple task domains.
Awesome Unified Multimodal Models
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
PyTorch denoising diffusion demo
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Janus-Series: Unified Multimodal Understanding and Generation Models
Official repo for "Let ViT Speak: Generative Language-Image Pre-training"
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Hy3 preview (295B A21B), a leading reasoning and agent model for its size, with high cost efficiency
SenseNova-U series: Native Unified Paradigm with NEO-unify from First Principles
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
Vero: An Open RL Recipe for General Visual Reasoning