Stars
Official implementation of LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
omo/lazycodex: The coding agent for tokenmaxxers; the one and only agent harness for complex codebases. For your Codex, for your OpenCode
PaperBanana: Automating Academic Illustration For AI Scientists
[NeurIPS 2025] Official repositories for "Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought".
MR-Pruner: Training-free Multi-resolution Visual Token Pruning for Multi-modal Large Language Models
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV 2020
[CVPR 2025 Highlight] Official implementation of "Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity"
Scalable data preprocessing and curation toolkit for LLMs
Fast Multimodal Semantic Deduplication & Filtering
ERGO (Efficient Reasoning & Guided Observation) is a large vision-language model trained with reinforcement learning on efficiency objectives. [ICLR'26]
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Tools for merging pretrained large language models.
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
[ACL 2024] Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
[ACL 2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
Official repository for "AM-RADIO: Reduce All Domains Into One"
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Pixel-Level Reasoning Model trained with RL [NeurIPS'25]