Stars
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
The first high school physics Olympiad benchmark for evaluating (M)LLMs with step-level grading and human-level comparison.
Random network distillation on Montezuma's Revenge and Super Mario Bros.
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
The Next Step Forward in Multimodal LLM Alignment
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
A Robust and Versatile Monocular Visual-Inertial State Estimator
Ready-to-use code and tutorial notebooks to boost your way into few-shot learning for image classification.
[TPAMI 2023] LibFewShot: A Comprehensive Library for Few-shot Learning.