Stars
Fully Open Framework for Democratized Multimodal Training
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official implementation for "TRIO: Token Reduction via Inference-Objective Guidance for Efficient Vision-Language Models" https://arxiv.org/pdf/2602.04657
omo/lazycodex: The coding agent for tokenmaxxers; the one and only agent harness for complex codebases. For your Codex, for your OpenCode
A simple yet powerful agent framework that delivers with open-source models
ERGO (Efficient Reasoning & Guided Observation) is a large vision-language model trained with reinforcement learning on efficiency objectives. [ICLR'26]
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
This repository provides a valuable reference for researchers in the field of multimodality; please start your exploration of RL-based Reasoning MLLMs!
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Solve Visual Understanding with Reinforced VLMs
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Witness the aha moment of VLM with less than $3.
A high-throughput and memory-efficient inference and serving engine for LLMs
A paper list of recent works on Token Compression for ViT and VLM
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]
A 28× Compressed Wav2Lip for Efficient Talking Face Generation [ICCV'23 Demo] [MLSys'23 Workshop] [NVIDIA GTC'23]
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
The official NetsPresso Python package.
A library for training, compressing and deploying computer vision models (including ViT) with edge devices
Repository for 2023 AI City Challenge (Track1: Multi-Camera People Tracking)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Polynomial Learning Rate Decay Scheduler for PyTorch