Stars
A simple yet powerful agent framework that delivers with open-source models
ERGO (Efficient Reasoning & Guided Observation) is a large vision–language model trained with reinforcement learning on efficiency objectives.
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
[NeurIPS 2025] Official code for the paper "Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs".
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
This repository provides a valuable reference for researchers in multimodality. Start your exploration of RL-based reasoning MLLMs here!
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Solve Visual Understanding with Reinforced VLMs
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Witness the aha moment of VLMs for less than $3.
A high-throughput and memory-efficient inference and serving engine for LLMs
A list of recent papers on token compression for ViTs and VLMs
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]
A 28× Compressed Wav2Lip for Efficient Talking Face Generation [ICCV'23 Demo] [MLSys'23 Workshop] [NVIDIA GTC'23]
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
The official NetsPresso Python package.
A library for training, compressing, and deploying computer vision models (including ViT) on edge devices
Repository for the 2023 AI City Challenge (Track 1: Multi-Camera People Tracking)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Polynomial Learning Rate Decay Scheduler for PyTorch
An easy-to-use PyTorch to TensorRT converter
Conversion of PyTorch Models into TFLite