Awesome-MLLMs-Acceleration

This is a curated list of awesome works on accelerating Multimodal Large Language Models (MLLMs).

📊 Benchmarks

  1. MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models (Jun. 23, 2023)
  2. MMBench: Is Your Multi-modal Model an All-around Player? (Jul. 12, 2023, ECCV 2024)
  3. Evaluating Object Hallucination in Large Vision-Language Models (May 17, 2023, EMNLP 2023)
  4. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering (Sep. 20, 2022, NeurIPS 2022)
  5. Towards VQA Models That Can Read (Apr. 18, 2019, CVPR 2019)
  6. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (Feb. 25, 2019, CVPR 2019)
  7. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (Dec. 2, 2016, CVPR 2017)

👏 MLLMs Acceleration

  1. Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models (Mar. 20, 2025, CVPR 2025)
  2. EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models (Mar. 19, 2025, CVPR 2025)
  3. Adaptive Keyframe Sampling for Long Video Understanding (Feb. 22, 2025, CVPR 2025)
  4. LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (Jan. 7, 2025, ICLR 2025)
  5. FastVLM: Efficient Vision Encoding for Vision Language Models (Dec. 17, 2024, CVPR 2025)
  6. VisionZip: Longer is Better but Not Necessary in Vision Language Models (Dec. 5, 2024, CVPR 2025)
  7. FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression (Dec. 5, 2024, CVPR 2025)
  8. CLS Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster (Dec. 2, 2024)
  9. DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models (Nov. 22, 2024, CVPR 2025)
  10. SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference (Oct. 6, 2024)
  11. Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding (Sep. 22, 2024, CVPR 2025)
  12. TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings (Sep. 15, 2024, AAAI 2025)
  13. VoCo-LLaMA: Towards Vision Compression with Large Language Models (Jun. 18, 2024, CVPR 2025)
  14. Matryoshka Query Transformer for Large Vision-Language Models (May 29, 2024, NeurIPS 2024)
  15. LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models (Mar. 22, 2024)
  16. An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models (Mar. 11, 2024, ECCV 2024)
  17. LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (Nov. 28, 2023, ECCV 2024)
  18. Efficient Streaming Language Models with Attention Sinks (Sep. 29, 2023, ICLR 2024)
  19. Token Merging: Your ViT But Faster (Oct. 17, 2022, ICLR 2023)
  20. Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations (Feb. 16, 2022, ICLR 2022)
