Starred repositories
(NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models"
Official implementation of FullMatch (CVPR 2023)
Code for Knowledge-Guided Adversarial Training (KGAT)
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models
A cross-platform desktop all-in-one assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. The only official website: ccswitch.io
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding
Qwen3.6 is the large language model series developed by the Qwen team at Alibaba Group.
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Official repository for VisionZip (CVPR 2025)
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights.
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
A paper list of recent works on token compression for ViTs and VLMs
PyTorch implementation of the paper "Asymmetric Contextual Modulation for Infrared Small Target Detection"
PyTorch implementation of "EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation"
A leading collection of industry reports covering: industries, employees, finance, individual income tax, benefits & compensation, leadership, wealth, and conferences — reports & tools
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
AIGCPanel is an easy-to-use, one-stop AI digital human system that supports video synthesis, voice synthesis, and voice cloning, and simplifies local model management with one-click import and use of AI models.
✨✨Latest Advances on Multimodal Large Language Models
Collection of AWESOME vision-language models for vision tasks
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding