Stars
[CVPR 2026] BiGain is a training-free framework for accelerating diffusion models while preserving generation quality and improving classification.
Official implementation of paper: From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
Official implementation of paper: Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Official code for our paper "Sink-Aware Pruning for Diffusion Language Models"
A defense framework against MLLM-based web GUI agents. This repository provides both the generative CAPTCHA system and tools for evaluating agent resistance.
[ICLR 2026 🔥] Official PyTorch implementation for "Attention Is All You Need for KV Cache in Diffusion LLMs"
Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift
The official GitHub repo for the survey paper "A Survey on Diffusion Language Models".
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
[ICLR 2026] Optimization-free Dataset Distillation for Object Detection. Paper at: https://arxiv.org/abs/2506.01942
(ACL 2025 Main) Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
[CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can?
[NeurIPS 2025] The first web-based benchmark and platform to evaluate visual reasoning and interaction capabilities of MLLM powered agents through diverse and dynamic CAPTCHA puzzles.
[NeurIPS25 & ICML25 Workshop on Reliable and Responsible Foundation Models] A Simple Baseline Achieving Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1. Paper at: https:/…
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark. Paper at: https://arxiv.org/abs/2503.20786
(CVPR 2025) Official implementation of DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation, which outperforms SOTA top-1 accuracy by +1.3% and increases per-class diversity by +5%
Official inference framework for 1-bit LLMs
[ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
Semantics-Aware Patch Encoding and Hierarchical Dependency Modeling for Long-Term Time Series Forecasting
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
Prompt Engineering at Your Fingertips!
Prompt Builder is a small Python application that implements the principles outlined in the paper "Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4". It allows users to…
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171