[NeurIPS 2025] Official PyTorch implementation of paper "BADiff: Bandwidth Adaptive Diffusion Model"
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
**Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos.
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
This repository contains low-bit quantization papers from 2020 to 2025 at top conferences.
[NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-source NR-IQA model that can accurately describe and rate image quality.
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents
Q-Insight is open-sourced at https://github.com/bytedance/Q-Insight. This repository will not receive further updates.
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Janus-Series: Unified Multimodal Understanding and Generation Models
[Paper List '25] Paper List of Visual Data Coding for Machines, including Image/Video Coding for Machines, Feature Compression, Point Cloud Compression for Machines and Image/Video Coding for Machin…
Model Compression Toolbox for Large Language Models and Diffusion Models
Official codes for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM2024 (Oral)
Official repo for "LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM", ACM MM2024 (Oral)
h4nwei / Compare2Score
Forked from Q-Future/Compare2Score. [NeurIPS'24] Compare2Score
[NeurIPS'24 Spotlight] Training in Pairs + Inference on Single Image with Anchors
🔥Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks".
A curated list of recent diffusion models for video generation, editing, and various other applications.
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
Paper list about multimodal and large language models, used only to record papers I read in the daily arXiv digest for personal reference.
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Collections of papers and code for employing MLLM for quality assessment tasks.