Xidian University
Xi'an, China
Stars
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Wan: Open and Advanced Large-Scale Video Generative Models
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also …
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
GPT4V-level open-source multi-modal model based on Llama3-8B
🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
IQA: Deep Image Structure and Texture Similarity Metric
CVPR 2025: Frequency Dynamic Convolution for Dense Image Prediction
An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
②[CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo for fine-tuned checkpoints.
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark
[NeurIPS 2025 Spotlight] Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
[CVPR 2025 Highlight] Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
[NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-sourced NR-IQA model that can accurately describe and rate image quality.
Very Long Natural Scenery Image Prediction by Outpainting, ICCV2019, TensorFlow
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models