Stars
'NKD and USKD' (ICCV 2023) and 'ViTKD' (CVPRW 2024)
Download flickr8k, flickr30k image caption datasets
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]
[IEEE TPAMI] A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends
This is an official implementation for [ICLR'24] INTR: Interpretable Transformer for Fine-grained Image Classification.
Official implementation of "FET-FGVC: Feature-Enhanced Transformer for Fine-Grained Visual Classification"
PyTorch implementation of "Fine-grained Visual Classification with High-temperature Refinement and Background Suppression"
[TCSVT 2023, Highly Cited Paper] Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment
Solution of the NTIRE 2025 Challenge on Efficient Super-Resolution
MulimgViewer is a multi-image viewer that can open multiple images in one interface, which is convenient for image comparison and image stitching.
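This project contains the data and related tools for WeblyFG-Dataset, a large-scale image dataset built specifically for Fine-Grained Visual Classification (FGVC). Unlike traditional manually annotated datasets, its data is collected entirely automatically from web search engines and processed with weakly supervised learning methods. It aims to advance research on fine-grained recognition in real-world web settings with noisy labels and distracting information.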
solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning
Data Upcycling Knowledge Distillation for Image Super-Resolution (official repository)
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)
[ECCV 2022] Patch Similarity Aware Data-Free Quantization for Vision Transformers
[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Primary code for a channel pruning method for Vision Transformers based on the MLP layers in the encoders
A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity
The official implementation of "Diversity-Guided MLP Reduction for Efficient Large Vision Transformers"
[CVPR'24] Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
The codebase for paper "PPT: Token Pruning and Pooling for Efficient Vision Transformer"
A super-resolution neural network for restoring low-resolution QR code images to high-quality outputs, integrating RRDB, SEBlock, and Transformer modules with perceptual and recognizability loss op…
A collection of token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI
The official code for the paper "Improving the transferability of adversarial examples through black-box feature attacks".
OpenMMLab Pre-training Toolbox and Benchmark
Awesome Knowledge Distillation