🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Fully open reproduction of DeepSeek-R1
A high-throughput and memory-efficient inference and serving engine for LLMs
Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".
[NeurIPS 2020] Released code for Interventional Few-Shot Learning
[Lumina Embodied AI] Embodied-AI-Guide: a technical guide to embodied AI
Deep learning for image processing, including classification, object detection, and more.
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Code for "Leveraging Bilateral Correlations for Multi-Label Few-Shot Learning" in TNNLS 2024.
Ready-to-use code and tutorial notebooks to boost your way into few-shot learning for image classification.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
The new spin-off of Visual Language Navigation.
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
✨✨Latest Advances on Multimodal Large Language Models
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Diffusion model papers, survey, and taxonomy
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
A summary of recent semi-supervised semantic segmentation methods
OpenMMLab Foundational Library for Training Deep Learning Models
A curated publication list on open-vocabulary semantic segmentation and related areas (e.g. zero-shot semantic segmentation).
[ECCV 2024] Tokenize Anything via Prompting
Fast and memory-efficient exact attention
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
[ECCV 2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Single-Stage Semantic Segmentation from Image Labels (CVPR 2020)