Stars
Wan: Open and Advanced Large-Scale Video Generative Models
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Janus-Series: Unified Multimodal Understanding and Generation Models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Tracks recent arXiv papers at the intersection of LLMs and RL for control. PRs adding good papers are welcome.
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
[ICLR'24 & IJCV'25] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
Experiment on combining CLIP with SAM to do open-vocabulary image segmentation.
Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works
[CVPR 2023] Implementation of "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information".
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
A collection of papers on transformers for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
Code release for NeRF (Neural Radiance Fields)
VOLO: Vision Outlooker for Visual Recognition
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
A PyTorch re-implementation of the official EfficientDet, with SOTA real-time performance and pretrained weights.
PyTorch implementation for Semantic Segmentation/Scene Parsing on the MIT ADE20K dataset
Semantic Segmentation Architectures Implemented in PyTorch
UPSNet: A Unified Panoptic Segmentation Network