Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Best practices, code samples, and documentation for Computer Vision.
Reference PyTorch implementation and models for DINOv3
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
[TPAMI] Frequency-aware Feature Fusion for Dense Image Prediction
Implementation of various topic models
Official code for "Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality"
🎮 Advanced Deep Learning and Reinforcement Learning at UCL & DeepMind | YouTube videos 👉
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.
Feature-guided masked autoencoder for self-supervised learning in remote sensing
Code for "Spectral Removal of Guarded Attribute Information"