Zhejiang U. -> Tsinghua U.
- Shenzhen
Stars
Instruct-tune LLaMA on consumer hardware
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
This repository contains implementations and illustrative code to accompany DeepMind publications
Taming Transformers for High-Resolution Image Synthesis
The Udacity open source self-driving car project
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
A suite of image and video neural tokenizers
An autonomous AI racecar using NVIDIA Jetson Nano
This repo contains the code for a 1D tokenizer and generator
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
Artificial Intelligence Research for Science (AIRS)
[NeurIPS 2024] Code release for "Segment Anything without Supervision"
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
[SIGGRAPH 2025] Official code of the paper "FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios"
Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training"
Official PyTorch implementation of FlowMo.