The University of Hong Kong, Hong Kong
https://peizesun.github.io/

Stars
MAGI-1: Autoregressive Video Generation at Scale
PyTorch implementation for the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
[AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training sc…
Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"
[ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
SEED-Voken: A Series of Powerful Visual Tokenizers
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Building a quick conversation-based search demo with Lepton AI.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support (a minimal usage sketch follows this list)
High-fidelity performance metrics for generative models in PyTorch
Open reproduction of MUSE for fast text2image generation.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces
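Several of the repositories above are usable directly as libraries. As one illustration, tied to the 🤗 Accelerate entry, here is a minimal training-loop sketch following Accelerate's prepare/backward pattern; the model, optimizer, and data are toy placeholders of my own, not code from any repository listed above.

import torch
from accelerate import Accelerator

# Hypothetical toy setup; only the Accelerator calls come from the library.
model = torch.nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 128), torch.randint(0, 10, (256,))
)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=32)

accelerator = Accelerator()  # detects device, DDP, and mixed-precision config

# prepare() moves each object to the right device and wraps it for the
# current distributed configuration.
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

for inputs, targets in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward(); handles grad scaling
    optimizer.step()

Run unchanged on a laptop CPU, a single GPU, or a multi-GPU node via `accelerate launch`; the same loop works in every case, which is the point of the library.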