Stars
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
A PyTorch implementation of a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal trajectory prediction. This project trains a U-Net on the Waymo Open Motion Dataset to gener…
[ICML 2025] Official GitHub repo for the WOMD-Reasoning Dataset
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Official inference repo for FLUX.1 models
Documentation on how to access and use the Quick, Draw! Dataset.
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[CVPR 2024] DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Official Code for "GeoPT: Scaling Physics Simulation via Lifted Geometric Pre-Training" (ICML 2026) https://arxiv.org/pdf/2602.20399
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
Official PyTorch implementation of SegFormer
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
PyTorch Implementation of EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[AAAI 2025] DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
[ECCV 2024] Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
An open-source AI coding agent that lives in your terminal.
[ICCV 2023 & ICLR 2026] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
[CVPR 2021] SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data, in PyTorch