Stars
🌐 3D and 4D World Modeling: A Survey
ICCV 2023: QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection
Official implementation of MAD: Motion Appearance Decoupling for efficient Driving World Models.
[NeurIPS 2025] Official code of Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting
[CVPR 2026 Oral] Learning to Drive via Real-World Simulation at Scale
awesome-autonomous-driving
⛽️ "Algorithm Pass Handbook": a from-scratch tutorial on algorithms and data structures, with 200 popular algorithm interview questions and 1000+ LeetCode problem solutions; continuously updated!
🔥 LeetCode solutions in any programming language | Solutions to LeetCode, "Sword Pointing to Offer (2nd Edition)", and "Cracking the Coding Interview (6th Edition)" in multiple programming languages
A nano Flash Attention implementation in the pure CUTLASS CuTe DSL
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
[Information Fusion 2025] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
[CVPR 2026 Highlight] LitePT: Lighter Yet Stronger Point Transformer
Flash Attention from Scratch on CUDA Ampere
My tests and experiments with some popular deep-learning frameworks.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A Python subset for a better MLIR programming experience
A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Compile MLIR to PTX and execute it on NVIDIA GPUs
State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
How to optimize common algorithms in CUDA.