🌐 3D and 4D World Modeling: A Survey

HTML 883 53 Updated Apr 12, 2026
Python 88 7 Updated Mar 21, 2026

ICCV 2023: QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

Python 7 Updated Jul 18, 2025

Official implementation of MAD: Motion Appearance Decoupling for efficient Driving World Models.

31 2 Updated Jan 15, 2026

[NeurIPS 2025] Official code of Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

Python 149 4 Updated Dec 3, 2025

[CVPR 2026 Oral] Learning to Drive via Real-World Simulation at Scale

Python 219 20 Updated Apr 9, 2026

Solutions for LeetCode

Python 149 57 Updated Jan 2, 2023

awesome-autonomous-driving

1,122 103 Updated Aug 19, 2024

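⛽️ "Algorithm Clearance Handbook" (算法通关手册): a from-scratch tutorial on algorithms and data structures, with 200 popular algorithm interview problems and 1000+ LeetCode problem walkthroughs, continuously updated!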

Python 7,687 1,288 Updated Jan 17, 2026

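🔥 LeetCode solutions in any programming language | Solutions in multiple programming languages for LeetCode, 《剑指 Offer(第 2 版)》 (Coding Interviews, 2nd Edition), and 《程序员面试金典(第 6 版)》 (Cracking the Coding Interview, 6th Edition)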

Java 35,897 9,406 Updated Apr 15, 2026

a nano flash attention with pure cutlass cute dsl

Python 1 Updated Jul 25, 2025

A cutlass cute implementation of headdim-64 flashattentionv2 TensorRT plugin for LightGlue. Run on Jetson Orin NX 8GB with TensorRT 8.5.2.

Cuda 20 4 Updated Mar 3, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,302 398 Updated Jan 17, 2026

[Information Fusion 2025] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

599 37 Updated Apr 12, 2026

[CVPR 2026 Highlight] LitePT: Lighter Yet Stronger Point Transformer

Python 265 27 Updated Feb 21, 2026

Flash Attention from Scratch on CUDA Ampere

Assembly 166 28 Updated Sep 1, 2025

My tests and experiments with some popular dl frameworks.

Python 17 2 Updated Sep 11, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,288 1,045 Updated Apr 12, 2026

From Minimal GEMM to Everything

Cuda 199 10 Updated Feb 10, 2026

An in-depth analysis and hands-on guide to high-performance GPU linear algebra libraries

HTML 3 Updated Sep 27, 2025

A Python subset for a better MLIR programming experience

Python 53 9 Updated Mar 12, 2026

An optimizing ahead-of-time Python Compiler

C++ 260 14 Updated Jun 9, 2024

A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM

Python 62 30 Updated Apr 13, 2026

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 800 28 Updated Oct 13, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 759 58 Updated Aug 6, 2025

Compile MLIR to PTX and execute it on NVIDIA GPUs

Jupyter Notebook 12 1 Updated Apr 16, 2025

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 452 30 Updated Dec 14, 2024

Example of RISC-V Vector programming

C 28 10 Updated Sep 4, 2025

How to optimize some algorithms in CUDA.

Cuda 2,926 270 Updated Apr 16, 2026

CUDA Core Compute Libraries

C++ 2,277 378 Updated Apr 16, 2026