- Greater NYC area
- https://leofang.github.io/about
Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
A conda plugin which creates NVIDIA-specific virtual packages
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Linter that finds portability issues in Python package distributions (wheels, sdists, conda packages).
Manipulating ragged arrays in an Array API compliant way.
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Reusable GitHub Actions workflows for RAPIDS CI
Let your Claude be able to think
Experimental projects related to TensorRT
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
Python library for generating high-performance implementations of stencil kernels for weather and climate modeling from a domain-specific language (DSL).
A Python module for decorators, wrappers and monkey patching.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
NVIDIA curated collection of educational resources related to general purpose GPU programming.
NVIDIA Math Libraries for the Python Ecosystem
GPU Development in Python 101 tutorial
A library for detecting, labeling, and reasoning about microarchitectures
JupyterLite demo deployed to GitHub Pages 🚀
A massively parallel, high-level programming language
A cross-version Python bytecode decompiler