- Greater NYC area
- https://leofang.github.io/about
Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Linter that finds portability issues in Python package distributions (wheels, sdists, conda packages).
Manipulating ragged arrays in an Array API compliant way.
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Reusable GitHub Actions workflows for RAPIDS CI
Let your Claude be able to think
Experimental projects related to TensorRT
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
Python library for generating high-performance implementations of stencil kernels for weather and climate modeling from a domain-specific language (DSL).
A Python module for decorators, wrappers and monkey patching.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
NVIDIA curated collection of educational resources related to general purpose GPU programming.
NVIDIA Math Libraries for the Python Ecosystem
A library for detecting, labeling, and reasoning about microarchitectures
JupyterLite demo deployed to GitHub Pages 🚀
A massively parallel, high-level programming language
A cross-version Python bytecode decompiler
The pythoncapi-compat project can be used to write a C extension supporting a wide range of Python versions with a single code base.
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
The project provides high-performance concurrency, enabling highly parallel computation.