- Palo Alto, CA
- https://stuartsul.com
- in/stuartsul
- @stuart_sul
Stars
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
A collection of GPU tests and benchmarks for my own research.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Implementation of FP8/INT8 rollout for RL training without a performance drop.
Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A PyTorch native platform for training generative AI models
Tile primitives for speedy kernels
A free, source-available and fair-code licensed mac app cleaner
Optimized primitives for collective multi-GPU communication
Co-Chuck: WebChucK IDE with Multi-User Collaboration and Synchronized ChucK Shreds
An OSX print to pdf-file printer driver
A frontend Framework for single-page applications on top of REST/GraphQL APIs, using TypeScript, React and Material Design
This boilerplate contains terraform configurations for the rapid deployment of a Kubernetes cluster, supporting services, and the underlying infrastructure in AWS.
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model