Highlights

  • Pro

Organizations

@anysphere


all class materials for 340lx

C 3 1 Updated Oct 8, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 327 26 Updated Oct 8, 2025

A collection of GPU tests and benchmarks for my own research.

Cuda 3 1 Updated Oct 5, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,766 514 Updated Oct 9, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,554 1,474 Updated Sep 25, 2025

Implementation of FP8/INT8 rollout for RL training without a performance drop.

Python 250 18 Updated Sep 29, 2025

Jupyter Notebook 100 10 Updated Aug 24, 2025

Infiniband Verbs Performance Tests

C 830 353 Updated Oct 5, 2025

Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory

C 145 36 Updated Jul 30, 2024

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,138 83 Updated Aug 28, 2025

A PyTorch native platform for training generative AI models

Python 4,511 555 Updated Oct 9, 2025

C++ extensions in PyTorch

Python 1,149 245 Updated Jul 8, 2025

cs240lx stanford 2025 spring

C 15 2 Updated Jun 11, 2025

Large Context Attention

Python 742 54 Updated Jan 24, 2025

JavaScript 1 Updated May 18, 2024

Tile primitives for speedy kernels

Cuda 2,794 182 Updated Sep 21, 2025

A free, source-available and fair-code licensed mac app cleaner

Swift 8,978 211 Updated Oct 8, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,124 1,033 Updated Sep 24, 2025

Co-Chuck: WebChucK IDE with Multi-User Collaboration and Synchronized ChucK Shreds

TypeScript 3 Updated Dec 4, 2024

An OSX print to pdf-file printer driver

Swift 1,032 87 Updated Sep 9, 2025

A frontend Framework for single-page applications on top of REST/GraphQL APIs, using TypeScript, React and Material Design

TypeScript 26,273 5,405 Updated Oct 9, 2025

This boilerplate contains terraform configurations for the rapid deployment of a Kubernetes cluster, supporting services, and the underlying infrastructure in AWS.

HCL 633 111 Updated Sep 2, 2025

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

C++ 13,982 1,209 Updated Jul 29, 2024

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python 8,117 877 Updated Oct 8, 2025

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.

Python 1,979 294 Updated Sep 27, 2025

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

Python 1,160 364 Updated Apr 14, 2023