Stars
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Prompts for our Grok chat assistant and the `@grok` bot on X.
A place to store reusable transformer components of my own creation or found on the interwebs
A list of inputs that will beat the vast majority of Pokémon FireRed games
mcarilli / FlameGraph
Forked from brendangregg/FlameGraph. Stack trace visualizer.
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions.
World's Smallest Nintendo Wii, using a trimmed motherboard and custom stacked PCBs
Zero Bubble Pipeline Parallelism
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
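For a sense of the workflow, here is a minimal sketch of LitGPT's Python API, following the pattern in its README (model name and prompt are illustrative; assumes the `litgpt` package is installed and the checkpoint can be downloaded):

```python
# Minimal LitGPT usage sketch; "microsoft/phi-2" is one of its supported checkpoints.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")           # download/load a supported checkpoint
text = llm.generate("What do Llamas eat?")  # run inference with the loaded model
print(text)
```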
You like pytorch? You like micrograd? You love tinygrad! ❤️
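The tinygrad README's own autograd example gives the flavor; a lightly commented version (assumes `tinygrad` and `numpy` are installed):

```python
from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)             # 3x3 identity matrix
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)  # 1x3 row vector
z = y.matmul(x).sum()                             # scalar output
z.backward()                                      # reverse-mode autodiff

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```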
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
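Sophia's core update is simple to state: keep EMAs of the gradient and of a diagonal Hessian estimate, then take a Hessian-preconditioned step that is clipped elementwise. A hedged NumPy sketch of that update, not the repo's official code (the names `lr`, `gamma`, `eps`, and `rho` are mine):

```python
import numpy as np

def sophia_step(theta, m, h, lr=1e-4, gamma=0.01, eps=1e-12, rho=1.0):
    """One Sophia-style parameter update (illustrative sketch).

    m: EMA of gradients; h: EMA of a diagonal Hessian estimate
    (e.g. Gauss-Newton-Bartlett or Hutchinson, maintained elsewhere).
    """
    # Precondition by curvature, then clip elementwise to [-rho, rho]
    # so that tiny or stale curvature estimates cannot blow up the step.
    update = np.clip(m / np.maximum(gamma * h, eps), -rho, rho)
    return theta - lr * update
```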
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
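A minimal sketch of the FP8 path, adapted from Transformer Engine's getting-started pattern (tensor sizes are arbitrary; requires an FP8-capable GPU such as Hopper):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in Linear layer from Transformer Engine.
model = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# Delayed-scaling FP8 recipe; E4M3 format for forward tensors.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# The context manager routes supported ops through FP8 kernels.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # backward also uses FP8 where supported
```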
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry leading model flop utilization rates.
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Running large language models on a single GPU for throughput-oriented scenarios.