Stars
MoE training for Me and You and maybe other people
My learning notes on ML systems (MLSys).
A framework for evaluating autoregressive code-generation language models.
Code for the paper "Efficient Training of Language Models to Fill in the Middle"
Code for the paper "Evaluating Large Language Models Trained on Code"
Source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
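For flavor, a toy single-needle retrieval probe in the spirit of RULER's synthetic tasks (RULER's actual generators also cover multi-needle, multi-hop, aggregation, and QA variants):

```python
import random

def make_niah_prompt(n_filler: int, rng: random.Random):
    """Toy needle-in-a-haystack prompt: hide one fact in repetitive
    filler and ask the model to retrieve it. Illustrative only."""
    answer = str(rng.randint(100000, 999999))
    needle = f"One of the special magic numbers is {answer}."
    filler = ["The grass is green. The sky is blue."] * n_filler
    filler.insert(rng.randrange(len(filler) + 1), needle)
    return " ".join(filler) + "\nWhat is the special magic number?", answer

prompt, answer = make_niah_prompt(n_filler=2000, rng=random.Random(0))
```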
Official repository for LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking; published at MICCAI 2025.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
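Fine-grained scaling means each small group of elements carries its own FP8 scale rather than one scale per tensor, so a single outlier can't wreck precision everywhere. A pure-PyTorch reference sketch of the idea (not DeepGEMM's API; requires PyTorch ≥ 2.1 for the float8 dtype):

```python
import torch

def quantize_fp8_groups(x: torch.Tensor, group: int = 128):
    """Per-group FP8 quantization sketch: each contiguous group of
    `group` values along the last dim gets its own scale."""
    xg = x.reshape(*x.shape[:-1], -1, group)
    scale = (xg.abs().amax(-1, keepdim=True) / 448.0).clamp(min=1e-12)  # e4m3 max
    return (xg / scale).to(torch.float8_e4m3fn), scale

def ref_matmul(qa, sa, qb, sb):
    # Dequantize-then-matmul reference path; a real kernel keeps operands
    # in FP8 for the tensor cores and folds the scales into the epilogue.
    a = (qa.to(torch.float32) * sa).flatten(-2)
    b = (qb.to(torch.float32) * sb).flatten(-2)
    return a @ b.T

a, b = torch.randn(64, 256), torch.randn(32, 256)
out = ref_matmul(*quantize_fp8_groups(a), *quantize_fp8_groups(b))
print((out - a @ b.T).abs().max())  # small quantization error
```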
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
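A minimal initialization sketch, assuming a recent DeepSpeed; the config values are illustrative, not prescriptive:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Typically run under the launcher, e.g.: deepspeed train.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# Training step shape: loss = engine(batch); engine.backward(loss); engine.step()
```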
A small protein language model based on nanochat.
PyTorch native quantization and sparsity for training and inference
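A usage sketch of torchao's one-line quantization entry point; the names follow its README at the time of writing and may have shifted between releases:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
).to(torch.bfloat16)

# Swap the Linear weights to int8 in place; activations stay bf16.
quantize_(model, int8_weight_only())

x = torch.randn(1, 4096, dtype=torch.bfloat16)
print(model(x).shape)  # torch.Size([1, 4096])
```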
FlashAttention implemented with metal-cpp headers.
The simplest, fastest repository for training and finetuning small VLMs.
A PyTorch native platform for training generative AI models
PyTorch building blocks for the OLMo ecosystem
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
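A single-layer sketch of the FP8 path, following the pattern in Transformer Engine's docs (requires an FP8-capable GPU such as Hopper or later):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe, as in TE's quickstart examples.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
print(y.shape)
```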
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
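To make the DiLoCo term concrete, here is an algorithm-level sketch of the outer step from the DiLoCo paper, written against plain PyTorch rather than torchft's actual API:

```python
import copy
import torch

def diloco_outer_step(global_model, local_models, outer_opt):
    """Average the workers' weights, treat their drift from the global
    weights as a pseudo-gradient, and apply it with an outer optimizer
    (Nesterov SGD in the paper)."""
    with torch.no_grad():
        for p_g, *p_locals in zip(global_model.parameters(),
                                  *(m.parameters() for m in local_models)):
            avg = torch.stack([p.detach() for p in p_locals]).mean(0)
            p_g.grad = p_g.detach() - avg  # pseudo-gradient
    outer_opt.step()
    outer_opt.zero_grad()
    for m in local_models:  # re-sync workers from the new global weights
        m.load_state_dict(global_model.state_dict())

global_model = torch.nn.Linear(8, 8)
workers = [copy.deepcopy(global_model) for _ in range(4)]
outer_opt = torch.optim.SGD(global_model.parameters(),
                            lr=0.7, momentum=0.9, nesterov=True)
# ...each worker runs H local inner steps on its own data shard, then:
diloco_outer_step(global_model, workers, outer_opt)
```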
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for on-device inference and featuring cutting-edge techniques in sparse architectures, speculative sampling, and quantization.
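Of those techniques, speculative sampling is easy to sketch in isolation. The verification rule below follows the standard algorithm (Leviathan et al.) and is independent of CPM.cu's CUDA implementation:

```python
import torch

def verify_draft(target_probs, draft_probs, draft_tokens):
    """One speculative-sampling verification pass.

    target_probs: (k+1, vocab) target-model distributions
    draft_probs:  (k, vocab)   draft-model distributions
    draft_tokens: length-k list of token ids proposed by the draft model
    Returns (accepted_tokens, next_token_id).
    """
    for i, t in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        if torch.rand(()) < torch.clamp(p[t] / q[t], max=1.0):
            continue  # accept draft token i, keep verifying
        # First rejection: resample from the normalized residual max(p-q, 0).
        residual = torch.clamp(p - q, min=0.0)
        return draft_tokens[:i], torch.multinomial(residual / residual.sum(), 1).item()
    # All k drafts accepted: take a free token from the target's extra dist.
    return draft_tokens, torch.multinomial(target_probs[-1], 1).item()
```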
A machine learning accelerator core designed for energy-efficient AI at the edge.