  • Jane Street
  • New York, NY


cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign languag…

Rust 2,769 186 Updated Jun 16, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 5,129 3,334 Updated May 4, 2026

CUDA checkpoint and restore utility

C 464 35 Updated Sep 15, 2025

Python 606 72 Updated Sep 23, 2025

Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/

Python 1,830 90 Updated Jun 13, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,803 1,054 Updated Jun 17, 2026

Allow torch tensor memory to be released and resumed later

Python 251 58 Updated May 16, 2026

PatchBatch is an electrophysiology data analysis program designed to facilitate automated processing of raw data into visualization-ready forms.

Python 1 Updated May 24, 2026

Solve puzzles. Improve your PyTorch.

Jupyter Notebook 4,145 378 Updated Jul 15, 2024

Ongoing research training transformer models at scale

Python 16,727 4,088 Updated Jun 17, 2026

A PyTorch native platform for training generative AI models

Python 5,440 862 Updated Jun 17, 2026

Nano vLLM

Python 14,067 2,227 Updated Apr 26, 2026

Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2D feature map, as a standalone package for PyTorch

Python 39 2 Updated Aug 3, 2021

jax-triton contains integrations between JAX and OpenAI Triton

Python 462 57 Updated Jun 1, 2026

PyTorch native quantization and sparsity for training and inference

Python 2,859 532 Updated Jun 16, 2026

NanoGPT (124M) in 90 seconds

Python 5,400 809 Updated Jun 13, 2026

PyTorch media decoding and encoding

Python 1,125 108 Updated Jun 16, 2026

RandomX, KawPow, CryptoNight and GhostRider unified CPU/GPU miner and RandomX benchmark

C 9,998 3,818 Updated May 25, 2026

Proof of work algorithm based on random code execution

C++ 1,620 353 Updated May 24, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,107 18,129 Updated Jun 17, 2026

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,404 161 Updated Jun 17, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,736 1,287 Updated Jun 15, 2026

The Book of Statistical Proofs

Python 420 80 Updated May 29, 2026

Hash function quality and speed tests

C++ 2,160 188 Updated Dec 2, 2025

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ 5,627 441 Updated Jun 16, 2026

Fast CRC32 implementations

C 133 11 Updated Nov 27, 2025

C++11 metaprogramming library

C++ 287 95 Updated Apr 24, 2026

High-level tracing language for Linux

C++ 10,168 1,463 Updated Jun 17, 2026

OCaml 223 45 Updated Jun 16, 2026