Elliot Gorokhovsky (embg)

Software Engineer @janestreet

Jane Street
New York, NY

Stars

NVlabs / cuda-oxide

cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign languag…

Rust · 2,769 stars · 186 forks · Updated Jun 16, 2026

openai / parameter-golf

Train the smallest LM you can that fits in 16MB. Best model wins!

Python · 5,129 stars · 3,334 forks · Updated May 4, 2026

NVIDIA / cuda-checkpoint

CUDA checkpoint and restore utility

C · 464 stars · 35 forks · Updated Sep 15, 2025

apple / ml-cross-entropy

Python · 606 stars · 72 forks · Updated Sep 23, 2025

patrick-kidger / jaxtyping

Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/

Python · 1,830 stars · 90 forks · Updated Jun 13, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python · 5,803 stars · 1,054 forks · Updated Jun 17, 2026

fzyzcjy / torch_memory_saver

Allow torch tensor memory to be released and resumed later

Python · 251 stars · 58 forks · Updated May 16, 2026

ck852 / patchbatch

PatchBatch is an electrophysiology data analysis program designed to facilitate automated processing of raw data into visualization-ready forms.

Python · 1 star · Updated May 24, 2026

srush / Tensor-Puzzles

Solve puzzles. Improve your pytorch.

Jupyter Notebook · 4,145 stars · 378 forks · Updated Jul 15, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python · 16,727 stars · 4,088 forks · Updated Jun 17, 2026

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python · 5,440 stars · 862 forks · Updated Jun 17, 2026

GeeeekExplorer / nano-vllm

Nano vLLM

Python · 14,067 stars · 2,227 forks · Updated Apr 26, 2026

lucidrains / triangle-multiplicative-module

Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2D feature map, as a standalone package for PyTorch

Python · 39 stars · 2 forks · Updated Aug 3, 2021