73 results for omkaark's starred source repositories

MoE training for Me and You and maybe other people

Python 256 19 Updated Dec 17, 2025

My learning notes for ML SYS.

Python 4,698 298 Updated Dec 19, 2025

kernels, of the mega variety

Python 631 34 Updated Sep 28, 2025

A framework for the evaluation of autoregressive code generation language models.

Python 1,007 251 Updated Jul 22, 2025

Code for the paper "Efficient Training of Language Models to Fill in the Middle"

Python 194 43 Updated Apr 2, 2023

Code for the paper "Evaluating Large Language Models Trained on Code"

Python 3,055 426 Updated Jan 17, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,396 118 Updated Nov 13, 2025

NanoGPT (124M) in 3 minutes

Python 3,970 520 Updated Dec 17, 2025

Official repository for LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking; published at MICCAI 2025.

Python 200 12 Updated Nov 12, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,977 778 Updated Dec 8, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 96,009 26,285 Updated Dec 19, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,035 4,669 Updated Dec 18, 2025

a small protein language model based off of nanochat

Python 2 Updated Oct 20, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,579 386 Updated Dec 19, 2025

FlashAttention written in metal-cpp headers

Makefile 3 1 Updated Oct 5, 2025

The Modular Platform (includes MAX & Mojo)

Mojo 25,366 2,743 Updated Dec 18, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,415 428 Updated Oct 27, 2025

A PyTorch native platform for training generative AI models

Python 4,855 644 Updated Dec 19, 2025

PyTorch building blocks for the OLMo ecosystem

Python 594 107 Updated Dec 19, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,010 583 Updated Dec 19, 2025

Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)

Python 457 53 Updated Dec 6, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 689 89 Updated Dec 19, 2025

A Quirky Assortment of CuTe Kernels

Python 699 64 Updated Dec 16, 2025

Python SQL Parser and Transpiler

Python 8,719 1,034 Updated Dec 19, 2025

RL gym for vision language models written in JAX

Python 135 13 Updated Oct 30, 2025

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

Cuda 212 21 Updated Oct 10, 2025

Python 78 6 Updated Dec 2, 2025

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,950 212 Updated Dec 19, 2025