MLIR-based partitioning system

MLIR 191 37 Updated Jun 13, 2026

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing,…

Rust 330 96 Updated Jun 13, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,425 155 Updated Jun 14, 2026

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 311 26 Updated May 8, 2026

🚀 Sliding Window Attention Training for Efficient Large Language Models

Python 18 Updated Jun 7, 2026

Code snippets and reproductions from JustAByte

PureBasic 48 1 Updated Apr 6, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 713 89 Updated Jun 13, 2026

Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”

Python 194 20 Updated Apr 21, 2026

torchcomms: a modern PyTorch communications API

C++ 371 150 Updated Jun 13, 2026

CyPari is a Python 3 extension module for Windows, macOS and Linux. The user interface, and most of the underlying code, is the same for CyPari as for Sage's cypari2 module, but CyPari is completely…

Cython 8 7 Updated Jan 5, 2026

Data and tools for generating and inspecting OLMo pre-training data.

Python 1,508 193 Updated Nov 5, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 882 153 Updated Jun 13, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,136 8,826 Updated Jun 13, 2026

An open-source, efficient deep learning framework/compiler, written in Python.

Python 743 69 Updated Sep 4, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,762 272 Updated Jul 18, 2025

Entropy Based Sampling and Parallel CoT Decoding

Python 3,435 321 Updated Nov 13, 2024

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python 402 90 Updated Feb 14, 2025

💫 Beautiful spinners for terminal, IPython and Jupyter

Python 3,023 149 Updated Jun 16, 2024

A Data Streaming Library for Efficient Neural Network Training

Python 1,517 196 Updated Feb 2, 2026

nsync is a C library that exports various synchronization primitives, such as mutexes

C 1,272 91 Updated Oct 29, 2025

A PyTorch native platform for training generative AI models

Python 5,436 859 Updated Jun 14, 2026

Track & Visualisation tool for numerics debugging

Python 6 Updated Sep 20, 2024

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,022 412 Updated Jun 14, 2026

Efficient Triton Kernels for LLM Training

Python 6,430 539 Updated Jun 12, 2026

Tile primitives for speedy kernels

Cuda 3,427 295 Updated May 27, 2026

An RSS/Atom feed reader for text terminals

C++ 3,822 249 Updated Jun 13, 2026

XLS: Accelerated HW Synthesis

C++ 1,498 237 Updated Jun 13, 2026
Jupyter Notebook 41 5 Updated Mar 25, 2026
Python 306 22 Updated Jul 15, 2024