Paul Balanca balancap

Research ML team lead at @graphcore. Previously at @fiveai. ML + data pipeline practitioner. Love small-bit numbers.

366 followers · 17 following

Stars

openxla / shardy

MLIR-based partitioning system

MLIR 191 37 Updated Jun 13, 2026

lightseekorg / smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing,…

Rust 330 96 Updated Jun 13, 2026

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,425 155 Updated Jun 14, 2026

patrick-toulme / pyptx

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 311 26 Updated May 8, 2026

Fzkuji / swat-attention

Forked from fla-org/flash-linear-attention

🚀 Sliding Window Attention Training for Efficient Large Language Models

Python 18 Updated Jun 7, 2026

patrick-toulme / justabyte

Code snippets and reproductions from JustAByte

PureBasic 48 1 Updated Apr 6, 2026

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python 713 89 Updated Jun 13, 2026

mit-han-lab / fouroversix

Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”

Python 194 20 Updated Apr 21, 2026

meta-pytorch / torchcomms

torchcomms: a modern PyTorch communications API

C++ 371 150 Updated Jun 13, 2026

3-manifolds / CyPari

CyPari is a Python3 extension module for Windows, macOS and linux. The user interface, and most of the underlying code, is the same for CyPari as for Sage's cypari2 module, but CyPari is completely…

Cython 8 7 Updated Jan 5, 2026

allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.

Python 1,508 193 Updated Nov 5, 2025

pytorch / helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 882 153 Updated Jun 13, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,136 8,826 Updated Jun 13, 2026

tenstorrent / tensix-isa-simulator

C++ 29 3 Updated Jun 1, 2026

hidet-org / hidet

An open-source efficient deep learning framework/compiler, written in python.

Python 743 69 Updated Sep 4, 2025

facebookresearch / lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,762 272 Updated Jul 18, 2025

xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding

Python 3,435 321 Updated Nov 13, 2024

facebookresearch / SpinQuant

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python 402 90 Updated Feb 14, 2025

manrajgrover / halo

💫 Beautiful spinners for terminal, IPython and Jupyter

Python 3,023 149 Updated Jun 16, 2024

mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training

Python 1,517 196 Updated Feb 2, 2026

google / nsync

nsync is a C library that exports various synchronization primitives, such as mutexes

C 1,272 91 Updated Oct 29, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 5,436 859 Updated Jun 14, 2026

graphcore-research / track-and-visualize

Track & Visualisation tool for numerics debugging

Python 6 Updated Sep 20, 2024

flagos-ai / FlagGems

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,022 412 Updated Jun 14, 2026

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 6,430 539 Updated Jun 12, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,427 295 Updated May 27, 2026

newsboat / newsboat

An RSS/Atom feed reader for text terminals

C++ 3,822 249 Updated Jun 13, 2026

google / xls

XLS: Accelerated HW Synthesis

C++ 1,498 237 Updated Jun 13, 2026

google / pwlfit

Jupyter Notebook 41 5 Updated Mar 25, 2026

google-deepmind / nanodo

Python 306 22 Updated Jul 15, 2024