Stars
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
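As a quick illustration of those composable transformations, a minimal sketch (the toy loss function is made up for the example):

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.mean(jnp.tanh(x @ w) ** 2)          # toy scalar loss

grad_loss = jax.grad(loss)                         # differentiate
batched = jax.vmap(grad_loss, in_axes=(None, 0))   # vectorize over a batch of inputs
fast = jax.jit(batched)                            # JIT-compile for CPU/GPU/TPU

w = jnp.ones((3, 2))
xs = jnp.ones((8, 4, 3))                           # batch of 8 inputs
print(fast(w, xs).shape)                           # (8, 3, 2)
```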
A machine learning compiler for GPUs, CPUs, and ML accelerators
Implement a reasoning LLM in PyTorch from scratch, step by step
Build smaller, faster, and more secure desktop and mobile applications with a web frontend.
An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.
An easy-to-use, header-only C++ wrapper for Linux's perf event API
Goal: Enable awesome tooling for Bazel users of the C language family.
[TMLR 2025] Efficient Reasoning Models: A Survey
Vector (and Scalar) Quantization, in PyTorch
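A from-scratch sketch of the core technique (nearest-codebook lookup with a commitment loss and a straight-through estimator); this illustrates the idea, not the library's own API:

```python
import torch
import torch.nn.functional as F

def vector_quantize(x, codebook):
    # x: (batch, dim) continuous vectors; codebook: (num_codes, dim).
    dists = torch.cdist(x, codebook)               # pairwise L2 distances, (batch, num_codes)
    indices = dists.argmin(dim=-1)                 # nearest code per vector
    quantized = codebook[indices]
    # Commitment loss pulls encoder outputs toward their chosen codes.
    commit_loss = F.mse_loss(x, quantized.detach())
    # Straight-through estimator: backward treats quantization as identity.
    quantized = x + (quantized - x).detach()
    return quantized, indices, commit_loss

x = torch.randn(32, 256, requires_grad=True)
codebook = torch.randn(512, 256)
q, idx, loss = vector_quantize(x, codebook)        # q: (32, 256), idx: (32,)
```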
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
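A minimal sketch of what FP8 execution looks like with the library's PyTorch bindings, assuming `te.Linear` and `fp8_autocast` as the entry points (requires a supported NVIDIA GPU):

```python
import torch
import transformer_engine.pytorch as te

layer = te.Linear(1024, 1024, bias=True).cuda()    # drop-in linear with FP8 support
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, eligible matmuls run in FP8 with scaling managed by the library.
with te.fp8_autocast(enabled=True):
    y = layer(x)
```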
The hub for EleutherAI's work on interpretability and learning dynamics
Modeling, training, eval, and inference code for OLMo
InkFuse - An Experimental Database Runtime Unifying Vectorized and Compiled Query Execution.
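A conceptual Python sketch of the two execution models it unifies (InkFuse itself is C++; this only illustrates the distinction): vectorized execution materializes intermediates batch-at-a-time, while compiled execution fuses a pipeline into one tight loop:

```python
# Vectorized execution: each operator processes a whole batch, materializing
# intermediate results; interpretation overhead is amortized across the batch.
def vectorized_filter_sum(col, threshold):
    mask = [v > threshold for v in col]            # operator 1: filter mask
    return sum(v for v, m in zip(col, mask) if m)  # operator 2: aggregate

# Compiled execution: the whole pipeline fused into one loop with no
# intermediates, as a query compiler would generate.
def fused_filter_sum(col, threshold):
    total = 0
    for v in col:
        if v > threshold:
            total += v
    return total

assert vectorized_filter_sum([1, 5, 3, 9], 2) == fused_filter_sum([1, 5, 3, 9], 2) == 17
```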
CUDA Templates and Python DSLs for High-Performance Linear Algebra
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
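A sketch of what that Python API looks like, assuming the high-level `LLM` entry point; the model id and output fields here are illustrative:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")   # any supported HF checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["What is FP8 inference?"], params):
    print(output.outputs[0].text)
```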
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unsloth Studio is a web UI for training and running open models like Qwen3.5, Gemma 4, DeepSeek, gpt-oss locally.
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
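A minimal logging sketch with the wandb Python client; the project name and metrics are placeholders:

```python
import wandb

run = wandb.init(project="demo-project", config={"lr": 3e-4, "epochs": 3})
for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)                 # placeholder metric
    wandb.log({"epoch": epoch, "train_loss": train_loss})
run.finish()
```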
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
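To illustrate the mixed-precision idea (not the kernel itself, which packs two 4-bit values per byte and fuses dequantization into the GEMM), a plain PyTorch sketch of groupwise INT4 dequantization followed by a matmul:

```python
import torch

def int4_dequant_matmul(x, w_q, scales, group_size=128):
    # x: (m, k) activations; w_q: (k, n) weights in the 4-bit range [-8, 7];
    # scales: (k // group_size, n) per-group dequantization scales.
    k, n = w_q.shape
    w = w_q.float().reshape(k // group_size, group_size, n)
    w = (w * scales[:, None, :]).reshape(k, n)     # groupwise dequantize
    return x @ w                                   # a real kernel fuses this into one GEMM

x = torch.randn(16, 1024)
w_q = torch.randint(-8, 8, (1024, 4096)).to(torch.int8)
scales = torch.rand(1024 // 128, 4096) + 0.5
y = int4_dequant_matmul(x, w_q, scales)            # (16, 4096)
```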
A concise but complete full-attention transformer with a set of promising experimental features from various papers
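The full-attention core in a few lines, as a generic sketch rather than the library's implementation:

```python
import torch
import torch.nn.functional as F

def full_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the whole sequence.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 10, 64)                         # (batch, seq, dim)
wq, wk, wv = (torch.randn(64, 64) for _ in range(3))
out = full_attention(x, wq, wk, wv)                # (2, 10, 64)
```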
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
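A minimal chat call via the official ollama Python client, assuming a model has already been pulled locally; the model name is illustrative:

```python
import ollama  # official Python client: `pip install ollama`

response = ollama.chat(
    model="gpt-oss",  # illustrative; any locally pulled model works
    messages=[{"role": "user", "content": "Summarize FP8 inference in one line."}],
)
print(response["message"]["content"])
```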