
A Datacenter Scale Distributed Inference Serving Framework

Rust 6,540 1,014 Updated Apr 12, 2026

Machine Learning Engineering Open Book

Python 17,670 1,121 Updated Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python 65 5 Updated Apr 9, 2026

Heterogeneous GPU Sharing on Kubernetes

Go 3,272 508 Updated Apr 10, 2026

Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more models onto a GPU, until it is fully utilized.

TypeScript 50 7 Updated Mar 27, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 295 18 Updated Feb 24, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,451 251 Updated Apr 8, 2026
Jupyter Notebook 24 4 Updated Dec 6, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 699 60 Updated Apr 12, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,014 130 Updated Apr 11, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 98 6 Updated Dec 2, 2025

A framework for efficient model inference with omni-modality models

Python 4,246 735 Updated Apr 12, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 705 42 Updated Mar 8, 2026

Advancing the frontier of efficient AI

Python 58 10 Updated Apr 6, 2026

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python 74 15 Updated Feb 18, 2026

Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors

TypeScript 52,450 2,479 Updated Apr 10, 2026

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python 656 46 Updated Mar 6, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 56,560 9,664 Updated Nov 12, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python 286 157 Updated Apr 12, 2026

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C 167 24 Updated Jan 13, 2026

Fast and memory-efficient exact kmeans

Python 531 27 Updated Mar 26, 2026

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!

Python 3,787 523 Updated Apr 6, 2026

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 20,973 1,782 Updated Mar 5, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 5,480 502 Updated Apr 12, 2026

Puzzles for learning Triton

Jupyter Notebook 2,366 214 Updated Apr 1, 2026
Jupyter Notebook 23 3 Updated May 18, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 848 98 Updated Apr 7, 2026

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 10,790 946 Updated Apr 10, 2026

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python 59 4 Updated Mar 17, 2026

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,001 2,061 Updated Mar 27, 2026