ivanium

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,546 stars · 1,018 forks · Updated Apr 13, 2026

Machine Learning Engineering Open Book

Python · 17,688 stars · 1,122 forks · Updated Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python · 65 stars · 5 forks · Updated Apr 9, 2026

Heterogeneous GPU Sharing on Kubernetes

Go · 3,283 stars · 510 forks · Updated Apr 13, 2026

Sardeenz is a proof-of-concept application for loading more than one model onto a single GPU: you can keep adding models until the GPU is fully utilized.

TypeScript · 50 stars · 7 forks · Updated Mar 27, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python · 295 stars · 18 forks · Updated Feb 24, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python · 3,453 stars · 250 forks · Updated Apr 8, 2026

Jupyter Notebook · 24 stars · 4 forks · Updated Dec 6, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python · 699 stars · 61 forks · Updated Apr 13, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 2,017 stars · 130 forks · Updated Apr 11, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python · 98 stars · 6 forks · Updated Dec 2, 2025

A framework for efficient model inference with omni-modality models

Python · 4,270 stars · 744 forks · Updated Apr 13, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 705 stars · 42 forks · Updated Mar 8, 2026

Advancing the frontier of efficient AI

Python · 58 stars · 10 forks · Updated Apr 6, 2026

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python · 75 stars · 15 forks · Updated Feb 18, 2026

Context7 Platform: up-to-date code documentation for LLMs and AI code editors

TypeScript · 52,553 stars · 2,482 forks · Updated Apr 13, 2026

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python · 656 stars · 46 forks · Updated Mar 6, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 56,620 stars · 9,678 forks · Updated Nov 12, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 286 stars · 157 forks · Updated Apr 13, 2026

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C · 168 stars · 24 forks · Updated Jan 13, 2026

Fast and memory-efficient exact k-means

Python · 533 stars · 27 forks · Updated Mar 26, 2026

The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, with no huge configs and no giant monorepo, yet it scores >74% on SWE-bench Verified!

Python · 3,806 stars · 525 forks · Updated Apr 13, 2026

The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud.

Python · 20,980 stars · 1,786 forks · Updated Mar 5, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ · 5,483 stars · 505 forks · Updated Apr 13, 2026

Puzzles for learning Triton

Jupyter Notebook · 2,371 stars · 214 forks · Updated Apr 1, 2026

Jupyter Notebook · 23 stars · 3 forks · Updated May 18, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 849 stars · 98 forks · Updated Apr 7, 2026

[MLSys 2026] RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python · 10,796 stars · 949 forks · Updated Apr 13, 2026

Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python · 59 stars · 4 forks · Updated Mar 17, 2026

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 20,001 stars · 2,062 forks · Updated Mar 27, 2026