ivanium

Python, 383 stars, 14 forks, updated Apr 29, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust, 6,697 stars, 1,072 forks, updated Apr 29, 2026

Machine Learning Engineering Open Book

Python, 17,827 stars, 1,132 forks, updated Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python, 69 stars, 6 forks, updated Apr 25, 2026

Heterogeneous GPU Sharing on Kubernetes

Go, 3,381 stars, 544 forks, updated Apr 29, 2026

Sardeenz is a proof-of-concept application that lets you load more than one model onto a single GPU, adding models until the GPU is fully utilized.

TypeScript, 51 stars, 7 forks, updated Mar 27, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python, 305 stars, 18 forks, updated Feb 24, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python, 3,478 stars, 250 forks, updated Apr 15, 2026

Jupyter Notebook, 24 stars, 4 forks, updated Dec 6, 2025

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python, 708 stars, 68 forks, updated Apr 29, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python, 2,032 stars, 134 forks, updated Apr 28, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python, 98 stars, 6 forks, updated Dec 2, 2025

A framework for efficient model inference with omni-modality models

Python, 4,557 stars, 855 forks, updated Apr 29, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python, 714 stars, 43 forks, updated Mar 8, 2026

Advancing the frontier of efficient AI

Python, 59 stars, 10 forks, updated Apr 27, 2026

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python, 74 stars, 15 forks, updated Feb 18, 2026

Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors

TypeScript, 54,098 stars, 2,562 forks, updated Apr 29, 2026

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python, 662 stars, 46 forks, updated Mar 6, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python, 57,332 stars, 9,828 forks, updated Nov 12, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python, 306 stars, 172 forks, updated Apr 29, 2026

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C, 169 stars, 25 forks, updated Jan 13, 2026

Fast and memory-efficient exact kmeans

Python, 547 stars, 29 forks, updated Apr 17, 2026

The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >74% on SWE-bench verified!

Python, 4,122 stars, 567 forks, updated Apr 27, 2026

The official repo of Qwen (通义千问, Tongyi Qianwen), the chat and pretrained large language models developed by Alibaba Cloud.

Python, 21,095 stars, 1,797 forks, updated Mar 5, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python, 5,904 stars, 533 forks, updated Apr 29, 2026

Puzzles for learning Triton

Jupyter Notebook, 2,408 stars, 223 forks, updated Apr 1, 2026

Jupyter Notebook, 23 stars, 3 forks, updated May 18, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python, 899 stars, 106 forks, updated Apr 26, 2026

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python, 10,934 stars, 957 forks, updated Apr 24, 2026

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python, 61 stars, 3 forks, updated Mar 17, 2026