ivanium

Machine Learning Engineering Open Book

Python 17,569 1,114 Updated Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python 61 5 Updated Mar 23, 2026

Heterogeneous GPU Sharing on Kubernetes

Go 3,186 496 Updated Mar 26, 2026

Sardeenz is a proof-of-concept application that lets you load more than one model on a single GPU, adding models until the GPU is fully utilized.

TypeScript 39 7 Updated Mar 27, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 292 18 Updated Feb 24, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,432 247 Updated Mar 6, 2026
Jupyter Notebook 24 4 Updated Dec 6, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 685 56 Updated Mar 26, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,002 130 Updated Mar 27, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 96 5 Updated Dec 2, 2025

A framework for efficient model inference with omni-modality models

Python 3,958 647 Updated Mar 29, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 690 40 Updated Mar 8, 2026

Advancing the frontier of efficient AI

Python 56 8 Updated Mar 20, 2026

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python 74 15 Updated Feb 18, 2026

Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors

TypeScript 50,990 2,415 Updated Mar 28, 2026

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python 652 43 Updated Mar 6, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 55,781 9,513 Updated Nov 12, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python 274 140 Updated Mar 29, 2026

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C 162 21 Updated Jan 13, 2026

Fast and memory-efficient exact kmeans

Python 509 25 Updated Mar 26, 2026

The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >74% on SWE-bench Verified!

Python 3,567 491 Updated Mar 24, 2026

The official repo of Qwen (通义千问, Tongyi Qianwen), the chat and pretrained large language models proposed by Alibaba Cloud.

Python 20,868 1,752 Updated Mar 5, 2026

Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels

Python 5,436 491 Updated Mar 29, 2026

Puzzles for learning Triton

Jupyter Notebook 2,348 208 Updated Mar 18, 2026
Jupyter Notebook 23 2 Updated May 18, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 825 94 Updated Mar 29, 2026

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 10,374 898 Updated Mar 29, 2026

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python 58 2 Updated Mar 17, 2026

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,948 2,061 Updated Mar 27, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,719 285 Updated Mar 29, 2026