ivanium

Yifan Qiao ivanium

@Inferact | ex-Postdoc at UC Berkeley | PhD from UCLA

257 followers · 420 following

Inferact
SF
https://yifanqiao.com

Achievements

x3 x2 x3

Stars

haoran-ding / FM-Agent

Python · 383 stars · 14 forks · Updated Apr 29, 2026

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,697 stars · 1,072 forks · Updated Apr 29, 2026

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python · 17,827 stars · 1,132 forks · Updated Mar 16, 2026

FlashSampling / FlashSampling

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python · 69 stars · 6 forks · Updated Apr 25, 2026

Project-HAMi / HAMi

Heterogeneous GPU Sharing on Kubernetes

Go · 3,381 stars · 544 forks · Updated Apr 29, 2026

rh-aiservices-bu / sardeenz

Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more models onto a GPU, until it is fully utilized.

TypeScript · 51 stars · 7 forks · Updated Mar 27, 2026

thu-ml / SLA

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python · 305 stars · 18 forks · Updated Feb 24, 2026

thu-ml / TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python · 3,478 stars · 250 forks · Updated Apr 15, 2026

mert-cemri / autoevolve

Jupyter Notebook · 24 stars · 4 forks · Updated Dec 6, 2025

NVIDIA / TileGym

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python · 708 stars · 68 forks · Updated Apr 29, 2026

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 2,032 stars · 134 forks · Updated Apr 28, 2026

sspec-project / SparseSpec

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python · 98 stars · 6 forks · Updated Dec 2, 2025

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python · 4,557 stars · 855 forks · Updated Apr 29, 2026

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 714 stars · 43 forks · Updated Mar 8, 2026

skylight-org / sparse-attention-hub

Advancing the frontier of efficient AI

Python · 59 stars · 10 forks · Updated Apr 27, 2026

RouteWorks / RouterArena

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python · 74 stars · 15 forks · Updated Feb 18, 2026

upstash / context7

Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors

TypeScript · 54,098 stars · 2,562 forks · Updated Apr 29, 2026

svg-project / Sparse-VideoGen

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python · 662 stars · 46 forks · Updated Mar 6, 2026

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 57,332 stars · 9,828 forks · Updated Nov 12, 2025

vllm-project / tpu-inference

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 306 stars · 172 forks · Updated Apr 29, 2026

XpuOS / xsched

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C · 169 stars · 25 forks · Updated Jan 13, 2026

svg-project / flash-kmeans

Fast and memory-efficient exact kmeans

Python · 547 stars · 29 forks · Updated Apr 17, 2026

SWE-agent / mini-swe-agent

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!

Python · 4,122 stars · 567 forks · Updated Apr 27, 2026

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 21,095 stars · 1,797 forks · Updated Mar 5, 2026

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python · 5,904 stars · 533 forks · Updated Apr 29, 2026

gpu-mode / Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook · 2,408 stars · 223 forks · Updated Apr 1, 2026

thustorage / GPreempt

Jupyter Notebook · 23 stars · 3 forks · Updated May 18, 2025

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 899 stars · 106 forks · Updated Apr 26, 2026

yichuan-w / LEANN

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python · 10,934 stars · 957 forks · Updated Apr 24, 2026

Multi-LLM / prism-research

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python · 61 stars · 3 forks · Updated Mar 17, 2026
