Inferact
- SF
- https://yifanqiao.com
Stars
Machine Learning Engineering Open Book
FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)
Sardeenz is a proof-of-concept application for loading multiple models on a single GPU, adding models until the GPU is fully utilized.
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
A framework for efficient model inference with omni-modality models
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Advancing the frontier of efficient AI
RouterArena: An open framework for evaluating LLM routers, with standardized datasets, metrics, an automated evaluation pipeline, and a live leaderboard.
Context7 Platform: Up-to-date code documentation for LLMs and AI code editors
[ICML 2025, NeurIPS 2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
The simplest, fastest repository for training/finetuning medium-sized GPTs.
TPU inference for vLLM, with unified JAX and PyTorch support.
A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs
Fast and memory-efficient exact kmeans
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Puzzles for learning Triton
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
[MLSys 2026] RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
SkyRL: A Modular Full-stack RL Library for LLMs
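The "exact kmeans" entry above refers to clustering computed without approximation, i.e. Lloyd's algorithm run to convergence. As a generic illustration only (this is a minimal sketch of the standard algorithm, not the repo's fast, memory-efficient implementation; all names here are hypothetical):

```python
# Minimal pure-Python sketch of Lloyd's exact k-means.
# Naive O(n*k) assignment per iteration; real implementations
# vectorize and batch this for speed and memory efficiency.

def kmeans(points, k, iters=100):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2
                            + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                cx = sum(x for x, _ in members) / len(members)
                cy = sum(y for _, y in members) / len(members)
                new_centroids.append([cx, cy])
            else:
                new_centroids.append(centroids[c])  # keep empty cluster fixed
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
cents, labs = kmeans(pts, 2)
```

Here the two tight groups of points separate into two clusters, and each centroid lands on the mean of its group.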