University of Illinois Urbana-Champaign
- zichengma.github.io
Stars
My learning notes for ML SYS.
A high-throughput and memory-efficient inference and serving engine for LLMs
Modular Serving Engine x Workload Generator Benchmarking Tool
Systematic and comprehensive benchmarks for LLM systems.
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
SGLang is a high-performance serving framework for large language models and multimodal models.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A Datacenter Scale Distributed Inference Serving Framework
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
Tips and resources to prepare for Behavioral interviews.
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
2026 SWE internship & new graduate job list updated daily
Collection of Summer 2026 tech internships!
Summer 2026 software engineering, data science, AI, quant, product management, and hardware internship postings. Updated daily by Simplify and Pitt CSC.
The first "code-first" agent framework for seamlessly planning and executing data analytics tasks.
[EMNLP'23, ACL'24] Compresses the prompt and KV-Cache to speed up LLM inference and improve LLMs' perception of key information, achieving up to 20x compression with minimal performance loss.
Home of CodeT5: Open Code LLMs for Code Understanding and Generation
📐 Jekyll theme for building a personal site, blog, project documentation, or portfolio.
A beautiful, simple, clean, and responsive Jekyll theme for academics
ZooKeeper client written in async Rust.