Organizations

@lying-flat-projects


Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 470 21 Updated Apr 8, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,356 878 Updated Apr 9, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 75,874 15,363 Updated Apr 9, 2026

The Unified Intent Interface: The easiest way to build intent-powered UIs

TypeScript 219 14 Updated Apr 6, 2026
Jupyter Notebook 9 2 Updated Dec 7, 2024
C++ 9 4 Updated Mar 28, 2025

A collection of full time roles in SWE, Quant, and PM for new grads.

16,670 1,275 Updated Apr 9, 2026

Building blocks for foundation models.

614 28 Updated Jan 3, 2024

A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking frameworks and paradigms.

Python 12 1 Updated Aug 27, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,319 2,264 Updated Apr 9, 2026

Structured Outputs

Python 13,642 675 Updated Mar 26, 2026

Ongoing research training transformer models at scale

Python 15,974 3,801 Updated Apr 9, 2026

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,349 195 Updated Apr 14, 2025

Collection of Summer 2026 tech internships!

Python 44,136 3,168 Updated Apr 9, 2026

Development repository for the Triton language and compiler

MLIR 18,879 2,743 Updated Apr 9, 2026

Intercepting CUDA runtime calls with LD_PRELOAD

C++ 43 9 Updated Mar 11, 2014

Inference code for Llama models

Python 59,313 9,833 Updated Jan 26, 2025

A tool for examining GPU scheduling behavior.

Cuda 96 22 Updated Aug 17, 2024

GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.

C++ 2 Updated Jul 16, 2020

Helios Traces from SenseTime

62 13 Updated Sep 27, 2022

A latent text-to-image diffusion model

Jupyter Notebook 72,844 10,622 Updated Jun 18, 2024

Open Machine Learning Compiler Framework

Python 13,259 3,848 Updated Apr 9, 2026

CVNets: A library for training computer vision networks

Python 1,965 252 Updated Oct 30, 2023

An open-source efficient deep learning framework/compiler, written in Python.

Python 740 68 Updated Sep 4, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 159,091 32,807 Updated Apr 9, 2026
1 Updated Jan 18, 2023

A library that provides an embeddable, persistent key-value store for fast storage.

C++ 31,659 6,788 Updated Apr 9, 2026