Organizations

@lying-flat-projects

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 462 18 Updated Mar 26, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,223 829 Updated Mar 26, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 74,445 14,822 Updated Mar 27, 2026

The Unified Intent Interface: The easiest way to build intent-powered UIs

TypeScript 219 14 Updated Jun 13, 2025
Jupyter Notebook 9 2 Updated Dec 7, 2024
C++ 9 4 Updated Mar 28, 2025

A collection of full time roles in SWE, Quant, and PM for new grads.

16,553 1,269 Updated Mar 27, 2026

Building blocks for foundation models.

613 30 Updated Jan 3, 2024

A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking frameworks and paradigms.

Python 12 1 Updated Aug 27, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,198 2,218 Updated Mar 27, 2026

Structured Outputs

Python 13,605 675 Updated Mar 26, 2026

Ongoing research training transformer models at scale

Python 15,816 3,762 Updated Mar 26, 2026

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,349 196 Updated Apr 14, 2025

Collection of Summer 2026 tech internships!

Python 43,946 3,169 Updated Mar 27, 2026

Development repository for the Triton language and compiler

MLIR 18,776 2,706 Updated Mar 27, 2026

Intercepting CUDA runtime calls with LD_PRELOAD

C++ 43 9 Updated Mar 11, 2014

Inference code for Llama models

Python 59,268 9,826 Updated Jan 26, 2025

A tool for examining GPU scheduling behavior.

Cuda 96 22 Updated Aug 17, 2024

GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.

C++ 2 Updated Jul 16, 2020

Helios Traces from SenseTime

61 13 Updated Sep 27, 2022

A latent text-to-image diffusion model

Jupyter Notebook 72,771 10,619 Updated Jun 18, 2024

Open Machine Learning Compiler Framework

Python 13,221 3,837 Updated Mar 26, 2026

CVNets: A library for training computer vision networks

Python 1,968 252 Updated Oct 30, 2023

An open-source efficient deep learning framework/compiler, written in Python.

Python 740 68 Updated Sep 4, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 158,458 32,625 Updated Mar 27, 2026
1 Updated Jan 18, 2023

A library that provides an embeddable, persistent key-value store for fast storage.

C++ 31,675 6,766 Updated Mar 26, 2026