Stars
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
FlashInfer: Kernel Library for LLM Serving
A high-throughput and memory-efficient inference and serving engine for LLMs
The Unified Intent Interface: The easiest way to build intent-powered UIs
A collection of full time roles in SWE, Quant, and PM for new grads.
Building blocks for foundation models.
A modular, extensible framework for LLM inference benchmarking that supports multiple benchmarking harnesses and paradigms.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Ongoing research training transformer models at scale
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Collection of Summer 2026 tech internships!
Development repository for the Triton language and compiler
A tool for examining GPU scheduling behavior.
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
A latent text-to-image diffusion model
CVNets: A library for training computer vision networks
An open-source efficient deep learning framework/compiler, written in python.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models spanning text, vision, audio, and multimodal tasks, for both inference and training.
A library that provides an embeddable, persistent key-value store for fast storage.