Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
The source of LMSYS website and blogs
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
verl: Volcano Engine Reinforcement Learning for LLMs
Utilities intended for use with Llama models.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Development repository for the Triton language and compiler
SGLang is a high-performance serving framework for large language models and multimodal models.
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
Puzzles for learning Triton
Fast and memory-efficient exact attention
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Building a quick conversation-based search demo with Lepton AI.
so-vits-svc fork with realtime support, improved interface and more features.
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
hlu1 / pytorch
Forked from pytorch/pytorch. Tensors and Dynamic neural networks in Python with strong GPU acceleration
hlu1 / QNNPACK
Forked from pytorch/QNNPACK. Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators
hlu1 / tvm
Forked from apache/tvm. Open deep learning compiler stack for cpu, gpu and specialized accelerators
Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators