ai-hpc

🦙

Research Physical AI agent safety, performance, memory

NVIDIAN ai-hpc

I am a GPU computing expert who is always challenging myself to build better solutions. I love finding solutions with limited resources.

1.6k followers · 71 following

Achievements

x2 x3 x3

Stars

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

Python 1,543 159 Updated Jul 30, 2026

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 13,472 3,452 Updated Jul 13, 2026

entrius / gittensor

Python 50 224 Updated Jul 30, 2026

gittensor-ai-lab / sparkinfer

Fastest MoE/LLM inference runtime for consumer and edge Blackwell GPUs. SN74 on Gittensor.

Cuda 11 60 Updated Jul 30, 2026

openenclave / openenclave

SDK for developing enclaves

C 1,200 381 Updated Jul 24, 2026

axolotl-ai-cloud / axolotl

Go ahead and axolotl questions

Python 12,288 1,397 Updated Jul 30, 2026

Datura-ai / lium-io

Python 38 30 Updated Jul 30, 2026

BytedTsinghua-SIA / CUDA-Agent

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Python 1,123 94 Updated Jul 8, 2026

turboderp-org / exllamav3

An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs

Python 1,080 123 Updated Jul 29, 2026

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 10,949 1,636 Updated Jul 30, 2026

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 30,972 7,523 Updated Jul 30, 2026

sipeed / NanoCluster

NanoCluster: Compact & Affordable Cluster for Everyone

222 9 Updated Jan 26, 2026

BBuf / tvm_mlir_learn

A collection of compiler learning resources.

Python 2,759 369 Updated May 20, 2026

mit-han-lab / KernelWiki

Python 318 39 Updated Jun 9, 2026

mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3,600 316 Updated Jul 17, 2025

mit-han-lab / vlash

Real-Time VLAs via Future-state-aware Asynchronous Inference.

Python 452 32 Updated Apr 22, 2026

z-lab / dflash

DFlash: Block Diffusion for Flash Speculative Decoding

Python 5,551 397 Updated May 10, 2026

open-lm-engine / coda-kernels

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python 235 24 Updated Jul 30, 2026

NVIDIA / CompileIQ

An Optimizer for Nvidia Compilers.

Python 113 11 Updated Jul 3, 2026

zezhishao / DailyArXiv

Daily ArXiv Papers.

Python 446 103 Updated Jul 30, 2026

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,863 966 Updated Jul 30, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,818 3,059 Updated Jul 30, 2026

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,765 210 Updated Jul 30, 2026

ROCm / FlyDSL

FlyDSL is the Python front‑end of the project: a Flexible Layout Python DSL for expressing tiling, partitioning, data movement, and kernel structure at a high level.

Python 257 101 Updated Jul 30, 2026

NVIDIA / TensorRT-Edge-LLM

High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI

Python 487 96 Updated Jul 30, 2026

FasterDecoding / Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,762 204 Updated Jun 25, 2024

apache / tvm

Open Machine Learning Compiler Framework

Python 13,632 3,935 Updated Jul 30, 2026

tiiuae / Falcon-H1

All information and news with respect to Falcon-H1 series

122 15 Updated Oct 9, 2025

Tencent / AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python 1,495 170 Updated Jul 30, 2026

ai-hpc / llm-inference-viz

Interactive 3D visualization of dense decoder-only LLM inference. Companion to the AI Inference Engineer 2026 course.

TypeScript 1 Updated Jun 15, 2026
