Stars
Puzzles for learning Triton
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
Distributed Compiler based on Triton for Parallel Systems
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
A Datacenter Scale Distributed Inference Serving Framework
My learning notes for ML systems (MLSys).
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
verl: Volcano Engine Reinforcement Learning for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
A high-throughput and memory-efficient inference and serving engine for LLMs
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at small-to-medium batch sizes of 16-32 tokens.
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference for large language models.
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A high-performance inference system for large language models, designed for production environments.
TVM Documentation in Simplified Chinese (TVM 中文文档)
How to optimize common algorithms in CUDA.
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
A series of GPU optimization topics, introducing CUDA kernel optimization in detail through several basic kernel optimizations, including: elementwise, reduce, s…
Ring attention implementation with flash attention
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
cube studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. Covers the full MLOps pipeline: compute rental, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, automated data labeling, SFT fine-tuning / reward-model / reinforcement-learning training for large models such as DeepSeek, multi-node inference with vLLM/Ollama/MindIE, private knowledge bases, AI model marketplace…