This is a series on GPU optimization topics. It explains in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,255 177 Updated Jul 29, 2023

arasgungore / arasgungore-CV

My curriculum vitae (CV) written using LaTeX.

TeX 908 275 Updated Sep 11, 2024

Zhen-Dong / Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

812 61 Updated Mar 27, 2025

l0ngc / hpc-learning

hpc-learning

783 45 Updated May 30, 2024

Global-CS-application / global-cs-application.github.io

A guide to CS study-abroad programs in Europe, Hong Kong, and Singapore

HTML 773 62 Updated Aug 25, 2025

SqueezeAILab / KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 411 37 Updated Aug 13, 2024

jy-yuan / KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 362 44 Updated Nov 20, 2025

infinigence / Infini-Megrez

341 20 Updated Oct 11, 2025

Aaronhuang-778 / BiLLM

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Python 229 18 Updated Jan 11, 2025

mrnorman / miniWeather

A parallel programming training mini app simulating weather-like flows

C++ 178 82 Updated Aug 11, 2025

hpcgame / hpc-wiki

Wiki for HPC

Python 137 13 Updated Jul 23, 2025

owensgroup / BGHT

BGHT: High-performance static GPU hash tables.

C++ 72 9 Updated Jul 2, 2025

ztt-21 / zTT

zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys'21] - Artifact Evaluation

C 29 18 Updated May 10, 2021

leefige / radik

Scalable radix top-k selection on GPUs.

Cuda 21 3 Updated Jan 27, 2025

zyjopensource / geepafs

C 18 6 Updated Apr 25, 2025

ruixueqingyang / MF-GPOEO

A Model-Free GPU Online Energy Optimization (MF-GPOEO) framework

C++ 4 2 Updated Dec 11, 2023

chalmers-hub / first-week-in-chalmers

A quick survival guide for international students who come to Chalmers.

SCSS 4 2 Updated Nov 18, 2023

CHART-Team / xitao

XiTAO is a lightweight layer built on top of modern C++ features with the goals of being low-overhead and serving as a development platform for testing scheduling and resource management algorithms.

C++ 2 1 Updated Jun 2, 2021

Long Cheng thynics

Stars

ggml-org / llama.cpp

vllm-project / vllm

helix-editor / helix

geekan / HowToLiveLonger

nikivdev / flow

Infrasys-AI / AISystem

vosen / ZLUDA

pengsida / learning_research

xlite-dev / LeetCUDA

gpu-mode / lectures

forthespada / Awsome-Courses

AIoT-MLSys-Lab / Efficient-LLMs-Survey

Liu-xiandong / How_to_optimize_in_GPU