Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
A collection of GPU experiments and benchmarks for my personal understanding and research.
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Fast and memory-efficient exact attention
A PyTorch native platform for training generative AI models
My tools for the Slurm HPC workload manager
Write a fast kernel and see how you compare against the best humans and AI on gpumode.com
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1. A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
Genai-bench is a benchmarking tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
Allow torch tensor memory to be released and resumed later
An open-source AI agent that brings the power of Gemini directly into your terminal.
Train your Agent model via our easy and efficient framework
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
From a+b to sparsemax(QK^T)V in Triton!
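The sparsemax in that repo's title refers to the sparse alternative to softmax from Martins & Astudillo (2016), which can assign exactly zero attention weight to some keys. As a rough orientation (not the repo's Triton code), a minimal NumPy sketch of sparsemax for a single score vector looks like this:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of z onto the probability simplex.
    Unlike softmax, the result can contain exact zeros (sparse attention)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # support set: largest k such that 1 + k * z_(k) > sum of top-k scores
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1.0) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

# e.g. sparsemax([2.0, 1.0, 0.1]) puts all mass on the first entry,
# whereas softmax would spread probability over every entry.
```

In the attention setting the same projection is applied row-wise to the QK^T score matrix before multiplying by V; the Triton version in the repo fuses those steps into one GPU kernel.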
FlashInfer: Kernel Library for LLM Serving
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Tensors and Dynamic neural networks in Python with strong GPU acceleration
My tests and experiments with some popular dl frameworks.