n1mb0606

qowltmd n1mb0606

1 follower · 4 following

Kyung Hee University
https://n1mb0606.github.io/

Stars

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

1,987 102 Updated May 16, 2026

astra-sim / astra-sim

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 587 206 Updated Apr 25, 2026

casys-kaist / LLMServingSim

LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure

Python 278 72 Updated May 11, 2026

ykcombat / sglang

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 7 3 Updated Dec 15, 2025

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,301 1,181 Updated May 19, 2026

intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform

Python 2,015 315 Updated Mar 30, 2026

llm-d / llm-d

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 3,214 483 Updated May 16, 2026

inancgumus / learngo

❤️ 1000+ Hand-Crafted Go Examples, Exercises, and Quizzes. 🚀 Learn Go by fixing 1000+ tiny programs.

Go 20,022 2,718 Updated Jun 24, 2025

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,179 217 Updated Oct 8, 2024

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 6,260 572 Updated May 19, 2026

HPMLL / BurstGPT

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 259 15 Updated Mar 19, 2026

NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 729 95 Updated Apr 21, 2026

unslothai / unsloth

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

Python 64,637 5,711 Updated May 19, 2026

FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,369 590 Updated Oct 28, 2024

hao-ai-lab / MuxServe

Jupyter Notebook 91 11 Updated Oct 17, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 8,074 622 Updated Jan 18, 2026

aojunzz / NM-sparsity

Python 244 32 Updated Nov 9, 2022

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 28,002 5,999 Updated May 19, 2026

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,462 773 Updated Apr 21, 2026

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 80,460 16,955 Updated May 19, 2026

mlcommons / inference

Reference implementations of MLPerf® inference benchmarks

Python 1,567 625 Updated May 14, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,636 981 Updated May 19, 2026

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,140 113 Updated Dec 30, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 23,839 2,744 Updated May 16, 2026

ImplFerris / LearnRust

Rust Learning Resources

2,012 198 Updated Apr 6, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,811 1,114 Updated May 19, 2026

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda 1,187 186 Updated Sep 2, 2025

google / googletest

GoogleTest - Google Testing and Mocking Framework

C++ 38,622 10,777 Updated May 15, 2026

leimao / CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

Cuda 271 26 Updated Jul 19, 2024

ggml-org / llama.cpp

LLM inference in C/C++

C++ 111,307 18,415 Updated May 19, 2026
