sunggg

Sunghyun Park sunggg

19 followers · 25 following

OctoML

Lists (1)

🚀 My stack

Stars

snudm-starlab / K-prune

Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models (ICLR 2024)

Python 14 Updated May 31, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,935 1,918 Updated Jun 21, 2026

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,514 6,647 Updated Jun 22, 2026

openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 18,554 1,511 Updated May 24, 2026

deepspeedai / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 2,106 191 Updated Jun 30, 2025

mistralai / megablocks-public

Forked from databricks/megablocks

Python 867 62 Updated Dec 8, 2023

fixie-ai / ai-benchmarks

Benchmarking suite for popular AI APIs

Python 89 15 Updated Feb 6, 2025

S-LoRA / S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,913 124 Updated Jan 21, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,835 1,066 Updated Jun 22, 2026

intel / xFasterTransformer

C++ 436 75 Updated Sep 18, 2025

mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,232 398 Updated Jul 11, 2024

turboderp-org / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 4,560 338 Updated Mar 4, 2026

artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 10,936 874 Updated Jun 10, 2024

mlc-ai / llm-perf-bench

Python 122 13 Updated Apr 22, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 24,208 2,850 Updated Jun 20, 2026

qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Python 3,072 451 Updated Jul 13, 2024

ggml-org / llama.cpp

LLM inference in C/C++

C++ 117,617 19,803 Updated Jun 22, 2026

turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Python 2,924 222 Updated Sep 30, 2023

mlc-ai / web-llm

High-performance In-browser LLM Inference Engine

TypeScript 18,250 1,312 Updated Jun 9, 2026

mlc-ai / relax

Python 176 104 Updated Jun 14, 2026

psrivas2 / relax

Forked from tlc-pack/relax

Temporary repo for prototyping relax (relay next); the effort will be upstreamed. We use the wiki pages on this repo to host design docs.

Python 5 Updated Mar 6, 2023

mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 2…

C 950 161 Updated Nov 27, 2024

awslabs / raf

C++ 144 21 Updated Jan 30, 2025

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ 195,816 75,196 Updated Jun 22, 2026

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 1,002 167 Updated Sep 19, 2024

msr-fiddle / pipedream

Python 394 115 Updated Nov 4, 2022

jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning

C++ 743 93 Updated Jan 26, 2023

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,494 2,952 Updated Jun 21, 2026

msr-fiddle / piper

C++ 9 3 Updated Dec 18, 2021

openai / openai-gemm

Open single and half precision gemm implementations

C 397 87 Updated Apr 2, 2023