Stars
Codebase for Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference
Codebase for MUSTAFAR: Promoting Unstructured Sparsity for KV Pruning in LLM Inference
A summary of awesome work on optimizing LLM inference
From a+b to sparsemax(QK^T)V in Triton!
A comprehensive list of papers on large language diffusion models.
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline (a minimal HGEMV kernel sketch follows this list).
📚 LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners 🐑, 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉
A curated collection of papers on MoE model inference
Awesome LLM Books: Curated list of books on Large Language Models
A curated list of neural network pruning resources.
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
A curated list for Efficient Large Language Models
A low-latency & high-throughput serving engine for LLMs
Fast and memory-efficient exact attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
A framework for few-shot evaluation of language models.
Sirius, an efficient correction mechanism that significantly boosts contextual sparsity models on reasoning tasks while maintaining their efficiency gains.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Sample code for my CUDA programming book
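For context on the HGEMV entries above, here is a minimal sketch of the kind of half-precision GEMV baseline those repos optimize: one warp per matrix row, a strided fp16 dot product accumulated in fp32, and a warp-shuffle reduction. The kernel name, the row-major layout, and the launch shape are illustrative assumptions here, not code taken from either repository.

```cuda
// Minimal HGEMV sketch: y = A * x, with A in half precision (M x N, row-major).
// One warp computes one row of A; this is a common baseline, not either
// repo's actual implementation.
#include <cuda_fp16.h>

__global__ void hgemv_warp_per_row(const half* __restrict__ A,
                                   const half* __restrict__ x,
                                   half* __restrict__ y,
                                   int M, int N) {
    const int lane = threadIdx.x & 31;                 // lane id within the warp
    const int row  = blockIdx.x * (blockDim.x >> 5)    // one warp per matrix row
                   + (threadIdx.x >> 5);
    if (row >= M) return;

    // Each lane accumulates a strided slice of the dot product in fp32
    // to avoid fp16 accumulation error.
    float acc = 0.0f;
    for (int col = lane; col < N; col += 32)
        acc += __half2float(A[row * N + col]) * __half2float(x[col]);

    // Butterfly reduction: combine the 32 partial sums across the warp.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);

    if (lane == 0) y[row] = __float2half(acc);
}
```

A launch such as `hgemv_warp_per_row<<<(M + 7) / 8, 256>>>(A, x, y, M, N)` assigns eight rows (eight warps) per 256-thread block; the optimized kernels in the repos above typically improve on this baseline with vectorized `half2` loads and better memory coalescing.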