45 results for source starred repositories

codebase for Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference

Cuda 3 Updated Oct 17, 2025

codebase for MUSTAFAR: Promoting Unstructured Sparsity for KV Pruning in LLM Inference

Python 8 2 Updated Nov 6, 2025

Summary of some awesome work for optimizing LLM inference

135 5 Updated Nov 2, 2025

From a+b to sparsemax(QK^T)V in Triton!

Jupyter Notebook 27 Updated Jun 19, 2025

A comprehensive list of papers about Large-Language-Diffusion-Models.

23 4 Updated Nov 4, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,255 524 Updated Sep 23, 2025

Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.

Cuda 68 7 Updated Sep 8, 2024

CUDA Core Compute Libraries

C++ 2,010 286 Updated Nov 6, 2025

High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.

Cuda 120 7 Updated Jul 13, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,332 826 Updated Nov 6, 2025
Cuda 28 2 Updated Apr 2, 2025

Curated collection of papers in MoE model inference

297 11 Updated Oct 20, 2025

Awesome LLM Books: Curated list of books on Large Language Models

1,067 164 Updated Oct 24, 2025

A curated list of neural network pruning resources.

2,481 332 Updated Apr 4, 2024

[NeurIPS 2024] A Generalizable World Model for Autonomous Driving

Python 811 58 Updated Jul 2, 2025

A curated list for Efficient Large Language Models

Python 1,891 144 Updated Jun 17, 2025
Python 10 1 Updated Sep 20, 2024

A low-latency & high-throughput serving engine for LLMs

Python 436 58 Updated Oct 16, 2025

Fast and memory-efficient exact attention

Python 20,367 2,116 Updated Nov 5, 2025
Python 16 1 Updated Jun 11, 2025

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"

Python 100 7 Updated Sep 30, 2024

A framework for few-shot evaluation of language models.

Python 10,544 2,831 Updated Oct 29, 2025

Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its efficiency gain.

Python 21 3 Updated Sep 10, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,172 31,060 Updated Nov 6, 2025

Sample codes for my CUDA programming book

Cuda 1,924 375 Updated Feb 15, 2025
Python 345 44 Updated Apr 2, 2024

[Inflearn] Operating systems "dinosaur book" lectures and notes

C 292 25 Updated May 7, 2024