Highlights

  • Pro

Organizations

@AIDASLab

15 results for source starred repositories written in Python

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,270 31,092 Updated Nov 8, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,538 11,127 Updated Nov 9, 2025

Fast and memory-efficient exact attention

Python 20,413 2,125 Updated Nov 5, 2025

A framework for few-shot evaluation of language models.

Python 10,563 2,835 Updated Oct 29, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Python 4,674 319 Updated Aug 19, 2025

A curated list for Efficient Large Language Models

Python 1,892 145 Updated Jun 17, 2025

[NeurIPS 2024] A Generalizable World Model for Autonomous Driving

Python 812 58 Updated Jul 2, 2025

A low-latency & high-throughput serving engine for LLMs

Python 439 58 Updated Oct 16, 2025
Python 345 44 Updated Apr 2, 2024

A baseline repository of Auto-Parallelism in Training Neural Networks

Python 147 20 Updated Jun 25, 2022

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"

Python 101 7 Updated Sep 30, 2024

Sirius is an efficient correction mechanism that significantly boosts Contextual Sparsity models on reasoning tasks while maintaining their efficiency gains.

Python 21 3 Updated Sep 10, 2024
Python 16 1 Updated Jun 11, 2025
Python 10 1 Updated Sep 20, 2024

Codebase for MUSTAFAR: Promoting Unstructured Sparsity for KV Pruning in LLM Inference

Python 8 2 Updated Nov 6, 2025