Stars

eecs498

26 repositories

Fast inference from large language models via speculative decoding

Python 867 93 Updated Aug 22, 2024

[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Python 35 3 Updated Jun 12, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,672 189 Updated Jun 25, 2024

Mamba SSM architecture

Python 16,737 1,538 Updated Nov 11, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,413 1,957 Updated Dec 17, 2025

Go ahead and axolotl questions

Python 10,949 1,220 Updated Dec 16, 2025

Go ahead and axolotl questions

Python 11 13 Updated Feb 6, 2024

Large Language Model Text Generation Inference

Python 10,709 1,246 Updated Dec 11, 2025

Machine Learning Engineering Open Book

Python 16,054 987 Updated Dec 10, 2025

Building Transformer Models with PyTorch 2.0, by BPB Publications

Jupyter Notebook 36 20 Updated Jul 17, 2024

Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]

HTML 40 5 Updated May 13, 2025

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,309 78 Updated Mar 6, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,065 231 Updated Nov 23, 2025

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 342 45 Updated Apr 22, 2025

The Hugging Face course on Transformers

MDX 3,587 1,206 Updated Dec 8, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 153,958 31,465 Updated Dec 17, 2025

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 9,383 1,245 Updated Dec 16, 2025

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

Python 28,023 1,461 Updated Nov 1, 2025

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 40,363 7,012 Updated Dec 17, 2025

Open Machine Learning Compiler Framework

Python 12,925 3,735 Updated Dec 17, 2025

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 11,513 1,314 Updated Dec 16, 2025

[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge

Python 37 3 Updated Sep 17, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,059 55 Updated Dec 11, 2025

Python 48 1 Updated Feb 19, 2024

Adaptive Draft-Verification for Efficient Large Language Model Decoding (AAAI 2025 Oral)

Python 69 7 Updated Apr 1, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,015 4,667 Updated Dec 12, 2025