eecs498
Fast inference from large language models via speculative decoding
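Several entries in this list build on speculative decoding, so a minimal greedy draft-then-verify sketch may be useful context; `target_logits_fn`, `draft_logits_fn`, and the helper name are illustrative placeholders, not any repo's API:

```python
import torch

def greedy_speculative_step(target_logits_fn, draft_logits_fn, tokens, k=4):
    """One draft-then-verify step of greedy speculative decoding.

    target_logits_fn / draft_logits_fn (placeholder names) map a 1-D
    LongTensor of token ids to per-position logits of shape (len, vocab).
    """
    # Draft: the small model proposes k tokens autoregressively.
    draft = tokens.clone()
    for _ in range(k):
        draft = torch.cat([draft, draft_logits_fn(draft)[-1].argmax().view(1)])

    # Verify: a single target forward pass scores all k proposals at once.
    target_preds = target_logits_fn(draft).argmax(dim=-1)

    # Accept the longest prefix the target agrees with, then append the
    # target's own next token, so each step always yields >= 1 new token.
    n, accepted = len(tokens), 0
    while accepted < k and draft[n + accepted] == target_preds[n + accepted - 1]:
        accepted += 1
    return torch.cat([draft[: n + accepted],
                      target_preds[n + accepted - 1].view(1)])
```

The key property: one target-model forward pass verifies k drafted tokens, so each step emits between 1 and k+1 tokens while exactly matching greedy decoding of the target model.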
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
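For background on the linearized-attention side, below is the generic O(T) recurrent form of causal linear attention with the elu(x) + 1 feature map from Katharopoulos et al. (2020); this is a background sketch, not this paper's specific method:

```python
import torch

def causal_linear_attention(q, k, v, eps=1e-6):
    """O(T) recurrent form of causal linear attention; q, k, v are (T, d)."""
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    T, d = q.shape
    S = torch.zeros(d, d)            # running sum of phi(k_s) v_s^T
    z = torch.zeros(d)               # running normalizer sum of phi(k_s)
    out = torch.empty_like(v)
    for t in range(T):               # constant per-step state, unlike softmax attention
        S = S + torch.outer(phi_k[t], v[t])
        z = z + phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + eps)
    return out
```

The fixed-size state (S, z) is what makes autoregressive decoding O(1) per token, versus the growing KV cache of softmax attention.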
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
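For a flavor of that Python API, a sketch along the lines of the high-level `LLM` interface documented in recent releases; import paths, the model name, and output fields vary by version, so treat this as a version-dependent sketch:

```python
from tensorrt_llm import LLM, SamplingParams

# Model name is just an example; the engine is built on first use.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=32, temperature=0.8)
for out in llm.generate(["Speculative decoding is"], params):
    print(out.outputs[0].text)
```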
Large Language Model Text Generation Inference
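A running TGI server is queried over HTTP; one way from Python is `huggingface_hub.InferenceClient` (the local URL and prompt below are placeholders):

```python
from huggingface_hub import InferenceClient

# Assumes a TGI server is already running locally on port 8080.
client = InferenceClient("http://localhost:8080")
print(client.text_generation("What is speculative decoding?", max_new_tokens=64))
```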
Machine Learning Engineering Open Book
Building Transformer Models with PyTorch 2.0, by BPB Publications
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
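A minimal usage sketch via the `pipeline` API (the model choice is arbitrary):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # model choice is arbitrary
print(generator("Speculative decoding is", max_new_tokens=20)[0]["generated_text"])
```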
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
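A self-contained toy training loop showing the usual `prepare`/`backward` pattern; the tiny model and data are stand-ins:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # device placement and precision come from the launch config
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

The same script then scales from a laptop to multi-GPU or TPU via `accelerate launch`, with no code changes.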
Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
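The whole API is essentially one call; the `speedup` function below is an arbitrary example object:

```python
import fire

def speedup(baseline_ms: float, new_ms: float) -> float:
    """Arbitrary example function; any Python object can be exposed."""
    return baseline_ms / new_ms

if __name__ == "__main__":
    fire.Fire(speedup)  # usage: python speedup.py 90 30  (or --baseline_ms=90 --new_ms=30)
```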
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
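A minimal task-parallel sketch using Ray core (the `square` task is a placeholder):

```python
import ray

ray.init()  # local Ray runtime; connects to a cluster if one is configured

@ray.remote
def square(x):  # placeholder task
    return x * x

# Tasks are scheduled in parallel; ray.get blocks until results arrive.
print(ray.get([square.remote(i) for i in range(8)]))
```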
Unsupervised text tokenizer for Neural Network-based text generation.
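A toy sketch of training and using a SentencePiece model entirely in memory, following the Python bindings' iterator-based training interface; the corpus and vocabulary size are arbitrary:

```python
import io
import sentencepiece as spm

# Toy corpus and vocab size; real training uses a large text file.
lines = ["speculative decoding is fast", "draft then verify", "verify fast drafts"] * 10
model = io.BytesIO()
spm.SentencePieceTrainer.train(sentence_iterator=iter(lines), model_writer=model,
                               vocab_size=50, model_type="bpe")
sp = spm.SentencePieceProcessor(model_proto=model.getvalue())
print(sp.encode("speculative decoding", out_type=str))
```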
[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Adaptive Draft-Verification for Efficient Large Language Model Decoding (AAAI 2025 Oral)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
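A minimal single-process sketch of the `deepspeed.initialize` pattern, with a toy model and an inline config; real jobs are usually launched with the `deepspeed` CLI and a JSON config file:

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 1)  # toy model
ds_config = {"train_batch_size": 16,
             "optimizer": {"type": "Adam", "params": {"lr": 1e-3}}}
# initialize() wraps the model in DeepSpeed's engine, which owns the optimizer step.
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)
x = torch.randn(16, 8).to(engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)  # replaces loss.backward()
engine.step()          # replaces optimizer.step()
```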