Starred repositories
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
7shoe / AdaParse
Forked from ramanathanlab/pdfwf
Adaptive Parallel PDF Parsing and Resource Scaling Engine
Meta-Repository for Bespoke Silicon Group's Manycore Architecture (a.k.a. HammerBlade)
Experiments in Joint Embedding Predictive Architectures (JEPAs).
PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework
Official PyTorch implementation of "Denoising MCMC for Accelerating Diffusion-Based Generative Models", ICML 2023 Oral Paper
A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs)
tenstorrent / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Created and enhanced a local LLM training system on Apple Silicon with MLX and Metal API, overcoming the absence of CUDA support. Fine-tuned the Llama3 model on 16 GPUs for streamlined solution of …
[ISCA 2025] Official Implementation of "MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization"