mlsw
  • University of Sheffield
  • United Kingdom
  • UTC +01:00

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python · 2,373 stars · 451 forks · Updated Apr 7, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 25,654 stars · 5,286 forks · Updated Apr 11, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 76,146 stars · 15,450 forks · Updated Apr 11, 2026

PyTorch native quantization and sparsity for training and inference

Python · 2,769 stars · 479 forks · Updated Apr 11, 2026

Python · 25 stars · 5 forks · Updated Sep 26, 2025

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python · 2,433 stars · 344 forks · Updated Apr 11, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 13,338 stars · 2,271 forks · Updated Apr 11, 2026

A PyTorch quantization backend for Optimum

Python · 1,035 stars · 85 forks · Updated Apr 2, 2026

Official implementation of Half-Quadratic Quantization (HQQ)

Python · 926 stars · 90 forks · Updated Feb 26, 2026

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python · 2,324 stars · 298 forks · Updated May 11, 2025

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python · 1,314 stars · 195 forks · Updated Feb 26, 2026

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python · 6,789 stars · 392 forks · Updated Mar 27, 2026

Codebase, data, and models for studying hallucination in pruned models

Python · 16 stars · Updated Jan 11, 2025

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python · 877 stars · 120 forks · Updated Aug 20, 2024

Python · 553 stars · 43 forks · Updated Feb 8, 2026

A simple and effective LLM pruning approach.

Python · 861 stars · 128 forks · Updated Aug 9, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python · 5,047 stars · 536 forks · Updated Apr 11, 2025

Simple, safe way to store and distribute tensors

Python · 3,694 stars · 307 forks · Updated Apr 2, 2026

A framework for few-shot evaluation of language models.

Python · 12,112 stars · 3,177 forks · Updated Apr 8, 2026

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust · 10,615 stars · 1,069 forks · Updated Apr 11, 2026

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python · 21,387 stars · 3,165 forks · Updated Apr 10, 2026

Unsupervised text tokenizer for Neural Network-based text generation.

C++ · 11,749 stars · 1,336 forks · Updated Apr 11, 2026

Task-based datasets, preprocessing, and evaluation for sequence models.

Python · 593 stars · 60 forks · Updated Mar 27, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 99,029 stars · 27,460 forks · Updated Apr 11, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 159,197 stars · 32,833 forks · Updated Apr 11, 2026

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 35,362 stars · 3,517 forks · Updated Apr 11, 2026

Python · 2,962 stars · 342 forks · Updated Apr 2, 2026