University of Sheffield, United Kingdom
Stars
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends.
- SGLang is a high-performance serving framework for large language and multimodal models.
- A high-throughput and memory-efficient inference and serving engine for LLMs.
- PyTorch-native quantization and sparsity for training and inference.
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks …
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
- A PyTorch quantization backend for optimum.
- Official implementation of Half-Quadratic Quantization (HQQ).
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…
- arXiv LaTeX Cleaner: easily clean the LaTeX code of your paper to submit to arXiv.
- Codebase, data, and models for studying hallucination in pruned models.
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
- A simple, safe way to store and distribute tensors.
- A framework for few-shot evaluation of language models.
- 💥 Fast state-of-the-art tokenizers optimized for research and production.
- 🤗 The largest hub of ready-to-use datasets for AI models, with fast, easy-to-use, and efficient data manipulation tools.
- Unsupervised text tokenizer for neural-network-based text generation.
- Task-based datasets, preprocessing, and evaluation for sequence models.
- Tensors and dynamic neural networks in Python with strong GPU acceleration.
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
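Several of the starred projects above (GPTQ, AWQ, HQQ, AQLM) revolve around low-bit weight quantization. As a point of reference, here is a minimal pure-Python sketch of naive round-to-nearest 4-bit quantization with a single per-tensor scale; it is illustrative only and is not any of those libraries' actual algorithms, which use per-group scales and error-aware rounding:

```python
def quantize_4bit(weights):
    """Naive round-to-nearest 4-bit quantization with one per-tensor scale.

    Illustrative sketch only: real methods (GPTQ, AWQ, HQQ) use per-group
    scales, calibration data, and error-aware rounding.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 7  # signed 4-bit integers cover [-8, 7]
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map 4-bit codes back to approximate float weights."""
    return [qi * scale for qi in q]

w = [0.12, -0.9, 0.33, 0.05]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)  # each entry within scale/2 of the original
```

The reconstruction error of round-to-nearest is bounded by half the scale, which is exactly the slack that smarter methods spend per-group and per-channel budgets to shrink.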
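The tokenizer entries in the list (Tokenizers, SentencePiece) are built on subword segmentation. A minimal pure-Python sketch of a single byte-pair-encoding merge step, purely illustrative and not either library's implementation:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a frequency-weighted corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Fuse every occurrence of `pair` into a single new symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word (pre-split into characters) -> frequency.
corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("low"): 7}
pair = most_frequent_pair(corpus)   # ('l', 'o') or ('o', 'w'), tied at 14
corpus = merge_pair(corpus, pair)
```

Real BPE training just repeats this merge step until a target vocabulary size is reached, recording each merge as a rule to replay at tokenization time.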