Starred repositories
A high-throughput and memory-efficient inference and serving engine for LLMs
Instant voice cloning by MIT and MyShell. Audio foundation model.
Convert PDF to markdown + JSON quickly with high accuracy
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Official inference framework for 1-bit LLMs
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Magenta: Music and Art Generation with Machine Intelligence
A TTS model capable of generating ultra-realistic dialogue in one pass.
Train transformer language models with reinforcement learning.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNN and transformer.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
NumPy aware dynamic Python compiler using LLVM
Large Language Model Text Generation Inference
Accessible large language models via k-bit quantization for PyTorch.
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
A concise but complete full-attention transformer with a set of promising experimental features from various papers
A TensorFlow Implementation of the Transformer: Attention Is All You Need
A language for constraint-guided and efficient LLM programming.
A tool for extracting plain text from Wikipedia dumps
LightLLM is a Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Top2Vec learns jointly embedded topic, document and word vectors.
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Original PyTorch implementation of Cross-lingual Language Model Pretraining.
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
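
Several entries above revolve around the same idea: storing weights in a few bits while computing in higher precision (k-bit quantization, AWQ, low-bit compression). As a minimal sketch of what that looks like from PyTorch, assuming the Hugging Face transformers integration with the bitsandbytes library and an available CUDA GPU; the model id below is an arbitrary placeholder, not a recommendation:

```python
# Hypothetical sketch: load a causal LM with 4-bit quantized weights.
# Assumes `transformers` and `bitsandbytes` are installed and a GPU is present.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight storage with fp16 compute: weights stay quantized in
# memory, matmuls run in half precision.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "facebook/opt-350m"  # arbitrary placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

inputs = tokenizer("Quantization lets large models", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

The trade-off sketched here is the common one across these projects: roughly 4x lower weight memory than fp16 at a small accuracy cost, which is what makes serving large models on a single GPU practical.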