Stars
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
A fast inference library for running LLMs locally on modern consumer-class GPUs
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath
Run inference on replit-3B code instruct model using CPU
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Official Code for Paper: RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text
QLoRA: Efficient Finetuning of Quantized LLMs
⛓️ Serving LangChain LLM apps and agents automagically with FastAPI. LLMOps
📋 A list of open LLMs available for commercial use.
Inference code and configs for the ReplitLM model family
A Bulletproof Way to Generate Structured JSON from Language Models
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHX…
Port of OpenAI's Whisper model in C/C++
Faster Whisper transcription with CTranslate2
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Open Academic Research on Improving LLaMA to SOTA LLM
Resource list for generating JSON using LLMs via function calling, tools, CFG. Libraries, Models, Notebooks, etc.
BELLE: Be Everyone's Large Language Model Engine (open-source Chinese dialogue LLM)
Inference script for Meta's LLaMA models using a Hugging Face wrapper
tloen/llama-int8
Forked from meta-llama/llama. Quantized inference code for LLaMA models