Starred repositories
haoruilee / marscode-lark-url-preview-tutorial
Forked from CancerGary/marscode-lark-url-preview-tutorial
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
FlashMLA: Efficient Multi-head Latent Attention Kernels
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
Dynamically find suggested clusters in your data for unsupervised learning.
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
Fast inference from large language models via speculative decoding
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code
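The "two additional lines" are an import and a decorator around your training function; a minimal sketch (the email addresses are placeholders):

    from knockknock import email_sender

    # Decorating the function sends a notification when it finishes (or crashes).
    @email_sender(recipient_emails=["you@example.com"], sender_email="bot@example.com")
    def train_model():
        ...  # training loop
        return {"loss": 0.12}  # optional return value included in the notification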
CLI platform to experiment with codegen. Precursor to: https://lovable.dev
A Chinese-language introductory tutorial for LangChain
🦜🔗 The platform for reliable agents.
limcheekin / flutter-gpt
Forked from mpaepper/content-chatbot
Build a Flutter Q&A bot for the Flutter docs site (https://docs.flutter.dev/)
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
800,000 step-level correctness labels on LLM solutions to MATH problems
QLoRA: Efficient Finetuning of Quantized LLMs
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
Aligning pretrained language models with instruction data generated by themselves.
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Open-source keyboard firmware for Atmel AVR and Arm USB families
Run LLaMA (and Stanford-Alpaca) inference on Apple Silicon GPUs.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023]