Stars
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
How to optimize some algorithms in CUDA.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Header-only C++/Python library for fast approximate nearest neighbors
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
OLMoE: Open Mixture-of-Experts Language Models
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
Everything we actually know about the Apple Neural Engine (ANE)
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end-to-end problems using Llama mode…
A natural language interface for computers
A framework for building realtime voice AI agents 🤖🎙️📹
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
A generative speech model for daily dialogue.
Minimal container for Chrome's headless shell, useful for automating / driving the web
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Fast and accurate AI-powered file content type detection
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Make images smaller using best-in-class codecs, right in the browser.
A blazing-fast inference solution for text embedding models