Stars
MiMo Code: Where Models and Agents Co-Evolve
Access large language models from the command-line
AI agent framework, written from scratch (not based on openclaw), focused on stripping it down to the bare necessities, optimizing token count, and reducing security risks. Modular so you can enable on…
Orchestrate multiple coding agents from desktop and mobile
rustic - fast, encrypted, and deduplicated backups powered by Rust
Access your entire server infrastructure from your local desktop
Pydantic + Instructor for Rust: extract structured, validated data from LLMs (OpenAI GPT, Anthropic Claude, Gemini, Grok/xAI) into native Rust structs & enums. Derive macros auto-generate JSON Sche…
~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & En…
Structured-output enforcer for LLM responses. Repair + validate + (optional) retry-with-LLM. BYO-LLM, BYO-schema.
A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows
Redirect container traffic through a VPN container
Fullstack app framework for web, desktop, and mobile.
llama.cpp fork with TurboQuant WHT-rotated KV cache & weight compression + Gemma 4 MTP and Qwen 3.6 NextN speculative decoding (+30-50% throughput).
Expose Docker containers as Tailscale Services using label-based configuration.
Fused TBQ4 Flash Attention + MTP + Shared Tensors for llama.cpp — 82+ tok/s with lossless 4.25 bpv KV cache at 200K context on RTX 4090
MAESTRO is an AI-powered research application designed to streamline complex research tasks.
Fast LLM speculative inference server for consumer hardware.
Mini CLI search engine for your docs, knowledge bases, meeting notes, whatever. Tracks current SOTA approaches while staying fully local
pgsty / minio
Forked from minio/minio
Community Maintained Fork of minio (Object Storage Service)
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Community recipes for serving LLMs on RTX 3090/4090/5090 CUDA GPUs. Multi-engine (vLLM, llama.cpp, ik_llama) and model-agnostic. Currently shipping Qwen3.6-27B, Qwen3.6 35B, Gemma 4 26B, Gemma 4 31B c…
AI-enabled pair programmer for Claude, GPT, O Series, Grok, Deepseek, Gemini and 300+ models