KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.

Python · 986 stars · 83 forks · Updated Apr 23, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python · 47,232 stars · 5,252 forks · Updated May 6, 2026

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python · 123,568 stars · 8,362 forks · Updated Apr 20, 2026

microsoft / BitNet

Official inference framework for 1-bit LLMs

Python · 39,025 stars · 3,555 forks · Updated Mar 10, 2026

ultraworkers / claw-code

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust · 191,821 stars · 109,924 forks · Updated May 16, 2026

alibaba / page-agent

JavaScript in-page GUI agent. Control web interfaces with natural language.

TypeScript · 17,904 stars · 1,511 forks · Updated May 11, 2026

NVIDIA / NemoClaw

Run OpenClaw more securely inside NVIDIA OpenShell with managed inference

TypeScript · 20,496 stars · 2,682 forks · Updated May 18, 2026

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

Python · 1,399 stars · 135 forks · Updated May 18, 2026

ggml-org / llama.cpp

LLM inference in C/C++

C++ · 110,673 stars · 18,333 forks · Updated May 18, 2026

ModelCloud / GPTQModel

LLM model quantization (compression) toolkit with hardware acceleration support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

Python · 1,151 stars · 184 forks · Updated May 18, 2026

chatboxai / chatbox

Powerful AI Client

TypeScript · 39,995 stars · 4,059 forks · Updated Apr 9, 2026

doronz88 / rpc-project

Minimalistic server (written in C) and a Python 3 client for calling native functions on a remote host for automation purposes

Python · 86 stars · 10 forks · Updated May 5, 2026

Lakr233 / vphone-cli

Swift · 6,286 stars · 943 forks · Updated May 16, 2026

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python · 29,465 stars · 3,573 forks · Updated Dec 5, 2025

lemonade-sdk / lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

C++ · 3,988 stars · 304 forks · Updated May 18, 2026

JamePeng / llama-cpp-python

Forked from abetlen/llama-cpp-python

Python bindings for llama.cpp

Python · 421 stars · 52 forks · Updated May 17, 2026
