Stars
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
A massively parallel, high-level programming language
KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
🏔 Calculating total viewsheds for geographic terrain using a cache-efficient and highly parallel algorithm
An Xposed module for downloading AI models from alternative sources
Fast, lossless LLM inference via dual-view diffusion decoding.
Libraries for executing federated programs and computations.
Visualize, query, and stream to train on multimodal robotics data.
Row-Bot - Personal AI Sovereignty. A local-first AI assistant with integrated tools, a personal knowledge graph, voice, vision, shell, browser automation, scheduled tasks, health tracking, and mess…
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
v1 of Asimov, an open-source humanoid robot
A browser frontend for codex desktop, running on a machine you control.
Generate hard iron offsets and soft iron matrix from raw magnetometer samples
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
SGLang is a high-performance serving framework for large language models and multimodal models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Open source cost intelligence proxy for AI agents. Cut costs ~80% with smart model routing. Dashboard, policy engine, 11 providers. MIT licensed.
A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code by TÂCHES.
Ultra-Sparse Adaptation of 1-Bit LLMs via XOR Patches
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
An AI co-worker with its own computer. Self-evolving, persistent memory, MCP server, secure credential collection, email identity. Built on the Claude Agent SDK.
AI agents running research on single-GPU nanochat training automatically
llama.cpp fork with additional SOTA quants and improved performance
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
Train the smallest LM you can that fits in 16MB. Best model wins!
Krasis is a hybrid LLM runtime focused on running larger models efficiently on consumer-grade, VRAM-limited hardware