Stars
slime is an LLM post-training framework for RL Scaling.
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
Open source Ghostty-based macOS terminal with vertical tabs and notifications. Built for AI coding agents and programmability.
Official source code of FreeCAD, a free and open-source multiplatform 3D parametric modeler.
Extract all your personal data history from Cursor, Codex, Claude Code, Windsurf, and Trae.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign languag…
Full-featured mobile and web client for Codex and Claude Code, with real-time voice and encryption.
Universal AI coding proxy. Use Claude Code, Codex CLI, or any tool with DeepSeek, GLM, MiniMax, and more—without rate limits breaking your flow.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
RTX 6000 Pro Wiki — Running Large LLMs (Qwen3.5-397B, Kimi-K2.5, GLM-5) on PCIe GPUs without NVLink
Control panel for vLLM, SGLang, llama.cpp, and ExLlamaV3.
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…
Overworld's local world client interface to run Waypoint world models
SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression
Autonomous GPU Kernel Generation & Optimization via Deep Agents
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Our first fully AI-generated deep learning system.
TheTom / llama-cpp-turboquant
Forked from ggml-org/llama.cpp
LLM inference in C/C++
Production-grade client-side tracing, profiling, and analysis for complex software systems.
Show usage stats for OpenAI Codex and Claude Code, without having to log in.
DFlash: Block Diffusion for Flash Speculative Decoding
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
Voice-to-text with push-to-talk for Wayland compositors