Stars
🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.
Scala code intelligence for coding agents. Zero Build Server. Zero Compilation. Just answers.
DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm
rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
Cats Actors is a framework for building reactive apps. It uses a conceptual actor model as a higher-level abstraction for concurrency.
Java JNI wrapper around libdeflate for faster DEFLATE/gzip/zip on the JVM
CLI tool for coding agents and developers to query the public API of any Maven JVM dependency — get symbol signatures, list packages, search by name, and inspect dependency trees. Powered by Coursi…
Curated list of datasets and tools for post-training.
Smart(er) code reading for humans and AI agents. Reduces cost per correct answer by ~40% on average. Install: cargo install tilth -or- npx tilth
A library for creating Model Context Protocol (MCP) servers and clients for Scala 3
Fast, stateless LLM for your shell: qq answers; qa runs commands
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.
shai is a coding agent, your pair programming buddy that lives in the terminal. Written in Rust with love <3
A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA.
Automatic code generation for Scala functions and expressions via the Curry-Howard isomorphism
How to develop react-like applications using Laminar
FlashInfer: Kernel Library for LLM Serving
Web interface to sn-bindgen (https://sn-bindgen.indoorvivants.com/) to generate Scala 3 Native bindings to C header files
System for eXtended Hybrid Abductive Inductive Learning
Represent large sets and maps compactly with finite state transducers.
An eDSL framework based on Scala and MLIR, focusing on hardware design.
[READ-ONLY] A collection of prompts for enhancing productivity with large language models.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)