Stars
A blazingly fast, open-source application server with type-safe APIs, built-in WebAssembly runtime, realtime, auth, and admin UI built on Rust, SQLite & Wasmtime.
All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
A self-distillation based training method for long context reasoning in a single LLM without reinforcement learning
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
A general memory system for agents, powered by deep-research
Second iteration of my Rust key-value store: segmented log, in-memory index, checksums, and manual compaction.
Preprint: Asymmetry in Low-Rank Adapters of Foundation Models
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Official implementation of "Continuous Autoregressive Language Models"
Training and Inference for "Languages are Modalities: Cross-Lingual Alignment via Encoder Injection"
Nearly Inference Free Embeddings: make your RAG queries 500x faster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Automatic differentiation for Triton Kernels
LightMem: Lightweight and Efficient Memory-Augmented Generation
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
Automatic Video Generation from Scientific Papers
Flask app for interacting with FastVLM on CPU only
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception