Stars
Model export recipes, Python primitives, and Swift runtime utilities for on-device AI
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm
Postgres with instant branching and scale-to-zero
Turns your codebase into an autoresearch loop: discovers what to measure, instruments the benchmark, then runs tree search with parallel subagents.
Minimal and scalable research codebase in JAX, designed for rapid iteration on frontier research in LLM and other autoregressive models.
High-performance GPU-accelerated signal processing and visualization framework that runs anywhere.
Hardware Security Module (HSM) for Raspberry Pi Pico and ESP32
Parrot is an array fusion GPU library built on NVIDIA's CCCL libraries (Thrust/CUB).
Real-time webcam demo with SmolVLM and llama.cpp server
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Reverse engineering the best-selling drones on Amazon to control them programmatically
The pretty much "official" DSPy framework for TypeScript
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Implementing DeepSeek R1's GRPO algorithm from scratch
Official repository for our work on micro-budget training of large-scale diffusion models.
Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space IO
Fast CUDA matrix multiplication from scratch
@mention people in a textarea
Train a tiny Llama3 model with parquet datasets, using JavaScript.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Efficient Triton Kernels for LLM Training