Stars
AI agents running research on single-GPU nanochat training automatically
95% token savings. 155x faster queries. 16 languages. LLMs can't read your entire codebase. TLDR extracts structure, traces dependencies, and gives them exactly what they need.
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
LLM Council works together to answer your hardest questions
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Fast CUDA matrix multiplication from scratch
Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Prompts for our Grok chat assistant and the `@grok` bot on X.
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
Fully open reproduction of DeepSeek-R1
Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
A comprehensive repository of reasoning tasks for LLMs (and beyond)
OCR, layout analysis, reading order, table recognition in 90+ languages
Convert PDF to markdown + JSON quickly with high accuracy
Monitor deep learning model training and hardware usage from your mobile phone
A multi-programming language benchmark for LLMs
DeepSeek LLM: Let there be answers