Stars
Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
12 Lessons to Get Started Building AI Agents
All Algorithms implemented in Python
A tiny edit to nGPT and some custom kernels to speed it up
A minimal Agentic RAG built with LangGraph — learn Retrieval-Augmented Generation Agents in minutes.
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Birch-san / k-diffusion
Forked from crowsonkb/k-diffusion
Karras et al. (2022) diffusion models for PyTorch
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
Birch-san / NATTEN
Forked from SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
In this project, I developed a live sketching functionality using OpenCV and created an app using Streamlit
This is a Python implementation of the Direct Linear Transform for mapping 3D coordinates to 2D image coordinates and vice versa
Learn the building blocks of how to build DeepSeek from scratch.
In-depth tutorials on LLMs, RAGs and real-world AI agent applications.
A hands-on series on developing LLM applications
CodedK / jailbreak_llms
Forked from verazuo/jailbreak_llms
[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through w…
Learn the building blocks of how to build nano-kimi from scratch
Jarvis is a voice-activated, conversational AI assistant powered by a local LLM (Qwen via Ollama). It listens for a wake word, processes spoken commands using a local language model with LangChain,…
21 Lessons, Get Started Building with Generative AI
Kimi K2 is the large language model series developed by the Moonshot AI team