Stars
Running a big model on a small laptop
Exact speculative decoding on Apple Silicon, powered by MLX.
Step by step explanation/tutorial of llama2.c
coldlarry / llama2.cpp
Forked from karpathy/llama2.c
Inference Llama 2 in one file of pure C
From-scratch C++ runtime for Llama 2 inference. Implements full transformer forward pass with RoPE, GQA, KV cache, SwiGLU, and a custom BPE tokenizer. No framework dependencies.
AI Agent Backend Platform on FastAPI — MCP server + AI orchestration + async DDD architecture. Zero-boilerplate CRUD, auto domain discovery, 14 Claude Code AI development skills.
"🐈 nanobot: The Ultra-Lightweight Personal AI Agent"
A server implementation for Wikidata API using the Model Context Protocol (MCP).
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Code to run evaluation on MermaidSeqBench using LLMaJ and a RESTful OpenAI-compatible API
LangGraph V1 Tutorial in Korean
Train your own speech AI model from scratch
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, an…
TransInferSim is a simulation framework for analyzing transformer inference on hardware.
A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and residual connection, Mixture-of-Experts (MoE), Sel…
From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs.
tairov / llama2.py
Forked from karpathy/llama2.c
Inference Llama 2 in one file of pure Python
A systolic array simulator for multi-cycle MACs and varying-byte words, with the paper accepted to HPCA 2022.
Python code to show how a systolic array works. Written for https://medium.com/@antonpaquin/whats-inside-a-tpu-c013eb51973e
A simulator for the Eyeriss chip (a CNN accelerator researched at MIT) and a new DNN framework, "Hive"
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
A powerful CLI tool that brings AI-powered code generation and file manipulation directly to your terminal.
Build a Claude Code–like CLI coding agent from scratch.