- Chennai, India
- https://blog.abhinandb.com
- @abhinand58
Starred repositories
AI coding agent that edits symbols, not strings. AST surgery, full LSP, and a live code graph wired to memory that resurfaces by file, co-change, and semantics.
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Control panel for vLLM, SGLang, llama.cpp, exllamav3
A terminal workspace with batteries included
DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm
How many experts do we need to serve a model?
The pretty much "official" DSPy framework for TypeScript
Learn it. Build it. Ship it for others.
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models
Linux & PowerShell scripts to easily set up and run the Qwen 3.5 series locally on Windows and Linux with llama.cpp.
Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.
Artifact integrity and drift detection for ML and data pipelines.
Autonomous experiment loop extension for pi
TheTom / llama-cpp-turboquant
Forked from ggml-org/llama.cpp
LLM inference in C/C++
See where your AI tokens go. Interactive TUI dashboard for Claude Code, Codex, and Cursor cost observability. npx codeburn
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
turbo-tan / llama.cpp-tq3
Forked from ggml-org/llama.cpp
llama.cpp fork with TQ3_1S/4S CUDA kernels: 3.5-bit WHT quantization achieving Q4s quality at 10% smaller size. Based on RaBitQ-inspired Walsh-Hadamard transform. Enables 27B models on 16GB GPUs w…
omo/lazycodex: The coding agent for tokenmaxxers; the one and only agent harness for complex codebases. For your Codex, for your OpenCode
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
OCR model that handles complex tables, forms, handwriting with full layout.
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
Skills for Real Engineers. Straight from my .claude directory.