View achalpandeyy's full-sized avatar
🚀
Focusing


DeepSeek 4 Flash local inference engine for Metal and CUDA

C 9,927 809 Updated May 16, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,030 93 Updated May 16, 2026

cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it

Python 808 164 Updated May 8, 2026

The fastest priority queue

Rust 22 3 Updated May 12, 2026

C++ 860 167 Updated May 15, 2026

Get the main content of any page as Markdown.

TypeScript 7,606 305 Updated Apr 28, 2026

A kernel library written in tilelang

Python 1,523 126 Updated Apr 23, 2026

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

71,038 6,994 Updated Jan 4, 2026

sitp: run nanochat by building teenygrad from scratch: the bridge from micrograd to tinygrad!

Python 73 9 Updated May 15, 2026

The open-source agent-serving project

Python 413 27 Updated May 12, 2026

An Extensible Deep Learning Library

Python 2,356 405 Updated May 16, 2026

The agent that grows with you

Python 153,016 24,336 Updated May 16, 2026

Fast state-of-the-art image and video segmentation in portable C/C++

C++ 298 26 Updated Apr 10, 2026

A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.

C++ 294 11 Updated Jan 29, 2025

Jupyter Notebook 1,081 698 Updated May 12, 2026

A modern causal profiler built leveraging Linux tracepoints

Zig 11 Updated May 13, 2026

VisuTwin Canvas

C++ 11 Updated May 7, 2026

Protocol Buffers implementation in C

C++ 2,965 767 Updated Apr 7, 2025

8x Faster JavaScript 3D Library.

TypeScript 538 34 Updated Apr 2, 2026

A simple HTTP server written from scratch as a teaching tool to teach Unix network program architectures

C 396 56 Updated Apr 28, 2019

Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

C++ 132 7 Updated Apr 14, 2026

A small, portable and extensible C++ 3D coding framework

C++ 2,062 203 Updated Feb 6, 2023

Distribute and run LLMs with a single file.

C++ 24 Updated May 13, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,995 1,108 Updated May 3, 2026

Fast, accurate & comprehensive text measurement & layout

TypeScript 47,017 2,595 Updated May 11, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 227 40 Updated May 1, 2026

Machine learning framework written in C.

C 104 13 Updated Apr 25, 2026

GLyphy is an implementation of the Slug algorithm for GPU text rasterization

C++ 839 80 Updated Mar 30, 2026

AI Tensor Engine for ROCm

Python 433 313 Updated May 16, 2026

WebGPU implementation of Eric Lengyel's Slug algorithm for resolution-independent vector text rendering on the GPU

TypeScript 190 3 Updated Mar 25, 2026