Learning TileLang with 10 puzzles!

Python 228 27 Updated Apr 28, 2026

Custom memory allocator in C++ built from scratch using mmap. Allocates a 1MB memory pool upfront and carves blocks from it to keep all allocations contiguous. Implements malloc, free, block reuse …

C++ 27 1 Updated Apr 27, 2026

Experimental GPU language with meta-programming

Jupyter Notebook 31 Updated Sep 6, 2024

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 244 15 Updated Apr 27, 2026

Pure C / AVX-512 port of Craftax-Classic. 47.8M SPS on a Ryzen 9 9950X3D -- 3.2x an RTX Pro 6000 Blackwell on the same env.

Python 22 1 Updated Apr 27, 2026

HTML representation of the Intel x86 instructions documentation.

Python 524 88 Updated Dec 5, 2014

Library Toolkit for Microcontrollers

C++ 9 1 Updated Jan 5, 2026

SIMD-accelerated distances, dot products, matrix ops, geospatial & geometric kernels for 16 numeric types — from 6-bit floats to 64-bit complex — across x86, Arm, RISC-V, and WASM, with bindings fo…

C 1,803 117 Updated Apr 23, 2026

x86-64, ARM, and RVV intrinsics viewer

JavaScript 79 4 Updated Feb 15, 2026

High-performance LLM inference engine — drop-in replacement for Ollama with faster multi-turn inference, lower TTFT, and higher throughput through prefix caching and continuous batching.

Rust 141 19 Updated Apr 25, 2026

Interactive version of the CuTe layout paper

Jupyter Notebook 57 7 Updated Apr 14, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 172 11 Updated Apr 28, 2026

Machine learning framework written in C.

C 104 13 Updated Apr 25, 2026

A zero-dependency ML framework in C with a modern Python API for full control over execution and memory.

C++ 683 35 Updated Apr 26, 2026

Megapack of LeetCode solutions in many different languages

C++ 56 8 Updated Mar 2, 2026

nCPU: model-native and tensor-optimized CPU research runtimes with organized workloads, tools, and docs

Python 637 28 Updated Apr 18, 2026

KV Cache & LoRA for minGPT

Python 62 8 Updated Mar 4, 2026

An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included.

TypeScript 128 15 Updated Mar 21, 2026

creating a tiny tensor library in raw C

C 1,394 122 Updated Mar 5, 2025

GPU Engineering for AI Systems

HTML 299 35 Updated Apr 21, 2026

16-bit CPU emulator from scratch in pure C

C 138 13 Updated Feb 24, 2026

random snippets of c

C 5 39 Updated Mar 16, 2013

minimal compiler

MLIR 22 3 Updated Feb 19, 2026

Because `model.fit()` isn't an explanation

Python 1,295 98 Updated Apr 26, 2026

Exercises for Learning MLIR (Originally written for PPoPP 2026)

C++ 96 3 Updated Feb 5, 2026

The lcc retargetable ANSI C compiler

C 2,553 488 Updated Oct 6, 2024

A C++ repository for Competitive Programmer's Handbook by Antti Laaksonen

C++ 32 6 Updated Nov 28, 2020

Algorithms from Competitive Programmer's Handbook by Antti Laaksonen

C++ 25 2 Updated Dec 14, 2025

Data Structure Algorithms, (GenAI/ML) System Design, Machine Learning, DevOps coding interview practices

799 206 Updated Oct 7, 2025

A collection of solutions for all problem statements on the AlgoExpert Coding Interview platform.

Python 485 238 Updated Mar 26, 2023