andoorve
👋
Hi!
  • CentML
  • 04:29 (UTC -08:00)


A custom AI chip to be taped out soon!

Python 28 Updated Dec 20, 2025

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python 770 26 Updated Oct 13, 2025

Fast and memory-efficient exact attention

Python 21,196 2,232 Updated Dec 18, 2025

Tile primitives for speedy kernels

Cuda 3,008 217 Updated Dec 9, 2025

A basic introduction to coding in modern C++.

C++ 1,016 217 Updated Jul 30, 2024

LLM training in simple, raw C/CUDA

Cuda 28,433 3,334 Updated Jun 26, 2025

Trio – a friendly Python library for async concurrency and I/O

Python 7,071 374 Updated Dec 15, 2025

Lightweight and extensible LLM Inference serving benchmark tool written in Rust.

Rust 4 Updated Apr 4, 2024

Empowering everyone to build reliable and efficient software.

Rust 108,640 14,245 Updated Dec 20, 2025

The website for PyTorch

HTML 271 312 Updated Dec 11, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 96,033 26,304 Updated Dec 20, 2025

Projects for an undergraduate OS course

C 5,368 1,468 Updated Jul 19, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,310 78 Updated Mar 6, 2025

A listing of compiler, language and runtime teams for people looking for jobs in this area

HTML 689 72 Updated Dec 9, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,817 12,086 Updated Dec 20, 2025

An easy-to-understand TensorOp Matmul Tutorial

C++ 397 52 Updated Oct 10, 2025

Curated coding interview preparation materials for busy software engineers

TypeScript 136,317 16,328 Updated Nov 18, 2025

Reading list of Instruction-tuning. A trend that started with Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).

768 24 Updated Jul 20, 2023

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 13,033 1,384 Updated Dec 18, 2025

Memory footprint reduction for transformer models

Python 11 2 Updated Jan 24, 2023

Solve puzzles. Learn CUDA.

Jupyter Notebook 11,840 909 Updated Sep 1, 2024

An open-source efficient deep learning framework/compiler, written in python.

Python 737 68 Updated Sep 4, 2025

NumPy & SciPy for GPU

Python 10,674 981 Updated Dec 18, 2025

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Python 199 30 Updated Dec 22, 2022

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Python 1,334 186 Updated Jul 8, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,077 31,498 Updated Dec 20, 2025

Course Page for Computer Graphics course

CSS 192 70 Updated Mar 15, 2022
Python 12 4 Updated May 3, 2020