My learning notes for ML SYS.

Python 5,772 374 Updated Mar 19, 2026

super repo for rocm systems projects

C++ 315 183 Updated Mar 25, 2026

[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror

C++ 523 278 Updated Mar 25, 2026

Analyze computation-communication overlap in V3/R1.

1,150 145 Updated Mar 21, 2025

Modular RDMA Interface

C++ 98 27 Updated Mar 25, 2026

AI Tensor Engine for ROCm

Python 391 255 Updated Mar 25, 2026

Optimized primitives for collective multi-GPU communication

C++ 4,555 1,181 Updated Mar 24, 2026

Public repo for HF blog posts

Jupyter Notebook 3,348 988 Updated Mar 13, 2026

[ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry

Python 407 13 Updated Sep 26, 2025

High-Performance C++ Fundamental Library

C++ 637 99 Updated Mar 16, 2026

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,548 732 Updated Mar 25, 2026

Fast and memory-efficient exact attention

Python 22,973 2,552 Updated Mar 25, 2026

Quick-start tutorial for the Transformers library

Python 1,855 223 Updated Feb 24, 2026

LLM101n: Let's build a Storyteller

36,607 2,004 Updated Aug 1, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,403 1,100 Updated Feb 3, 2026

LLM training in simple, raw C/CUDA

Cuda 29,259 3,446 Updated Jun 26, 2025

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.

Shell 730 341 Updated Mar 21, 2026

Proxy: Next Generation Polymorphism in C++

C++ 3,060 224 Updated Jan 29, 2026

Open source code for AlphaFold 2.

Python 14,401 2,592 Updated Mar 13, 2026

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

Python 3,314 665 Updated Dec 16, 2025

The Serenity Operating System 🐞

C++ 33,041 3,313 Updated Mar 25, 2026

PyTorch Tutorial for Deep Learning Researchers

Python 32,240 8,258 Updated Aug 15, 2023

Pretrain, finetune and serve LLMs on Intel platforms with Ray

Python 130 36 Updated Sep 23, 2025

LLM inference in C/C++

C++ 99,283 15,802 Updated Mar 25, 2026

A data-oriented, simple but powerful DSL.

Go 49 8 Updated Nov 21, 2025

Port of OpenAI's Whisper model in C/C++

C++ 47,942 5,338 Updated Mar 21, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 98,563 27,295 Updated Mar 25, 2026

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 53,775 6,297 Updated Sep 18, 2024

General technology for enabling AI capabilities w/ LLMs and MLLMs

Python 4,312 370 Updated Mar 23, 2026

Hash function quality and speed tests

C++ 2,133 190 Updated Dec 2, 2025