My learning notes for ML SYS.

Python 5,893 383 Updated Apr 3, 2026

Super repo for ROCm systems projects

C++ 325 192 Updated Apr 5, 2026

[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror

C++ 525 281 Updated Apr 3, 2026

Analyze computation-communication overlap in V3/R1.

1,148 145 Updated Mar 21, 2025

Modular RDMA Interface

C++ 105 30 Updated Apr 3, 2026

AI Tensor Engine for ROCm

Python 399 271 Updated Apr 5, 2026

Optimized primitives for collective multi-GPU communication

C++ 4,589 1,195 Updated Apr 4, 2026

Public repo for HF blog posts

Jupyter Notebook 3,363 992 Updated Apr 4, 2026

[ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry

Python 411 13 Updated Sep 26, 2025

High-Performance C++ Fundamental Library

C++ 637 100 Updated Mar 16, 2026

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,549 730 Updated Apr 4, 2026

Fast and memory-efficient exact attention

Python 23,144 2,583 Updated Apr 4, 2026

Quick-start tutorial for the Transformers library

Python 1,858 224 Updated Feb 24, 2026

LLM101n: Let's build a Storyteller

36,664 2,006 Updated Aug 1, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,408 1,101 Updated Feb 3, 2026

LLM training in simple, raw C/CUDA

Cuda 29,385 3,490 Updated Jun 26, 2025

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.

Shell 729 339 Updated Apr 5, 2026

Proxy: Next Generation Polymorphism in C++

C++ 3,056 224 Updated Jan 29, 2026

Open source code for AlphaFold 2.

Python 14,433 2,597 Updated Apr 1, 2026

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

Python 3,321 667 Updated Dec 16, 2025

The Serenity Operating System 🐞

C++ 33,066 3,313 Updated Apr 3, 2026

PyTorch Tutorial for Deep Learning Researchers

Python 32,251 8,257 Updated Aug 15, 2023

Pretrain, finetune and serve LLMs on Intel platforms with Ray

Python 130 36 Updated Sep 23, 2025

LLM inference in C/C++

C++ 101,484 16,367 Updated Apr 5, 2026

A data-oriented, simple but powerful DSL.

Go 49 8 Updated Nov 21, 2025

Port of OpenAI's Whisper model in C/C++

C++ 48,316 5,382 Updated Mar 29, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 98,808 27,405 Updated Apr 5, 2026

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 53,856 6,311 Updated Sep 18, 2024

General technology for enabling AI capabilities w/ LLMs and MLLMs

Python 4,327 371 Updated Apr 4, 2026

Hash function quality and speed tests

C++ 2,137 190 Updated Dec 2, 2025