🦾 Working hard every day


Puzzles for learning Triton

Jupyter Notebook 2,196 179 Updated Nov 18, 2024

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

419 39 Updated Aug 2, 2025

LLM training parallelisms (DP, FSDP, TP, PP) in pure C

C 26 3 Updated Jul 20, 2025

A minimal cache manager for PagedAttention, built on top of Llama 3.

Python 127 11 Updated Aug 26, 2024

Nano vLLM

Python 10,042 1,256 Updated Nov 3, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,932 922 Updated Dec 15, 2025

Fully Open Language Models with Stellar Performance

Python 312 28 Updated Nov 14, 2025

🔥 A minimal training framework for scaling FLA models

Python 322 49 Updated Nov 15, 2025

Python API for writing multiprocessing pipelines

Python 90 25 Updated Apr 28, 2022

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,947 288 Updated May 15, 2025

Machine Learning Engineering Open Book

Python 16,086 988 Updated Dec 20, 2025

Fully open reproduction of DeepSeek-R1

Python 25,749 2,407 Updated Nov 24, 2025

100 days of building GPU kernels!

Cuda 555 61 Updated Apr 27, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 13,926 1,304 Updated Oct 28, 2025

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

428 39 Updated Feb 22, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,293 355 Updated Dec 23, 2025

Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch

Python 1,771 177 Updated Dec 20, 2025

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 437 36 Updated Nov 2, 2025

GPU Kernels

Cuda 212 18 Updated Apr 27, 2025

Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Cuda 76 5 Updated Jul 14, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 1,296 85 Updated Jul 14, 2024

A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!

Python 1,178 139 Updated Jan 30, 2025

Jupyter Notebook 460 35 Updated Oct 18, 2024

What would you do with 1000 H100s...

Jupyter Notebook 1,134 69 Updated Jan 10, 2024

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 1,927 149 Updated Aug 26, 2025

A generic, composable multi-dimensional array library.

C++ 12 1 Updated Dec 20, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,493 1,156 Updated Nov 21, 2025

C# 8 Updated Jan 1, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,892 280 Updated Sep 25, 2025

"Deep Generative Modeling": Introductory Examples

Jupyter Notebook 1,271 196 Updated Aug 30, 2025