🦾
Working hard every day


from vibe coding to agentic engineering - practice makes claude perfect

HTML 57,677 5,792 Updated Jun 14, 2026

Puzzles for learning Triton

Jupyter Notebook 2,487 237 Updated Apr 1, 2026

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

487 46 Updated Aug 2, 2025

LLM training parallelisms (DP, FSDP, TP, PP) in pure C

C 29 4 Updated Jan 27, 2026

a minimal cache manager for PagedAttention, on top of llama3.

Python 146 12 Updated Aug 26, 2024

Nano vLLM

Python 14,021 2,212 Updated Apr 26, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,706 1,058 Updated Apr 30, 2026

Fully Open Language Models with Stellar Performance

Python 320 29 Updated May 13, 2026

🔥 A minimal training framework for scaling FLA models

Python 392 63 Updated Apr 22, 2026

Python API for writing multiprocessing pipelines

Python 90 24 Updated Apr 28, 2022

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,002 287 Updated May 15, 2025

Machine Learning Engineering Open Book

Python 18,112 1,150 Updated May 18, 2026

Fully open reproduction of DeepSeek-R1

Python 26,310 2,437 Updated Apr 2, 2026

100 days of building GPU kernels!

Cuda 602 77 Updated Apr 27, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 16,490 1,557 Updated May 26, 2026

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

453 45 Updated Feb 22, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 6,495 602 Updated Jun 13, 2026

Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch

Python 1,960 208 Updated Jun 6, 2026

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 461 37 Updated Nov 2, 2025

GPU Kernels

Cuda 225 25 Updated Apr 27, 2025

Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Cuda 90 5 Updated Jul 14, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 1,376 82 Updated Jul 14, 2024

A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!

Python 1,215 140 Updated Jan 30, 2025

Jupyter Notebook 506 45 Updated Oct 18, 2024

What would you do with 1000 H100s...

Jupyter Notebook 1,175 73 Updated Jan 10, 2024

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 2,219 190 Updated Aug 26, 2025

A generic, composable multi-dimensional array library.

C++ 12 1 Updated May 23, 2026

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 12,203 1,252 Updated Nov 21, 2025

C# 8 Updated Jan 1, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 4,542 357 Updated Jan 5, 2026