reyoung
  • Tencent
  • Beijing


Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 105 5 Updated Nov 12, 2025

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

Swift 22,152 521 Updated Nov 14, 2025

Rust 1 Updated Oct 11, 2025

Build Virtual Machine Image from Dockerfile or Docker image

Go 328 52 Updated Apr 29, 2025

Run regular Docker images in KVM/QEMU

Go 838 48 Updated Apr 16, 2025

Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn.

Python 1,445 133 Updated Nov 8, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 142 11 Updated Sep 18, 2025

High performance server-side application framework

C++ 8,965 1,649 Updated Nov 13, 2025

C++ 316 29 Updated Nov 13, 2025

Set JSON values very quickly in Go

Go 2,656 176 Updated Nov 3, 2025

A Quirky Assortment of CuTe Kernels

Python 653 61 Updated Oct 30, 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 554 33 Updated Nov 14, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,471 252 Updated Nov 14, 2025

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Python 12,036 1,989 Updated Oct 31, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,918 313 Updated Nov 14, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

C 938 106 Updated Nov 10, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 434 72 Updated Nov 14, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 913 44 Updated Oct 29, 2025

Synchronization and asynchronous computation package for Go

Go 279 15 Updated Jul 5, 2025

A declarative drawing API in Python

Python 298 15 Updated Aug 28, 2024

[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Python 576 43 Updated Feb 11, 2025

Borgo is a statically typed language that compiles to Go.

Rust 4,481 64 Updated Oct 27, 2024

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

TypeScript 67,708 5,259 Updated Nov 14, 2025

GLake: optimizing GPU memory management and IO transmission.

Python 489 44 Updated Mar 24, 2025

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 4,363 324 Updated Aug 16, 2025

Ring attention implementation with flash attention

Python 909 88 Updated Sep 10, 2025

Ant game engine

Lua 3,918 405 Updated Mar 24, 2025