The open source coding agent.

TypeScript 175,007 21,203 Updated Jun 16, 2026

Machine Learning Engineering Open Book

Python 18,124 1,150 Updated May 18, 2026

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 3,404 545 Updated Jun 16, 2026

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 117,171 13,703 Updated Jun 16, 2026

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 9,724 1,369 Updated Jun 12, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,262 1,152 Updated Jun 16, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,732 1,283 Updated Jun 15, 2026

An Open Source Machine Learning Framework for Everyone

C++ 195,680 75,185 Updated Jun 16, 2026

Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.

C 163 31 Updated Feb 3, 2022

oneAPI Deep Neural Network Library (oneDNN)

C++ 4,008 1,147 Updated Jun 16, 2026

Samples for Intel® oneAPI Toolkits

C++ 1,153 743 Updated Jun 15, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,381 1,050 Updated Jun 4, 2026

Trying to figure various CPU things out

C 169 28 Updated Jan 31, 2026

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,004 287 Updated May 15, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,706 1,061 Updated Apr 30, 2026

Intel PMU profiling tools

Python 2,230 362 Updated Apr 28, 2026

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ 5,624 441 Updated Jun 15, 2026

Tiptop is a performance monitoring tool for Linux. It provides a dynamic real-time view of the tasks running in the system. tiptop is very similar to the top utility, but most of the information di…

C 157 16 Updated Dec 11, 2022

High performance server-side application framework

C++ 9,269 1,688 Updated Jun 11, 2026

A microbenchmark support library

C++ 10,236 1,769 Updated Jun 15, 2026

A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2

C 2,246 308 Updated Jun 11, 2026

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++ 16,844 4,140 Updated Jun 16, 2026

A composable and fully extensible C++ execution engine library for data management systems.

C++ 4,155 1,529 Updated Jun 16, 2026

A High-Performance JIT-Based C++ Expression/Script Execution Engine with SIMD Vectorization Support

C++ 99 7 Updated May 24, 2026

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,687 2,244 Updated Dec 1, 2025

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda 636 144 Updated Sep 4, 2025

DeepSeek Coder: Let the Code Write Itself

Python 23,690 2,852 Updated Nov 11, 2025

Library providing helpers for the Linux kernel io_uring support

C 3,684 518 Updated Jun 15, 2026

Proxy: Next Generation Polymorphism in C++

C++ 3,045 224 Updated Jan 29, 2026