zhuoyuan

19 followers · 19 following

BIT, ICT

Stars

anomalyco / opencode

The open source coding agent.

TypeScript 175,007 21,203 Updated Jun 16, 2026

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python 18,124 1,150 Updated May 18, 2026

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 3,404 545 Updated Jun 16, 2026

Comfy-Org / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 117,171 13,703 Updated Jun 16, 2026

huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 9,724 1,369 Updated Jun 12, 2026

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,262 1,152 Updated Jun 16, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 9,732 1,283 Updated Jun 15, 2026

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ 195,680 75,185 Updated Jun 16, 2026

yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.

C 163 31 Updated Feb 3, 2022

uxlfoundation / oneDNN

oneAPI Deep Neural Network Library (oneDNN)

C++ 4,008 1,147 Updated Jun 16, 2026

flame / how-to-optimize-gemm

C 2,015 363 Updated Jul 29, 2023

oneapi-src / oneAPI-samples

Samples for Intel® oneAPI Toolkits

C++ 1,153 743 Updated Jun 15, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,381 1,050 Updated Jun 4, 2026

clamchowder / Microbenchmarks

Trying to figure various CPU things out

C 169 28 Updated Jan 31, 2026

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,004 287 Updated May 15, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,706 1,061 Updated Apr 30, 2026

andikleen / pmu-tools

Intel PMU profiling tools

Python 2,230 362 Updated Apr 28, 2026

google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ 5,624 441 Updated Jun 15, 2026

FeCastle / tiptop

Tiptop is a performance monitoring tool for Linux. It provides a dynamic real-time view of the tasks running in the system. tiptop is very similar to the top utility, but most of the information di…

C 157 16 Updated Dec 11, 2022