jianyuh

Jianyu Huang jianyuh

Beat the speed of light.

292 followers · 39 following

Meta
http://jianyuhuang.com/

Achievements

x3 x2

Lists (1)

MLSys_Learn

4 repositories

Stars

45 stars written in C++

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ 192,402 74,979 Updated Nov 11, 2025

google / leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

C++ 38,331 8,080 Updated Jan 30, 2025

BVLC / caffe

Caffe: a fast open framework for deep learning.

C++ 34,720 18,596 Updated Jul 31, 2024

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,857 899 Updated Sep 30, 2025

android / ndk-samples

Android NDK samples with Android Studio

C++ 10,409 4,249 Updated Oct 3, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,457 958 Updated Oct 24, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,758 1,522 Updated Nov 10, 2025

asmjit / asmjit

Low-latency machine code generation

C++ 4,325 549 Updated Nov 2, 2025

uxlfoundation / oneDNN

oneAPI Deep Neural Network Library (oneDNN)

C++ 3,916 1,083 Updated Nov 11, 2025

google / lmctfy

lmctfy is the open source version of Google’s container stack, which provides Linux application containers.

C++ 3,414 240 Updated Jun 29, 2015

steveicarus / iverilog

Icarus Verilog

C++ 3,219 573 Updated Nov 11, 2025

google / cpu_features

A cross platform C99 library to get cpu features at runtime.

C++ 2,561 288 Updated Nov 7, 2025

ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser

C++ 2,307 439 Updated Feb 2, 2025

tqchen / tinyflow

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,942 149 Updated Nov 11, 2025

google / gemmlowp

Low-precision matrix multiplication

C++ 1,818 455 Updated Jan 29, 2024

moderngpu / moderngpu

Patterns and behaviors for GPU computing

C++ 1,744 283 Updated Jun 26, 2022

sparsehash / sparsehash

C++ associative containers

C++ 1,595 262 Updated Nov 30, 2021

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,474 679 Updated Nov 11, 2025

pytorch / gloo

Collective communications library with various primitives for multi-machine training.

C++ 1,367 338 Updated Oct 21, 2025

dmlc / mshadow

Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

C++ 1,117 433 Updated Aug 4, 2019

clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL

C++ 863 241 Updated Aug 2, 2024

travisdowns / uarch-bench

A benchmark for low-level CPU micro-architectural features

C++ 748 70 Updated Feb 8, 2022

HandsOnOpenCL / Exercises-Solutions

C, C++ and Python Code for Exercises and Solutions

C++ 530 188 Updated Dec 17, 2019

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 526 69 Updated Nov 7, 2025

elemental / Elemental

Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction

C++ 513 111 Updated May 14, 2019

stxxl / stxxl

STXXL: Standard Template Library for Extra Large Data Sets

C++ 495 99 Updated Dec 22, 2023

sbeamer / gapbs

GAP Benchmark Suite

C++ 373 158 Updated Mar 9, 2025

timsort / cpp-TimSort

A C++ implementation of timsort

C++ 312 47 Updated Dec 3, 2024

hughperkins / cltorch

An OpenCL backend for torch.

C++ 300 26 Updated Nov 16, 2016