jianyuh

Organizations

@ULAFF @facebookresearch @pytorch

45 stars written in C++

An Open Source Machine Learning Framework for Everyone

C++ 192,402 74,979 Updated Nov 11, 2025

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

C++ 38,331 8,080 Updated Jan 30, 2025

Caffe: a fast open framework for deep learning.

C++ 34,720 18,596 Updated Jul 31, 2024

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,857 899 Updated Sep 30, 2025

Android NDK samples with Android Studio

C++ 10,409 4,249 Updated Oct 3, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,457 958 Updated Oct 24, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,758 1,522 Updated Nov 10, 2025

Low-latency machine code generation

C++ 4,325 549 Updated Nov 2, 2025

oneAPI Deep Neural Network Library (oneDNN)

C++ 3,916 1,083 Updated Nov 11, 2025

lmctfy is the open source version of Google’s container stack, which provides Linux application containers.

C++ 3,414 240 Updated Jun 29, 2015

Icarus Verilog

C++ 3,219 573 Updated Nov 11, 2025

A cross platform C99 library to get cpu features at runtime.

C++ 2,561 288 Updated Nov 7, 2025

fast-cpp-csv-parser

C++ 2,307 439 Updated Feb 2, 2025

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,942 149 Updated Nov 11, 2025

Low-precision matrix multiplication

C++ 1,818 455 Updated Jan 29, 2024

Patterns and behaviors for GPU computing

C++ 1,744 283 Updated Jun 26, 2022

C++ associative containers

C++ 1,595 262 Updated Nov 30, 2021

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,474 679 Updated Nov 11, 2025

Collective communications library with various primitives for multi-machine training.

C++ 1,367 338 Updated Oct 21, 2025

Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

C++ 1,117 433 Updated Aug 4, 2019

a software library containing BLAS functions written in OpenCL

C++ 863 241 Updated Aug 2, 2024

A benchmark for low-level CPU micro-architectural features

C++ 748 70 Updated Feb 8, 2022

C, C++ and Python Code for Exercises and Solutions

C++ 530 188 Updated Dec 17, 2019

Perplexity GPU Kernels

C++ 526 69 Updated Nov 7, 2025

Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction

C++ 513 111 Updated May 14, 2019

STXXL: Standard Template Library for Extra Large Data Sets

C++ 495 99 Updated Dec 22, 2023

GAP Benchmark Suite

C++ 373 158 Updated Mar 9, 2025

A C++ implementation of timsort

C++ 312 47 Updated Dec 3, 2024

An OpenCL backend for torch.

C++ 300 26 Updated Nov 16, 2016