Zheng Yan zyan0

🎯

Focusing

206 followers · 188 following

Lists (1)

🚀 My stack

1 repository

Stars

dtolnay / anyhow

Flexible concrete Error type built on std::error::Error

Rust 6,560 213 Updated Mar 24, 2026

rust-lang / hashbrown

Rust port of Google's SwissTable hash map

Rust 2,943 352 Updated Jun 6, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,435 295 Updated Jun 15, 2026

karpathy / LLM101n

LLM101n: Let's build a Storyteller

37,326 2,051 Updated Aug 1, 2024

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,061 6,546 Updated Jun 16, 2026

spcl / QuaRot

Code for the NeurIPS 2024 paper: QuaRot, an end-to-end 4-bit inference of large language models.

Python 516 72 Updated Nov 26, 2024

state-spaces / mamba

Mamba SSM architecture

Python 18,443 1,755 Updated Jun 15, 2026

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,888 252 Updated Jun 15, 2026

Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 12,047 1,361 Updated Jun 9, 2026

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 6,422 935 Updated Mar 27, 2024

twitter / the-algorithm-ml

Source code for Twitter's Recommendation Algorithm

Python 10,579 2,236 Updated Jul 10, 2024

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,495 776 Updated Jun 15, 2026

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 161,619 33,516 Updated Jun 16, 2026

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 33,864 7,058 Updated Jun 16, 2026

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,720 388 Updated Apr 9, 2026

meta-pytorch / torchrec

PyTorch domain library for recommendation systems

Python 2,565 654 Updated Jun 16, 2026

LibRerank-Community / LibRerank

LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

Python 270 46 Updated Feb 21, 2022

bytedance / byteps

A high performance and generic framework for distributed DNN training

Python 3,721 493 Updated Oct 3, 2023

NVIDIA / nvcomp

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 624 92 Updated Sep 11, 2024

facebook / CacheLib

Pluggable in-process caching engine to build and scale high performance services

C++ 1,558 319 Updated Jun 15, 2026

pytorch / functorch

functorch is JAX-like composable function transforms for PyTorch.

Jupyter Notebook 1,436 107 Updated Aug 21, 2025

llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,845 694 Updated Jun 12, 2026

meta-pytorch / multipy

torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

C++ 179 36 Updated Dec 16, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,450 2,938 Updated Jun 16, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,901 1,908 Updated Jun 16, 2026

pytorch / torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,078 127 Updated Apr 17, 2024

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda 394 34 Updated Mar 18, 2026

microsoft / varuna

Python 251 28 Updated Jul 25, 2024

elastic / logstash

Logstash - transport and process your logs, events, or other data

Java 14,875 3,502 Updated Jun 15, 2026

pytorch / torcharrow

High performance model preprocessing library on PyTorch

Python 642 80 Updated Mar 29, 2024