🎯
Focusing

Organizations

@TeaWhen @pkuapp @QSCTech @HaloWordApp


Flexible concrete Error type built on std::error::Error

Rust 6,560 213 Updated Mar 24, 2026

Rust port of Google's SwissTable hash map

Rust 2,943 352 Updated Jun 6, 2026

Tile primitives for speedy kernels

Cuda 3,435 295 Updated Jun 15, 2026

LLM101n: Let's build a Storyteller

37,326 2,051 Updated Aug 1, 2024

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,061 6,546 Updated Jun 16, 2026

Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.

Python 516 72 Updated Nov 26, 2024

Mamba SSM architecture

Python 18,443 1,755 Updated Jun 15, 2026

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,888 252 Updated Jun 15, 2026

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 12,047 1,361 Updated Jun 9, 2026

Transformer related optimization, including BERT, GPT

C++ 6,422 935 Updated Mar 27, 2024

Source code for Twitter's Recommendation Algorithm

Python 10,579 2,236 Updated Jul 10, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,495 776 Updated Jun 15, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 161,619 33,516 Updated Jun 16, 2026

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 33,864 7,058 Updated Jun 16, 2026

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,720 388 Updated Apr 9, 2026

Pytorch domain library for recommendation systems

Python 2,565 654 Updated Jun 16, 2026

LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

Python 270 46 Updated Feb 21, 2022

A high performance and generic framework for distributed DNN training

Python 3,721 493 Updated Oct 3, 2023

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 624 92 Updated Sep 11, 2024

Pluggable in-process caching engine to build and scale high performance services

C++ 1,558 319 Updated Jun 15, 2026

functorch is JAX-like composable function transforms for PyTorch.

Jupyter Notebook 1,436 107 Updated Aug 21, 2025

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,845 694 Updated Jun 12, 2026

torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

C++ 179 36 Updated Dec 16, 2025

Development repository for the Triton language and compiler

MLIR 19,450 2,938 Updated Jun 16, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,901 1,908 Updated Jun 16, 2026

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,078 127 Updated Apr 17, 2024

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda 394 34 Updated Mar 18, 2026

Python 251 28 Updated Jul 25, 2024

Logstash - transport and process your logs, events, or other data

Java 14,875 3,502 Updated Jun 15, 2026

High performance model preprocessing library on PyTorch

Python 642 80 Updated Mar 29, 2024