Stars
Flexible concrete Error type built on std::error::Error
Tile primitives for speedy kernels
SGLang is a high-performance serving framework for large language models and multimodal models.
Code for the NeurIPS 2024 paper: QuaRot, an end-to-end 4-bit inference of large language models.
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
Transformer-related optimization, including BERT, GPT
Source code for Twitter's Recommendation Algorithm
Hackable and optimized Transformers building blocks, supporting a composable construction.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
PyTorch domain library for recommendation systems
LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.
A high-performance, generic framework for distributed DNN training
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
Pluggable in-process caching engine to build and scale high-performance services
functorch is JAX-like composable function transforms for PyTorch.
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.
Development repository for the Triton language and compiler
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
Logstash - transport and process your logs, events, or other data
High-performance model preprocessing library on PyTorch