Stars
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
distributed-embeddings is a library for building large embedding-based models in TensorFlow 2.
Godot Engine – Multi-platform 2D and 3D game engine
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Compatibility tool for Steam Play based on Wine and additional components
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Optimized primitives for collective multi-GPU communication
Vim-fork focused on extensibility and usability
Asynchronous linting and make framework for Neovim/Vim