Stars
🚀 Level up your GitHub profile readme with customizable cards including LOC statistics!
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
distributed-embeddings is a library for building large embedding-based models in TensorFlow 2.
Godot Engine – Multi-platform 2D and 3D game engine
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Compatibility tool for Steam Play based on Wine and additional components
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
Optimized primitives for collective multi-GPU communication
Vim-fork focused on extensibility and usability
Asynchronous linting and make framework for Neovim/Vim