Stars
🚀 Level up your GitHub profile readme with customizable cards including LOC statistics!
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
distributed-embeddings is a library for building large embedding-based models in TensorFlow 2.