Stars: 5 results for source starred repositories written in C++
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
FlashMLA: Efficient Multi-head Latent Attention Kernels
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.