moe
5 public repositories match this topic.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Updated Mar 15, 2024 - C++
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Updated Nov 11, 2025 - C++