moe
5 public repositories match this topic.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Updated Mar 15, 2024 - C++
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Updated Nov 11, 2025 - C++