TensorRT Edge-LLM

High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms

Overview

TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides Python scripts to convert HuggingFace checkpoints to ONNX. Engine build and end-to-end inference then run entirely on the edge platform.

Getting Started

For the supported platforms, models, and precisions, see the Overview. To get started with TensorRT Edge-LLM in under 15 minutes, including complete installation and usage instructions, see the Quick Start Guide.

Documentation

Developer Guide

Complete documentation for installation, usage, and deployment:

Overview - What TensorRT Edge-LLM is and its key features
Quick Start Guide - Get started in ~15 minutes
Installation - Detailed installation instructions
Supported Models - Complete model compatibility matrix
Python Export Pipeline - Model export and quantization
Engine Builder - Building TensorRT engines
C++ Runtime Overview - Runtime system architecture
Examples - Working code examples
Chat Template Format - Chat template configuration
TensorRT Plugins - Introduction to TensorRT plugins

Additional Resources

Examples Directory - LLM and VLM inference examples
Tests - Comprehensive test suite for contributors

Use Cases

🚗 Automotive

In-vehicle AI assistants
Voice-controlled interfaces
Scene understanding
Driver assistance systems

🤖 Robotics

Natural language interaction
Task planning and reasoning
Visual question answering
Human-robot collaboration

🏭 Industrial IoT

Equipment monitoring with NLP
Automated inspection
Predictive maintenance
Voice-controlled machinery

📱 Edge Devices

On-device chatbots
Offline language processing
Privacy-preserving AI
Low-latency inference

Tech Blogs

Coming soon

Stay tuned for technical deep-dives, optimization guides, and deployment best practices.

Latest News

Coming soon

Follow our GitHub repository for the latest updates, releases, and announcements.

Support

Documentation: Developer Guide
Issues: GitHub Issues
Discussions: GitHub Discussions
Forums: NVIDIA Developer Forums

License

Apache License 2.0

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
