High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms
TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides Python scripts that convert HuggingFace checkpoints to ONNX. Engine build and end-to-end inference run entirely on the edge platform.
For the supported platforms, models, and precisions, see the Overview. Get started with TensorRT Edge-LLM in under 15 minutes: the Quick Start Guide has complete installation and usage instructions.
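The workflow described above (checkpoint export, engine build, inference) can be pictured with a small sketch. This is only an illustration of the stage ordering, not code from the project: the `Stage` class, the stage names, and the artifact labels are all hypothetical, and the location of the export step is left as `None` because this section does not state where it runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One step of the deployment workflow (hypothetical model, for illustration)."""
    name: str
    consumes: str
    produces: str
    # True = runs on the edge device; None = not stated in this section.
    on_edge_device: Optional[bool]


# Stage names and artifact labels are illustrative, not project identifiers.
WORKFLOW = (
    Stage("export", "HuggingFace checkpoint", "ONNX model", None),
    Stage("engine build", "ONNX model", "TensorRT engine", True),
    Stage("inference", "TensorRT engine", "generated output", True),
)


def check_chain(stages):
    """Return True if each stage consumes what the previous stage produces."""
    return all(a.produces == b.consumes for a, b in zip(stages, stages[1:]))


def edge_stages(stages):
    """Return the names of the stages that run on the edge device."""
    return [s.name for s in stages if s.on_edge_device]


if __name__ == "__main__":
    print("chain is consistent:", check_chain(WORKFLOW))
    print("runs on device:", ", ".join(edge_stages(WORKFLOW)))
```

The point of the sketch is the hand-off between artifacts: the ONNX model is the interface between the Python export scripts and the C++ side, which builds the engine and runs it on the device.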
Complete documentation for installation, usage, and deployment:
- Overview - What TensorRT Edge-LLM is and its key features
- Quick Start Guide - Get started in ~15 minutes
- Installation - Detailed installation instructions
- Supported Models - Complete model compatibility matrix
- Python Export Pipeline - Model export and quantization
- Engine Builder - Building TensorRT engines
- C++ Runtime Overview - Runtime system architecture
- Examples - Working code examples
- Chat Template Format - Chat template configuration
- TensorRT Plugins - Introduction to TensorRT plugins
- Examples Directory - LLM and VLM inference examples
- Tests - Comprehensive test suite for contributors
Automotive
- In-vehicle AI assistants
- Voice-controlled interfaces
- Scene understanding
- Driver assistance systems
Robotics
- Natural language interaction
- Task planning and reasoning
- Visual question answering
- Human-robot collaboration
Industrial IoT
- Equipment monitoring with NLP
- Automated inspection
- Predictive maintenance
- Voice-controlled machinery
Edge Devices
- On-device chatbots
- Offline language processing
- Privacy-preserving AI
- Low-latency inference
Coming soon
Stay tuned for technical deep-dives, optimization guides, and deployment best practices.
Coming soon
Follow our GitHub repository for the latest updates, releases, and announcements.
- Documentation: Developer Guide
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Forums: NVIDIA Developer Forums
We welcome contributions! Please see our Contributing Guidelines for details.