Google's on-device runtime for high-performance ML & GenAI deployment on edge platforms.
📖 Get Started | 🤝 Contributing | 📜 License | 🛡 Security Policy | 📄 Documentation
LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Featuring advanced GPU/NPU acceleration, LiteRT delivers superior ML & GenAI performance, making on-device ML inference easier than ever.
- 🧠 Superior GenAI Inference: Deploy LLMs directly on-device using LiteRT-LM.
- 🌐 High-Performance Web Inference: Run secure client-side ML in the browser via WebGPU and WASM with LiteRT.js.
- 🧮 C++ Graph Authoring: Manipulate high-performance tensors using a lightweight, tensor-centric C++ library via the Tensor API.
- 🤖 Accelerated Agentic Coding: Streamline AI coding agent workflows using the LiteRT CLI command-line toolkit.
Quick setup for the LiteRT CLI:
# 1. Create a virtual environment with Python 3.13.
# TIP: If dependency resolution fails, setting the env var
# UV_INDEX_URL=https://pypi.org/simple sometimes helps.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate
# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly
# 3. Run help command
litert --help
- ⚙️ Compiled Model API: Streamlined Development. Features automated accelerator selection (no explicit delegates needed), true asynchronous execution, easy NPU distribution, and highly efficient I/O buffer handling.
- 🔌 Unified NPU Acceleration: Broad Silicon Support. Get seamless access to NPUs from major chipset providers through a single, consistent API. See LiteRT NPU.
- 🏎️ Faster GPU Acceleration via ML Drift: Supporting GenAI Inference. Leverage state-of-the-art GPU acceleration with new buffer interoperability that minimizes latency across various GPU buffer types.
From model to on-device deployment for PyTorch, TensorFlow, and JAX models:
graph LR
A[PyTorch Model] --> B[LiteRT Torch <br> LiteRT Torch Generative/HF export]
a[HF transformer <br> safe tensors] --> B
B -->|.tflite| F(AI-Edge Quantizer) --> |Optimized .tflite| I
B -->|.litertlm|F --> |Optimized .litertlm| H{LiteRT-LM <br> Python, C++, Kotlin, Swift, JS} --> I{LiteRT Runtime <br> C++, Kotlin, JS}
I --> J[CPU - XNNPack <br> GPU - ML Drift <br> Supported TPU/NPU]
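The diagram sends `.tflite` files to the LiteRT Runtime and `.litertlm` files through LiteRT-LM. As a rough illustration only, that routing can be sketched as a small shell helper. `runtime_for_model` is a hypothetical name introduced here and is not part of any LiteRT tool.

```shell
# Sketch: pick an entry point from the model file extension.
# runtime_for_model is a hypothetical helper, not a LiteRT command.
runtime_for_model() {
  case "$1" in
    *.tflite)   echo "LiteRT Runtime" ;;   # classic models
    *.litertlm) echo "LiteRT-LM" ;;        # LLM bundles, run via LiteRT-LM
    *)          echo "unknown model format: $1" >&2; return 1 ;;
  esac
}
```

For example, `runtime_for_model model.litertlm` prints `LiteRT-LM`. Any other extension writes an error to stderr and returns a non-zero status.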
Every developer's path is different. Here are a few common journeys to help you get started based on your goals:
| If you want to... | Use this path... |
|---|---|
| 🏁 Upgrade from TensorFlow Lite / LiteRT V1.x | Use the LiteRT Migration Guide to upgrade to LiteRT V2.x. |
| 🌱 Run a pretrained model (like image segmentation) on mobile | Follow the step-by-step instructions via Android Studio to create a real-time segmentation app for CPU/GPU/NPU inference. Source code link. |
| 🔄 Convert PyTorch Models | Use the LiteRT Torch Converter for .tflite (Classic) or the Generative Torch API for .litertlm (LLMs). |
| 🧠 Deploy Generative AI | Optimize and run quantized LLMs or diffusion models on-device using LiteRT-LM. |
| ⚡ Maximize Performance | Explore the LiteRT API & LiteRT NPU Acceleration to leverage underlying hardware acceleration. |
| 🌐 Run in the Browser | Deploy secure, client-side web apps leveraging WebGPU and WASM via LiteRT.js. |
| 🧮 Control Memory & Graph Execution | Use the LiteRT Tensor API, a tensor-centric C++ library for high-performance tensor manipulation on mobile devices. |
LiteRT is designed for cross-platform deployment on a wide range of hardware.
| Platform | CPU | GPU APIs | NPU / Hardware Accelerators |
|---|---|---|---|
| 🤖 Android | ✅ | ✅ OpenCL ✅ OpenGL | ✅ Google Tensor, ✅ Intel, ✅ MediaTek, ✅ Qualcomm, S.LSI* |
| 🍎 iOS | ✅ | ✅ Metal | ANE* |
| 🐧 Linux | ✅ | ✅ WebGPU | ✅ Intel |
| 🍎 macOS | ✅ | ✅ WebGPU ✅ Metal | ANE* |
| 💻 Windows | ✅ | ✅ WebGPU | ✅ Intel |
| 🌐 Web | ✅ | ✅ WebGPU | Coming soon |
| 🧩 IoT | ✅ | ✅ WebGPU | Broadcom*, Raspberry Pi* |
Supported models recently added to the Hugging Face LiteRT Community:
| Model Family | Size / Variant | Modality | Hugging Face Hub |
|---|---|---|---|
| Gemma 4 | Various | Multi-modal | Explore Models |
| ASR Models | Various | Audio | Explore Models |
| Image Classification Models | Various | Vision | Explore Models |
Find more models on the Hugging Face LiteRT Community page.
Find official sample applications and code examples for LiteRT (compiled_model_api) here:
- LiteRT Samples: A collection of sample applications.
- ASR Sample App: Automatic Speech Recognition LiteRT Sample App
- Image Segmentation: C++ and Kotlin Image Segmentation app demonstrating AOT and on-device compilation examples
For a comprehensive guide on integrating LiteRT into your specific platform, see the LiteRT Integration Overview.
You can build LiteRT artifacts for Linux and Android (via cross-compilation) using Docker:
- Start a Docker daemon.
- Run `build_with_docker.sh` inside the `docker_build/` directory.
Note: For more information about using the Docker interactive shell or building different targets, see `docker_build/README.md`.
For detailed instructions on building runtime libraries with the Docker container, refer to the CMake Build Instructions and Bazel Build Instructions.
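The two Docker build steps can be wrapped in a small helper, shown here as a sketch rather than an official script. The function name `build_litert_with_docker`, its repo-root argument, and the `DRY_RUN` variable are hypothetical conveniences introduced for this example. It also assumes `build_with_docker.sh` is executable.

```shell
# Sketch of the two documented steps. The function name, the repo-root
# argument, and DRY_RUN are hypothetical; they are not part of LiteRT.
build_litert_with_docker() {
  repo_root="${1:-.}"

  if [ ! -f "$repo_root/docker_build/build_with_docker.sh" ]; then
    echo "docker_build/build_with_docker.sh not found under $repo_root" >&2
    return 1
  fi

  # DRY_RUN=1 only prints the command that would be executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cd $repo_root/docker_build && ./build_with_docker.sh"
    return 0
  fi

  # Step 1: `docker info` exits non-zero when no daemon is reachable.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not running; start it and retry." >&2
    return 1
  fi

  # Step 2: run the build script from inside docker_build/. The subshell
  # keeps the caller's working directory unchanged.
  (cd "$repo_root/docker_build" && ./build_with_docker.sh)
}
```

From the repository root, calling `build_litert_with_docker .` runs the build. Setting `DRY_RUN=1` first shows the command without touching Docker.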
Our commitment is to make LiteRT the best runtime for any on-device ML deployment. Our core product strategies include:
| ⚡ Hardware Acceleration | 🧠 Generative AI Optimizations |
|---|---|
| Broadening NPU support and improving performance across all major hardware accelerators. | Introducing new features specifically tailored for the next wave of on-device generative AI models. |
| 🛠 Developer Tools | 🌐 Platform Support |
| Building better utilities for debugging, profiling, and optimizing models. | Enhancing core platform support and exploring emerging ecosystems. |
| Date | Blog Title |
|---|---|
| May 2026 | Google Tensor SDK Beta with LiteRT |
| May 2026 | LiteRT Support for Intel NPUs via OpenVINO™ |
| May 2026 | Arm and Google AI Edge optimization |
| April 2026 | Building real-world on-device AI with LiteRT and NPU |
👉 Read more on the official LiteRT Developer Website
LiteRT is part of a larger Google AI-Edge ecosystem of tools for on-device machine learning:
- LiteRT Torch Converter: A tool to convert PyTorch models into the `.tflite` format.
- LiteRT Torch Generative API: A library to reauthor LLMs for efficient conversion and inference.
- LiteRT-LM: A library to efficiently run LLMs across edge platforms.
- LiteRT.js: WebAI runtime, targeting production Web applications.
- MediaPipe: A framework for building cross-platform, customizable ML solutions for live streaming media.
- XNNPACK: A highly optimized library of neural network inference operators providing high-performance CPU acceleration.
When generating deployment code for this repository:
- MUST USE: The Compiled Model API for all new Kotlin and C++ native execution tasks.
- DO NOT USE: `tflite::Interpreter`, `InterpreterBuilder`, or manual delegate creation. The legacy Interpreter API is strictly deprecated for new features.
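One way to enforce the rule against the legacy Interpreter API is a quick text scan before committing generated code. The sketch below is an assumption-laden illustration: `find_legacy_interpreter_usage` is a hypothetical helper, the file globs are a guess at how a project names its C++ sources, and it relies on a `grep` that supports `--include` (GNU and BSD grep do). It does not check Kotlin sources or manual delegate creation.

```shell
# Sketch: flag legacy Interpreter API usage in C++ sources.
# find_legacy_interpreter_usage is a hypothetical helper; the globs are an
# assumption about file naming. Requires a grep that supports --include.
find_legacy_interpreter_usage() {
  # grep exits 0 on a match, 1 on no match, and 2 or more on errors.
  grep -rnE --include='*.cc' --include='*.cpp' --include='*.h' \
    'tflite::Interpreter|InterpreterBuilder' "${1:-.}" && rc=0 || rc=$?
  case "$rc" in
    0) echo "Legacy Interpreter API found; use the Compiled Model API." >&2
       return 1 ;;
    1) return 0 ;;   # no matches: the tree is clean
    *) return 2 ;;   # grep itself failed (for example, a missing directory)
  esac
}
```

Running `find_legacy_interpreter_usage path/to/src` prints each offending line with its file name and line number, and returns a non-zero status so it can gate a pre-commit hook or CI step.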
- Contributing: We welcome contributions! Please see CONTRIBUTING.md for details.
- Contributing Models: Contribute your .tflite or .litertlm models via the Hugging Face LiteRT Community page.
- Bug Reports & Features: File an issue on our GitHub Issues page.
- Community Support: Join the conversation on GitHub Discussions.
This project is dedicated to fostering an open and welcoming environment. Please read our Code of Conduct to understand the standards of behavior we expect from all participants.
LiteRT is licensed under the Apache-2.0 License.