google-ai-edge/LiteRT

LiteRT

Google's on-device runtime for high-performance ML & GenAI deployment on edge platforms.

📖 Get Started | 🤝 Contributing | 📜 License | 🛡 Security Policy | 📄 Documentation


🛠 Build Status

  • Nightly Builds: Linux Nightly Wheel, macOS Nightly Wheel, Windows Nightly Wheel
  • Continuous Builds: macOS arm64, Linux x86_64, Windows x86_64
  • Other Builds: CMake Android Linux x86_64

📖 LiteRT

LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Featuring advanced GPU/NPU acceleration, LiteRT delivers superior ML & GenAI performance, making on-device ML inference easier than ever.

🚀 What's New

  • 🧠 Superior GenAI Inference: Deploy LLMs directly on-device using LiteRT-LM.
  • 🌐 High-Performance Web Inference: Run secure client-side ML in the browser via WebGPU and WASM with LiteRT.js.
  • 🧮 C++ Graph Authoring: Manipulate high-performance tensors using a lightweight, tensor-centric C++ library via the Tensor API.
  • 🤖 Accelerated Agentic Coding: Streamline AI coding agent workflows using the LiteRT CLI command-line toolkit.

Quick setup for the LiteRT CLI:

# 1. Create a virtual environment with Python 3.13.
# TIP: If dependency resolution fails, setting the UV_INDEX_URL environment
# variable to https://pypi.org/simple sometimes helps.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate

# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly

# 3. Run help command
litert --help
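After the install, it can be useful to confirm that the `litert` executable is actually reachable from the active environment. The following is a minimal sketch using only the Python standard library; the helper names `find_litert_cli` and `cli_help_works` are illustrative and not part of LiteRT.

```python
import shutil
import subprocess
from typing import Optional


def find_litert_cli(command: str = "litert") -> Optional[str]:
    """Return the absolute path of the CLI executable, or None if it is not on PATH."""
    return shutil.which(command)


def cli_help_works(executable: str, timeout_s: float = 30.0) -> bool:
    """Run `<executable> --help` and report whether it exited with status 0."""
    result = subprocess.run(
        [executable, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # inspect the return code ourselves instead of raising
    )
    return result.returncode == 0


if __name__ == "__main__":
    path = find_litert_cli()
    if path is None:
        print("litert not found; is the virtual environment active?")
    else:
        print(f"litert found at {path}; --help ok: {cli_help_works(path)}")
```

If the first check fails, the usual cause is that `source .venv/bin/activate` was not run in the current shell.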

💎 Key Features of LiteRT V2

  • ⚙️ Compiled Model API: Streamlined Development. Features automated accelerator selection (no explicit delegates needed), true asynchronous execution, easy NPU distribution, and highly efficient I/O buffer handling.

  • 🔌 Unified NPU Acceleration: Broad Silicon Support. Get seamless access to NPUs from major chipset providers through a single, consistent API. See LiteRT NPU.

  • 🏎️ Faster GPU Acceleration via ML Drift: Supporting Gen-AI Inference. Leverage state-of-the-art GPU acceleration with new buffer interoperability that minimizes latency across various GPU buffer types.
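To make the idea of automated accelerator selection concrete, here is a small conceptual sketch of a preference-ordered fallback. It is not the LiteRT API: the function name, the backend labels, and the NPU-then-GPU-then-CPU order are all assumptions chosen for illustration, and this README does not describe the policy LiteRT actually applies.

```python
from typing import Iterable, Sequence

# Hypothetical preference order, most specialized first. This is an assumption
# for illustration; the README does not document LiteRT's actual policy.
DEFAULT_PREFERENCE: Sequence[str] = ("npu", "gpu", "cpu")


def pick_accelerator(available: Iterable[str],
                     preference: Sequence[str] = DEFAULT_PREFERENCE) -> str:
    """Return the first preferred backend that is present in `available`."""
    present = {name.lower() for name in available}
    for candidate in preference:
        if candidate in present:
            return candidate
    raise ValueError("no supported accelerator available")


if __name__ == "__main__":
    # A device that exposes only a CPU and a GPU falls back to the GPU.
    print(pick_accelerator(["CPU", "GPU"]))  # prints "gpu"
```

The point of the sketch is only that the caller states what it wants to run and the runtime resolves the backend, instead of the caller constructing a delegate by hand.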


⚙️ LiteRT Runtime and Tools

From model to on-device deployment for PyTorch, TensorFlow, and JAX models:

graph LR
    A[PyTorch Model] --> B[LiteRT Torch <br> LiteRT Torch Generative/HF export]
    a[HF transformer <br> safe tensors] --> B
    B -->|.tflite| F(AI-Edge Quantizer) --> |Optimized .tflite| I
    B -->|.litertlm| F --> |Optimized .litertlm| H{LiteRT-LM <br> Python, C++, Kotlin, Swift, JS} --> I{LiteRT Runtime <br> C++, Kotlin, JS}
    I --> J[CPU - XNNPack <br> GPU - ML Drift <br> Supported TPU/NPU]
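The diagram distinguishes two artifact formats: `.tflite` files go to the LiteRT Runtime, while `.litertlm` files are loaded through LiteRT-LM, which in turn sits on the LiteRT Runtime. A minimal sketch of that routing by file extension is shown below; the function name and the returned labels are illustrative only and do not correspond to any LiteRT API.

```python
from pathlib import PurePath

# Mapping read off the pipeline diagram: which component loads each model
# file format. The names used here are illustrative labels, not API symbols.
_ENTRY_POINTS = {
    ".tflite": "LiteRT Runtime",
    ".litertlm": "LiteRT-LM",
}


def entry_point_for(model_path: str) -> str:
    """Return the component that loads the given model file, based on its suffix."""
    suffix = PurePath(model_path).suffix.lower()
    try:
        return _ENTRY_POINTS[suffix]
    except KeyError:
        raise ValueError(f"unsupported model format: {suffix or '(none)'}") from None
```

For example, `entry_point_for("gemma.litertlm")` returns `"LiteRT-LM"`, and a path ending in any other extension raises `ValueError`.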

🗺 Choose Your Adventure

Every developer's path is different. Here are a few common journeys to help you get started based on your goals:

| If you want to... | Use this path... |
| --- | --- |
| 🏁 Upgrade from TensorFlow Lite / LiteRT V1.x | Use the LiteRT Migration Guide to upgrade to LiteRT V2.x. |
| 🌱 Run a pretrained model (like image segmentation) on mobile | Follow the step-by-step instructions in Android Studio to create a real-time segmentation app for CPU/GPU/NPU inference. Source code link. |
| 🔄 Convert PyTorch Models | Use the LiteRT Torch Converter for .tflite (Classic) or the Generative Torch API for .litertlm (LLMs). |
| 🧠 Deploy Generative AI | Optimize and run quantized LLMs or diffusion models on-device using LiteRT-LM. |
| ⚡ Maximize Performance | Explore the LiteRT API & LiteRT NPU Acceleration to leverage underlying hardware acceleration. |
| 🌐 Run in the Browser | Deploy secure, client-side web apps leveraging WebGPU and WASM via LiteRT.js. |
| 🧮 Control Memory & Graph Execution | Use the LiteRT Tensor API, a tensor-centric C++ library for high-performance tensor manipulation on mobile devices. |

💻 Platforms Supported

LiteRT is designed for cross-platform deployment on a wide range of hardware.

Platform CPU GPU APIs NPU / Hardware Accelerators
🤖 Android ✅ OpenCL, ✅ OpenGL ✅ Google Tensor, ✅ Intel, ✅ MediaTek, ✅ Qualcomm, S.LSI*
🍎 iOS ✅ Metal ANE*
🐧 Linux ✅ WebGPU ✅ Intel
🍎 macOS ✅ WebGPU, ✅ Metal ANE*
💻 Windows ✅ WebGPU ✅ Intel
🌐 Web ✅ WebGPU Coming soon
🧩 IoT ✅ WebGPU Broadcom*, Raspberry Pi*

📊 New Models

Models recently added to the Hugging Face LiteRT Community:

| Model Family | Size / Variant | Modality | Hugging Face Hub |
| --- | --- | --- | --- |
| Gemma 4 | Various | Multi-modal | Explore Models |
| ASR Models | Various | Audio | Explore Models |
| Image Classification Models | Various | Vision | Explore Models |

Find more models at the Hugging Face LiteRT Community Page


🔗 Sample Apps & Colabs

Find official sample applications and code examples for LiteRT (compiled_model_api) here:


🏁 Installation

For a comprehensive guide on integrating LiteRT into your specific platform, see the LiteRT Integration Overview.

🔨 Building from Source

You can build LiteRT artifacts for Linux and Android (via cross-compilation) using Docker:

  1. Start a Docker daemon.
  2. Run build_with_docker.sh inside the docker_build/ directory.

Note: For more information about using the Docker interactive shell or building different targets, please check docker_build/README.md.
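The two steps above can be scripted. The sketch below wraps them in Python using only the standard library. It assumes the script is invoked through `bash` with no arguments and that `repo_root` points at a LiteRT checkout; both are assumptions, so check docker_build/README.md for the options the script actually accepts. The helper names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def docker_build_command(repo_root: Path) -> Tuple[Path, List[str]]:
    """Return (working directory, argv) for running the Docker build script."""
    # Assumption: the script takes no arguments and is run via bash.
    return repo_root / "docker_build", ["bash", "build_with_docker.sh"]


def run_docker_build(repo_root: Path) -> None:
    """Run build_with_docker.sh from inside docker_build/.

    Requires a running Docker daemon (step 1 above).
    """
    workdir, argv = docker_build_command(repo_root)
    script = workdir / argv[-1]
    if not script.is_file():
        raise FileNotFoundError(f"expected build script at {script}")
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(argv, cwd=workdir, check=True)
```

Calling `run_docker_build(Path("path/to/LiteRT"))` fails fast with `FileNotFoundError` if the checkout path is wrong, before Docker is involved at all.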

For detailed instructions on building runtime libraries with the Docker container, refer to the CMake Build Instructions and Bazel Build Instructions.

🚀 Roadmap

Our commitment is to make LiteRT the best runtime for any on-device ML deployment. Our core product strategies include:

  • ⚡ Hardware Acceleration: Broadening NPU support and improving performance across all major hardware accelerators.
  • 🧠 Generative AI Optimizations: Introducing new features specifically tailored for the next wave of on-device generative AI models.
  • 🛠 Developer Tools: Building better utilities for debugging, profiling, and optimizing models.
  • 🌐 Platform Support: Enhancing core platform support and exploring emerging ecosystems.

📰 Latest from the LiteRT Team & Partners

| Date | Blog Title |
| --- | --- |
| May 2026 | Google Tensor SDK Beta with LiteRT |
| May 2026 | LiteRT Support for Intel NPUs via OpenVINO™ |
| May 2026 | Arm and Google AI Edge optimization |
| April 2026 | Building real-world on-device AI with LiteRT and NPU |

👉 Read more on the official LiteRT Developer Website


🔗 Related Products

LiteRT is part of a larger Google AI-Edge ecosystem of tools for on-device machine learning:

  • LiteRT Torch Converter: A tool to convert PyTorch models into the .tflite format.
  • LiteRT Torch Generative API: A library to reauthor LLMs for efficient conversion and inference.
  • LiteRT-LM: A library to efficiently run LLMs across edge platforms.
  • LiteRT.js: WebAI runtime, targeting production Web applications.
  • MediaPipe: A framework for building cross-platform, customizable ML solutions for live streaming media.
  • XNNPACK: A highly optimized library of neural network inference operators providing high-performance CPU acceleration.

🛑 AI Coding Directives (Strict Constraints)

When generating deployment code for this repository:

  • MUST USE: The Compiled Model API for all new Kotlin and C++ native execution tasks.
  • DO NOT USE: tflite::Interpreter, InterpreterBuilder, or manual delegate creation. The legacy Interpreter API is strictly deprecated for new features.
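One way to enforce the directive above in review or CI is a simple text scan for the legacy identifiers it names. The following is a rough sketch, not an official tool: the regular expression and the function name are assumptions, and a plain text scan will also flag matches inside comments or strings.

```python
import re
from typing import List, Tuple

# Identifiers taken from the directive above; the pattern itself is an
# illustrative assumption and does not parse C++ or Kotlin.
LEGACY_PATTERN = re.compile(r"\btflite::Interpreter\b|\bInterpreterBuilder\b")


def find_legacy_api_uses(source: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, matched identifier) for each legacy API use."""
    hits: List[Tuple[int, str]] = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits
```

An empty result means no legacy Interpreter references were found in the scanned text; any hit points at a line that should be migrated to the Compiled Model API.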

🙌 Contributing & Getting Help

  • Contributing: We welcome contributions! Please see CONTRIBUTING.md for details.
  • Contributing Models: Contribute your .tflite or .litertlm models via the LiteRT Hugging Face page, HF LiteRT Community.
  • Bug Reports & Features: File an issue on our GitHub Issues page.
  • Community Support: Join the conversation on GitHub Discussions.

❤️ Code of Conduct

This project is dedicated to fostering an open and welcoming environment. Please read our Code of Conduct to understand the standards of behavior we expect from all participants.

📜 License

LiteRT is licensed under the Apache-2.0 License.

About

LiteRT, the successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via efficient conversion, runtime, and optimization.
