
English | 简体中文 | 日本語


LightGlue ONNX

Open Neural Network Exchange (ONNX) compatible implementation of LightGlue: Local Feature Matching at Light Speed. The ONNX model format allows for interoperability across different platforms with support for multiple execution providers, and removes Python-specific dependencies such as PyTorch. Supports TensorRT and OpenVINO. Detailed write-up.

What's New: FP8 Quantization Workflow. Read more in this blog post.

⏱️ Inference Time Comparison

[Figure: LightGlue-ONNX latency comparison]

Changelog
  • 19 January 2026: Add FP8 quantization workflow (ModelOpt Q/DQ export and TensorRT usage notes).
  • 09 January 2026: Refurbish the CLI UX with modern uv, streamline the lightglue-onnx workflow, and remove deprecated stacks while refreshing dependencies and TensorRT/shape-inference guidance.
  • 17 July 2024: End-to-end parallel dynamic batch size support. Revamp script UX. Add blog post.
  • 02 November 2023: Introduce TopK-trick to optimize out ArgMax for about 30% speedup.
  • 27 October 2023: LightGlue-ONNX added to Kornia!
  • 04 October 2023: Fused LightGlue ONNX models with support for FlashAttention-2 via onnxruntime>=1.16.0, up to 80% faster inference on long sequence lengths (number of keypoints).
  • 04 October 2023: Multihead-attention fusion optimization.
  • 19 July 2023: Add support for TensorRT.
  • 13 July 2023: Add support for Flash Attention.
  • 11 July 2023: Add support for mixed precision.
  • 04 July 2023: Add inference time comparisons.
  • 01 July 2023: Add support for extractor max_num_keypoints.
  • 30 June 2023: Add support for DISK extractor.
  • 28 June 2023: Add end-to-end SuperPoint+LightGlue export & inference pipeline.

⭐ ONNX Export & Inference

We provide a Typer-based CLI, lightglue-onnx, to easily export LightGlue to ONNX and run inference with ONNX Runtime. If you would like to try out inference right away, you can download already-exported ONNX models here.
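
If you prefer not to use the CLI, a downloaded model can also be loaded directly with ONNX Runtime in Python. A minimal sketch (the file name matches the export examples below; input and output names are inspected rather than assumed):

import onnxruntime as ort

session = ort.InferenceSession(
    "weights/superpoint_lightglue_pipeline.onnx",
    providers=["CPUExecutionProvider"],
)

# Inspect the exported pipeline's I/O instead of hard-coding tensor names.
print([i.name for i in session.get_inputs()])
print([o.name for o in session.get_outputs()])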

📦 Installation (uv)

Inference-only (default):

uv sync

Export support (adds PyTorch + ONNX):

uv sync --group export

TensorRT CLI support:

uv sync --group trt
$ uv run lightglue-onnx --help

Usage: lightglue-onnx [OPTIONS] COMMAND [ARGS]...

LightGlue Dynamo CLI

╭─ Commands ───────────────────────────────────────╮
│ export   Export LightGlue to ONNX.               │
│ infer    Run inference for LightGlue ONNX model. │
│ trtexec  Run pure TensorRT inference using       │
│          Polygraphy.                             │
╰──────────────────────────────────────────────────╯

Pass --help to see the available options for each command. The CLI will export the full extractor-matcher pipeline so that you don't have to worry about orchestrating intermediate steps. By default, inference uses CUDA when available and falls back to CPU if the requested provider cannot be loaded.
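
This fallback is the standard ONNX Runtime pattern of listing execution providers in priority order; a rough sketch of the equivalent session setup (not the CLI's exact code):

import onnxruntime as ort

# CUDA first, CPU as fallback; ONNX Runtime uses the first provider it can load.
session = ort.InferenceSession(
    "weights/superpoint_lightglue_pipeline.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually enabled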

GPU Prerequisites

The ONNX Runtime CUDA and TensorRT execution providers require compatible CUDA and cuDNN versions for your platform. If you encounter provider loading errors, confirm your CUDA/cuDNN setup against the ONNX Runtime CUDA provider documentation. If you install CUDA/TensorRT runtime libraries via PyPI (e.g. onnxruntime-gpu[cuda,cudnn] and tensorrt), you may need to add the venv paths to LD_LIBRARY_PATH so Polygraphy and the TensorRT EP can find libcudart.so and libnvinfer.so:

export LD_LIBRARY_PATH="$PWD/.venv/lib/python3.12/site-packages/tensorrt_libs:$PWD/.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:${LD_LIBRARY_PATH:-}"
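
A quick sanity check of which execution providers your installed onnxruntime build supports (missing CUDA/cuDNN/TensorRT libraries still only surface as errors when a session is created with that provider):

import onnxruntime as ort

# Lists the providers this onnxruntime build was compiled with, e.g.
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'].
print(ort.get_available_providers())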

📖 Example Commands

🔥 ONNX Export
uv run lightglue-onnx export superpoint \
  --num-keypoints 1024 \
  -b 2 -h 1024 -w 1024 \
  -o weights/superpoint_lightglue_pipeline.onnx
🧰 Legacy Export Fallback
uv run lightglue-onnx export superpoint \
  --num-keypoints 1024 \
  -b 2 -h 1024 -w 1024 \
  --legacy-export \
  -o weights/superpoint_lightglue_pipeline.onnx
⚡ ONNX Runtime Inference (CUDA)
uv run lightglue-onnx infer \
  weights/superpoint_lightglue_pipeline.onnx \
  assets/sacre_coeur1.jpg assets/sacre_coeur2.jpg \
  superpoint \
  -h 1024 -w 1024 \
  -d cuda
🚀 ONNX Runtime Inference (TensorRT)
uv run lightglue-onnx infer \
  weights/superpoint_lightglue_pipeline.trt.onnx \
  assets/sacre_coeur1.jpg assets/sacre_coeur2.jpg \
  superpoint \
  -h 1024 -w 1024 \
  -d tensorrt --fp16
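
If you drive the TensorRT execution provider yourself rather than through the CLI, ONNX Runtime's documented TensorRT EP options can enable FP16 and cache the built engine so later runs skip the (slow) engine build. A hedged sketch using standard EP option names, not flags of this CLI:

import onnxruntime as ort

providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_fp16_enable": True,          # mirrors --fp16 above
            "trt_engine_cache_enable": True,  # reuse the built engine on later runs
            "trt_engine_cache_path": "trt_cache",
        },
    ),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession(
    "weights/superpoint_lightglue_pipeline.trt.onnx", providers=providers
)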
🧩 TensorRT Inference
uv run lightglue-onnx trtexec \
  weights/superpoint_lightglue_pipeline.trt.onnx \
  assets/sacre_coeur1.jpg assets/sacre_coeur2.jpg \
  superpoint \
  -h 1024 -w 1024 \
  --fp16
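
The trtexec command runs inference through Polygraphy. For orientation, a minimal Polygraphy-only sketch that builds an FP16 engine from the exported ONNX model and inspects its inputs; the CLI may configure optimization profiles and precision constraints differently:

from polygraphy.backend.trt import (
    CreateConfig,
    TrtRunner,
    engine_from_network,
    network_from_onnx_path,
)

# Build an FP16 TensorRT engine straight from the ONNX file.
# Models exported with dynamic shapes may additionally need an optimization profile.
engine = engine_from_network(
    network_from_onnx_path("weights/superpoint_lightglue_pipeline.trt.onnx"),
    config=CreateConfig(fp16=True),
)

with TrtRunner(engine) as runner:
    # Inspect the expected input names/shapes before wiring up real image data.
    print(runner.get_input_metadata())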
🧪 Quantization (FP8 Q/DQ for TensorRT)
# 1) Export a static-shape ONNX model
uv run lightglue-onnx export superpoint \
  --num-keypoints 1024 \
  -b 2 -h 1024 -w 1024 \
  -o weights/superpoint_lightglue_pipeline.static.onnx

# 2) Quantize to FP8 (DQ-only graph)
uv run lightglue_dynamo/scripts/quantize.py \
  --input weights/superpoint_lightglue_pipeline.static.onnx \
  --output weights/superpoint_lightglue_pipeline.static.fp8.onnx \
  --extractor superpoint \
  --height 1024 --width 1024 \
  --quantize-mode fp8 \
  --dq-only \
  --simplify

# 3) Run TensorRT (explicit quantized model)
uv run lightglue-onnx trtexec \
  weights/superpoint_lightglue_pipeline.static.fp8.onnx \
  assets/sacre_coeur1.jpg assets/sacre_coeur2.jpg \
  superpoint \
  -h 1024 -w 1024 \
  --precision-constraints prefer --fp16
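
Before handing the quantized model to TensorRT, it can be worth confirming that the graph really contains DequantizeLinear nodes (and QuantizeLinear nodes too, if --dq-only was not used). A small check with the onnx package:

from collections import Counter

import onnx

model = onnx.load("weights/superpoint_lightglue_pipeline.static.fp8.onnx")
ops = Counter(node.op_type for node in model.graph.node)
print(ops["QuantizeLinear"], ops["DequantizeLinear"])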

🟣 ONNX Runtime Inference (OpenVINO)
uv run lightglue-onnx infer \
  weights/superpoint_lightglue_pipeline.onnx \
  assets/sacre_coeur1.jpg assets/sacre_coeur2.jpg \
  superpoint \
  -h 512 -w 512 \
  -d openvino
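
The OpenVINO execution provider can also be requested directly from ONNX Runtime; this requires an OpenVINO-enabled build such as the onnxruntime-openvino package. A minimal sketch:

import onnxruntime as ort

session = ort.InferenceSession(
    "weights/superpoint_lightglue_pipeline.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # confirms whether OpenVINO was actually loaded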

Credits

If you use any ideas from the papers or code in this repo, please consider citing the authors of LightGlue, SuperPoint, and DISK. Lastly, if the ONNX versions helped you in any way, please also consider starring this repository.

@inproceedings{lindenberger23lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue}: Local Feature Matching at Light Speed},
  booktitle = {ArXiv PrePrint},
  year      = {2023}
}
@article{DBLP:journals/corr/abs-1712-07629,
  author       = {Daniel DeTone and
                  Tomasz Malisiewicz and
                  Andrew Rabinovich},
  title        = {SuperPoint: Self-Supervised Interest Point Detection and Description},
  journal      = {CoRR},
  volume       = {abs/1712.07629},
  year         = {2017},
  url          = {http://arxiv.org/abs/1712.07629},
  eprinttype   = {arXiv},
  eprint       = {1712.07629},
  timestamp    = {Mon, 13 Aug 2018 16:47:29 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-1712-07629.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2006-13566,
  author       = {Michal J. Tyszkiewicz and
                  Pascal Fua and
                  Eduard Trulls},
  title        = {{DISK:} Learning local features with policy gradient},
  journal      = {CoRR},
  volume       = {abs/2006.13566},
  year         = {2020},
  url          = {https://arxiv.org/abs/2006.13566},
  eprinttype   = {arXiv},
  eprint       = {2006.13566},
  timestamp    = {Wed, 01 Jul 2020 15:21:23 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2006-13566.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}