
Conversation

@ahojnnes
Contributor

No description provided.

@tsattler
Contributor

Maybe you can consider adding XFeat? I have been playing around with ALIKED and XFeat. In general, ALIKED seems to be better, but if there is no GPU, XFeat is significantly faster to extract (and to match).

@ahojnnes
Contributor Author

Thanks for the suggestion. ALIKED is a bit of a nightmare to convert to ONNX anyway because of its custom layers. XFeat was easier and seems to work. I'll push this a bit further in the next few days and will let you know when I am done @tsattler. Greetings!
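
For reference, the export is roughly along these lines (a minimal sketch assuming the upstream torch hub model from verlab/accelerated_features; the actual export wrapper, output heads, and opset may differ):

import torch

# Load the upstream XFeat model (assumed torch hub entry point from the
# verlab/accelerated_features repo).
xfeat = torch.hub.load("verlab/accelerated_features", "XFeat", pretrained=True)
xfeat.eval()

# Export the raw backbone; pre-/post-processing (NMS, top-k selection)
# would need a dedicated wrapper module.
dummy = torch.randn(1, 3, 480, 640)
torch.onnx.export(
    xfeat.net,
    dummy,
    "xfeat.onnx",
    input_names=["image"],
    output_names=["feats", "keypoints", "heatmap"],
    dynamic_axes={"image": {2: "height", 3: "width"}},
    opset_version=17,
)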

@ahojnnes ahojnnes changed the title Add ALIKED/LightGlue support with ONNX Add XFeat/LighterGlue support with ONNX Aug 26, 2025
@ahojnnes
Contributor Author

@tsattler Did you try XFeat for SfM as well, or just for localization? Any suggested parameters for the extractor and for the cosine similarity threshold during matching? Initial experiments show not-so-great performance even on simple datasets, because similar-looking structures are very easily confused.

@ahojnnes
Contributor Author

OK, a cosine similarity threshold of 0.9 seems to work more reasonably. Still interested whether you have done a more structured parameter sweep.
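
For concreteness, the thresholding amounts to mutual-nearest-neighbor matching on cosine similarity, roughly like this sketch (assuming L2-normalized float descriptors; not the actual COLMAP matcher code):

import numpy as np

def match(desc0: np.ndarray, desc1: np.ndarray, min_cos_sim: float = 0.9):
    # Cosine similarity reduces to a dot product for unit-norm descriptors.
    sim = desc0 @ desc1.T
    nn01 = sim.argmax(axis=1)  # best candidate in image 1 per feature in image 0
    nn10 = sim.argmax(axis=0)  # and vice versa
    idx0 = np.arange(len(desc0))
    mutual = nn10[nn01] == idx0              # keep only mutual nearest neighbors...
    strong = sim[idx0, nn01] >= min_cos_sim  # ...that clear the similarity threshold
    keep = mutual & strong
    return np.stack([idx0[keep], nn01[keep]], axis=1)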

@ahojnnes ahojnnes changed the title Add XFeat/LighterGlue support with ONNX Add XFeat support with ONNX Aug 28, 2025
@tsattler
Contributor

tsattler commented Aug 29, 2025 via email

@xiemeilong

Feature matching fails when --XFeatExtraction.min_score is set above 0. It appears the matcher cannot handle a mismatch in feature counts between the two images of a pair.

Feature matching & geometric verification
==============================================================================
I20250923 09:58:16.188045   545 feature_matching_utils.cc:78] Bind FeatureMatcherWorker to GPU device 0
2025-09-23 09:58:16.298265935 [W:onnxruntime:, session_state.cc:1280 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-09-23 09:58:16.298294024 [W:onnxruntime:, session_state.cc:1282 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
I20250923 09:58:16.325433   544 pairing.cc:557] Generating spatial image pairs...
I20250923 09:58:16.325460   544 pairing.cc:562] Indexing images...
I20250923 09:58:16.329818   544 pairing.cc:567]  in 0.004s
I20250923 09:58:16.329830   544 pairing.cc:581] Building search index...
I20250923 09:58:16.329841   544 pairing.cc:586]  in 0.000s
I20250923 09:58:16.329849   544 pairing.cc:589] Searching for nearest neighbors...
I20250923 09:58:16.337879   544 pairing.cc:605]  in 0.008s
I20250923 09:58:16.337910   544 pairing.cc:628] Processing image [1/976]
2025-09-23 09:58:16.534845355 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running Equal node. Name:'node_eq_12' Status Message: node_eq_12: left operand cannot broadcast on dim 0 LeftShape: {3925}, RightShape: {3975}
terminate called after throwing an instance of 'Ort::Exception'
  what():  Non-zero status code returned while running Equal node. Name:'node_eq_12' Status Message: node_eq_12: left operand cannot broadcast on dim 0 LeftShape: {3925}, RightShape: {3975}
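
For what it's worth, the two shapes in the error are the per-image keypoint counts after filtering, so the exported matcher graph seems to assume both images keep the same number of features. A hypothetical illustration of the problem and a common padding workaround:

import numpy as np

def filter_by_score(kpts: np.ndarray, scores: np.ndarray, min_score: float):
    # Score filtering keeps a different number of keypoints per image,
    # e.g. 3925 vs 3975 above, which breaks graphs that broadcast the
    # two sets against each other on dim 0.
    keep = scores >= min_score
    return kpts[keep], scores[keep]

def pad_with_mask(kpts: np.ndarray, length: int):
    # One workaround: pad both sets to a shared length and carry a
    # validity mask, so shapes always agree inside the ONNX graph.
    pad = length - len(kpts)  # assumes length >= len(kpts)
    padded = np.pad(kpts, ((0, pad), (0, 0)))
    mask = np.arange(length) < len(kpts)
    return padded, mask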

@ahojnnes
Contributor Author

@xiemeilong Thank you. I already noticed the same. Something is wrong with the latest ONNX model conversion; I am not sure what it is yet, though.

@Dawars
Contributor

Dawars commented Oct 6, 2025

Thank you for the implementation.
I tested it and it works great so far.
I want to use the sequential_matcher with loop closure. I assume a new vocab tree would need to be trained for the XFeat descriptors.
How would I go about training a vocab for XFeat?

  • What dataset was used to train the SIFT vocab files?
  • Do I need to reconstruct the dataset using XFeat and exhaustive matching (or can I reuse the existing matches)?

Should I open a separate issue for this?

@ahojnnes
Contributor Author

ahojnnes commented Oct 6, 2025

Thanks @Dawars for the feedback, glad that it works for you. Some additional work is needed to enable vocab tree training + retrieval for float features, but it's not too much work.
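
To sketch the direction (a hypothetical flat-vocabulary example using scikit-learn on a dumped descriptor matrix; COLMAP's actual vocab tree is hierarchical and not yet wired up for float features):

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Assumed dump of XFeat descriptors across the dataset, shape (N, 64).
descriptors = np.load("xfeat_descriptors.npy").astype(np.float32)

# Cluster into visual words; a real vocab tree would do this hierarchically.
vocab = MiniBatchKMeans(n_clusters=4096, batch_size=10240, n_init="auto").fit(descriptors)
words = vocab.predict(descriptors)  # visual word id per descriptor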

I also did some testing with XFeat and found that it is much more prone to confusing symmetric structures: it very easily matches repeating windows on the same facade or on the opposite facade. This is at least the case with the current default parameters. Did you tweak any of these in your experiments? See the example below.

[images: examples of incorrectly matched symmetric structures]

@ahojnnes
Contributor Author

ahojnnes commented Oct 7, 2025

@tsattler Did you observe the same issue? Any suggested mitigation strategies? Tweaking the cosine similarity threshold does not really work for me: increasing it removes the symmetric pairs but also breaks the reconstruction into multiple parts.

@shing-yan

Thanks for the great work. I really enjoy using COLMAP in my projects. I've been using XFeat+LighterGlue for feature extraction and matching; the extracted and matched features are manually imported into a database before running COLMAP's pose-prior mapper. Shout out to another excellent work, hloc.

Because I have pre-computed camera poses from fusing other sensor data (GNSS and LiDAR), I can limit the number of image pairs for feature matching, thereby reducing the possibility of erroneous matches between physically distant features. COLMAP could also offer an optional feature to pre-compute pairs of similar-looking images for feature matching, which may complement the spatial_matcher. I'm unsure how much LighterGlue contributes to reducing the outlier matches, though.
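
The pairing idea could look roughly like this (a sketch assuming precomputed global image descriptors, e.g. NetVLAD-style vectors; not an existing COLMAP feature):

import numpy as np

def top_k_pairs(global_descs: np.ndarray, k: int = 20):
    # Rank candidate pairs by cosine similarity of global descriptors
    # and only run feature matching on the top-k neighbors per image.
    d = global_descs / np.linalg.norm(global_descs, axis=1, keepdims=True)
    sim = d @ d.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs
    pairs = set()
    for i in range(len(sim)):
        for j in np.argsort(-sim[i])[:k]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)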

@tsattler
Contributor

tsattler commented Oct 7, 2025

I have seen issues with similar structures before with XFeat (e.g., the Barn scene from the Tanks & Temples benchmark). Using LighterGlue / LightGlue for matching resolved this issue for me, as the correctly connected parts get significantly more matches.

@ahojnnes
Contributor Author

ahojnnes commented Oct 8, 2025

Thanks for the feedback. I'll give LighterGlue a try and see if it can be easily converted to ONNX.

@ahojnnes
Contributor Author

Managed to export LighterGlue to ONNX. Using the default parameters, it has the same issues:

[image: reconstruction exhibiting the same symmetry failures]

@ahojnnes
Contributor Author

ahojnnes commented Oct 12, 2025

You can try it yourself: I pushed the required changes to this branch. Just pass --FeatureMatching.type XFEAT_LIGHTERGLUE to the exhaustive_matcher. If somebody can tell me the "secret sauce" to make this work (e.g., different parameters for LighterGlue?), I'd appreciate your help :-) Did you adjust the match score threshold?

@ahojnnes
Contributor Author

I did a parameter sweep over min_conf/min_score = {0.1, 0.2, 0.3} x min_num_matches = {15, 25, 50}, and I cannot prevent the South Building model from collapsing, while SIFT doesn't suffer from the same issue. On some other scenes with wide baselines and illumination changes, it succeeds in matching where SIFT fails. I guess that is expected; I just wasn't expecting it to fail on the South Building dataset.

@sarlinpe
Member

I did a parameter sweep over min_conf/min_score = {0.1, 0.2, 0.3} x min_num_matches = {15, 25, 50}, and I cannot prevent the South Building model from collapsing, while SIFT doesn't suffer from the same issue. On some other scenes with wide baselines and illumination changes, it succeeds in matching where SIFT fails. I guess that is expected; I just wasn't expecting it to fail on the South Building dataset.

FWIW, learned matching is much more prone to failure on symmetries than SIFT (high recall but lower precision), but I've never seen this happen on South Building... If one has to run both SIFT and XFeat and pick the best, it becomes pretty inconvenient...

@ahojnnes
Contributor Author

I ran this branch through our ETH3D benchmarking suite, and the metrics also don't look great. The baseline (A) is SIFT with medium-quality settings, and the comparison (B) is this branch with default parameters:

I20251013 13:19:57.788485 140470133313664 compare.py:main:56] Results A:
=====scenes===== ======AUC @ X deg (%)====== ===images=== =components=
                  0.5    1.0    5.0    10.0     reg   all  num largest

==============================eth3d=dslr==============================
botanical_garden  12.54  16.68  21.85  22.63     21    30    2      14
boulders          69.58  79.84  88.56  89.73     26    26    1      26
bridge            74.27  82.39  89.16  90.04    110   110    1     110
courtyard         58.49  72.45  86.98  88.95     38    38    1      38
delivery_area     72.14  81.28  88.98  89.95     44    44    1      44
door              78.15  82.36  89.14  90.02      7     7    1       7
electro           50.74  59.85  69.02  70.28     40    45    1      40
exhibition_hall   56.96  71.55  85.65  88.20     68    68    1      68
facade            71.88  79.77  86.72  87.62     75    76    1      75
kicker            62.96  72.35  82.45  83.75     30    31    1      30
lecture_room      45.71  63.85  84.23  87.42     23    23    1      23
living_room       63.66  75.14  87.31  89.09     65    65    1      65
lounge             0.00   0.00   1.39   1.70      2    10    1       2
meadow             1.27   3.71  10.12  11.55      6    15    1       6
observatory       29.04  50.03  80.94  85.92     27    27    1      27
office            24.72  30.96  42.27  45.98     20    26    1      20
old_computer      47.82  66.21  85.59  88.24     54    54    1      54
pipes             39.62  58.25  83.58  87.24     14    14    1      14
playground        52.50  63.96  74.26  75.60     36    38    1      36
relief            73.18  81.72  89.05  89.98     31    31    1      31
relief_2          69.04  79.47  88.54  89.72     31    31    1      31
statue            86.16  88.54  90.43  90.67     11    11    1      11
terrace           73.92  82.37  89.20  90.06     23    23    1      23
terrace_2         47.63  54.91  63.08  64.17     13    13    2      13
terrains          45.50  63.12  84.54  87.66     42    42    1      42
----------------------------------------------------------------------
overall           62.11  72.85  83.52  85.08    857   898   27     850
----------------------------------------------------------------------
average           52.30  62.43  73.72  75.45     34    36    1      34
I20251013 13:19:57.789058 140470133313664 compare.py:main:57] Results B:
=====scenes===== ======AUC @ X deg (%)====== ===images=== =components=
                  0.5    1.0    5.0    10.0     reg   all  num largest

==============================eth3d=dslr==============================
botanical_garden   0.00   0.00   0.00   0.00      2    30    1       2
boulders          28.67  52.57  80.10  84.62     26    26    1      26
bridge             2.59   6.25  12.26  13.69    101   110    5      81
courtyard          6.84  14.56  31.38  35.04     27    38    2      27
delivery_area      0.03   0.08   2.59   5.55     39    44    2      39
door               2.52   3.42  10.62  11.80      7     7    2       7
electro           19.55  35.76  58.99  63.13     41    45    2      41
exhibition_hall   11.76  25.39  49.02  53.35     68    68    1      68
facade             7.20  16.16  32.01  35.37     74    76    3      69
kicker            19.05  42.41  74.01  79.28     30    31    1      30
lecture_room      17.48  35.82  72.03  80.00     23    23    1      23
living_room       14.65  30.86  56.20  60.62     65    65    2      65
lounge             9.59  17.74  26.87  28.59      6    10    1       6
meadow             0.00   0.97  11.68  24.00     11    15    1      11
observatory       10.73  19.42  30.88  32.60     27    27    3      14
office             6.20  14.36  33.36  37.44     22    26    2      22
old_computer       5.15  12.50  33.66  38.83     54    54    1      54
pipes              5.28  17.06  37.51  41.23     14    14    2      14
playground         0.78   1.58   3.51   4.07     11    38    2      10
relief             2.97  12.99  26.26  28.09     31    31    2      18
relief_2          10.34  20.84  33.68  35.41     31    31    2      20
statue            25.47  55.00  83.47  87.19     11    11    1      11
terrace            9.05  26.83  65.63  76.41     23    23    1      23
terrace_2         45.65  64.96  85.33  88.12     13    13    1      13
terrains          20.28  47.35  81.04  85.82     42    42    1      42
----------------------------------------------------------------------
overall            8.20  17.77  33.60  36.84    799   898   43     736
----------------------------------------------------------------------
average           11.27  23.00  41.28  45.21     32    36    2      29
I20251013 13:19:57.789286 140470133313664 compare.py:main:58] Results A - B:
=====scenes===== ======AUC @ X deg (%)====== ===images=== =components=
                  0.5    1.0    5.0    10.0     reg   all  num largest

==============================eth3d=dslr==============================
botanical_garden  12.54  16.68  21.85  22.63     19     0    1      12
boulders          40.91  27.27   8.46   5.11      0     0    0       0
bridge            71.68  76.14  76.90  76.35      9     0   -4      29
courtyard         51.65  57.89  55.60  53.91     11     0   -1      11
delivery_area     72.11  81.20  86.39  84.39      5     0   -1       5
door              75.63  78.94  78.52  78.22      0     0   -1       0
electro           31.19  24.10  10.03   7.15     -1     0   -1      -1
exhibition_hall   45.20  46.15  36.64  34.85      0     0    0       0
facade            64.68  63.62  54.71  52.25      1     0   -2       6
kicker            43.91  29.94   8.44   4.47      0     0    0       0
lecture_room      28.23  28.03  12.20   7.42      0     0    0       0
living_room       49.01  44.28  31.11  28.47      0     0   -1       0
lounge            -9.59 -17.74 -25.48 -26.88     -4     0    0      -4
meadow             1.27   2.74  -1.56 -12.45     -5     0    0      -5
observatory       18.31  30.62  50.05  53.33      0     0   -2      13
office            18.52  16.60   8.91   8.53     -2     0   -1      -2
old_computer      42.66  53.71  51.92  49.41      0     0    0       0
pipes             34.34  41.20  46.07  46.01      0     0   -1       0
playground        51.72  62.38  70.75  71.53     25     0   -1      26
relief            70.21  68.73  62.79  61.89      0     0   -1      13
relief_2          58.70  58.63  54.86  54.31      0     0   -1      11
statue            60.69  33.54   6.97   3.48      0     0    0       0
terrace           64.86  55.54  23.57  13.65      0     0    0       0
terrace_2          1.98 -10.05 -22.26 -23.95      0     0    1       0
terrains          25.22  15.77   3.50   1.83      0     0    0       0
----------------------------------------------------------------------
overall           53.90  55.08  49.92  48.25     58     0  -16     114
----------------------------------------------------------------------
average           41.03  39.43  32.44  30.24      2     0   -1       5

@ahojnnes
Contributor Author

Looks like we have to make ALIKED+LightGlue work with ONNX. ALIKED is blocked on the following issue: fabio-sim/LightGlue-ONNX#103 - I don't have time to dig deeper into it at the moment.

@shing-yan

I use the following settings for my project.

XFeat extracts a maximum of 10000 keypoints, and LighterGlue has the following config parameters.

default_conf_xfeat = {
    "name": "lighterglue",  # just for interfacing
    "input_dim": 64,  # input descriptor dimension (autoselected from weights)
    "descriptor_dim": 96,
    "add_scale_ori": False,
    "add_laf": False,  # for KeyNetAffNetHardNet
    "scale_coef": 1.0,  # to compensate for the SIFT scale bigger than KeyNet
    "n_layers": 6,
    "num_heads": 1,
    "flash": True,  # enable FlashAttention if available.
    "mp": False,  # enable mixed precision
    "depth_confidence": -1,  # early stopping, disable with -1
    "width_confidence": 0.95,  # point pruning, disable with -1
    "filter_threshold": 0.1,  # match threshold
    "weights": None,
}

XFeat also has a built-in semi-dense matcher, dubbed XFeat*, which performs feature matching at a coarse resolution and then refines the matches by predicting pixel-level offsets (see Section 3.2, Dense matching, and the implementation).
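
For reference, the upstream usage is roughly as follows (a sketch based on my reading of the verlab/accelerated_features README; exact signatures may differ):

import cv2
import torch

# Assumed torch hub entry point from the verlab/accelerated_features repo.
xfeat = torch.hub.load("verlab/accelerated_features", "XFeat", pretrained=True, top_k=4096)

im1 = cv2.imread("img1.jpg")
im2 = cv2.imread("img2.jpg")

# Sparse pipeline: detect/describe, then match with LighterGlue.
out0 = xfeat.detectAndCompute(im1, top_k=4096)[0]
out1 = xfeat.detectAndCompute(im2, top_k=4096)[0]
mkpts0, mkpts1 = xfeat.match_lighterglue(out0, out1)

# Semi-dense XFeat*: coarse matching plus pixel-level offset refinement.
mkpts0_star, mkpts1_star = xfeat.match_xfeat_star(im1, im2, top_k=8000)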

Feel free to let me know if there is anything I can contribute.

@reynoldscem-oculo

@ahojnnes does this seem like a viable alternative to ONNX Runtime?

https://github.com/MrNeRF/Light_Glue_CPP

@ahojnnes
Contributor Author

@reynoldscem-oculo Thank you. There is an existing pull request integrating this here: #3068 - Unfortunately, there are several issues with that implementation and its portability across platforms.

@reynoldscem-oculo

@ahojnnes ah yes I see! Thanks for the context.
