Skip to content

Conversation

@VanitasCodes
Copy link
Contributor

Fixes #8099

Problem

When multiple node assignments have the same cost in the linear sum assignment, the algorithm arbitrarily picked one using min on indices without considering downstream edge costs. This caused different results depending on node ordering in the input graphs.

# Before fix:
G1 vs G1: distance = 8.0  # Wrong!
G2 vs G1: distance = 4.0  # Correct

Solution

Modified get_edit_ops to evaluate ALL assignments from linear_sum_assignment with their associated edge costs, then explore them in order of total estimated cost.

Changes

  • similarity.py: Evaluate all LSA assignments with edge costs before yielding
  • test_similarity.py:
    • Use pytest.approx for float comparison
    • Check unique canonical paths instead of raw count
    • Added TestGEDNodeOrderingInvariance test class

Testing

All 75 tests pass. The original issue is fixed:

# After fix:
G1 vs G1: distance = 4.0  # Correct!
G2 vs G1: distance = 4.0  # Correct!

Copilot AI review requested due to automatic review settings December 23, 2025 05:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the graph edit distance algorithm where optimal_edit_paths was not finding the true optimal path when multiple node assignments had identical costs. The issue was that the algorithm arbitrarily selected one assignment using min on indices without considering downstream edge costs, leading to different results depending on node ordering.

Key Changes:

  • Modified get_edit_ops in similarity.py to evaluate ALL linear sum assignment candidates with their edge costs, then sort and yield them by total estimated cost
  • Updated tests to use pytest.approx for float comparisons and check for unique canonical paths rather than raw path counts
  • Added comprehensive test cases in TestGEDNodeOrderingInvariance to verify the fix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
networkx/algorithms/similarity.py Modified get_edit_ops to evaluate all LSA assignments with edge costs and sort by total cost estimate before yielding candidates
networkx/algorithms/tests/test_similarity.py Updated float comparison to use pytest.approx, refined path counting logic to check unique canonical paths, and added new test class for node ordering invariance

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert len(result_vector) == k_val, (
f"panther_vector_similarity k={k_val} returned {len(result_vector)} results"
)
assert 1 not in result_vector, "Source node should not be in results"
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing blank line before class definition. PEP 8 recommends two blank lines before top-level class definitions. Add a blank line here to improve readability and follow Python style conventions.

Suggested change
assert 1 not in result_vector, "Source node should not be in results"
assert 1 not in result_vector, "Source node should not be in results"

Copilot uses AI. Check for mistakes.
else:
# get reduced Cv efficiently
]

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot uses AI. Check for mistakes.
matched_uv,
)
Ce_xy = reduce_Ce(Ce, xy, len(pending_g), len(pending_h))

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot uses AI. Check for mistakes.

if prune(matched_cost + Cv.ls + localCe.ls + Ce_xy.ls):
continue

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot uses AI. Check for mistakes.

# Calculate total cost estimate for sorting
total_cost_estimate = Cv.C[i, j] + localCe.ls + Ce_xy.ls

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Suggested change

Copilot uses AI. Check for mistakes.
first_candidates.append(
((i, j), Cv_ij, xy, Ce_xy, Cv.C[i, j] + localCe.ls, total_cost_estimate)
)

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot uses AI. Check for mistakes.
Comment on lines 1167 to 1232
def test_optimal_edit_paths_node_ordering(self):
"""GED should be invariant to node ordering in input graphs."""

def node_subst_cost(node_1, node_2):
# Asymmetric cost function that creates ties in node assignment
if node_2["i"] == 1:
return 0
elif node_2["i"] == 2 or node_1["i"] == 3:
return 2
else:
return 2.2

def make_clique(nodes):
G = nx.Graph()
for x in nodes:
G.add_node(x, i=x)
for y in nodes:
if x != y:
G.add_edge(x, y)
return G

G1 = make_clique([1, 2, 3])
G2 = make_clique([3, 2, 1]) # Same graph, different node order

_, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)
_, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost)

# Both should find the same optimal distance
assert dist1 == dist2

def test_optimal_edit_paths_finds_optimal(self):
"""The returned distance should be the true minimum."""

def node_subst_cost(node_1, node_2):
if node_2["i"] == 1:
return 0
elif node_2["i"] == 2 or node_1["i"] == 3:
return 2
else:
return 2.2

def make_clique(nodes):
G = nx.Graph()
for x in nodes:
G.add_node(x, i=x)
for y in nodes:
if x != y:
G.add_edge(x, y)
return G

G1 = make_clique([1, 2, 3])

_, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)

Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node_subst_cost and make_clique functions are duplicated in both test methods. Consider extracting these as class-level helper methods or using pytest fixtures to reduce code duplication and improve maintainability.

Suggested change
def test_optimal_edit_paths_node_ordering(self):
"""GED should be invariant to node ordering in input graphs."""
def node_subst_cost(node_1, node_2):
# Asymmetric cost function that creates ties in node assignment
if node_2["i"] == 1:
return 0
elif node_2["i"] == 2 or node_1["i"] == 3:
return 2
else:
return 2.2
def make_clique(nodes):
G = nx.Graph()
for x in nodes:
G.add_node(x, i=x)
for y in nodes:
if x != y:
G.add_edge(x, y)
return G
G1 = make_clique([1, 2, 3])
G2 = make_clique([3, 2, 1]) # Same graph, different node order
_, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)
_, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost)
# Both should find the same optimal distance
assert dist1 == dist2
def test_optimal_edit_paths_finds_optimal(self):
"""The returned distance should be the true minimum."""
def node_subst_cost(node_1, node_2):
if node_2["i"] == 1:
return 0
elif node_2["i"] == 2 or node_1["i"] == 3:
return 2
else:
return 2.2
def make_clique(nodes):
G = nx.Graph()
for x in nodes:
G.add_node(x, i=x)
for y in nodes:
if x != y:
G.add_edge(x, y)
return G
G1 = make_clique([1, 2, 3])
_, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)
@staticmethod
def node_subst_cost(node_1, node_2):
# Asymmetric cost function that creates ties in node assignment
if node_2["i"] == 1:
return 0
elif node_2["i"] == 2 or node_1["i"] == 3:
return 2
else:
return 2.2
@staticmethod
def make_clique(nodes):
G = nx.Graph()
for x in nodes:
G.add_node(x, i=x)
for y in nodes:
if x != y:
G.add_edge(x, y)
return G
def test_optimal_edit_paths_node_ordering(self):
"""GED should be invariant to node ordering in input graphs."""
G1 = self.make_clique([1, 2, 3])
G2 = self.make_clique([3, 2, 1]) # Same graph, different node order
_, dist1 = nx.optimal_edit_paths(
G1, G1, node_subst_cost=self.node_subst_cost
)
_, dist2 = nx.optimal_edit_paths(
G2, G1, node_subst_cost=self.node_subst_cost
)
# Both should find the same optimal distance
assert dist1 == dist2
def test_optimal_edit_paths_finds_optimal(self):
"""The returned distance should be the true minimum."""
G1 = self.make_clique([1, 2, 3])
_, dist = nx.optimal_edit_paths(
G1, G1, node_subst_cost=self.node_subst_cost
)

Copilot uses AI. Check for mistakes.
@amcandio
Copy link
Contributor

Amazing, thank you so much! I am not an expert on this algorithm (and code is not simple, for sure) but I can see how this change increases the search-space and fixes the reported bug. Left a minor comment and +1 on Copilot's dedupe comment.

I think this is a good usecase for property based testing, which would generalize the test cases you added. You can do something similar as in #7954. For example, having a test case that generates a random graph G and a derived graph G' where you can somehow bound edit_distance(G, G'), then you validate the property edit_distance(G, G') <= edit_distance_upper_bound(G, G').

More specifically:

import random
import networkx as nx

def perturb_edges(G, k):
    Gp = G.copy()
    edges = list(G.edges())
    non_edges = list(nx.non_edges(G))

    for _ in range(k):
        if random.random() < 0.5 and edges:
            e = random.choice(edges)
            Gp.remove_edge(*e)
            edges.remove(e)
        else:
            e = random.choice(non_edges)
            Gp.add_edge(*e)
            non_edges.remove(e)

    return Gp

# property
# edit_distance(G, Gp) <= k

else:
return 2.2

def make_clique(nodes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nx.complete_graph

@VanitasCodes
Copy link
Contributor Author

Thanks for the review @amcandio! I've addressed your comments:

  • Replaced custom make_clique with nx.complete_graph
  • Added property-based tests following the approach from Fixing nx.diameter inconsistent results with usebounds=True #7954:
    • Edge perturbation bound: GED(G, G') <= k when k edges changed
    • Identity: GED(G, G) == 0
    • Symmetry: GED(G1, G2) == GED(G2, G1)
    • Triangle inequality: d(A,C) <= d(A,B) + d(B,C)
    • Consistency: graph_edit_distance and optimal_edit_paths return same cost

All 136 tests pass locally.

os.environ.get("NETWORKX_TEST_BACKEND") is not None,
reason="Backend may not preserve node attributes",
)
def test_optimal_edit_paths_node_ordering(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop this test. The other is the one that covers the actual bug. And your property based testing already cover ordering

"""
rng = random.Random(seed)
Gp = G.copy()
edges = list(Gp.edges())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set

"""
G = nx.erdos_renyi_graph(n, 0.5, seed=seed)

assert nx.graph_edit_distance(G, G) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your could also test that reshuffling the order doesn't change the edit distance

Graph edit distance should be symmetric.
"""
G1 = nx.erdos_renyi_graph(n, 0.4, seed=seed)
G2 = nx.erdos_renyi_graph(n, 0.4, seed=seed + 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create a random seed and pass the same to both

rng = random.Random(seed)
n = 4 # Keep small for speed

A = nx.erdos_renyi_graph(n, 0.5, seed=rng.randint(0, 10000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor

@amcandio amcandio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of comments. Q: would any of these tests have found the bug?

@VanitasCodes
Copy link
Contributor Author

@amcandio The test_ged_identity with reshuffled node ordering would have caught this bug, since the issue was that different node orderings could produce different (suboptimal) results. The perturbation test also indirectly covers it since it relies on finding the true optimal distance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

nx.optimal_edit_paths not finding optimal

2 participants