-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix optimal_edit_paths not finding optimal with tied node costs #8421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix optimal_edit_paths not finding optimal with tied node costs #8421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in the graph edit distance algorithm where optimal_edit_paths was not finding the true optimal path when multiple node assignments had identical costs. The issue was that the algorithm arbitrarily selected one assignment using min on indices without considering downstream edge costs, leading to different results depending on node ordering.
Key Changes:
- Modified
get_edit_opsinsimilarity.pyto evaluate ALL linear sum assignment candidates with their edge costs, then sort and yield them by total estimated cost - Updated tests to use
pytest.approxfor float comparisons and check for unique canonical paths rather than raw path counts - Added comprehensive test cases in
TestGEDNodeOrderingInvarianceto verify the fix
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
networkx/algorithms/similarity.py |
Modified get_edit_ops to evaluate all LSA assignments with edge costs and sort by total cost estimate before yielding candidates |
networkx/algorithms/tests/test_similarity.py |
Updated float comparison to use pytest.approx, refined path counting logic to check unique canonical paths, and added new test class for node ordering invariance |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert len(result_vector) == k_val, ( | ||
| f"panther_vector_similarity k={k_val} returned {len(result_vector)} results" | ||
| ) | ||
| assert 1 not in result_vector, "Source node should not be in results" |
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing blank line before class definition. PEP 8 recommends two blank lines before top-level class definitions. Add a blank line here to improve readability and follow Python style conventions.
| assert 1 not in result_vector, "Source node should not be in results" | |
| assert 1 not in result_vector, "Source node should not be in results" |
networkx/algorithms/similarity.py
Outdated
| else: | ||
| # get reduced Cv efficiently | ||
| ] | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.
networkx/algorithms/similarity.py
Outdated
| matched_uv, | ||
| ) | ||
| Ce_xy = reduce_Ce(Ce, xy, len(pending_g), len(pending_h)) | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.
networkx/algorithms/similarity.py
Outdated
|
|
||
| if prune(matched_cost + Cv.ls + localCe.ls + Ce_xy.ls): | ||
| continue | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.
networkx/algorithms/similarity.py
Outdated
|
|
||
| # Calculate total cost estimate for sorting | ||
| total_cost_estimate = Cv.C[i, j] + localCe.ls + Ce_xy.ls | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.
networkx/algorithms/similarity.py
Outdated
| first_candidates.append( | ||
| ((i, j), Cv_ij, xy, Ce_xy, Cv.C[i, j] + localCe.ls, total_cost_estimate) | ||
| ) | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.
| def test_optimal_edit_paths_node_ordering(self): | ||
| """GED should be invariant to node ordering in input graphs.""" | ||
|
|
||
| def node_subst_cost(node_1, node_2): | ||
| # Asymmetric cost function that creates ties in node assignment | ||
| if node_2["i"] == 1: | ||
| return 0 | ||
| elif node_2["i"] == 2 or node_1["i"] == 3: | ||
| return 2 | ||
| else: | ||
| return 2.2 | ||
|
|
||
| def make_clique(nodes): | ||
| G = nx.Graph() | ||
| for x in nodes: | ||
| G.add_node(x, i=x) | ||
| for y in nodes: | ||
| if x != y: | ||
| G.add_edge(x, y) | ||
| return G | ||
|
|
||
| G1 = make_clique([1, 2, 3]) | ||
| G2 = make_clique([3, 2, 1]) # Same graph, different node order | ||
|
|
||
| _, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost) | ||
| _, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost) | ||
|
|
||
| # Both should find the same optimal distance | ||
| assert dist1 == dist2 | ||
|
|
||
| def test_optimal_edit_paths_finds_optimal(self): | ||
| """The returned distance should be the true minimum.""" | ||
|
|
||
| def node_subst_cost(node_1, node_2): | ||
| if node_2["i"] == 1: | ||
| return 0 | ||
| elif node_2["i"] == 2 or node_1["i"] == 3: | ||
| return 2 | ||
| else: | ||
| return 2.2 | ||
|
|
||
| def make_clique(nodes): | ||
| G = nx.Graph() | ||
| for x in nodes: | ||
| G.add_node(x, i=x) | ||
| for y in nodes: | ||
| if x != y: | ||
| G.add_edge(x, y) | ||
| return G | ||
|
|
||
| G1 = make_clique([1, 2, 3]) | ||
|
|
||
| _, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost) | ||
|
|
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node_subst_cost and make_clique functions are duplicated in both test methods. Consider extracting these as class-level helper methods or using pytest fixtures to reduce code duplication and improve maintainability.
| def test_optimal_edit_paths_node_ordering(self): | |
| """GED should be invariant to node ordering in input graphs.""" | |
| def node_subst_cost(node_1, node_2): | |
| # Asymmetric cost function that creates ties in node assignment | |
| if node_2["i"] == 1: | |
| return 0 | |
| elif node_2["i"] == 2 or node_1["i"] == 3: | |
| return 2 | |
| else: | |
| return 2.2 | |
| def make_clique(nodes): | |
| G = nx.Graph() | |
| for x in nodes: | |
| G.add_node(x, i=x) | |
| for y in nodes: | |
| if x != y: | |
| G.add_edge(x, y) | |
| return G | |
| G1 = make_clique([1, 2, 3]) | |
| G2 = make_clique([3, 2, 1]) # Same graph, different node order | |
| _, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost) | |
| _, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost) | |
| # Both should find the same optimal distance | |
| assert dist1 == dist2 | |
| def test_optimal_edit_paths_finds_optimal(self): | |
| """The returned distance should be the true minimum.""" | |
| def node_subst_cost(node_1, node_2): | |
| if node_2["i"] == 1: | |
| return 0 | |
| elif node_2["i"] == 2 or node_1["i"] == 3: | |
| return 2 | |
| else: | |
| return 2.2 | |
| def make_clique(nodes): | |
| G = nx.Graph() | |
| for x in nodes: | |
| G.add_node(x, i=x) | |
| for y in nodes: | |
| if x != y: | |
| G.add_edge(x, y) | |
| return G | |
| G1 = make_clique([1, 2, 3]) | |
| _, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost) | |
| @staticmethod | |
| def node_subst_cost(node_1, node_2): | |
| # Asymmetric cost function that creates ties in node assignment | |
| if node_2["i"] == 1: | |
| return 0 | |
| elif node_2["i"] == 2 or node_1["i"] == 3: | |
| return 2 | |
| else: | |
| return 2.2 | |
| @staticmethod | |
| def make_clique(nodes): | |
| G = nx.Graph() | |
| for x in nodes: | |
| G.add_node(x, i=x) | |
| for y in nodes: | |
| if x != y: | |
| G.add_edge(x, y) | |
| return G | |
| def test_optimal_edit_paths_node_ordering(self): | |
| """GED should be invariant to node ordering in input graphs.""" | |
| G1 = self.make_clique([1, 2, 3]) | |
| G2 = self.make_clique([3, 2, 1]) # Same graph, different node order | |
| _, dist1 = nx.optimal_edit_paths( | |
| G1, G1, node_subst_cost=self.node_subst_cost | |
| ) | |
| _, dist2 = nx.optimal_edit_paths( | |
| G2, G1, node_subst_cost=self.node_subst_cost | |
| ) | |
| # Both should find the same optimal distance | |
| assert dist1 == dist2 | |
| def test_optimal_edit_paths_finds_optimal(self): | |
| """The returned distance should be the true minimum.""" | |
| G1 = self.make_clique([1, 2, 3]) | |
| _, dist = nx.optimal_edit_paths( | |
| G1, G1, node_subst_cost=self.node_subst_cost | |
| ) |
0f2d550 to
d8b83ed
Compare
d8b83ed to
5f7e9b6
Compare
|
Amazing, thank you so much! I am not an expert on this algorithm (and code is not simple, for sure) but I can see how this change increases the search-space and fixes the reported bug. Left a minor comment and +1 on Copilot's dedupe comment. I think this is a good usecase for property based testing, which would generalize the test cases you added. You can do something similar as in #7954. For example, having a test case that generates a random graph G and a derived graph G' where you can somehow bound edit_distance(G, G'), then you validate the property More specifically: import random
import networkx as nx
def perturb_edges(G, k):
Gp = G.copy()
edges = list(G.edges())
non_edges = list(nx.non_edges(G))
for _ in range(k):
if random.random() < 0.5 and edges:
e = random.choice(edges)
Gp.remove_edge(*e)
edges.remove(e)
else:
e = random.choice(non_edges)
Gp.add_edge(*e)
non_edges.remove(e)
return Gp
# property
# edit_distance(G, Gp) <= k |
| else: | ||
| return 2.2 | ||
|
|
||
| def make_clique(nodes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use nx.complete_graph
|
Thanks for the review @amcandio! I've addressed your comments:
All 136 tests pass locally. |
| os.environ.get("NETWORKX_TEST_BACKEND") is not None, | ||
| reason="Backend may not preserve node attributes", | ||
| ) | ||
| def test_optimal_edit_paths_node_ordering(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's drop this test. The other is the one that covers the actual bug. And your property based testing already cover ordering
| """ | ||
| rng = random.Random(seed) | ||
| Gp = G.copy() | ||
| edges = list(Gp.edges()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set
| """ | ||
| G = nx.erdos_renyi_graph(n, 0.5, seed=seed) | ||
|
|
||
| assert nx.graph_edit_distance(G, G) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your could also test that reshuffling the order doesn't change the edit distance
| Graph edit distance should be symmetric. | ||
| """ | ||
| G1 = nx.erdos_renyi_graph(n, 0.4, seed=seed) | ||
| G2 = nx.erdos_renyi_graph(n, 0.4, seed=seed + 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create a random seed and pass the same to both
| rng = random.Random(seed) | ||
| n = 4 # Keep small for speed | ||
|
|
||
| A = nx.erdos_renyi_graph(n, 0.5, seed=rng.randint(0, 10000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
amcandio
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments. Q: would any of these tests have found the bug?
|
@amcandio The |
Fixes #8099
Problem
When multiple node assignments have the same cost in the linear sum assignment, the algorithm arbitrarily picked one using
minon indices without considering downstream edge costs. This caused different results depending on node ordering in the input graphs.Solution
Modified
get_edit_opsto evaluate ALL assignments fromlinear_sum_assignmentwith their associated edge costs, then explore them in order of total estimated cost.Changes
similarity.py: Evaluate all LSA assignments with edge costs before yieldingtest_similarity.py:pytest.approxfor float comparisonTestGEDNodeOrderingInvariancetest classTesting
All 75 tests pass. The original issue is fixed: