Fix optimal_edit_paths not finding optimal with tied node costs #8421

VanitasCodes · 2025-12-23T05:38:42Z

Problem

When multiple node assignments have the same cost in the linear sum assignment, the algorithm arbitrarily picked one using min on indices without considering downstream edge costs. This caused different results depending on node ordering in the input graphs.

# Before fix:
G1 vs G1: distance = 8.0  # Wrong!
G2 vs G1: distance = 4.0  # Correct

Solution

Modified get_edit_ops to evaluate ALL assignments from linear_sum_assignment with their associated edge costs, then explore them in order of total estimated cost.

Changes

similarity.py: Evaluate all LSA assignments with edge costs before yielding
test_similarity.py:
- Use pytest.approx for float comparison
- Check unique canonical paths instead of raw count
- Added TestGEDNodeOrderingInvariance test class

Testing

All 75 tests pass. The original issue is fixed:

# After fix:
G1 vs G1: distance = 4.0  # Correct!
G2 vs G1: distance = 4.0  # Correct!

Copilot

Pull request overview

This PR fixes a bug in the graph edit distance algorithm where optimal_edit_paths was not finding the true optimal path when multiple node assignments had identical costs. The issue was that the algorithm arbitrarily selected one assignment using min on indices without considering downstream edge costs, leading to different results depending on node ordering.

Key Changes:

Modified get_edit_ops in similarity.py to evaluate ALL linear sum assignment candidates with their edge costs, then sort and yield them by total estimated cost
Updated tests to use pytest.approx for float comparisons and check for unique canonical paths rather than raw path counts
Added comprehensive test cases in TestGEDNodeOrderingInvariance to verify the fix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
`networkx/algorithms/similarity.py`	Modified `get_edit_ops` to evaluate all LSA assignments with edge costs and sort by total cost estimate before yielding candidates
`networkx/algorithms/tests/test_similarity.py`	Updated float comparison to use `pytest.approx`, refined path counting logic to check unique canonical paths, and added new test class for node ordering invariance

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-23T05:44:47Z

networkx/algorithms/tests/test_similarity.py

            assert len(result_vector) == k_val, (
                f"panther_vector_similarity k={k_val} returned {len(result_vector)} results"
            )
            assert 1 not in result_vector, "Source node should not be in results"


Missing blank line before class definition. PEP 8 recommends two blank lines before top-level class definitions. Add a blank line here to improve readability and follow Python style conventions.

Suggested change

assert 1 not in result_vector, "Source node should not be in results"

assert 1 not in result_vector, "Source node should not be in results"

Copilot · 2025-12-23T05:44:48Z

networkx/algorithms/similarity.py

-        else:
-            # get reduced Cv efficiently
+        ]
+


This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot · 2025-12-23T05:44:48Z

networkx/algorithms/similarity.py

+                matched_uv,
+            )
+            Ce_xy = reduce_Ce(Ce, xy, len(pending_g), len(pending_h))
+


This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot · 2025-12-23T05:44:48Z

networkx/algorithms/similarity.py

+
+            if prune(matched_cost + Cv.ls + localCe.ls + Ce_xy.ls):
+                continue
+


This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot · 2025-12-23T05:44:48Z

networkx/algorithms/similarity.py

+
+            # Calculate total cost estimate for sorting
+            total_cost_estimate = Cv.C[i, j] + localCe.ls + Ce_xy.ls
+


This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Suggested change

Copilot · 2025-12-23T05:44:49Z

networkx/algorithms/similarity.py

+            first_candidates.append(
+                ((i, j), Cv_ij, xy, Ce_xy, Cv.C[i, j] + localCe.ls, total_cost_estimate)
+            )
+


This line contains trailing whitespace. Remove the trailing whitespace to follow PEP 8 style guidelines and maintain consistent code formatting.

Copilot · 2025-12-23T05:44:49Z

networkx/algorithms/tests/test_similarity.py

+    def test_optimal_edit_paths_node_ordering(self):
+        """GED should be invariant to node ordering in input graphs."""
+
+        def node_subst_cost(node_1, node_2):
+            # Asymmetric cost function that creates ties in node assignment
+            if node_2["i"] == 1:
+                return 0
+            elif node_2["i"] == 2 or node_1["i"] == 3:
+                return 2
+            else:
+                return 2.2
+
+        def make_clique(nodes):
+            G = nx.Graph()
+            for x in nodes:
+                G.add_node(x, i=x)
+                for y in nodes:
+                    if x != y:
+                        G.add_edge(x, y)
+            return G
+
+        G1 = make_clique([1, 2, 3])
+        G2 = make_clique([3, 2, 1])  # Same graph, different node order
+
+        _, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)
+        _, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost)
+
+        # Both should find the same optimal distance
+        assert dist1 == dist2
+
+    def test_optimal_edit_paths_finds_optimal(self):
+        """The returned distance should be the true minimum."""
+
+        def node_subst_cost(node_1, node_2):
+            if node_2["i"] == 1:
+                return 0
+            elif node_2["i"] == 2 or node_1["i"] == 3:
+                return 2
+            else:
+                return 2.2
+
+        def make_clique(nodes):
+            G = nx.Graph()
+            for x in nodes:
+                G.add_node(x, i=x)
+                for y in nodes:
+                    if x != y:
+                        G.add_edge(x, y)
+            return G
+
+        G1 = make_clique([1, 2, 3])
+
+        _, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)
+


The node_subst_cost and make_clique functions are duplicated in both test methods. Consider extracting these as class-level helper methods or using pytest fixtures to reduce code duplication and improve maintainability.

Suggested change

def test_optimal_edit_paths_node_ordering(self):

"""GED should be invariant to node ordering in input graphs."""

def node_subst_cost(node_1, node_2):

# Asymmetric cost function that creates ties in node assignment

if node_2["i"] == 1:

return 0

elif node_2["i"] == 2 or node_1["i"] == 3:

return 2

else:

return 2.2

def make_clique(nodes):

G = nx.Graph()

for x in nodes:

G.add_node(x, i=x)

for y in nodes:

if x != y:

G.add_edge(x, y)

return G

G1 = make_clique([1, 2, 3])

G2 = make_clique([3, 2, 1]) # Same graph, different node order

_, dist1 = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)

_, dist2 = nx.optimal_edit_paths(G2, G1, node_subst_cost=node_subst_cost)

# Both should find the same optimal distance

assert dist1 == dist2

def test_optimal_edit_paths_finds_optimal(self):

"""The returned distance should be the true minimum."""

def node_subst_cost(node_1, node_2):

if node_2["i"] == 1:

return 0

elif node_2["i"] == 2 or node_1["i"] == 3:

return 2

else:

return 2.2

def make_clique(nodes):

G = nx.Graph()

for x in nodes:

G.add_node(x, i=x)

for y in nodes:

if x != y:

G.add_edge(x, y)

return G

G1 = make_clique([1, 2, 3])

_, dist = nx.optimal_edit_paths(G1, G1, node_subst_cost=node_subst_cost)

@staticmethod

def node_subst_cost(node_1, node_2):

# Asymmetric cost function that creates ties in node assignment

if node_2["i"] == 1:

return 0

elif node_2["i"] == 2 or node_1["i"] == 3:

return 2

else:

return 2.2

@staticmethod

def make_clique(nodes):

G = nx.Graph()

for x in nodes:

G.add_node(x, i=x)

for y in nodes:

if x != y:

G.add_edge(x, y)

return G

def test_optimal_edit_paths_node_ordering(self):

"""GED should be invariant to node ordering in input graphs."""

G1 = self.make_clique([1, 2, 3])

G2 = self.make_clique([3, 2, 1]) # Same graph, different node order

_, dist1 = nx.optimal_edit_paths(

G1, G1, node_subst_cost=self.node_subst_cost

)

_, dist2 = nx.optimal_edit_paths(

G2, G1, node_subst_cost=self.node_subst_cost

)

# Both should find the same optimal distance

assert dist1 == dist2

def test_optimal_edit_paths_finds_optimal(self):

"""The returned distance should be the true minimum."""

G1 = self.make_clique([1, 2, 3])

_, dist = nx.optimal_edit_paths(

G1, G1, node_subst_cost=self.node_subst_cost

)

…orkx#8099)

amcandio · 2025-12-23T14:47:00Z

Amazing, thank you so much! I am not an expert on this algorithm (and code is not simple, for sure) but I can see how this change increases the search-space and fixes the reported bug. Left a minor comment and +1 on Copilot's dedupe comment.

I think this is a good usecase for property based testing, which would generalize the test cases you added. You can do something similar as in #7954. For example, having a test case that generates a random graph G and a derived graph G' where you can somehow bound edit_distance(G, G'), then you validate the property edit_distance(G, G') <= edit_distance_upper_bound(G, G').

More specifically:

import random
import networkx as nx

def perturb_edges(G, k):
    Gp = G.copy()
    edges = list(G.edges())
    non_edges = list(nx.non_edges(G))

    for _ in range(k):
        if random.random() < 0.5 and edges:
            e = random.choice(edges)
            Gp.remove_edge(*e)
            edges.remove(e)
        else:
            e = random.choice(non_edges)
            Gp.add_edge(*e)
            non_edges.remove(e)

    return Gp

# property
# edit_distance(G, Gp) <= k

amcandio · 2025-12-23T14:33:58Z

networkx/algorithms/tests/test_similarity.py

+            else:
+                return 2.2
+
+        def make_clique(nodes):


You can use nx.complete_graph

VanitasCodes · 2025-12-23T21:35:29Z

Thanks for the review @amcandio! I've addressed your comments:

Replaced custom make_clique with nx.complete_graph
Added property-based tests following the approach from Fixing nx.diameter inconsistent results with usebounds=True #7954:
- Edge perturbation bound: GED(G, G') <= k when k edges changed
- Identity: GED(G, G) == 0
- Symmetry: GED(G1, G2) == GED(G2, G1)
- Triangle inequality: d(A,C) <= d(A,B) + d(B,C)
- Consistency: graph_edit_distance and optimal_edit_paths return same cost

All 136 tests pass locally.

amcandio · 2025-12-24T08:10:08Z

networkx/algorithms/tests/test_similarity.py

+        os.environ.get("NETWORKX_TEST_BACKEND") is not None,
+        reason="Backend may not preserve node attributes",
+    )
+    def test_optimal_edit_paths_node_ordering(self):


Let's drop this test. The other is the one that covers the actual bug. And your property based testing already cover ordering

amcandio · 2025-12-24T08:11:27Z

networkx/algorithms/tests/test_similarity.py

+        """
+        rng = random.Random(seed)
+        Gp = G.copy()
+        edges = list(Gp.edges())


amcandio · 2025-12-24T08:13:19Z

networkx/algorithms/tests/test_similarity.py

+        """
+        G = nx.erdos_renyi_graph(n, 0.5, seed=seed)
+
+        assert nx.graph_edit_distance(G, G) == 0


Your could also test that reshuffling the order doesn't change the edit distance

amcandio · 2025-12-24T08:14:04Z

networkx/algorithms/tests/test_similarity.py

+        Graph edit distance should be symmetric.
+        """
+        G1 = nx.erdos_renyi_graph(n, 0.4, seed=seed)
+        G2 = nx.erdos_renyi_graph(n, 0.4, seed=seed + 100)


You can create a random seed and pass the same to both

amcandio · 2025-12-24T08:14:34Z

networkx/algorithms/tests/test_similarity.py

+        rng = random.Random(seed)
+        n = 4  # Keep small for speed
+
+        A = nx.erdos_renyi_graph(n, 0.5, seed=rng.randint(0, 10000))


Same as above

amcandio

Another round of comments. Q: would any of these tests have found the bug?

VanitasCodes · 2025-12-24T08:49:19Z

@amcandio The test_ged_identity with reshuffled node ordering would have caught this bug, since the issue was that different node orderings could produce different (suboptimal) results. The perturbation test also indirectly covers it since it relies on finding the true optimal distance.

Copilot AI review requested due to automatic review settings December 23, 2025 05:38

Copilot started reviewing on behalf of VanitasCodes December 23, 2025 05:40 View session

Copilot AI reviewed Dec 23, 2025

View reviewed changes

VanitasCodes mentioned this pull request Dec 23, 2025

nx.optimal_edit_paths not finding optimal #8099

Open

VanitasCodes force-pushed the fix-optimal-edit-paths-8099 branch from 0f2d550 to d8b83ed Compare December 23, 2025 06:09

Fix optimal_edit_paths not finding optimal with tied node costs (netw…

5f7e9b6

…orkx#8099)

VanitasCodes force-pushed the fix-optimal-edit-paths-8099 branch from d8b83ed to 5f7e9b6 Compare December 23, 2025 06:41

amcandio added the type: Bug fix label Dec 23, 2025

amcandio reviewed Dec 23, 2025

View reviewed changes

networkx/algorithms/tests/test_similarity.py Outdated

else:

return 2.2

def make_clique(nodes):

Copy link

Contributor

amcandio Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nx.complete_graph

Address review: use nx.complete_graph and add property-based tests

5f6e8b9

amcandio reviewed Dec 24, 2025

View reviewed changes

Address review: drop redundant test, use set, improve randomization

2ed407f

	assert 1 not in result_vector, "Source node should not be in results"
	assert 1 not in result_vector, "Source node should not be in results"


		if prune(matched_cost + Cv.ls + localCe.ls + Ce_xy.ls):
		continue


		# Calculate total cost estimate for sorting
		total_cost_estimate = Cv.C[i, j] + localCe.ls + Ce_xy.ls

Uh oh!

Fix optimal_edit_paths not finding optimal with tied node costs #8421

Are you sure you want to change the base?

Fix optimal_edit_paths not finding optimal with tied node costs #8421

Conversation

VanitasCodes commented Dec 23, 2025

Problem

Solution

Changes

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio commented Dec 23, 2025

Uh oh!

amcandio Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

VanitasCodes commented Dec 23, 2025

Uh oh!

amcandio Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

amcandio left a comment

Choose a reason for hiding this comment

Uh oh!

VanitasCodes commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants