Skip to content

ECO: Add C++ Graph Edit Distance implementation with MCS optimization#3813

Open
alirezazd wants to merge 12 commits intogoogle:mainfrom
alirezazd:mcs_ged
Open

ECO: Add C++ Graph Edit Distance implementation with MCS optimization#3813
alirezazd wants to merge 12 commits intogoogle:mainfrom
alirezazd:mcs_ged

Conversation

@alirezazd
Copy link

@alirezazd alirezazd commented Feb 9, 2026

This PR adds a C++ implementation of Graph Edit Distance (GED) for the ECO system, replacing the existing Python/NetworkX approach.

What's included

  • Core GED algorithm with LAP solver integration
  • MCS (Maximum Common Subgraph) preprocessing for performance (~10-100x speedup)
  • Graph data structures for representing XLS IR
  • CLI tool (ged_main) and IR patch generation
  • Bazel build rules and integration tests
  • Third-party dependencies organized in libs/ (tinyxml2, scipy_lap)

Regarding the new C++ GED

The Python version works but is slow on large graphs. This C++ implementation is faster, integrates directly with XLS IR APIs, and supports advanced features like pinned nodes and MCS optimization.

Migration

Python tools (ir_diff_main.py, ir2gxl.py) are marked for deprecation but still functional. They'll be removed in a future PR once the C++ version is proven stable.

Testing

Builds cleanly and includes tests with real DSLX examples (crc32, apfloat_fmac, riscv_simple, etc).

@grebe Please consider reviewing this PR. I tried to organize it into separate commits for convenience.

@google-cla
Copy link

google-cla bot commented Feb 9, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

- Move graph data structures: graph.h/cpp -> graph.h/cc
- Move GXL parser: gxl_parser.h/cpp -> gxl_parser.h/cc
- Move LAP solver: lap.h/cc -> lap_solver.h/cc
- Move GED algorithm: ged.h/cc
- Move MCS algorithm: mcs.h/cpp -> mcs.h/cc
- Move cost functions: cost_functions.h -> ged_cost_functions.h
- Move main binary: main.cpp -> ged_main.cc
- Move IR patch gen: ir_patch_gen.h/cc
- Update all header guards to XLS_ECO_* format
- Update all includes to use xls/eco/ prefix
- Rename .cpp -> .cc for XLS convention

This flattens the directory structure and makes the codebase more
consistent with XLS conventions. The mcs_ged/ directory now only
contains BUILD file and third-party libs/.
- Move tinyxml2 files into libs/tinyxml2/ subdirectory
- Add scipy_lap in libs/scipy_lap/ subdirectory
- Update BUILD paths for new lib structure
- Each third-party lib now has its own directory
- graph.h/cc: XLSGraph, XLSNode, XLSEdge for representing XLS IR as graphs
- gxl_parser.h/cc: Parse GXL (Graph eXchange Language) format
- Foundation for cpp graph edit distance algorithm
- lap_solver.cc: Wrapper for scipy LAP solver
- Used by GED algorithm for optimal node matching
- ged.h/cc: Core GED algorithm with cost matrix and LAP integration
- Supports node/edge substitution, insertion, deletion costs
- Handles pinned nodes (nodes that must match)
- Add cc_library targets: graph, gxl_parser, lap_solver, ged, mcs
- Add cc_binary: ged_main
- Add ir_patch_gen_lib target
- All GED infrastructure now buildable
- mcs.h/cc: Find maximum common subgraph between two graphs
- Preprocessing step that accelerates GED by identifying node correspondences
- Reduces GED search space significantly
- ged_main.cc: CLI tool to compute GED between two GXL graphs
- Supports MCS preprocessing option
- Outputs edit operations and cost
- ir_patch_gen.h/cc: Convert GED results to IR patch protobuf
- ir_patch.proto: Updated proto definitions
- patch_ir.h/cc/main.cc: Apply patches to XLS IR
- ir_patch_gen.py: Python utilities for patch generation
- Replace requirement() with @xls_pip_deps// for old Python tools
- Update Receive node construction to match new upstream API signature
- Add payload_type parameter required by new Receive constructor
This commit contains two types of changes:
1. Upstream updates to ir2nx.py merged during rebase (large diff)
2. Our deprecation TODOs added to Python GED toolchain:
   - ir_diff.py: NetworkX GED → ged.h/cc (10-100x faster)
   - ir_diff_main.py: Python tool → ged_main.cc CLI
   - ir_patch_gen.py: Python patcher → patch_ir.h/cc
   - ir2nx.py: NetworkX converter → xls_ir_to_cytoscape.cc
   - ir2gxl.py: GXL exporter → gxl_parser.h/cc (from upstream)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant