
[pysrc2cpg] Lower match pattern bindings into proper AST assignments#5918

Open

allsmog wants to merge 1 commit into joernio:master from allsmog:py/match-pattern-bindings


@allsmog (Contributor) commented Apr 2, 2026

Follow-up to the discussion in #5910, where the reviewer correctly noted that emitting bare pattern variables as statements does not encode the destructuring semantics.

This PR lowers match patterns into assignment nodes using the same index access and assignment primitives that tuple unpacking already uses (createIndexAccess, createAssignmentToIdentifier, getUnusedName).

Lowering examples

| Pattern | Lowering |
|---|---|
| `case [a, b]:` | `a = subject[0]`, `b = subject[1]` |
| `case x:` (catch-all) | `x = subject` |
| `case [a, b] as whole:` | recurse into `[a, b]`, then `whole = subject` |
| `case {"key": val}:` | `val = subject["key"]` |
| `case Point(x=a, y=b):` | `a = subject.x`, `b = subject.y` |
| `case 42:` / `case _:` | no assignments (no bindings) |
| complex subject | temp variable to avoid re-evaluation |

JumpTarget nodes are preserved for CfgCreator compatibility. Nested patterns use temp variables following the same convention as nested tuple unpacking in createValueToTargetsDecomposition.

Test plan

  • 11 MatchCpgTests: AST structure, CFG, sequence bindings, catch-all, wildcard, literal, complex subject, alias
  • 2 DataFlowTests: taint flows from a parameter to a sink through a sequence-pattern binding and through a catch-all binding
  • Full pysrc2cpg suite: 504 tests pass
  • E2E: built pysrc2cpg, generated a CPG from Python match code, and verified that data-flow traces pass through pattern bindings

Replaces the string-only pattern representation in match/case blocks
with proper AST nodes that encode destructuring semantics, enabling
data flow tracking through pattern-bound variables.

For each match pattern type, generate assignment nodes using the same
index access and assignment primitives as tuple unpacking:

- MatchSequence [a, b]: a = subject[0], b = subject[1]
- MatchAs (catch-all): x = subject
- MatchAs (alias): recurse + whole = subject
- MatchMapping: name = subject[key]
- MatchClass: positional index + keyword field access
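- MatchOr: process only the first alternative (Python requires every alternative to bind the same names)
- MatchStar: rest = subject (simplified flow: the starred name is bound to the whole subject, not to the sub-list it captures at runtime)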
- Complex subjects: temp variable to avoid re-evaluation

JumpTarget nodes are preserved for CfgCreator compatibility.
Nested patterns use temp variables like tuple unpacking does.