Python: remove assignments handled by capture library #15255

Draft
wants to merge 1 commit into
base: main
from

Conversation

yoff
Contributor

@yoff yoff commented Jan 8, 2024

Addresses https://github.com/github/codeql-python-team/issues/764

The following quick query, run on the top 100 Python projects, consistently reported that about 1% of assignments can be removed:

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.internal.VariableCapture

predicate superflous(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
  exists(AssignmentDefinition def |
    nodeFrom.(DataFlow::CfgNode).getNode() = def.getValue() and
    nodeTo.(DataFlow::CfgNode).getNode() = def.getDefiningNode() and
    exists(CapturedVariable v | v.getAStore() = nodeTo.asExpr())
  )
}

predicate allConsidered(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
  exists(AssignmentDefinition def |
    nodeFrom.(DataFlow::CfgNode).getNode() = def.getValue() and
    nodeTo.(DataFlow::CfgNode).getNode() = def.getDefiningNode()
  )
}

from int s, int a, int p
where
  s = count(DataFlow::Node nodeFrom, DataFlow::Node nodeTo | superflous(nodeFrom, nodeTo)) and
  a = count(DataFlow::Node nodeFrom, DataFlow::Node nodeTo | allConsidered(nodeFrom, nodeTo)) and
  p = 100 * s / a
select s, a, p

@yoff yoff added the "Awaiting evaluation" label (Do not merge yet, this PR is waiting for an evaluation to finish) Jan 8, 2024
@github-actions github-actions bot added the Python label Jan 8, 2024
@yoff yoff marked this pull request as ready for review January 10, 2024 12:43
@yoff yoff requested a review from a team as a code owner January 10, 2024 12:43
@yoff
Contributor Author

yoff commented Jan 10, 2024

Evaluation did not show any improvements, so this is a clean-up more than an optimisation...

@yoff yoff added the "no-change-note-required" label (This PR does not need a change note) Jan 10, 2024
@RasmusWL
Member

I don't understand the rationale behind this. Can you please highlight what problem this solves?

Doing a quick eval of superflous on django, the first result I got looks like something we would want to keep no matter what.

@yoff
Contributor Author

yoff commented Jan 16, 2024

> I don't understand the rationale behind this. Can you please highlight what problem this solves?

Now that we are using the variable capture library, these assignments are modelled twice:

  • as regular assignments by our standard modelling
  • as stores into the capturing heap by the variable capture library

This is at best unnecessary and at worst a performance problem (the DCA run indicates that it is mostly the former, but it could be the latter in pathological cases).
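To make "captured variable" concrete, here is a minimal Python sketch of the kind of assignment under discussion. It only illustrates CPython's runtime notion of capture (cell variables), which is an analogy and not the CodeQL model; `make_counter`, `count`, and `step` are hypothetical names chosen for illustration.

```python
def make_counter():
    count = 0  # assigned here, then captured by `increment` below
    step = 1   # never referenced by a nested function: an ordinary local

    def increment():
        nonlocal count
        count += 1  # a store into the captured variable
        return count

    return increment


# CPython records variables captured by nested functions as "cell variables".
captured = make_counter.__code__.co_cellvars
print("count" in captured)  # count is captured
print("step" in captured)   # step is not
```

In this analogy, `count = 0` is the sort of assignment that would be seen both as a regular assignment and as a store for the capturing function, while `step = 1` would only be seen as a regular assignment.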

> Doing a quick eval of superflous on django, the first result I got looks like something we would want to keep no matter what.

They should all be relevant, and we do want to keep modelling them. But modelling them once should be enough.

@RasmusWL
Member

> Now that we are using the variable capture library, these assignments are modelled twice:
>
>   • as regular assignments by our standard modelling
>   • as stores into the capturing heap by the variable capture library

OK, so for the code below you want to remove the local flow from 1 to x, since flow for x will be handled by the variable capture library.

Do we have cases where this also affects the correctness of our analysis? That is, without this change I assume we would have flow from 1 to the last use of x -- will that still be the case?

def outer():
    x = 1
    def inner():
        nonlocal x
        x = 2
        print(x)
    print(x) # <-- flow from 1 to this x is ok
    inner()
    print(x) # <-- flow from 1 to this x is not what happens in execution
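For reference, a minimal runnable variant of the snippet above shows what execution actually produces. The `seen` list replaces the `print` calls and is an addition for illustration only; everything else follows the original.

```python
def outer():
    seen = []  # hypothetical helper: records each value that would be printed
    x = 1

    def inner():
        nonlocal x
        x = 2
        seen.append(x)

    seen.append(x)  # first use: x is still 1
    inner()         # rebinds x to 2 through the nonlocal declaration
    seen.append(x)  # last use: x is now 2, so the value 1 does not reach it
    return seen


print(outer())  # [1, 2, 2]
```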

AHA! I had forgotten the detail that our simple local flow step is defined as:

predicate simpleLocalFlowStep(Node nodeFrom, Node nodeTo) {
  simpleLocalFlowStepForTypetracking(nodeFrom, nodeTo)
  or
  summaryFlowSteps(nodeFrom, nodeTo)
  or
  variableCaptureLocalFlowStep(nodeFrom, nodeTo)
}

So there are "new" local flow steps added through the variable capture library. Do these cover all the flow we had before? Did you run any queries to verify that we did not lose any flow?

(what I could see happening is that if we lose local flow, our type-tracking won't be as good, which will certainly be a problem)
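The "did we lose any flow" question can be sketched as a comparison of reachable pairs before and after removing a step. The following is a toy Python model under stated assumptions, not how CodeQL evaluates flow: the node names, the edge sets, and the helpers `reachable_pairs` and `lost_flow` are all hypothetical, and a real check would be a QL query run against the two versions of the library.

```python
def reachable_pairs(edges):
    """Return every (source, sink) pair connected by one or more steps."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    pairs = set()
    for start in graph:
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((start, node))
            stack.extend(graph.get(node, ()))
    return pairs


def lost_flow(before_edges, after_edges):
    """Pairs that were reachable before the change but not after it."""
    return reachable_pairs(before_edges) - reachable_pairs(after_edges)


# Hypothetical step sets standing in for the disjuncts of the local flow step.
assignment_step = ("value_1", "x_def")
typetracking_steps = {assignment_step, ("x_def", "x_use")}
capture_steps = {assignment_step}  # assumption: capture also provides this step

before = typetracking_steps | capture_steps
after_covered = (typetracking_steps - {assignment_step}) | capture_steps
after_uncovered = typetracking_steps - {assignment_step}

print(lost_flow(before, after_covered))    # empty: the step is still covered
print(lost_flow(before, after_uncovered))  # non-empty: flow from value_1 is lost
```

If the capture steps cover every removed assignment step, the difference is empty; any non-empty result points at flow (and possibly type-tracking results) that the change would drop.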

@RasmusWL RasmusWL marked this pull request as draft March 11, 2024 12:16