-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C#: Restrict dataflow node creation to source and source-referenced entities #17482
C#: Restrict dataflow node creation to source and source-referenced entities #17482
Conversation
TExplicitParameterNode(Parameter p, DataFlowCallable c) { | ||
p = c.asCallable(_).(CallableUsedInSource).getAParameter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java simply uses the constraint exists(p.getCallable().getBody())
. Would that be enough in this case?
TExplicitParameterNode(Parameter p, DataFlowCallable c) { | |
p = c.asCallable(_).(CallableUsedInSource).getAParameter() | |
TExplicitParameterNode(Parameter p, DataFlowCallable c) { | |
exists(Callable c0 | c0 = c.asCallable(_) and p = c0.getAParameter() and exists(c0.getBody())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refined this further at #17483 -- my observations so far:
There are methods with no body but which are from source where we probably ought to include parameter nodes, e.g. abstract and interface methods
There are also methods that have a body but for which fromSource
doesn't hold -- synthetic methods, so far as I can tell, such as default constructors.
Therefore I've included both criteria for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are methods with no body but which are from source where we probably ought to include parameter nodes, e.g. abstract and interface methods
Why? What use are they? Those parameter nodes cannot flow to anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They indeed cannot; my concern was whether interface or abstract method parameter nodes might be relevant to models, either MaD-written or implemented in QL? The short answer is I don't know, but the cost of keeping them is probably small since they're bounded by the size of the source code, and giving nodes as-is to everything present in source is the least surprising route for any third-party code we don't get to exercise via DCA and QA.
I'm leaning towards variant #17483 since it plays things safer (creating more nodes than this variant) but still seems to have significantly beneficial performance consequences. I've now started QA, linked from that PR. |
Variant #17483 has now passed DCA and QA as noted on that PR -- I therefore propose we should merge that variant and invite reviews there. |
This addresses a problem observed when libraries define a very large number of types with large class hierarchies, for example as part of generated code. Since user code is very unlikely to refer to all or even most of them, by excluding such code from dataflow node generation and therefore from the C# dataflow hooks' idea of a "relevant" type, we can make many of the type predicates defined in DataFlowPrivate.qll very much cheaper.
#17483 is a variation on this PR that additionally considers virtual dispatch targets and accessed (but not called) callables as being used in source, and therefore meriting dataflow node creation.