Skip to content

C++: Speed up the cpp/unbounded-write query for an upcoming change #18485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 14, 2025

Conversation

MathiasVP
Copy link
Contributor

This fixes a performance problem in #18017 on some projects compiled as C code (where we infer many new guards on #18017 which causes the barrier to explode in size).

This PR adds a very cheap pruning phase to the cpp/unbounded-write query. The "cheapness" comes from the fact that we only evaluate the first stage of dataflow. This provides enough pruning to make the performance impact of #18017 negligible (compare this run with this run).

@Copilot Copilot AI review requested due to automatic review settings January 13, 2025 16:10
@MathiasVP MathiasVP requested a review from a team as a code owner January 13, 2025 16:10
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (2)
  • cpp/ql/lib/semmle/code/cpp/controlflow/IRGuards.qll: Language not supported
  • cpp/ql/src/Security/CWE/CWE-120/UnboundedWrite.ql: Language not supported

Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more

@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jan 13, 2025
@github-actions github-actions bot added the C++ label Jan 13, 2025
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

* To find the specific guard that performs the comparison
* use `IRGuards.comparesLt`.
*/
predicate comparesLt(Operand left, Operand right, int k, boolean isLt, AbstractValue value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want this predicate and the one below, given that they just forward arguments to compares_eq and compares_lt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the fanout from g to g.comparesLt(...) is quite large on certain repositories, so we want to create a way to compute a version of comparesLt that can be used to prune the set of guards we want to look at early (in the interestingLessThanOrEqual predicate below).

So the important part is that there's no GuardCondition as a column to this predicate (as that is what's giving the large fan-out). We can include the ValueNumber (i.e., the first column from compares_lt) that is currently omitted from this predicate, but since it wasn't needed for the pruning in UnboundedWrite I decided to leave it out to not expose that implementation detail.

predicate interestingLessThanOrEqual(Operand left) {
exists(DataFlowImplCommon::NodeEx node |
node.asNode().asOperand() = left and
BarrierFlow::Stages::Stage1::sinkNode(node, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding/guessing is that stage one does some cheap flow starting at sources to find sink that could be interesting, and we are using that capability to prune down barriers as well.

Is this something that the data flow library could actually do itself? Prune down barriers in this way. Or could it be bad to do more generally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the lessThanOrEqual predicate itself blows up due to the fanout from g in g.comparesLt(left, _, _, true, branch) (or g.comparesEq(left, _, _, true, branch)) on some projects containing a function with many guards that guard equivalent expressions (so that there are many guards for the same left).

As an alternative we could've inlined lessThanOrEqual and isBarrier, and then we'd most likely get to a point where the resulting predicate was restricted by an internal call inside the dataflow library that ensures that the node being barrier'd is part of the path from a source to a sink, but this more robust 🙂

@MathiasVP MathiasVP merged commit aa55b8e into github:main Jan 14, 2025
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants