Skip to content

Commit dc055a8

Browse files
authored
GUFA: Fix a nondeterminism bug by pre-filtering (#7331)
The repeated "merge new content in, and filter based on the location it arrives to" operation is non-commutative, and it turns out there was a corner case we missed. Filtering before and after is enough to make us return the same result with the ordering swapped. This does make GUFA 5% slower, unfortunately. But this does result in better code in some cases aside from fixing the nondeterminism. Also add clearer comments about the problem here. We likely need to just make the order of operations here deterministic (though a downside of that is that the fuzzer wouldn't find bugs like this, and it would be slower).
1 parent 9370310 commit dc055a8

File tree

2 files changed

+111
-21
lines changed

2 files changed

+111
-21
lines changed

src/ir/possible-contents.cpp

Lines changed: 41 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -2413,23 +2413,35 @@ bool Flower::updateContents(LocationIndex locationIndex,
24132413
auto location = getLocation(locationIndex);
24142414

24152415
// Handle special cases: Some locations can only contain certain contents, so
2416-
// filter accordingly. In principle we need to filter both before and after
2417-
// combining with existing content; filtering afterwards is obviously
2418-
// necessary as combining two things will create something larger than both,
2419-
// and our representation has limitations (e.g. two different ref types will
2420-
// result in a cone, potentially a very large one). Filtering beforehand is
2421-
// necessary for the a more subtle reason: consider a location that contains
2422-
// an i8 which is sent a 0 and then 0x100. If we filter only after, then we'd
2423-
// combine 0 and 0x100 first and get "unknown integer"; only by filtering
2424-
// 0x100 to 0 beforehand (since 0x100 & 0xff => 0) will we combine 0 and 0 and
2425-
// not change anything, which is correct.
2416+
// filter accordingly. For example, if anyref arrives to a non-nullable
2417+
// location, we know it must be (ref any). As a result, each time we update
2418+
// the contents at a location we are both merging in the new contents, and
2419+
// filtering based on what we know of the location.
24262420
//
2427-
// For efficiency reasons we aim to only filter once, depending on the type of
2428-
// filtering. Most can be filtered a single time afterwards, while for data
2429-
// locations, where the issue is packed integer fields, it's necessary to do
2430-
// it before as we've mentioned, and also sufficient (see details in
2431-
// filterDataContents).
2421+
// The operation of merging in new content and also filtering is *not*
2422+
// commutative. Set intersection and union of course is, but the shapes we
2423+
// work with here are limited, e.g. we have cones which include all children
2424+
// up to a fixed depth (and not specific children or each with a different
2425+
// depth). For example, if we start e.g. with a ref.func literal, and a
2426+
// ref.null arrives, then merging results in a cone that allows null, as that
2427+
// is the best shape we have that includes both. If the location is non-
2428+
// nullable then the cone becomes non-nullable, so we ended up with something
2429+
// worse than the original ref.func literal. In contrast, if we filtered the
2430+
// new contents first, the null would vanish (as no null is possible in the
2431+
// non-nullable location), so that order ends up better.
2432+
//
2433+
// For those reasons we filter the new contents arriving and also the merged
2434+
// contents afterwards, to try to get the best results. This also avoids some
2435+
// nondeterminism hazards with different orders. TODO: This does not avoid
2436+
// them all, in principle, due to lack of commutativity. Using a deterministic
2437+
// order (like abstract interpretation) would fix that.
24322438
if (auto* dataLoc = std::get_if<DataLocation>(&location)) {
2439+
// Filtering data contents is especially important to do before, and not
2440+
// necessary afterwards. For example, imagine a location that contains an
2441+
// i8 which is sent a 0 and then 0x100. If we filter only after, then we'd
2442+
// combine 0 and 0x100 first and get "unknown integer"; only by filtering
2443+
// 0x100 to 0 beforehand (since 0x100 & 0xff => 0) will we combine 0 and 0
2444+
// and not change anything, which is best.
24332445
filterDataContents(newContents, *dataLoc);
24342446
#if defined(POSSIBLE_CONTENTS_DEBUG) && POSSIBLE_CONTENTS_DEBUG >= 2
24352447
std::cout << " pre-filtered data contents:\n";
@@ -2438,21 +2450,29 @@ bool Flower::updateContents(LocationIndex locationIndex,
24382450
#endif
24392451
} else if (auto* exprLoc = std::get_if<ExpressionLocation>(&location)) {
24402452
if (exprLoc->expr->is<StructGet>() || exprLoc->expr->is<ArrayGet>()) {
2441-
// Packed data reads must be filtered before the combine() operation, as
2442-
// we must only combine the filtered contents (e.g. if 0xff arrives which
2443-
// as a signed read is truly 0xffffffff then we cannot first combine the
2444-
// existing 0xffffffff with the new 0xff, as they are different, and the
2445-
// result will no longer be a constant). There is no need to filter atomic
2446-
// RMW operations here because they always do unsigned reads.
2453+
// As mentioned above, data locations can have packed reads, which require
2454+
// filtering before. Note that there is no need to filter atomic RMW
2455+
// operations here because they always do unsigned reads.
24472456
filterPackedDataReads(newContents, *exprLoc);
24482457
#if defined(POSSIBLE_CONTENTS_DEBUG) && POSSIBLE_CONTENTS_DEBUG >= 2
24492458
std::cout << " pre-filtered packed read contents:\n";
24502459
newContents.dump(std::cout, &wasm);
24512460
std::cout << '\n';
24522461
#endif
24532462
}
2463+
2464+
// Generic filtering. We do this both before and after.
2465+
//
2466+
// The outcome of this filtering does not affect whether it is worth sending
2467+
// more later (we compute that at the end), so use a temp out var for that.
2468+
bool worthSendingMoreTemp = true;
2469+
filterExpressionContents(newContents, *exprLoc, worthSendingMoreTemp);
2470+
} else if (auto* globalLoc = std::get_if<GlobalLocation>(&location)) {
2471+
// Generic filtering. We do this both before and after.
2472+
filterGlobalContents(newContents, *globalLoc);
24542473
}
24552474

2475+
// After filtering newContents, combine it onto the existing contents.
24562476
contents.combine(newContents);
24572477

24582478
if (contents.isNone()) {

test/lit/passes/gufa-cast-all.wast

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -284,3 +284,73 @@
284284
)
285285
)
286286

287+
;; Test pre-filtering.
288+
(module
289+
;; CHECK: (type $A (func))
290+
(type $A (func))
291+
292+
;; CHECK: (import "a" "b" (global $global i32))
293+
(import "a" "b" (global $global i32))
294+
295+
;; CHECK: (elem declare func $test)
296+
297+
;; CHECK: (func $test (type $A)
298+
;; CHECK-NEXT: (drop
299+
;; CHECK-NEXT: (block (result (ref $A))
300+
;; CHECK-NEXT: (drop
301+
;; CHECK-NEXT: (block $block (result (ref $A))
302+
;; CHECK-NEXT: (drop
303+
;; CHECK-NEXT: (block (result (ref $A))
304+
;; CHECK-NEXT: (drop
305+
;; CHECK-NEXT: (br_if $block
306+
;; CHECK-NEXT: (ref.func $test)
307+
;; CHECK-NEXT: (global.get $global)
308+
;; CHECK-NEXT: )
309+
;; CHECK-NEXT: )
310+
;; CHECK-NEXT: (ref.func $test)
311+
;; CHECK-NEXT: )
312+
;; CHECK-NEXT: )
313+
;; CHECK-NEXT: (br_on_non_null $block
314+
;; CHECK-NEXT: (ref.null nofunc)
315+
;; CHECK-NEXT: )
316+
;; CHECK-NEXT: (unreachable)
317+
;; CHECK-NEXT: )
318+
;; CHECK-NEXT: )
319+
;; CHECK-NEXT: (ref.func $test)
320+
;; CHECK-NEXT: )
321+
;; CHECK-NEXT: )
322+
;; CHECK-NEXT: )
323+
(func $test (type $A)
324+
;; This block is declared as having type $A. Two values appear to reach it:
325+
;; one from a br that sends a ref.func, and one from a br_on_non_null which
326+
;; sends a null with the type (ref nofunc) (in practice that branch is not
327+
;; taken, of course, but GUFA does see all branches; later optimizations
328+
;; would optimize the branch away).
329+
;;
330+
;; We see the ref.func first, so the block $block begins with that content.
331+
;; Then we see the null arrive. Immediately combining the null with a
332+
;; ref.func would give a cone - the best shape we have that can allow both a
333+
;; null and a ref.func. If we later filter the result to the block, which is
334+
;; non-nullable, the cone becomes non-nullable too - but it is a cone now,
335+
;; and not the original ref.func, preventing us from applying the constant
336+
;; value of the ref.func in the output. Early filtering of the arriving
337+
;; content fixes this: the null is immediately filtered into nothing, since
338+
;; it is null and the location can only contain non-nullable contents. As a
339+
;; result, we can optimize the block (and the br_if) to return a ref.func.
340+
(drop
341+
(block $block (result (ref $A))
342+
(drop
343+
(br_if $block
344+
(ref.func $test)
345+
(global.get $global)
346+
)
347+
)
348+
(br_on_non_null $block
349+
(ref.null nofunc)
350+
)
351+
(unreachable)
352+
)
353+
)
354+
)
355+
)
356+

0 commit comments

Comments
 (0)