GUFA: Fix a nondeterminism bug by pre-filtering (#7331)

kripken · web-flow · commit dc055a8ebae5 · 2025-03-05T09:27:06.000-08:00
The repeated "merge new content in, and filter based on the location it
arrives at" operation is non-commutative, and it turns out there was a
corner case we missed. Filtering the new content before merging, as well
as the merged result after, is enough to make us return the same result
there with the ordering swapped.

Unfortunately, this makes GUFA 5% slower. However, aside from fixing the
nondeterminism, it also results in better code in some cases.

Also add clearer comments about the problem here. We likely need to
just make the order of operations here deterministic (though a downside
of that is that the fuzzer wouldn't find bugs like this, and it would be
slower).
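
To illustrate the non-commutativity described above, here is a minimal
sketch of the ref.func/null corner case. It is a toy model, not the
actual PossibleContents code: Toy, makeNull, makeLiteral, makeCone,
combine, filterNonNullable, updateFilterAfter, updateFilterBoth and
checkToyOrders are all hypothetical names invented for this example,
and the shapes are far simpler than the real ones.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for GUFA's contents (hypothetical, greatly simplified).
// Shapes: nothing, a null, one function literal, or a cone of functions.
struct Toy {
  enum Kind { None, Null, Literal, Cone };
  Kind kind = None;
  std::string func;      // Only meaningful for Literal.
  bool nullable = false; // Only meaningful for Cone.

  bool operator==(const Toy& other) const {
    return kind == other.kind && func == other.func &&
           nullable == other.nullable;
  }
};

Toy makeNull() { Toy t; t.kind = Toy::Null; return t; }
Toy makeLiteral(const std::string& func) {
  Toy t; t.kind = Toy::Literal; t.func = func; return t;
}
Toy makeCone(bool nullable) {
  Toy t; t.kind = Toy::Cone; t.nullable = nullable; return t;
}

// Merge two contents into the best shape we have that covers both.
Toy combine(const Toy& a, const Toy& b) {
  if (a.kind == Toy::None) return b;
  if (b.kind == Toy::None) return a;
  if (a == b) return a;
  // No exact shape exists for a mix, so fall back to a cone.
  bool nullable = a.kind == Toy::Null || b.kind == Toy::Null ||
                  (a.kind == Toy::Cone && a.nullable) ||
                  (b.kind == Toy::Cone && b.nullable);
  return makeCone(nullable);
}

// Filter contents for a location that cannot contain null.
Toy filterNonNullable(Toy c) {
  if (c.kind == Toy::Null) return Toy();
  if (c.kind == Toy::Cone) c.nullable = false;
  return c;
}

// Old behavior in this model: merge, then filter the merged result.
Toy updateFilterAfter(const Toy& existing, const Toy& arriving) {
  return filterNonNullable(combine(existing, arriving));
}

// New behavior in this model: filter the arriving contents, merge, and
// filter the merged result as well.
Toy updateFilterBoth(const Toy& existing, const Toy& arriving) {
  return filterNonNullable(combine(existing, filterNonNullable(arriving)));
}

// Send a ref.func literal and a null to an empty non-nullable location,
// in both orders, and compare the outcomes.
void checkToyOrders() {
  Toy lit = makeLiteral("test");
  Toy nullRef = makeNull();
  // Filtering only afterwards depends on the arrival order.
  assert(updateFilterAfter(updateFilterAfter(Toy(), lit), nullRef).kind ==
         Toy::Cone);
  assert(updateFilterAfter(updateFilterAfter(Toy(), nullRef), lit) == lit);
  // Filtering before and after gives the literal in both orders.
  assert(updateFilterBoth(updateFilterBoth(Toy(), lit), nullRef) == lit);
  assert(updateFilterBoth(updateFilterBoth(Toy(), nullRef), lit) == lit);
}
```

Calling checkToyOrders() from a main function exercises both orders. In
this model the pre-filter removes the null before it can widen the
literal into a cone, which is the effect the new test relies on.
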
diff --git a/src/ir/possible-contents.cpp b/src/ir/possible-contents.cpp
@@ -2413,23 +2413,35 @@ bool Flower::updateContents(LocationIndex locationIndex,
   auto location = getLocation(locationIndex);
 
   // Handle special cases: Some locations can only contain certain contents, so
-  // filter accordingly. In principle we need to filter both before and after
-  // combining with existing content; filtering afterwards is obviously
-  // necessary as combining two things will create something larger than both,
-  // and our representation has limitations (e.g. two different ref types will
-  // result in a cone, potentially a very large one). Filtering beforehand is
-  // necessary for the a more subtle reason: consider a location that contains
-  // an i8 which is sent a 0 and then 0x100. If we filter only after, then we'd
-  // combine 0 and 0x100 first and get "unknown integer"; only by filtering
-  // 0x100 to 0 beforehand (since 0x100 & 0xff => 0) will we combine 0 and 0 and
-  // not change anything, which is correct.
+  // filter accordingly. For example, if anyref arrives at a non-nullable
+  // location, we know it must be (ref any). As a result, each time we update
+  // the contents at a location we are both merging in the new contents, and
+  // filtering based on what we know of the location.
   //
-  // For efficiency reasons we aim to only filter once, depending on the type of
-  // filtering. Most can be filtered a single time afterwards, while for data
-  // locations, where the issue is packed integer fields, it's necessary to do
-  // it before as we've mentioned, and also sufficient (see details in
-  // filterDataContents).
+  // The operation of merging in new content and also filtering is *not*
+  // commutative. Set intersection and union of course are, but the shapes we
+  // work with here are limited, e.g. we have cones which include all children
+  // up to a fixed depth (and not specific children or each with a different
+  // depth). For example, if we start with a ref.func literal, and a
+  // ref.null arrives, then merging results in a cone that allows null, as that
+  // is the best shape we have that includes both. If the location is non-
+  // nullable then the cone becomes non-nullable, so we ended up with something
+  // worse than the original ref.func literal. In contrast, if we filtered the
+  // new contents first, the null would vanish (as no null is possible in the
+  // non-nullable location), so that order ends up better.
+  //
+  // For those reasons we filter the new contents arriving and also the merged
+  // contents afterwards, to try to get the best results. This also avoids some
+  // nondeterminism hazards with different orders. TODO: This does not avoid
+  // them all, in principle, due to lack of commutativity. Using a deterministic
+  // order (like abstract interpretation) would fix that.
   if (auto* dataLoc = std::get_if<DataLocation>(&location)) {
+    // Filtering data contents is especially important to do before, and not
+    // necessary afterwards. For example, imagine a location that contains an
+    // i8 which is sent a 0 and then 0x100. If we filter only after, then we'd
+    // combine 0 and 0x100 first and get "unknown integer"; only by filtering
+    // 0x100 to 0 beforehand (since 0x100 & 0xff => 0) will we combine 0 and 0
+    // and not change anything, which is best.
     filterDataContents(newContents, *dataLoc);
 #if defined(POSSIBLE_CONTENTS_DEBUG) && POSSIBLE_CONTENTS_DEBUG >= 2
     std::cout << "  pre-filtered data contents:\n";
@@ -2438,21 +2450,29 @@ bool Flower::updateContents(LocationIndex locationIndex,
 #endif
   } else if (auto* exprLoc = std::get_if<ExpressionLocation>(&location)) {
     if (exprLoc->expr->is<StructGet>() || exprLoc->expr->is<ArrayGet>()) {
-      // Packed data reads must be filtered before the combine() operation, as
-      // we must only combine the filtered contents (e.g. if 0xff arrives which
-      // as a signed read is truly 0xffffffff then we cannot first combine the
-      // existing 0xffffffff with the new 0xff, as they are different, and the
-      // result will no longer be a constant). There is no need to filter atomic
-      // RMW operations here because they always do unsigned reads.
+      // As mentioned above, data locations can have packed reads, which require
+      // filtering before. Note that there is no need to filter atomic RMW
+      // operations here because they always do unsigned reads.
       filterPackedDataReads(newContents, *exprLoc);
 #if defined(POSSIBLE_CONTENTS_DEBUG) && POSSIBLE_CONTENTS_DEBUG >= 2
       std::cout << "  pre-filtered packed read contents:\n";
       newContents.dump(std::cout, &wasm);
       std::cout << '\n';
 #endif
     }
+
+    // Generic filtering. We do this both before and after.
+    //
+    // The outcome of this filtering does not affect whether it is worth sending
+    // more later (we compute that at the end), so use a temp out var for that.
+    bool worthSendingMoreTemp = true;
+    filterExpressionContents(newContents, *exprLoc, worthSendingMoreTemp);
+  } else if (auto* globalLoc = std::get_if<GlobalLocation>(&location)) {
+    // Generic filtering. We do this both before and after.
+    filterGlobalContents(newContents, *globalLoc);
   }
 
+  // After filtering newContents, combine it onto the existing contents.
   contents.combine(newContents);
 
   if (contents.isNone()) {
diff --git a/test/lit/passes/gufa-cast-all.wast b/test/lit/passes/gufa-cast-all.wast
@@ -284,3 +284,73 @@
   )
 )
 
+;; Test pre-filtering.
+(module
+  ;; CHECK:      (type $A (func))
+  (type $A (func))
+
+  ;; CHECK:      (import "a" "b" (global $global i32))
+  (import "a" "b" (global $global i32))
+
+  ;; CHECK:      (elem declare func $test)
+
+  ;; CHECK:      (func $test (type $A)
+  ;; CHECK-NEXT:  (drop
+  ;; CHECK-NEXT:   (block (result (ref $A))
+  ;; CHECK-NEXT:    (drop
+  ;; CHECK-NEXT:     (block $block (result (ref $A))
+  ;; CHECK-NEXT:      (drop
+  ;; CHECK-NEXT:       (block (result (ref $A))
+  ;; CHECK-NEXT:        (drop
+  ;; CHECK-NEXT:         (br_if $block
+  ;; CHECK-NEXT:          (ref.func $test)
+  ;; CHECK-NEXT:          (global.get $global)
+  ;; CHECK-NEXT:         )
+  ;; CHECK-NEXT:        )
+  ;; CHECK-NEXT:        (ref.func $test)
+  ;; CHECK-NEXT:       )
+  ;; CHECK-NEXT:      )
+  ;; CHECK-NEXT:      (br_on_non_null $block
+  ;; CHECK-NEXT:       (ref.null nofunc)
+  ;; CHECK-NEXT:      )
+  ;; CHECK-NEXT:      (unreachable)
+  ;; CHECK-NEXT:     )
+  ;; CHECK-NEXT:    )
+  ;; CHECK-NEXT:    (ref.func $test)
+  ;; CHECK-NEXT:   )
+  ;; CHECK-NEXT:  )
+  ;; CHECK-NEXT: )
+  (func $test (type $A)
+    ;; This block is declared as having type $A. Two values appear to reach it:
+    ;; one from a br that sends a ref.func, and one from a br_on_non_null which
+    ;; sends a null with the type (ref nofunc) (in practice that branch is not
+    ;; taken, of course, but GUFA does see all branches; later optimizations
+    ;; would optimize the branch away).
+    ;;
+    ;; We see the ref.func first, so the block $block begins with that content.
+    ;; Then we see the null arrive. Immediately combining the null with a
+    ;; ref.func would give a cone - the best shape we have that can allow both a
+    ;; null and a ref.func. If we later filter the result to the block, which is
+    ;; non-nullable, the cone becomes non-nullable too - but it is a cone now,
+    ;; and not the original ref.func, preventing us from applying the constant
+    ;; value of the ref.func in the output. Early filtering of the arriving
+    ;; content fixes this: the null is immediately filtered into nothing, since
+    ;; it is null and the location can only contain non-nullable contents. As a
+    ;; result, we can optimize the block (and the br_if) to return a ref.func.
+    (drop
+      (block $block (result (ref $A))
+        (drop
+          (br_if $block
+            (ref.func $test)
+            (global.get $global)
+          )
+        )
+        (br_on_non_null $block
+          (ref.null nofunc)
+        )
+        (unreachable)
+      )
+    )
+  )
+)
+