
Conversation

@jacobhinkle (Collaborator) commented Jun 5, 2023

A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent of zero) during scheduling and lowering. See #264, #369, and #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that a reality by removing all intermediate empty tensors.

This PR:

  • Marks a Fusion as dynamic if dynamic reshapes/resizes exist or if any alive TensorView has a static size-zero extent or a dynamic extent, since it might be empty. This means only Fusions with nothing but concrete, non-zero sizes are static now. That is, even if all static shapes are provided, the Fusion will be marked dynamic and those TensorViews will be modified during concretization.
  • Adds a pass done during getConcretizationInfo() that collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.
  • During concretization, sets the size-0 extents of identified empty tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of Vals (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.

@jacobhinkle jacobhinkle mentioned this pull request Jun 5, 2023
@jacobhinkle (Collaborator Author)

In this approach, I've replaced extents that concretize to zero with fusion->zeroVal(), which we could use as a stand-in for having a new IterType for empty extents. Using the check for constant zero means we could avoid needing a separate IterType for size-0 reduction axes.

This might make the lookups there a bit slower, as previously these would be very small sets. However, this is necessary in order to find differences in empty-tensor concretization. Note that the actual keys won't commonly change, but this will find input vals that affect output sizes, such as in FusionStandaloneIota_CUDA, which was failing before this fix.
Comment on lines +8109 to +8110
at::empty({}, options),
at::empty({}, options),
Collaborator Author
Fixed a transient bug where these tensors were not technically set as CUDA tensors.

With this, the only failing test is due to not stopping traversal at empty tensor definitions.
auto all_stmts = StmtSort::getStmts(info_->fusion());
for (auto stmt : all_stmts) {
  if (stmt->isA<Val>()) {
    mutate(stmt);
Collaborator Author

Note we no longer traverse into members.

Collaborator

Don't remember why it did

Collaborator Author

I changed it to true in #258 so that the traversal would handle IterDomains, but I didn't have a clear enough picture of how that should work at that time.

Collaborator Author

Now we are explicitly traversing only the TensorView graph, and we handle IterDomain members manually.

Comment on lines 97 to +107
TORCH_CUDA_CU_API TensorView* pad(
    TensorView* x,
    const std::vector<Val*>& pad_widths,
    Val* value = nullptr,
    std::optional<IterType> iter_type_opt = std::nullopt);

//! Concatenate tensors in the given dimension
TORCH_CUDA_CU_API TensorView* cat(
    const std::vector<TensorView*>& inputs,
    int64_t dim,
    std::optional<IterType> iter_type_opt = std::nullopt);
Collaborator Author

These options let us pipe through an optional IterType to avoid symbolic IterTypes popping up during the RemoveEmptyPass. This is a bug that was exposed by the dynamic empty cat tests.

Comment on lines 3405 to 3408
if (output_uses.size() == 0) {
// Unused outputs terminate here
continue;
}
Collaborator Author

This fixes a segfault that popped up in the Standalone* tests in test_tensor_factories.cpp. These tests have scalar inputs which are used in FullOps as fill values. When the extents are zero, concretization replaces those FullOps with others so that the output size is hardcoded. This means the original fill value (which is now irrelevant since the output is empty) has no uses. The cast to target dtype before the full means we have input -> cast -> (no uses), which caused a segfault here. We now just check that there's at least one use to avoid a segfault.

@jacobhinkle jacobhinkle marked this pull request as ready for review July 12, 2023 20:02
@jacobhinkle (Collaborator Author)

!build

Comment on lines 265 to 267
// symbolic axis, since the inputs have potentially symbolic extents in
// the cat dimension. However, since we have already undergone
// concretization at this point, we can trust that the original IterType,
Collaborator

A little confused here. If concretization is done, why inputs may have symbolic extents?

Collaborator Author

The input extents are symbolic (e.g. i0) or they are zeroVal(), but since we use the cat command here to do the replacement, we need some way to tell it not to use IterType::Symbolic and to trust us at this point that the output will be Iteration.

Collaborator

Oh, so, this part:

      // cat() might result in
      // symbolic axis, since the inputs have potentially symbolic extents in
      // the cat dimension.

It just refers to what cat does in general, but in this case we don't want cat to be conservative as we are sure it doesn't need to use symbolic domains.

Collaborator Author

Yes exactly. It should be more clear in the comment probably.

Collaborator Author
Updated the comment.

}

if (expr->output(0)->uses().size() > 1) {
// expr is a unary op so there is a single output. Here we look at that
Collaborator

Is this a general bug fix or specific to the replacement of size-zero extents?

Collaborator Author

I think it was a general bug that was difficult to tease out. I will try to replicate it on main to be sure.

@jacobhinkle (Collaborator Author) commented Jul 13, 2023

Filed #585. I will move this fix to a small PR there for further discussion. Until that's fixed, the present PR will break all the tests in test_tensor_factories.cpp.

ir_utils::replaceValInExpr(use, ext, zero);
}
// Register the concretization of this scalar, which allows us to replace it
// whenever it is used as an extent member of an IterDomain.
Collaborator

This is because ext is "used" by an IterDomain but it isn't part of uses(), correct?

Collaborator Author

Yes exactly. When we replace in uses() we update things like i0 + 1, but iS2{i0} would remain unless registered for mutation.

Collaborator

Can you please add this to the code comment as well?

Collaborator Author

Done


auto concretized_id =
    IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>())
        .iter_type(*id_type)
        .build();
Collaborator

Is this change because the extent of root_id may be changed to 0?

Collaborator Author

Yes exactly, when we call OptOutMutator::mutate(root_id) it will register it for mutation if any of its members are mutated, including the extent. Since we are going to register another mutation here, we don't want to lose those changes, so we base concretized_id on the mutated ID.

void DynamicTransformConcretizer::mutate(TensorView* tv) {
  if (!tv->domain()->hasSymbolicAxis()) {
    return;
  }
  for (auto root_id : tv->getRootDomain()) {
Collaborator

This looks a bit confusing. A root ID may be mutated here, but there's also propagateFromProducerToConsumer below, which may also mutate a root ID. Is there any concern of conflicts?

Collaborator Author

At line 659 in propagateFromProducerToConsumer, we reflect the earlier mutation by basing the new mutated ID on maybeMutated(root_id). That way we compose both mutations.

Collaborator

Well, what happens if the same root ID is mutated both at line 546 and within line 551? Doesn't the latter just overwrite the mutation at line 546?

Collaborator

Oh, are you referring to line 659 before this PR?

Collaborator Author

It will overwrite the mutation, but using IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>()) as the basis of the new mutation lets us update the IterType without losing other mutations, like the extent.

Collaborator

OK, I see it now. Maybe it'd be helpful to add that to the code comment too.

@naoyam (Collaborator) left a comment

I'm approving this now, but let me review the segmentation change separately before merging this

@jacobhinkle (Collaborator Author)

> I'm approving this now, but let me review the segmentation change separately before merging this

Sounds good. The current fix in the segmentation PR is slightly different from what is here. I think that one makes more sense and is a bit simpler so if approved, I'll just revert the one here before merging main.

@naoyam (Collaborator) commented Jul 13, 2023

Approved the other PR as well

@jacobhinkle jacobhinkle merged commit 44bb3d6 into main Jul 14, 2023
@jacobhinkle jacobhinkle deleted the remove_empty_branches branch July 14, 2023 13:45


Development

Successfully merging this pull request may close these issues:

  • Multiple slices with one of them zero-length fails assertion
  • Floating-point exception scheduling reduction of zero-element tensor

3 participants