Conversation

@jacobhinkle jacobhinkle commented Jul 10, 2023

Concretization requires two fundamental operations:

  1. Concretize individual dynamic operations. For example, we need to set the root-to-rfactor transforms for a dynamic reshape.
  2. Propagate information downstream. This is because the IterTypes and extent expressions for IterDomains might change during step 1 but downstream expressions have already been defined using the original symbolic IterDomains. This propagation step ensures that we have no dangling symbolic IterDomains and that extent expressions are properly replaced so that everything is consistent and appears as it would have if the Fusion had been static at definition.
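The registry idea behind step 2 can be sketched as follows. This is a toy model, not nvfuser's API: `Registry` and `maybeMutated` here are illustrative stand-ins showing how downstream code can look up a symbolic ID's concretized replacement so that no dangling symbolic IterDomains remain.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of downstream propagation: concretizations are registered as a
// mapping from symbolic IDs to their replacements, and downstream
// expressions consult the registry instead of the original symbolic IDs.
// All names are illustrative, not nvfuser's actual types.
struct Registry {
  std::map<std::string, std::string> concretized;

  // Return the registered replacement if one exists, else the original ID.
  std::string maybeMutated(const std::string& id) const {
    auto it = concretized.find(id);
    return it == concretized.end() ? id : it->second;
  }
};
```

In the real code the registered values are mutated `IterDomain*` pointers rather than strings, but the lookup pattern is the same.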

The downstream propagation in the second operation needs to pass through IterDomain expressions like those found in pad or reshape ops. But it also needs to cross from producer TVs to consumers. Those jumps are more complicated to handle in mutate(IterDomain*), since we cannot use the standard definition/uses machinery to find tensor expressions linking IterDomains in this way. Instead, currently we use mutate(TensorView* tv) and call propagateProducerToConsumer() at the beginning of that method. That method finds exact mapped producer IDs using tv->definition() and PairwiseRootDomainMap and propagates their concretization information to the corresponding root IDs in tv. The root IDs are then updated and propagation along IterDomain expressions is done between root and rfactor of tv.

The problem comes when we'd like to propagate information through the root->rfactor expressions. Since we do not modify the Fusion during traversal, these expressions are fixed. However, when we call propagateProducerToConsumer() we might register a concretization of some root IterDomains, and those concretizations should ideally propagate to all of their uses. But propagateProducerToConsumer() is called in mutate(TensorView* tv), which runs after all of tv's dependencies have been mutated; in particular, tv->domain() and all IterDomains in it must be mutated before this point. StmtSort::getExprs is then called on the unmutated root and rfactor domains, requiring manual mutation of the intermediate exprs. That is why the current code performs another manual traversal from root to rfactor inside mutate(tv), which causes some unneeded complexity since those IterDomains might be mutated multiple times.

This PR addresses this by splitting the traversal into three loops, each of which is done in topological order:

  1. First we loop over all Vals that are neither TensorDomains nor TensorViews and call mutate(val) on each. We exclude TDs and TVs here since OptOutMutator::mutate actually modifies the Fusion when called on them. This loop does not modify the Fusion at all, but registers mutations for most Vals.
  2. The second loop is over all Exprs. This loop removes any expression whose inputs, outputs, or attributes have registered mutations and replaces it with a new expression linking the new Vals.
  3. The third loop is only over TensorDomains and TensorViews. TensorDomains are registered for mutation if their IterDomains were registered for mutation, at which point a new TensorDomain is created. It is important that the Fusion has properly linked root to rfactor IDs at this point, which is done in the second loop. TensorViews then have their domain() (which is mutable) set to the new TensorDomain.

In order to perform the producer to consumer jump across TensorView expressions, we first extract mappings from consumer IDs to sets of producer IDs before the traversal begins. Those sets are then looked up in mutate(IterDomain*) in the first loop.
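The shape of that pre-extracted mapping can be sketched like this. The container and helper names are illustrative assumptions (the real code builds the mapping from tv->definition() and PairwiseRootDomainMap); the point is that mutate(IterDomain*) only needs a lookup, not a walk over TensorView expressions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical shape of the pre-traversal extraction: each consumer root ID
// maps to the set of exact-mapped producer IDs whose concretization info it
// should inherit. Built once before traversal begins.
using IdSetMap = std::unordered_map<std::string, std::set<std::string>>;

// Lookup pattern used during the first loop: is there producer info to
// propagate to this consumer ID?
bool hasProducerInfo(const IdSetMap& m, const std::string& consumer_id) {
  auto it = m.find(consumer_id);
  return it != m.end() && !it->second.empty();
}
```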

Note on IterVisitor's topological ordering

Concretization, like most other traversals in nvfuser, uses IterVisitor to obtain a topologically ordered set of statements in the Fusion. This class guarantees that the statements will be in proper topological order with respect to the Fusion graph. This graph has directed edges from input Vals to Exprs, from Exprs to output Vals, from attribute Statements to Exprs, and from member Statements to Vals (such as TensorDomain (domain) -> TensorView, Val (start, stop, extent) -> IterDomain, etc.).

At first glance it seems that this is sufficient, but it does not represent the relation that the rfactor IterDomains of producer TensorViews are dependencies of their corresponding consumer root IterDomains. These relations between "aligned" IterDomains are not present in the graph since they could easily become inconsistent when replacing TensorDomains or TensorViews. However, their absence means that there may be valid topological orderings that visit consumer domains before producer domains if neither has any other dependencies.

The current implementation of IterVisitor::traverseBetween does in fact maintain the ordering we'd like, since consumer->definition() is processed before consumer->domain(). A comment is added to the definition of IterVisitor to briefly explain this.


void mutate(TensorView* tv) final;

void mutate(TensorDomain* td) final;
@jacobhinkle jacobhinkle Jul 10, 2023

The override of mutate(TensorDomain*) merely updates contiguity to reflect any introduced Broadcast IDs. This is a general situation that could probably be added to OptOutMutator::mutate(TensorDomain*) instead to further simplify the concretization code.

Comment on lines 534 to 547
if (auto def = id->definition()) {
  // Determine concrete IterType based on promotion of inputs to def
  IterType iter_type = IterType::Symbolic;
  for (auto inp_id : ir_utils::filterByType<IterDomain>(def->inputs())) {
    auto updated_id = maybeMutated(inp_id)->as<IterDomain>();
    iter_type = ops::promoteIterType(iter_type, updated_id->getIterType());
  }
  TORCH_INTERNAL_ASSERT(
      iter_type != IterType::Symbolic,
      "Failed to concretize an output IterType for expression: ",
      def->toString());
  auto concretized_id = IterDomainBuilder(id).iter_type(iter_type).build();
  registerConcretization(id, concretized_id);
} else {

This branch replaces the root->rfactor propagation that was removed from mutate(TensorView*).
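The promotion in the snippet above can be illustrated with a toy lattice. This is a simplified assumption about the semantics, not ops::promoteIterType's full rules: Symbolic acts as the identity, and Iteration dominates Broadcast, so an output is only Broadcast if every concretized input is Broadcast.

```cpp
#include <cassert>

// Illustrative IterType promotion lattice (simplified; the real rules live
// in nvfuser's ops::promoteIterType).
enum class IterType { Symbolic, Broadcast, Iteration };

IterType promote(IterType a, IterType b) {
  // Symbolic is the identity element: it carries no information yet.
  if (a == IterType::Symbolic) return b;
  if (b == IterType::Symbolic) return a;
  // Iteration dominates Broadcast.
  return (a == IterType::Iteration || b == IterType::Iteration)
      ? IterType::Iteration
      : IterType::Broadcast;
}
```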

Comment on lines 548 to 550
// IterDomains without definitions might be root domains for the output of a
// TensorView expression. If so, we should propagate their concretization in
// the producer to consumer direction.

This branch replaces propagateFromProducerToConsumer.

Comment on lines +947 to +950
// This was previously "working" by concretizing the size-1 pad to
// Iteration, even though it should be Broadcast. When set properly to
// Broadcast, it fails with an error in ConcretizedBroadcastDomains.
//{{3, 5}, {0, -4}, true},
@jacobhinkle jacobhinkle Jul 10, 2023

This case deserves its own issue, which I will add. When there is a broadcast domain introduced by concretizing a resize we hit an error since we can't concretize the broadcast. On main, this case actually concretizes the pad as Iteration even if extent is 1. Instead, we should probably translate a Resize that results in size 1 as a select+broadcast or a full(pad_value)+broadcast; however that is a complicated change since we need to operate on the TensorView containing the Resized ID, meaning we would change concretization info to track TV ops (cat, pad, slice) instead of ID op Resize. For now I have disabled this case. Once I file an issue I'll point to it here in the comment.

Comment on lines -382 to -383
if (stmt->isA<Val>()) {
mutate(stmt);
@jacobhinkle jacobhinkle Jul 10, 2023

Instead of only mutating Vals, we now mutate Exprs as well, which replaces the Expr in place if any inputs or outputs have changed. Note that outputs of Exprs are mutated after their definition has been mutated, so we should be careful updating a Val that has a definition. But of course we should be careful in that case in the existing code too.
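The replacement rule for Exprs can be sketched with toy types (the struct and predicate below are illustrative only; in nvfuser the check also covers attributes):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an expression: named inputs and outputs.
struct Expr {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// An Expr is removed and rebuilt whenever any of its inputs or outputs has
// a registered mutation.
bool needsReplacement(const Expr& e, const std::vector<std::string>& mutated) {
  auto touched = [&](const std::string& v) {
    for (const auto& m : mutated) {
      if (m == v) {
        return true;
      }
    }
    return false;
  };
  for (const auto& v : e.inputs) {
    if (touched(v)) {
      return true;
    }
  }
  for (const auto& v : e.outputs) {
    if (touched(v)) {
      return true;
    }
  }
  return false;
}
```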

@jacobhinkle jacobhinkle requested a review from naoyam July 10, 2023 17:03
@jacobhinkle jacobhinkle marked this pull request as ready for review July 10, 2023 17:04
@jacobhinkle jacobhinkle marked this pull request as draft July 12, 2023 12:44
@jacobhinkle jacobhinkle marked this pull request as ready for review July 12, 2023 14:00
Comment on lines +389 to +401
std::vector<Val*> non_tds_tvs;
std::vector<Expr*> all_exprs;
std::vector<Val*> tvs_and_tds;
for (auto stmt : StmtSort::getStmts(info_->fusion(), true)) {
  if (stmt->isExpr()) {
    all_exprs.push_back(stmt->asExpr());
  } else {
    auto val = stmt->asVal();
    if (val->isA<TensorView>() || val->isA<TensorDomain>()) {
      tvs_and_tds.push_back(val);
    } else {
      non_tds_tvs.push_back(val);
    }

What was previously a single loop over all_stmts is now three separate loops over these subsets.
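The bucketing can be mirrored with a toy, self-contained version (the enum and struct are illustrative; statements are visited once in topological order and partitioned so each of the three passes can run to completion before the next begins):

```cpp
#include <cassert>
#include <vector>

// Illustrative statement kinds; in nvfuser these are isExpr()/isA<>() checks.
enum class Kind { Val, Expr, TensorDomain, TensorView };

struct Buckets {
  std::vector<int> non_tds_tvs; // plain Vals, mutated in loop 1
  std::vector<int> all_exprs;   // Exprs, replaced in loop 2
  std::vector<int> tvs_and_tds; // TDs and TVs, handled in loop 3
};

// Single pass over topologically sorted statements, preserving order
// within each bucket.
Buckets partition(const std::vector<Kind>& stmts) {
  Buckets b;
  for (int i = 0; i < static_cast<int>(stmts.size()); ++i) {
    switch (stmts[i]) {
      case Kind::Expr:
        b.all_exprs.push_back(i);
        break;
      case Kind::TensorDomain:
      case Kind::TensorView:
        b.tvs_and_tds.push_back(i);
        break;
      default:
        b.non_tds_tvs.push_back(i);
    }
  }
  return b;
}
```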

@jacobhinkle jacobhinkle changed the title Refactor concretization to traverse in strict topo order Refactor concretization traversal Jul 12, 2023
if (updated_id->isBroadcast()) {
  contig.at(i) = std::nullopt;
}
}
@jacobhinkle jacobhinkle Jul 12, 2023

This was previously done in dynamic_transform.cpp but I think it makes sense to always recompute contig if mutating Symbolic to Broadcast.


Note that this only covers the case where the original ID was Symbolic and checks that it was marked contig then. If we mutated from an original ID with type Broadcast or Iteration we might want to fill in a different contiguity here instead.
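A minimal sketch of the fix-up, with simplified stand-in types (the real code operates on the TensorDomain's contiguity vector and mutated IterDomains): entries for dimensions that concretize to Broadcast become nullopt, since broadcast dimensions carry no contiguity flag.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Clear the contiguity flag of any dimension that concretized to Broadcast;
// other entries are left as they were.
std::vector<std::optional<bool>> updateContiguity(
    std::vector<std::optional<bool>> contig,
    const std::vector<bool>& is_broadcast) {
  for (size_t i = 0; i < contig.size(); ++i) {
    if (is_broadcast[i]) {
      contig[i] = std::nullopt;
    }
  }
  return contig;
}
```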

jacobhinkle added a commit that referenced this pull request Jul 12, 2023
@jacobhinkle jacobhinkle marked this pull request as draft July 13, 2023 11:47
@jacobhinkle

Closing in favor of #449, which was fixed without needing such an invasive refactor.

@jacobhinkle jacobhinkle deleted the concretization_topo_order branch July 25, 2023 16:39