
Conversation

@jacobhinkle (Collaborator) commented Jun 5, 2023

A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent of zero) during scheduling and lowering. See #264, #369, and #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that a reality by removing all intermediate empty tensors.

This PR:

  • Marks a Fusion as dynamic if dynamic reshapes/resizes exist or if any alive TensorView has a static size-zero extent or a dynamic extent, since it might be empty. This means only Fusions with nothing but concrete, non-zero sizes are static now. That is, even if all static shapes are provided, the Fusion will be marked dynamic and those TensorViews will be modified during concretization.
  • Adds a pass done during getConcretizationInfo() that collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.
  • During concretization, sets the size-0 extents of identified empty tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of Vals (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.

@jacobhinkle jacobhinkle mentioned this pull request Jun 5, 2023
@jacobhinkle (Collaborator Author)

In this approach, I've replaced extents that concretize to zero with fusion->zeroVal(), which we could use as a stand-in for having a new IterType for empty extents. Using the check for constant zero means we could avoid needing a separate IterType for size-0 reduction axes.

This might make the lookups there a bit slower, as previously these would be very small sets. However, this is necessary in order to find differences in empty-tensor concretization. Note that the actual keys won't commonly change, but this will find input vals that affect output sizes, such as in FusionStandaloneIota_CUDA, which was failing before this fix.
Comment on lines +8109 to +8110
at::empty({}, options),
at::empty({}, options),
Collaborator Author
Fixed a transient bug where these tensors were not technically set as CUDA tensors.

With this, the only failing test is due to not stopping traversal at empty tensor definitions.
auto all_stmts = StmtSort::getStmts(info_->fusion());
for (auto stmt : all_stmts) {
  if (stmt->isA<Val>()) {
    mutate(stmt);
Collaborator Author

Note we no longer traverse into members.

Collaborator

Don't remember why it did

Collaborator Author

I changed it to true in #258 so that the traversal would handle IterDomains, but I didn't have a clear enough picture of how that should work at that time.

Collaborator Author

Now we are explicitly traversing only the TensorView graph, and we handle IterDomain members manually.

Comment on lines 97 to +107
TORCH_CUDA_CU_API TensorView* pad(
    TensorView* x,
    const std::vector<Val*>& pad_widths,
    Val* value = nullptr,
    std::optional<IterType> iter_type_opt = std::nullopt);

//! Concatenate tensors in the given dimension
TORCH_CUDA_CU_API TensorView* cat(
    const std::vector<TensorView*>& inputs,
    int64_t dim,
    std::optional<IterType> iter_type_opt = std::nullopt);
Collaborator Author

These options let us pipe through an optional IterType to avoid symbolic IterTypes popping up during the RemoveEmptyPass. This is a bug that was exposed by the dynamic empty cat tests.

Comment on lines 3405 to 3408
if (output_uses.size() == 0) {
// Unused outputs terminate here
continue;
}
Collaborator Author

This fixes a segfault that popped up in the Standalone* tests in test_tensor_factories.cpp. These tests have scalar inputs which are used in FullOps as fill values. When the extents are zero, concretization replaces those FullOps with others so that the output size is hardcoded. This means the original fill value (which is now irrelevant since the output is empty) has no uses. The cast to target dtype before the full means we have input -> cast -> (no uses), which caused a segfault here. We now just check that there's at least one use to avoid a segfault.

@jacobhinkle jacobhinkle marked this pull request as ready for review July 12, 2023 20:02
@jacobhinkle (Collaborator Author)

!build

Comment on lines 265 to 267
// symbolic axis, since the inputs have potentially symbolic extents in
// the cat dimension. However, since we have already undergone
// concretization at this point, we can trust that the original IterType,
Collaborator

A little confused here. If concretization is done, why inputs may have symbolic extents?

Collaborator Author

The input extents are symbolic (e.g. i0) or they are zeroVal(), but since we use the cat command here to do the replacement, we need some way to tell it not to use IterType::Symbolic and to trust us at this point that the output will be Iteration.

Collaborator

Oh, so, this part:

      // cat() might result in
      // symbolic axis, since the inputs have potentially symbolic extents in
      // the cat dimension.

It just refers to what cat does in general, but in this case we don't want cat to be conservative as we are sure it doesn't need to use symbolic domains.

Collaborator Author

Yes exactly. It should be more clear in the comment probably.

Collaborator Author
Updated the comment.

}

if (expr->output(0)->uses().size() > 1) {
// expr is a unary op so there is a single output. Here we look at that
Collaborator

Is this a general bug fix or specific to the replacement of size-zero extents?

Collaborator Author

I think it was a general bug that was difficult to tease out. I will try to replicate it on main to be sure.

@jacobhinkle (Collaborator Author) commented Jul 13, 2023

Filed #585. I will move this fix to a small PR there for further discussion. Until that's fixed, the present PR will break all the tests in test_tensor_factories.cpp.

ir_utils::replaceValInExpr(use, ext, zero);
}
// Register the concretization of this scalar, which allows us to replace it
// whenever it is used as an extent member of an IterDomain.
Collaborator

This is because ext is "used" by an IterDomain but it isn't part of uses(), correct?

Collaborator Author

Yes exactly. When we replace in uses() we update things like i0 + 1, but iS2{i0} would remain unless registered for mutation.

Collaborator

Can you please add this to the code comment as well?

Collaborator Author

Done


auto concretized_id =
    IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>())
        .iter_type(*id_type)
        .build();
Collaborator

Is this change because the extent of root_id may be changed to 0?

Collaborator Author

Yes exactly, when we call OptOutMutator::mutate(root_id) it will register it for mutation if any of its members are mutated, including the extent. Since we are going to register another mutation here, we don't want to lose those changes, so we base concretized_id on the mutated ID.

void DynamicTransformConcretizer::mutate(TensorView* tv) {
  if (!tv->domain()->hasSymbolicAxis()) {
    return;
  }
  for (auto root_id : tv->getRootDomain()) {
Collaborator

This looks a bit confusing. A root ID may be mutated here, but there's also propagateFromProducerToConsumer below, which may also mutate a root ID. Is there any concern of conflicts?

Collaborator Author

At line 659 in propagateFromProducerToConsumer, we reflect the earlier mutation by basing the new mutated ID on maybeMutated(root_id). That way we compose both mutations.

Collaborator

Well, what happens if the same root ID is mutated both at line 546 and within line 551? Doesn't the latter just overwrite the mutation at line 546?

Collaborator

Oh, are you referring to line 659 before this PR?

Collaborator Author

It will overwrite the mutation, but using IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>()) as the basis of the new mutation lets us update the IterType without losing other mutations, like the extent.

Collaborator

OK, I see it now. Maybe it'd be helpful to add that to the code comment too.

@naoyam (Collaborator) left a comment

I'm approving this now, but let me review the segmentation change separately before merging this

@jacobhinkle (Collaborator Author)

> I'm approving this now, but let me review the segmentation change separately before merging this

Sounds good. The current fix in the segmentation PR is slightly different from what is here. I think that one makes more sense and is a bit simpler so if approved, I'll just revert the one here before merging main.

@naoyam (Collaborator) commented Jul 13, 2023

Approved the other PR as well

@jacobhinkle jacobhinkle merged commit 44bb3d6 into main Jul 14, 2023
@jacobhinkle jacobhinkle deleted the remove_empty_branches branch July 14, 2023 13:45


Development

Successfully merging this pull request may close these issues:

  • Multiple slices with one of them zero-length fails assertion
  • Floating-point exception scheduling reduction of zero-element tensor

3 participants