Set extents of empty tensors to zeroVal() during concretization #449
Conversation
In this approach, I've replaced extents that concretize to zero with `zeroVal()`.

This might make the lookups there a bit slower, since previously these would be very small sets. However, it is necessary in order to detect differences in empty-tensor concretization. Note that the actual keys won't commonly change, but this will find input vals that affect output sizes, such as in FusionStandaloneIota_CUDA, which was failing before this fix.
Verifies that this branch fixes #365
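To illustrate the idea, here is a minimal sketch (toy types and names like `ConcretizationCache` are illustrative, not the actual nvFuser interface) of keying cached concretizations on the evaluated values of dynamic extents, so that runs whose extents concretize to zero are kept distinct:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a concretization cache keyed on the evaluated
// values of extents that might be zero. Two input sets whose dynamic
// extents differ (e.g. one contains a zero) get distinct entries.
using DynamicKey = std::vector<int64_t>;

struct ConcretizationCache {
  std::map<DynamicKey, std::string> cache_;

  // Look up (or create) the concretization associated with the
  // evaluated extent values.
  std::string& get(const DynamicKey& evaluated_extents) {
    return cache_[evaluated_extents];
  }
};
```

The cost of evaluating the extents per lookup is the "bit slower" trade-off mentioned above.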
at::empty({}, options),
Fixed a transient bug where these tensors were not technically set as CUDA tensors.
With this the only failing test is due to not stopping traversal at empty tensor definitions.
auto all_stmts = StmtSort::getStmts(info_->fusion());
for (auto stmt : all_stmts) {
  if (stmt->isA<Val>()) {
    mutate(stmt);
Note we no longer traverse into members.
Don't remember why it did
I changed it to true in #258 so that the traversal would handle IterDomains, but I didn't have a clear enough picture of how that should work at that time.
Now we explicitly are only traversing the TensorView graph, and we only handle IterDomain members manually.
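A toy sketch of the distinction being discussed (not the real `StmtSort` API): with member traversal disabled, only top-level nodes are visited, and any "member" nodes are left for the caller to handle explicitly.

```cpp
#include <cassert>
#include <vector>

// Illustrative node: either a top-level value or an owner of "member"
// nodes (analogous to a TensorView owning IterDomains).
struct Node {
  std::vector<Node*> members;
};

// Collect nodes reachable from n. With traverse_members = false, only
// the top-level node is collected; members must be handled manually,
// mirroring the change described above.
void collect(Node* n, bool traverse_members, std::vector<Node*>& out) {
  out.push_back(n);
  if (traverse_members) {
    for (Node* m : n->members) {
      collect(m, /*traverse_members=*/true, out);
    }
  }
}
```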
TORCH_CUDA_CU_API TensorView* pad(
    TensorView* x,
    const std::vector<Val*>& pad_widths,
    Val* value = nullptr,
    std::optional<IterType> iter_type_opt = std::nullopt);

//! Concatenate tensors in the given dimension
TORCH_CUDA_CU_API TensorView* cat(
    const std::vector<TensorView*>& inputs,
    int64_t dim,
    std::optional<IterType> iter_type_opt = std::nullopt);
These options let us pipe an optional IterType through, to avoid symbolic IterTypes popping up during the RemoveEmptyPass. This bug was exposed by the dynamic empty cat tests.
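A minimal sketch of the optional-override pattern being added (abbreviated names; the real signatures are the `pad()`/`cat()` declarations above): by default a non-constant extent yields a Symbolic output, but a caller that knows concretization has already run can force the IterType.

```cpp
#include <cassert>
#include <optional>

enum class IterType { Iteration, Symbolic };

// Decide the output IterType for an op like cat(). Without an override,
// non-constant extents are treated conservatively as Symbolic; with one,
// the caller's choice wins (e.g. the post-concretization RemoveEmptyPass
// knows the output will be Iteration).
IterType resolveOutputIterType(
    bool extent_is_constant,
    std::optional<IterType> iter_type_opt = std::nullopt) {
  if (iter_type_opt.has_value()) {
    // Caller override: trust the provided IterType.
    return *iter_type_opt;
  }
  return extent_is_constant ? IterType::Iteration : IterType::Symbolic;
}
```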
csrc/fusion_segmenter.cpp
if (output_uses.size() == 0) {
  // Unused outputs terminate here
  continue;
}
This fixes a segfault that popped up in the Standalone* tests in test_tensor_factories.cpp. Those tests have scalar inputs which are used in FullOps as fill values. When the extents are zero, concretization replaces those FullOps with others so that the output size is hardcoded. This means the original fill value (now irrelevant, since the output is empty) has no uses. The cast to the target dtype before the full means we have input -> cast -> (no uses), which caused a segfault here. We now check that there is at least one use to avoid the segfault.
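A toy sketch of the guard (hypothetical types, not nvFuser's): dereferencing the first use of a value that has none is what segfaulted, so values with an empty use list are skipped.

```cpp
#include <cassert>
#include <vector>

struct Use {
  int id;
};

// Returns the first use's id, or -1 to signal "skip" when the value has
// become unused (e.g. a fill value orphaned after its FullOp was
// replaced during concretization).
int firstUseOrSkip(const std::vector<Use>& output_uses) {
  if (output_uses.empty()) {
    // Unused outputs terminate here; dereferencing front() would be UB.
    return -1;
  }
  return output_uses.front().id;
}
```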
!build
csrc/optimization/remove_empty.cpp
// symbolic axis, since the inputs have potentially symbolic extents in
// the cat dimension. However, since we have already undergone
// concretization at this point, we can trust that the original IterType,
A little confused here. If concretization is done, why might inputs have symbolic extents?
The input extents are either symbolic (e.g. i0) or zeroVal(), but since we use the cat command here to do the replacement, we need some way to tell it not to use IterType::Symbolic and to trust that, at this point, the output will be Iteration.
Oh, so this part:

// cat() might result in
// symbolic axis, since the inputs have potentially symbolic extents in
// the cat dimension.

just refers to what cat does in general, but in this case we don't want cat to be conservative, as we are sure it doesn't need to use symbolic domains.
Yes, exactly. The comment should probably make that clearer.
Updated the comment.
}

if (expr->output(0)->uses().size() > 1) {
  // expr is a unary op so there is a single output. Here we look at that
Is this a general bug fix or specific to the replacement of size-zero extents?
I think it was a general bug that was difficult to tease out. I will try to replicate it on main to be sure.
Filed #585. I will move this fix to a small PR there for further discussion. Until that's fixed, the present PR will break all the tests in test_tensor_factories.cpp.
ir_utils::replaceValInExpr(use, ext, zero);
}
// Register the concretization of this scalar, which allows us to replace it
// whenever it is used as an extent member of an IterDomain.
This is because ext is "used" by an IterDomain but it isn't part of uses(), correct?
Yes exactly. When we replace in uses() we update things like i0 + 1, but iS2{i0} would remain unless registered for mutation.
Can you please add this to the code comment as well?
Done
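A toy sketch of the point discussed in this thread (names hypothetical): replacing a scalar in its uses() rewrites expressions like i0 + 1, but an IterDomain that stores i0 as its extent member is not a "use", so it must be updated through a separate mutation registry.

```cpp
#include <cassert>
#include <map>

// Illustrative registry of scalar replacements, standing in for the
// registration of concretized extents described above.
struct MutationRegistry {
  std::map<int, int> replacements;  // old scalar id -> new scalar id

  void registerMutation(int old_id, int new_id) {
    replacements[old_id] = new_id;
  }

  // An IterDomain consults the registry when resolving its extent
  // member, since extent membership is not tracked via uses().
  int resolvedExtent(int extent_id) const {
    auto it = replacements.find(extent_id);
    return it == replacements.end() ? extent_id : it->second;
  }
};
```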
auto concretized_id =
    IterDomainBuilder(root_id).iter_type(*id_type).build();
    IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>())
Is this change because the extent of root_id may be changed to 0?
Yes exactly, when we call OptOutMutator::mutate(root_id) it will register it for mutation if any of its members are mutated, including the extent. Since we are going to register another mutation here, we don't want to lose those changes, so we base concretized_id on the mutated ID.
void DynamicTransformConcretizer::mutate(TensorView* tv) {
  if (!tv->domain()->hasSymbolicAxis()) {
    return;
  for (auto root_id : tv->getRootDomain()) {
This looks a bit confusing. A root ID may be mutated here, but there's also propagateFromProducerToConsumer below, which may also mutate a root ID. Is there any concern of conflicts?
At line 659 in propagateFromProducerToConsumer, we reflect the earlier mutation by basing the new mutated ID on maybeMutated(root_id). That way we compose both mutations.
Well, what happens if the same root ID is mutated both at line 546 and within line 551? Doesn't the latter just overwrite the mutation at line 546?
Oh, are you referring to line 659 before this PR?
It will overwrite the mutation, but using IterDomainBuilder(maybeMutated(root_id)->as<IterDomain>()) as the basis of the new mutation lets us update the IterType without losing other mutations, like the extent.
OK, I see it now. Maybe it'd be helpful to add that to the code comment too.
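A toy sketch of the composition agreed on above (illustrative types, not nvFuser's `OptOutMutator`): because a later mutation is built from `maybeMutated(root_id)`, an earlier extent mutation survives a subsequent IterType mutation instead of being overwritten.

```cpp
#include <cassert>
#include <map>

// Stand-in for an IterDomain: an extent plus a symbolic/concrete flag.
struct Id {
  int extent;
  bool symbolic;
};

std::map<const Id*, Id> mutations;

// Return the registered mutation of id, or id itself if none exists.
Id maybeMutated(const Id* id) {
  auto it = mutations.find(id);
  return it == mutations.end() ? *id : it->second;
}

// First mutation: replace the extent (e.g. with zero).
void mutateExtent(const Id* id, int new_extent) {
  Id base = maybeMutated(id);
  base.extent = new_extent;
  mutations[id] = base;
}

// Second mutation: concretize the IterType. Basing it on maybeMutated
// composes it with the earlier extent change rather than clobbering it.
void concretizeIterType(const Id* id) {
  Id base = maybeMutated(id);
  base.symbolic = false;
  mutations[id] = base;
}
```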
naoyam left a comment
I'm approving this now, but let me review the segmentation change separately before merging this
Sounds good. The current fix in the segmentation PR is slightly different from what is here. I think that one makes more sense and is a bit simpler, so if approved, I'll just revert the one here before merging main.

Approved the other PR as well
A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent of zero) during scheduling and lowering. See: #264 #369 #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that a reality by removing all intermediate empty tensors.
This PR:

- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if any alive `TensorView` has a static size-zero extent or a dynamic extent, since it might be empty. This means only Fusions with nothing but concrete non-zero sizes are static now. That is, even if all static shapes are provided, the Fusion will be marked as dynamic and those `TensorView`s will be modified during concretization.
- `getConcretizationInfo()` collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of `Val`s (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.
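The quick path described above can be sketched as follows (illustrative function name, not the actual nvFuser interface): evaluate only the scalars that appear in dynamic extents, and fall back to the expensive empty-branch removal only if one of them is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch of the fast re-use check: given the evaluated values of
// the minimal set of Vals appearing in dynamic extents, decide whether
// concretization must remove empty branches.
bool needsEmptyBranchRemoval(const std::vector<int64_t>& dynamic_extents) {
  for (int64_t extent : dynamic_extents) {
    if (extent == 0) {
      return true;  // at least one empty tensor: concretize differently
    }
  }
  return false;  // common case: reuse the cached concretization
}
```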