[CK_TILE] Fix DP + 2 Tile Stream-K Validation Errors #3269
Merged
Proposed changes
In CK Tile Stream-K, multiple workgroups may contribute to a given tile in the C tensor. When the accumulation strategy uses atomics, round-off error can occur if the accumulator type differs from the C type. To compute an error tolerance for test validation, the Stream-K Tile Partitioner provides a function, `estimate_num_wgs_per_tile`, that estimates the number of workgroups per tile. However, this function only provides an estimate and can underestimate in some cases. For the DP + 2 Tile SK (DP+2TSK) test cases, the underestimate made the error tolerance computed by `calculate_rtol_atol` too low, which caused a regression in some Stream-K tests. Note that these round-off validation failures do not always appear, because of the non-determinism that accompanies atomics.

This change therefore updates `estimate_num_wgs_per_tile` to explicitly return 2 for DP+2TSK instances. The larger tolerance that results avoids test failures caused by round-off error.

We tested locally on gfx90a, gfx942, and gfx950, and all tests pass. We found some unrelated issues on gfx908, so we have disabled our tests on gfx908 for now and will create a ticket to investigate them.
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- `clang-format` on all changed files