
single pass scan group virtualization #2041

Merged (4 commits) on Nov 17, 2023

sortraev (Collaborator) commented Nov 3, 2023

This PR adds group virtualization to SegScan.SinglePass. The generated single-pass scan code now respects the suggested/requested num_groups/block_size. This in turn allows scanomaps with array construction in the map KernelBody, since memory expansion relies on this information.

Also:

  • status flags (for the lookback step) are now initialized inside the kernel
  • some light code refactoring
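
As a rough illustration of the two points above (group virtualization and in-kernel status flag initialization), here is a minimal sequential sketch in Python. It is not the code this PR generates. The function name, the `tile_size` parameter (standing in for the number of elements one group handles), the flag names, and the strided assignment of virtual groups to physical groups are all assumptions made for the example. A real GPU implementation also needs atomics and memory fences, which a sequential simulation leaves out.

```python
from itertools import accumulate
from operator import add

# Hypothetical status flag values: not ready / aggregate ready / prefix ready.
FLAG_X, FLAG_A, FLAG_P = 0, 1, 2


def single_pass_scan(xs, num_groups, tile_size, op=add):
    """Sequentially simulate an inclusive single-pass scan in which a fixed
    number of physical groups iterate over a larger number of virtual groups."""
    n = len(xs)
    num_virt_groups = (n + tile_size - 1) // tile_size  # ceil(n / tile_size)
    out = [None] * n

    # Assumption: flags are set up at the start of the "kernel" itself.
    flags = [FLAG_X] * num_virt_groups
    aggregates = [None] * num_virt_groups
    prefixes = [None] * num_virt_groups

    # Each physical group handles virtual ids phys_id, phys_id + num_groups, ...
    # Going round by round visits virtual ids in ascending order, which stands
    # in for the forward-progress guarantee that a real lookback depends on.
    rounds = (num_virt_groups + num_groups - 1) // num_groups
    for r in range(rounds):
        for phys_id in range(num_groups):
            virt_id = r * num_groups + phys_id
            if virt_id >= num_virt_groups:
                continue  # this physical group has no work in the last round

            lo = virt_id * tile_size
            tile = list(accumulate(xs[lo:lo + tile_size], op))  # local scan
            aggregates[virt_id] = tile[-1]
            flags[virt_id] = FLAG_A

            # Lookback: fold in predecessor values until a full prefix is seen.
            carry = None
            j = virt_id - 1
            while j >= 0:
                assert flags[j] != FLAG_X, "predecessor not ready"
                val = prefixes[j] if flags[j] == FLAG_P else aggregates[j]
                carry = val if carry is None else op(val, carry)
                if flags[j] == FLAG_P:
                    break
                j -= 1

            prefixes[virt_id] = tile[-1] if carry is None else op(carry, tile[-1])
            flags[virt_id] = FLAG_P
            out[lo:lo + len(tile)] = [
                t if carry is None else op(carry, t) for t in tile
            ]
    return out
```

In this sketch, the number of physical groups is chosen independently of the input size, so anything sized from `num_groups` stays bounded no matter how many virtual groups the input requires. Because the simulation is sequential, the lookback always finds a finished prefix at the immediate predecessor; on a GPU it may have to combine several aggregates or wait.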

athas added the run-benchmarks label (makes GA run the benchmark suite) Nov 3, 2023

athas (Member) left a comment


Please also add a comment somewhere in the module (it can be in the header) outlining the virtualisation strategy.

Resolved review threads:
  • src/Futhark/CodeGen/ImpGen/GPU/SegScan.hs
  • src/Futhark/CodeGen/ImpGen/GPU/SegScan/SinglePass.hs (two threads, outdated)


athas commented Nov 6, 2023

There is a style issue, but worse, it also appears some of the scan benchmarks fail with the CUDA backend. (There's also a bunch of other things that fail because we are moving servers - you'll have to pick out the interesting failures from the wreckage.)


sortraev commented Nov 8, 2023

> There is a style issue, but worse, it also appears some of the scan benchmarks fail with the CUDA backend. (There's also a bunch of other things that fail because we are moving servers - you'll have to pick out the interesting failures from the wreckage.)

Yes, I see now that the benchmarks also fail when I run them manually on the A100. I think the problem had to do with our change to the status flag initialization (we simply changed it to align with Cosmin's and our own prototype), but I'm not exactly sure why; it was not a problem on our own 4090.

Anyway, I have pushed a commit that passes the benchmarks when I run them manually on the A100, so hopefully the bug has been squashed (assuming this was the bug I was looking for; admittedly, I don't see why it would be a bug). I will look into the other change requests and CI failures later. Thanks for the comments.

sortraev and others added 2 commits November 17, 2023 14:19

  • Added description of virtualisation strategy, as well as two variable renamings and a number of style fixes.
  • Apparently I was using an older version of Ormolu.

athas merged commit f7a36ee into master Nov 17, 2023
24 checks passed
athas deleted the single-pass-scan-group-virt branch November 17, 2023 23:11
athas added a commit that referenced this pull request Nov 17, 2023