single pass scan group virtualization #2041
Please also add a comment somewhere in the module (it can be in the header) outlining the virtualisation strategy.
There is a style issue, but worse, it also appears that some of the scan benchmarks fail with the CUDA backend. (There's also a bunch of other things that fail because we are moving servers; you'll have to pick out the interesting failures from the wreckage.)
Yes, I see now that the benchmarks also fail when I run them manually on the A100. I think the problem has to do with our change to the status-flag initialization (we changed it to align with Cosmin's prototype and our own), but I'm not sure exactly why; it was not a problem on our own 4090. Anyway, I have pushed a commit that passes the benchmarks when I run them manually on the A100, so hopefully the bug has been squashed (assuming this was the bug I was looking for; admittedly, I don't see why it would be a bug). I will look into the other change requests and CI failures later. Thanks for the comments!
Added a description of the virtualisation strategy, as well as two variable renamings and a number of style fixes.
Apparently I was using an older version of Ormolu.
This PR adds group virtualization to SegScan.SinglePass. The generated single-pass scan code now respects the suggested/requested num_groups/block_size. This in turn allows scanomaps that construct arrays in the map KernelBody, since memory expansion relies on this information to size its expanded allocations.
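To make the idea concrete, here is a minimal sequential Python sketch of group virtualization for a single-pass inclusive +-scan with decoupled look-back. It is not the code generated by SegScan.SinglePass: the function name `single_pass_scan`, the flag encoding (`FLAG_X`/`FLAG_A`/`FLAG_P`), and the round-based interleaving of physical groups are all illustrative assumptions. A fixed number of physical groups (`num_groups`) repeatedly take the next virtual group ID from a counter, which stands in for an atomic counter in global memory, so the launch size stays fixed while the number of virtual groups grows with the input.

```python
from typing import List

# Hypothetical status-flag encoding for decoupled look-back.
FLAG_X = 0  # nothing published yet for this virtual group
FLAG_A = 1  # aggregate of this virtual group's chunk is available
FLAG_P = 2  # inclusive prefix (all chunks up to and including this one) is available


def single_pass_scan(xs: List[int], num_groups: int, block_size: int) -> List[int]:
    """Sequentially simulate a virtualized single-pass inclusive +-scan."""
    n = len(xs)
    num_virt_groups = (n + block_size - 1) // block_size
    # Flags must start as FLAG_X for every virtual group before the scan.
    flags = [FLAG_X] * num_virt_groups
    aggregates = [0] * num_virt_groups
    prefixes = [0] * num_virt_groups
    out = [0] * n
    next_virt = 0  # stands in for a global atomic counter handing out IDs

    while next_virt < num_virt_groups:
        # Each physical group grabs one virtual group ID for this round.
        round_ids = []
        for _ in range(num_groups):
            if next_virt < num_virt_groups:
                round_ids.append(next_virt)
                next_virt += 1

        # Phase 1: every group reduces its chunk and publishes its aggregate.
        for v in round_ids:
            aggregates[v] = sum(xs[v * block_size:(v + 1) * block_size])
            flags[v] = FLAG_A
            if v == 0:  # the first chunk's aggregate is already its prefix
                prefixes[0] = aggregates[0]
                flags[0] = FLAG_P

        # Phase 2: look-back. Visiting groups in reverse order means a group
        # sees only FLAG_A from concurrently running predecessors and must
        # keep walking back until it finds a FLAG_P.
        for v in reversed(round_ids):
            exclusive = 0
            j = v - 1
            while j >= 0:
                if flags[j] == FLAG_P:
                    exclusive += prefixes[j]
                    break
                if flags[j] == FLAG_A:
                    exclusive += aggregates[j]
                    j -= 1
                else:
                    # On a GPU the group would spin here; in this simulation
                    # every predecessor has published at least FLAG_A.
                    raise AssertionError("unexpected FLAG_X during look-back")
            prefixes[v] = exclusive + aggregates[v]
            flags[v] = FLAG_P

            # Local scan of the chunk, offset by the exclusive prefix.
            acc = exclusive
            for i in range(v * block_size, min((v + 1) * block_size, n)):
                acc += xs[i]
                out[i] = acc
    return out


print(single_pass_scan([1, 2, 3, 4, 5, 6, 7], num_groups=2, block_size=2))
# [1, 3, 6, 10, 15, 21, 28]
```

Because the number of launched groups is fixed at `num_groups` in a scheme like this, anything sized per physical thread can be computed from `num_groups * block_size` regardless of the input size. Handing out virtual IDs in increasing order also ensures that a group only ever waits on predecessors that have already started.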
Also