Add summation kernels #71
92fa90d to f956f2b
using namespace gpu;

#define LIMITS { \
This is fine as is for the experimental implementation. I wonder if there's a way we can populate this by polling system properties?
I've added it to the issue. #72
LGTM, a 2x improvement doesn't seem too bad for a laptop. Any reason not to merge yet?
@austinvhuang It works now, but it's much slower than the CPU version.
@austinvhuang Thank you!
Thanks! Merged - will take a look at the 2D reduction.
This PR implements reduction kernels.
They work correctly, but they are only about twice as fast as the CPU version.
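For readers unfamiliar with the technique, here is a minimal CPU-side sketch of the strided tree reduction that sum kernels of this kind are commonly built on. It is not the code from this PR: `treeReduceSum` is a hypothetical name, and the zero-padding to a power of two is an assumption made to keep the loop simple.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from this PR): sums `data` with a strided tree
// reduction, the access pattern a single GPU workgroup often uses.
float treeReduceSum(std::vector<float> data) {
  if (data.empty()) return 0.0f;

  // Assumption: pad with zeros up to the next power of two so every
  // pass can split the active range exactly in half.
  std::size_t n = 1;
  while (n < data.size()) n *= 2;
  data.resize(n, 0.0f);

  // Each pass folds the upper half of the active range onto the lower half.
  // On a GPU, every iteration of the inner loop would be one thread, with a
  // barrier between passes.
  for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
    for (std::size_t i = 0; i < stride; ++i) {
      data[i] += data[i + stride];
    }
  }
  return data[0];
}
```

The number of passes grows with log2 of the input size, so most of the runtime tends to go to memory traffic and dispatch overhead rather than arithmetic, which is one plausible reason a modest speedup over the CPU shows up on a laptop.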