Add summation kernels #71

Merged: 4 commits, Nov 3, 2024


junjihashimoto
Collaborator

This PR implements reduction kernels.
They produce correct results, but the GPU version is only about twice as fast as the CPU version.

$ ./build/reduce
Initializing 67108864 values
[info] Requesting adapter
[info] Requesting device
[info] Waiting for device request to end
[info] Device request ended
Start testing sum(x) on 67108864 values
Duration(CPU): 4493.9 microseconds
sum_cpu: -3248.239746
Start testing sum(x) on 67108864 values
Duration(GPU): 2175.1 microseconds
sum_gpu: -3248.113770
Success: diff = 0.125977
Computed 67108864 values of kSum(x)
[info] Context destroyed
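
For readers unfamiliar with the technique, here is a minimal CPU-side sketch of a block-wise tree reduction, the general pattern that GPU summation kernels tend to follow. It is not the code from this PR: `kBlockSize`, `reducePass`, `sumTree`, and `sumSequential` are hypothetical names chosen for illustration. Because float addition is not associative, a tree-ordered sum and a left-to-right sum can differ slightly, which is a likely explanation for the small `diff` in the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workgroup size (an assumption, not taken from the PR).
constexpr std::size_t kBlockSize = 256;
static_assert((kBlockSize & (kBlockSize - 1)) == 0, "block size must be a power of two");

// Sequential left-to-right sum, the usual CPU baseline.
float sumSequential(const std::vector<float>& x) {
  float acc = 0.0f;
  for (float v : x) acc += v;
  return acc;
}

// One reduction pass: every block of kBlockSize inputs is folded into a single
// partial sum with a halving-stride tree, mimicking what one workgroup would
// do in shared memory.
std::vector<float> reducePass(const std::vector<float>& in) {
  assert(!in.empty());
  const std::size_t numBlocks = (in.size() + kBlockSize - 1) / kBlockSize;
  std::vector<float> out(numBlocks);
  for (std::size_t b = 0; b < numBlocks; ++b) {
    float shared[kBlockSize];
    for (std::size_t t = 0; t < kBlockSize; ++t) {
      const std::size_t i = b * kBlockSize + t;
      shared[t] = i < in.size() ? in[i] : 0.0f;  // pad the tail with zeros
    }
    for (std::size_t stride = kBlockSize / 2; stride > 0; stride /= 2) {
      for (std::size_t t = 0; t < stride; ++t) {
        shared[t] += shared[t + stride];
      }
    }
    out[b] = shared[0];
  }
  return out;
}

// Repeats passes until a single value remains.
float sumTree(std::vector<float> x) {
  if (x.empty()) return 0.0f;
  while (x.size() > 1) x = reducePass(x);
  return x[0];
}
```

Calling `sumTree` and `sumSequential` on the same input and comparing the results against a tolerance, rather than for exact equality, mirrors the `Success: diff = ...` check in the log.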


using namespace gpu;

#define LIMITS { \
Contributor

This is fine as is for the experimental implementation. I wonder if there's a way we can populate this by polling system properties?

Collaborator Author

@junjihashimoto junjihashimoto Nov 3, 2024

I've added it to issue #72.

@austinvhuang
Contributor

LGTM, a 2x improvement doesn't seem too bad for a laptop. Any reason not to merge yet?

@austinvhuang austinvhuang changed the base branch from main to dev November 2, 2024 13:26
@junjihashimoto junjihashimoto marked this pull request as ready for review November 3, 2024 07:16
@junjihashimoto
Collaborator Author

@austinvhuang
I set this as a draft because the 2D reduction didn't work properly.
gpt2-backward uses 2D reduction.

It works now, but it's much slower than the CPU version.
Might be ok to merge it for now.
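
As a point of reference for checking a 2D reduction kernel, here is a small CPU sketch. It assumes "2D reduction" means summing a row-major matrix along its row axis so that one value remains per column; the PR does not spell this out, and `sumOverRows` is a hypothetical helper, not a function from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a row-major rows x cols matrix over its rows, producing one value per
// column. Assumption: this is the axis a 2D reduction kernel would collapse.
std::vector<float> sumOverRows(const std::vector<float>& m,
                               std::size_t rows, std::size_t cols) {
  assert(m.size() == rows * cols);
  std::vector<float> out(cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c] += m[r * cols + c];  // accumulate column c across all rows
    }
  }
  return out;
}
```

A GPU result can then be compared against this output column by column with a small tolerance, since the accumulation order on the GPU will generally differ.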

@junjihashimoto
Collaborator Author

@austinvhuang Thank you!

@austinvhuang austinvhuang merged commit e94aa02 into AnswerDotAI:dev Nov 3, 2024
@austinvhuang
Contributor

Thanks! Merged - will take a look at the 2D reduction.
