Sparse grads #577
Comments
+1. This would be helpful for me as well.
I'm happy in principle to just always return sparse gradients in the case of `getindex`.
I'll take a stab at it. If you have anything I can look at to help me get started, I'd appreciate that.
Probably the best thing is to dig through how `OneHotMatrix` works; that's another kind of GPU-compatible sparse type that should be very similar to what you want to do here.
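For reference (my own illustration, not from the thread), Flux's one-hot types store only the hot indices rather than a full dense array, which is what makes them cheap on the GPU:

```julia
using Flux

v = Flux.onehot(3, 1:5)                # OneHotVector: behaves like [0, 0, 1, 0, 0]
B = Flux.onehotbatch([3, 1, 4], 1:5)   # OneHotMatrix: one hot entry per column

W = randn(Float32, 2, 5)
W * B   # multiplication with a OneHotMatrix is specialised to a column lookup
```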
Ah, that's right, I don't actually need CUSPARSE to do sparse things on the GPU. Thanks! I'll try to put something together, probably on Friday.
We now have an efficient one-hot array implementation. General sparse matrix support is better handled in NNlib and the GPU libraries, I think.
An `Embedding` layer is in the works: #1516
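For illustration, here is a minimal hand-rolled lookup layer of the kind discussed in this issue; it is a hedged sketch, not the implementation tracked in #1516, and it still hits the dense-gradient problem unless the indexing gets a sparse rule:

```julia
using Flux

# Hypothetical minimal embedding layer: one column of `weight` per token.
struct Embed{W<:AbstractMatrix}
    weight::W
end
Flux.@functor Embed

Embed(vocab::Integer, dim::Integer) = Embed(randn(Float32, dim, vocab))

# Column lookup; differentiating this still materialises a dense gradient
# for `weight` under the default getindex rule.
(e::Embed)(ids::AbstractVector{<:Integer}) = e.weight[:, ids]
```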
Related to #206 and partly to #66, a feature request: allow sparse gradients for `getindex`, as PyTorch does for its embedding layer.
I have a large embedding matrix, and the gradient for `getindex` creates a large array of zeroes every time it's called, which kills my GPU performance. I don't think `getindex` should use sparse data structures in the general case, and I'm not sure what the best API for that is, but right now this is a big roadblock for me.
I have trouble using views and sparse arrays with CuArrays and Flux (https://github.com/JuliaGPU/CuArrays.jl/issues/267#issue-403606632), so I couldn't really experiment with the idea.
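To make the problem concrete (my own illustration, not part of the original report), this is roughly what the dense gradient looks like with Zygote, Flux's later AD backend; the Tracker-based AD the issue refers to behaves the same way for `getindex`:

```julia
using Zygote

E = randn(Float32, 128, 50_000)   # hypothetical large embedding matrix
ids = [1, 5, 42]

# The pullback of getindex allocates a zero array the size of E and writes the
# selected columns into it, so every lookup pays for the full matrix.
g = gradient(E -> sum(E[:, ids]), E)[1]
size(g) == size(E)   # true: the gradient is dense and as large as E itself
```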
Some API ideas:

a) Define a minimal `Embedding` struct and use a special gradient definition for indexing into this type: `E[:, i]` dispatches to the sparse definition of `getindex` because `E isa Embedding`.

b) Define a special indexing type, i.e. `X[:, sparsely(i)]` dispatches to the sparse definition of `getindex` because of the resulting type of `sparsely(i)`.

c) Define a function `sparsegetindex(x, i...)` that is just `getindex` with a sparse gradient definition (a sketch of this option appears below).

As a workaround I guess I can split the big embedding matrix into multiple small ones, but I'm really not looking forward to working with that kind of setup.
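A sketch of option (c), not taken from the issue: a reverse rule that scatters the cotangent into a `SparseMatrixCSC` instead of a dense zero-filled array. It is written against ChainRulesCore rather than the Tracker API Flux used at the time, `sparsegetindex` is just the hypothetical name from the list above, and whether a sparse gradient actually helps on the GPU depends on CUDA sparse support, the limitation noted in the CuArrays issue linked earlier:

```julia
using ChainRulesCore, SparseArrays

# Hypothetical: column indexing whose gradient for the matrix is sparse.
sparsegetindex(x::AbstractMatrix, i::AbstractVector{<:Integer}) = x[:, i]

function ChainRulesCore.rrule(::typeof(sparsegetindex), x::AbstractMatrix,
                              i::AbstractVector{<:Integer})
    y = x[:, i]
    function sparsegetindex_pullback(ȳ)
        Δ = unthunk(ȳ)
        m = size(x, 1)
        # Scatter the cotangent columns back into a sparse matrix; duplicate
        # entries in `i` are summed by the `sparse` constructor.
        rows = repeat(1:m, length(i))
        cols = repeat(i, inner = m)
        x̄ = sparse(rows, cols, vec(Δ), m, size(x, 2))
        return NoTangent(), x̄, NoTangent()
    end
    return y, sparsegetindex_pullback
end
```

With Zygote this rule is picked up automatically, so `gradient(x -> sum(sparsegetindex(x, ids)), x)[1]` should come back as a sparse matrix rather than a dense one.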
Thanks a lot and please let me know if I can help (but my GPU and Tracker knowledge is limited).