Description
Related to #206 and partly to #66, a feature request: allow sparse gradients for `getindex`, as PyTorch does for its embedding layer.
I have a large embedding matrix, and the gradient for `getindex` creates a large array of zeros every time it's called, which kills my GPU performance. I don't think `getindex` should use sparse data structures in the general case, and I'm not sure what's the best API for this, but it is a big roadblock for me.
I have trouble using views and sparse arrays with CuArrays and Flux (https://github.com/JuliaGPU/CuArrays.jl/issues/267#issue-403606632), so I couldn't really experiment with the idea.
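For concreteness, here is a sketch of the dense pullback that `getindex` implies today (plain Julia, not Tracker's actual code; `dense_getindex_pullback` is an illustrative name). The zero matrix it allocates has the size of the whole embedding matrix, even when only a handful of columns were indexed:

```julia
# Illustrative sketch: the pullback of y = E[:, i] materializes a zero
# matrix the size of E on every call, which is the reported bottleneck.
function dense_getindex_pullback(Δ, E, i)
    dE = zero(E)       # allocates length(E) zeros on every backward pass
    dE[:, i] .+= Δ     # only these few entries are ever nonzero
    return dE
end

E = rand(128, 100_000)       # a large embedding matrix
i = [3, 17, 42]              # a small batch of distinct indices
Δ = ones(128, length(i))     # incoming cotangent for E[:, i]
dE = dense_getindex_pullback(Δ, E, i)
count(!iszero, dE)           # 384 nonzeros out of 12_800_000 entries
```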
Some API ideas
a) Define a minimal `Embedding` struct and use a special gradient definition for `getindex` on this type: `E[:, i]` dispatches to the sparse definition of `getindex` because `E isa Embedding`.
b) Define a special indexing type, i.e. `X[:, sparsely(i)]` dispatches to the sparse definition of `getindex` because of the resulting type of `sparsely(i)`.
c) Define a function `sparsegetindex(x, i...)` that is just `getindex` with a sparse gradient definition.
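Whichever API is chosen, the sparse pullback itself could look something like the sketch below, using the SparseArrays stdlib. `sparse_getindex_grad` is a made-up name, not an existing Flux/Tracker function; the key point is that `sparse(I, J, V, m, n)` sums values at duplicate `(row, col)` pairs, which is exactly the accumulation that repeated indices need:

```julia
using SparseArrays

# Sketch of a sparse gradient for y = E[:, i]: only the indexed columns
# are stored, instead of a dense zero matrix the size of E.
function sparse_getindex_grad(Δ::AbstractMatrix, sz::Tuple, i::AbstractVector{<:Integer})
    m, n = sz
    rows = repeat(1:m, outer=length(i))   # row index for each entry of Δ
    cols = repeat(i, inner=m)             # column i[k] for the k-th column of Δ
    # `sparse` sums duplicates, so a repeated index in `i` accumulates its
    # gradient contributions correctly.
    return sparse(rows, cols, vec(Δ), m, n)
end

E = rand(128, 100_000)
i = [3, 17, 3]                # note the repeated index
Δ = ones(128, length(i))
dE = sparse_getindex_grad(Δ, size(E), i)
nnz(dE)                       # 256: only columns 3 and 17 are stored
dE[1, 3]                      # 2.0: the duplicated column accumulated
```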
As a workaround I guess I can split the big embedding matrix into multiple small ones, but I'm really not looking forward to working with that kind of setup.
Thanks a lot and please let me know if I can help (but my GPU and Tracker knowledge is limited).