Description
Related to #206 and partly to #66, a feature request: allow sparse gradients for `getindex`, as PyTorch does for its embedding layer.
I have a large embedding matrix, and the gradient for `getindex` creates a large array of zeros every time it's called, which kills my GPU performance. I don't think `getindex` should use sparse data structures in the general case, and I'm not sure what's the best API for this, but it is a big roadblock for me.
I have trouble using views and sparse arrays with CuArrays and Flux (https://github.com/JuliaGPU/CuArrays.jl/issues/267#issue-403606632), so I couldn't really experiment with the idea.
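For concreteness, here is a sketch of the dense pullback that `getindex` implies today (plain Julia, not Tracker's actual code; `dense_getindex_pullback` is an illustrative name). The zero matrix it allocates has the size of the whole embedding matrix, even when only a handful of columns were indexed:

```julia
# Illustrative sketch: the pullback of y = E[:, i] materializes a zero
# matrix the size of E on every call, which is the reported bottleneck.
function dense_getindex_pullback(Δ, E, i)
    dE = zero(E)       # allocates length(E) zeros on every backward pass
    dE[:, i] .+= Δ     # only these few entries are ever nonzero
    return dE
end

E = rand(128, 100_000)       # a large embedding matrix
i = [3, 17, 42]              # a small batch of distinct indices
Δ = ones(128, length(i))     # incoming cotangent for E[:, i]
dE = dense_getindex_pullback(Δ, E, i)
count(!iszero, dE)           # 384 nonzeros out of 12_800_000 entries
```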
Some API ideas
a) Define a minimal `Embedding` struct and use a special gradient definition for `getindex` on this type: `E[:, i]` dispatches to the sparse definition of `getindex` because `E isa Embedding`.
b) Define a special indexing type, i.e. `X[:, sparsely(i)]` dispatches to the sparse definition of `getindex` because of the resulting type of `sparsely(i)`.
c) Define a function `sparsegetindex(x, i...)` that is just `getindex` with a sparse gradient definition.
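Whichever API is chosen, the sparse pullback itself could look something like the sketch below, using the SparseArrays stdlib. `sparse_getindex_grad` is a made-up name, not an existing Flux/Tracker function; the key point is that `sparse(I, J, V, m, n)` sums values at duplicate `(row, col)` pairs, which is exactly the accumulation that repeated indices need:

```julia
using SparseArrays

# Sketch of a sparse gradient for y = E[:, i]: only the indexed columns
# are stored, instead of a dense zero matrix the size of E.
function sparse_getindex_grad(Δ::AbstractMatrix, sz::Tuple, i::AbstractVector{<:Integer})
    m, n = sz
    rows = repeat(1:m, outer=length(i))   # row index for each entry of Δ
    cols = repeat(i, inner=m)             # column i[k] for the k-th column of Δ
    # `sparse` sums duplicates, so a repeated index in `i` accumulates its
    # gradient contributions correctly.
    return sparse(rows, cols, vec(Δ), m, n)
end

E = rand(128, 100_000)
i = [3, 17, 3]                # note the repeated index
Δ = ones(128, length(i))
dE = sparse_getindex_grad(Δ, size(E), i)
nnz(dE)                       # 256: only columns 3 and 17 are stored
dE[1, 3]                      # 2.0: the duplicated column accumulated
```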
As a workaround I guess I can split the big embedding matrix into multiple small ones, but I'm really not looking forward to working with that kind of setup.
Thanks a lot and please let me know if I can help (but my GPU and Tracker knowledge is limited).