Sparse grads for getindex #589

Closed
wants to merge 2 commits into from

Conversation

Drvi commented Feb 2, 2019

This is a proposal to fix #577. I basically delay the gradient computation so that it happens in place in the accum!() call rather than in the @grad definition, which lets me avoid the copy. On an extreme corner case:

using CuArrays
using Flux

Ea = gpu(param(randn(64, 1_000_000)));
Eb = gpu(param(randn(64, 65_535)));
i = UInt16.(collect(1:5_000));
loss(i,n) = sum(sum(Eb[:, i] .+ Ea[:, rand(1:size(Ea,2), 1)]) for _ in 1:n)

function g(n, t, i)
   for _ in 1:t
      print("loss ")
      CuArrays.@time l = loss(i, n)
      print("back ")
      CuArrays.@time Flux.back!(l)
   end
end

g(100, 10, i)

Before, I got the following timings:

loss   0.149092 seconds (35.33 k CPU allocations: 2.126 MiB) (600 GPU allocations: 245.150 MiB, 26.88% gc time of which 100.00% spent allocating)
back   7.784618 seconds (64.09 k CPU allocations: 3.001 MiB, 29.91% gc time) (900 GPU allocations: 25.882 GiB, 3.20% gc time of which 100.00% spent allocating)

And after:

loss   0.062988 seconds (32.28 k CPU allocations: 2.061 MiB) (600 GPU allocations: 245.150 MiB)
back   0.405314 seconds (78.89 k CPU allocations: 4.502 MiB, 24.22% gc time) (1.30 k GPU allocations: 734.404 MiB)
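
Roughly, the mechanism described above could look like the sketch below. This is illustrative only -- the type, field, and function names are assumptions for the example, not the actual diff:

# Illustrative sketch (not the PR's actual types or field names).
# Instead of materialising zero(xs) inside the @grad rule, the rule returns
# a lazy wrapper holding only the slice gradient and the indices; the
# scatter happens later, in place, when gradients are accumulated.
struct SparseGradSketch{P,I}
    nzval::P            # dense gradient of the slice, i.e. Δ
    indices::I          # the indices that were passed to getindex
    parentsize::Dims    # size of the original array
end

# Accumulation is where the in-place scatter finally happens, so no full
# dense copy of the parent array is created per getindex call.
function accum_sketch!(acc::AbstractArray, g::SparseGradSketch)
    acc[g.indices...] .+= g.nzval   # dotview broadcast, writes in place
    return acc
end

# For xs[:, i] the pullback could then return
#   SparseGradSketch(Δ, (Colon(), i), size(xs))
# instead of writing Δ into a freshly allocated zero(xs).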

There is a downside: getindex on the sparse structure I use is very slow, and I think some getindexing happens in the jacobians (at least the jacobian tests were complaining).

Please let me know what you think.

Drvi commented Feb 4, 2019

I realized the type signatures might be a bit too restrictive here, so I'm adjusting them.
Strangely, at one point I got the following error:

MethodError: no method matching Flux.Tracker.SparseGrad(::CuArray{Float64,1}, ::Tuple{UnitRange{Int64}}, ::Tuple{Int64}, ::CuArray{Float32,1})

This would suggest that Δ (::CuArray{Float64,1}) and xs (::CuArray{Float32,1}) in the @grad definition had different eltypes -- is that expected? (My loss was Float32 and so were my weights.)

Δ′ = zero(xs)
Δ′[i...] = data(Δ)
(nobacksies(:getindex, Δ′), map(_->nothing, i)...)
checkbounds(xs, i...)

Member:

Is this needed given that we already did xs[i...]?

Author:

Good catch:)
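
(A minimal illustration of why the explicit check is redundant; the values here are made up:)

xs = rand(3)
# The forward pass xs[i...] already bounds-checks, so an out-of-range index
# would have thrown before the pullback ever runs:
try
    xs[10]
catch err
    @assert err isa BoundsError
end
checkbounds(Bool, xs, 10)   # false -- a second explicit check adds nothing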

Base.similar(x::SparseGrad{T,N,S,P,O}) where {T,N,S,P,O} = similar(O, size(x))

#FIXME: Very slow getindex.
function Base.getindex(x::SparseGrad, i...)

Member:

Is it possible to just implement the scalar version and have the rest fall back? Or is this needed for the GPU?

Author:

If I remember correctly, the scalar case is only worth it for very small queries (the indexin() calls are expensive), hence I added the allocating version for the general case.
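
To make the trade-off concrete, here is a standalone sketch; the type and function names are hypothetical and it is simplified to column slices (the PR's SparseGrad is more general):

# Hypothetical, simplified to a column-slice gradient.
struct SparseSliceSketch{T}
    nzval::Matrix{T}     # gradient of xs[:, cols]
    cols::Vector{Int}    # column indices used in the forward getindex
    parentsize::Dims{2}  # size of the original xs
end

# Scalar read: one search per call. Cheap for a handful of reads, but
# expensive when e.g. jacobian code reads many individual entries.
function scalar_read(g::SparseSliceSketch, r::Integer, c::Integer)
    j = findfirst(==(c), g.cols)
    j === nothing ? zero(eltype(g.nzval)) : g.nzval[r, j]
end

# General read: pay one dense allocation up front, then index normally.
function dense_read(g::SparseSliceSketch, I...)
    dense = zeros(eltype(g.nzval), g.parentsize)
    dense[:, g.cols] = g.nzval
    dense[I...]
end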

MikeInnes (Member) commented:

I like the general idea here but I don't think this should touch back.jl. SparseGrad should be orthogonal to the AD other than being used by getindex (like OneHot is). But I think this is pretty close to that anyway.


pshashk commented Feb 10, 2019

The gradient definition of view is very similar to getindex's. Maybe we can use sparse gradients there as well?
That way, things like selectdim(embedding_matrix, 2, indices) would avoid all kinds of copies.
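
A small illustration of the point, assuming plain Arrays (the variable names are just for the example):

embedding_matrix = randn(64, 1000)
indices = [3, 17, 42]

# Forward pass: selectdim already avoids a copy by returning a view.
v = selectdim(embedding_matrix, 2, indices)
@assert v isa SubArray && size(v) == (64, length(indices))

# The remaining copies are on the reverse pass: the pullback for
# view/getindex still scatters Δ into a dense zero(embedding_matrix),
# which is exactly what a sparse gradient would avoid.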


Drvi commented Feb 10, 2019

Yes, touching things in back.jl didn't feel like the optimal thing to do...
I now think I should focus more on how SubArray handles indexing rather than on SparseArray. I have some studying to do with regard to this and the broadcasting system, so this may take me a while.
