Sparse grads for getindex #589

Closed
wants to merge 2 commits into from

Conversation

Drvi commented Feb 2, 2019

This is a proposal to fix #577. I basically delay the gradient computation so that it happens in place in the accum!() call rather than in the @grad definition, which lets me avoid the copy. On an extreme corner case:

using CuArrays
using Flux

Ea = gpu(param(randn(64, 1_000_000)));
Eb = gpu(param(randn(64, 65_535)));
i = UInt16.(collect(1:5_000));
loss(i,n) = sum(sum(Eb[:, i] .+ Ea[:, rand(1:size(Ea,2), 1)]) for _ in 1:n)

function g(n, t, i)
   for _ in 1:t
      print("loss ")
      CuArrays.@time l = loss(i, n)
      print("back ")
      CuArrays.@time Flux.back!(l)
   end
end

g(100, 10, i)

Before, I got the following timings:

loss   0.149092 seconds (35.33 k CPU allocations: 2.126 MiB) (600 GPU allocations: 245.150 MiB, 26.88% gc time of which 100.00% spent allocating)
back   7.784618 seconds (64.09 k CPU allocations: 3.001 MiB, 29.91% gc time) (900 GPU allocations: 25.882 GiB, 3.20% gc time of which 100.00% spent allocating)

And after:

loss   0.062988 seconds (32.28 k CPU allocations: 2.061 MiB) (600 GPU allocations: 245.150 MiB)
back   0.405314 seconds (78.89 k CPU allocations: 4.502 MiB, 24.22% gc time) (1.30 k GPU allocations: 734.404 MiB)
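
Roughly, the mechanism described above could look like the sketch below. This is illustrative only -- the type, field, and function names are assumptions for the example, not the actual diff:

# Illustrative sketch (not the PR's actual types or field names).
# Instead of materialising zero(xs) inside the @grad rule, the rule returns
# a lazy wrapper holding only the slice gradient and the indices; the
# scatter happens later, in place, when gradients are accumulated.
struct SparseGradSketch{P,I}
    nzval::P            # dense gradient of the slice, i.e. Δ
    indices::I          # the indices that were passed to getindex
    parentsize::Dims    # size of the original array
end

# Accumulation is where the in-place scatter finally happens, so no full
# dense copy of the parent array is created per getindex call.
function accum_sketch!(acc::AbstractArray, g::SparseGradSketch)
    acc[g.indices...] .+= g.nzval   # dotview broadcast, writes in place
    return acc
end

# For xs[:, i] the pullback could then return
#   SparseGradSketch(Δ, (Colon(), i), size(xs))
# instead of writing Δ into a freshly allocated zero(xs).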

There is a downside: getindex on the sparse structure I use is very slow, and I think some getindexing happens in the jacobians (at least the jacobian tests were complaining).

Please let me know what you think.

Drvi commented Feb 4, 2019

I realized the type signatures might be a bit too restrictive here, so I'm adjusting them.
Strangely, at one point I got the following error:

MethodError: no method matching Flux.Tracker.SparseGrad(::CuArray{Float64,1}, ::Tuple{UnitRange{Int64}}, ::Tuple{Int64}, ::CuArray{Float32,1})

This would suggest that Δ (::CuArray{Float64,1}) and xs (::CuArray{Float32,1}) in the @grad definition had different eltypes -- is that expected? (My loss was Float32 and so were my weights.)

Δ′ = zero(xs)
Δ′[i...] = data(Δ)
(nobacksies(:getindex, Δ′), map(_->nothing, i)...)
checkbounds(xs, i...)

Member:

Is this needed given that we already did xs[i...]?

Author:

Good catch:)
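
(A minimal illustration of why the explicit check is redundant; the values here are made up:)

xs = rand(3)
# The forward pass xs[i...] already bounds-checks, so an out-of-range index
# would have thrown before the pullback ever runs:
try
    xs[10]
catch err
    @assert err isa BoundsError
end
checkbounds(Bool, xs, 10)   # false -- a second explicit check adds nothing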

Base.similar(x::SparseGrad{T,N,S,P,O}) where {T,N,S,P,O} = similar(O, size(x))

#FIXME: Very slow getindex.
function Base.getindex(x::SparseGrad, i...)

Member:

Is it possible to just implement the scalar version and have the rest fall back? Or is this needed for the GPU?

Author:

If I remember correctly, the scalar case is only worth it for very small queries (the indexin() calls are expensive), hence I added the allocating version for the general case.
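
To make the trade-off concrete, here is a standalone sketch; the type and function names are hypothetical and it is simplified to column slices (the PR's SparseGrad is more general):

# Hypothetical, simplified to a column-slice gradient.
struct SparseSliceSketch{T}
    nzval::Matrix{T}     # gradient of xs[:, cols]
    cols::Vector{Int}    # column indices used in the forward getindex
    parentsize::Dims{2}  # size of the original xs
end

# Scalar read: one search per call. Cheap for a handful of reads, but
# expensive when e.g. jacobian code reads many individual entries.
function scalar_read(g::SparseSliceSketch, r::Integer, c::Integer)
    j = findfirst(==(c), g.cols)
    j === nothing ? zero(eltype(g.nzval)) : g.nzval[r, j]
end

# General read: pay one dense allocation up front, then index normally.
function dense_read(g::SparseSliceSketch, I...)
    dense = zeros(eltype(g.nzval), g.parentsize)
    dense[:, g.cols] = g.nzval
    dense[I...]
end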

MikeInnes (Member) commented:

I like the general idea here but I don't think this should touch back.jl. SparseGrad should be orthogonal to the AD other than being used by getindex (like OneHot is). But I think this is pretty close to that anyway.


pshashk commented Feb 10, 2019

The gradient definition of view is very similar to getindex's. Maybe we can use sparse gradients there as well?
That way, things like selectdim(embedding_matrix, 2, indices) would avoid all kinds of copies.
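
A small illustration of the point, assuming plain Arrays (the variable names are just for the example):

embedding_matrix = randn(64, 1000)
indices = [3, 17, 42]

# Forward pass: selectdim already avoids a copy by returning a view.
v = selectdim(embedding_matrix, 2, indices)
@assert v isa SubArray && size(v) == (64, length(indices))

# The remaining copies are on the reverse pass: the pullback for
# view/getindex still scatters Δ into a dense zero(embedding_matrix),
# which is exactly what a sparse gradient would avoid.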


Drvi commented Feb 10, 2019

Yes, touching things in back.jl didn't feel like the optimal thing to do...
I now think I should focus more on how SubArray handles indexing rather than on SparseArray. I have some studying to do with regard to this and the broadcasting system, so this may take me a while.
