-
-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assignment to multiple arrays is not differentiable on GPU since Zygote.jl 0.6.67 #1470
Comments
Reduced to https://github.com/JuliaDiff/ChainRules.jl/blob/v1.58.0/src/rulesets/Base/indexing.jl#L147. This should happen for any GPU array type: julia> using JLArrays; x = jl(ones(1));
julia> xs = Union{typeof(x),Nothing}[x, x]
2-element Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}:
[1.0]
[1.0]
julia> view(xs, 1) .+= Ref(x)
ERROR: MethodError: no method matching parent(::Type{SubArray{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}, 0, Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}, Tuple{Int64}, true}})
Closest candidates are:
parent(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S})
@ LinearAlgebra /mnt/fastdisk/brianc_home/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/adjtrans.jl:341
parent(::Union{LinearAlgebra.LowerTriangular{T, S} where S<:AbstractMatrix{T}, LinearAlgebra.UnitLowerTriangular{T, S} where S<:AbstractMatrix{T}, LinearAlgebra.UnitUpperTriangular{T, S} where S<:AbstractMatrix{T}, LinearAlgebra.UpperTriangular{T, S} where S<:AbstractMatrix{T}} where T)
@ LinearAlgebra /mnt/fastdisk/brianc_home/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/triangular.jl:164
parent(::Union{LinearAlgebra.Hermitian{T, S}, LinearAlgebra.Symmetric{T, S}} where {T, S})
@ LinearAlgebra /mnt/fastdisk/brianc_home/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/symmetric.jl:275
...
Stacktrace:
[1] backend(#unused#::Type{SubArray{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}, 0, Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}, Tuple{Int64}, true}})
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:151
[2] backend(x::SubArray{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}, 0, Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}, Tuple{Int64}, true})
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:149
[3] _copyto!
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/broadcast.jl:70 [inlined]
[4] materialize!
@ ~/.julia/packages/GPUArrays/dAUOE/src/host/broadcast.jl:46 [inlined]
[5] materialize!(dest::SubArray{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}, 0, Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}, Tuple{Int64}, true}, bc::Base.Broadcast.Broadcasted{JLArrays.JLArrayStyle{0}, Nothing, typeof(+), Tuple{SubArray{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}, 0, Vector{Union{Nothing, DenseJLVector{Float64}, JLVector{Float64}}}, Tuple{Int64}, true}, Base.RefValue{JLArray{Float64, 1}}}})
@ Base.Broadcast ./broadcast.jl:881
[6] top-level scope
@ REPL[14]:1 My feeling is that the BroadcastStyle being selected here is incorrect (broadcasting into a CPU array of GPU arrays should not treat the destination like a GPU array). However, I'm not sure if this is something which could be safely changed. You may want to ask the JuliaGPU folks about this behaviour. |
I've looked into this a bit. I can modify the example to avoid use of a using JLArrays, Zygote
function f(x)
a = x[1:4]
b = x[5:8]
sum(a + b)
end
x = jl(randn(8))
@show Zygote.gradient(f, x) This gives the stack trace:
and this points to the use of using JLArrays
dx = jl(randn(8))
view(dx, 1) .= 1 I suggest replacing all such views with setting up a cpu array: for ind in eachindex(dx_cpu)
dx_cpu[ind] = ind in inds ? 1 : 0
end and transferring it to gpu followed by multiplication with the gradient
|
We can't use mutation like this because the rules themselves need to be differentiable. The correct fix IMO would be to get GPUArrays broadcasting to ignore this particular case, which is why I mentioned bringing the issue up on the JuliaGPU side above. |
Is it possible to use a |
I would not call broadcasting into an array of arrays an edge case. It just so happens that this doesn't work for GPU arrays, but that could be a matter of Julia's broadcasting machinery not providing enough information to allow it to work. Opened JuliaGPU/GPUArrays.jl#505 to figure out what's going on. In the meantime, it looks like a simple fix is to change the aforementioned line in ChainRules (which is in a function that allows mutation) to use a code path which avoids broadcasting. That's what the linked ChainRules PR does. |
I found the following code doesn't work.
Please note that you'll need to run it on GPU with Zygote.jl 0.6.67. The issue is not reproducible on CPU or Zygote.jl 0.6.66.
The example code is a bit artificial but I noticed this when I tried to use the
chunk
function of MLUtils.jl. I've filed an issue there but because it is rather caused by a recent change of Zygote.jl I'd like to file a separated issue here.The text was updated successfully, but these errors were encountered: