Gradient definitions & supertypes for Zygote, continued #169
src/zygote.jl
Outdated
# Define a new species of projection operator for this type:
ChainRulesCore.ProjectTo(x::VectorOfArray) = ChainRulesCore.ProjectTo{VectorOfArray}()

# Gradient from iteration will be e.g. Vector{Vector}, this makes it another AbstractMatrix
(::ChainRulesCore.ProjectTo{VectorOfArray})(dx::AbstractVector{<:AbstractArray}) = VectorOfArray(dx)
# Gradient from broadcasting will be another AbstractArray
(::ChainRulesCore.ProjectTo{VectorOfArray})(dx::AbstractArray) = dx
I believe this may not be necessary at all. One thing I thought to test was whether iteration like this works without it, and it does: it hits the `@adjoint getindex` rule:
julia> function iter(vofa)
           s = 0
           for a in vofa
               s += prod(a)
           end
           s
       end;
julia> gradient(iter, va)[1]
VectorOfArray{Float64,3}:
2-element Vector{Matrix{Float64}}:
[0.007377548303139424 0.0014293014720444096 0.0004998127128840348; 0.0005414269337139141 0.0007721441834498009 0.0006559506948612249; 0.0042378737935180105 0.0006765914005991947 0.00045986425172967415]
[0.002774992305796606 0.002978041675310144 0.004412709924140469; 0.0056425205408066285 0.005228088118453952 0.0036646150274027; 0.0036825199036535465 0.004902176341789764 0.045170987413739046]
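A quick way to sanity-check output like the above, sketched here without any AD package: since `iter` sums `prod(a)` over the component arrays, each block of the gradient should equal `prod(a) ./ a` for the corresponding `a` (assuming no entry is exactly zero). The helper names `prod_gradient` and `fd_gradient` below are made up for illustration and are not part of the PR.

```julia
# Analytic gradient of prod: d/da[k] prod(a) = prod(a) / a[k], valid when a[k] != 0.
prod_gradient(a) = prod(a) ./ a

# Central finite differences for comparison (hypothetical helper, plain Julia only).
function fd_gradient(f, a; h = 1e-6)
    g = similar(a)
    for k in eachindex(a)
        ap = copy(a); ap[k] += h   # perturb one entry upward
        am = copy(a); am[k] -= h   # and downward
        g[k] = (f(ap) - f(am)) / (2h)
    end
    return g
end

a = rand(3, 3) .+ 0.5   # keep entries away from zero
@assert isapprox(prod_gradient(a), fd_gradient(prod, a); rtol = 1e-5)
```

Comparing `gradient(iter, va)[1]` block by block against `prod_gradient` applied to each component array would then confirm that the `getindex` adjoint is producing the right values.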
src/zygote.jl
Outdated
# These rules duplicate the `rrule` methods above, because Zygote looks for an `@adjoint`
# definition first, and finds its own before finding those.

ZygoteRules.@adjoint function getindex(VA::AbstractVectorOfArray, i::Union{Int,AbstractArray{Int},CartesianIndex,Colon,BitArray,AbstractArray{Bool}})
    function AbstractVectorOfArray_getindex_adjoint(Δ)
        Δ′ = [ (i == j ? Δ : zero(x)) for (x,j) in zip(VA.u, 1:length(VA))]
Unrelated to the PR, but this seems like it allocates quite a bit when iterating a VectorOfArray. I guess that using `Fill(0.0, size(Δ))` would often make `Δ′` have an abstract type?
Yes, it would make it an abstract type and sometimes hurt inference. Then we'd have to rely on union optimizations and pray.
Relying on union optimizations might be the right idea here though, I'll have to check.
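To make the trade-off in this thread concrete, here is a minimal sketch in plain Julia. `LazyZeros` is a hypothetical stand-in for `FillArrays.Fill(0.0, size(Δ))`, defined here only so the example needs no packages; the variable names are also illustrative. It compares the element type of the comprehension when the off-index entries are dense zeros, lazy zeros, or lazy zeros with an explicitly declared `Union` element type.

```julia
# Hypothetical stand-in for a lazy zero array such as FillArrays.Fill(0.0, dims).
struct LazyZeros <: AbstractMatrix{Float64}
    dims::Tuple{Int,Int}
end
Base.size(z::LazyZeros) = z.dims
Base.getindex(z::LazyZeros, i::Int, j::Int) = 0.0

Δ = rand(2, 2)
i = 1

# Dense zeros, as in the current rule: every element is a Matrix{Float64}.
dense = [i == j ? Δ : zeros(size(Δ)) for j in 1:3]

# Lazy zeros: the comprehension has to widen its element type to cover both.
mixed = [i == j ? Δ : LazyZeros(size(Δ)) for j in 1:3]

# Declaring a small Union element type up front keeps the set of types closed.
unioned = Union{Matrix{Float64},LazyZeros}[i == j ? Δ : LazyZeros(size(Δ)) for j in 1:3]

@show eltype(dense) eltype(mixed) eltype(unioned)
@show isconcretetype(eltype(mixed))
```

The dense version allocates one zero array per component but keeps a concrete element type; the lazy version avoids those allocations at the cost of a non-concrete element type, unless the `Union` is spelled out so that the compiler's small-union handling has a chance to apply.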
Co-authored-by: Michael Abbott <[email protected]>
Continues #168.
@mcabbott I didn't have write permissions for some reason, so I'm continuing it here.