Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check master #493

Closed
wants to merge 5 commits into from
Closed

Check master #493

wants to merge 5 commits into from

Conversation

ChrisRackauckas
Copy link
Member

No description provided.

@ChrisRackauckas
Copy link
Member Author

https://github.com/SciML/DiffEqSensitivity.jl/pull/493/checks?check_run_id=3687059298#step:6:295

Something happened maybe in Zygote that makes it so that it now misses some adjoints?

@DhairyaLGandhi @mcabbott this is a pretty big master failure and I hope we can identify a fix ASAP. I'll try bounding.

@mcabbott
Copy link

mcabbott commented Sep 23, 2021

Error is from
https://github.com/SciML/DiffEqSensitivity.jl/blob/master/test/literal_adjoint.jl#L23
which I can't reproduce locally, as I get a segfault:

julia> Zygote.gradient(test,u0,p)

signal (11): Segmentation fault: 11
in expression starting at REPL[9]:1
union_isinlinable at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)

My guess from the stack trace is that what's new is
https://github.com/FluxML/Zygote.jl/blob/master/src/compiler/chainrules.jl#L200
which you can try removing to confirm:

@eval Zygote function ChainRulesCore.rrule_via_ad(config::ZygoteRuleConfig, f_args...; kwargs...)
    y, pb = if !isempty(kwargs)
        kwf() = first(f_args)(Base.tail(f_args)...; kwargs...)
        _y, _pb = _pullback(config.context, kwf)
        _y, Δ -> first(_pb(Δ)).f_args  # `first` should be `only`
    else
        _pullback(config.context, f_args...)
    end
    ad_pullback(Δ) = zygote2differential(pb(wrap_chainrules_output(Δ)), f_args)
    return y, ad_pullback
end

as I believe the rule for sum(f, OneTo(3)) calls back into Zygote via rrule_via_ad().

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Sep 23, 2021

Yeah, this doesn't look good. Which rule does this miss specifically? It might also explain JuliaGaussianProcesses/KernelFunctions.jl#364 needing the ChainRules types in the adjoint. Maybe we can avoid checking for an rrule first.

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Sep 23, 2021

Found the issue - its @inline wrap_chainrules_output(x::AbstractThunk) = wrap_chainrules_output(unthunk(x)) from FluxML/Zygote.jl#1044 removing that solves the issue, but would need unthunking separately

@ChrisRackauckas
Copy link
Member Author

So will there be a hotfix patch removing it or should I upper bound?

@DhairyaLGandhi
Copy link
Member

This will need to be fixed, since otherwise we start leaking a lot of the ChainRules's types in the returned gradients.

(Thunk(ProjectTo{AbstractArray}(element = ProjectTo{Float64}(), axes = (Base.OneTo(2),))  DiffEqSensitivity.var"#380#386"{0, ODESolution{Any, 2, Vector{Any}, Nothing, Nothing, Vector{Float64}, Nothing, ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(lv!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, ...

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Sep 23, 2021

This still means there are ChainRules' types in the adjoints, so it would still error on unthunking. So it doesn't quite solve it.

@DhairyaLGandhi
Copy link
Member

@ChrisRackauckas mind running the tests here with FluxML/Zygote.jl#1075 ?

@ChrisRackauckas
Copy link
Member Author

Seems like there are still failures

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Sep 24, 2021

Did a little bit of sleuthing. Turns out, RecursiveArrayTools has been broken as well. FluxML/Zygote.jl#1044 added restrictions on how to represent the gradient. For example

julia> va = RecursiveArrayTools.VectorOfArray([rand(3,3), rand(3,3)]);

julia> gradient(va) do va
         sum(va[1])
       end
ERROR: DimensionMismatch("variable with size(x) == (3, 3, 2) cannot have a gradient with size(dx) == (2,)")
Stacktrace:
 [1] (::ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}})(dx::Vector{AbstractMatrix{Float64}})
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/ChM7X/src/projection.jl:219
 [2] _project
   @ ~/Downloads/mwes/diffeqsensitivity/Zygote.jl/src/compiler/chainrules.jl:141 [inlined]
 [3] map(f::typeof(Zygote._project), t::Tuple{VectorOfArray{Float64, 3, Vector{Matrix{Float64}}}}, s::Tuple{Vector{AbstractMatrix{Float64}}})
   @ Base ./tuple.jl:232
 [4] gradient(f::Function, args::VectorOfArray{Float64, 3, Vector{Matrix{Float64}}})
   @ Zygote ~/Downloads/mwes/diffeqsensitivity/Zygote.jl/src/compiler/interface.jl:77
 [5] top-level scope
   @ REPL[30]:1

Also, rather than initialising gradients using zero(arg) (specifically in the getindex adjoint), we now get something different due to projection

x = Any[ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), [1.0, 0.0]]

which of course RAT doesn't know how to handle as the gradient of a VectorOfArray. Thankfully, we are still preserving the indices that we need.

Also notice that we get a type instability here, earlier, this was inferred type stable, now it isn't.

Since the array is initialised using eltype from the partial, we should be producing correct types of arrays already via similar. So I think we should be removing projection here. Removing the projection solves the issue reported in this issue, but not RAT.

@mcabbott
Copy link

mcabbott commented Sep 25, 2021

RecursiveArrayTools's problem is that it claims to be a 3-array of numbers, but when you iterate it, is instead a list of matrices. I think that's going to break a lot of code that believes what it says about itself:

julia> va = RecursiveArrayTools.VectorOfArray([rand(3,3), rand(3,3)]);

julia> va isa AbstractArray{<:Any, 3}
true

julia> size(va)  # 3-array
(3, 3, 2)

julia> eachindex(va)  # vector of arrays
Base.OneTo(2)

julia> length(va)
2

julia> CartesianIndices(va)  # 3-array
3×3×2 CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}:
[:, :, 1] =
 CartesianIndex(1, 1, 1)  CartesianIndex(1, 2, 1)  CartesianIndex(1, 3, 1)
...

julia> first(va)  # vector of arrays
3×3 Matrix{Float64}:
 0.620685  0.343948  0.940048
 0.848516  0.633285  0.243568
 0.775576  0.950859  0.134642

julia> va .+ 1  # 3-array broadcasting
VectorOfArray{Float64,3}:
2-element Vector{Matrix{Float64}}:
 [1.6206853179111187 1.3439483231282066 1.940048306517085; 1.8485156907122404 1.6332850011729758 1.243567963268383; 1.7755759351198481 1.9508590097266754 1.1346416765694358]
 [1.426320523589507 1.4595848092326293 1.0161374955217441; 1.8125891607780416 1.2695014949100343 1.314350910162879; 1.694957600928874 1.0902405127361687 1.5597647050882553]

julia> va[1] + 1
ERROR: MethodError: no method matching +(::Matrix{Float64}, ::Int64)

julia> collect(va)  # trusts eltype?
ERROR: MethodError: Cannot `convert` an object of type Matrix{Float64} to an object of type Float64

julia> Array(va)  # 3-array
3×3×2 Array{Float64, 3}:
[:, :, 1] =
 0.620685  0.343948  0.940048
...

julia> va[1] == va[:,1]  # neither 3-array nor vector of arrays? 
true

julia> Array(va)[:,1,1] == va[:,1,1]    # 3-array
true

julia> va[1] == first(eachslice(va; dims=3))    # 3-array
true

Zygote could special-case it one way or another, because it's well-established. But this is pretty surprising behaviour.

@ChrisRackauckas
Copy link
Member Author

ChrisRackauckas commented Sep 25, 2021

Nothing about that breaks the iteration interface of the AbstractArray interface, so it's not surprising to the interfaces and is something code should be compatible with if it's not making untrue assumptions.

@DhairyaLGandhi
Copy link
Member

I wouldn't want to special case it in Zygote. Established or not, it obeys regular Julia semantics and implements the same array interface as any other array.

@mcabbott
Copy link

mcabbott commented Sep 25, 2021

So what gradients are acceptable for this object? It's a requirement that the tangent vector can be added to the original, right?

julia> va = RecursiveArrayTools.VectorOfArray([rand(3,4), rand(3,4)]);

julia> va + ones(size(va)) |> summary  # seems like gradient can be a 3-array of same size
"3×4×2 Array{Float64, 3}"

julia> va .+ ones(size(va)) |> summary  # the same
"3×4×2 Array{Float64, 3}"

julia> pullback(va -> sum(va .+ 1), va)[2](1.0)  # this gradient makes an array
(Fill(1.0, 3, 3, 2),)

julia> gradient((x,y) -> sum(x ./ y), va, rand(3)); # this gradient makes an array

julia> summary.(ans)
("3×3×2 Array{Float64, 3}", "3-element Vector{Float64}")

julia> y, bk = pullback(va -> sum(va[1]), va); # like the example above

julia> summary(y)
"Float64"

julia> bk(1.0)[1] |> summary  # Zygote produces a vector of matrices
"2-element Vector{AbstractMatrix{Float64}}"

julia> va + bk(1.0)  # adding that to original is an error
ERROR: MethodError: no method matching +(::VectorOfArray{Float64, 3, Vector{Matrix{Float64}}}, ::Tuple{Vector{AbstractMatrix{Float64}}})

julia> va .+ bk(1.0)  # maybe .+ ? different error.
ERROR: MethodError: no method matching +(::Float64, ::Vector{AbstractMatrix{Float64}})

So what Zygote was producing seems wrong. Must it produce a Tangent? What consumes this, if not +?

Or maybe bk(1.0)[1] ought to be re-wrapped to make another VectorOfArray, so that the gradient is always AbstractArray{<:Any, 3}? This kind of enforcement sounds like what ProjectTo is good for, once you tell it what is acceptable. I see that RAT already depends on CRC, so the simplest version is a few lines:

julia> ChainRulesCore.ProjectTo(x::VectorOfArray) = ProjectTo{VectorOfArray}()

julia> (::ProjectTo{VectorOfArray})(dx::AbstractVector{<:AbstractArray}) = VectorOfArray(dx)

julia> (::ProjectTo{VectorOfArray})(dx::AbstractArray) = dx

julia> gradient(va -> sum(va[1]), va)[1]
VectorOfArray{Float64,3}:
2-element Vector{AbstractMatrix{Float64}}:
 3×3 Fill{Float64}, with entries equal to 1.0
 [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0]

julia> ans + va |> summary
"VectorOfArray{Float64,3}"

Should gradient(va -> sum(va .+ 1), va)[1] also be wrapped in something else with this same dual nature, rather than just being an array? I don't see an existing structure like VectorOfArray(eachslice(A)).

And, should this work? The in-between indexing that's neither vector-of-arrays not a 3-array:

julia> pullback(va -> sum(va[:,1]), va)[2](1.0)
ERROR: BoundsError: attempt to access 3×3×2 VectorOfArray{Float64, 3, Vector{Matrix{Float64}}} at index [1:3, 1]

@ChrisRackauckas
Copy link
Member Author

What's your suggestion on what I should do then to make it not error? I would've assumed ProjectTo with just the first two overloads would be correct.

@mcabbott
Copy link

assumed ProjectTo with just the first two overloads

This will break 3-array gradients like gradient((x,y) -> sum(x ./ y), va, rand(3)). And won't accept the VofA gradient producded by the (::ProjectTo{VectorOfArray})(dx::AbstractVector{<:AbstractArray}) method.

Whether what's written will "make it not error" I don't know. This is a real question:

What consumes this, if not +?

i.e. what happens downstream? Did it previously handle both 3-array and vector of array gradients, or only the latter? Does it just use iteration or will it expect indexing / broadcasting to work exactly like a vector of matrices?

@ChrisRackauckas
Copy link
Member Author

It broadcasts like an Array under the interpretation of its multidimensional indexing. It just uses the AbstractArray interface fallbacks for all of that.

@mcabbott
Copy link

Sure. What I mean is, what broadcasting behaviour is expected of the gradient? Changing a vector of arrays into a VectorOfArrays (as the above projection does) will change this.

@ChrisRackauckas
Copy link
Member Author

It should broadcast like a VectorOfArray

@ChrisRackauckas ChrisRackauckas deleted the ChrisRackauckas-patch-2 branch October 26, 2021 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants