Check master #493
https://github.com/SciML/DiffEqSensitivity.jl/pull/493/checks?check_run_id=3687059298#step:6:295 Something happened, maybe in Zygote, that now makes it miss some adjoints? @DhairyaLGandhi @mcabbott this is a pretty big master failure and I hope we can identify a fix ASAP. I'll try bounding. |
Error is from
My guess from the stack trace is that what's new is
as I believe the rule for |
Yeah, this doesn't look good. Which rule does this miss specifically? It might also explain JuliaGaussianProcesses/KernelFunctions.jl#364 needing the ChainRules types in the adjoint. Maybe we can avoid checking for an |
Found the issue - it's |
So will there be a hotfix patch removing it or should I upper bound? |
This will need to be fixed, since otherwise we start leaking a lot of ChainRules types into the returned gradients:

```julia
Thunk(ProjectTo{AbstractArray}(element = ProjectTo{Float64}(), axes = (Base.OneTo(2),)) ∘ DiffEqSensitivity.var"#380#386"{0, ODESolution{Any, 2, Vector{Any}, Nothing, Nothing, Vector{Float64}, Nothing, ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(lv!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, ...
```
|
This still means there are ChainRules' types in the adjoints, so it would still error on unthunking. So it doesn't quite solve it. |
@ChrisRackauckas mind running the tests here with FluxML/Zygote.jl#1075 ? |
Seems like there are still failures |
Did a little bit of sleuthing. Turns out, RecursiveArrayTools has been broken as well. FluxML/Zygote.jl#1044 added restrictions on how the gradient may be represented. For example:

```julia
julia> va = RecursiveArrayTools.VectorOfArray([rand(3,3), rand(3,3)]);

julia> gradient(va) do va
           sum(va[1])
       end
ERROR: DimensionMismatch("variable with size(x) == (3, 3, 2) cannot have a gradient with size(dx) == (2,)")
Stacktrace:
 [1] (::ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}})(dx::Vector{AbstractMatrix{Float64}})
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/ChM7X/src/projection.jl:219
 [2] _project
   @ ~/Downloads/mwes/diffeqsensitivity/Zygote.jl/src/compiler/chainrules.jl:141 [inlined]
 [3] map(f::typeof(Zygote._project), t::Tuple{VectorOfArray{Float64, 3, Vector{Matrix{Float64}}}}, s::Tuple{Vector{AbstractMatrix{Float64}}})
   @ Base ./tuple.jl:232
 [4] gradient(f::Function, args::VectorOfArray{Float64, 3, Vector{Matrix{Float64}}})
   @ Zygote ~/Downloads/mwes/diffeqsensitivity/Zygote.jl/src/compiler/interface.jl:77
 [5] top-level scope
   @ REPL[30]:1
```

Also, rather than initialising gradients using

```julia
x = Any[ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), [1.0, 0.0]]
```

which of course RAT doesn't know how to handle as the gradient of a

Also notice that we get a type instability here: earlier this was inferred type-stable, now it isn't. Since the array is initialised using the eltype from the partial, we should already be producing the correct types of arrays via |
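The typing concern above can be sketched with plain Base types (a hypothetical illustration, not DiffEqSensitivity's actual code): deriving the zero tangent from the primal's own nesting keeps the container concretely typed, unlike an `Any[]` of `NoTangent()`s.

```julia
# Hypothetical sketch (plain Base types, not the package's code):
# build the zero tangent from the primal's own nesting, instead of an
# Any[] container of NoTangent()s, so the eltype stays concrete.
u  = [rand(3, 3), rand(3, 3)]   # primal: a vector of matrices
du = map(zero, u)               # tangent with the same nesting and eltype
eltype(du)                      # Matrix{Float64}, concretely typed
```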
RecursiveArrayTools's problem is that it claims to be a 3-array of numbers, but when you iterate it, it is instead a list of matrices. I think that's going to break a lot of code that believes what it says about itself:
Zygote could special-case it one way or another, because it's well-established. But this is pretty surprising behaviour. |
Nothing about that breaks the iteration part of the AbstractArray interface, so it's not surprising to the interfaces, and it's something code should be compatible with if it's not making untrue assumptions. |
I wouldn't want to special case it in Zygote. Established or not, it obeys regular Julia semantics and implements the same array interface as any other array. |
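The shape-versus-iteration mismatch being discussed can be reproduced with a minimal stand-in type (hypothetical, pure Base Julia; not the real VectorOfArray implementation):

```julia
# Minimal stand-in for VectorOfArray (hypothetical, illustration only):
# reports the size of a 3-array, but iterates as a list of matrices.
struct FakeVoA <: AbstractArray{Float64,3}
    u::Vector{Matrix{Float64}}
end

Base.size(v::FakeVoA) = (size(v.u[1])..., length(v.u))
Base.getindex(v::FakeVoA, i::Int, j::Int, k::Int) = v.u[k][i, j]
# Custom iteration yields whole matrices, not scalar elements:
Base.iterate(v::FakeVoA, k::Int = 1) =
    k > length(v.u) ? nothing : (v.u[k], k + 1)

va = FakeVoA([ones(3, 3), 2 .* ones(3, 3)])
size(va)    # (3, 3, 2): looks like a 3-array of numbers
first(va)   # a 3×3 Matrix: iterates like a vector of matrices
```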
So what gradients are acceptable for this object? It's a requirement that the tangent vector can be added to the original, right?
So what Zygote was producing seems wrong. Must it produce a Tangent? What consumes this, if not Or maybe
Should And, should this work? The in-between indexing that's neither vector-of-arrays nor a 3-array:
|
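The addition requirement mentioned above can be sketched with plain nested Base arrays (a hypothetical illustration, not RecursiveArrayTools code): a usable tangent is one that can be combined elementwise with the primal.

```julia
# Hypothetical sketch with plain Base types: the tangent must mirror the
# primal's structure so that "primal + tangent" makes sense.
u  = [ones(2, 2), ones(2, 2)]        # primal: a vector of matrices
du = [fill(0.5, 2, 2), zeros(2, 2)]  # tangent with the same nesting
u_new = map(+, u, du)                # gradient-step-style update succeeds
# A flat 2×2×2 tangent would need reshaping before it could be added.
```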
What's your suggestion on what I should do then to make it not error? I would've assumed ProjectTo with just the first two overloads would be correct. |
This will break 3-array gradients like

Whether what's written will "make it not error" I don't know. This is a real question:
i.e. what happens downstream? Did it previously handle both 3-array and vector of array gradients, or only the latter? Does it just use iteration or will it expect indexing / broadcasting to work exactly like a vector of matrices? |
It broadcasts like an Array under the interpretation of its multidimensional indexing. It just uses the AbstractArray interface fallbacks for all of that. |
Sure. What I mean is, what broadcasting behaviour is expected of the gradient? Changing a vector of arrays into a VectorOfArrays (as the above projection does) will change this. |
It should broadcast like a VectorOfArray |
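The distinction at issue can be sketched with plain Base types (hypothetical illustration): the two representations broadcast differently, which is exactly what downstream code will notice when the gradient's container type changes.

```julia
# Sketch with plain Base types (hypothetical): the same numbers broadcast
# very differently as a vector of matrices versus a flat 3-array.
vom = [ones(2, 2), ones(2, 2)]   # vector of matrices
a3  = ones(2, 2, 2)              # flat 3-array with the same elements
vom2 = vom .* 2                  # still a Vector of 2x2 Matrices
a32  = a3 .* 2                   # still a 2x2x2 Array
```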