Remove custom poislogpdf for ForwardDiff.Dual #160
Could the definition perhaps just be removed completely? With StatsFuns and ForwardDiff, and without DistributionsAD:
julia> using StatsFuns, ForwardDiff
julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 1), [0.0])
1-element Vector{Float64}:
NaN
julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 0), [0.0])
1-element Vector{Float64}:
-1.0
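Here is a minimal sketch of where the NaN for `x = 1` (and the -1.0 for `x = 0`) comes from. It uses a toy dual number instead of ForwardDiff and assumes that StatsFuns evaluates `poislogpdf(λ, x)` as `xlogy(x, λ) - λ - loggamma(x + 1)`, with `xlogy` returning an exact zero when `x == 0`. `MiniDual`, `xlogy_sketch` and `poislogpdf_sketch` are hypothetical names used only for illustration. Once the constant count is promoted to a dual with zero tangent, the product rule evaluates `0 * (-Inf)`, which is NaN in IEEE arithmetic.

```julia
# Toy forward-mode dual number: primal value `v` and a single tangent `d`.
# Hypothetical type for illustration; this is not ForwardDiff.Dual.
struct MiniDual
    v::Float64
    d::Float64
end

# Constants are promoted with a zero tangent.
MiniDual(x::Real) = MiniDual(float(x), 0.0)

# Plain product rule (a*b)' = a*b' + a'*b, with no NaN guards.
Base.:*(a::MiniDual, b::MiniDual) = MiniDual(a.v * b.v, a.v * b.d + a.d * b.v)
Base.:-(a::MiniDual, b::MiniDual) = MiniDual(a.v - b.v, a.d - b.d)

# d/dy log(y) = 1/y, so at y = 0 this gives value -Inf and tangent Inf.
Base.log(a::MiniDual) = MiniDual(log(a.v), a.d / a.v)

# Assumed behaviour of StatsFuns' xlogy: exactly zero when x == 0, else x*log(y).
xlogy_sketch(x::MiniDual, y::MiniDual) = iszero(x.v) ? MiniDual(0.0) : x * log(y)

# Poisson log-density in λ for an integer count x: x*log(λ) - λ - log(x!).
function poislogpdf_sketch(λ::MiniDual, x::Integer)
    logfact = 0.0
    for k in 2:x
        logfact += log(k)
    end
    return xlogy_sketch(MiniDual(x), λ) - λ - MiniDual(logfact)
end

λ = MiniDual(0.0, 1.0)              # seed the tangent dλ/dλ = 1 at λ = 0
println(poislogpdf_sketch(λ, 0).d)  # -1.0: the x == 0 branch never evaluates log(0)
println(poislogpdf_sketch(λ, 1).d)  # NaN: 1*Inf + 0*(-Inf) = Inf + NaN
```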
Unfortunately not: with StatsFuns and ForwardDiff, and without DistributionsAD, (Also, the custom push-forward here should be faster than the default.)
Why should it return
Yes, of course, one should benchmark it before removing the custom definition. Intuitively, I would assume that the default "naive" approach is already quite fast, since the function is so simple with respect to lambda (https://github.com/JuliaStats/StatsFuns.jl/blob/bc45e187a0cb29e45facd10c44c067afac5210fb/src/distrs/pois.jl#L18) and DiffRules defines a custom differentiation rule for
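For reference, the dependence on λ is indeed simple. Assuming the standard Poisson log-density (the form the linked StatsFuns line computes), its derivative with respect to λ and the limit at λ = 0 are:

```latex
\log p(x \mid \lambda) = x \log \lambda - \lambda - \log \Gamma(x + 1),
\qquad
\frac{\partial}{\partial \lambda} \log p(x \mid \lambda) = \frac{x}{\lambda} - 1,
\qquad
\lim_{\lambda \to 0^{+}} \frac{\partial}{\partial \lambda} \log p(x \mid \lambda) =
\begin{cases} -1, & x = 0, \\ +\infty, & x \geq 1. \end{cases}
```

So at λ = 0 the mathematically expected gradient is -1 for `x = 0` and +Inf (not NaN) for `x ≥ 1`.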
Test failures in testset
I assume the test failures are caused by Adapt 3.3 (or, more precisely, JuliaGPU/Adapt.jl#41).
Well, Also, even without DistributionsAD, we get
Co-authored-by: David Widmann
Thanks for the prettier version, @devmotion!
Ah, thanks, yes, that makes complete sense. Somehow I confused myself 🙂 However, the default without DistributionsAD actually also works in these cases; unfortunately, this is just an instance of https://juliadiff.org/ForwardDiff.jl/latest/user/advanced.html#Fixing-NaN/Inf-Issues-1. With the NaN-safe setting (or e.g. JuliaDiff/ForwardDiff.jl#451) one gets:
julia> using StatsFuns, ForwardDiff
julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 0), [0.0])
1-element Vector{Float64}:
-1.0
julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 1), [0.0])
1-element Vector{Float64}:
Inf
julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 2), [0.0])
1-element Vector{Float64}:
Inf
So I still think we should consider removing the custom definition (and the type piracy...) here in DistributionsAD, in particular if it is not (significantly) slower. The NaN issue could perhaps be solved with Preferences.jl in Julia 1.6 (JuliaDiff/ForwardDiff.jl#181).
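To see why the NaN-safe setting turns the NaN into Inf, here is a minimal sketch with a toy dual number whose product rule keeps a zero tangent at zero even when it is multiplied by a non-finite value. This roughly follows the idea behind ForwardDiff's NaN-safe mode, but it is a simplified assumption, not ForwardDiff's actual implementation. `SafeDual`, `safescale`, `xlogy_safe` and `poislogpdf_safe` are hypothetical names, and `xlogy_safe` assumes StatsFuns' `xlogy` returns an exact zero for `x == 0`.

```julia
# Toy dual number with a NaN-safe product rule (hypothetical, for illustration).
struct SafeDual
    v::Float64
    d::Float64
end

# Constants are promoted with a zero tangent.
SafeDual(x::Real) = SafeDual(float(x), 0.0)

# Scale a tangent by a factor, keeping a zero tangent at zero even if the
# factor is ±Inf or NaN (so 0 * -Inf contributes 0 instead of NaN).
safescale(t::Float64, s::Float64) = iszero(t) ? 0.0 : t * s

Base.:*(a::SafeDual, b::SafeDual) =
    SafeDual(a.v * b.v, safescale(b.d, a.v) + safescale(a.d, b.v))
Base.:-(a::SafeDual, b::SafeDual) = SafeDual(a.v - b.v, a.d - b.d)
Base.log(a::SafeDual) = SafeDual(log(a.v), a.d / a.v)

# Assumed behaviour of StatsFuns' xlogy: exactly zero when x == 0, else x*log(y).
xlogy_safe(x::SafeDual, y::SafeDual) = iszero(x.v) ? SafeDual(0.0) : x * log(y)

# Poisson log-density in λ for an integer count x: x*log(λ) - λ - log(x!).
function poislogpdf_safe(λ::SafeDual, x::Integer)
    logfact = 0.0
    for k in 2:x
        logfact += log(k)
    end
    return xlogy_safe(SafeDual(x), λ) - λ - SafeDual(logfact)
end

λ = SafeDual(0.0, 1.0)  # seed the tangent dλ/dλ = 1 at λ = 0
for x in 0:2
    println(x, " => ", poislogpdf_safe(λ, x).d)  # prints -1.0, Inf, Inf
end
```

With the guard in place, the tangent for `x ≥ 1` is `x * Inf + 0 = Inf`, matching the limit of `x/λ - 1` as λ approaches 0.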
That's not merged yet, though. Also, tracking down errors caused by NaN isn't always easy (it took me a while in this case, AdvancedHMC suddenly tried to sample at So maybe we could keep the custom push-forward for a while, until things shake out on the ForwardDiff side? It would make life easier for users.
It's definitely more convenient for users (I have run into the NaN issue multiple times as well), but even with the custom definition here the problem can occur: it is a general ForwardDiff issue that is solved only by the NaN-safe setting.
I never ran into it before myself, though, so I guess it won't occur in the majority of use cases? In any case, how about merging this for now, since it certainly can't hurt? We could then review at a later time which push-forwards in DistributionsAD may have become obsolete, once ForwardDiff has a NaN-safe mode that users can enable without modifying its source (or that has even become the default) and that has been used in the wild for a while.
As I said, it happened to me multiple times with SciML (it is/was even mentioned in the SciML docs, but I can't find it right now). Anyway, I would assume it does not matter for AdvancedHMC whether I also benchmarked
julia> using StatsFuns, ForwardDiff, BenchmarkTools
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
34.320 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
33.966 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
39.136 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
38.696 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
37.365 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
40.332 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
40.891 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
43.127 ns (0 allocations: 0 bytes)
julia> using StatsFuns, ForwardDiff, DistributionsAD, BenchmarkTools
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
83.098 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
84.523 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
126.920 ns (0 allocations: 0 bytes)
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
176.739 ns (0 allocations: 0 bytes)
So I still think the definition in DistributionsAD should just be removed: the corner case that this PR fixes already works with the default definition; it should not matter if
All right, I took it out and renamed the PR.
Thanks! Let's wait for #161 and ensure that tests still pass.
Sure thing, I'll implement a workaround in my code for now.
The remaining test error seems to be unrelated and caused by changes in PoissonBinomial in the latest release of Distributions. |
Currently, poislogpdf(ForwardDiff.Dual(0.0, Δ), x) with x = 0 returns Dual{Tag}(0.0, NaN), but it should return Dual{Tag}(0.0, -Δ).
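A minimal sketch of the push-forward this description asks for, written as a plain function on (value, tangent) pairs rather than as a method on `ForwardDiff.Dual`. `poislogpdf_pushforward` is a hypothetical name, and this is not the code from this PR or from DistributionsAD. For `x = 0` the log-density reduces to `-λ`, so the tangent is `-Δ` for every λ ≥ 0, including λ = 0; for `x ≥ 1` it uses ∂/∂λ = x/λ - 1.

```julia
# Hypothetical push-forward of the Poisson log-density in λ:
# maps (λ, Δ) to (log p(x | λ), (∂/∂λ log p(x | λ)) * Δ) for an integer count x ≥ 0.
function poislogpdf_pushforward(λ::Real, Δ::Real, x::Integer)
    if iszero(x)
        # log p(0 | λ) = -λ, so the tangent is -Δ even at λ = 0 (no 0 * log(0) term).
        return (-λ, -Δ)
    end
    logfact = 0.0                 # log(x!) accumulated as a sum of logs
    for k in 2:x
        logfact += log(k)
    end
    val = x * log(λ) - λ - logfact
    dval = (x / λ - 1) * Δ        # at λ = 0 this is Inf * Δ, the correct limit for Δ > 0
    return (val, dval)
end

poislogpdf_pushforward(0.0, 1.0, 0)   # (-0.0, -1.0)
poislogpdf_pushforward(2.0, 1.0, 3)   # value 3*log(2) - 2 - log(6), tangent 0.5
```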