Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove custom poislogpdf for ForwardDiff.Dual #160

Merged
merged 10 commits into from
Apr 26, 2021

Conversation

oschulz
Copy link
Contributor

@oschulz oschulz commented Apr 18, 2021

Currently, poislogpdf(ForwardDiff.Dual(0.0, Δ), x) returns Dual{Tag}(0.0, NaN), but should return Dual{Tag}(-0.1, -Δ).

@devmotion
Copy link
Member

devmotion commented Apr 18, 2021

To me it seems the definition could maybe just be removed completely? With StatsFuns and ForwardDiff and without DistributionsAD:

julia> using StatsFuns, ForwardDiff

julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 1), [0.0])
1-element Vector{Float64}:
 NaN

julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 0), [0.0])
1-element Vector{Float64}:
 -1.0

@oschulz
Copy link
Contributor Author

oschulz commented Apr 18, 2021

To me it seems the definition could maybe just be removed completely?

Unfortunately not - with StatsFuns and ForwardDiff and without DistributionsAD, poislogpdf(ForwardDiff.Dual(0, Δ), 0) does indeed work. Unfortunately, then poislogpdf(ForwardDiff.Dual(0, Δ), 1) returns Dual{Tag}(-Inf, NaN) but should return Dual{Tag}(-Inf, Inf).

(Also, the custom push-forward here should be faster than the default)

@devmotion
Copy link
Member

Unfortunately, then poislogpdf(ForwardDiff.Dual(0, Δ), 1) returns Dual{Tag}(-Inf, NaN) but should return Dual{Tag}(-Inf, Inf).

Why should it return Inf?

(Also, the custom push-forward here should be faster than the default)

Yes, of course, one should benchmark it before removing the custom definition. Intuitively, I would assume that already the default "naive" approach is quite fast since the function is so simple with respect to lambda (https://github.com/JuliaStats/StatsFuns.jl/blob/bc45e187a0cb29e45facd10c44c067afac5210fb/src/distrs/pois.jl#L18) and DiffRules defines a custom differentiation rule for log.

@oschulz
Copy link
Contributor Author

oschulz commented Apr 18, 2021

Test failures in testset adapt_randn seem unrelated to this PR (at least I also get them on master on my local system).

@devmotion
Copy link
Member

I assume the test failures are caused by Adapt 3.3 (or, more precisely, JuliaGPU/Adapt.jl#41).

src/forwarddiff.jl Outdated Show resolved Hide resolved
@oschulz
Copy link
Contributor Author

oschulz commented Apr 18, 2021

Why should it return Inf?

Well, poislogpdf(v, x) goes to -Inf for x > 0 at v = 0, so the derivative becomes infinite. And the custom derivative in DistributionsAD (x/val - 1) does the same, for x > 1.

Also, even without DistributionsAD, we get poislogpdf(ForwardDiff.Dual(eps(0.0), Δ), 1) == Dual{Nothing}(-744.4400719213812,Inf), so it makes sense to define poislogpdf(ForwardDiff.Dual(0.0, Δ), 1) == Dual{Nothing}(-Inf,Inf).

@oschulz
Copy link
Contributor Author

oschulz commented Apr 19, 2021

Thanks for the prettier version, @devmotion !

@devmotion
Copy link
Member

Ah, thanks, yes, it makes completely sense. Somehow I confused myself 🙂

However, actually the default without DistributionsAD works also in these cases, unfortunately it is just an occurrence of https://juliadiff.org/ForwardDiff.jl/latest/user/advanced.html#Fixing-NaN/Inf-Issues-1: with the NaN-safe setting (or eg. JuliaDiff/ForwardDiff.jl#451) one gets

julia> using StatsFuns, ForwardDiff

julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 0), [0.0])
1-element Vector{Float64}:
 -1.0

julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 1), [0.0])
1-element Vector{Float64}:
 Inf

julia> ForwardDiff.gradient(x -> poislogpdf(x[1], 2), [0.0])
1-element Vector{Float64}:
 Inf

So I still think one should consider removing the custom definition (and type piracy...) here in DistributionsAD, in particular if it is not (significantly) slower. The NaN issue could maybe be solved with Preferences.jl in Julia 1.6 (JuliaDiff/ForwardDiff.jl#181).

@oschulz
Copy link
Contributor Author

oschulz commented Apr 19, 2021

... with the NaN-safe setting ...

That's not merged yet though - also, tracking down errors caused by NaN isn't always easy (took me a while in this case, AdvancedHMC suddenly tried so sample at [NaN, ...] because of the gradient from the sample before, which itself was at a perfectly valid point.

So maybe we could keep the custom push-forward for a while, until things shake out on the ForwardDiff side? Would make life easier on users.

@devmotion
Copy link
Member

It's definitely more convenient for users (I ran into the NaN issue as well multiple times) but even with the custom definition here the problem can occur - it is a general ForwardDiff issue that is solved only by the nan-safe setting.

@oschulz
Copy link
Contributor Author

oschulz commented Apr 19, 2021

it is a general ForwardDiff issue that is solved only by the nan-safe setting

I never ran into be before myself though, so I guess it won't occur in the majority of use cases?

In any case, how about merging this for now, since it certainly can't hurt? And maybe review which push-forwards in DistributionsAD may have become obsolete at a later time, once ForwardDiff has a NaN-safe mode that users can enable without modifying it's source - or maybe even has become the default - and that has been used in the wild for a while?

@devmotion
Copy link
Member

devmotion commented Apr 19, 2021

I never ran into be before myself though, so I guess it won't occur in the majority of use cases?

As I said it happened to me multiple times with SciML (it is/was even mentioned in the SciML docs but I can't find it anymore right now).

Anyway, I would assume it does not matter for AdvancedHMC whether NaN or Inf is returned - in both cases, the value is not finite and the step is rejected.

I also benchmarked poislogpdf and it turns out that it is much better to just use the default and not to load DistributionsAD:

  • StatsFuns + ForwardDiff (latest release):
julia> using StatsFuns, ForwardDiff, BenchmarkTools

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
  34.320 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
  33.966 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
  39.136 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
  38.696 ns (0 allocations: 0 bytes)
  • StatsFuns + ForwardDiff#nansafe:
julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
  37.365 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
  40.332 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
  40.891 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
  43.127 ns (0 allocations: 0 bytes)
  • StatsFuns + ForwardDiff + DistributionsAD (latest release):
julia> using StatsFuns, ForwardDiff, DistributionsAD, BenchmarkTools

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(0.0));
  83.098 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(0.0));
  84.523 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 0)), $(rand()));
  126.920 ns (0 allocations: 0 bytes)

julia> @btime ForwardDiff.derivative($(x -> poislogpdf(x, 1)), $(rand()));
  176.739 ns (0 allocations: 0 bytes)

So I still think the definition in DistributionsAD should just be removed: the corner case that this PR fixes already works with the default definition, it should not matter if NaN or Inf is returned for AdvancedHMC etc (and it can be fixed with the nansafe setting and hopefully will be fixed upstream), and the default version is 2-5 times faster.

@oschulz
Copy link
Contributor Author

oschulz commented Apr 19, 2021

All right - I took it out and renamed the PR.

@oschulz oschulz changed the title Fix ForwardDiff of poislogpdf at v of zero Remove custom poislogpdf for ForwardDiff.Dual Apr 19, 2021
@devmotion
Copy link
Member

Thanks! Let's wait for #161 and ensure that tests still pass.

@oschulz
Copy link
Contributor Author

oschulz commented Apr 19, 2021

Sure thing - I'll implement a workaround in my code for now.

@devmotion
Copy link
Member

The remaining test error seems to be unrelated and caused by changes in PoissonBinomial in the latest release of Distributions.

@devmotion devmotion merged commit 12e2288 into TuringLang:master Apr 26, 2021
@oschulz oschulz deleted the diff-poislogpdf branch May 7, 2021 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants