Due to recent improvements in ForwardDiff, this sort of model can now error quite easily. This causes a bunch of CI failures (most recently on the docs).
```julia
julia> using Turing, Random

julia> @model function f()
           var ~ InverseGamma(2, 3)
           sd = sqrt(var)
           mean ~ Normal(0.0, sd)
           1.5 ~ Normal(mean, sd)
           2.0 ~ Normal(mean, sd)
       end
f (generic function with 2 methods)

julia> sample(Xoshiro(231), f(), NUTS(), 2000; progress=false)
┌ Info: Found initial step size
└ ϵ = 0.8
ERROR: DomainError with Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}(0.0,NaN,NaN):
Normal: the condition σ >= zero(σ) is not satisfied.
[...]
```

The solution, it seems, is to enable NaN-safe mode for ForwardDiff, which (roughly speaking) guards the propagation of partial derivatives against IEEE results like `0.0 * Inf = NaN`. I can confirm that this does indeed make the above call work:
```julia
julia> using ForwardDiff, Preferences

julia> set_preferences!(ForwardDiff, "nansafe_mode" => true);

# reload session and rerun

julia> sample(Xoshiro(231), f(), NUTS(), 2000; progress=false)
┌ Info: Found initial step size
└ ϵ = 0.8
Chains MCMC chain (2000×16×1 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 1
Samples per chain = 2000
Wall duration     = 3.34 seconds
Compute duration  = 3.34 seconds
parameters        = var, mean
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Use `describe(chains)` for summary statistics and quantiles.
```

This works, but it is honestly quite annoying. Firstly, it's not ideal that ForwardDiff controls this via Preferences.jl: the setting is stored in a `LocalPreferences.toml` file in the active project, so it is not propagated to different environments or even directories. It would be much better if it were controlled via ADTypes, and if we could change the default backend to be NaN-safe ForwardDiff. According to the docs page linked above, dynamically enabling NaN-safe mode via a config struct has been planned for many years, but it has not actually materialised.
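For what it's worth, the preference can at least travel with a project: `set_preferences!` writes a `LocalPreferences.toml` next to the active `Project.toml`, so committing that file pins NaN-safe mode for anyone who instantiates the environment. A sketch of the file's contents:

```toml
# LocalPreferences.toml, alongside the environment's Project.toml
[ForwardDiff]
nansafe_mode = true
```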
Secondly, it's also pretty unfortunate that ForwardDiff is Turing's default backend, so this sort of error can crop up quite easily: even if individual failures don't happen often (in the example above, 231 was the first seed that failed, so it fails ~ 0.5% of the time), the simple fact is we run so many tests and docs builds with NUTS that something somewhere is bound to break quite often.
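(One partial mitigation for individual flaky examples — my own sketch, not something the docs prescribe — is that Turing's samplers accept an `adtype`, so a particular test or docs build can opt out of the ForwardDiff default entirely. This assumes ReverseDiff.jl is installed, since `AutoReverseDiff()` only selects the backend:)

```julia
using Turing, Random  # Turing re-exports the ADTypes constructors

# The model from above, unchanged.
@model function f()
    var ~ InverseGamma(2, 3)
    sd = sqrt(var)
    mean ~ Normal(0.0, sd)
    1.5 ~ Normal(mean, sd)
    2.0 ~ Normal(mean, sd)
end

# Override the ForwardDiff default for this sampler call only,
# sidestepping the NaN-safe-mode question for this one example.
chain = sample(Xoshiro(231), f(), NUTS(; adtype=AutoReverseDiff()), 1000; progress=false)
```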
There isn't really much that can be done here. Some ideas I have are:

- Inside the inner constructor of `LogDensityFunction`, put an `@info` or `@warn` about this behaviour. We can pass `maxlog=1` to avoid being spammy. That might be a bit too extreme and annoying, though; and given that ForwardDiff is the default backend, putting a warning there means that literally everyone who ever runs a model with `NUTS` will see it, so I'm not inclined towards this.
- Write a docs page about it. This is partly why I'm opening an issue even though I can't do much about it: it puts the information out there somewhere.
- Pray that one day it's configurable via ADTypes, and then update our default.

Of these three, the second is the only realistic one.
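For the record, if the third idea ever materialises, I imagine it could look something like this — purely hypothetical, as no such keyword exists in ADTypes today:

```julia
# HYPOTHETICAL: `nansafe` is NOT a real AutoForwardDiff keyword.
# Sketch of what an ADTypes-level switch might look like, so the
# setting lives in code rather than in a Preferences.jl file:
sample(f(), NUTS(; adtype=AutoForwardDiff(; nansafe=true)), 2000)
```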