ForwardDiff NaN-safe mode

Due to recent improvements in ForwardDiff, models like the one below can now error quite easily. This has caused a number of [CI failures](https://github.com/TuringLang/Turing.jl/issues/2744) (most recently [on the docs](https://github.com/TuringLang/docs/actions/runs/21687166551/job/62537214513)).

```julia
julia> using Turing, Random
       @model function f()
           var ~ InverseGamma(2, 3)
           sd = sqrt(var)
           mean ~ Normal(0.0, sd)
           1.5 ~ Normal(mean, sd)
           2.0 ~ Normal(mean, sd)
       end
f (generic function with 2 methods)

julia> sample(Xoshiro(231), f(), NUTS(), 2000; progress=false)
┌ Info: Found initial step size
└   ϵ = 0.8
ERROR: DomainError with Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}(0.0,NaN,NaN):
Normal: the condition σ >= zero(σ) is not satisfied.
[...]
```

The solution to this, it seems, is to [enable NaN-safe mode for ForwardDiff](https://juliadiff.org/ForwardDiff.jl/stable/user/advanced/#Fixing-NaN/Inf-Issues). I can confirm that this does make the above call work:

```julia
julia> using ForwardDiff, Preferences

julia> set_preferences!(ForwardDiff, "nansafe_mode" => true);

# restart Julia so that the preference takes effect, then redefine the model and rerun

julia> sample(Xoshiro(231), f(), NUTS(), 2000; progress=false)
┌ Info: Found initial step size
└   ϵ = 0.8
Chains MCMC chain (2000×16×1 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 1
Samples per chain = 2000
Wall duration     = 3.34 seconds
Compute duration  = 3.34 seconds
parameters        = var, mean
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Use `describe(chains)` for summary statistics and quantiles.
```
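For intuition about what NaN-safe mode changes, here is a toy forward-mode dual number. It is a hypothetical illustration, not ForwardDiff's actual implementation. It differentiates `sqrt(exp(y))` at a very negative `y`, where `exp` underflows to `0.0`. This is one plausible route to the `Dual(0.0,NaN,NaN)` in the error above, assuming `var` is sampled on the log scale, but it has not been checked against Turing's internals. The local derivative of `sqrt` at zero is `Inf`, and `Inf * 0.0` is `NaN`; the `nansafe` flag mimics the idea behind ForwardDiff's NaN-safe mode by returning zero whenever the incoming derivative is exactly zero.

```julia
# Hypothetical toy dual number (illustration only, not ForwardDiff's Dual type).
struct Dual1
    val::Float64  # primal value
    der::Float64  # derivative with respect to the input
end

# Chain rule: outgoing derivative = local derivative * incoming derivative.
# With nansafe=true, an incoming derivative of exactly zero yields zero,
# instead of letting Inf * 0.0 turn into NaN.
chainrule(local_der, der; nansafe=false) =
    (nansafe && iszero(der)) ? zero(der) : local_der * der

dexp(x::Dual1; nansafe=false) =
    Dual1(exp(x.val), chainrule(exp(x.val), x.der; nansafe))
dsqrt(x::Dual1; nansafe=false) =
    Dual1(sqrt(x.val), chainrule(inv(2 * sqrt(x.val)), x.der; nansafe))

# Seed the input with derivative 1 and evaluate sqrt(exp(y)) at y = -800,
# where exp(-800.0) underflows to 0.0 (and so does its derivative).
y = Dual1(-800.0, 1.0)
unsafe = dsqrt(dexp(y))
safe = dsqrt(dexp(y; nansafe=true); nansafe=true)

println(unsafe)  # Dual1(0.0, NaN)
println(safe)    # Dual1(0.0, 0.0)
```

The value is `0.0` either way; only the derivative differs, which matches the shape of the `DomainError` above, where the primal is fine but the partials are `NaN`.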

This works, but it's honestly quite annoying. Firstly, it's not ideal that ForwardDiff controls this through Preferences.jl: the setting is written to a `LocalPreferences.toml` file next to the active project, so it doesn't carry over to other environments (or to other directories with their own projects). It would be much better if it were controlled via ADTypes, because then we could change Turing's default backend to NaN-safe ForwardDiff. According to the docs page linked above, dynamically enabling NaN-safe mode via a config struct has been planned for many years, but it has not actually materialised.

Secondly, it's also pretty unfortunate that ForwardDiff is Turing's default backend, because it means this sort of error can crop up quite easily. Individual failures are rare (in the example above, 231 was the first seed that failed, i.e. a failure rate of roughly 0.5%), but we run so many tests and docs builds with NUTS that something somewhere is bound to break quite often.

There isn't really much that can be done here. Some ideas I have are:

1. Inside the inner constructor of `LogDensityFunction`, emit an `@info` or `@warn` about this behaviour, with `maxlog=1` to avoid spamming the user. That might be too extreme and annoying, though: given that ForwardDiff *is* the default backend, a warning there means that literally everyone who ever runs a model with `NUTS` will see it, so I'm not inclined towards this.

2. Write a docs page about it. This is partly why I'm opening an issue even though I can't do much about it: so that the information is out there somewhere.

3. Pray that one day it's configurable via ADTypes, and then update our default.
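For concreteness, option 1 might look roughly like the following. This is a hypothetical sketch: `SketchLogDensity` is a stand-in for DynamicPPL's `LogDensityFunction`, and the AD backend is represented by a plain `Symbol` rather than an ADTypes object so that the example is self-contained. The only real API it relies on is the `maxlog` keyword of Julia's logging macros, which limits how many times a given log statement is emitted.

```julia
# Hypothetical stand-in for LogDensityFunction; names and fields are illustrative.
struct SketchLogDensity{F,B}
    logdensity::F  # function computing the log density
    adbackend::B   # stand-in for an ADTypes backend (here just a Symbol)

    function SketchLogDensity(logdensity::F, adbackend::B) where {F,B}
        if adbackend === :forwarddiff
            # maxlog=1: this statement is emitted at most once per session.
            @info "ForwardDiff is used without NaN-safe mode; gradients may contain NaN for some models." maxlog=1
        end
        return new{F,B}(logdensity, adbackend)
    end
end

# Only the first construction prints the message.
ld1 = SketchLogDensity(x -> -abs2(x) / 2, :forwarddiff)
ld2 = SketchLogDensity(x -> -abs2(x) / 2, :forwarddiff)
```

The second construction prints nothing, because the `@info` statement has already reached its `maxlog` limit.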

Of these three, the second is the only realistic one.

ForwardDiff NaN-safe mode #2769

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ForwardDiff NaN-safe mode #2769

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions