
Conversation

penelopeysm
Member

Transferred from TuringLang/Turing.jl#2664.

Contributor

github-actions bot commented Sep 5, 2025

Benchmark Report for Commit 206d661

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  8.5 |                 1.6 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                660.6 |                45.0 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                431.0 |                55.8 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1257.2 |                29.4 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6577.7 |                27.2 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1462.4 |                29.7 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |               1019.2 |                 4.6 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5877.0 |                 4.3 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1031.8 |                 9.0 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              66107.0 |                 3.9 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8859.2 |                 9.9 |
|               Dynamic |        10 |    mooncake |             typed |   true |                133.1 |                11.5 |
|              Submodel |         1 |    mooncake |             typed |   true |                 12.7 |                 5.2 |
|                   LDA |        12 | reversediff |             typed |   true |               1044.1 |                 4.4 |


codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.41%. Comparing base (3db45ee) to head (206d661).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1036      +/-   ##
==========================================
+ Coverage   82.34%   82.41%   +0.06%     
==========================================
  Files          38       39       +1     
  Lines        3949     3964      +15     
==========================================
+ Hits         3252     3267      +15     
  Misses        697      697              

@coveralls

coveralls commented Sep 5, 2025

Pull Request Test Coverage Report for Build 17814346881

Details

  • 0 of 16 (0.0%) changed or added relevant lines in 1 file are covered.
  • 27 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.08%) to 82.676%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|------|
| ext/DynamicPPLMarginalLogDensitiesExt.jl | 0 | 16 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|--------|
| src/model.jl | 1 | 86.73% |
| src/debug_utils.jl | 3 | 89.27% |
| src/extract_priors.jl | 3 | 60.53% |
| src/pointwise_logdensities.jl | 3 | 93.06% |
| src/values_as_in_model.jl | 4 | 72.41% |
| src/threadsafe.jl | 13 | 62.16% |

Totals:
  • Change from base Build 17494778045: 0.08%
  • Covered Lines: 3269
  • Relevant Lines: 3954

💛 - Coveralls

Member Author

@penelopeysm penelopeysm left a comment


@ElOceanografo Here's the updated PR. I got rid of the double negative sign, and added some documentation. If you'd like to have push access to this let me know and I can add you!

Comment on lines 66 to 67
# Use linked `varinfo` so that we're working in unconstrained space
varinfo_linked = DynamicPPL.link(varinfo, model)
Member Author

@penelopeysm penelopeysm Sep 5, 2025


Does MLD require a linked VarInfo, or would an unlinked VarInfo be fine?

I'm thinking about this because, if the VarInfo is linked, then all the parameters that are later supplied must be in linked space, which is potentially a bit confusing (though nothing that can't be fixed by documentation). Example:

julia> using DynamicPPL, Distributions, Bijectors, MarginalLogDensities

julia> @model function f()
           x ~ Normal()
           y ~ Beta(2, 2)
       end
f (generic function with 2 methods)

julia> m = marginalize(f(), [@varname(x)]);

julia> m([0.5]) # this 0.5 is in linked space
0.3436055008678415

julia> logpdf(Beta(2, 2), 0.5) # this 0.5 is unlinked, so logp is wrong
0.4054651081081644

julia> inverse(Bijectors.bijector(Beta(2, 2)))(0.5) # this is the unlinked value corresponding to 0.5
0.6224593312018546

julia> logpdf(Beta(2, 2), 0.6224593312018546) # now logp matches
0.3436055008678416

If an unlinked VarInfo is acceptable, then the choice of varinfo should probably also be added as an argument to marginalize.

@ElOceanografo

I had just copied this from @torfjelde's original example without understanding the linked/unlinked VarInfo distinction. I agree that unlinked would make more sense in this context.

Member Author


Cool, will change

Member Author


While writing a test I ran into this:

julia> using MarginalLogDensities, Distributions

julia> f(x, d) = logpdf(d.dist, x[1])
f (generic function with 1 method)

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Normal()))(Float64[])
-1.1102230246251565e-16

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Beta(2, 2)))(Float64[])
0.2846828704729192

My impression is that it should be zero. I added Cubature() and it goes down to -3e-10 (yay!), but for the equivalent linked varinfo case, even using Cubature() doesn't help: it still returns a constant additive term of around 1.7. Am I correct in saying that this is generally expected?

@ElOceanografo

I think this is just because the Laplace approximation is fitting a beta distribution with a normal, based on the curvature at the peak of the beta, and it's not a good approximation. By default, Cubature also does an LA to select its upper and lower integration limits, based on 6x the standard deviation of the approximating normal, which may or may not be good enough:

# default 6 sigma
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature())(Float64[])
-0.0018478445031467727

# manually setting wider integration limits
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature(upper=[20], lower=[-20]))(Float64[])
-5.775533047958272e-6

Thinking a bit more about the linked/unlinked question, I'm not actually sure unlinked is right. The MarginalLogDensity is almost always going to be used to optimize or sample from the fixed-effect parameters, meaning they need to be linked.

Interpreting the optimum/chain is then potentially confusing, and this confusion is probably more likely if you used DynamicPPL/Turing to write your model, since you don't have to handle the parameter transforms yourself, as you would if you wrote the model function from scratch. But I think that's better than having an MLD object that produces optimizer/sampler errors.

For now, I think we can just document this behavior, along with a clear example of how to un-link the variables.
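
(As an illustration, a minimal sketch of what that documented un-linking example could look like for the `f()` model above, where `y ~ Beta(2, 2)` is the remaining variable; the optimized value below is made up:)

```julia
using Bijectors, Distributions

# Hypothetical optimizer output for `y`, in linked (unconstrained) space.
y_linked = 0.1

# Un-link it by inverting the bijector that maps Beta(2, 2)'s support onto ℝ.
y_unlinked = inverse(Bijectors.bijector(Beta(2, 2)))(y_linked)
```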

Member Author

@penelopeysm penelopeysm Sep 16, 2025


(For example, if we marginalise x out of the model above, do you know what it would mean to 'unlink' the value of y? I don't think I've fully wrapped my head around it yet 😅. My current suspicion is that we'd need to access the original value of x that's stored inside the MarginalLogDensity object, but I am pretty confused. I might have to read the MLD source code more to see what's going on.)

@ElOceanografo

I see, the problem is when the link function for one variable depends on another one.

The MLD object stores the full parameter vector, and updates the marginalized subset to their optimal values every time it does a Laplace approximation. So if you need the full set of parameters you can get them via mld.u, see here. So you could get a link function for y, though it wouldn't necessarily be the right one, since it would be for a point estimate of x rather than integrating over its distribution.

Integrating over its distribution should be doable by taking advantage of the cached Laplace approximation info, i.e., if the unlinked $y$ is related to the linked $y_L$ as $y = \mathrm{invlink}(y_L, x)$, then

$$ E[y] = \int \mathrm{invlink}(y_L, x) p(x) dx $$

where we can define $p(x)$ as an MVNormal based on the mode and Hessian stored in the MLD object.
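
(A rough sketch of that idea, assuming the mode `x_mode` and Hessian `H` of the marginalized variables have already been extracted from the MLD object; the accessors for those cached values and the `invlink` function are placeholders here:)

```julia
using Distributions, LinearAlgebra, Statistics

# Monte Carlo estimate of E[y] = ∫ invlink(y_L, x) p(x) dx, where p(x) is the
# Laplace-approximation MvNormal built from the cached mode and Hessian.
function expected_unlinked_y(invlink, y_linked, x_mode, H; n=10_000)
    p_x = MvNormal(x_mode, Symmetric(inv(H)))
    return mean(invlink(y_linked, rand(p_x)) for _ in 1:n)
end
```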

Also worth noting that calling maximum_a_posteriori(f()) with your example produces a different mode than MCMC:

julia> m = f()

julia> mean(sample(m, NUTS(), 1000))
┌ Info: Found initial step size
└   ϵ = 0.4
Sampling 100%|██████████████████████████████████████████| Time: 0:00:00
Mean
  parameters      mean 
      Symbol   Float64 

           x   -0.0507
           y    0.9262


julia> maximum_a_posteriori(m)
ModeResult with maximized lp of -0.90
[0.6120031806556528, 0.6120031809603704]

so maybe for now we can file this under "problematic model structure for deterministic optimization," add a warning box to the docs, and call it good enough...

Member Author


Thanks! Yup, I'd be happy to leave that as a "to be figured out". Then, that just leaves the question of how to invlink the values for a non-pathological model, which I think should be easy enough with the u field. Let me see if I can put together an example and a test, then I'd be quite happy to merge this in.

Member Author

@penelopeysm penelopeysm Sep 17, 2025


Working through it analytically (not 100% sure if I got everything right, but some plots back me up qualitatively), it's probably a bit of a nonsensical example because we have that

$$
\begin{align*}
p(x) &= npdf(x) \\
p(y|x) &= \begin{cases} \frac{npdf(y)}{1 - ncdf(x)} & x \leq y \\ 0 & \text{otherwise} \end{cases} \\
\implies \quad p(y) &= \int_{-\infty}^{\infty} p(y|x)\, p(x)\, \mathrm{d}x \\
&= \int_{-\infty}^{y} npdf(y) \left( \frac{npdf(x)}{1 - ncdf(x)} \right) \mathrm{d}x
\end{align*}
$$

where npdf and ncdf are the pdf and cdf for the unit normal distribution. But I'm pretty sure that $p(y)$ can be made arbitrarily large by increasing $y$, because as $x \to +\infty$, $ncdf(x) \to 1$ and so the denominator in the integrand vanishes.

I guess the obvious question is whether there is some example that doesn't suffer from this, but maybe that's for another time.

Member Author


Added a warning to the docstring, so hopefully that'll be fine for now!

Contributor

github-actions bot commented Sep 5, 2025

DynamicPPL.jl documentation for PR #1036 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1036/

@ElOceanografo

> DynamicPPL.jl documentation for PR #1036 is available at: https://TuringLang.github.io/DynamicPPL.jl/previews/PR1036/

Looks like the Marginalization section in the docs got inserted in the middle of the Prediction section here

@penelopeysm
Member Author

@ElOceanografo Alright I think we are good to go :) Can I get you to take a final look and see if you're happy? Also @sunxd3 likewise, a sanity check to make sure I didn't mess anything up on Turing side would be appreciated :)

Member

@sunxd3 sunxd3 left a comment


The versioning change is not a must, so I am approving now.

Looks very clean, thanks both for working on it!

@@ -122,6 +122,7 @@ export AbstractVarInfo,
fix,
unfix,
predict,
marginalize,
Member


given that we are exporting a new function, I want to tag @yebai to take a look

my assessment is that the risk is low and the benefit is high

Member


It is useful, but given that it is implemented as a pkg extension, we should print a message if relevant packages are not correctly loaded.

Member Author


Just added an error hint so now it does this:

julia> using DynamicPPL, Distributions; @model f() = x ~ Normal(); marginalize(f(), [@varname(x)])
ERROR: MethodError: no method matching marginalize(::Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DefaultContext}, ::Vector{VarName{:x, typeof(identity)}})
The function `marginalize` exists, but no method is defined for this combination of argument types.

    `marginalize` requires MarginalLogDensities.jl to be loaded.
    Please run `using MarginalLogDensities` before calling `marginalize`.

Stacktrace:
 [1] top-level scope
   @ REPL[1]:1
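
(For reference, a minimal sketch of how a hint like this can be registered, using `Base.Experimental.register_error_hint` from a package `__init__`; the actual implementation in this PR may differ:)

```julia
function __init__()
    Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
        # Only fire for the stub `marginalize` defined in DynamicPPL itself.
        if exc.f === marginalize
            print(io, "\n\n    `marginalize` requires MarginalLogDensities.jl to be loaded.")
            print(io, "\n    Please run `using MarginalLogDensities` before calling `marginalize`.")
        end
    end
end
```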

Member


Thanks @penelopeysm.

One more thing: we tend to follow British spelling in Turing.jl. Can we alias marginalise to marginalize and export both? This would keep everyone happy.

Member Author


I think it's better to choose one and stick with it. The standard practice in programming is to use AmE spellings, and that's generally true even for packages written in the UK. IMO sticking to AmE spellings reduces the mental overhead for users, since most people will expect them.

So far, we have actually stuck to -ize, e.g. DynamicPPL.contextualize, AbstractPPL.concretize, AdvancedHMC.initialize!, ... so it would be both internally and externally consistent to use marginalize. Turing has src/optimisation/Optimisation.jl but doesn't export anything called 'optimise', only MAP and MLE.

Comments are a different matter of course :) -- but IMO identifiers and argument names should always use -ize


An extension for MarginalLogDensities.jl has been added.

Loading DynamicPPL and MarginalLogDensities now provides the `DynamicPPL.marginalize` function to marginalize out variables from a model.
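
(A minimal usage sketch, based on the example shown earlier in this thread; note that the value passed in is interpreted in linked space, as discussed above:)

```julia
using DynamicPPL, Distributions, MarginalLogDensities

@model function f()
    x ~ Normal()
    y ~ Beta(2, 2)
end

# Marginalize `x` out of the model; the result evaluates the marginal
# log density as a function of the remaining (linked) parameter `y`.
m = marginalize(f(), [@varname(x)])
m([0.5])
```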
Member


Perhaps add a sentence on its limitations, e.g. that it only works with differentiable models.

Member Author


@ElOceanografo Does Cubature() allow it to work with non-differentiable models?
