MarginalLogDensities extension #1036
Conversation
See: TuringLang/Turing.jl#2664
Co-authored-by: Sam Urmy <[email protected]>
Codecov Report: ✅ All modified and coverable lines are covered by tests. Coverage 82.34% → 82.41% (+0.06%); files 38 → 39 (+1); lines 3949 → 3964 (+15); hits 3252 → 3267 (+15); misses 697 (unchanged).
@ElOceanografo Here's the updated PR. I got rid of the double negative sign, and added some documentation. If you'd like to have push access to this let me know and I can add you!
```julia
# Use linked `varinfo` so that we're working in unconstrained space
varinfo_linked = DynamicPPL.link(varinfo, model)
```
Does MLD require a linked VarInfo, or would an unlinked VarInfo be fine?
The reason I'm thinking about this is that if the VarInfo is linked, then all the parameters that are later supplied must be in linked space, which is potentially a bit confusing (though nothing that can't be fixed by documentation). Example:
```julia
julia> using DynamicPPL, Distributions, Bijectors, MarginalLogDensities

julia> @model function f()
           x ~ Normal()
           y ~ Beta(2, 2)
       end
f (generic function with 2 methods)

julia> m = marginalize(f(), [@varname(x)]);

julia> m([0.5]) # this 0.5 is in linked space
0.3436055008678415

julia> logpdf(Beta(2, 2), 0.5) # this 0.5 is unlinked, so logp is wrong
0.4054651081081644

julia> inverse(Bijectors.bijector(Beta(2, 2)))(0.5) # this is the unlinked value corresponding to 0.5
0.6224593312018546

julia> logpdf(Beta(2, 2), 0.6224593312018546) # now logp matches
0.3436055008678416
```
If an unlinked VarInfo is acceptable, then the choice of varinfo should probably also be added as an argument to `marginalize`.
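For concreteness, a hypothetical sketch of what that could look like (the keyword argument and its default are my assumption, not the current API):

```julia
# Hypothetical signature: let the caller pick the VarInfo (linked or
# unlinked) that the marginal density is built from. This is a sketch of
# the suggestion above, not the implementation in this PR.
function marginalize(
    model::DynamicPPL.Model,
    varnames::Vector{<:DynamicPPL.VarName};
    varinfo::DynamicPPL.AbstractVarInfo=DynamicPPL.VarInfo(model),
)
    # ... build the MarginalLogDensity from `varinfo` as given, rather
    # than always calling DynamicPPL.link(varinfo, model) internally ...
end
```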
I had just copied this from @torfjelde's original example without understanding the linked/unlinked VarInfo distinction. I agree that unlinked would make more sense in this context.
Cool, will change
While writing a test I ran into this:
```julia
julia> using MarginalLogDensities, Distributions

julia> f(x, d) = logpdf(d.dist, x[1])
f (generic function with 1 method)

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Normal()))(Float64[])
-1.1102230246251565e-16

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Beta(2, 2)))(Float64[])
0.2846828704729192
```
My impression is that it should be zero. I added `Cubature()` and it goes down to -3e-10 (yay!), but for the equivalent linked-varinfo case, even using `Cubature()` doesn't help: it still returns a constant additive term of around 1.7. Am I correct in saying that this is generally expected?
I think this is just because the Laplace approximation is fitting a beta distribution with a normal, based on the curvature at the peak of the beta, and it's not a good approximation. By default, `Cubature` also does an LA to select its upper and lower integration limits, based on 6x the standard deviation of the approximating normal, which may or may not be good enough:
```julia
# default 6 sigma
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature())(Float64[])
-0.0018478445031467727

# manually setting wider integration limits
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature(upper=[20], lower=[-20]))(Float64[])
-5.775533047958272e-6
```
Thinking a bit more about the linked/unlinked question, I'm not actually sure unlinked is right. The `MarginalLogDensity` is almost always going to be used to optimize or sample from the fixed-effect parameters, meaning they need to be linked.
Interpreting the optimum/chain is then potentially confusing, and this confusion is probably more likely if you used DynamicPPL/Turing to write your model, since you don't have to handle the parameter transforms yourself, as you would if you wrote the model function from scratch. But I think that's better than having an MLD object that produces optimizer/sampler errors.
For now, I think we can just document this behavior, along with a clear example of how to un-link the variables.
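For the docs, that un-linking example could be as simple as inverting the bijector, following the earlier REPL session (a minimal sketch; the variable names are illustrative):

```julia
using Bijectors, Distributions

# Value returned by the optimizer/sampler for y, in linked
# (unconstrained) space:
y_linked = 0.5

# Invert the bijector that maps Beta's support to the real line to
# recover the constrained value:
b = Bijectors.bijector(Beta(2, 2))  # (0, 1) -> R
y = inverse(b)(y_linked)            # ≈ 0.6224593312018546
```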
(For example, if we marginalise `x` out of the model above, do you know what it would mean to 'unlink' the value of `y`? I don't think I've fully wrapped my head around it yet 😅. My current suspicion is that we'd need to access the original value of `x` that's stored inside the MarginalLogDensity object, but I am pretty confused. I might have to read the MLD source code more to see what's going on.)
I see, the problem is when the link function for one variable depends on another one.
The MLD object stores the full parameter vector, and updates the marginalized subset to their optimal values every time it does a Laplace approximation. So if you need the full set of parameters you can get them via `mld.u` (see here). So you could get a link function for `y`, though it wouldn't necessarily be the right one, since it would be for a point estimate of `x` rather than integrating over its distribution.
That should be doable by taking advantage of the cached Laplace approximation info: if the unlinked density is needed, we can define an `MvNormal` based on the mode and Hessian stored in the MLD object.
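To make the `mld.u` part concrete, here's a minimal sketch using the same constructor pattern as the earlier examples (the exact fields MLD caches for the mode and Hessian aren't confirmed here, so that part stays as a comment):

```julia
using MarginalLogDensities, Distributions

# Two-parameter log density; we marginalize over the first parameter.
f(x, d) = logpdf(d.dist, x[1]) + logpdf(d.dist, x[2])
mld = MarginalLogDensity(f, [0.5, 0.0], [1], (; dist=Normal()))

mld([0.0])  # evaluating once runs the inner optimization
mld.u       # full parameter vector, marginalized entry at its optimum

# Hypothetically, given the cached mode `u0` and Hessian `H` of the
# negative log density, the approximating Gaussian would be
# MvNormal(u0, inv(H)); the accessor names for those internals are
# assumptions, not a checked API.
```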
Also worth noting that calling `maximum_a_posteriori(f())` with your example produces a different mode than MCMC:
```julia
julia> m = f()

julia> mean(sample(m, NUTS(), 1000))
┌ Info: Found initial step size
└   ϵ = 0.4
Sampling 100%|██████████████████████████████████████████| Time: 0:00:00
Mean
  parameters     mean
      Symbol  Float64

           x  -0.0507
           y   0.9262

julia> maximum_a_posteriori(m)
ModeResult with maximized lp of -0.90
[0.6120031806556528, 0.6120031809603704]
```
So maybe for now we can file this under "problematic model structure for deterministic optimization," add a warning box to the docs, and call it good enough...
Thanks! Yup, I'd be happy to leave that as a "to be figured out". Then, that just leaves the question of how to invlink the values for a non-pathological model, which I think should be easy enough with the `u` field. Let me see if I can put together an example and a test, then I'd be quite happy to merge this in.
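One possible shape for that example, assuming the extension keeps a linked VarInfo around (`unflatten` and `invlink` are existing DynamicPPL functions, but this exact wiring is an untested sketch):

```julia
using DynamicPPL

# Assumed available: `model` (the DynamicPPL model), `varinfo_linked` (the
# linked VarInfo the extension was built from), and `mld` (the evaluated
# MarginalLogDensity). `mld.u` holds the full parameter vector in linked space.
vi = DynamicPPL.unflatten(varinfo_linked, mld.u)  # write values back into the VarInfo
vi = DynamicPPL.invlink(vi, model)                # map back to constrained space
vi[@varname(y)]                                   # constrained value of y
```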
Working through it analytically (not 100% sure if I got everything right, but some plots back me up qualitatively), it's probably a bit of a nonsensical example, because we have that [equation lost in transfer], where `npdf` and `ncdf` are the pdf and cdf of the unit normal distribution. But I'm pretty sure that [equation lost in transfer].
I guess the obvious question is whether there is some example that doesn't suffer from this, but maybe that's for another time.
Added a warning to the docstring, so hopefully that'll be fine for now!
DynamicPPL.jl documentation for PR #1036 is available at:
Looks like the Marginalization section in the docs got inserted in the middle of the Prediction section here
@ElOceanografo Alright I think we are good to go :) Can I get you to take a final look and see if you're happy? Also @sunxd3 likewise, a sanity check to make sure I didn't mess anything up on Turing side would be appreciated :)
The versioning change is not a must, so I am approving now.
Looks very clean, thanks both for working on it!
```diff
@@ -122,6 +122,7 @@ export AbstractVarInfo,
     fix,
     unfix,
     predict,
+    marginalize,
```
Given that we are exporting a new function, I want to tag @yebai to take a look. My assessment is that the risk is low and the benefit is high.
It is useful, but given that it is implemented as a package extension, we should print a message if the relevant packages are not correctly loaded.
Just added an error hint so now it does this:
```julia
julia> using DynamicPPL, Distributions; @model f() = x ~ Normal(); marginalize(f(), [@varname(x)])
ERROR: MethodError: no method matching marginalize(::Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DefaultContext}, ::Vector{VarName{:x, typeof(identity)}})
The function `marginalize` exists, but no method is defined for this combination of argument types.

`marginalize` requires MarginalLogDensities.jl to be loaded.
Please run `using MarginalLogDensities` before calling `marginalize`.

Stacktrace:
 [1] top-level scope
   @ REPL[1]:1
```
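(For reference, this kind of hint can be registered with Julia's `Base.Experimental.register_error_hint`; a minimal sketch of the pattern, not necessarily the exact code in this PR:)

```julia
function __init__()
    # For MethodError, the handler receives (io, exc, argtypes, kwargs).
    Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
        if exc.f === marginalize
            print(io, "\n`marginalize` requires MarginalLogDensities.jl to be loaded.\n")
            print(io, "Please run `using MarginalLogDensities` before calling `marginalize`.")
        end
    end
end
```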
Thanks @penelopeysm.
One more thing: we tend to follow British spelling in Turing.jl. Can we alias `marginalise` to `marginalize` and export both? This would keep everyone happy.
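For concreteness, the proposed alias would amount to something like this sketch:

```julia
# Sketch of the alias proposed above (see the following reply for the outcome):
const marginalise = marginalize
export marginalize, marginalise
```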
I think it's better to choose one and stick with it. The standard practice for programming is to use AmE spellings and that's generally true even for packages that are written in the UK. IMO it's best to stick to AmE spellings to reduce the amount of mental overhead for users (most people will expect AmE spellings).
So far, we have actually stuck to -ize, e.g. `DynamicPPL.contextualize`, `AbstractPPL.concretize`, `AdvancedHMC.initialize!`, ... so it would be both internally and externally consistent to use `marginalize`. Turing has `src/optimisation/Optimisation.jl` but doesn't export anything called 'optimise', only MAP and MLE.
Comments are a different matter of course :) -- but IMO identifiers and argument names should always use `-ize`.
An extension for MarginalLogDensities.jl has been added.

Loading DynamicPPL and MarginalLogDensities now provides the `DynamicPPL.marginalize` function to marginalize out variables from a model.
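For context, a minimal usage example following the model from the discussion above:

```julia
using DynamicPPL, Distributions, MarginalLogDensities

@model function f()
    x ~ Normal()
    y ~ Beta(2, 2)
end

# Integrate x out of the joint; the result is callable on the remaining
# parameters, which are supplied in linked (unconstrained) space.
m = marginalize(f(), [@varname(x)])
m([0.5])
```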
Perhaps add a sentence on its limitations, e.g., that it only works with differentiable models.
@ElOceanografo Does `Cubature()` allow it to work with non-differentiable models?
Transferred from TuringLang/Turing.jl#2664.