MarginalLogDensities extension #1036
Conversation
See: TuringLang/Turing.jl#2664
Co-authored-by: Sam Urmy <[email protected]>
Codecov Report: ✅ All modified and coverable lines are covered by tests. Coverage 82.34% → 82.41% (+0.06%); files 38 → 39 (+1); lines 3949 → 3964 (+15); hits 3252 → 3267 (+15); misses 697 (unchanged).
@ElOceanografo Here's the updated PR. I got rid of the double negative sign, and added some documentation. If you'd like to have push access to this let me know and I can add you!
```julia
# Use linked `varinfo` so that we're working in unconstrained space
varinfo_linked = DynamicPPL.link(varinfo, model)
```
Does MLD require a linked VarInfo, or would an unlinked VarInfo be fine?
The reason I'm thinking about this is that if the VarInfo is linked, then all the parameters that are later supplied must be in linked space, which is potentially a bit confusing (though nothing that can't be fixed by documentation). Example:
```julia
julia> using DynamicPPL, Distributions, Bijectors, MarginalLogDensities

julia> @model function f()
           x ~ Normal()
           y ~ Beta(2, 2)
       end
f (generic function with 2 methods)

julia> m = marginalize(f(), [@varname(x)]);

julia> m([0.5]) # this 0.5 is in linked space
0.3436055008678415

julia> logpdf(Beta(2, 2), 0.5) # this 0.5 is unlinked, so logp is wrong
0.4054651081081644

julia> inverse(Bijectors.bijector(Beta(2, 2)))(0.5) # this is the unlinked value corresponding to 0.5
0.6224593312018546

julia> logpdf(Beta(2, 2), 0.6224593312018546) # now logp matches
0.3436055008678416
```
If an unlinked VarInfo is acceptable, then the choice of varinfo should probably also be added as an argument to `marginalize`.
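For concreteness, a hypothetical sketch of what that could look like (the keyword argument and its default are my assumption, not the current API):

```julia
# Hypothetical signature: let the caller pick the VarInfo (linked or
# unlinked) that the marginal density is built from. This is a sketch of
# the suggestion above, not the implementation in this PR.
function marginalize(
    model::DynamicPPL.Model,
    varnames::Vector{<:DynamicPPL.VarName};
    varinfo::DynamicPPL.AbstractVarInfo=DynamicPPL.VarInfo(model),
)
    # ... build the MarginalLogDensity from `varinfo` as given, rather
    # than always calling DynamicPPL.link(varinfo, model) internally ...
end
```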
I had just copied this from @torfjelde's original example without understanding the linked/unlinked VarInfo distinction. I agree that unlinked would make more sense in this context.
Cool, will change
While writing a test I ran into this:
```julia
julia> using MarginalLogDensities, Distributions

julia> f(x, d) = logpdf(d.dist, x[1])
f (generic function with 1 method)

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Normal()))(Float64[])
-1.1102230246251565e-16

julia> MarginalLogDensity(f, [0.5], [1], (;dist=Beta(2, 2)))(Float64[])
0.2846828704729192
```
My impression is that it should be zero. I added `Cubature()` and it goes down to -3e-10 (yay!), but for the equivalent linked-varinfo case, even using `Cubature()` doesn't help: it still returns a constant additive term of around 1.7. Am I correct in saying that this is generally expected?
I think this is just because the Laplace approximation is fitting a beta distribution with a normal, based on the curvature at the peak of the beta, and it's not a good approximation. By default, `Cubature` also does an LA to select its upper and lower integration limits, based on 6x the standard deviation of the approximating normal, which may or may not be good enough:
```julia
# default 6 sigma
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature())(Float64[])
-0.0018478445031467727

# manually setting wider integration limits
julia> MarginalLogDensity(f, [0.5], [1], (;dist=TDist(5)), Cubature(upper=[20], lower=[-20]))(Float64[])
-5.775533047958272e-6
```
Thinking a bit more about the linked/unlinked question, I'm not actually sure unlinked is right. The `MarginalLogDensity` is almost always going to be used to optimize or sample from the fixed-effect parameters, meaning they need to be linked.
Interpreting the optimum/chain is then potentially confusing, and this confusion is probably more likely if you used DynamicPPL/Turing to write your model, since you don't have to handle the parameter transforms yourself, as you would if you wrote the model function from scratch. But I think that's better than having an MLD object that produces optimizer/sampler errors.
For now, I think we can just document this behavior, along with a clear example of how to un-link the variables.
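For the docs, that un-linking example could be as simple as inverting the bijector, following the earlier REPL session (a minimal sketch; the variable names are illustrative):

```julia
using Bijectors, Distributions

# Value returned by the optimizer/sampler for y, in linked
# (unconstrained) space:
y_linked = 0.5

# Invert the bijector that maps Beta's support to the real line to
# recover the constrained value:
b = Bijectors.bijector(Beta(2, 2))  # (0, 1) -> R
y = inverse(b)(y_linked)            # ≈ 0.6224593312018546
```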
(For example, if we marginalise `x` out of the model above, do you know what it would mean to 'unlink' the value of `y`? I don't think I've fully wrapped my head around it yet 😅. My current suspicion is that we'd need to access the original value of `x` that's stored inside the MarginalLogDensity object, but I am pretty confused. I might have to read the MLD source code more to see what's going on.)
I see, the problem is when the link function for one variable depends on another one.
The MLD object stores the full parameter vector, and updates the marginalized subset to their optimal values every time it does a Laplace approximation. So if you need the full set of parameters you can get them via `mld.u` (see here). So you could get a link function for `y`, though it wouldn't necessarily be the right one, since it would be for a point estimate of `x` rather than integrating over its distribution.
That should be doable by taking advantage of the cached Laplace approximation info: if the unlinked density is needed, we can define an `MvNormal` based on the mode and Hessian stored in the MLD object.
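To make the `mld.u` part concrete, here's a minimal sketch using the same constructor pattern as the earlier examples (the exact fields MLD caches for the mode and Hessian aren't confirmed here, so that part stays as a comment):

```julia
using MarginalLogDensities, Distributions

# Two-parameter log density; we marginalize over the first parameter.
f(x, d) = logpdf(d.dist, x[1]) + logpdf(d.dist, x[2])
mld = MarginalLogDensity(f, [0.5, 0.0], [1], (; dist=Normal()))

mld([0.0])  # evaluating once runs the inner optimization
mld.u       # full parameter vector, marginalized entry at its optimum

# Hypothetically, given the cached mode `u0` and Hessian `H` of the
# negative log density, the approximating Gaussian would be
# MvNormal(u0, inv(H)); the accessor names for those internals are
# assumptions, not a checked API.
```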
Also worth noting that calling `maximum_a_posteriori(f())` with your example produces a different mode than MCMC:
```julia
julia> m = f()

julia> mean(sample(m, NUTS(), 1000))
┌ Info: Found initial step size
└   ϵ = 0.4
Sampling 100%|██████████████████████████████████████████| Time: 0:00:00
Mean
  parameters     mean
      Symbol  Float64

           x  -0.0507
           y   0.9262

julia> maximum_a_posteriori(m)
ModeResult with maximized lp of -0.90
[0.6120031806556528, 0.6120031809603704]
```
So maybe for now we can file this under "problematic model structure for deterministic optimization," add a warning box to the docs, and call it good enough...
Thanks! Yup, I'd be happy to leave that as a "to be figured out". Then, that just leaves the question of how to invlink the values for a non-pathological model, which I think should be easy enough with the `u` field. Let me see if I can put together an example and a test, then I'd be quite happy to merge this in.
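One possible shape for that example, assuming the extension keeps a linked VarInfo around (`unflatten` and `invlink` are existing DynamicPPL functions, but this exact wiring is an untested sketch):

```julia
using DynamicPPL

# Assumed available: `model` (the DynamicPPL model), `varinfo_linked` (the
# linked VarInfo the extension was built from), and `mld` (the evaluated
# MarginalLogDensity). `mld.u` holds the full parameter vector in linked space.
vi = DynamicPPL.unflatten(varinfo_linked, mld.u)  # write values back into the VarInfo
vi = DynamicPPL.invlink(vi, model)                # map back to constrained space
vi[@varname(y)]                                   # constrained value of y
```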
Working through it analytically (not 100% sure if I got everything right, but some plots back me up qualitatively), it's probably a bit of a nonsensical example, because we have that [equation lost in transfer], where `npdf` and `ncdf` are the pdf and cdf of the unit normal distribution. But I'm pretty sure that [equation lost in transfer].
I guess the obvious question is whether there is some example that doesn't suffer from this, but maybe that's for another time.
Added a warning to the docstring, so hopefully that'll be fine for now!
DynamicPPL.jl documentation for PR #1036 is available at:
Looks like the Marginalization section in the docs got inserted in the middle of the Prediction section here
@ElOceanografo Alright I think we are good to go :) Can I get you to take a final look and see if you're happy? Also @sunxd3 likewise, a sanity check to make sure I didn't mess anything up on Turing side would be appreciated :)
The versioning change is not a must, so I am approving now.
Looks very clean, thanks both for working on it!
```diff
@@ -122,6 +122,7 @@ export AbstractVarInfo,
     fix,
     unfix,
     predict,
+    marginalize,
```
Given that we are exporting a new function, I want to tag @yebai to take a look. My assessment is that the risk is low and the benefit is high.
It is useful, but given that it is implemented as a package extension, we should print a message if the relevant packages are not correctly loaded.
Just added an error hint so now it does this:
```julia
julia> using DynamicPPL, Distributions; @model f() = x ~ Normal(); marginalize(f(), [@varname(x)])
ERROR: MethodError: no method matching marginalize(::Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DefaultContext}, ::Vector{VarName{:x, typeof(identity)}})
The function `marginalize` exists, but no method is defined for this combination of argument types.

`marginalize` requires MarginalLogDensities.jl to be loaded.
Please run `using MarginalLogDensities` before calling `marginalize`.

Stacktrace:
 [1] top-level scope
   @ REPL[1]:1
```
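(For reference, this kind of hint can be registered with Julia's `Base.Experimental.register_error_hint`; a minimal sketch of the pattern, not necessarily the exact code in this PR:)

```julia
function __init__()
    # For MethodError, the handler receives (io, exc, argtypes, kwargs).
    Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
        if exc.f === marginalize
            print(io, "\n`marginalize` requires MarginalLogDensities.jl to be loaded.\n")
            print(io, "Please run `using MarginalLogDensities` before calling `marginalize`.")
        end
    end
end
```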
Thanks @penelopeysm.
One more thing: we tend to follow British spelling in Turing.jl. Can we alias `marginalise` to `marginalize` and export both? This would keep everyone happy.
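For concreteness, the proposed alias would amount to something like this sketch:

```julia
# Sketch of the alias proposed above (see the following reply for the outcome):
const marginalise = marginalize
export marginalize, marginalise
```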
I think it's better to choose one and stick with it. The standard practice for programming is to use AmE spellings and that's generally true even for packages that are written in the UK. IMO it's best to stick to AmE spellings to reduce the amount of mental overhead for users (most people will expect AmE spellings).
So far, we have actually stuck to -ize, e.g. `DynamicPPL.contextualize`, `AbstractPPL.concretize`, `AdvancedHMC.initialize!`, ... so it would be both internally and externally consistent to use `marginalize`. Turing has `src/optimisation/Optimisation.jl` but doesn't export anything called 'optimise', only MAP and MLE.
Comments are a different matter of course :) -- but IMO identifiers and argument names should always use `-ize`.
An extension for MarginalLogDensities.jl has been added.

Loading DynamicPPL and MarginalLogDensities now provides the `DynamicPPL.marginalize` function to marginalize out variables from a model.
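For context, a minimal usage example following the model from the discussion above:

```julia
using DynamicPPL, Distributions, MarginalLogDensities

@model function f()
    x ~ Normal()
    y ~ Beta(2, 2)
end

# Integrate x out of the joint; the result is callable on the remaining
# parameters, which are supplied in linked (unconstrained) space.
m = marginalize(f(), [@varname(x)])
m([0.5])
```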
Perhaps add a sentence on its limitations, e.g., that it only works with differentiable models.
@ElOceanografo Does `Cubature()` allow it to work with non-differentiable models?
Transferred from TuringLang/Turing.jl#2664.