
Commit efbc9c5

Red-Portal and yebai authored
Rewriting advancedvi docs (#53)
* update README
* fix math in README
* fix README
* fix README references
* fix README
* add documentation (initial commit)
* fix bug for bijector with 1 MC sample with tests
* fix remove redundant helpers for `reparam_with_entropy` for bijector
* fix docstrings, export for `MvLocationScale`
* add documentation
* add compat bounds to documentation
* fix tabs
* fix footnote in docs
* Delete docs/src/elbo/overview.md~
* Delete docs/src/elbo/repgradelbo.md~
* Update docs/src/elbo/repgradelbo.md (Co-authored-by: Hong Ge <[email protected]>)
* Update README.md (Co-authored-by: Hong Ge <[email protected]>)
* Update README.md (Co-authored-by: Hong Ge <[email protected]>)
* Update README.md (Co-authored-by: Hong Ge <[email protected]>)
* update README
* Update README.md (Co-authored-by: Hong Ge <[email protected]>)
* updated README
* Update docs/src/elbo/overview.md (Co-authored-by: Hong Ge <[email protected]>)
* Update docs/src/elbo/overview.md (Co-authored-by: Hong Ge <[email protected]>)
* Update docs/src/elbo/overview.md (Co-authored-by: Hong Ge <[email protected]>)
* Update docs/src/elbo/overview.md (Co-authored-by: Hong Ge <[email protected]>)
* remove operatorname use
* fix README
* Update docs/make.jl (Co-authored-by: Hong Ge <[email protected]>)
* Update docs/src/elbo/repgradelbo.md (Co-authored-by: Hong Ge <[email protected]>)
* fix change q_trans to q_transformed
* fix docs

---------

Co-authored-by: Hong Ge <[email protected]>
1 parent 0d07b86 commit efbc9c5

File tree

14 files changed: +741, -243 lines changed

README.md

Lines changed: 90 additions & 224 deletions
@@ -1,250 +1,116 @@
-# AdvancedVI.jl
-A library for variational Bayesian inference in Julia.
-
-At the time of writing (05/02/2020), the variational inference (VI) interface and some algorithms are implemented in [Turing.jl](https://github.com/TuringLang/Turing.jl). The idea is to soon separate the VI functionality in Turing.jl out and into this package.
-
-The purpose of this package will then be to provide a common interface together with implementations of standard algorithms and utilities, with the goal of ease of use and the ability for other packages, e.g. Turing.jl, to write a light wrapper around AdvancedVI.jl for integration.

-As an example, in Turing.jl we support automatic differentiation variational inference (ADVI), but really the only piece of code tied to Turing.jl is the conversion of a `Turing.Model` to a `logjoint(z)` function which computes `z ↦ log p(x, z)`, with `x` denoting the observations embedded in the `Turing.Model`. As long as this `logjoint(z)` method is compatible with some AD framework, e.g. `ForwardDiff.jl` or `Zygote.jl`, this is all we need from Turing.jl to be able to perform ADVI!
-
-## [WIP] Interface
-- `vi`: the main interface to the functionality in this package
-  - `vi(model, alg)`: only used when `alg` has a default variational posterior which it will provide.
-  - `vi(model, alg, q::VariationalPosterior, θ)`: `q` represents the family of variational distributions and `θ` is the initial parameters "indexing" the starting distribution. This assumes that there exists an implementation `Variational.update(q, θ)` which returns the variational posterior corresponding to parameters `θ`.
-  - `vi(model, alg, getq::Function, θ)`: here `getq(θ)` is a function returning a `VariationalPosterior` corresponding to `θ`.
-- `optimize!(vo, alg::VariationalInference{AD}, q::VariationalPosterior, model::Model, θ; optimizer = TruncatedADAGrad())`
-- `grad!(vo, alg::VariationalInference, q, model::Model, θ, out, args...)`
-  - Different combinations of variational objectives (`vo`), VI methods (`alg`), and variational posteriors (`q`) might use different gradient estimators. `grad!` allows us to specify these different behaviors.
+# AdvancedVI.jl
+[AdvancedVI](https://github.com/TuringLang/AdvancedVI.jl) provides implementations of variational inference (VI) algorithms, a family of algorithms that aim for scalable approximate Bayesian inference by leveraging optimization.
+`AdvancedVI` is part of the [Turing](https://turinglang.org/stable/) probabilistic programming ecosystem.
+The purpose of this package is to provide a common, accessible interface for various VI algorithms and utilities so that other packages, e.g. `Turing`, only need to write a light wrapper for integration.
+For example, integrating `Turing` with `AdvancedVI.ADVI` only involves converting a `Turing.Model` into a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl) and extracting a corresponding `Bijectors.bijector`.

 ## Examples
-### Variational Inference
-A very simple generative model is the following

-    μ ~ 𝒩(0, 1)
-    xᵢ ∼ 𝒩(μ, 1) , ∀i = 1, …, n
+`AdvancedVI` works with differentiable models specified as a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl).
+For example, for the normal-log-normal model:

-where μ and xᵢ are some ℝᵈ vectors and 𝒩 denotes a d-dimensional multivariate Normal distribution.
+$$
+\begin{aligned}
+x &\sim \mathrm{LogNormal}\left(\mu_x, \sigma_x^2\right) \\
+y &\sim \mathcal{N}\left(\mu_y, \sigma_y^2\right),
+\end{aligned}
+$$

-Given a set of `n` observations `[x₁, …, xₙ]` we're interested in finding the distribution `p(μ∣x₁, …, xₙ)` over the mean `μ`. We can obtain (an approximation to) this distribution using AdvancedVI.jl!
-
-First we generate some observations and set up the problem:
+a `LogDensityProblem` can be implemented as
 ```julia
-julia> using Distributions
-
-julia> d = 2; n = 100;
-
-julia> observations = randn((d, n)); # 100 observations from 2D 𝒩(0, 1)
-
-julia> # Define generative model
-       #    μ ~ 𝒩(0, 1)
-       #    xᵢ ∼ 𝒩(μ, 1) , ∀i = 1, …, n
-       prior(μ) = logpdf(MvNormal(ones(d)), μ)
-prior (generic function with 1 method)
-
-julia> likelihood(x, μ) = sum(logpdf(MvNormal(μ, ones(d)), x))
-likelihood (generic function with 1 method)
-
-julia> logπ(μ) = likelihood(observations, μ) + prior(μ)
-logπ (generic function with 1 method)
-
-julia> logπ(randn(2)) # <= just checking that it works
--311.74132761437653
-```
-Now there are mainly two different ways of specifying the approximate posterior (and its family). The first is by providing a mapping from distribution parameters to the distribution `θ ↦ q(⋅∣θ)`:
-```julia
-julia> using DistributionsAD, AdvancedVI
+using LogDensityProblems
+using SimpleUnPack
+
+struct NormalLogNormal{MX,SX,MY,SY}
+    μ_x::MX
+    σ_x::SX
+    μ_y::MY
+    Σ_y::SY
+end

-julia> # Using a function z ↦ q(⋅∣z)
-       getq(θ) = TuringDiagMvNormal(θ[1:d], exp.(θ[d + 1:4]))
-getq (generic function with 1 method)
-```
-Then we make the choice of algorithm, a subtype of `VariationalInference`,
-```julia
-julia> # Perform VI
-       advi = ADVI(10, 10_000)
-ADVI{AdvancedVI.ForwardDiffAD{40}}(10, 10000)
-```
-And finally we can perform VI! The usual interface is to call `vi`, which behind the scenes takes care of the optimization and returns the resulting variational posterior:
-```julia
-julia> q = vi(logπ, advi, getq, randn(4))
-[ADVI] Optimizing...100% Time: 0:00:01
-TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}}(m=[0.16282745378074515, 0.15789310089462574], σ=[0.09519377533754399, 0.09273176907111745])
-```
-Let's have a look at the resulting ELBO:
-```julia
-julia> AdvancedVI.elbo(advi, q, logπ, 1000)
--287.7866366886285
-```
-Unfortunately, the *final* value of the ELBO is not always a very good diagnostic, though the ELBO is an important metric to keep an eye on during training, since an *increase* in the ELBO means we're going in the right direction. Luckily, this is such a simple problem that we can indeed obtain a closed-form solution! Because we're lazy (at least I am), we'll let [ConjugatePriors.jl](https://github.com/JuliaStats/ConjugatePriors.jl) do this for us:
-```julia
-julia> # True posterior
-       using ConjugatePriors
+function LogDensityProblems.logdensity(model::NormalLogNormal, θ)
+    (; μ_x, σ_x, μ_y, Σ_y) = model
+    logpdf(LogNormal(μ_x, σ_x), θ[1]) + logpdf(MvNormal(μ_y, Σ_y), θ[2:end])
+end

-julia> pri = MvNormal(zeros(2), ones(2));
+function LogDensityProblems.dimension(model::NormalLogNormal)
+    length(model.μ_y) + 1
+end

-julia> true_posterior = posterior((pri, pri.Σ), MvNormal, observations)
-DiagNormal(
-dim: 2
-μ: [0.1746546592601148, 0.16457110079543008]
-Σ: [0.009900990099009901 0.0; 0.0 0.009900990099009901]
-)
+function LogDensityProblems.capabilities(::Type{<:NormalLogNormal})
+    LogDensityProblems.LogDensityOrder{0}()
+end
 ```
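In the added `capabilities` method, `LogDensityProblems.LogDensityOrder{0}()` signals that the problem provides only zeroth-order information, i.e. the log density itself and no gradients; gradients are instead obtained through the automatic differentiation backend selected later when calling `AdvancedVI.optimize`.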
-Comparing to our variational approximation, this looks pretty good! It is worth noting that in this particular case the variational posterior seems to overestimate the variance.

-To conclude, let's make a somewhat pretty picture:
+Since the support of `x` is constrained to be positive, and VI is best done in unconstrained Euclidean space, we need to use a *bijector* to map `x` to the unconstrained space. We will use the [`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl) package for this purpose.
+This corresponds to the automatic differentiation variational inference (ADVI) formulation[^KTRGB2017].
 ```julia
-julia> using Plots
-
-julia> p_samples = rand(true_posterior, 10_000); q_samples = rand(q, 10_000);
+using Bijectors

-julia> p1 = histogram(p_samples[1, :], label="p"); histogram!(q_samples[1, :], alpha=0.7, label="q")
-
-julia> title!(raw"$\mu_1$")
-
-julia> p2 = histogram(p_samples[2, :], label="p"); histogram!(q_samples[2, :], alpha=0.7, label="q")
-
-julia> title!(raw"$\mu_2$")
-
-julia> plot(p1, p2)
+function Bijectors.bijector(model::NormalLogNormal)
+    (; μ_x, σ_x, μ_y, Σ_y) = model
+    Bijectors.Stacked(
+        Bijectors.bijector.([LogNormal(μ_x, σ_x), MvNormal(μ_y, Σ_y)]),
+        [1:1, 2:1+length(μ_y)])
+end
 ```
-![Histogram](hist.png?raw=true)
-
-### Simple example: using AdvancedVI.jl to directly minimize the KL-divergence between two distributions `p(z)` and `q(z)`
-In VI we aim to approximate the true posterior `p(z ∣ x)` by some approximate variational posterior `q(z)` by maximizing the ELBO:
-
-    ELBO(q) = 𝔼_q[log p(x, z) - log q(z)]
-
-Observe that we can express the ELBO as the negative KL-divergence between `p(x, ⋅)` and `q(⋅)`:
-
-    ELBO(q) = - 𝔼_q[log (q(z) / p(x, z))]
-            = - KL(q(⋅) || p(x, ⋅))

-So if we apply VI to something that isn't an actual posterior, i.e. there's no data involved and we write `p(z ∣ x) = p(z)`, we're really just minimizing the KL-divergence between the distributions.
+A simpler approach is to use `Turing`, where a `Turing.Model` can automatically be converted into a `LogDensityProblem` and a corresponding `bijector` is automatically generated.
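To illustrate the `Turing` route mentioned in the added line above, here is a hedged sketch: the model definition uses standard `Turing` syntax, while the wrapper `DynamicPPL.LogDensityFunction` is an assumption about the conversion utility and may differ across Turing/DynamicPPL versions; the automatic `bijector` extraction is likewise version-dependent.

```julia
using Turing, DynamicPPL, LinearAlgebra

# A hypothetical Turing version of the normal-log-normal model above.
@model function normal_lognormal(n_dims)
    x ~ LogNormal(0.0, 1.0)
    y ~ MvNormal(zeros(n_dims), I)
end

turing_model = normal_lognormal(10)

# Assumed conversion: wrap the Turing model as a LogDensityProblem.
prob = DynamicPPL.LogDensityFunction(turing_model)
```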

-Therefore, we can try out `AdvancedVI.jl` real quick by using the interface to minimize the KL-divergence between two distributions:
-
-```julia
-julia> using Distributions, DistributionsAD, AdvancedVI
-
-julia> # Target distribution
-       p = MvNormal(ones(2))
-ZeroMeanDiagNormal(
-dim: 2
-μ: [0.0, 0.0]
-Σ: [1.0 0.0; 0.0 1.0]
-)
-
-julia> logπ(z) = logpdf(p, z)
-logπ (generic function with 1 method)
-
-julia> # Make a choice of VI algorithm
-       advi = ADVI(10, 1000)
-ADVI{AdvancedVI.ForwardDiffAD{40}}(10, 1000)
-```
-Now there are two different ways of specifying the approximate posterior (and its family); the first is by providing a mapping from parameters to distribution `θ ↦ q(⋅∣θ)`:
-```julia
-julia> # Using a function z ↦ q(⋅∣z)
-       getq(θ) = TuringDiagMvNormal(θ[1:2], exp.(θ[3:4]))
-getq (generic function with 1 method)
-
-julia> # Perform VI
-       q = vi(logπ, advi, getq, randn(4))
-┌ Info: [ADVI] Should only be seen once: optimizer created for θ
-objectid(θ) = 0x5ddb564423896704
-[ADVI] Optimizing...100% Time: 0:00:01
-TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}}(m=[-0.012691337868985757, -0.0004442434543332919], σ=[1.0334797673569802, 0.9957355128767893])
-```
-Or we can check the ELBO (which in this case, since no data is involved, is just the negative KL-divergence):
+Let us instantiate a random normal-log-normal model.
 ```julia
-julia> AdvancedVI.elbo(advi, q, logπ, 1000) # empirical estimate
-0.08031049170093245
+using LinearAlgebra
+
+n_dims = 10
+μ_x = randn()
+σ_x = exp.(randn())
+μ_y = randn(n_dims)
+σ_y = exp.(randn(n_dims))
+model = NormalLogNormal(μ_x, σ_x, μ_y, Diagonal(σ_y.^2))
 ```
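As a quick, illustrative sanity check on the added example above (assuming the definitions from the added README code are in scope), the log density and the bijector round trip can be evaluated on a draw whose first coordinate respects the positivity constraint on `x`:

```julia
using Distributions, LinearAlgebra

# Draw a point with θ[1] > 0 so the LogNormal log-density is finite.
θ = vcat(rand(LogNormal(μ_x, σ_x)), rand(MvNormal(μ_y, Diagonal(σ_y.^2))))
LogDensityProblems.logdensity(model, θ)  # a finite log joint density

# The stacked bijector maps the constrained draw to unconstrained space and back.
b    = Bijectors.bijector(model)
binv = inverse(b)
binv(b(θ)) ≈ θ  # should hold up to floating-point error
```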
-It's worth noting that the actual value of the ELBO doesn't really tell us too much about the quality of fit. In this particular case, because we're *directly* minimizing the KL-divergence, we can only say something useful if we reach 0, in which case we have obtained the true distribution.

-Let's just quickly check the mean-squared error between `log p(z)` and `log q(z)` for a random set of samples from the target `p`:
+We can perform VI with stochastic gradient descent (SGD) using reparameterization gradient estimates of the ELBO[^TL2014][^RMW2014][^KW2014] as follows:
 ```julia
-julia> zs = rand(p, 100);
-
-julia> mean(abs2, logpdf(q, zs) - logpdf(p, zs))
-0.0014889109427524852
-```
-That doesn't look too bad!
-
-## Implementing your own training loop
-Sometimes it might be convenient to roll your own training loop rather than using `vi(...)`. Here's some pseudo-code for how one would do that when used together with Turing.jl:
-
-```julia
-using Turing, AdvancedVI, DiffResults
-using Turing: Variational
-
-using ProgressMeter
-
-# Assuming you have an instance of a Turing model (`model`)
-
-# 1. Create log-joint needed for ELBO evaluation
-logπ = Variational.make_logjoint(model)
-
-# 2. Define objective
-variational_objective = Variational.ELBO()
-
-# 3. Optimizer
-optimizer = Variational.DecayedADAGrad()
-
-# 4. VI-algorithm
-alg = ADVI(10, 1000)
-
-# 5. Variational distribution
-function getq(θ)
-    # ...
-end
-
-# 6. [OPTIONAL] Implement convergence criterion
-function hasconverged(args...)
-    # ...
-end
-
-# 7. [OPTIONAL] Implement a callback for tracking stats
-function callback(args...)
-    # ...
-end
-
-# 8. Train
-converged = false
-step = 1
-
-prog = ProgressMeter.Progress(num_steps, 1)
-
-diff_results = DiffResults.GradientResult(θ_init)
-
-while (step ≤ num_steps) && !converged
-    # 1. Compute gradient and objective value; results are stored in `diff_results`
-    AdvancedVI.grad!(variational_objective, alg, getq, model, diff_results)
-
-    # 2. Extract gradient from `diff_result`
-    ∇ = DiffResults.gradient(diff_result)
-
-    # 3. Apply optimizer, e.g. multiplying by step-size
-    Δ = apply!(optimizer, θ, ∇)
-
-    # 4. Update parameters
-    @. θ = θ - Δ
-
-    # 5. Do whatever analysis you want
-    callback(args...)
-
-    # 6. Update
-    converged = hasconverged(...) # or something user-defined
-    step += 1
+using Optimisers
+using ADTypes, ForwardDiff
+using AdvancedVI
+
+# ELBO objective with the reparameterization gradient
+n_montecarlo = 10
+elbo = AdvancedVI.RepGradELBO(n_montecarlo)
+
+# Mean-field Gaussian variational family
+d = LogDensityProblems.dimension(model)
+μ = zeros(d)
+L = Diagonal(ones(d))
+q = AdvancedVI.MeanFieldGaussian(μ, L)
+
+# Match support by applying the `model`'s inverse bijector
+b = Bijectors.bijector(model)
+binv = inverse(b)
+q_transformed = Bijectors.TransformedDistribution(q, binv)
+
+
+# Run inference
+max_iter = 10^3
+q, stats, _ = AdvancedVI.optimize(
+    model,
+    elbo,
+    q_transformed,
+    max_iter;
+    adbackend = ADTypes.AutoForwardDiff(),
+    optimizer = Optimisers.Adam(1e-3)
+)

-    ProgressMeter.next!(prog)
-end
+# Evaluate the final ELBO with 10^4 Monte Carlo samples
+estimate_objective(elbo, q, model; n_samples=10^4)
 ```

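After the added example has run, the returned `q` is the fitted approximation on the original (constrained) support; a small, hedged follow-up, assuming `rand` works for the transformed variational family as it does for other `Distributions` objects:

```julia
using Statistics

# Draw joint samples (x, y) from the fitted approximation and summarize them.
samples   = rand(q, 1_000)          # (1 + n_dims) × 1_000 matrix
post_mean = mean(samples; dims=2)   # approximate posterior mean of (x, y)
```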

+For more examples and details, please refer to the documentation.

 ## References
-
-- Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. "An introduction to variational methods for graphical models." Machine Learning 37, no. 2 (1999): 183-233.
-- Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112, no. 518 (2017): 859-877.
-- Kucukelbir, Alp, Rajesh Ranganath, Andrew Gelman, and David Blei. "Automatic variational inference in Stan." In Advances in Neural Information Processing Systems, pp. 568-576. 2015.
-- Salimans, Tim, and David A. Knowles. "Fixed-form variational posterior approximation through stochastic linear regression." Bayesian Analysis 8, no. 4 (2013): 837-882.
-- Beal, Matthew James. Variational algorithms for approximate Bayesian inference. 2003.
+[^TL2014]: Titsias, M., & Lázaro-Gredilla, M. (2014, June). Doubly stochastic variational Bayes for non-conjugate inference. In *International Conference on Machine Learning*. PMLR.
+[^RMW2014]: Rezende, D. J., Mohamed, S., & Wierstra, D. (2014, June). Stochastic backpropagation and approximate inference in deep generative models. In *International Conference on Machine Learning*. PMLR.
+[^KW2014]: Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In *International Conference on Learning Representations*.
+[^KTRGB2017]: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. *Journal of Machine Learning Research*.

docs/Project.toml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+[deps]
+ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
+AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
+Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
+Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
+LogDensityProblems = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c"
+Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+QuasiMonteCarlo = "8a4e6c94-4038-4cdc-81c3-7e6ffdb2a71b"
+SimpleUnPack = "ce78b400-467f-4804-87d8-8f486da07d0a"
+StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
+
+[compat]
+ADTypes = "0.1.6"
+AdvancedVI = "0.3"
+Bijectors = "0.13.6"
+Distributions = "0.25"
+Documenter = "0.26, 0.27"
+ForwardDiff = "0.10"
+LogDensityProblems = "2.1.1"
+Optimisers = "0.3"
+Plots = "1"
+QuasiMonteCarlo = "0.3"
+SimpleUnPack = "1"
+StatsFuns = "1"
+julia = "1.6"

docs/make.jl

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+
+using AdvancedVI
+using Documenter
+
+DocMeta.setdocmeta!(
+    AdvancedVI, :DocTestSetup, :(using AdvancedVI); recursive=true
+)
+
+makedocs(;
+    modules = [AdvancedVI],
+    sitename = "AdvancedVI.jl",
+    repo = "https://github.com/TuringLang/AdvancedVI.jl/blob/{commit}{path}#{line}",
+    format = Documenter.HTML(; prettyurls = get(ENV, "CI", nothing) == "true"),
+    pages = ["AdvancedVI" => "index.md",
+             "General Usage" => "general.md",
+             "Examples" => "examples.md",
+             "ELBO Maximization" => [
+                 "Overview" => "elbo/overview.md",
+                 "Reparameterization Gradient Estimator" => "elbo/repgradelbo.md",
+                 "Location-Scale Variational Family" => "locscale.md",
+             ]],
+)
+
+deploydocs(; repo="github.com/TuringLang/AdvancedVI.jl", push_preview=true)
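With the `docs/Project.toml` and `docs/make.jl` added above, the documentation can typically be built locally from the repository root; a hedged sketch of one way to do so (the exact commands are an assumption, not part of the commit):

```julia
using Pkg

# Use the docs environment and point it at the local AdvancedVI checkout.
Pkg.activate("docs")
Pkg.develop(PackageSpec(path="."))
Pkg.instantiate()

# Build the HTML documentation (output lands in docs/build/ by default).
include("docs/make.jl")
```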
