# AdvancedVI.jl

[AdvancedVI](https://github.com/TuringLang/AdvancedVI.jl) provides implementations of variational inference (VI) algorithms, a family of methods that aim for scalable approximate Bayesian inference by leveraging optimization.
`AdvancedVI` is part of the [Turing](https://turinglang.org/stable/) probabilistic programming ecosystem.
The purpose of this package is to provide a common, accessible interface for various VI algorithms and utilities so that other packages, e.g. `Turing`, only need to write a light wrapper for integration.
For example, integrating `Turing` with `AdvancedVI.ADVI` only involves converting a `Turing.Model` into a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl) and extracting a corresponding `Bijectors.bijector`.

## Examples

`AdvancedVI` works with differentiable models specified as a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl).
For example, for the normal-log-normal model:

$$
\begin{aligned}
x &\sim \mathrm{LogNormal}\left(\mu_x, \sigma_x^2\right) \\
y &\sim \mathcal{N}\left(\mu_y, \sigma_y^2\right),
\end{aligned}
$$

a `LogDensityProblem` can be implemented as
```julia
using Distributions
using LogDensityProblems

struct NormalLogNormal{MX,SX,MY,SY}
    μ_x::MX
    σ_x::SX
    μ_y::MY
    Σ_y::SY
end

function LogDensityProblems.logdensity(model::NormalLogNormal, θ)
    (; μ_x, σ_x, μ_y, Σ_y) = model
    logpdf(LogNormal(μ_x, σ_x), θ[1]) + logpdf(MvNormal(μ_y, Σ_y), θ[2:end])
end

function LogDensityProblems.dimension(model::NormalLogNormal)
    length(model.μ_y) + 1
end

function LogDensityProblems.capabilities(::Type{<:NormalLogNormal})
    LogDensityProblems.LogDensityOrder{0}()
end
```

Since the support of `x` is constrained to be positive, while VI is best performed in unconstrained Euclidean space, we need a *bijector* that maps `x` to the unconstrained space. We will use the [`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl) package for this purpose.
This corresponds to the automatic differentiation variational inference (ADVI) formulation[^KTRGB2017].
```julia
using Bijectors

function Bijectors.bijector(model::NormalLogNormal)
    (; μ_x, σ_x, μ_y, Σ_y) = model
    Bijectors.Stacked(
        Bijectors.bijector.([LogNormal(μ_x, σ_x), MvNormal(μ_y, Σ_y)]),
        [1:1, 2:1+length(μ_y)])
end
```

A simpler approach is to use `Turing`, where a `Turing.Model` is automatically converted into a `LogDensityProblem` and a corresponding `bijector` is generated.
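
As an illustrative sketch (not part of the original example, and assuming `Turing` is installed; the model name `normal_lognormal` is hypothetical), the same model could be written with `Turing`'s `@model` macro, with the `LogDensityProblem` conversion and the bijector handled internally:

```julia
using Turing

# Hypothetical sketch: the same normal-log-normal model in Turing's modeling
# language. Turing takes care of the LogDensityProblem conversion and of the
# bijector for the positively-constrained variable x.
@model function normal_lognormal(μ_x, σ_x, μ_y, Σ_y)
    x ~ LogNormal(μ_x, σ_x)
    y ~ MvNormal(μ_y, Σ_y)
end
```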

Let us instantiate a random normal-log-normal model.
```julia
using LinearAlgebra

n_dims = 10
μ_x = randn()
σ_x = exp.(randn())
μ_y = randn(n_dims)
σ_y = exp.(randn(n_dims))
model = NormalLogNormal(μ_x, σ_x, μ_y, Diagonal(σ_y.^2))
```
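
Before running VI, it can be helpful to sanity-check the pieces defined above. The following sketch (not part of the original example) evaluates the joint log density at a point in the model's support and verifies that the bijector round-trips between the constrained and unconstrained spaces:

```julia
# Hypothetical sanity check of the LogDensityProblem and bijector definitions.
θ = [exp(randn()); randn(n_dims)]        # a point with x > 0, followed by y

LogDensityProblems.dimension(model)      # n_dims + 1
LogDensityProblems.logdensity(model, θ)  # finite log joint density

b    = Bijectors.bijector(model)         # constrained -> unconstrained map
binv = inverse(b)                        # unconstrained -> constrained map
binv(b(θ)) ≈ θ                           # should hold up to floating-point error
```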

We can perform VI with stochastic gradient descent (SGD) using reparameterization gradient estimates of the ELBO[^TL2014][^RMW2014][^KW2014] as follows:
```julia
using Optimisers
using ADTypes, ForwardDiff
using AdvancedVI

# ELBO objective with the reparameterization gradient
n_montecarlo = 10
elbo = AdvancedVI.RepGradELBO(n_montecarlo)

# Mean-field Gaussian variational family
d = LogDensityProblems.dimension(model)
μ = zeros(d)
L = Diagonal(ones(d))
q = AdvancedVI.MeanFieldGaussian(μ, L)

# Match support by applying the `model`'s inverse bijector
b = Bijectors.bijector(model)
binv = inverse(b)
q_transformed = Bijectors.TransformedDistribution(q, binv)

# Run inference
max_iter = 10^3
q, stats, _ = AdvancedVI.optimize(
    model,
    elbo,
    q_transformed,
    max_iter;
    adbackend = ADTypes.AutoForwardDiff(),
    optimizer = Optimisers.Adam(1e-3)
)

# Evaluate the final ELBO with 10^4 Monte Carlo samples
estimate_objective(elbo, q, model; n_samples=10^4)
```
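
The fitted `q` can then be used like any other distribution. For instance, a hypothetical usage sketch (not part of the original example) for drawing an approximate posterior sample:

```julia
# Hypothetical usage sketch: draw a sample from the fitted variational approximation.
# Since `q` is a `Bijectors.TransformedDistribution`, draws are returned on the
# original (constrained) scale: the first coordinate is x > 0, the rest is y.
z = rand(q)
x_draw = z[1]
y_draw = z[2:end]
```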

For more examples and details, please refer to the documentation.

## References

[^TL2014]: Titsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In *International Conference on Machine Learning*. PMLR.
[^RMW2014]: Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In *International Conference on Machine Learning*. PMLR.
[^KW2014]: Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In *International Conference on Learning Representations*.
[^KTRGB2017]: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. *Journal of Machine Learning Research*.