
Commit 2f78875

Fix math display errors in readme (#28)

* fix readme math err
* minor update readme
* add PR links to TODO

1 parent 43c06ab commit 2f78875


1 file changed: +14 −11 lines changed


README.md

+14-11
@@ -4,6 +4,8 @@
 [![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)
 
 
+**Last updated: 2023-Aug-23**
+
 A normalizing flow library for Julia.
 
 The purpose of this package is to provide a simple and flexible interface for variational inference (VI) and normalizing flows (NF) for Bayesian computation or generative modeling.
@@ -37,11 +39,11 @@ Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim
 ```
 where $\theta = (\theta_1, \dots, \theta_N)$ is the parameter to be learned, and $q_{\theta}$ is the variational distribution (flow distribution). This describes the **sampling procedure** of normalizing flows, which requires sending draws through a forward pass of these flow layers.
 
-Since all the transformations are invertible (techinically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variable formula:
+Since all the transformations are invertible (technically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variable formula:
 ```math
 q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ
 T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ
-T_N^{-1}(x)\right)} \quad J_n(x)=\left|\operatorname{det} \nabla_x
+T_N^{-1}(x)\right)} \quad J_n(x)=\left|\text{det} \nabla_x
 T_n(x)\right|.
 ```
 Here we drop the subscript $\theta_n, n = 1, \dots, N$ for simplicity.
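To make the two operations above concrete (a forward pass for sampling, and inverse passes plus the change of variable correction for density evaluation), here is a minimal Julia sketch. It is generic illustrative code rather than the NormalizingFlows.jl API; the `AffineLayer` type and the `sample_flow` / `logpdf_flow` helpers are invented for this example.

```julia
# Illustrative sketch only: a flow represented as a vector of invertible layers,
# each providing a forward map, its inverse, and log J_n = log|det ∇ T_n|.
struct AffineLayer
    a::Float64   # scale (nonzero, so the map is invertible)
    b::Float64   # shift
end

forward(l::AffineLayer, z) = l.a * z + l.b
inverse(l::AffineLayer, x) = (x - l.b) / l.a
logabsdetjac(l::AffineLayer, z) = log(abs(l.a))   # constant for an affine map

# Sampling: push a reference draw Z_0 ~ q_0 through T_N ∘ ⋯ ∘ T_1.
sample_flow(layers, z0) = foldl((z, l) -> forward(l, z), layers; init = z0)

# Density: peel the layers off in reverse and accumulate the correction,
#   log q_θ(x) = log q_0(T_1⁻¹ ∘ ⋯ ∘ T_N⁻¹(x)) − Σ_n log J_n(⋅).
function logpdf_flow(logq0, layers, x)
    logdet = 0.0
    for l in reverse(layers)
        z = inverse(l, x)              # apply T_n⁻¹ to the current point
        logdet += logabsdetjac(l, z)   # J_n evaluated at the input of T_n
        x = z
    end
    return logq0(x) - logdet
end

# Example with a standard normal reference q_0:
layers = [AffineLayer(2.0, 1.0), AffineLayer(0.5, -0.3)]
x = sample_flow(layers, randn())
logq = logpdf_flow(z -> -0.5 * z^2 - 0.5 * log(2π), layers, x)
```

Real flow layers (for example, bijectors from Bijectors.jl listed below) play the same role, with learnable parameters in place of the fixed scalars used here.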
@@ -52,17 +54,17 @@ Given the feasibility of i.i.d. sampling and density evaluation, normalizing flo
 ```math
 \begin{aligned}
 \text{Reverse KL:}\quad
-&\argmin _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
-&= \argmin _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
-&= \argmax _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(X)+\sum_{n=1}^N \log J_n\left(F_n \circ \cdots \circ F_1(X)\right)\right]
+&\arg\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
+&= \arg\min _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
+&= \arg\max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_{n-1} \circ \cdots \circ T_1(Z_0)\right)\right]
 \end{aligned}
 ```
 and
 ```math
 \begin{aligned}
 \text{Forward KL:}\quad
-&\argmin _{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
-&= \argmin _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
+&\arg\min _{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \\
+&= \arg\max _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
 \end{aligned}
 ```
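As a rough sketch of how these two objectives can be estimated in practice (again generic illustrative Julia, not the package API, reusing the hypothetical `sample_flow` / `logpdf_flow` helpers from the sketch above), reverse KL is estimated with fresh draws from the reference $q_0$, while forward KL reduces, up to a constant in $\theta$, to a negative log-likelihood over data:

```julia
# Monte Carlo estimate of the reverse KL objective E_{q_θ}[log q_θ(Z) − log p(Z)],
# drawing Z_0 from a standard normal reference (an assumption of this sketch).
function reverse_kl_estimate(logq0, logp, layers; nsamples = 100)
    total = 0.0
    for _ in 1:nsamples
        x = sample_flow(layers, randn())                  # Z = T_N ∘ ⋯ ∘ T_1(Z_0)
        total += logpdf_flow(logq0, layers, x) - logp(x)  # log q_θ(Z) − log p(Z)
    end
    return total / nsamples
end

# Forward KL, dropping the θ-independent E_p[log p(Z)] term, is the negative
# log-likelihood of observed data: argmax_θ E_p[log q_θ(Z)].
forward_kl_estimate(logq0, layers, data) =
    -sum(x -> logpdf_flow(logq0, layers, x), data) / length(data)
```

Either estimate would then be differentiated with respect to the layer parameters by automatic differentiation and minimized with the stochastic optimizers mentioned next.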
 Both problems can be solved via standard stochastic optimization algorithms,
@@ -71,20 +73,21 @@ such as stochastic gradient descent (SGD) and its variants.
 Reverse KL minimization is typically used for **Bayesian computation**, where one
 wants to approximate a posterior distribution $p$ that is only known up to a
 normalizing constant.
-In contrast, forward KL minimization is typically used for **generative modeling**, where one wants to approximate a complex distribution $p$ that is known up to a normalizing constant.
+In contrast, forward KL minimization is typically used for **generative modeling**,
+where one wants to learn the underlying distribution of some data.
 
 ## Current status and TODOs
 
 - [x] general interface development
 - [x] documentation
-- [ ] including more flow examples
+- [ ] including more NF examples/Tutorials
+  - WIP: [PR#11](https://github.com/TuringLang/NormalizingFlows.jl/pull/11)
 - [ ] GPU compatibility
+  - WIP: [PR#25](https://github.com/TuringLang/NormalizingFlows.jl/pull/25)
 - [ ] benchmarking
 
 ## Related packages
 - [Bijectors.jl](https://github.com/TuringLang/Bijectors.jl): a package for defining bijective transformations, which can be used for defining customized flow layers.
 - [Flux.jl](https://fluxml.ai/Flux.jl/stable/)
 - [Optimisers.jl](https://github.com/FluxML/Optimisers.jl)
 - [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl)
-
-