
Commit

[skip ci] LanguageTool
ArnoStrouwen committed Jan 9, 2023
1 parent 8331f34 commit 5454bac
Showing 25 changed files with 122 additions and 120 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
*.jl.*.cov
*.jl.mem
Manifest.toml
+/docs/build/
12 changes: 6 additions & 6 deletions docs/src/Benchmark.md
@@ -2,13 +2,13 @@

## Note on benchmarking and getting the best performance out of the SciML stack's adjoints

-From our [recent papers](https://arxiv.org/abs/1812.01892) it's clear that `EnzymeVJP` is the fastest,
-especially when the program is setup to be fully non-allocating mutating functions. Thus for all benchmarking,
+From our [recent papers](https://arxiv.org/abs/1812.01892), it's clear that `EnzymeVJP` is the fastest,
+especially when the program is set up to be fully non-allocating mutating functions. Thus for all benchmarking,
especially with PDEs, this should be done. Neural network libraries don't make use of mutation effectively
[except for SimpleChains.jl](https://julialang.org/blog/2022/04/simple-chains/), so we recommend creating a
neural ODE / universal ODE with `ZygoteVJP` and Flux first, but then check the correctness by moving the
implementation over to SimpleChains and if possible `EnzymeVJP`. This can be an order of magnitude improvement
-(or more) in many situations over all of the previous benchmarks using Zygote and Flux, and thus it's
+(or more) in many situations over all the previous benchmarks using Zygote and Flux, and thus it's
highly recommended in scenarios that require performance.

## Vs Torchdiffeq 1 million and less ODEs
@@ -23,10 +23,10 @@ A training benchmark using the spiral ODE from the original neural ODE paper

## Vs torchsde on small SDEs

-Using the code from torchsde's README we demonstrated a [>70,000x performance
+Using the code from torchsde's README, we demonstrated a [>70,000x performance
advantage over torchsde](https://gist.github.com/ChrisRackauckas/6a03e7b151c86b32d74b41af54d495c6).
-Further benchmarking is planned but was found to be computationally infeasible
-for the time being.
+Further benchmarking is planned, but was found to be computationally infeasible
+at this time.

## A bunch of adjoint choices on neural ODEs

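For readers skimming the Benchmark.md note above, the following is a minimal, hedged sketch of how a VJP backend such as `EnzymeVJP` is selected through the `sensealg` keyword when differentiating through `solve`; the Lotka–Volterra right-hand side, parameter values, and loss are illustrative placeholders and are not part of this commit.

```julia
using OrdinaryDiffEq, SciMLSensitivity, Zygote

# In-place, non-allocating RHS: the style the note above recommends for EnzymeVJP.
function f!(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

u0 = [1.0, 1.0]
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f!, u0, (0.0, 10.0), p)

# The adjoint's internal vector-Jacobian products are computed with Enzyme.
loss(p) = sum(abs2, Array(solve(prob, Tsit5(); p = p, saveat = 0.1,
                                sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP()))))

Zygote.gradient(loss, p)
```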
10 changes: 5 additions & 5 deletions docs/src/examples/dae/physical_constraints.md
@@ -4,7 +4,7 @@ As shown in the [stiff ODE tutorial](https://docs.sciml.ai/SciMLTutorialsOutput/
differential-algebraic equations (DAEs) can be used to impose physical
constraints. One way to define a DAE is through an ODE with a singular mass
matrix. For example, if we make `Mu' = f(u)` where the last row of `M` is all
-zeros, then we have a constraint defined by the right hand side. Using
+zeros, then we have a constraint defined by the right-hand side. Using
`NeuralODEMM`, we can use this to define a neural ODE where the sum of all 3
terms must add to one. An example of this is as follows:

@@ -81,7 +81,7 @@ rng = Random.default_rng()

### Differential Equation

-First, we define our differential equations as a highly stiff problem which makes the
+First, we define our differential equations as a highly stiff problem, which makes the
fitting difficult.

```@example dae2
@@ -118,7 +118,7 @@ all zeros)

### ODE Function, Problem and Solution

-We define and solve our ODE problem to generate the "labeled" data which will be used to
+We define and solve our ODE problem to generate the labeled data which will be used to
train our Neural Network.

```@example dae2
@@ -127,7 +127,7 @@ prob_stiff = ODEProblem(stiff_func, u₀, tspan, p)
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)
```

-Because this is a DAE we need to make sure to use a **compatible solver**.
+Because this is a DAE, we need to make sure to use a **compatible solver**.
`Rodas5` works well for this example.

### Neural Network Layers
@@ -163,7 +163,7 @@ end

### Train Parameters

-Training our network requires a **loss function**, an **optimizer** and a
+Training our network requires a **loss function**, an **optimizer**, and a
**callback function** to display the progress.

#### Loss
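As background for the mass-matrix formulation this tutorial builds on, here is a hedged sketch of the classic Robertson system written as `Mu' = f(u)` with a zero last row in `M`; the rate constants, initial condition, and time span are illustrative and need not match the tutorial's exact setup.

```julia
using OrdinaryDiffEq

function rober!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁ * y₁ + k₃ * y₂ * y₃
    du[2] = k₁ * y₁ - k₃ * y₂ * y₃ - k₂ * y₂^2
    du[3] = y₁ + y₂ + y₃ - 1          # algebraic row: the three terms must sum to one
end

M = [1.0 0.0 0.0
     0.0 1.0 0.0
     0.0 0.0 0.0]                     # singular mass matrix ⇒ DAE

stiff_func = ODEFunction(rober!, mass_matrix = M)
prob_stiff = ODEProblem(stiff_func, [1.0, 0.0, 0.0], (0.0, 1.0), (0.04, 3e7, 1e4))
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)   # mass-matrix-compatible stiff solver
```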
8 changes: 4 additions & 4 deletions docs/src/examples/neural_ode/minibatch.md
@@ -88,9 +88,9 @@ xlabel!("Time")
ylabel!("Temp")
```

-When training a neural network we need to find the gradient with respect to our data set. There are three main ways to partition our data when using a training algorithm like gradient descent: stochastic, batching and mini-batching. Stochastic gradient descent trains on a single random data point each epoch. This allows for the neural network to better converge to the global minimum even on noisy data but is computationally inefficient. Batch gradient descent trains on the whole data set each epoch and while computationally efficient is prone to converging to local minima. Mini-batching combines both of these advantages and by training on a small random "mini-batch" of the data each epoch can converge to the global minimum while remaining more computationally efficient than stochastic descent. Typically we do this by randomly selecting subsets of the data each epoch and use this subset to train on. We can also pre-batch the data by creating an iterator holding these randomly selected batches before beginning to train. The proper size for the batch can be determined experimentally. Let us see how to do this with Julia.
+When training a neural network, we need to find the gradient with respect to our data set. There are three main ways to partition our data when using a training algorithm like gradient descent: stochastic, batching and mini-batching. Stochastic gradient descent trains on a single random data point each epoch. This allows for the neural network to better converge to the global minimum even on noisy data, but is computationally inefficient. Batch gradient descent trains on the whole data set each epoch and while computationally efficient is prone to converging to local minima. Mini-batching combines both of these advantages and by training on a small random "mini-batch" of the data each epoch can converge to the global minimum while remaining more computationally efficient than stochastic descent. Typically, we do this by randomly selecting subsets of the data each epoch and use this subset to train on. We can also pre-batch the data by creating an iterator holding these randomly selected batches before beginning to train. The proper size for the batch can be determined experimentally. Let us see how to do this with Julia.

-For this example we will use a very simple ordinary differential equation, newtons law of cooling. We can represent this in Julia like so.
+For this example, we will use a very simple ordinary differential equation, newtons law of cooling. We can represent this in Julia like so.

```@example minibatch
using DifferentialEquations, Flux, Random, Plots
@@ -156,7 +156,7 @@ for (x, y) in train_loader
end
```

-Now we train the neural network with a user defined call back function to display loss and the graphs with a maximum of 300 epochs.
+Now we train the neural network with a user-defined call back function to display loss and the graphs with a maximum of 300 epochs.

```@example minibatch
numEpochs = 300
@@ -176,7 +176,7 @@ opt=ADAM(0.05)
Flux.train!(loss_adjoint, Flux.params(θ), ncycle(train_loader,numEpochs), opt, cb=Flux.throttle(cb, 10))
```

-Finally we can see how well our trained network will generalize to new initial conditions.
+Finally, we can see how well our trained network will generalize to new initial conditions.

```@example minibatch
starting_temp=collect(10:30:250)
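# A hedged sketch of the pre-batched iterator described in the mini-batching
# paragraph above; the synthetic cooling data, batch size, and names are
# placeholders, not this tutorial's exact values.
using Flux

t_toy = reshape(collect(0.0:0.1:10.0), 1, :)    # 1×N matrix of time points
temps_toy = 20.0 .+ 80.0 .* exp.(-0.2 .* t_toy) # synthetic cooling curve

# Batches are drawn along the last dimension; shuffle=true reshuffles every epoch.
loader_toy = Flux.DataLoader((t_toy, temps_toy), batchsize = 8, shuffle = true)

for (x, y) in loader_toy
    @show size(x)                               # each iteration yields one mini-batch
end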
14 changes: 7 additions & 7 deletions docs/src/examples/neural_ode/neural_gde.md
@@ -2,7 +2,7 @@

This tutorial has been adapted from [here](https://github.com/CarloLucibello/GraphNeuralNetworks.jl/blob/master/examples/neural_ode_cora.jl).

-In this tutorial we will use Graph Differential Equations (GDEs) to perform classification on the [CORA Dataset](https://relational.fit.cvut.cz/dataset/CORA). We shall be using the Graph Neural Networks primitives from the package [GraphNeuralNetworks](https://github.com/CarloLucibello/GraphNeuralNetworks.jl).
+In this tutorial, we will use Graph Differential Equations (GDEs) to perform classification on the [CORA Dataset](https://relational.fit.cvut.cz/dataset/CORA). We shall be using the Graph Neural Networks primitives from the package [GraphNeuralNetworks](https://github.com/CarloLucibello/GraphNeuralNetworks.jl).

```julia
# Load the packages
@@ -184,7 +184,7 @@ Ã = normalized_adjacency(g, add_self_loops=true) |> device
```
### Training Data

-GNNs operate on an entire graph, so we can't do any sort of minibatching here. We predict the entire dataset but train the model in a semi-supervised learning fashion.
+GNNs operate on an entire graph, so we can't do any sort of minibatching here. We predict the entire dataset, but train the model in a semi-supervised learning fashion.
```julia
(; train_mask, val_mask, test_mask) = g.ndata
ytrain = y[:,train_mask]
@@ -202,7 +202,7 @@ epochs = 20
```
## Define the Graph Neural Network

-Here we define a type of graph neural networks called `GCNConv`. We use the name `ExplicitGCNConv` to avoid naming conflicts with `GraphNeuralNetworks`. For more informations on defining a layer with `Lux`, please consult to the [doc](http://lux.csail.mit.edu/dev/introduction/overview/#AbstractExplicitLayer-API).
+Here, we define a type of graph neural networks called `GCNConv`. We use the name `ExplicitGCNConv` to avoid naming conflicts with `GraphNeuralNetworks`. For more information on defining a layer with `Lux`, please consult to the [doc](http://lux.csail.mit.edu/dev/introduction/overview/#AbstractExplicitLayer-API).


```julia
@@ -240,7 +240,7 @@ end

## Neural Graph Ordinary Differential Equations

-Let us now define the final model. We will use two GNN layers for approximating the gradients for the neural ODE. We use one additional `GCNConv` layer to project the data to a latent space and the a `Dense` layer to project it from the latent space to the predictions. Finally a softmax layer gives us the probability of the input belonging to each target category.
+Let us now define the final model. We will use two GNN layers for approximating the gradients for the neural ODE. We use one additional `GCNConv` layer to project the data to a latent space and a `Dense` layer to project it from the latent space to the predictions. Finally, a softmax layer gives us the probability of the input belonging to each target category.

```julia
function diffeqsol_to_array(x::ODESolution{T, N, <:AbstractVector{<:CuArray}}) where {T, N}
@@ -264,7 +264,7 @@ model = Chain(ExplicitGCNConv(Ã, nin => nhidden, relu),

### Loss Function and Accuracy

-We shall be using the standard categorical crossentropy loss function which is used for multiclass classification tasks.
+We shall be using the standard categorical crossentropy loss function, which is used for multiclass classification tasks.

```julia
logitcrossentropy(ŷ, y) = mean(-sum(y .* logsoftmax(ŷ); dims=1))
@@ -283,7 +283,7 @@ end
```

### Setup Model
-We need to manually set up our mode with `Lux`, and convert the paramters to `ComponentArray` so that they can work well with sensitivity algorithms.
+We need to manually set up our mode with `Lux`, and convert the parameters to `ComponentArray` so that they can work well with sensitivity algorithms.
```julia
rng = Random.default_rng()
Random.seed!(rng, 0)
@@ -294,7 +294,7 @@ st = st |> device
```
### Optimizer

-For this task we will be using the `ADAM` optimizer with a learning rate of `0.01`.
+For this task, we will be using the `ADAM` optimizer with a learning rate of `0.01`.

```julia
opt = Optimisers.Adam(0.01f0)
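# As background for the `normalized_adjacency` call quoted above, a small sketch
# of the symmetric normalization Ã = D^(-1/2)(A + I)D^(-1/2) that a GCN-style
# layer propagates with; the 3-node adjacency matrix is a made-up toy, not CORA.
using LinearAlgebra

A_toy = [0.0 1.0 0.0
         1.0 0.0 1.0
         0.0 1.0 0.0]                 # toy adjacency matrix

Â_toy = A_toy + I                     # add self-loops
d_toy = vec(sum(Â_toy, dims = 2))     # node degrees
Dinvsqrt = Diagonal(1 ./ sqrt.(d_toy))
Ã_toy = Dinvsqrt * Â_toy * Dinvsqrt   # the kind of propagation matrix a GCN layer uses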
8 changes: 4 additions & 4 deletions docs/src/examples/neural_ode/neural_ode_flux.md
@@ -1,6 +1,6 @@
# Neural Ordinary Differential Equations with Flux

-All of the tools of SciMLSensitivity.jl can be used with Flux.jl. A lot of the examples
+All the tools of SciMLSensitivity.jl can be used with Flux.jl. A lot of the examples
have been written to use `FastChain` and `sciml_train`, but in all cases this
can be changed to the `Chain` and `Flux.train!` workflow.

@@ -74,10 +74,10 @@ p,re = Flux.destructure(chain)

returns `p` which is the vector of parameters for the chain and `re` which is
a function `re(p)` that reconstructs the neural network with new parameters
-`p`. Using this function we can thus build our neural differential equations in
+`p`. Using this function, we can thus build our neural differential equations in
an explicit parameter style.

-Let's use this to build and train a neural ODE from scratch. In this example we will
+Let's use this to build and train a neural ODE from scratch. In this example, we will
optimize both the neural network parameters `p` and the input initial condition `u0`.
Notice that Optimization.jl works on a vector input, so we have to concatenate `u0`
and `p` and then in the loss function split to the pieces.
@@ -149,7 +149,7 @@ result_neuralode2 = Optimization.solve(optprob2,
```

Notice that the advantage of this format is that we can use Optim's optimizers, like
-`LBFGS` with a full `Chain` object for all of Flux's neural networks, like
+`LBFGS` with a full `Chain` object, for all of Flux's neural networks, like
convolutional neural networks.

![](https://user-images.githubusercontent.com/1814174/51399500-1f4dd080-1b14-11e9-8c9d-144f93b6eac2.gif)
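A minimal sketch of the `p, re = Flux.destructure(chain)` pattern this file describes; the two-layer chain, initial condition, and time span are illustrative placeholders, not the tutorial's exact setup.

```julia
using Flux, OrdinaryDiffEq

chain = Chain(Dense(2, 16, tanh), Dense(16, 2))
p, re = Flux.destructure(chain)       # p: flat parameter vector, re(p): rebuilt model

dudt(u, p, t) = re(p)(u)              # explicit-parameter neural ODE right-hand side
prob = ODEProblem(dudt, Float32[2.0, 0.0], (0.0f0, 1.5f0), p)
sol = solve(prob, Tsit5(), saveat = 0.25f0)
```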
12 changes: 6 additions & 6 deletions docs/src/examples/neural_ode/simplechains.md
@@ -1,10 +1,10 @@
# Neural Ordinary Differential Equations with SimpleChains

-[SimpleChains](https://github.com/PumasAI/SimpleChains.jl) has demonstrated performance boosts of ~5x and ~30x when compared to other mainstream deep learning frameworks like Pytorch for the training and evaluation in the specific case of small neural networks. For the nitty-gritty details ,as well as, some SciML related videos around the need and applications of such a library we can refer to this [blogpost](https://julialang.org/blog/2022/04/simple-chains/).As for doing Scientific Machine Learning, how do we even begin with training neural ODEs with any generic deep learning library?
+[SimpleChains](https://github.com/PumasAI/SimpleChains.jl) has demonstrated performance boosts of ~5x and ~30x when compared to other mainstream deep learning frameworks like Pytorch for the training and evaluation in the specific case of small neural networks. For the nitty-gritty details, as well as, some SciML related videos around the need and applications of such a library, we can refer to this [blogpost](https://julialang.org/blog/2022/04/simple-chains/). As for doing Scientific Machine Learning, how do we even begin with training neural ODEs with any generic deep learning library?

## Training Data

-Firstly we'll need data for training the NeuralODE, which can be obtained by solving the ODE `u' = f(u,p,t)` numerically using the SciML ecosystem in Julia.
+First, we'll need data for training the NeuralODE, which can be obtained by solving the ODE `u' = f(u,p,t)` numerically using the SciML ecosystem in Julia.

```@example sc_neuralode
using SimpleChains, StaticArrays, OrdinaryDiffEq, SciMLSensitivity, Optimization, OptimizationFlux, Plots
@@ -25,7 +25,7 @@ data = Array(solve(prob, Tsit5(), saveat = tsteps))

## Neural Network

-Next we setup a small neural network. It will be trained to output the derivative of the solution at each time step given the value of the solution at the previous time step and the parameters of the network. Thus, we are treating the neural network as a function `f(u,p,t)`. The difference is that instead of relying on knowing the exact equation for the ODE, we get to solve it only with the data.
+Next, we set up a small neural network. It will be trained to output the derivative of the solution at each time step given the value of the solution at the previous time step, and the parameters of the network. Thus, we are treating the neural network as a function `f(u,p,t)`. The difference is that instead of relying on knowing the exact equation for the ODE, we get to solve it only with the data.

```@example sc_neuralode
sc = SimpleChain(
@@ -42,7 +42,7 @@ f(u,p,t) = sc(u,p)

## NeuralODE, Prediction and Loss

-Now instead of the function `trueODE(u,p,t)` in the first code block, we pass the neural network to the ODE solver. This is our NeuralODE. Now in order to train it we obtain predictions from the model and calculate the L2 loss against the data generated numerically previously.
+Now instead of the function `trueODE(u,p,t)` in the first code block, we pass the neural network to the ODE solver. This is our NeuralODE. Now, in order to train it, we obtain predictions from the model and calculate the L2 loss against the data generated numerically previously.

```@example sc_neuralode
prob_nn = ODEProblem(f, u0, tspan)
@@ -60,9 +60,9 @@ end

## Training

-The next step is to minimize the loss, so that the NeuralODE gets trained. But in order to be able to do that, we have to be able to backpropagate through the NeuralODE model. Here the backpropagation through the neural network is the easy part and we get that out of the box with any deep learning package(although not as fast as SimpleChains for the small nn case here). But we have to find a way to first propagate the sensitivities of the loss back, first through the ODE solver and then to the neural network.
+The next step is to minimize the loss, so that the NeuralODE gets trained. But in order to be able to do that, we have to be able to backpropagate through the NeuralODE model. Here the backpropagation through the neural network is the easy part, and we get that out of the box with any deep learning package(although not as fast as SimpleChains for the small nn case here). But we have to find a way to first propagate the sensitivities of the loss back, first through the ODE solver and then to the neural network.

-The adjoint of a neural ODE can be calculated through the various AD algorithms available in SciMLSensitivity.jl. But for working with [StaticArrays](https://docs.sciml.ai/StaticArrays/stable/) in SimpleChains.jl we require a special adjoint method as StaticArrays do not allow any mutation. All the adjoint methods make heavy use of in-place mutation to be performant with the heap allocated normal arrays. For our statically sized, stack allocated StaticArrays, in order to be able to compute the ODE adjoint we need to do everything out of place. Hence we have specifically used `QuadratureAdjoint(autojacvec=ZygoteVJP())` adjoint algorithm in the solve call inside `predict_neuralode(p)` which computes everything out-of-place when u0 is a StaticArray. Hence we can move forward with the training of the NeuralODE
+The adjoint of a neural ODE can be calculated through the various AD algorithms available in SciMLSensitivity.jl. But working with [StaticArrays](https://docs.sciml.ai/StaticArrays/stable/) in SimpleChains.jl requires a special adjoint method as StaticArrays do not allow any mutation. All the adjoint methods make heavy use of in-place mutation to be performant with the heap allocated normal arrays. For our statically sized, stack allocated StaticArrays, in order to be able to compute the ODE adjoint we need to do everything out of place. Hence, we have specifically used `QuadratureAdjoint(autojacvec=ZygoteVJP())` adjoint algorithm in the solve call inside `predict_neuralode(p)` which computes everything out-of-place when u0 is a StaticArray. Hence, we can move forward with the training of the NeuralODE

```@example sc_neuralode
callback = function (p, l, pred; doplot = true)
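# To make the adjoint choice in the paragraph above concrete: a hedged sketch of
# differentiating a static-array NeuralODE with QuadratureAdjoint(autojacvec = ZygoteVJP()).
# The tiny network, initial condition, and loss are assumptions, not this tutorial's exact setup.
using SimpleChains, StaticArrays, OrdinaryDiffEq, SciMLSensitivity, Zygote

sc_toy = SimpleChain(static(2),
                     TurboDense{true}(tanh, static(8)),
                     TurboDense{true}(identity, static(2)))
p_toy = SimpleChains.init_params(sc_toy)

u0_toy = @SArray [2.0f0, 0.0f0]
f_toy(u, p, t) = sc_toy(u, p)                  # the network is the ODE right-hand side
prob_toy = ODEProblem(f_toy, u0_toy, (0.0f0, 1.0f0), p_toy)

# Out-of-place adjoint: needed because StaticArrays cannot be mutated in place.
predict_toy(p) = Array(solve(prob_toy, Tsit5(); p = p, saveat = 0.1f0,
                             sensealg = QuadratureAdjoint(autojacvec = ZygoteVJP())))

Zygote.gradient(p -> sum(abs2, predict_toy(p)), p_toy)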
4 changes: 2 additions & 2 deletions docs/src/examples/ode/exogenous_input.md
@@ -1,6 +1,6 @@
# Handling Exogenous Input Signals

-The key to using exogeneous input signals is the same as in the rest of the
+The key to using exogenous input signals is the same as in the rest of the
SciML universe: just use the function in the definition of the differential
equation. For example, if it's a standard differential equation, you can
use the form
@@ -30,7 +30,7 @@ which encloses an extra argument into `f` so that `_f` is now the interface-comp
differential equation definition.

Note that you can also learn what the exogenous equation is from data. For an
-example on how to do this, you can use the [Optimal Control Example](@ref optcontrol)
+example on how to do this, you can use the [Optimal Control Example](@ref optcontrol),
which shows how to parameterize a `u(t)` by a universal function and learn that
from data.

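# A minimal sketch of the enclosing pattern this page describes; the forcing
# signal `ex`, the scalar dynamics, and the parameter value are hypothetical.
using OrdinaryDiffEq

ex(t) = sin(2π * t)                     # hypothetical exogenous input signal
f(u, p, t, ex_t) = -p[1] .* u .+ ex_t   # dynamics that take the extra input argument
_f(u, p, t) = f(u, p, t, ex(t))         # interface-compatible closure over the signal

prob = ODEProblem(_f, [0.0], (0.0, 5.0), [1.0])
sol = solve(prob, Tsit5())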

0 comments on commit 5454bac
