
Commit

[skip ci] LanguageTool
ArnoStrouwen committed Jan 9, 2023
1 parent 8331f34 commit 5454bac
Showing 25 changed files with 122 additions and 120 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
*.jl.*.cov
*.jl.mem
Manifest.toml
+/docs/build/
12 changes: 6 additions & 6 deletions docs/src/Benchmark.md
@@ -2,13 +2,13 @@

## Note on benchmarking and getting the best performance out of the SciML stack's adjoints

-From our [recent papers](https://arxiv.org/abs/1812.01892) it's clear that `EnzymeVJP` is the fastest,
-especially when the program is setup to be fully non-allocating mutating functions. Thus for all benchmarking,
+From our [recent papers](https://arxiv.org/abs/1812.01892), it's clear that `EnzymeVJP` is the fastest,
+especially when the program is set up to be fully non-allocating mutating functions. Thus for all benchmarking,
especially with PDEs, this should be done. Neural network libraries don't make use of mutation effectively
[except for SimpleChains.jl](https://julialang.org/blog/2022/04/simple-chains/), so we recommend creating a
neural ODE / universal ODE with `ZygoteVJP` and Flux first, but then check the correctness by moving the
implementation over to SimpleChains and if possible `EnzymeVJP`. This can be an order of magnitude improvement
-(or more) in many situations over all of the previous benchmarks using Zygote and Flux, and thus it's
+(or more) in many situations over all the previous benchmarks using Zygote and Flux, and thus it's
highly recommended in scenarios that require performance.

## Vs Torchdiffeq 1 million and less ODEs
@@ -23,10 +23,10 @@ A training benchmark using the spiral ODE from the original neural ODE paper

## Vs torchsde on small SDEs

-Using the code from torchsde's README we demonstrated a [>70,000x performance
+Using the code from torchsde's README, we demonstrated a [>70,000x performance
advantage over torchsde](https://gist.github.com/ChrisRackauckas/6a03e7b151c86b32d74b41af54d495c6).
-Further benchmarking is planned but was found to be computationally infeasible
-for the time being.
+Further benchmarking is planned, but was found to be computationally infeasible
+at this time.

## A bunch of adjoint choices on neural ODEs

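For readers skimming the Benchmark.md note above, the following is a minimal, hedged sketch of how a VJP backend such as `EnzymeVJP` is selected through the `sensealg` keyword when differentiating through `solve`; the Lotka–Volterra right-hand side, parameter values, and loss are illustrative placeholders and are not part of this commit.

```julia
using OrdinaryDiffEq, SciMLSensitivity, Zygote

# In-place, non-allocating RHS: the style the note above recommends for EnzymeVJP.
function f!(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

u0 = [1.0, 1.0]
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f!, u0, (0.0, 10.0), p)

# The adjoint's internal vector-Jacobian products are computed with Enzyme.
loss(p) = sum(abs2, Array(solve(prob, Tsit5(); p = p, saveat = 0.1,
                                sensealg = InterpolatingAdjoint(autojacvec = EnzymeVJP()))))

Zygote.gradient(loss, p)
```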
10 changes: 5 additions & 5 deletions docs/src/examples/dae/physical_constraints.md
@@ -4,7 +4,7 @@ As shown in the [stiff ODE tutorial](https://docs.sciml.ai/SciMLTutorialsOutput/
differential-algebraic equations (DAEs) can be used to impose physical
constraints. One way to define a DAE is through an ODE with a singular mass
matrix. For example, if we make `Mu' = f(u)` where the last row of `M` is all
-zeros, then we have a constraint defined by the right hand side. Using
+zeros, then we have a constraint defined by the right-hand side. Using
`NeuralODEMM`, we can use this to define a neural ODE where the sum of all 3
terms must add to one. An example of this is as follows:

@@ -81,7 +81,7 @@ rng = Random.default_rng()

### Differential Equation

-First, we define our differential equations as a highly stiff problem which makes the
+First, we define our differential equations as a highly stiff problem, which makes the
fitting difficult.

```@example dae2
@@ -118,7 +118,7 @@ all zeros)

### ODE Function, Problem and Solution

-We define and solve our ODE problem to generate the "labeled" data which will be used to
+We define and solve our ODE problem to generate the labeled data which will be used to
train our Neural Network.

```@example dae2
@@ -127,7 +127,7 @@ prob_stiff = ODEProblem(stiff_func, u₀, tspan, p)
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)
```

-Because this is a DAE we need to make sure to use a **compatible solver**.
+Because this is a DAE, we need to make sure to use a **compatible solver**.
`Rodas5` works well for this example.

### Neural Network Layers
@@ -163,7 +163,7 @@ end

### Train Parameters

-Training our network requires a **loss function**, an **optimizer** and a
+Training our network requires a **loss function**, an **optimizer**, and a
**callback function** to display the progress.

#### Loss
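As background for the mass-matrix formulation this tutorial builds on, here is a hedged sketch of the classic Robertson system written as `Mu' = f(u)` with a zero last row in `M`; the rate constants, initial condition, and time span are illustrative and need not match the tutorial's exact setup.

```julia
using OrdinaryDiffEq

function rober!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁ * y₁ + k₃ * y₂ * y₃
    du[2] = k₁ * y₁ - k₃ * y₂ * y₃ - k₂ * y₂^2
    du[3] = y₁ + y₂ + y₃ - 1          # algebraic row: the three terms must sum to one
end

M = [1.0 0.0 0.0
     0.0 1.0 0.0
     0.0 0.0 0.0]                     # singular mass matrix ⇒ DAE

stiff_func = ODEFunction(rober!, mass_matrix = M)
prob_stiff = ODEProblem(stiff_func, [1.0, 0.0, 0.0], (0.0, 1.0), (0.04, 3e7, 1e4))
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)   # mass-matrix-compatible stiff solver
```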
8 changes: 4 additions & 4 deletions docs/src/examples/neural_ode/minibatch.md
@@ -88,9 +88,9 @@ xlabel!("Time")
ylabel!("Temp")
```

-When training a neural network we need to find the gradient with respect to our data set. There are three main ways to partition our data when using a training algorithm like gradient descent: stochastic, batching and mini-batching. Stochastic gradient descent trains on a single random data point each epoch. This allows for the neural network to better converge to the global minimum even on noisy data but is computationally inefficient. Batch gradient descent trains on the whole data set each epoch and while computationally efficient is prone to converging to local minima. Mini-batching combines both of these advantages and by training on a small random "mini-batch" of the data each epoch can converge to the global minimum while remaining more computationally efficient than stochastic descent. Typically we do this by randomly selecting subsets of the data each epoch and use this subset to train on. We can also pre-batch the data by creating an iterator holding these randomly selected batches before beginning to train. The proper size for the batch can be determined experimentally. Let us see how to do this with Julia.
+When training a neural network, we need to find the gradient with respect to our data set. There are three main ways to partition our data when using a training algorithm like gradient descent: stochastic, batching and mini-batching. Stochastic gradient descent trains on a single random data point each epoch. This allows for the neural network to better converge to the global minimum even on noisy data, but is computationally inefficient. Batch gradient descent trains on the whole data set each epoch and while computationally efficient is prone to converging to local minima. Mini-batching combines both of these advantages and by training on a small random "mini-batch" of the data each epoch can converge to the global minimum while remaining more computationally efficient than stochastic descent. Typically, we do this by randomly selecting subsets of the data each epoch and use this subset to train on. We can also pre-batch the data by creating an iterator holding these randomly selected batches before beginning to train. The proper size for the batch can be determined experimentally. Let us see how to do this with Julia.

-For this example we will use a very simple ordinary differential equation, newtons law of cooling. We can represent this in Julia like so.
+For this example, we will use a very simple ordinary differential equation, newtons law of cooling. We can represent this in Julia like so.

```@example minibatch
using DifferentialEquations, Flux, Random, Plots
@@ -156,7 +156,7 @@ for (x, y) in train_loader
end
```

-Now we train the neural network with a user defined call back function to display loss and the graphs with a maximum of 300 epochs.
+Now we train the neural network with a user-defined call back function to display loss and the graphs with a maximum of 300 epochs.

```@example minibatch
numEpochs = 300
@@ -176,7 +176,7 @@ opt=ADAM(0.05)
Flux.train!(loss_adjoint, Flux.params(θ), ncycle(train_loader,numEpochs), opt, cb=Flux.throttle(cb, 10))
```

-Finally we can see how well our trained network will generalize to new initial conditions.
+Finally, we can see how well our trained network will generalize to new initial conditions.

```@example minibatch
starting_temp=collect(10:30:250)
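# A hedged sketch of the pre-batched iterator described in the mini-batching
# paragraph above; the synthetic cooling data, batch size, and names are
# placeholders, not this tutorial's exact values.
using Flux

t_toy = reshape(collect(0.0:0.1:10.0), 1, :)    # 1×N matrix of time points
temps_toy = 20.0 .+ 80.0 .* exp.(-0.2 .* t_toy) # synthetic cooling curve

# Batches are drawn along the last dimension; shuffle=true reshuffles every epoch.
loader_toy = Flux.DataLoader((t_toy, temps_toy), batchsize = 8, shuffle = true)

for (x, y) in loader_toy
    @show size(x)                               # each iteration yields one mini-batch
end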
14 changes: 7 additions & 7 deletions docs/src/examples/neural_ode/neural_gde.md
@@ -2,7 +2,7 @@

This tutorial has been adapted from [here](https://github.com/CarloLucibello/GraphNeuralNetworks.jl/blob/master/examples/neural_ode_cora.jl).

-In this tutorial we will use Graph Differential Equations (GDEs) to perform classification on the [CORA Dataset](https://relational.fit.cvut.cz/dataset/CORA). We shall be using the Graph Neural Networks primitives from the package [GraphNeuralNetworks](https://github.com/CarloLucibello/GraphNeuralNetworks.jl).
+In this tutorial, we will use Graph Differential Equations (GDEs) to perform classification on the [CORA Dataset](https://relational.fit.cvut.cz/dataset/CORA). We shall be using the Graph Neural Networks primitives from the package [GraphNeuralNetworks](https://github.com/CarloLucibello/GraphNeuralNetworks.jl).

```julia
# Load the packages
@@ -184,7 +184,7 @@ Ã = normalized_adjacency(g, add_self_loops=true) |> device
```
### Training Data

-GNNs operate on an entire graph, so we can't do any sort of minibatching here. We predict the entire dataset but train the model in a semi-supervised learning fashion.
+GNNs operate on an entire graph, so we can't do any sort of minibatching here. We predict the entire dataset, but train the model in a semi-supervised learning fashion.
```julia
(; train_mask, val_mask, test_mask) = g.ndata
ytrain = y[:,train_mask]
@@ -202,7 +202,7 @@ epochs = 20
```
## Define the Graph Neural Network

-Here we define a type of graph neural networks called `GCNConv`. We use the name `ExplicitGCNConv` to avoid naming conflicts with `GraphNeuralNetworks`. For more informations on defining a layer with `Lux`, please consult to the [doc](http://lux.csail.mit.edu/dev/introduction/overview/#AbstractExplicitLayer-API).
+Here, we define a type of graph neural networks called `GCNConv`. We use the name `ExplicitGCNConv` to avoid naming conflicts with `GraphNeuralNetworks`. For more information on defining a layer with `Lux`, please consult to the [doc](http://lux.csail.mit.edu/dev/introduction/overview/#AbstractExplicitLayer-API).


```julia
@@ -240,7 +240,7 @@ end

## Neural Graph Ordinary Differential Equations

-Let us now define the final model. We will use two GNN layers for approximating the gradients for the neural ODE. We use one additional `GCNConv` layer to project the data to a latent space and the a `Dense` layer to project it from the latent space to the predictions. Finally a softmax layer gives us the probability of the input belonging to each target category.
+Let us now define the final model. We will use two GNN layers for approximating the gradients for the neural ODE. We use one additional `GCNConv` layer to project the data to a latent space and a `Dense` layer to project it from the latent space to the predictions. Finally, a softmax layer gives us the probability of the input belonging to each target category.

```julia
function diffeqsol_to_array(x::ODESolution{T, N, <:AbstractVector{<:CuArray}}) where {T, N}
@@ -264,7 +264,7 @@ model = Chain(ExplicitGCNConv(Ã, nin => nhidden, relu),

### Loss Function and Accuracy

-We shall be using the standard categorical crossentropy loss function which is used for multiclass classification tasks.
+We shall be using the standard categorical crossentropy loss function, which is used for multiclass classification tasks.

```julia
logitcrossentropy(ŷ, y) = mean(-sum(y .* logsoftmax(ŷ); dims=1))
@@ -283,7 +283,7 @@ end
```

### Setup Model
-We need to manually set up our mode with `Lux`, and convert the paramters to `ComponentArray` so that they can work well with sensitivity algorithms.
+We need to manually set up our mode with `Lux`, and convert the parameters to `ComponentArray` so that they can work well with sensitivity algorithms.
```julia
rng = Random.default_rng()
Random.seed!(rng, 0)
@@ -294,7 +294,7 @@ st = st |> device
```
### Optimizer

-For this task we will be using the `ADAM` optimizer with a learning rate of `0.01`.
+For this task, we will be using the `ADAM` optimizer with a learning rate of `0.01`.

```julia
opt = Optimisers.Adam(0.01f0)
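# As background for the `normalized_adjacency` call quoted above, a small sketch
# of the symmetric normalization Ã = D^(-1/2)(A + I)D^(-1/2) that a GCN-style
# layer propagates with; the 3-node adjacency matrix is a made-up toy, not CORA.
using LinearAlgebra

A_toy = [0.0 1.0 0.0
         1.0 0.0 1.0
         0.0 1.0 0.0]                 # toy adjacency matrix

Â_toy = A_toy + I                     # add self-loops
d_toy = vec(sum(Â_toy, dims = 2))     # node degrees
Dinvsqrt = Diagonal(1 ./ sqrt.(d_toy))
Ã_toy = Dinvsqrt * Â_toy * Dinvsqrt   # the kind of propagation matrix a GCN layer uses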
8 changes: 4 additions & 4 deletions docs/src/examples/neural_ode/neural_ode_flux.md
@@ -1,6 +1,6 @@
# Neural Ordinary Differential Equations with Flux

-All of the tools of SciMLSensitivity.jl can be used with Flux.jl. A lot of the examples
+All the tools of SciMLSensitivity.jl can be used with Flux.jl. A lot of the examples
have been written to use `FastChain` and `sciml_train`, but in all cases this
can be changed to the `Chain` and `Flux.train!` workflow.

@@ -74,10 +74,10 @@ p,re = Flux.destructure(chain)

returns `p` which is the vector of parameters for the chain and `re` which is
a function `re(p)` that reconstructs the neural network with new parameters
-`p`. Using this function we can thus build our neural differential equations in
+`p`. Using this function, we can thus build our neural differential equations in
an explicit parameter style.

-Let's use this to build and train a neural ODE from scratch. In this example we will
+Let's use this to build and train a neural ODE from scratch. In this example, we will
optimize both the neural network parameters `p` and the input initial condition `u0`.
Notice that Optimization.jl works on a vector input, so we have to concatenate `u0`
and `p` and then in the loss function split to the pieces.
@@ -149,7 +149,7 @@ result_neuralode2 = Optimization.solve(optprob2,
```

Notice that the advantage of this format is that we can use Optim's optimizers, like
-`LBFGS` with a full `Chain` object for all of Flux's neural networks, like
+`LBFGS` with a full `Chain` object, for all of Flux's neural networks, like
convolutional neural networks.

![](https://user-images.githubusercontent.com/1814174/51399500-1f4dd080-1b14-11e9-8c9d-144f93b6eac2.gif)
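A minimal sketch of the `p, re = Flux.destructure(chain)` pattern this file describes; the two-layer chain, initial condition, and time span are illustrative placeholders, not the tutorial's exact setup.

```julia
using Flux, OrdinaryDiffEq

chain = Chain(Dense(2, 16, tanh), Dense(16, 2))
p, re = Flux.destructure(chain)       # p: flat parameter vector, re(p): rebuilt model

dudt(u, p, t) = re(p)(u)              # explicit-parameter neural ODE right-hand side
prob = ODEProblem(dudt, Float32[2.0, 0.0], (0.0f0, 1.5f0), p)
sol = solve(prob, Tsit5(), saveat = 0.25f0)
```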
12 changes: 6 additions & 6 deletions docs/src/examples/neural_ode/simplechains.md
@@ -1,10 +1,10 @@
# Neural Ordinary Differential Equations with SimpleChains

-[SimpleChains](https://github.com/PumasAI/SimpleChains.jl) has demonstrated performance boosts of ~5x and ~30x when compared to other mainstream deep learning frameworks like Pytorch for the training and evaluation in the specific case of small neural networks. For the nitty-gritty details ,as well as, some SciML related videos around the need and applications of such a library we can refer to this [blogpost](https://julialang.org/blog/2022/04/simple-chains/).As for doing Scientific Machine Learning, how do we even begin with training neural ODEs with any generic deep learning library?
+[SimpleChains](https://github.com/PumasAI/SimpleChains.jl) has demonstrated performance boosts of ~5x and ~30x when compared to other mainstream deep learning frameworks like Pytorch for the training and evaluation in the specific case of small neural networks. For the nitty-gritty details, as well as, some SciML related videos around the need and applications of such a library, we can refer to this [blogpost](https://julialang.org/blog/2022/04/simple-chains/). As for doing Scientific Machine Learning, how do we even begin with training neural ODEs with any generic deep learning library?

## Training Data

-Firstly we'll need data for training the NeuralODE, which can be obtained by solving the ODE `u' = f(u,p,t)` numerically using the SciML ecosystem in Julia.
+First, we'll need data for training the NeuralODE, which can be obtained by solving the ODE `u' = f(u,p,t)` numerically using the SciML ecosystem in Julia.

```@example sc_neuralode
using SimpleChains, StaticArrays, OrdinaryDiffEq, SciMLSensitivity, Optimization, OptimizationFlux, Plots
@@ -25,7 +25,7 @@ data = Array(solve(prob, Tsit5(), saveat = tsteps))

## Neural Network

-Next we setup a small neural network. It will be trained to output the derivative of the solution at each time step given the value of the solution at the previous time step and the parameters of the network. Thus, we are treating the neural network as a function `f(u,p,t)`. The difference is that instead of relying on knowing the exact equation for the ODE, we get to solve it only with the data.
+Next, we set up a small neural network. It will be trained to output the derivative of the solution at each time step given the value of the solution at the previous time step, and the parameters of the network. Thus, we are treating the neural network as a function `f(u,p,t)`. The difference is that instead of relying on knowing the exact equation for the ODE, we get to solve it only with the data.

```@example sc_neuralode
sc = SimpleChain(
@@ -42,7 +42,7 @@ f(u,p,t) = sc(u,p)

## NeuralODE, Prediction and Loss

-Now instead of the function `trueODE(u,p,t)` in the first code block, we pass the neural network to the ODE solver. This is our NeuralODE. Now in order to train it we obtain predictions from the model and calculate the L2 loss against the data generated numerically previously.
+Now instead of the function `trueODE(u,p,t)` in the first code block, we pass the neural network to the ODE solver. This is our NeuralODE. Now, in order to train it, we obtain predictions from the model and calculate the L2 loss against the data generated numerically previously.

```@example sc_neuralode
prob_nn = ODEProblem(f, u0, tspan)
@@ -60,9 +60,9 @@ end

## Training

-The next step is to minimize the loss, so that the NeuralODE gets trained. But in order to be able to do that, we have to be able to backpropagate through the NeuralODE model. Here the backpropagation through the neural network is the easy part and we get that out of the box with any deep learning package(although not as fast as SimpleChains for the small nn case here). But we have to find a way to first propagate the sensitivities of the loss back, first through the ODE solver and then to the neural network.
+The next step is to minimize the loss, so that the NeuralODE gets trained. But in order to be able to do that, we have to be able to backpropagate through the NeuralODE model. Here the backpropagation through the neural network is the easy part, and we get that out of the box with any deep learning package(although not as fast as SimpleChains for the small nn case here). But we have to find a way to first propagate the sensitivities of the loss back, first through the ODE solver and then to the neural network.

-The adjoint of a neural ODE can be calculated through the various AD algorithms available in SciMLSensitivity.jl. But for working with [StaticArrays](https://docs.sciml.ai/StaticArrays/stable/) in SimpleChains.jl we require a special adjoint method as StaticArrays do not allow any mutation. All the adjoint methods make heavy use of in-place mutation to be performant with the heap allocated normal arrays. For our statically sized, stack allocated StaticArrays, in order to be able to compute the ODE adjoint we need to do everything out of place. Hence we have specifically used `QuadratureAdjoint(autojacvec=ZygoteVJP())` adjoint algorithm in the solve call inside `predict_neuralode(p)` which computes everything out-of-place when u0 is a StaticArray. Hence we can move forward with the training of the NeuralODE
+The adjoint of a neural ODE can be calculated through the various AD algorithms available in SciMLSensitivity.jl. But working with [StaticArrays](https://docs.sciml.ai/StaticArrays/stable/) in SimpleChains.jl requires a special adjoint method as StaticArrays do not allow any mutation. All the adjoint methods make heavy use of in-place mutation to be performant with the heap allocated normal arrays. For our statically sized, stack allocated StaticArrays, in order to be able to compute the ODE adjoint we need to do everything out of place. Hence, we have specifically used `QuadratureAdjoint(autojacvec=ZygoteVJP())` adjoint algorithm in the solve call inside `predict_neuralode(p)` which computes everything out-of-place when u0 is a StaticArray. Hence, we can move forward with the training of the NeuralODE

```@example sc_neuralode
callback = function (p, l, pred; doplot = true)
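# To make the adjoint choice in the paragraph above concrete: a hedged sketch of
# differentiating a static-array NeuralODE with QuadratureAdjoint(autojacvec = ZygoteVJP()).
# The tiny network, initial condition, and loss are assumptions, not this tutorial's exact setup.
using SimpleChains, StaticArrays, OrdinaryDiffEq, SciMLSensitivity, Zygote

sc_toy = SimpleChain(static(2),
                     TurboDense{true}(tanh, static(8)),
                     TurboDense{true}(identity, static(2)))
p_toy = SimpleChains.init_params(sc_toy)

u0_toy = @SArray [2.0f0, 0.0f0]
f_toy(u, p, t) = sc_toy(u, p)                  # the network is the ODE right-hand side
prob_toy = ODEProblem(f_toy, u0_toy, (0.0f0, 1.0f0), p_toy)

# Out-of-place adjoint: needed because StaticArrays cannot be mutated in place.
predict_toy(p) = Array(solve(prob_toy, Tsit5(); p = p, saveat = 0.1f0,
                             sensealg = QuadratureAdjoint(autojacvec = ZygoteVJP())))

Zygote.gradient(p -> sum(abs2, predict_toy(p)), p_toy)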
4 changes: 2 additions & 2 deletions docs/src/examples/ode/exogenous_input.md
@@ -1,6 +1,6 @@
# Handling Exogenous Input Signals

-The key to using exogeneous input signals is the same as in the rest of the
+The key to using exogenous input signals is the same as in the rest of the
SciML universe: just use the function in the definition of the differential
equation. For example, if it's a standard differential equation, you can
use the form
@@ -30,7 +30,7 @@ which encloses an extra argument into `f` so that `_f` is now the interface-comp
differential equation definition.

Note that you can also learn what the exogenous equation is from data. For an
-example on how to do this, you can use the [Optimal Control Example](@ref optcontrol)
+example on how to do this, you can use the [Optimal Control Example](@ref optcontrol),
which shows how to parameterize a `u(t)` by a universal function and learn that
from data.

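# A minimal sketch of the enclosing pattern this page describes; the forcing
# signal `ex`, the scalar dynamics, and the parameter value are hypothetical.
using OrdinaryDiffEq

ex(t) = sin(2π * t)                     # hypothetical exogenous input signal
f(u, p, t, ex_t) = -p[1] .* u .+ ex_t   # dynamics that take the extra input argument
_f(u, p, t) = f(u, p, t, ex(t))         # interface-compatible closure over the signal

prob = ODEProblem(_f, [0.0], (0.0, 5.0), [1.0])
sol = solve(prob, Tsit5())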

0 comments on commit 5454bac
