Cache TLM and Hessian solvers for NLVS blocks #4554
Draft
Description
Computing the adjoint on a `NonlinearVariationalSolve` block has a few optimisations: in effect the LVP and LVS are only initialised once when the block is created, then only the RHS is updated during adjoint evaluation (with some extra fiddling for the non-constant Jacobian case).
The TLM and Hessian should look similar to this, but they currently go through the slower code path on `GenericSolveBlock` that does assembly and solver creation on every evaluation, making them significantly slower. For very ballpark figures, a simple Stokes box model takes ~20s for a forward or adjoint evaluation, but ~180s for the first Hessian evaluation (involving a couple of compilation phases, I guess) and ~100s for subsequent Hessians. I think the theory dictates that this should be closer to only 2x the adjoint.
This PR is some pretty horrific code-mangling to attempt to bring parity across the different evaluation types. Currently, it sets up the LVP and LVS for the TLM and Hessian on NLVS block initialisation. I think it's fair to assume that in most cases the adjoint LVS is useful, but the TLM and Hessian are less commonly needed. Perhaps they can be gated by a flag if you know you'll need them, or initialised lazily (maybe a little more fiddly).
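A minimal sketch of the lazy-initialisation option, in plain Python rather than Firedrake. Everything here is hypothetical: `LinearSolverStub` stands in for an expensive LVP/LVS pair, `CachedSolveBlock` stands in for the NLVS block, and the "solve" is a scalar division. It only illustrates the idea that the adjoint solver is built eagerly, the TLM and Hessian solvers are built on first use, and later evaluations only update the RHS.

```python
from functools import cached_property


class LinearSolverStub:
    """Stand-in for an expensive LVP/LVS pair (hypothetical, not Firedrake API)."""

    setups = 0  # counts how many times the expensive setup has run

    def __init__(self, name, scale):
        LinearSolverStub.setups += 1
        self.name = name
        self.scale = scale  # pretend this is the assembled operator
        self.rhs = 0.0

    def solve(self, rhs):
        # Only the right-hand side changes between evaluations;
        # the "operator" set up in __init__ is reused.
        self.rhs = rhs
        return self.rhs / self.scale


class CachedSolveBlock:
    """Toy block: adjoint solver built eagerly, TLM/Hessian built on first use."""

    def __init__(self, scale):
        self._scale = scale
        # Assumed to be useful in most cases, so created up front.
        self.adj_solver = LinearSolverStub("adjoint", scale)

    @cached_property
    def tlm_solver(self):
        # Created on first access, then stored on the instance.
        return LinearSolverStub("tlm", self._scale)

    @cached_property
    def hessian_solver(self):
        # Created on first access, then stored on the instance.
        return LinearSolverStub("hessian", self._scale)

    def evaluate_adj(self, rhs):
        return self.adj_solver.solve(rhs)

    def evaluate_tlm(self, rhs):
        return self.tlm_solver.solve(rhs)

    def evaluate_hessian(self, rhs):
        return self.hessian_solver.solve(rhs)
```

With this layout, a run that only ever calls `evaluate_adj` pays for one setup, and repeated `evaluate_hessian` calls pay for the Hessian setup once. A flag-gated variant would instead build the extra solvers in `__init__` when the flag is set.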
I have also completely hacked my way around the form replacement and constant Jacobian business, and those hacks are almost certainly completely wrong. This probably deserves a more careful eye once the overall approach is a bit more settled. Indeed, maybe the form replacement mechanism could be re-engineered somewhat.
With the changes here, the same box model takes 100s for the first Hessian, then ~65s thereafter. There's still a bunch of time being spent on assembly and form manipulation for the second-order adjoint, and I assume that only needs to happen once. My model also has a couple of `ProjectBlock`s that go through the slow path, but they are a tiny percentage of the overall runtime. It does mean it's a bit complicated to follow the logic through, depending on whether the `GenericSolveBlock` or `NonlinearVariationalSolveBlock` implementations of adj/tlm/hess are being used.