-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Homme(xx)/SL: Enhanced trajectory method. #6874
Conversation
Make this a cmake define triggered by 'SYCL_BUILD'. Move it outside of the threaded region for consistency with t_dis/enablef usage. Switch based on nstep==0. Use a local boolean to avoid repeated calls to t_enablef. (The define prevents any of this code from mattering in all cases except the intended application.)
Also add parameters to namelist definition file.
This comment will be updated with testing results.
The performance check compares master and branch on the case SMS_Ln300.ne30pg2_ne30pg2.F2010-SCREAMv1.scream-perf_test--scream-output-preset-1--scream-L128. This does not exercise the new trajectory method; rather, it's checking for any accidental performance regressions. I use the line
to look at representative timers. |
This needs much more extensive explanation on what's new and how to use the new SL (and advantages). |
Yes, I'm working on that. I'll post a link when I have a sufficiently useful draft. (To turn it on, set semi_lagrange_trajectory_nsubstep = N for N > 0.) |
re please list all new/modified tests explicitly with more explanation behind them. homme modified tests -- are they for your new functionality, or are they modified because moved prescribed winds call is nonbfb? |
oh sounds like there are also new unit tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires extensive documentation on confluence before being approved
I'll describe the tests on Confluence.
This refers to a pre-existing H/Hxx SL BFB test pair that I've switched to one of the new test flow fields. The answers change since the flow field is different. And then there are two new standalone-Homme tests that revealed the difference in the prescribed-wind call stack in the C++ vs F90 dycores. |
@oksanaguba I put a link to a Confluence page above. I'll add draft material today. |
Yes, quite a number of them. They are called from the compose_ut.cpp unit test driver (as usual) and are in sl_advection.F90 and ComposeTransportImplEnhancedTrajectoryTests.cpp. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My knowledge on the SL transport and the compose lib is very limited, so it was hard to do a good review. I would rely on folks from the compose proj for a better review.
That said, I think the code could use the injection of a fair amount of comments. The code is very "dense", which makes it hard to read. I know you are owning this code (and I'm glad you are), but for the sake of who may come next to maintain/modify the code (should you move to a different project), it would help a lot to have more inline guidance.
@@ -251,6 +259,40 @@ void deep_copy (FixedCapList<T, DTD>& d, const FixedCapList<T, DTS>& s) { | |||
#endif | |||
} | |||
|
|||
template <typename T> | |||
struct FixedCapListHostOnly { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this class and using a plain std::vector<T>
? The capacity feature is analogue to the vector "reserve" feature, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true, but I have a general FixedCapList for most use cases on device and host. I made this HostOnly version to avoid host-device warnings having to do with mpi::Request. I could use std::vector, but (1) I like having slmm_kernel_assert_high in the impl and (2) I can use the same user code instead of having to switch it to std::vector syntax.
GPTLstop("compose_calc_enhanced_trajectory"); | ||
} | ||
|
||
// Testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can put the testing stuff in a separate file? Not much so we can skip compilation outside of testing (it's small enough) but mostly for maintenance. Seeing a 2k+ lines file is always discouraging :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to keep unit tests collocated with the routines they are testing. However, that is just my own opinion. Where would you like me to move them? test_execs/thetal_kokkos_ut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I would at least like to keep the tests in the main directory. I really like being able to call a unit-test routine for a class anywhere I want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mods pushed.
} | ||
|
||
KOKKOS_FUNCTION void | ||
eta_interp_eta (const KernelVariables& kv, const int nlev, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few fcns in this file are without a comment. Would you mind adding a quick 2-liner at the top of each of them, explaining what they do (maybe throw in a formula, if that's easier)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mods pushed.
const auto wrk4 = Homme::subview(buf1d, kv.team_idx); | ||
const auto vwrk = Homme::subview(buf2a, kv.team_idx); | ||
// Reconstruct Lagrangian levels at t1 on arrival column: | ||
// eta_arr_int = I[eta_ref_mid([0,eta_dep_mid,1])](eta_ref_int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are super helpful in a complex, code-rich folder like compose. Here, I am not clear on the notation, and was wondering if you could make the comment a bit clearer. In particular:
- what does the notation
eta([0,eta_dep_mid,1])
mean? What's the inner vector meaning? - what is the notation
I[x](y)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add comments. Re: these specific questions:
I[y(x)](xi)
means an interpolant constructed fromy(x)
and evaluated atxi
.[0,eta_dep_mid,1]
is the concatenation of the eta departure midpoint values with 0 and 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mods pushed.
@@ -5,6 +5,76 @@ | |||
namespace homme { | |||
namespace sl { | |||
|
|||
template <int np_, typename MT> | |||
void run_relaxed_local (CDR<MT>& cdr, const Data& d, Real* q_min_r, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comment (inline or at the top of the fcn) regarding what's happening in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also for the run_global fcn. I am not familiar with the algo, which doesn't help, but the code is a bit dense, which makes it hard to read anyways. Some comments (even just a big picture, not necessarily fine-grained) may help those who one day may have to maintain this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove these CEDR additions. They correspond to Sect. 2.2.4 of https://gmd.copernicus.org/articles/15/6285/2022/gmd-15-6285-2022.html. I had these lying around and unwisely added them to this already lengthy PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mods pushed.
template <typename MT> | ||
void sl_h2d (TracerArrays<MT>& ta, bool transfer, Real* dep_points, Int ndim) { | ||
#if defined COMPOSE_PORT | ||
# if defined COMPOSE_HORIZ_OPENMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If COMPOSE_PORT is defined, then we're building for the kokkos dycore, right? In that case we should not have horiz openmp, right? I thought horiz openmp was used in the f90 dycore only...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in a few other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Kokkos code path is supported in the F90 dycore; it's just not used in practice because of array ordering (requiring transposes and extra memory) and needing to support HORIZ_OPENMP on CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the cmake logic seems to be such that COMPOSE_PORT is only true for the compose_f90 lib, which is the only one linked against the f90 targets. Maybe I'm misreading the cmake logic though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For the composec++ lib, not the composef90 one, but I get your point.) Yes, that's right, because in practice we don't want to use COMPOSE_PORT for the F90 dycore. That is, nobody would purposely build a dycore that way.
But it does actually work. For such a niche thing, I haven't wanted to waste testing resources to keep it working, but I use the capability when doing dev work on occasion. In short, I do want to retain lines of the sort you're highlighting even though our automated builds never activate them. Note the lines you highlighted correspond to a bridge routine to do the copies/transposes. (Detail: The bridge routine is also used in compose_ut unit tests for F90-vs-C++ comparisons, but of course in this case HORIZ_OPENMP is, as you pointed out, always false.)
I'll add this in a different PR. However, keep the generalized indexing changes and extra argument in the Homme interface because they are useful.
@bartgol I've responded to all your comments. In most cases I've pushed associated code mods. In some cases I've explained why something is the way it is. |
I've added a Confluence page and more comments to the code. See in particular the comments starting here. I've responded to all your points above. Is there anything more you would like before approving? I'd like to start merging this PR on Monday or Tuesday. Thanks. |
This PR brings in a new feature that (1) increases accuracy of semi-Lagrangian tracer transport's trajectory calculations and (2) permits flexible trade-off between the trajectory accuracy and speed. This PR has the following parts: - F90 dycore support, with unit tests (principally in sl_advection.F90); - C++ dycore support, with unit tests (principally in ComposeTransportImplEnhancedTrajectory.cpp); - unit test driver updates: compose_ut.cpp; - two new standalone-Homme tests, one each for the F90 and C++ dycores; - new standalone-Homme transport test module for convergence testing: fully 3D, space-and-time-dependent surface pressure (dcmip2012_test1_conv_mod.F90); - CIME-based ERS tests, one each, for EAM and EAMxx; - cleanup of a timer issue orthogonal to this PR: see commit 'Hommexx: Rework skipping timers in first step.'; - updates to Homme machine files for Perlmutter; - fix C++ dycore's handling of prescribed winds: had to move down in the call stack to match the F90 dycore. e3sm_developer and e3sm_atm_integration pass on Chrysalis. EAMxx test suites pass on Perlmutter and Frontier. There are no performance effects when the stealth feature is off based on tests on Chrysalis (v3.LR 11-year control run), Frontier, and Perlmutter. [non-BFB] due to two new CIME tests, two new standalone-Homme tests, and two modified standalone-Homme tests; otherwise BFB.
This PR brings in a new feature that (1) increases accuracy of semi-Lagrangian tracer transport's trajectory calculations and (2) permits flexible trade-off between the trajectory accuracy and speed. This PR has the following parts:
[non-BFB] due to two new CIME tests, two new standalone-Homme tests, and two modified standalone-Homme tests; otherwise BFB