Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Homme(xx)/SL: Enhanced trajectory method. #6874

Merged
merged 10 commits into from
Jan 14, 2025

Conversation

ambrad
Copy link
Member

@ambrad ambrad commented Jan 7, 2025

This PR brings in a new feature that (1) increases accuracy of semi-Lagrangian tracer transport's trajectory calculations and (2) permits flexible trade-off between the trajectory accuracy and speed. This PR has the following parts:

  • F90 dycore support, with unit tests (principally in sl_advection.F90)
  • C++ dycore support, with unit tests (principally in ComposeTransportImplEnhancedTrajectory.cpp)
  • unit test driver updates: compose_ut.cpp
  • two new standalone-Homme tests, one each for the F90 and C++ dycores
  • new standalone-Homme transport test module for convergence testing: fully 3D, space-and-time-dependent surface pressure (dcmip2012_test1_conv_mod.F90)
  • CIME-based ERS tests, one each, for EAM and EAMxx
  • cleanup of a timer issue orthogonal to this PR: see commit "Hommexx: Rework skipping timers in first step."
  • updates to Homme machine files for Perlmutter
  • fix C++ dycore's handling of prescribed winds: had to move down in the call stack to match the F90 dycore

[non-BFB] due to two new CIME tests, two new standalone-Homme tests, and two modified standalone-Homme tests; otherwise BFB

ambrad and others added 5 commits January 6, 2025 17:54
Make this a cmake define triggered by 'SYCL_BUILD'. Move it outside of the
threaded region for consistency with t_dis/enablef usage. Switch based on
nstep==0. Use a local boolean to avoid repeated calls to t_enablef. (The define
prevents any of this code from mattering in all cases except the intended
application.)
Also add parameters to namelist definition file.
@ambrad ambrad self-assigned this Jan 7, 2025
@ambrad ambrad added Stealth PR has feature which, if turned on, could change climate. fka FCC HOMME HOMME standalone issues with the standalone HOMME code that dont impact E3SM EAMxx PRs focused on capabilities for EAMxx non-BFB PR makes roundoff changes to answers. labels Jan 7, 2025
@ambrad
Copy link
Member Author

ambrad commented Jan 7, 2025

This comment will be updated with testing results.

HOMMEBFB_P24.f19_g16_rx1.A.chrysalis_intel.C.20250106_175602_oau21c/TestStatus.log:
The following tests FAILED:
    1 - verifyBaselineResults (Failed)
   41 - thetah-sl-test11conv-r1t2-cdr20 (Failed)
   42 - thetah-sl-test11conv-r0t1-cdr30-rrm (Failed)
   44 - thetah-sl-testconv-3e (Failed)
   47 - thetah-sl-test11conv-r0t1-cdr30-rrm-kokkos (Failed)
   48 - thetah-sl-testconv-3e-kokkos (Failed)
HOMME_P24.f19_g16_rx1.A.chrysalis_intel.C.20250106_175602_oau21c/TestStatus.log:
The following tests FAILED:
    1 - verifyBaselineResults (Failed)
   30 - thetah-sl-test11conv-r1t2-cdr20 (Failed)
   31 - thetah-sl-test11conv-r0t1-cdr30-rrm (Failed)
   33 - thetah-sl-testconv-3e (Failed)
  • The CI checks that fail all show just this one known test failure: mam4_aero_microphys_standalone_baseline_cmp
  • e3sm_atm_integration on Chrysalis has all PASS except the new test, which doesn't have a baseline
  • e3sm_developer on Chrysalis passes
  • Perlmutter e3sm_scream_v1: e3sm_scream_v1 passes against baselines except for (of course) the new sl_nsubstep2 test
  • Perlmutter performance check: Good. (See below for description.)
  • Frontier e3sm_scream_v1_medres: Passes. Note: we don't have baselines on Frontier, and _D builds outside of e3sm_scream_v1_medres but inside e3sm_scream_v1 fail to build due to an ICE (known to us).
  • Frontier performance check: Good.

The performance check compares master and branch on the case SMS_Ln300.ne30pg2_ne30pg2.F2010-SCREAMv1.scream-perf_test--scream-output-preset-1--scream-L128. This does not exercise the new trajectory method; rather, it's checking for any accidental performance regressions. I use the line

grep "RUN_LOOP\"\|compose_transport\|trajectory\|cedr\|vertical\|dirk\|caar" timing/model_timing_stats

to look at representative timers.

@oksanaguba
Copy link
Contributor

This needs much more extensive explanation on what's new and how to use the new SL (and advantages).
Could you please create a confluence page with these details? i assume you have a lot of plots, too, add those there as well.

@ambrad
Copy link
Member Author

ambrad commented Jan 8, 2025

Could you please create a confluence page with these details? i assume you have a lot of plots, too, add those there as well.

Yes, I'm working on that. I'll post a link when I have a sufficiently useful draft.

(To turn it on, set semi_lagrange_trajectory_nsubstep = N for N > 0.)

@oksanaguba
Copy link
Contributor

re fix C++ dycore's handling of prescribed winds: had to move down in the call stack to match the F90 dycore -- was it a bug and/or did you move the call for new functionality?

please list all new/modified tests explicitly with more explanation behind them. homme modified tests -- are they for your new functionality, or are they modified because moved prescribed winds call is nonbfb?

@oksanaguba
Copy link
Contributor

oh sounds like there are also new unit tests?

Copy link
Contributor

@oksanaguba oksanaguba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires extensive documentation on confluence before being approved

@ambrad
Copy link
Member Author

ambrad commented Jan 8, 2025

I'll describe the tests on Confluence.

homme modified tests -- are they for your new functionality, or are they modified because moved prescribed winds call is nonbfb?

This refers to a pre-existing H/Hxx SL BFB test pair that I've switched to one of the new test flow fields. The answers change since the flow field is different. And then there are two new standalone-Homme tests that revealed the difference in the prescribed-wind call stack in the C++ vs F90 dycores.

@ambrad
Copy link
Member Author

ambrad commented Jan 8, 2025

@oksanaguba I put a link to a Confluence page above. I'll add draft material today.

@ambrad
Copy link
Member Author

ambrad commented Jan 9, 2025

oh sounds like there are also new unit tests?

Yes, quite a number of them. They are called from the compose_ut.cpp unit test driver (as usual) and are in sl_advection.F90 and ComposeTransportImplEnhancedTrajectoryTests.cpp.

@ambrad
Copy link
Member Author

ambrad commented Jan 9, 2025

please list all new/modified tests explicitly with more explanation behind them.

   41 - thetah-sl-test11conv-r1t2-cdr20     old test but with a slightly modified flow field
   42 - thetah-sl-test11conv-r0t1-cdr30-rrm     old test but with a slightly modified flow field
   47 - thetah-sl-test11conv-r0t1-cdr30-rrm-kokkos     Hommexx equivalent of above
   44 - thetah-sl-testconv-3e     new test with new flow field; uses the new trajectory method
   48 - thetah-sl-testconv-3e-kokkos     Hommexx equivalent of above

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knowledge on the SL transport and the compose lib is very limited, so it was hard to do a good review. I would rely on folks from the compose proj for a better review.

That said, I think the code could use the injection of a fair amount of comments. The code is very "dense", which makes it hard to read. I know you are owning this code (and I'm glad you are), but for the sake of who may come next to maintain/modify the code (should you move to a different project), it would help a lot to have more inline guidance.

@@ -251,6 +259,40 @@ void deep_copy (FixedCapList<T, DTD>& d, const FixedCapList<T, DTS>& s) {
#endif
}

template <typename T>
struct FixedCapListHostOnly {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this class and using a plain std::vector<T>? The capacity feature is analogue to the vector "reserve" feature, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, but I have a general FixedCapList for most use cases on device and host. I made this HostOnly version to avoid host-device warnings having to do with mpi::Request. I could use std::vector, but (1) I like having slmm_kernel_assert_high in the impl and (2) I can use the same user code instead of having to switch it to std::vector syntax.

GPTLstop("compose_calc_enhanced_trajectory");
}

// Testing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can put the testing stuff in a separate file? Not much so we can skip compilation outside of testing (it's small enough) but mostly for maintenance. Seeing a 2k+ lines file is always discouraging :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to keep unit tests collocated with the routines they are testing. However, that is just my own opinion. Where would you like me to move them? test_execs/thetal_kokkos_ut?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I would at least like to keep the tests in the main directory. I really like being able to call a unit-test routine for a class anywhere I want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mods pushed.

}

KOKKOS_FUNCTION void
eta_interp_eta (const KernelVariables& kv, const int nlev,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few fcns in this file are without a comment. Would you mind adding a quick 2-liner at the top of each of them, explaining what they do (maybe throw in a formula, if that's easier)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mods pushed.

const auto wrk4 = Homme::subview(buf1d, kv.team_idx);
const auto vwrk = Homme::subview(buf2a, kv.team_idx);
// Reconstruct Lagrangian levels at t1 on arrival column:
// eta_arr_int = I[eta_ref_mid([0,eta_dep_mid,1])](eta_ref_int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are super helpful in a complex, code-rich folder like compose. Here, I am not clear on the notation, and was wondering if you could make the comment a bit clearer. In particular:

  • what does the notation eta([0,eta_dep_mid,1]) mean? What's the inner vector meaning?
  • what is the notation I[x](y)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add comments. Re: these specific questions:

  • I[y(x)](xi) means an interpolant constructed from y(x) and evaluated at xi.
  • [0,eta_dep_mid,1] is the concatenation of the eta departure midpoint values with 0 and 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mods pushed.

@@ -5,6 +5,76 @@
namespace homme {
namespace sl {

template <int np_, typename MT>
void run_relaxed_local (CDR<MT>& cdr, const Data& d, Real* q_min_r,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comment (inline or at the top of the fcn) regarding what's happening in here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also for the run_global fcn. I am not familiar with the algo, which doesn't help, but the code is a bit dense, which makes it hard to read anyways. Some comments (even just a big picture, not necessarily fine-grained) may help those who one day may have to maintain this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove these CEDR additions. They correspond to Sect. 2.2.4 of https://gmd.copernicus.org/articles/15/6285/2022/gmd-15-6285-2022.html. I had these lying around and unwisely added them to this already lengthy PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mods pushed.

template <typename MT>
void sl_h2d (TracerArrays<MT>& ta, bool transfer, Real* dep_points, Int ndim) {
#if defined COMPOSE_PORT
# if defined COMPOSE_HORIZ_OPENMP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If COMPOSE_PORT is defined, then we're building for the kokkos dycore, right? In that case we should not have horiz openmp, right? I thought horiz openmp was used in the f90 dycore only...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in a few other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kokkos code path is supported in the F90 dycore; it's just not used in practice because of array ordering (requiring transposes and extra memory) and needing to support HORIZ_OPENMP on CPUs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the cmake logic seems to be such that COMPOSE_PORT is only true for the compose_f90 lib, which is the only one linked against the f90 targets. Maybe I'm misreading the cmake logic though...

Copy link
Member Author

@ambrad ambrad Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For the composec++ lib, not the composef90 one, but I get your point.) Yes, that's right, because in practice we don't want to use COMPOSE_PORT for the F90 dycore. That is, nobody would purposely build a dycore that way.

But it does actually work. For such a niche thing, I haven't wanted to waste testing resources to keep it working, but I use the capability when doing dev work on occasion. In short, I do want to retain lines of the sort you're highlighting even though our automated builds never activate them. Note the lines you highlighted correspond to a bridge routine to do the copies/transposes. (Detail: The bridge routine is also used in compose_ut unit tests for F90-vs-C++ comparisons, but of course in this case HORIZ_OPENMP is, as you pointed out, always false.)

ambrad added 3 commits January 9, 2025 14:02
I'll add this in a different PR. However, keep the generalized indexing changes
and extra argument in the Homme interface because they are useful.
@ambrad
Copy link
Member Author

ambrad commented Jan 9, 2025

@bartgol I've responded to all your comments. In most cases I've pushed associated code mods. In some cases I've explained why something is the way it is.

@ambrad
Copy link
Member Author

ambrad commented Jan 10, 2025

@oksanaguba

requires extensive documentation on confluence before being approved

I've added a Confluence page and more comments to the code. See in particular the comments starting here.

I've responded to all your points above.

Is there anything more you would like before approving? I'd like to start merging this PR on Monday or Tuesday. Thanks.

@ambrad ambrad requested a review from oksanaguba January 13, 2025 19:20
ambrad added a commit that referenced this pull request Jan 13, 2025
This PR brings in a new feature that (1) increases accuracy of semi-Lagrangian
tracer transport's trajectory calculations and (2) permits flexible trade-off
between the trajectory accuracy and speed. This PR has the following parts:

- F90 dycore support, with unit tests (principally in sl_advection.F90);
- C++ dycore support, with unit tests (principally in
  ComposeTransportImplEnhancedTrajectory.cpp);
- unit test driver updates: compose_ut.cpp;
- two new standalone-Homme tests, one each for the F90 and C++ dycores;
- new standalone-Homme transport test module for convergence testing: fully 3D,
  space-and-time-dependent surface pressure (dcmip2012_test1_conv_mod.F90);
- CIME-based ERS tests, one each, for EAM and EAMxx;
- cleanup of a timer issue orthogonal to this PR: see commit 'Hommexx: Rework
  skipping timers in first step.';
- updates to Homme machine files for Perlmutter;
- fix C++ dycore's handling of prescribed winds: had to move down in the call
  stack to match the F90 dycore.

e3sm_developer and e3sm_atm_integration pass on Chrysalis. EAMxx test suites
pass on Perlmutter and Frontier. There are no performance effects when the
stealth feature is off based on tests on Chrysalis (v3.LR 11-year control run),
Frontier, and Perlmutter.

[non-BFB] due to two new CIME tests, two new standalone-Homme tests, and two
modified standalone-Homme tests; otherwise BFB.
@ambrad ambrad merged commit 3dfcac8 into E3SM-Project:master Jan 14, 2025
19 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx PRs focused on capabilities for EAMxx HOMME standalone issues with the standalone HOMME code that dont impact E3SM HOMME non-BFB PR makes roundoff changes to answers. Stealth PR has feature which, if turned on, could change climate. fka FCC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants