Addition of TRiSK++ to ocean dynamical core #9

Open
scalandr wants to merge 23 commits into E3SM-Ocean-Discussion:master from scalandr:scalandr/ocean/triskCV

@scalandr
Collaborator

@scalandr scalandr commented Nov 2, 2021

Changed how the kinetic energy is computed. kineticEnergyCell is now mapped from kineticEnergyEdge, and kineticEnergyEdge is constructed using new weightsOnEdge computed with a least-squares approach. At init, a new subroutine is called to compute these weights, weightsOnEdge_lsqr.

@mark-petersen mark-petersen force-pushed the scalandr/ocean/triskCV branch from 7c37b3c to 39d0cce Compare November 9, 2021 19:33
@scalandr scalandr force-pushed the scalandr/ocean/triskCV branch from 4bf2c9e to 40bdda9 Compare November 23, 2021 19:07
@mark-petersen mark-petersen force-pushed the scalandr/ocean/triskCV branch 2 times, most recently from f613fa4 to 722fe05 Compare December 7, 2021 19:59
@scalandr scalandr force-pushed the scalandr/ocean/triskCV branch from 722fe05 to 42c4f33 Compare March 8, 2022 17:18
@dengwirda
Collaborator

@scalandr thanks for looking at these changes to tangentialVelocity.
As a follow-up, I think we can rearrange the computations here a little to simplify them and improve efficiency. In the case of TRiSK++, I suspect we can just use the lsqr coefficients to compute tangentialVelocity first, and then use tangentialVelocity within the kinetic energy calculation. That essentially just changes the order of operations and removes the extra u_perp variable altogether.
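
The ordering suggested here can be illustrated with a minimal Python sketch. It is not the PR's Fortran and not the actual TRiSK++ formulation: the stencil, the function names `lsq_tangential_weights` and `kinetic_energy_edge`, and the choice to fit a locally constant 2D velocity are all assumptions made for illustration. The sketch derives least-squares weights that turn normal velocities on a set of edges into a tangential velocity on the target edge, then uses that tangential velocity directly in an edge kinetic energy, with no intermediate u_perp-style variable.

```python
import math


def lsq_tangential_weights(normals, tangent):
    """Return weights w so that sum(w[i] * u_n[i]) estimates the tangential velocity.

    normals: unit normals (nx, ny) of the edges in a hypothetical stencil.
    tangent: unit tangent (tx, ty) of the target edge.
    """
    # Normal equations A = N^T N for fitting a constant 2D velocity U
    # to the normal components u_n[i] = n_i . U.
    a11 = sum(nx * nx for nx, ny in normals)
    a12 = sum(nx * ny for nx, ny in normals)
    a22 = sum(ny * ny for nx, ny in normals)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-14:
        raise ValueError("edge normals do not span the plane")
    # b = A^{-1} t, so that t . U = sum_i (b . n_i) * u_n[i].
    tx, ty = tangent
    bx = (a22 * tx - a12 * ty) / det
    by = (a11 * ty - a12 * tx) / det
    return [bx * nx + by * ny for nx, ny in normals]


def kinetic_energy_edge(u_normal, u_tangential):
    """Kinetic energy at an edge from its normal and tangential velocity."""
    return 0.5 * (u_normal ** 2 + u_tangential ** 2)


# Assumed stencil: six edges with normals 60 degrees apart (a regular hexagon).
normals = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
           for a in range(0, 360, 60)]
# Weights depend only on geometry, so they would be computed once at init.
weights = lsq_tangential_weights(normals, tangent=(0.0, 1.0))

# Uniform test flow U = (1, 2), sampled as normal components on each edge.
u_n = [nx * 1.0 + ny * 2.0 for nx, ny in normals]
# Step 1: tangential velocity first. Step 2: reuse it in the kinetic energy.
u_t = sum(w * u for w, u in zip(weights, u_n))
ke = kinetic_energy_edge(u_n[0], u_t)
print(u_t, ke)
```

For the uniform test flow the fit is exact, so the tangential velocity on the first edge comes out as 2 and the edge kinetic energy as 2.5, up to round-off.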

@scalandr scalandr force-pushed the scalandr/ocean/triskCV branch from bf98f3d to 7b8a983 Compare June 5, 2022 02:32
mark-petersen pushed a commit that referenced this pull request May 31, 2023
cce/15.0.0 with GPU MPI buffers can crash in a system lib like this:

#4  0x00007fffe159e35b in (anonymous namespace)::do_free_with_callback(void*, void (*)(void*)) [clone .constprop.0] () from /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libtcmalloc_minimal.so.1
#5  0x00007fffe15a8f16 in tc_free () from /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libtcmalloc_minimal.so.1
#6  0x00007fffe99c2bcd in _dlerror_run () from /lib64/libdl.so.2
#7  0x00007fffe99c2481 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#8  0x00007fffea7bce42 in _ad_cray_lock_init () from /opt/cray/pe/lib64/libmpi_cray.so.12
#9  0x00007fffed7eb37a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#10 0x00007fffed7eb496 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#11 0x00007fffed7dc58a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#12 0x0000000000000001 in ?? ()
#13 0x00007fffffff42e7 in ?? ()
#14 0x0000000000000000 in ?? ()

Work around this by using cce/14.0.3.
xylar pushed a commit that referenced this pull request Nov 8, 2023
xylar pushed a commit that referenced this pull request Nov 14, 2025
…t#7866)

This is changing code that is widely used and has been unchanged for a
very long time, so I'm adding a lot of reviewers. Feel free to add
more.

I stumbled across this code when trying to debug a weird eamxx failure:

I observed the following symptoms:

If you look at components/elm/src/main/reweightMod.F90, you'll see the line SHR_ASSERT(bounds%level == BOUNDS_LEVEL_CLUMP, errMsg(__FILE__, __LINE__)) which causes this error:

1: free(): invalid pointer
...
0: #8  0x4c7e3ab in __shr_log_mod_MOD_shr_log_errmsg
0:      at /pscratch/sd/a/acmetest/E3SM/share/util/shr_log_mod.F90:78
0: #9  0x914149 in __reweightmod_MOD_reweight_wrapup
0:      at /pscratch/sd/a/acmetest/E3SM/components/elm/src/main/reweightMod.F90:48
0: #10  0x17bda7d in __dynsubgriddrivermod_MOD_dynsubgrid_wrapup_weight_changes
0:      at /pscratch/sd/a/acmetest/E3SM/components/elm/src/dyn_subgrid/dynSubgridDriverMod.F90:403
0: #11  0x17bf256 in __dynsubgriddrivermod_MOD_dynsubgrid_init._omp_fn.0
0:      at /pscratch/sd/a/acmetest/E3SM/components/elm/src/dyn_subgrid/dynSubgridDriverMod.F90:150

I thought this was because the SHR_ASSERT was failing, but that's not
the case (both bounds%level and BOUNDS_LEVEL_CLUMP are always 2). If I
just comment this SHR_ASSERT out entirely, the test PASSES!

The lowest line of code that triggers the free(): invalid pointer error is:
shr_log_errMsg = 'ERROR in '//trim(file)//' at line '//toString(line)

What I think is happening is that there is some memory corruption that
is causing the freeing of the allocation done by toString to fail. So,
this PR kind of sweeps that issue under the rug, but I do think it's
better not to do dynamic allocations if you don't have to. The test
PASSes with the changes in this PR.

[BFB]

3 participants