While investigating the divide-by-zero error in mom5 with oneAPI, I have uncovered a different error in CICE4: "ice_IOUnitsGet: No free units", which originates from this line of CICE code. I get the same error with oneAPI 2025, so presumably this is an underlying code issue rather than something specific to one compiler.
The exe is here. It was built from the latest release of ESM1.6 (access-esm1p6/dev_2025.04.000) with the Intel classic compiler 2021.10.0, using these Fortran flags for mom5, cice4 and um7: 'fflags="-fprotect-parens -assume nan_compares -assume ieee_compares -fpe0 -traceback -check all -init=snan -init=array -init=huge"'.
The config uses 12 CICE cpus and I get 12 instances of the same error in the pbs log, followed by a SIGTERM and a traceback from UM7 routines (which I expect are a red herring: when CICE terminates, SIGTERM is sent to the other running processes, and that causes the traceback). The run directory is here: /home/593/ms2335/perf-opt-classic-esm1.6/sapphirerapids/access-esm1.6-PI-sapphirerapids-416-cores-classic-deployed-exe-to-test-sanitizer-and-div-by-zero-esm1p6-pr74
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.28.s 000014E70A542D10 Unknown Unknown Unknown
um_hg3.exe 000000000041061D Unknown Unknown Unknown
um_hg3.exe 000000000100757C read_multi_ 996 read_multi.f90
um_hg3.exe 0000000000E4D348 um_readdump_ 1605 um_readdump.f90
um_hg3.exe 0000000000C7A040 initdump_ 6686 initdump.f90
um_hg3.exe 00000000006245D8 initial_ 6388 initial.f90
um_hg3.exe 00000000004434A1 Unknown Unknown Unknown
um_hg3.exe 0000000000417376 um_shell_ 3930 um_shell.f90
um_hg3.exe 00000000004104C8 MAIN__ 40 flumeMain.f90
um_hg3.exe 000000000041040D Unknown Unknown Unknown
libc-2.28.so 000014E709F907E5 __libc_start_main Unknown Unknown
um_hg3.exe 000000000041032E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.28.s 0000151689D39D10 Unknown Unknown Unknown
libucp.so.0.0.0 0000151685AD77D8 ucp_worker_progre Unknown Unknown
libmpi.so.40.30.5 000015168A374BFF mca_pml_ucx_recv Unknown Unknown
...
The released version itself (i.e., without any changes to the compiler or compiler flags) runs fine.
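One possible explanation, which I have not confirmed: if CICE hands out Fortran unit numbers by scanning a table of "in use" flags, and that table is never explicitly initialised, then the `-init=...` flags could leave every entry looking occupied, so the very first request fails. Below is a minimal Python sketch of that failure mode. The unit range, the function names and the table layout are all hypothetical and only illustrate the idea; they are not taken from the CICE source, and I have not checked whether the `-init` options actually affect logical variables.

```python
# Hypothetical unit range, chosen only for illustration.
MIN_UNIT, MAX_UNIT = 11, 99


def make_table(fill):
    """Build an 'in use' table indexed by unit number, filled with one value."""
    return [fill] * (MAX_UNIT + 1)


def get_unit(in_use):
    """Return the first free unit number and mark it as taken.

    Raises RuntimeError when every unit in the range is flagged as in use.
    """
    for unit in range(MIN_UNIT, MAX_UNIT + 1):
        if not in_use[unit]:
            in_use[unit] = True
            return unit
    raise RuntimeError("No free units")


# Table explicitly initialised to "free": the first request succeeds.
clean = make_table(False)
first = get_unit(clean)

# Table whose entries all read as "in use" (standing in for uninitialised
# memory filled with a non-zero pattern): the first request already fails.
poisoned = make_table(True)
try:
    get_unit(poisoned)
    failed = False
except RuntimeError:
    failed = True
```

If this is what is happening, it would also explain why the released build runs fine: without the `-init` flags the table may simply happen to start out zeroed.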
Pinging @anton-seaice @chrisb13