Error in CICE4 when compiling with error-checking flags #19

@manodeep

While investigating the divide-by-zero error in mom5 with oneAPI, I uncovered a different error in CICE4: ice_IOUnitsGet: No free units, which originates from this line of CICE code. I get the same error with oneAPI 2025, so presumably this is an underlying code issue.

The exe is here. It was built from the latest release of ESM1.6 (access-esm1p6/dev_2025.04.000) with Intel classic compiler 2021.10.0, using these Fortran flags for mom5, cice4 and um7: 'fflags="-fprotect-parens -assume nan_compares -assume ieee_compares -fpe0 -traceback -check all -init=snan -init=array -init=huge"'.

The config uses 12 CICE cpus and I get 12 instances of the same error in the pbs log, followed by a SIGTERM and a traceback from UM7 routines. I expect the traceback is a red herring: a terminating CICE causes SIGTERM to be sent to the other running processes, and that causes the traceback. The run directory is here: /home/593/ms2335/perf-opt-classic-esm1.6/sapphirerapids/access-esm1.6-PI-sapphirerapids-416-cores-classic-deployed-exe-to-test-sanitizer-and-div-by-zero-esm1p6-pr74

ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
libpthread-2.28.s  000014E70A542D10  Unknown               Unknown  Unknown
um_hg3.exe         000000000041061D  Unknown               Unknown  Unknown
um_hg3.exe         000000000100757C  read_multi_               996  read_multi.f90
um_hg3.exe         0000000000E4D348  um_readdump_             1605  um_readdump.f90
um_hg3.exe         0000000000C7A040  initdump_                6686  initdump.f90
um_hg3.exe         00000000006245D8  initial_                 6388  initial.f90
um_hg3.exe         00000000004434A1  Unknown               Unknown  Unknown
um_hg3.exe         0000000000417376  um_shell_                3930  um_shell.f90
um_hg3.exe         00000000004104C8  MAIN__                     40  flumeMain.f90
um_hg3.exe         000000000041040D  Unknown               Unknown  Unknown
libc-2.28.so       000014E709F907E5  __libc_start_main     Unknown  Unknown
um_hg3.exe         000000000041032E  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
libpthread-2.28.s  0000151689D39D10  Unknown               Unknown  Unknown
libucp.so.0.0.0    0000151685AD77D8  ucp_worker_progre     Unknown  Unknown
libmpi.so.40.30.5  000015168A374BFF  mca_pml_ucx_recv      Unknown  Unknown
...
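
For anyone reproducing this, a quick way to check that every CICE rank hit the error (rather than only some of them) is to count the message in the pbs log. This is only a minimal sketch: the function name `count_no_free_units` and the log path in the usage line are hypothetical and not part of the run directory above.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of the CICE error in a log file.
# Usage: count_no_free_units LOGFILE
count_no_free_units() {
    # -c prints the number of matching lines; -F matches the string literally.
    grep -cF "ice_IOUnitsGet: No free units" "$1"
}
```

Calling `count_no_free_units path/to/pbs.log` should print 12 for this config if all 12 CICE cpus failed the same way.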

The released version itself (i.e., without any compiler or compiler-flag modifications) runs fine.

Pinging @anton-seaice @chrisb13
