segmentation fault during shutdown, sedgem, can't generate output for restart #264
Description
I am using release v0.9.33. The simulation runs to completion, but shutdown then fails with a segmentation fault during the SEDGEM module shutdown. Is there any way to recover without having to rerun the experiment? This is a spinup simulation, run for 1e6 years without acceleration. I would like to use its output as the restart for another spinup simulation, but attempting to do so produces a similar segmentation fault, reported at the same line number in genie.job, after only a few years of simulation/saving. The error message is below.
Initialising SEDGEM module shutdown ...
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f8b700218c2 in ???
#1 0x7f8b70020a55 in ???
#2 0x7f8b6fd6204f in ???
at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3 0x563636d2b3db in ???
#4 0x563636d05838 in ???
#5 0x56363696a9ca in ???
#6 0x563636973f5e in ???
#7 0x56363695291e in ???
#8 0x7f8b6fd4d249 in __libc_start_call_main
at ../sysdeps/nptl/libc_start_call_main.h:58
#9 0x7f8b6fd4d304 in __libc_start_main_impl
at ../csu/libc-start.c:360
#10 0x563636952940 in ???
#11 0xffffffffffffffff in ???
./genie.job: line 357: 1263740 Segmentation fault ./genie.exe
real 26121m14.894s
user 26117m14.995s
sys 1m44.597s
cp: cannot stat 'fort.2': No such file or directory
ERROR: !!!!!!!!!! ERROR PROCESSING !!!!!!!!!!
Thanks in advance for suggestions on how to proceed.
As advised, I have put the base-config and user-config files I was using, the entire output of the 1 Myr experiment, and the restart it started from, here:
https://umd.box.com/s/qhp196dotupisnd8ufnbkxvjfpvmm7qj
Run command was:
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4 1000000 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN3c
I can also copy the user-config and run the copy with the failed experiment as its restart; the same error then appears within the first few years, e.g.:
cp -rp exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN5
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN5 100 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4
Let me know if you can access the files OK.
Thank you!