cesm2.1.3 running error: forrtl: severe (174): SIGSEGV, segmentation fault occurred

Lei

Lei
New Member
Hi, I am attempting to run a CESM2.1.3 case (compset BSSP126), and I encountered an error about half an hour after submitting the case. The CESM log reads:

2 IMOD, NAPROC, NBLKRS, NSPEC, RSBLKS= 1 72 9
600 5
600 5
1 IMOD, NAPROC, NBLKRS, NSPEC, RSBLKS= 1 72 0
600 0
2 IMOD, NAPROC, NBLKRS, NSPEC, RSBLKS= 1 72 9
600 5
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
calcsize j,iq,jac, lsfrm,lstoo 1 1 1 26 21
calcsize j,iq,jac, lsfrm,lstoo 1 1 2 26 21
calcsize j,iq,jac, lsfrm,lstoo 1 2 1 22 15
calcsize j,iq,jac, lsfrm,lstoo 1 2 2 22 15
calcsize j,iq,jac, lsfrm,lstoo 1 3 1 24 17
calcsize j,iq,jac, lsfrm,lstoo 1 3 2 24 17
calcsize j,iq,jac, lsfrm,lstoo 1 4 1 25 20
calcsize j,iq,jac, lsfrm,lstoo 1 4 2 25 20
calcsize j,iq,jac, lsfrm,lstoo 1 5 1 23 19
calcsize j,iq,jac, lsfrm,lstoo 1 5 2 23 19
calcsize j,iq,jac, lsfrm,lstoo 2 1 1 21 26
calcsize j,iq,jac, lsfrm,lstoo 2 1 2 21 26
calcsize j,iq,jac, lsfrm,lstoo 2 2 1 15 22
calcsize j,iq,jac, lsfrm,lstoo 2 2 2 15 22
calcsize j,iq,jac, lsfrm,lstoo 2 3 1 17 24
calcsize j,iq,jac, lsfrm,lstoo 2 3 2 17 24
calcsize j,iq,jac, lsfrm,lstoo 2 4 1 20 25
calcsize j,iq,jac, lsfrm,lstoo 2 4 2 20 25
calcsize j,iq,jac, lsfrm,lstoo 2 5 1 19 23
calcsize j,iq,jac, lsfrm,lstoo 2 5 2 19 23
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000003AA0CF3 for__signal_handl Unknown Unknown
libpthread-2.17.s 00002AB1F15C96D0 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB1EFE50150 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB1EF688877 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB1EFE8A4DD Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB1EFE8D6F7 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB1EFE9391F MPI_Startall Unknown Unknown
libmpifort.so.12. 00002AB1F0EF706B mpi_startall Unknown Unknown
cesm.exe 0000000002F4D295 Unknown Unknown Unknown
cesm.exe 0000000002ED6322 Unknown Unknown Unknown
cesm.exe 0000000000440127 Unknown Unknown Unknown
cesm.exe 000000000041EAA6 Unknown Unknown Unknown
cesm.exe 000000000043FCB2 Unknown Unknown Unknown
cesm.exe 000000000041B9A2 Unknown Unknown Unknown
libc-2.17.so 00002AB1F17F8445 __libc_start_main Unknown Unknown
cesm.exe 000000000041B8A9 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000003AA0CF3 for__signal_handl Unknown Unknown
libpthread-2.17.s 00002AB0486B86D0 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB046F3F150 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB046777877 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB046F794DD Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB046F7C6F7 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AB046F8291F MPI_Startall Unknown Unknown
libmpifort.so.12. 00002AB047FE606B mpi_startall Unknown Unknown
cesm.exe 0000000002F4D295 Unknown Unknown Unknown
cesm.exe 0000000002ED6322 Unknown Unknown Unknown
cesm.exe 0000000000440127 Unknown Unknown Unknown
cesm.exe 000000000041EAA6 Unknown Unknown Unknown
cesm.exe 000000000043FCB2 Unknown Unknown Unknown
cesm.exe 000000000041B9A2 Unknown Unknown Unknown
libc-2.17.so 00002AB0488E7445 __libc_start_main Unknown Unknown
cesm.exe 000000000041B8A9 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000003AA0CF3 for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B2DAD08A6D0 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B2DAB911150 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B2DAB149877 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B2DAB94B4DD Unknown Unknown Unknown
libmpi.so.12.0.0 00002B2DAB94E6F7 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B2DAB95491F MPI_Startall Unknown Unknown
libmpifort.so.12. 00002B2DAC9B806B mpi_startall Unknown Unknown
cesm.exe 0000000002F4D295 Unknown Unknown Unknown
cesm.exe 0000000002ED6322 Unknown Unknown Unknown
cesm.exe 0000000000440127 Unknown Unknown Unknown
cesm.exe 000000000041EAA6 Unknown Unknown Unknown
cesm.exe 000000000043FCB2 Unknown Unknown Unknown
cesm.exe 000000000041B9A2 Unknown Unknown Unknown
libc-2.17.so 00002B2DAD2B9445 __libc_start_main Unknown Unknown
cesm.exe 000000000041B8A9 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000003AA0CF3 for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B9D8D3C66D0 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B9D8BC4D150 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B9D8B485877 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B9D8BC874DD Unknown Unknown Unknown
libmpi.so.12.0.0 00002B9D8BC8A6F7 Unknown Unknown Unknown
libmpi.so.12.0.0 00002B9D8BC9091F MPI_Startall Unknown Unknown
libmpifort.so.12. 00002B9D8CCF406B mpi_startall Unknown Unknown
cesm.exe 0000000002F4D295 Unknown Unknown Unknown
cesm.exe 0000000002ED6322 Unknown Unknown Unknown
cesm.exe 0000000000440127 Unknown Unknown Unknown
cesm.exe 000000000041EAA6 Unknown Unknown Unknown
cesm.exe 000000000043FCB2 Unknown Unknown Unknown
cesm.exe 000000000041B9A2 Unknown Unknown Unknown
libc-2.17.so 00002B9D8D5F5445 __libc_start_main Unknown Unknown
cesm.exe 000000000041B8A9 Unknown Unknown Unknown

I ran this case on a server that uses the Intel compiler and Intel MPI (impi) version 2019. The ulimit settings are as follows:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 254621
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited


I would be grateful for any help!
 

Attachments

  • config_batch.xml.txt
    23.2 KB · Views: 1
  • config_compilers.xml.txt
    42.8 KB · Views: 2
  • config_machines.xml.txt
    109.8 KB · Views: 3

Lei

Lei
New Member
These are log files.
 

Attachments

  • wav.log.28490.230723-202648.txt
    2.1 KB · Views: 1
  • rof.log.28490.230723-202648.txt
    15.4 KB · Views: 0
  • ocn.log.28490.230723-202648.txt
    935.7 KB · Views: 0
  • lnd.log.28490.230723-202648.txt
    207.2 KB · Views: 2
  • ice.log.28490.230723-202648.txt
    37.3 KB · Views: 0
  • glc.log.28490.230723-202648.txt
    18.9 KB · Views: 0
  • cpl.log.28490.230723-202648.txt
    87.8 KB · Views: 1
  • atm.log.28490.230723-202648.txt
    374.2 KB · Views: 0
  • cesm.log.28490.230723-202648.zip
    52.1 KB · Views: 7

slevis

Moderator
The "segmentation fault" makes me wonder whether you changed the code or the input data in some way before submitting this simulation. If so, then I would have you first make sure that the model runs without any changes to the code or input data. If, however, you have not changed anything, then you may have a porting issue, and I will let others try to help.
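For anyone reading the tracebacks in the first post: a quick way to see where the ranks crashed is to list the frames that still carry a routine name. The following is a minimal Python sketch, assuming the five-column forrtl traceback layout shown in the pasted log (image, PC, routine, line, source). The names `FRAME` and `named_frames` are made up for this illustration and are not part of CESM or CIME.

```python
import re

# One frame of an Intel Fortran (forrtl) traceback, as pasted above:
#   image   PC (16 hex digits)   routine   line   source
# Assumption: columns are whitespace-separated and never contain spaces.
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)$")


def named_frames(log_text):
    """Return (image, routine) pairs for frames whose routine is not 'Unknown'."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group(3) != "Unknown":
            frames.append((match.group(1), match.group(3)))
    return frames


if __name__ == "__main__":
    import sys
    from collections import Counter

    # Usage: python frames.py cesm.log.<jobid>
    with open(sys.argv[1]) as handle:
        counts = Counter(named_frames(handle.read()))
    for (image, routine), count in counts.most_common():
        print(count, image, routine)
```

On the log above, the only named frames between the signal handler and `__libc_start_main` are `mpi_startall` and `MPI_Startall`, so every rank shown died inside the MPI library rather than in a named model routine. Rebuilding with debug flags and traceback enabled would be needed to name the `cesm.exe` frames.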
 

Lei

Lei
New Member
Thank you very much for your reply! I did not change any code, and the same error occurs whether or not I change the input data. In this case I did not change any input data. In another case, where I changed the CAM input data (co2flux_fuel_file and aircraft_co2_file), the same segmentation fault occurred about half an hour after submission. I also built cases with different compsets to see whether they would fail too, and they all failed with the same "segmentation fault" error.
 

xxr

xuxr123
New Member
Hi! Have you solved this issue? I'm encountering the same problem. I'm using the FW compset freerun and getting the same error.
 

wfc1102@163_com

New Member
Your cesm log shows "no dedicated output process, any file system" and libmpi issues. This is probably because different MPI versions were used at build time and at run time. Please check carefully.
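One way to check for such a mismatch is to run `ldd` on `cesm.exe` inside the batch job environment and see which MPI libraries the loader actually resolves. The following is a minimal Python sketch that parses that output. The function names and the install prefix in the usage note are hypothetical, and it assumes the usual `name => path (0xaddress)` form of `ldd` output on Linux.

```python
import re

# One resolved dependency line of `ldd` output, for example:
#   libmpi.so.12 => /some/path/libmpi.so.12 (0x00002ab1ef600000)
# Lines reported as "not found" carry no address and are skipped here.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def mpi_libraries(ldd_output):
    """Map each library whose name starts with 'libmpi' to its resolved path."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and match.group(1).startswith("libmpi"):
            libs[match.group(1)] = match.group(2)
    return libs


def outside_prefix(libs, expected_prefix):
    """Return the libraries that were NOT resolved under expected_prefix."""
    return {name: path for name, path in libs.items()
            if not path.startswith(expected_prefix)}
```

Save the output of `ldd cesm.exe` from within a job script, pass it to `mpi_libraries`, and then call `outside_prefix` with the install directory of the MPI you built against (for instance a hypothetical `/opt/intel/impi/2019`). Any entry it returns is an MPI library picked up from somewhere else at run time, which would be consistent with the crash inside `MPI_Startall`.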
 