
Mpirun and Job Submission Errors

Will Smith

New Member
Hi there,

I am currently encountering issues while running a case with CESM2.3.alpha17. The problems involve an mpirun error, along with the messages "Submitted job case.run with id None" and "Submitted job case.st_archive with id None". I have attached the cesm.log file, other relevant case files, and a screenshot of the MPI library settings for your review.

Here are the settings I used for the case:
./xmlchange NTASKS=2
./xmlchange STOP_OPTION=ndays
./xmlchange STOP_N=1
./xmlchange RESUBMIT=0
./xmlchange NTHRDS=1
./xmlchange JOB_WALLCLOCK_TIME=96:00:00
./xmlchange CREATE_ESMF_PET_FILES=TRUE
./case.setup
./preview_namelists
echo "empty_htapes = .true." >> user_nl_cam
echo "hist_empty_htapes = .true." >> user_nl_clm
echo "rtmhist_nhtfrq = -876000" >> user_nl_mosart
echo "history_frequency = 100" >> user_nl_cism
./preview_namelists
./case.setup --reset
./case.build --skip-provenance-check
./preview_run
./case.submit

The attached screenshot shows the settings related to the MPI library, which I suspect might be linked to the issues I'm experiencing. Could you please help me identify where things might be going wrong?

Thank you for your assistance.

Best,
Will
 

Attachments

  • 2024-05-01 10.08.56.png
    80.2 KB · Views: 5
  • test7.6.1.sh.e4924383.zip
    480 bytes · Views: 1
  • test7.6.1.sh.o4924383.zip
    3.1 KB · Views: 1
  • cesm.log.240430-104502.zip
    6.1 KB · Views: 1

Will Smith

New Member
For reference,

Many thanks
 

Attachments

  • test7.6.1.sh.o4924383.txt
    20.1 KB · Views: 0
  • test7.6.1.sh.e4924383.txt
    513 bytes · Views: 1
  • cesm.log.240430-104502.txt
    21.5 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
According to the cesm log, the run is failing due to an incompatible NetCDF file. I can't tell from that log which file is causing the problem; that information is probably in one of the component logs. The message is:

Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
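
One way to look for the matching entry in the component logs is to search every log in the run directory for NetCDF or PIO messages. This is only a rough sketch: the function name search_netcdf_errors is made up for illustration, and the directory you pass in is assumed to be the case's run directory holding the *.log.* files.

```shell
#!/usr/bin/env bash
# Sketch: list NetCDF/PIO error lines from all logs in one directory.
# search_netcdf_errors is a hypothetical helper, not part of CESM or CIME.
search_netcdf_errors() {
    local rundir="$1"
    # -H prints the file name, -n the line number, -i ignores case,
    # -F treats the patterns as literal strings rather than regexes.
    grep -HniF -e 'NetCDF:' -e 'pio_nc.c' "$rundir"/*.log.* 2>/dev/null
}
```

Calling search_netcdf_errors /path/to/run (placeholder path) prints one line per hit in the form file:line:text, so the lines just above a hit in a component log can then be read to see which file was being opened.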
 

Will Smith

New Member
Hi Jedwards,

Thanks for the info. I've checked the log files of the other components, but I can't find any issues related to NetCDF. Could you please provide a more detailed explanation? Attached are the log files of the other components and a screenshot of the NetCDF version for your review.

Best,
Will
 

Attachments

  • 2024-05-01 16.37.50.png
    26 KB · Views: 4
  • atm.log.240430-104502.txt
    24.4 KB · Views: 1
  • glc.log.240430-104502.txt
    16.1 KB · Views: 1
  • lnd.log.240430-104502.txt
    91.2 KB · Views: 1
  • med.log.240430-104502.txt
    60.8 KB · Views: 1
  • rof.log.240430-104502.txt
    6.6 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
It looks like the problem may be with:
/mnt/iusers01/fatpou01/sees01/s29826zs/scratch/Projects/inputdata/atm/datm7/atm_forcing.datm7.GSWP3.0.5d.v1.c170516/Solar/clmforc.GSWP3.c2011.0.5x0.5.Solr.2000-01.nc

Please confirm the md5sum of that file is 5ee6f7fe2c4b8110a9d44a9beacc48b4

Why do you have ./xmlchange CREATE_ESMF_PET_FILES=TRUE? Are there any errors in the PET files?
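
The checksum comparison can be scripted roughly as follows. This is a minimal sketch that assumes the GNU md5sum tool is available on the system; check_md5 is a hypothetical helper name, and the path in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's md5 checksum with an expected value.
# check_md5 is a hypothetical helper, not part of CESM or CIME.
check_md5() {
    local file="$1" expected="$2" actual
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 2
    fi
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}
```

For example: check_md5 /path/to/clmforc.GSWP3.c2011.0.5x0.5.Solr.2000-01.nc 5ee6f7fe2c4b8110a9d44a9beacc48b4. A matching checksum only shows the download is intact; if the NetCDF utilities are installed, ncdump -k on the same file reports which NetCDF format variant it uses.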
 

Will Smith

New Member
Hi,

I confirm the md5sum of that file is 5ee6f7fe2c4b8110a9d44a9beacc48b4. As for ./xmlchange CREATE_ESMF_PET_FILES=TRUE, I set it only for the stable operation of the model.
 

Attachments

  • 2024-05-01 17.19.17.png
    11.4 KB · Views: 3

jedwards

CSEG and Liaisons
Staff member
You didn't answer about errors in the PET files - I recommend you leave that value set as FALSE.
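
If PET files were written by an earlier run, something like the sketch below can show whether they contain errors. It assumes the files use ESMF's default naming (PET0.ESMF_LogFile, PET1.ESMF_LogFile, ...) and sit in the directory given as the argument; scan_pet_logs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print ERROR entries from ESMF per-PET log files in one directory.
# scan_pet_logs is a hypothetical helper; the file naming is an assumption.
scan_pet_logs() {
    local rundir="$1" found=0 f
    for f in "$rundir"/PET*.ESMF_LogFile; do
        [ -e "$f" ] || continue   # glob matched nothing
        found=1
        # ESMF log entries carry a severity label such as INFO, WARNING or ERROR.
        grep -Hn 'ERROR' "$f"
    done
    [ "$found" -eq 1 ] || echo "no PET log files found in $rundir"
}
```

Running scan_pet_logs /path/to/run (placeholder path) prints each ERROR entry with its file name and line number, or a short note when no PET files exist.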
 

Will Smith

New Member
Hi,
I'm sorry for the oversight earlier. I've been unable to find any information on PET files. I've set CREATE_ESMF_PET_FILES to FALSE, but I'm still facing issues with mpirun failing. I've attached the updated cesm.log for your reference. Additionally, I tested the MPI by running a basic MPI parallel programme, and I've attached a screenshot of the results. Could you offer any advice on how to resolve this issue?

Best,
Will
 

Attachments

  • cesm.log.240501-174431.txt
    21.5 KB · Views: 2
  • 2024-05-01 19.03.06.png
    82.7 KB · Views: 3

jedwards

CSEG and Liaisons
Staff member
Again, it is not an mpirun issue. It is an issue with an input or output file, but I can't tell from the logs you are providing which file is causing the problem. Perhaps you should try running in debug mode:
./xmlchange DEBUG=TRUE
./case.build --clean-all
./case.build
./case.submit
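
Should the rebuild itself stop with an error, the first few error lines in the build logs are usually the most useful thing to post. A rough sketch, assuming the build logs follow the *.bldlog.* naming and live in the directory given as the argument; first_build_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show the first few lines mentioning "error" in each build log.
# first_build_errors is a hypothetical helper, not part of CESM or CIME.
first_build_errors() {
    local logdir="$1" max="${2:-5}" f
    for f in "$logdir"/*.bldlog.*; do
        [ -e "$f" ] || continue   # glob matched nothing
        # -m stops after $max matching lines per file; -i ignores case.
        grep -Hni -m "$max" 'error' "$f"
    done
    return 0
}
```

For example, first_build_errors /path/to/bld 3 (placeholder path) prints at most three matching lines per build log.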
 

Will Smith

New Member
Hi Jedwards,

I followed your advice and ran in debug mode, but the case build failed. Attached is the log file that was generated, for your reference. Could you please suggest any adjustments that might help resolve this issue?

Thank you for your continued support.
 

Attachments

  • CDEPS.bldlog.240502-141728.txt
    136.3 KB · Views: 1