
Issue getting WACCMX running on a new machine

Hello all,

I am trying to get WACCMX up and running on our Hera machine, but the simulation fails about 30 seconds in. Please note that I am using CESM2.1.3, and we have already been using "normal" WACCM successfully for some time. We are using the Intel compiler, Intel MPI (impi), and an older version of ESMF. Here is the error message:

Abort(805523716) on node 443 (rank 443 in comm 0): Fatal error in PMPI_Ibsend: Invalid tag, error stack:
PMPI_Ibsend(116): Invalid tag, value is 4230443
Abort(671305988) on node 107 (rank 107 in comm 0): Fatal error in PMPI_Ibsend: Invalid tag, error stack:
PMPI_Ibsend(116): Invalid tag, value is 1270107

I am not sure how to address this error. From a bit of searching online, I suspect it may have something to do with undeclared variables, but frankly I am not sure what to do about it.
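For what it's worth, the message itself points at the tag value rather than at any variable: MPI only accepts message tags from 0 up to the implementation's `MPI_TAG_UB`, and the standard guarantees only that this bound is at least 32767. The minimal sketch below checks the two logged tags against that guaranteed minimum. The function name `invalid_tags` is hypothetical; the real bound for a given MPI library should be read at runtime from the `MPI_TAG_UB` attribute (for example `MPI_Comm_get_attr` in C, or `MPI.COMM_WORLD.Get_attr(MPI.TAG_UB)` with mpi4py).

```python
# Minimum value of MPI_TAG_UB that the MPI standard guarantees.
# The actual bound is implementation-specific and may be larger.
MPI_STANDARD_MIN_TAG_UB = 32767


def invalid_tags(tags, tag_ub=MPI_STANDARD_MIN_TAG_UB):
    """Return the tags that fall outside the valid range 0..tag_ub."""
    return [t for t in tags if t < 0 or t > tag_ub]


# Tag values copied from the abort messages above.
logged_tags = [4230443, 1270107]
print(invalid_tags(logged_tags))  # → [4230443, 1270107]
```

Both values are far above the guaranteed minimum, and the abort shows they are also above whatever bound this Intel MPI build reports.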

Any insight would be appreciated!
Thank you,
Chris Maloney
 


Bill Sacks
CSEG and Liaisons
Staff member
I have a few thoughts, following from our discussion on the ESMF support list, and based on this similar thread: MPI tag exceeds limit when using >240 MPI tasks

(1) Can you try this with fewer processors to see if that solves the problem as in the above thread?

(2) As in the above thread, can you change some settings to avoid use of the ESMF library to see if that solves the problem?

(3) Assuming things work without the ESMF library (in (2)): I forget whether you have tried this with the latest version of ESMF (8.4.2). If not, can you try that?
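To illustrate why suggestion (1) can help narrow things down: if a library derives message tags from rank numbers, the largest tag grows with the task count, so fewer tasks can bring it back under `MPI_TAG_UB`. The sketch below assumes a purely hypothetical pairwise scheme, `tag = sender_rank * n + receiver_rank`; this is not ESMF's documented tag formula, only a way to show the scaling. Both function names are hypothetical.

```python
import math


def hypothetical_max_tag(n_tasks):
    """Largest tag under the ASSUMED scheme tag = sender * n + receiver.

    With ranks 0..n-1, the maximum is (n - 1) * n + (n - 1) = n*n - 1.
    """
    return n_tasks * n_tasks - 1


def max_tasks_for_tag_ub(tag_ub):
    """Largest task count n whose hypothetical tags all fit under tag_ub.

    Requires n*n - 1 <= tag_ub, i.e. n <= isqrt(tag_ub + 1).
    """
    return math.isqrt(tag_ub + 1)


# With the MPI-standard minimum bound of 32767:
print(max_tasks_for_tag_ub(32767))  # → 181
```

Under this assumed scheme the tag range grows quadratically with the task count, which is why a modest reduction in tasks can be enough to tell whether tag overflow is the culprit.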
 
Thanks for the response Bill,

I am hesitant to reduce the processor count since WACCMX is already relatively slow, but option (2) appears to have gotten around the issue. I've successfully completed a short run after turning off ESMF in 'env_build.xml' and rebuilding with '-waccmx wxi' and '-nlevs 126'. I will also try downloading the newest version of ESMF to see if that works. Having all of WACCMX's capabilities working for these simulations would be ideal, but just having it run is a great start!

I appreciate your help on this!
Chris Maloney
 


Bill Sacks
CSEG and Liaisons
Staff member
Sounds great. My thought with (1) is that it might help narrow down the problem, not that you'd necessarily want to run that way for real. But trying (3) next sounds like a good approach.
 