Hi all,
I am running an F compset (
2000_CAM60_CLM51%SP_CICE%PRES_DOCN%DOM_SROF_SGLC_SWAV
) simulation. When I set ./xmlchange NTASKS=1280, the model stopped quickly with an error from PET0088.ESMF_LogFile:
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2561 ERROR: 1 nans found in Faxx_taux
20240412 090005.558 ERROR PET0088 /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2561 ERROR: 1 nans found in Faxx_tauy
20240412 090005.558 ERROR PET0088 /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2561 ERROR: 1 nans found in Sx_u10
20240412 090005.558 ERROR PET0088 /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2566 ABORTING JOB
20240412 090005.558 ERROR PET0088 /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/components/cmeps/cime_config/../mediator/med_phases_prep_atm_mod.F90:232 Failure - Passing error in return code
20240412 090005.558 ERROR PET0088 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:2207 Failure - Passing error in return code
I also checked the CAM log. It stopped while reading the file below:
(shr_strdata_readstrm) reading file ub: /work/n02/n02/yuansun/cesm/cesm_inputdata/lnd/clm2/ndepdata/fndep_clm_hist_b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensmean_1849-2015_monthly_0.9x1.25_c180926.nc 1813
FV subcycling - nv, n2, nsplit, dt = 2 1 4 225.00000000000000
Divergence damping: use 4th order damping
nstep, te 0 0.25979474672344298E+10 0.25979474672344298E+10 -0.00000000000000000E+00 0.98517053779670925E+05 0.22552395239472389E+03
chem_surfvals_set: ncdate= 20000101 co2vmr= 3.7037219767789082E-004
chem_surfvals_set: ch4vmr= 1.7796140990627879E-006 n2ovmr= 3.1559961319861702E-007 f11vmr= 6.9287665509229100E-010 f12vmr= 5.3898092182081726E-010
READ_NEXT_TRCDATA emiss_anthro
When I tried ./xmlchange NTASKS=128, the case ran without error but was very slow. Do I need to consider load balancing across components? When I set NTASKS_ATM=1280 and NTASKS_LND=128, the simulation ran for several minutes but then stopped again. I found several suggestions for B compsets, but I am still not clear about the PE layout settings for an F compset.
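For reference, the per-component attempt corresponds roughly to the commands below, run from the case directory. This is only a sketch of what I tried, not a recommended layout: the task counts are the ones quoted above, the run helper and the RUN variable are hypothetical names added so the script prints the commands by default instead of changing a case, and I am assuming ./case.setup --reset is needed after changing the layout and that ./pelayout prints the result.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless RUN=1 is set
# and the script is executed inside a CIME case directory.
run() {
  if [ "${RUN:-0}" = "1" ] && [ -x ./xmlchange ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Task counts taken from the attempt described in this post.
run ./xmlchange NTASKS_ATM=1280
run ./xmlchange NTASKS_LND=128

# Regenerate the case setup so the new PE layout takes effect (assumed step).
run ./case.setup --reset

# Show the resulting tasks, threads and root PE for each component.
run ./pelayout
```

With RUN unset the script only echoes the four commands, so it is safe to run anywhere; with RUN=1 inside the case directory it applies them.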
Thanks for any comments.
Best
Yuan