Conversation
…replacing C system function with F2008 intrinsic subroutine execute_command_line
…with command line versions using execute_command_line() to work with NAG
|
very exciting Marlee |
…s of random code - WRF utility programs, developer_tests/forward_operators/rttov_unit_tests.f90, and code in NCEP prep_burf/convert_bufr
| if( COMMAND_ARGUMENT_COUNT() .ne. 13 ) then | ||
| print*, 'INCORRECT # OF ARGUMENTS ON COMMAND LINE: ', COMMAND_ARGUMENT_COUNT() | ||
| call exit(1) | ||
| stop 1 |
There was a problem hiding this comment.
are these changes because of fixstystem being removed, or are they due to exit() no being fortran standard and causing problems with compilers (nvhpc?)
There was a problem hiding this comment.
By these changes I mean the exit(X) in the wrf_dart_utiilies, prepbufr code
There was a problem hiding this comment.
These changes are actually for NAG compatibility - NAG can't use exit(X) without an interface being added to the code
nvhpc can't use 'error stop X' but can use 'stop X'
| !> resolutions for the minimum time it will sleep. | ||
| !> Subroutine, no return value. | ||
|
|
||
| subroutine sleep_seconds(naplength) |
There was a problem hiding this comment.
This routine is not called in DART, but it is public. Wonder if there are people out there making use of this routine.
There was a problem hiding this comment.
good note to keep on this PR
| !> Signals to the other MPI tasks that something bad happened and they | ||
| !> should also exit. | ||
|
|
||
| subroutine exit_all(exit_code) |
There was a problem hiding this comment.
At some point the exit logic from utilities should be revisited.
This pull request is skirting around the issue to only deal with fixsystem.
DART/assimilation_code/modules/utilities/utilities_mod.f90
Lines 89 to 99 in 890a3be
| if (errcode /= MPI_SUCCESS) then | ||
| write(*, *) 'MPI_Comm_rank returned error code ', errcode | ||
| call exit(-99) | ||
| error stop errcode |
There was a problem hiding this comment.
explain to me
stop exit_code
vs/
error stop errcode
There was a problem hiding this comment.
stop exit_code - normal termination
error stop exit_code - error termination
These two termination processes are made to work with Coarray, which is a Fortran mechanism for parallelism. Coarray uses this concept of "images", which is conceptually the same thing as an MPI task. So the two termination processes differ in the way they handle synchronizing multiple images upon termination. Obviously, we use MPI, so they are effectively the same to DART.
The only other difference is the error code they return if one isn't specified: "If stop-code is a character expression, STOP returns a status of zero and ERROR STOP returns a status of non-zero. If stop-code is an integer, a status equal to stop-code is returned for both STOP and ERROR_STOP." But since we are always passing an integer code to stop, again these two statements behave the same in the DART code.
And having run both versions, the behavior was exactly the same.
We only call stop from either the null_mpi_utilities_mod or before initialize_mpi_utilities, so we also don't have to be concerned with dealing with MPI at all. Once we initialize MPI, we use MPI_abort instead.
So we actually never needed to use 'error stop' over 'stop'; so it works out that the regular stop is supported by all the compilers, while 'error stop' is incompatible with nvhpc
I forgot to replace these in the non-null versions of the mpi utils mods as shown above, but I've done so now.
Eventually, I think we could reorg a bit and then use the error handler instead
There was a problem hiding this comment.
May as well take out the svn junk at the top if you're taking the bottom svn junk out
- DART $Id$There was a problem hiding this comment.
@hkershaw-brown see this commit - f13e335
In order to remove
Basically the only real change this causes is instead of putting "# Makefile created by mkmf
This is kind of a separate issue though, I can move it to a different PR if needed. Lmk
| @@ -120,28 +69,22 @@ program ftest_mpi | |||
| contains | |||
|
|
|||
| !> wrapper so you only have to make this work in a single place | |||
There was a problem hiding this comment.
Have you tried this with csh?
There was a problem hiding this comment.
@hkershaw-brown I did. I did so by doing call do_execute_command_line("csh -c 'echo hello world'", rc) on line 56 instead of just using bash by default.
I also wrote a tiny bit of code to verify it's actually running in csh by using setenv. This is csh-specific syntax that would fail in bash:
! Test execute_command_line using csh explicitly
! setenv is csh-specific syntax; if rc==0, csh ran it successfully
call do_execute_command_line("csh -c 'setenv TESTVAR hello && echo $TESTVAR'", rc)
if (rc /= 0) then
print *, 'csh execute_command_line test FAILED - rc = ', rc
else
print *, 'csh execute_command_line test passed'
endif
If rc comes back non-zero, then the command failed. But it returned 0 so success:
masmith@derecho4:/glade/derecho/scratch/masmith/DART11.24.0/DART/developer_tests/mpi_utilities/tests> ./ftest_mpi
program start
MPI initialized successfully
My MPI rank is: 0
Total MPI tasks: 1
hello world
All MPI calls succeeded, test passed.
program end
hello
csh execute_command_line test passed
… comments , removing unused optional args
|
awesome thanks Marlee, it might be Friday before I look at this. Cheers, Helen |
Description:
In Fortran 2008 and later standards, EXECUTE_COMMAND_LINE is the preferred intrinsic for executing system commands.
With the use of the Fortran intrinsic execute_command_line, fixsystem is no longer needed for fixing compiler dependencies in the code
This PR replaces calls to the C system function and Fortran instrinsic sleep() with execute_command_line and replaces 'call exit(error_code)' with 'stop error_code'; in the past, these functions either required or rejected interfaces/use statements depending on the compiler (NAG, gfortran). Both execute_command_line and stop work with all compilers and therefore removes the need for these blocks of compiler dependent source code.
This means that the DART code no longer needs to be changed with fixsystem (by commenting in/out these specific blocks of code). mkmf + fixsystem are also no longer reliant on the FC variable in the makefile being the name of the compiler
An additional benefit as well to no longer having fixsystem is that the mpi utilities mods are not automatically edited upon running quickbuild with certain compilers, therefore messing up your git working tree.
This does not change the way in which the user builds DART or interacts with DART.
Link to the spec for more info - https://docs.google.com/document/d/1xOb25pg8vJhHJy7xu6CtFFH_AklQbxpVDB7o8qNfGKs/edit?tab=t.0
I also updated various bits and bobs of the code to be consistent with these changes.
Note for @hkershawbrown - if we just call execute_command_line directly in exit_all, we don’t strictly NEED to move the exit_all subroutine from the mpi_utilities mods to its own module. BUT this is maybe still something we want to do. Maybe this would be best in a separate PR as well
Fixes issue
Fixes #870
Also mentioned in #595
Types of changes
Update to the build system
Documentation changes needed?
Tests
Built and ran lorenz_96 with all mpi options and NAG (Izumi), gfort, ifx compilers
Repeated this after I added some code to force DART to call sleep and exit; works great and is sleeping and exiting appropriately; also repeated this with async4_verbose = .true. causing execute_command_line to be called
Ran quickbuild for some larger models as well with NAG (WRF, cam-fv)
run_all_quickbuilds with all compilers on Derecho - identical failures to that when run on the main branch
NOTE here that the NAG build is failing with mpif08, but this is independent of these changes. This is an issue with the outdated version of the NAG compiler and the mpif08 module:
Supposedly the later versions of NAG (7.1) fix this issue, but 7.0 is the latest version available on Izumi. Here is the separate issue for this #1122
Checklist for merging
Checklist for release
Testing Datasets