Skip to content

Remove fixsystem from the build system#1121

Open
mjs2369 wants to merge 15 commits into
mainfrom
fixsystem
Open

Remove fixsystem from the build system#1121
mjs2369 wants to merge 15 commits into
mainfrom
fixsystem

Conversation

@mjs2369

@mjs2369 mjs2369 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Description:

In Fortran 2008 and later standards, EXECUTE_COMMAND_LINE is the preferred intrinsic for executing system commands.

With the use of the Fortran intrinsic execute_command_line, fixsystem is no longer needed for fixing compiler dependencies in the code

This PR replaces calls to the C system function and Fortran instrinsic sleep() with execute_command_line and replaces 'call exit(error_code)' with 'stop error_code'; in the past, these functions either required or rejected interfaces/use statements depending on the compiler (NAG, gfortran). Both execute_command_line and stop work with all compilers and therefore removes the need for these blocks of compiler dependent source code.

This means that the DART code no longer needs to be changed with fixsystem (by commenting in/out these specific blocks of code). mkmf + fixsystem are also no longer reliant on the FC variable in the makefile being the name of the compiler

An additional benefit as well to no longer having fixsystem is that the mpi utilities mods are not automatically edited upon running quickbuild with certain compilers, therefore messing up your git working tree.

This does not change the way in which the user builds DART or interacts with DART.

Link to the spec for more info - https://docs.google.com/document/d/1xOb25pg8vJhHJy7xu6CtFFH_AklQbxpVDB7o8qNfGKs/edit?tab=t.0

I also updated various bits and bobs of the code to be consistent with these changes.

  • In the developer_tests/mpi_utilities/tests , removed the use of fixsystem. These dev tests do not build with quickbuild though ; see mpi developer tests #1123
  • Replaced 'call exit(error_code)' with stop error_code in WRF utility programs, developer_tests/forward_operators/rttov_unit_tests.f90, and code in NCEP prep_burf/convert_bufr

Note for @hkershawbrown - if we just call execute_command_line directly in exit_all, we don’t strictly NEED to move the exit_all subroutine from the mpi_utilities mods to its own module. BUT this is maybe still something we want to do. Maybe this would be best in a separate PR as well

Fixes issue

Fixes #870
Also mentioned in #595

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Update to the build system

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Built and ran lorenz_96 with all mpi options and NAG (Izumi), gfort, ifx compilers

Repeated this after I added some code to force DART to call sleep and exit; works great and is sleeping and exiting appropriately; also repeated this with async4_verbose = .true. causing execute_command_line to be called

Ran quickbuild for some larger models as well with NAG (WRF, cam-fv)
run_all_quickbuilds with all compilers on Derecho - identical failures to that when run on the main branch

NOTE here that the NAG build is failing with mpif08, but this is independent of these changes. This is an issue with the outdated version of the NAG compiler and the mpif08 module:

NAG Fortran Compiler Release 7.0(Yurakucho) Build 7005
Option warning: The -mismatch option disables the -C=calls option
Fatal Error: /home/masmith/DART/assimilation_code/modules/utilities/mpif08_utilities_mod.f90, line 48: Cannot find module MPI_F08
             detected at MPI_F08@<end-of-statement>

Supposedly the later versions of NAG (7.1) fix this issue, but 7.0 is the latest version available on Izumi. Here is the separate issue for this #1122

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

@hkershaw-brown

Copy link
Copy Markdown
Member

very exciting Marlee

mjs2369 added 2 commits June 8, 2026 12:58
…s of random code - WRF utility programs, developer_tests/forward_operators/rttov_unit_tests.f90, and code in NCEP prep_burf/convert_bufr
@mjs2369 mjs2369 marked this pull request as ready for review June 8, 2026 20:42
@mjs2369 mjs2369 requested a review from hkershaw-brown June 8, 2026 20:42
if( COMMAND_ARGUMENT_COUNT() .ne. 13 ) then
print*, 'INCORRECT # OF ARGUMENTS ON COMMAND LINE: ', COMMAND_ARGUMENT_COUNT()
call exit(1)
stop 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes because of fixstystem being removed, or are they due to exit() no being fortran standard and causing problems with compilers (nvhpc?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By these changes I mean the exit(X) in the wrf_dart_utiilies, prepbufr code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are actually for NAG compatibility - NAG can't use exit(X) without an interface being added to the code

nvhpc can't use 'error stop X' but can use 'stop X'

Comment thread assimilation_code/modules/utilities/mpi_utilities_mod.f90
Comment thread assimilation_code/modules/utilities/mpi_utilities_mod.f90 Outdated
!> resolutions for the minimum time it will sleep.
!> Subroutine, no return value.

subroutine sleep_seconds(naplength)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This routine is not called in DART, but it is public. Wonder if there are people out there making use of this routine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good note to keep on this PR

!> Signals to the other MPI tasks that something bad happened and they
!> should also exit.

subroutine exit_all(exit_code)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point the exit logic from utilities should be revisited.
This pull request is skirting around the issue to only deal with fixsystem.

! this routine is either in the null_mpi_utilities_mod.f90, or in
! the mpi_utilities_mod.f90 file, but it is not a module subroutine.
! the mpi files use this module, and you cannot have circular module
! references. the point of this call is that in the mpi multi-task
! case, you want to call MPI_Abort() to kill the other tasks associated
! with this job when you exit. in the non-mpi case, it just calls exit.
interface
subroutine exit_all(exitval)
integer, intent(in) :: exitval
end subroutine exit_all
end interface

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good note!

Comment thread assimilation_code/modules/utilities/mpif08_utilities_mod.f90
if (errcode /= MPI_SUCCESS) then
write(*, *) 'MPI_Comm_rank returned error code ', errcode
call exit(-99)
error stop errcode

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain to me

stop exit_code

vs/

error stop errcode

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkershaw-brown

stop exit_code - normal termination
error stop exit_code - error termination

These two termination processes are made to work with Coarray, which is a Fortran mechanism for parallelism. Coarray uses this concept of "images", which is conceptually the same thing as an MPI task. So the two termination processes differ in the way they handle synchronizing multiple images upon termination. Obviously, we use MPI, so they are effectively the same to DART.

The only other difference is the error code they return if one isn't specified: "If stop-code is a character expression, STOP returns a status of zero and ERROR STOP returns a status of non-zero. If stop-code is an integer, a status equal to stop-code is returned for both STOP and ERROR_STOP." But since we are always passing an integer code to stop, again these two statements behave the same in the DART code.

And having run both versions, the behavior was exactly the same.

We only call stop from either the null_mpi_utilities_mod or before initialize_mpi_utilities, so we also don't have to be concerned with dealing with MPI at all. Once we initialize MPI, we use MPI_abort instead.

So we actually never needed to use 'error stop' over 'stop'; so it works out that the regular stop is supported by all the compilers, while 'error stop' is incompatible with nvhpc

I forgot to replace these in the non-null versions of the mpi utils mods as shown above, but I've done so now.

Eventually, I think we could reorg a bit and then use the error handler instead

Comment thread build_templates/mkmf Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well take out the svn junk at the top if you're taking the bottom svn junk out

- DART $Id$

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkershaw-brown see this commit - f13e335

In order to remove $Id$, I had to also remove the $version stuff. Which was junk anyways cause $Id$ wasn't doing anything for us.

Basically the only real change this causes is instead of putting "# Makefile created by mkmf $Id$ " at the top of the Makefile, it puts "# Makefile created by mkmf" instead. Which is good cause "$Id$" tells us nothing anyways

This is kind of a separate issue though, I can move it to a different PR if needed. Lmk

Comment thread developer_tests/build_everything/run_all_quickbuilds.sh
Comment thread developer_tests/mpi_utilities/tests/ftest_mpi.f90
@@ -120,28 +69,22 @@ program ftest_mpi
contains

!> wrapper so you only have to make this work in a single place

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried this with csh?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkershaw-brown I did. I did so by doing call do_execute_command_line("csh -c 'echo hello world'", rc) on line 56 instead of just using bash by default.

I also wrote a tiny bit of code to verify it's actually running in csh by using setenv. This is csh-specific syntax that would fail in bash:

! Test execute_command_line using csh explicitly
! setenv is csh-specific syntax; if rc==0, csh ran it successfully
call do_execute_command_line("csh -c 'setenv TESTVAR hello && echo $TESTVAR'", rc)
if (rc /= 0) then
   print *, 'csh execute_command_line test FAILED - rc = ', rc
else
   print *, 'csh execute_command_line test passed'
endif

If rc comes back non-zero, then the command failed. But it returned 0 so success:

masmith@derecho4:/glade/derecho/scratch/masmith/DART11.24.0/DART/developer_tests/mpi_utilities/tests> ./ftest_mpi 
 program start
 MPI initialized successfully
 My MPI rank is:            0
 Total MPI tasks:            1
hello world
 All MPI calls succeeded, test passed.
 program end
hello
 csh execute_command_line test passed

@mjs2369 mjs2369 requested a review from hkershaw-brown June 9, 2026 20:12
@hkershaw-brown

Copy link
Copy Markdown
Member

awesome thanks Marlee, it might be Friday before I look at this. Cheers, Helen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: mkmf + fixsystem relies on the FC variable in the makefile being the name of the compiler

2 participants