Skip to content

Adding configuration option to control whether catchment output is written#943

Merged
aaraney merged 10 commits into
NOAA-OWP:masterfrom
robertbartel:f/cat_out_opt/main
Mar 12, 2026
Merged

Adding configuration option to control whether catchment output is written#943
aaraney merged 10 commits into
NOAA-OWP:masterfrom
robertbartel:f/cat_out_opt/main

Conversation

@robertbartel

@robertbartel robertbartel commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

Adding config option disable_catchment_output to control catchment output writing, which is false by default. This was cherry-picked from the changes proposed previously in #897.

Additionally, making updates to associated documentation on realization config structure (both directly related to new option and to improve structure around where that addition was made).

Finally, adding some unit tests for the function reading this setting from the config file in the Formulation_Manager class.

JoshCu and others added 3 commits March 10, 2026 15:02
Update to include info on new 'write_catchment_output' config option,
and doing some restructuring to the doc overall to better organize and
include a few things that were missing (and on the same level as this
new item, so now somewhat glaring omissions).
@robertbartel robertbartel added the enhancement New feature or request label Mar 10, 2026
@aaraney

aaraney commented Mar 10, 2026

Copy link
Copy Markdown
Member

This may be an extreme take, but i'm of the opinion this should either be a compile time option or we add options output_root_nexus and output_root_catchment similar to the existing output_root option. The fundamentally changes the behavior of the framework in a way no other configuration option does.

@robertbartel

Copy link
Copy Markdown
Contributor Author

The fundamentally changes the behavior of the framework in a way no other configuration option does.

Even if that's correct, this change seems limited enough in scope that I don't see why that becomes a problem.

Also, I thought it was possible to build ngen with routing support and then configure and run a simulation that doesn't include the routing (if somehow I'm forgetting that it's not, honestly I think that's a flaw). Assuming that's right, I see that as very similar: effectively, letting the user choose whether to run extended functionality with results of the main simulation.

Setting different output roots doesn't seem unreasonable, but I think that's a totally different scope. It would neither inhibit nor be inhibited by this change.

A compile time option might also have merit - that is, make it possible to compile ngen such that catchment output cannot_ be written, even if the user want to - but I also think that's out of scope. And I don't think that is a sufficient replacement for giving the user control.

@hellkite500

hellkite500 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

@robertbartel I took a small liberty of pushing a commit to your branch which creates a "NullStream" using a boost null sink stream. This has the benefit of not incurring any system calls to try write to "/dev/null" but has the same effect. This commit establishes the null stream as the "default" and only uses a file stream when explicitly set via set_output_stream with a file path.

Edit: Don't mind the force push, I forgot to stage the NullStream.hpp 🙄

@hellkite500

Copy link
Copy Markdown
Contributor

I need to double check some code patches to see how far off from this the aggregation mechanics I implemented are, but IIRC it touches some very similar code, and if we are interested in adopting this an an option for writing catchmebnt output, would it make sense to extend this configuration mechanism to enumerate the options, e.g. NO_OUTPUT, AGGREGATE, ALL?

@robertbartel

robertbartel commented Mar 11, 2026

Copy link
Copy Markdown
Contributor Author

I took a small liberty of pushing a commit to your branch which creates a "NullStream" using a boost null sink stream.

No worries, @hellkite500, although I don't fully understand things for this part of the change:

diff --git a/src/NGen.cpp b/src/NGen.cpp
index 27f54f2e13..8c67242fed 100644
--- a/src/NGen.cpp
+++ b/src/NGen.cpp
@@ -401,7 +401,7 @@ int main(int argc, char *argv[]) {
     nexus_collection->update_ids("id");
     std::cout<<"Initializing formulations" << std::endl;
     std::shared_ptr<realization::Formulation_Manager> manager = std::make_shared<realization::Formulation_Manager>(REALIZATION_CONFIG_PATH);
-    manager->read(catchment_collection, utils::getStdOut());
+    manager->read(catchment_collection, utils::StreamHandler::getNullStream());
 
     //TODO refactor manager->read so certain configs can be queried before the entire
     //realization collection is created

This seems to go beyond both the new configurable control mechanism and the implementation details behind it. It kind of feels like something that could have been done locally for convenience and included with things unintentionally. If it is just something to reduce what goes to stdout, I think it might be better to condition on a macro - probably NGEN_QUIET - to enable a choice here for debugging purposes. I think that'd still be simple enough that it's fine to include here.

Or, I might just not be following along with what you have in mind.

@hellkite500

Copy link
Copy Markdown
Contributor

The formulation manager constructs the formulations and passes this stream for the "default" output steam. Later, the stream gets set to a file stream handler and the stdout steam isn't used. Replacing this here ensures that output defaults to null unless explicitly set during the feature iteration later. The stdout here was effectively never used anyways, but it existed to allow construction with some steam.

@robertbartel

Copy link
Copy Markdown
Contributor Author

The formulation manager constructs the formulations and passes this stream for the "default" output steam. Later, the stream gets set to a file stream handler and the stdout steam isn't used. Replacing this here ensures that output defaults to null unless explicitly set during the feature iteration later. The stdout here was effectively never used anyways, but it existed to allow construction with some steam.

Thanks for the clarification. I follow your rationale now, but it does seem like this stream passed to read is being used for logging purposes in at least one place (it eventually gets passed in a call to NetCDFPerFeatureDataProvider::get_shared_provider, as the log_s param). Without getting too hung up on this, I think there are some deeper problems here.

Those can wait, but it appears like they will cause this PR's change to NGen.cpp to have unintended and/or undesirable side effects. I suggest we reverse the change to NGen.cpp. If you really want to keep it, I can take it out, put it back, and then squash the former commit into your other one so that at least it's separated in the commit history.

@hellkite500

hellkite500 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

Then we probably need to look closer at the formulation construction and change it to use the default constructor chain instead of passing the output_stream around just to clobber it later. The main goal here was to simplify the logic and reduce the "else" branch that was setting to "/dev/null" by trying to use the null stream by default unless a proper file stream was setup via "set_output_stream".

I guess the best way there would be to revert this change in main and supply the null stream as the arg in the construct_formulation function?

Or explicitly replace the forcing provider arg with the stdout steam, either/or...

Thinking out loud here until I can look now closely at the code. Feel free to mod this however you see fit!

@robertbartel

Copy link
Copy Markdown
Contributor Author

For others' benefit, Nels and I reviewed and discussed further, and it looks like the unintended side effects from his changes to NGen.cpp were limited to an eventual call to data_access::NetCDFPerFeatureDataProvider::get_shared_provider that passed the default stream. It was simple enough to modify that particular usage to have the same effect as before, while keeping the rest of the changes.

@aaraney

aaraney commented Mar 11, 2026

Copy link
Copy Markdown
Member

Also, I thought it was possible to build ngen with routing support and then configure and run a simulation that doesn't include the routing (if somehow I'm forgetting that it's not, honestly I think that's a flaw).

@robertbartel, ahhh I'd forgotten about that! Good call. I guess I still think of them a little differently. Im thinking about this in terms of information loss. By not running routing you can still run routing later. But by turning off catchment outputs, you are losing information.

Comment thread doc/REALIZATION_CONFIGURATION.md Outdated
Comment thread doc/REALIZATION_CONFIGURATION.md Outdated
Comment thread include/realizations/catchment/Formulation_Manager.hpp Outdated
Comment thread include/core/catchment/HY_CatchmentArea.hpp
Refactoring newly-added 'write_catchment_output' option to be
'disable_catchment_output' instead, modifying the related functions,
tests, utilizing code, and documentation appropriately.

Also, tweaking the getter within the Formulation_Manager to simplify and
be more appropriate for an optional bool config value.
Update tests to reflect change from 'write_catchment_output' option to
'disable_catchment_output'.
Comment thread src/core/HY_Features_MPI.cpp Outdated
@JoshCu

JoshCu commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

my original wording was a bit backwards, these changes all look much better :)

Another part of this that I didn't check is if the writing is disabled, is the engine now fetching unused variables from the models? e.g. if I omit "output_variables" on a bmi-multi it will default to writing all variables for the last model to execute to that cat.csv file. With the output writing turned off, are all those now-unused model outputs still being fetched each timestep?

I tested it on the original changes I made with the python lstm by adding this

    def get_value_ptr(self, name: str) -> np.ndarray:
        for n,unit in _output_vars:
            if n == name:
                print(f"fetching {name} at {self._timestep}")
        """Returns a _reference_ to a variable's np.NDArray."""
        return first_containing(name, self._outputs, self._dynamic_inputs).value(name)

and running a day for a single catchment, it looks like it's still fetching the unused values

output

I'm probably misusing bmi-multi adding to some of these duplicates but I'd guess they could be reduce to one per timestep?

fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
realization
{
    "global": {
        "formulations": [
            {
                "name": "bmi_multi",
                "params": {
                    "name": "bmi_multi",
                    "model_type_name": "lstm",
                    "forcing_file": "",
                    "init_config": "",
                    "allow_exceed_end_time": true,
                    "main_output_variable": "land_surface_water__runoff_depth",
                    "modules": [
                        {
                            "name": "bmi_python",
                            "params": {
                                "name": "bmi_python",
                                "python_type": "lstm.bmi_lstm.bmi_LSTM",
                                "model_type_name": "bmi_lstm",
                                "init_config": "./config/cat_config/lstm/{{id}}.yml",
                                "allow_exceed_end_time": true,
                                "main_output_variable": "land_surface_water__runoff_depth",
                                "uses_forcing_file": false
                            }
                        }
                    ]
                }
            }
        ],
        "forcing": {
            "path": "./forcings/forcings.nc",
            "provider": "NetCDF",
            "enable_cache": false
        }
    },
    "time": {
        "start_time": "2010-01-01 00:00:00",
        "end_time": "2010-01-02 00:00:00",
        "output_interval": 3600
    },
    "no_routing": {
        "t_route_config_file_with_path": "./config/troute.yaml"
    },
    "output_root": "./outputs/ngen",
    "write_catchment_output": false
}

Comment thread src/core/HY_Features.cpp Outdated
auto formulation = formulations->get_formulation(feat_id);
formulation->set_output_stream(formulations->get_output_root() + feat_id + ".csv");

if (formulations->is_disable_catchment_output()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (formulations->is_disable_catchment_output()) {
if ( ! formulations->is_disable_catchment_output()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be negated given the change of semantics to "disable" vs "enable"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which also makes me wonder why the tests pass, because if I'm understanding this correctly, catchment csv's would only exist when the disable_cathment_output is true in the config.

@robertbartel robertbartel Mar 11, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never regretted adding a test, but have often regretted not ...

In this case, I did not add tests specifically for the changes to HY_Features or HY_Features_MPI. Changes were straightforward enough that it did not seem absolutely necessary, and unfortunately these classes are not super easy to add tests to. Right now they don't have test class infrastructure already created. I.e., that looked like a bigger lift that was otherwise easy to compensate for not doing. Or so I thought ...

We have a few options:

  • I won't object if anyone insists on tests being added before this gets merged; just know it'll delay things.
  • A compromise might be to add a complete test class and test cases - including for the changes made here - as a subsequent high-priority task, but don't required that in this PR
  • Now that I'm not being stupid, this is simple enough to visually check and (as I did before) manually test to confirm things, which I can do before this gets approved

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you fix the HY_Features_MPI calls as well, I'm good with just a confirmation that this is the expected behavior for now =)

@hellkite500 hellkite500 Mar 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I did not add tests specifically for the changes to HY_Features.

You caught me here, I haven't reviewed the test code added by this PR myself to know what to expect or not expect. So much context switching! But I think creating a high priority issue for Hy_Features test coverage is a reasonable compromise.

@JoshCu JoshCu mentioned this pull request Mar 11, 2026
@hellkite500

Copy link
Copy Markdown
Contributor

With the output writing turned off, are all those now-unused model outputs still being fetched each timestep?

@JoshCu Effectively, yes. They get queried here and then written via the write_output, which with these changes would just effectively ignore them. We may be able to further optimize this in the future, but I think that can be scoped to another PR.

@robertbartel

Copy link
Copy Markdown
Contributor Author

@JoshCu Effectively, yes. They get queried here and then written via the write_output, which with these changes would just effectively ignore them. We may be able to further optimize this in the future, but I think that can be scoped to another PR.

FWIW, I haven't tested to confirm, but looking quickly through the code, I think there is a way to prevent this via available config options.

For BMI module formulations, the list of variables to include in the catchment output is configurable via the optional param output_variables. If this is explicitly set as an empty list, then the call to Bmi_Module_Formulation::get_output_line_for_timestep that actually gets the values for the Layer class to write won't have any variables to iterate through, so nothing would get fetched.

Comment thread src/NGen.cpp
@aaraney aaraney merged commit 73773cb into NOAA-OWP:master Mar 12, 2026
20 of 21 checks passed
aaraney added a commit to aaraney/ngen-cal that referenced this pull request Mar 12, 2026
aaraney added a commit to aaraney/scribe that referenced this pull request Mar 12, 2026
@robertbartel robertbartel deleted the f/cat_out_opt/main branch March 13, 2026 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants