Adding configuration option to control whether catchment output is written#943
Conversation
Update to include info on new 'write_catchment_output' config option, and doing some restructuring to the doc overall to better organize and include a few things that were missing (and on the same level as this new item, so now somewhat glaring omissions).
|
This may be an extreme take, but i'm of the opinion this should either be a compile time option or we add options |
Even if that's correct, this change seems limited enough in scope that I don't see why that becomes a problem. Also, I thought it was possible to build ngen with routing support and then configure and run a simulation that doesn't include the routing (if somehow I'm forgetting that it's not, honestly I think that's a flaw). Assuming that's right, I see that as very similar: effectively, letting the user choose whether to run extended functionality with results of the main simulation. Setting different output roots doesn't seem unreasonable, but I think that's a totally different scope. It would neither inhibit nor be inhibited by this change. A compile time option might also have merit - that is, make it possible to compile ngen such that catchment output cannot_ be written, even if the user want to - but I also think that's out of scope. And I don't think that is a sufficient replacement for giving the user control. |
|
@robertbartel I took a small liberty of pushing a commit to your branch which creates a "NullStream" using a boost null sink stream. This has the benefit of not incurring any system calls to try write to "/dev/null" but has the same effect. This commit establishes the null stream as the "default" and only uses a file stream when explicitly set via set_output_stream with a file path. Edit: Don't mind the force push, I forgot to stage the |
5c95d17 to
33aacc7
Compare
|
I need to double check some code patches to see how far off from this the aggregation mechanics I implemented are, but IIRC it touches some very similar code, and if we are interested in adopting this an an option for writing catchmebnt output, would it make sense to extend this configuration mechanism to enumerate the options, e.g. NO_OUTPUT, AGGREGATE, ALL? |
No worries, @hellkite500, although I don't fully understand things for this part of the change: diff --git a/src/NGen.cpp b/src/NGen.cpp
index 27f54f2e13..8c67242fed 100644
--- a/src/NGen.cpp
+++ b/src/NGen.cpp
@@ -401,7 +401,7 @@ int main(int argc, char *argv[]) {
nexus_collection->update_ids("id");
std::cout<<"Initializing formulations" << std::endl;
std::shared_ptr<realization::Formulation_Manager> manager = std::make_shared<realization::Formulation_Manager>(REALIZATION_CONFIG_PATH);
- manager->read(catchment_collection, utils::getStdOut());
+ manager->read(catchment_collection, utils::StreamHandler::getNullStream());
//TODO refactor manager->read so certain configs can be queried before the entire
//realization collection is createdThis seems to go beyond both the new configurable control mechanism and the implementation details behind it. It kind of feels like something that could have been done locally for convenience and included with things unintentionally. If it is just something to reduce what goes to stdout, I think it might be better to condition on a macro - probably Or, I might just not be following along with what you have in mind. |
|
The formulation manager constructs the formulations and passes this stream for the "default" output steam. Later, the stream gets set to a file stream handler and the stdout steam isn't used. Replacing this here ensures that output defaults to null unless explicitly set during the feature iteration later. The stdout here was effectively never used anyways, but it existed to allow construction with some steam. |
Thanks for the clarification. I follow your rationale now, but it does seem like this stream passed to Those can wait, but it appears like they will cause this PR's change to NGen.cpp to have unintended and/or undesirable side effects. I suggest we reverse the change to NGen.cpp. If you really want to keep it, I can take it out, put it back, and then squash the former commit into your other one so that at least it's separated in the commit history. |
|
Then we probably need to look closer at the formulation construction and change it to use the default constructor chain instead of passing the output_stream around just to clobber it later. The main goal here was to simplify the logic and reduce the "else" branch that was setting to "/dev/null" by trying to use the null stream by default unless a proper file stream was setup via "set_output_stream". I guess the best way there would be to revert this change in main and supply the null stream as the arg in the construct_formulation function? Or explicitly replace the forcing provider arg with the stdout steam, either/or... Thinking out loud here until I can look now closely at the code. Feel free to mod this however you see fit! |
1500202 to
5e81515
Compare
5e81515 to
138f204
Compare
|
For others' benefit, Nels and I reviewed and discussed further, and it looks like the unintended side effects from his changes to NGen.cpp were limited to an eventual call to |
@robertbartel, ahhh I'd forgotten about that! Good call. I guess I still think of them a little differently. Im thinking about this in terms of information loss. By not running routing you can still run routing later. But by turning off catchment outputs, you are losing information. |
Refactoring newly-added 'write_catchment_output' option to be 'disable_catchment_output' instead, modifying the related functions, tests, utilizing code, and documentation appropriately. Also, tweaking the getter within the Formulation_Manager to simplify and be more appropriate for an optional bool config value.
Update tests to reflect change from 'write_catchment_output' option to 'disable_catchment_output'.
|
my original wording was a bit backwards, these changes all look much better :)
I tested it on the original changes I made with the python lstm by adding this def get_value_ptr(self, name: str) -> np.ndarray:
for n,unit in _output_vars:
if n == name:
print(f"fetching {name} at {self._timestep}")
"""Returns a _reference_ to a variable's np.NDArray."""
return first_containing(name, self._outputs, self._dynamic_inputs).value(name)and running a day for a single catchment, it looks like it's still fetching the unused values outputI'm probably misusing bmi-multi adding to some of these duplicates but I'd guess they could be reduce to one per timestep? fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_volume_flux at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 1
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_volume_flux at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2
fetching land_surface_water__runoff_depth at 2realization{
"global": {
"formulations": [
{
"name": "bmi_multi",
"params": {
"name": "bmi_multi",
"model_type_name": "lstm",
"forcing_file": "",
"init_config": "",
"allow_exceed_end_time": true,
"main_output_variable": "land_surface_water__runoff_depth",
"modules": [
{
"name": "bmi_python",
"params": {
"name": "bmi_python",
"python_type": "lstm.bmi_lstm.bmi_LSTM",
"model_type_name": "bmi_lstm",
"init_config": "./config/cat_config/lstm/{{id}}.yml",
"allow_exceed_end_time": true,
"main_output_variable": "land_surface_water__runoff_depth",
"uses_forcing_file": false
}
}
]
}
}
],
"forcing": {
"path": "./forcings/forcings.nc",
"provider": "NetCDF",
"enable_cache": false
}
},
"time": {
"start_time": "2010-01-01 00:00:00",
"end_time": "2010-01-02 00:00:00",
"output_interval": 3600
},
"no_routing": {
"t_route_config_file_with_path": "./config/troute.yaml"
},
"output_root": "./outputs/ngen",
"write_catchment_output": false
} |
| auto formulation = formulations->get_formulation(feat_id); | ||
| formulation->set_output_stream(formulations->get_output_root() + feat_id + ".csv"); | ||
|
|
||
| if (formulations->is_disable_catchment_output()) { |
There was a problem hiding this comment.
| if (formulations->is_disable_catchment_output()) { | |
| if ( ! formulations->is_disable_catchment_output()) { |
There was a problem hiding this comment.
This should be negated given the change of semantics to "disable" vs "enable"?
There was a problem hiding this comment.
Which also makes me wonder why the tests pass, because if I'm understanding this correctly, catchment csv's would only exist when the disable_cathment_output is true in the config.
There was a problem hiding this comment.
I've never regretted adding a test, but have often regretted not ...
In this case, I did not add tests specifically for the changes to HY_Features or HY_Features_MPI. Changes were straightforward enough that it did not seem absolutely necessary, and unfortunately these classes are not super easy to add tests to. Right now they don't have test class infrastructure already created. I.e., that looked like a bigger lift that was otherwise easy to compensate for not doing. Or so I thought ...
We have a few options:
- I won't object if anyone insists on tests being added before this gets merged; just know it'll delay things.
- A compromise might be to add a complete test class and test cases - including for the changes made here - as a subsequent high-priority task, but don't required that in this PR
- Now that I'm not being stupid, this is simple enough to visually check and (as I did before) manually test to confirm things, which I can do before this gets approved
There was a problem hiding this comment.
As long as you fix the HY_Features_MPI calls as well, I'm good with just a confirmation that this is the expected behavior for now =)
There was a problem hiding this comment.
In this case, I did not add tests specifically for the changes to HY_Features.
You caught me here, I haven't reviewed the test code added by this PR myself to know what to expect or not expect. So much context switching! But I think creating a high priority issue for Hy_Features test coverage is a reasonable compromise.
@JoshCu Effectively, yes. They get queried here and then written via the |
FWIW, I haven't tested to confirm, but looking quickly through the code, I think there is a way to prevent this via available config options. For BMI module formulations, the list of variables to include in the catchment output is configurable via the optional param |
Adding config option
disable_catchment_outputto control catchment output writing, which isfalseby default. This was cherry-picked from the changes proposed previously in #897.Additionally, making updates to associated documentation on realization config structure (both directly related to new option and to improve structure around where that addition was made).
Finally, adding some unit tests for the function reading this setting from the config file in the Formulation_Manager class.