StreamCat date string handling in nhdplus_derived

### What happened?

I was following the workflow for computing hydrologic signatures in the River Discharge example of the HyRiver docs, working in a Jupyter notebook in VS Code. I used the example code verbatim except for my own bounding box coordinates and dates (only the changed lines are shown below).

dates = ("2000-10-01", "2011-09-30")
bbox = (-115.63, 43.94, -114.96, 44.35)

qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)

The call to `nwis.get_streamflow()` fails with this error: `ValueError: invalid literal for int() with base 10: '1990:2017'`


### What did you expect to happen?

I expected a plot of hydrologic signatures for the specified station and date range.

### Minimal Complete Verifiable Example

```Python
from pygeohydro import NWIS, plot


dates = ("2000-10-01", "2011-09-30")
bbox = (-115.63, 43.94, -114.96, 44.35)
nwis = NWIS()
query = {
    "bBox": ",".join(f"{b:.06f}" for b in bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)

stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()

query = {
    "site": ",".join(stations),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info = nwis.get_info(query, expanded=True)
info.set_index("site_no").hcdn_2009

qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)
```

### MVCE confirmation

- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

```Python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 qobs = nwis.get_streamflow(stations, dates, mmd=True)
      2 plot.signatures(qobs)

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pygeohydro/nwis.py:759, in NWIS.get_streamflow(cls, station_ids, dates, freq, mmd, to_xarray)
    757 siteinfo = siteinfo[siteinfo.site_no.isin(sids)]
    758 if mmd:
--> 759     area_sqm = cls._drainage_area_sqm(siteinfo, freq)
    760     ms2mmd = 1000.0 * 24.0 * 3600.0
    761     try:

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pygeohydro/nwis.py:537, in NWIS._drainage_area_sqm(cls, siteinfo, freq)
    535 """Get drainage area of the stations."""
    536 if "nhd_areasqkm" not in siteinfo:
--> 537     area = cls._nhd_info(siteinfo["site_no"].to_list())
    538     area = area[["site_no", "nhd_areasqkm"]].copy()
    539 else:

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pygeohydro/nwis.py:301, in NWIS._nhd_info(site_ids)
    299 except (TypeError, IntCastingNaNError):
    300     area["comid"] = area["comid"].astype("Int32")
--> 301 nhd_area = pynhd.streamcat("fert", comids=area["comid"].dropna().to_list(), area_sqkm=True)
    302 area = area.merge(
    303     nhd_area[["comid", "wsareasqkm"]], left_on="comid", right_on="comid", how="left"
    304 )
    305 area["identifier"] = area["identifier"].str.replace("USGS-", "")

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pynhd/nhdplus_derived.py:726, in streamcat(metric_names, metric_areas, comids, regions, states, counties, conus, percent_full, area_sqkm, lakes_only)
    724 if metric_names is None:
    725     return StreamCat().metrics_df
--> 726 sc = StreamCatValidator(lakes_only)
    727 names = [metric_names] if isinstance(metric_names, str) else metric_names
    728 sc.validate(name=names)

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pynhd/nhdplus_derived.py:586, in StreamCatValidator.__init__(self, lakes_only)
    585 def __init__(self, lakes_only: bool = False) -> None:
--> 586     super().__init__(lakes_only)

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pynhd/nhdplus_derived.py:576, in StreamCat.__init__(self, lakes_only)
    573 self.metrics_df = names
    575 years = names.set_index("METRIC_NAME").YEAR.dropna()
--> 576 self.valid_years = {
    577     str(v): list(range(*(int(y) for y in yrs.split("-"))))
    578     if "-" in yrs
    579     else [int(y) for y in yrs.split(",")]
    580     for v, yrs in years.items()
    581 }

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pynhd/nhdplus_derived.py:579, in <dictcomp>(.0)
    573 self.metrics_df = names
    575 years = names.set_index("METRIC_NAME").YEAR.dropna()
    576 self.valid_years = {
    577     str(v): list(range(*(int(y) for y in yrs.split("-"))))
    578     if "-" in yrs
--> 579     else [int(y) for y in yrs.split(",")]
    580     for v, yrs in years.items()
    581 }

File ~/miniconda/envs/geos505/lib/python3.11/site-packages/pynhd/nhdplus_derived.py:579, in <listcomp>(.0)
    573 self.metrics_df = names
    575 years = names.set_index("METRIC_NAME").YEAR.dropna()
    576 self.valid_years = {
    577     str(v): list(range(*(int(y) for y in yrs.split("-"))))
    578     if "-" in yrs
--> 579     else [int(y) for y in yrs.split(",")]
    580     for v, yrs in years.items()
    581 }

ValueError: invalid literal for int() with base 10: '1990:2017'
```

### Anything else we need to know?

This is the explanation of the error given by Copilot:

The error `ValueError: invalid literal for int() with base 10: '1990:2017'` occurs because the code in the pynhd library (specifically in `nhdplus_derived.py`) is trying to parse a year string from StreamCat data, but the string `'1990:2017'` contains a colon (`:`) instead of the expected dash (`-`) for ranges or comma (`,`) for lists.

Here's what's happening step-by-step:

When you call `nwis.get_streamflow(stations, dates, mmd=True)`, the `mmd=True` parameter requests discharge converted to mm/day (the traceback shows the `ms2mmd` conversion factor right after the failing call), and that conversion requires drainage area data.

To get the drainage area, the code calls `pynhd.streamcat("fert", ...)` to retrieve StreamCat metrics.

Inside `StreamCat.__init__()`, the code parses the `YEAR` column from the metrics data to build `valid_years`. It expects year ranges to use dashes (e.g., `"1990-2017"`) or comma-separated lists (e.g., `"1990,2017"`).

The string `'1990:2017'` doesn't contain a dash, so the code falls back to splitting by commas and converting each part to an integer. Since `'1990:2017'` splits to `['1990:2017']`, `int('1990:2017')` fails because the colon isn't a valid character in an integer literal.
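The failure can be reproduced without any network access. The sketch below copies the dictionary comprehension from the traceback into a standalone function (`parse_years` is a hypothetical name used only here) and feeds it hand-made input. The metric names and the two well-formed year strings are made up for illustration; only `'1990:2017'` comes from the error message.

```python
def parse_years(years):
    """Standalone copy of the comprehension shown in the traceback."""
    return {
        str(v): list(range(*(int(y) for y in yrs.split("-"))))
        if "-" in yrs
        else [int(y) for y in yrs.split(",")]
        for v, yrs in years.items()
    }


# Hypothetical metric names and values: a dash range and a comma list both parse.
ok = parse_years({"metric_a": "2001-2004", "metric_b": "2006,2011"})
print(ok)

# The colon-delimited value from the error message takes the comma branch and fails.
try:
    parse_years({"metric_c": "1990:2017"})
except ValueError as err:
    message = str(err)
    print(message)
```

Note that the dash branch uses `range`, so the end year of a range is excluded as written in the traceback.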

This appears to be a formatting issue in the StreamCat metrics metadata, where a year range is delimited with a colon instead of a dash. The year parsing in pynhd 0.19.4 doesn't handle this case.
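One possible direction for a fix is a parser that accepts either delimiter. The following is a minimal sketch, not the maintainers' implementation: `parse_year_spec` is a hypothetical helper, and treating `:` as a range delimiter equivalent to `-` is an assumption based only on the value in the error message. It keeps the same half-open `range` behavior as the comprehension in the traceback; whether the end year should be included is a separate question.

```python
import re


def parse_year_spec(spec):
    """Parse '2001-2004', '1990:2017', or '2006,2011' into a list of ints."""
    years = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # Assumption: both '-' and ':' delimit a start/end pair.
        bounds = [int(b) for b in re.split(r"[-:]", part)]
        if len(bounds) == 1:
            years.append(bounds[0])
        elif len(bounds) == 2:
            # Half-open range, matching the comprehension in the traceback.
            years.extend(range(bounds[0], bounds[1]))
        else:
            raise ValueError(f"Unrecognized year specification: {spec!r}")
    return years


colon_years = parse_year_spec("1990:2017")
dash_years = parse_year_spec("2001-2004")
list_years = parse_year_spec("2006,2011")
print(colon_years[0], colon_years[-1], dash_years, list_years)
```

Raising on anything with more than two bounds keeps unexpected formats visible instead of silently guessing at their meaning.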

### Environment

<details>

SYS INFO
--------
commit: None
python: 3.11.14 | packaged by conda-forge | (main, Oct 22 2025, 22:53:07) [Clang 19.1.7 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')

PACKAGE                VERSION
-------------------------------
async-retriever        0.19.3
pygeoogc               0.19.4
pygeoutils             0.19.5
py3dep                 0.19.0
pynhd                  0.19.4
pygridmet              N/A
pydaymet               N/A
hydrosignatures        0.19.3
pynldas2               N/A
pygeohydro             0.19.4
tiny-retriever         N/A
aiodns                 3.0.0
aiofiles               25.1.0
aiohttp                3.13.2
aiohttp-client-cache   0.14.1
aiosqlite              0.21.0
brotli                 1.1.0
cytoolz                1.1.0
orjson                 3.11.4
numpy                  2.3.5
pandas                 2.3.3
scipy                  1.16.3
xarray                 2025.12.0
numba                  N/A
numbagg                N/A
click                  8.3.0
geopandas              1.1.1
rasterio               1.4.3
rioxarray              0.19.0
shapely                2.1.2
netcdf4                1.7.3
pyproj                 3.7.2
defusedxml             0.7.1
folium                 0.20.0
h5netcdf               1.7.2
matplotlib             3.10.8
planetary-computer     N/A
pystac-client          N/A
joblib                 1.5.2
multidict              6.6.3
owslib                 0.34.1
requests               2.32.5
requests-cache         1.2.1
typing-extensions      4.15.0
url-normalize          2.2.1
urllib3                2.5.0
yarl                   1.22.0
networkx               3.5
pyarrow                21.0.0
py7zr                  N/A
flox                   N/A
opt-einsum             N/A
-------------------------------
None


</details>

