Stats: subclasses for NSLC and Station Identifier #3550
What does this PR do?
This PR tries out a way to handle the two ways of identifying stations/channels. The old way is a combination of network, station, location and channel code ("NSLC", sometimes also referred to as "SNLC"), often following the definitions of the SEED 2.4 standard (which mostly restrict the length of each code part and, most prominently, define the "channel" code as exactly three well-defined characters). The new way is the FDSN Source Identifier, used in MiniSEED v3, which is composed of namespace, network, station, location, band, source and subsource codes. It imposes less strict restrictions on the lengths of the individual codes and, while it kind of promotes sticking to one character per band/source/subsource code, it does not strictly require one character each.
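To make the two identifier styles concrete, here is a minimal sketch that splits each form into its parts. It is not code from this PR: the names `NSLC`, `SourceId`, `parse_nslc` and `parse_source_id` are hypothetical, and it assumes the usual dot-separated `NET.STA.LOC.CHA` string for NSLC and the `FDSN:NET_STA_LOC_BAND_SOURCE_SUBSOURCE` string form for Source Identifiers.

```python
from typing import NamedTuple


class NSLC(NamedTuple):
    """Hypothetical container for the old-style identifier."""
    network: str
    station: str
    location: str
    channel: str


class SourceId(NamedTuple):
    """Hypothetical container for an FDSN Source Identifier."""
    namespace: str
    network: str
    station: str
    location: str
    band: str
    source: str
    subsource: str


def parse_nslc(text):
    # Assumes the dot-separated form, e.g. "IU.ANMO.00.BHZ"
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated codes, got {text!r}")
    return NSLC(*parts)


def parse_source_id(text):
    # Assumes the form "FDSN:NET_STA_LOC_BAND_SOURCE_SUBSOURCE"
    namespace, sep, rest = text.partition(":")
    parts = rest.split("_")
    if not sep or len(parts) != 6:
        raise ValueError(f"not a valid source identifier: {text!r}")
    return SourceId(namespace, *parts)


old = parse_nslc("IU.ANMO.00.BHZ")
new = parse_source_id("FDSN:IU_ANMO_00_B_H_Z")
```

Note that the NSLC form carries one `channel` field, while the Source Identifier keeps band, source and subsource as three separate codes.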
One potential way to do this cleanly is to have two subclasses of `Stats`. The aim is to make it behave almost like before, so that it stays "easy" for most users with normal experience, while under the hood being very clear about the two different systems. Clear exceptions are raised in scenarios with IDs that cannot be mapped without ambiguity (e.g. an NSLC-type channel code with fewer than 3 characters), so that it is flexible and well defined for more experienced "power" users.
This PR is not fully done yet: it still has some bugs and is still missing the logic for mapping back and forth, with exceptions defined for ambiguous mapping actions. But I'd like to hear some feedback, since I feel like we kind of need some change/addition to `Stats` before we can release a new major version with MiniSEED v3 support (from simplemseed or mseedlib or both).
The way to do this is a little bit hacky under the hood: it needs overriding `Stats.__new__` so that `Stats()` actually returns an instance of a subclass of `Stats`, thus acting like an instance factory for its subclasses while still being able to act as a parent class. This design pattern is encountered very rarely in Python, which to me seems the main negative of the approach right now.
Other than that, existing code should keep most of its behavior intact. The object most people will interact with is of a different type (`NSLCStats`), but it still has `Stats` as a base class, so `isinstance(..., Stats)` checks will still have the same result. Only checks on `type(trace.stats)` would have a different result, but that is a bit of a weird and very specific check to do in the first place.
Why was it initiated? Any relevant Issues?
The two systems (NSLC vs. Source Identifier) are not fully compatible, and in both directions there can be cases where a mapping cannot be made. This means we need some kind of logic to handle data coming in from NSLC-based data formats as well as from Source-Identifier-based data formats (MiniSEED v3). In some cases reading one format and writing to another will not be possible with a fully automatic mapping, so we want well-defined exceptions to be raised so that users can work around issues in well-defined ways.
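As a rough illustration of what "well-defined exceptions" could look like, the sketch below maps between a three-character channel code and band/source/subsource codes. It is a simplification and not the logic of this PR (which is still missing): `AmbiguousMappingError`, `channel_to_codes` and `codes_to_channel` are hypothetical names, and the rule "exactly one character per code" is an assumption made for the example.

```python
class AmbiguousMappingError(ValueError):
    """Hypothetical exception for IDs that cannot be mapped unambiguously."""


def channel_to_codes(channel):
    # NSLC -> Source Identifier: only a 3-character channel code splits
    # unambiguously into band, source and subsource (assumed rule).
    if len(channel) != 3:
        raise AmbiguousMappingError(
            f"cannot split channel {channel!r} into band/source/subsource")
    band, source, subsource = channel
    return band, source, subsource


def codes_to_channel(band, source, subsource):
    # Source Identifier -> NSLC: refuse codes that are not exactly one
    # character each, since they do not fit a 3-character channel code.
    codes = (band, source, subsource)
    if any(len(code) != 1 for code in codes):
        raise AmbiguousMappingError(
            f"codes {codes!r} do not fit a 3-character channel code")
    return "".join(codes)


codes = channel_to_codes("BHZ")
channel = codes_to_channel("B", "H", "Z")
```

With this shape, a failing conversion surfaces only at the point where a mapping is actually requested, and the caller gets a single exception type to catch.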
This is just one possible way to do it, and it would be good to discuss other options.
The main consideration is that our `Stats` object is currently based on NSLC, with one field for "channel". If we leave it like that while integrating MiniSEED v3 support, we would have to map from Source Identifier to NSLC right when reading data. That can be problematic and ambiguous, and it can cause additional problems in the reverse mapping, even when just writing the same data back to MiniSEED v3. So it feels like the better way is to make `Stats` Source-Identifier-ready right away and only do mappings back and forth on the fly whenever needed and not earlier, so that mapping problems only pop up when there really is a need to map.
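For discussion, the `Stats.__new__` factory idea described under "What does this PR do?" can be sketched roughly as follows. This is a stripped-down stand-in built on `dict`, not the actual ObsPy classes: `SourceIdStats` is a hypothetical name for the second subclass, and the rule used to pick a subclass (presence of a `"band"` key) is purely an assumption for the example.

```python
class Stats(dict):
    """Simplified stand-in for the parent class, acting as a factory."""

    def __new__(cls, header=None):
        if cls is Stats:
            # Called as Stats(...): hand back an instance of a subclass.
            # The selection rule here is an assumption for illustration.
            header = header or {}
            target = SourceIdStats if "band" in header else NSLCStats
            return super().__new__(target)
        # Called on a subclass directly: regular construction.
        return super().__new__(cls)

    def __init__(self, header=None):
        super().__init__(header or {})


class NSLCStats(Stats):
    """Stats keyed on network/station/location/channel."""


class SourceIdStats(Stats):
    """Hypothetical Stats keyed on band/source/subsource codes."""


old_style = Stats({"network": "IU", "channel": "BHZ"})
new_style = Stats({"network": "IU", "band": "B", "source": "H",
                   "subsource": "Z"})

# isinstance checks keep working, only exact type checks change.
print(isinstance(old_style, Stats), type(old_style).__name__)
print(isinstance(new_style, Stats), type(new_style).__name__)
```

Because the returned object is still an instance of `Stats`, Python runs `__init__` on it as usual, so the header ends up in the object without extra work.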
PR Checklist
`master` for new features, `maintenance_...` for bug fixes
`no_ci` label can be added to skip CI builds
Just add the `build_docs` tag to this PR. Docs will be served at docs.obspy.org/pr/{branch_name} (do not use master branch). Please post a link to the relevant piece of documentation.
`clients.fdsn`) should be tested for the PR, just add the `test_network` tag to this PR.
`CHANGELOG.txt`.
`CONTRIBUTORS.txt`.
from all the CI builds look correct. Add the "upload_plots" tag so that plotting outputs are attached as artifacts.
`CODEOWNERS` with your github handle
`ready for review` label when you are ready for the PR to be reviewed.