@megies
Member

@megies megies commented Mar 11, 2025

What does this PR do?

Try out a way to handle the two ways of identifying stations/channels. The old way is a combination of network, station, location and channel code ("NSLC", sometimes also referred to as "SNLC"), often following the definitions of the SEED 2.4 standard (which mostly restricts the length of each code part and, most prominently, defines the "channel" code as exactly three well-defined characters). The new way is the FDSN Source Identifier, used in MiniSEED v3, comprised of namespace, network, station, location, band, source and subsource codes (which imposes less strict restrictions on the lengths of individual codes and, while encouraging one character per band/source/subsource code, does not strictly require it).

One potential way to do this cleanly is to have two subclasses of Stats. The aim is to make it behave almost like before, so that it stays "easy" for most users with normal experience, while under the hood being very clear about the two different systems. Clear exceptions are raised in scenarios with IDs that cannot be mapped without ambiguity (e.g. an NSLC-type channel code with fewer than 3 characters), so that the behavior is flexible and well defined for more experienced "power" users.

This PR is not fully done yet: it still has some bugs and is still missing the logic for mapping back and forth, with exceptions defined for ambiguous mapping actions. But I'd like to hear some feedback, since I feel like we need some change/addition to Stats before we can release a new major version with MiniSEED v3 support (from simplemseed or mseedlib or both).

The way to do this is a little bit hacky under the hood: it needs overriding Stats.__new__ so that Stats() actually returns an instance of a subclass of Stats, thus acting as an instance factory for its subclasses while still being able to act as a parent class. This design pattern is encountered very rarely in Python, which to me seems the main negative of the approach right now.
Other than that, existing code should keep most of its behavior intact. The object most people will interact with is of a different type (NSLCStats), but it still has Stats as a base class, so isinstance(..., Stats) checks will still have the same result. Only checks on type(trace.stats) would have a different result, but that is a weird and very specific check to do in the first place.

>>> from obspy.core import Stats
>>> header_nslc = {'network': 'BW', 'station': 'MANZ', 'location': '', 'channel': 'BHZ'}
>>> stats = Stats(header=header_nslc)
>>> print(stats)
         network: BW
         station: MANZ
        location: 
         channel: BHZ
       starttime: 1970-01-01T00:00:00.000000Z
         endtime: 1970-01-01T00:00:00.000000Z
   sampling_rate: 1.0
           delta: 1.0
            npts: 0
           calib: 1.0
>>> print(isinstance(stats, Stats))
True
>>> print(type(stats))
<class 'obspy.core.trace.NSLCStats'>
>>> from obspy.core import Stats
>>> header_sid = {'namespace': 'FDSN', 'network': 'BW', 'station': 'MANZ', 'location': '',
...               'band': 'B', 'source': 'H', 'subsource': 'Z'}
>>> stats = Stats(header=header_sid)
>>> print(stats)
       namespace: FDSN
         network: BW
         station: MANZ
        location: 
            band: B
          source: H
       subsource: Z
       starttime: 1970-01-01T00:00:00.000000Z
         endtime: 1970-01-01T00:00:00.000000Z
   sampling_rate: 1.0
           delta: 1.0
            npts: 0
           calib: 1.0
>>> print(isinstance(stats, Stats))
True
>>> print(type(stats))
<class 'obspy.core.trace.SourceIdentifierStats'>
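
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a parent class whose `__new__` acts as a factory for its subclasses. It is not the code in this PR: the toy classes only reuse the names from the examples above, and the dispatch rule (pick the Source Identifier subclass whenever one of its specific keys is present) is an assumption made for illustration.

```python
class Stats(dict):
    """Toy stand-in for obspy's Stats; only illustrates the factory idea."""

    # Keys assumed (for this sketch) to mark a Source Identifier header.
    _SID_KEYS = {'namespace', 'band', 'source', 'subsource'}

    def __new__(cls, header=None):
        # Only dispatch when the parent class itself is called; calling a
        # subclass directly keeps the normal instantiation behavior.
        if cls is Stats:
            if Stats._SID_KEYS & set(header or {}):
                cls = SourceIdentifierStats
            else:
                cls = NSLCStats
        return super().__new__(cls)

    def __init__(self, header=None):
        # Python still calls __init__ here, because the object returned by
        # __new__ is an instance of (a subclass of) Stats.
        super().__init__(header or {})


class NSLCStats(Stats):
    """Header using network/station/location/channel."""


class SourceIdentifierStats(Stats):
    """Header using namespace/network/station/location/band/source/subsource."""


nslc = Stats(header={'network': 'BW', 'station': 'MANZ', 'channel': 'BHZ'})
sid = Stats(header={'namespace': 'FDSN', 'network': 'BW', 'station': 'MANZ',
                    'band': 'B', 'source': 'H', 'subsource': 'Z'})
print(type(nslc).__name__, isinstance(nslc, Stats))
print(type(sid).__name__, isinstance(sid, Stats))
```

The `cls is Stats` guard is what lets the class double as a parent class: subclasses can still be instantiated directly without being redirected.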

Why was it initiated? Any relevant Issues?

The two systems (NSLC vs. Source Identifier) are not fully compatible, and in both directions there can be cases where a mapping cannot be made. This means we need some kind of logic to handle data coming in from NSLC-based data formats as well as Source-Identifier-based data formats (MiniSEED v3). In some cases, reading one format and writing to another will not be possible with a fully automatic mapping, so we want well-defined exceptions to be raised, so that users can work around issues in well-defined ways.

This is just one possible way to do it, and it would be good to discuss other options.

The main consideration is that our Stats object is currently based on NSLC, with one field for "channel". If we leave it like that while integrating MiniSEED v3 support, we would have to map from Source Identifier to NSLC right when reading data. That can be problematic and ambiguous, and can cause additional problems in the reverse mapping, even when writing the same data back to MiniSEED v3. So it feels like the better way is to make Stats Source-Identifier-ready right away and only do mappings back or forth on the fly whenever needed and not earlier, so that mapping problems only pop up when there really is a need to map.
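
To make the "map only when needed, fail loudly when ambiguous" idea concrete, here is a rough sketch of what on-the-fly channel mapping could look like. The function names and the `AmbiguousChannelMappingError` exception are hypothetical and not part of this PR or of ObsPy. The rule that only exactly three-character channel codes and exactly one-character band/source/subsource codes map cleanly is a simplifying assumption for this example.

```python
class AmbiguousChannelMappingError(ValueError):
    """Hypothetical exception for IDs that cannot be mapped unambiguously."""


def nslc_channel_to_sid(channel):
    """Split an NSLC channel code into (band, source, subsource)."""
    if len(channel) != 3:
        # E.g. 'BH' could mean band+source or source+subsource.
        raise AmbiguousChannelMappingError(
            f"cannot split channel code {channel!r} into "
            "band/source/subsource without ambiguity")
    band, source, subsource = channel
    return band, source, subsource


def sid_to_nslc_channel(band, source, subsource):
    """Join band/source/subsource codes into an NSLC channel code."""
    parts = (band, source, subsource)
    if any(len(part) != 1 for part in parts):
        # Longer (or empty) codes do not fit the three-character channel.
        raise AmbiguousChannelMappingError(
            f"cannot express band/source/subsource {parts!r} as a "
            "three-character channel code")
    return ''.join(parts)


print(nslc_channel_to_sid('BHZ'))
print(sid_to_nslc_channel('B', 'H', 'Z'))
try:
    nslc_channel_to_sid('BH')
except AmbiguousChannelMappingError as exc:
    print('mapping refused:', exc)
```

Because the mapping functions are only called when a conversion is actually requested (for example when writing to a format of the other kind), data that is read and written within one system never hits these exceptions.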

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • While the PR is still work-in-progress, the no_ci label can be added to skip CI builds
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just add the build_docs tag to this PR.
    Docs will be served at docs.obspy.org/pr/{branch_name} (do not use master branch).
    Please post a link to the relevant piece of documentation.
  • If all tests including network modules (e.g. clients.fdsn) should be tested for the PR,
    just add the test_network tag to this PR.
  • All tests still pass.
  • Any new features or fixed regressions are covered via new tests.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .
  • If the changes affect any plotting functions you have checked that the plots
    from all the CI builds look correct. Add the "upload_plots" tag so that plotting
    outputs are attached as artifacts.
  • New modules, add the module to CODEOWNERS with your github handle
  • Add the yellow ready for review label when you are ready for the PR to be reviewed.

@megies megies added the .core issues affecting our functionality at the very core label Mar 11, 2025
@megies megies added this to the 1.5.0 milestone Mar 11, 2025
@megies megies marked this pull request as draft March 12, 2025 08:53
@megies
Member Author

megies commented Mar 14, 2025

Actually, maybe a better and more lightweight approach with fewer changes would be to create a ChannelCode object and use that in the existing Stats, instead of subclassing Stats.

Edit: hmm, users use stats.channel as a string, though, so maybe make it into a property.
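
A minimal sketch of that alternative, assuming a hypothetical `ChannelCode` class and a toy `ToyStats` stand-in (neither exists in ObsPy in this form): the structured code is stored internally, while `channel` remains a plain string for existing user code.

```python
class ChannelCode:
    """Hypothetical container for band, source and subsource codes."""

    def __init__(self, band, source, subsource):
        self.band = band
        self.source = source
        self.subsource = subsource

    @classmethod
    def from_string(cls, channel):
        # Assumption for this sketch: only three-character codes are split.
        if len(channel) != 3:
            raise ValueError(f"cannot split channel code {channel!r}")
        return cls(*channel)

    def __str__(self):
        return self.band + self.source + self.subsource


class ToyStats:
    """Toy stand-in for Stats exposing `channel` as a string property."""

    def __init__(self, channel):
        self.channel = channel  # goes through the property setter below

    @property
    def channel(self):
        # Users keep getting a plain string, as before.
        return str(self.channel_code)

    @channel.setter
    def channel(self, value):
        self.channel_code = ChannelCode.from_string(value)


stats = ToyStats('BHZ')
print(stats.channel, stats.channel_code.band)
stats.channel = 'HHN'
print(stats.channel, stats.channel_code.subsource)
```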

@megies
Member Author

megies commented Mar 28, 2025

As this PR becomes more of a favorite for me, I've added more details and improved the original description for this PR.

@ThomasLecocq ThomasLecocq modified the milestones: 1.5.0, 1.6.0 Oct 31, 2025