Stats: subclasses for NSLC and Station Identifier #3550

megies · 2025-03-11T13:06:27Z

What does this PR do?

Try out a way to handle the two ways of identifying stations/channels. The old way is a combination of network, station, location and channel code ("NSLC", sometimes also referred to as "SNLC"), often following the definitions of the SEED 2.4 standard, which mostly restricts the length of each code part and, most prominently, defines the "channel" code as exactly three well-defined characters. The new way is the FDSN Source Identifier, used in MiniSEED v3, which comprises namespace, network, station, location, band, source and subsource codes. It imposes less strict length restrictions on the individual codes, and while it promotes sticking to one character each for the band, source and subsource codes, it does not strictly require it.

One potentially clean way to do this is to have two subclasses of Stats. The aim is to make it behave almost like before, so that it stays "easy" for most users, while under the hood keeping the two systems clearly separate. Clear exceptions would be raised for IDs that cannot be mapped without ambiguity (e.g. an NSLC-type channel code with fewer than 3 characters), so that the behavior is flexible and well defined for more experienced "power" users.
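To illustrate the kind of "clear exception" meant here, the following is a minimal sketch of splitting an NSLC channel code into band, source and subsource codes. The names `AmbiguousChannelError` and `split_nslc_channel` are hypothetical and not part of this PR or of ObsPy; the only assumption taken from the description is that a channel code that is not exactly three characters long cannot be mapped unambiguously.

```python
class AmbiguousChannelError(ValueError):
    """Hypothetical exception for channel codes that cannot be split."""


def split_nslc_channel(channel):
    """
    Split a three-character NSLC channel code into
    (band, source, subsource), one character each.
    """
    if len(channel) != 3:
        # Fewer or more than three characters: there is no unique way to
        # assign characters to band/source/subsource, so refuse to guess.
        raise AmbiguousChannelError(
            f"cannot split channel code {channel!r} into band/source/"
            f"subsource: expected exactly 3 characters, got {len(channel)}")
    band, source, subsource = channel
    return band, source, subsource


print(split_nslc_channel("BHZ"))
try:
    split_nslc_channel("BH")
except AmbiguousChannelError as e:
    print("error:", e)
```

Subclassing `ValueError` here is just one option; it would let existing `except ValueError` handlers keep working while still allowing a more specific catch.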

This PR is not fully done yet: it still has some bugs and is still missing the logic for mapping back and forth, with exceptions defined for ambiguous mappings. But I'd like to hear some feedback, since I feel like we need some change/addition to Stats before we can release a new major version with MiniSEED v3 support (from simplemseed or mseedlib or both).

The way to do this is a little bit hacky under the hood: it requires overriding Stats.__new__ so that Stats() actually returns an instance of a subclass of Stats, thus acting as an instance factory for its subclasses while still being able to act as a parent class. This design pattern is encountered very rarely in Python, which to me seems the main downside of the approach right now.
Other than that, existing code should keep most of its behavior intact. The object most people will interact with is of a different type (NSLCStats), but it still has Stats as a base class, so isinstance(..., Stats) checks will still give the same result. Only checks on type(trace.stats) would give a different result, but that is a bit of a weird and very specific check to do in the first place.

>>> from obspy.core import Stats
>>> header_nslc = {'network': 'BW', 'station': 'MANZ', 'location': '', 'channel': 'BHZ'}
>>> stats = Stats(header=header_nslc)
>>> print(stats)
         network: BW
         station: MANZ
        location: 
         channel: BHZ
       starttime: 1970-01-01T00:00:00.000000Z
         endtime: 1970-01-01T00:00:00.000000Z
   sampling_rate: 1.0
           delta: 1.0
            npts: 0
           calib: 1.0
>>> print(isinstance(stats, Stats))
True
>>> print(type(stats))
<class 'obspy.core.trace.NSLCStats'>

>>> from obspy.core import Stats
>>> header_sid = {'namespace': 'FDSN', 'network': 'BW', 'station': 'MANZ', 'location': '',
...               'band': 'B', 'source': 'H', 'subsource': 'Z'}
>>> stats = Stats(header=header_sid)
>>> print(stats)
       namespace: FDSN
         network: BW
         station: MANZ
        location: 
            band: B
          source: H
       subsource: Z
       starttime: 1970-01-01T00:00:00.000000Z
         endtime: 1970-01-01T00:00:00.000000Z
   sampling_rate: 1.0
           delta: 1.0
            npts: 0
           calib: 1.0
>>> print(isinstance(stats, Stats))
True
>>> print(type(stats))
<class 'obspy.core.trace.SourceIdentifierStats'>
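
The dispatch seen in the two examples above could be achieved roughly as follows. This is a toy sketch of the `__new__` factory pattern only, not the code in this PR: the classes are plain `dict` subclasses, and the rule for choosing a subclass (presence of any Source Identifier key in the header) as well as the `SID_KEYS` name are assumptions made for illustration.

```python
# Keys assumed to occur only in Source Identifier style headers.
SID_KEYS = {"namespace", "band", "source", "subsource"}


class Stats(dict):
    """Toy stand-in for Stats; only the dispatch logic is sketched."""

    def __new__(cls, header=None):
        # Dispatch only when the base class itself is called. Subclasses
        # that are instantiated directly keep their own type.
        if cls is Stats:
            keys = set(header or {})
            if "channel" in keys and keys & SID_KEYS:
                raise ValueError(
                    "header mixes 'channel' with Source Identifier keys")
            cls = SourceIdentifierStats if keys & SID_KEYS else NSLCStats
        return super().__new__(cls)

    def __init__(self, header=None):
        # Python calls __init__ on the returned object because it is
        # still an instance of Stats.
        super().__init__(header or {})


class NSLCStats(Stats):
    pass


class SourceIdentifierStats(Stats):
    pass


stats = Stats(header={"network": "BW", "station": "MANZ", "channel": "BHZ"})
print(type(stats).__name__, isinstance(stats, Stats))
stats = Stats(header={"namespace": "FDSN", "band": "B", "source": "H",
                      "subsource": "Z"})
print(type(stats).__name__, isinstance(stats, Stats))
```

Because the returned object is always an instance of `Stats`, `isinstance` checks keep working, while `type(...) is Stats` would no longer be true.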

Why was it initiated? Any relevant Issues?

The two systems (NSLC vs. Source Identifier) are not fully compatible, and in both directions there can be cases where a mapping cannot be made. This means we need some kind of logic to handle data coming in from NSLC-based data formats as well as from Source Identifier-based data formats (MiniSEED v3). In some cases, reading one format and writing to another will not be possible with a fully automatic mapping, so we want well-defined exceptions to be raised, so that users can work around issues in well-defined ways.

This is just one possible way to do it, and it would be good to discuss other options.

The main consideration is that our Stats object is currently based on NSLC, with one field for "channel". If we leave it like that while integrating MiniSEED v3 support, we would have to map from Source Identifier to NSLC right when reading data. That mapping can be problematic and ambiguous, and it can cause additional problems in the reverse mapping, even when just writing the same data back to MiniSEED v3. So it feels like the better way is to make Stats Source-Identifier-ready right away and only do mappings back or forth on the fly whenever needed and not earlier, so that mapping problems only pop up when there really is a need to map.
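
A small sketch of this "map only when needed" idea: the Source Identifier parts are stored untouched, writing them back out as a Source Identifier string is always possible, and the conversion to a three-character NSLC channel code is only attempted (and may fail) when explicitly requested. The function names `format_sid` and `nslc_channel` are hypothetical, and the two-character source code "RT" is made up for the example; the string layout `FDSN:NET_STA_LOC_BAND_SOURCE_SUBSOURCE` follows the FDSN Source Identifier convention.

```python
def format_sid(codes):
    """Build a Source Identifier string from its individual codes."""
    return ("{namespace}:{network}_{station}_{location}_"
            "{band}_{source}_{subsource}").format(**codes)


def nslc_channel(codes):
    """
    Map band/source/subsource to a three-character NSLC channel code.
    Raises ValueError if any code is not exactly one character.
    """
    parts = [codes[key] for key in ("band", "source", "subsource")]
    if any(len(part) != 1 for part in parts):
        raise ValueError(
            f"band/source/subsource codes {parts!r} cannot be mapped to a "
            "three-character channel code")
    return "".join(parts)


# Codes as they might come from a MiniSEED v3 file; 'RT' is a made-up
# two-character source code.
codes = {"namespace": "FDSN", "network": "BW", "station": "MANZ",
         "location": "", "band": "B", "source": "RT", "subsource": "Z"}

# Writing back as a Source Identifier needs no mapping and cannot fail.
print(format_sid(codes))  # FDSN:BW_MANZ__B_RT_Z

# The problem only surfaces when an NSLC channel code is requested.
try:
    nslc_channel(codes)
except ValueError as e:
    print("error:", e)
```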

PR Checklist


megies · 2025-03-14T10:25:26Z

Actually, maybe a better and more lightweight approach with fewer changes would be to create a ChannelCode object and use that in the existing Stats, instead of subclassing Stats.

Edit: hmm, users do use stats.channel as a string though, so maybe make it into a property.
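
A rough sketch of how that could look, assuming a hypothetical `ChannelCode` class and a stripped-down stand-in `MiniStats` (neither exists in ObsPy): the codes are stored as an object internally, while `stats.channel` remains a plain string for users.

```python
from dataclasses import dataclass


@dataclass
class ChannelCode:
    """Hypothetical container for band, source and subsource codes."""
    band: str
    source: str
    subsource: str

    @classmethod
    def from_string(cls, channel):
        if len(channel) != 3:
            raise ValueError(
                f"expected a three-character channel code, got {channel!r}")
        return cls(*channel)

    def __str__(self):
        # Only meaningful for single-character codes; how longer codes
        # should be rendered is deliberately left open in this sketch.
        return self.band + self.source + self.subsource


class MiniStats:
    """Stand-in for Stats that exposes 'channel' as a string property."""

    def __init__(self, channel):
        self.channel = channel  # goes through the property setter

    @property
    def channel(self):
        return str(self._channel_code)

    @channel.setter
    def channel(self, value):
        self._channel_code = ChannelCode.from_string(value)


stats = MiniStats("BHZ")
print(stats.channel)           # still a plain string for users
print(stats._channel_code)     # structured codes under the hood
```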


megies · 2025-03-28T08:18:13Z

As this PR is becoming more of a favorite for me, I've added more details and improved the original description.

Stats: subclasses for NSLC and Station Identifier types of station/channel identification

12c9c29

megies added the .core issues affecting our functionality at the very core label Mar 11, 2025

megies added this to the 1.5.0 milestone Mar 11, 2025

megies requested review from d-chambers and trichter as code owners March 11, 2025 13:06

stats: fix initializing stats with source id in parts

f10292a

megies marked this pull request as draft March 12, 2025 08:53

This was referenced Mar 26, 2025

Stats: type extension to distinguish NSLC and Source Identifier cases #3558

Draft

Stats: How to facilitate FDSN Source Identifier? #3559

Open

megies added 2 commits March 27, 2025 17:02

stats: get rid of __init__ on Stats, so subclasses go to attribdict for it directly (without having to explicitely specifying)

8edbd6b

minor refactoring and comments

460a372

ThomasLecocq modified the milestones: 1.5.0, 1.6.0 Oct 31, 2025
