SeismicX-PnSn: A Deep Learning Framework for Pg/Sg/Pn/Sn Phase Picking and Its Nationwide Implementation in Chinese Mainland
Code for:
- Title: A Deep Learning Framework for Pg/Sg/Pn/Sn Phase Picking and Its Nationwide Implementation in Mainland China
- Authors: Yuqi Cai, Ziye Yu, et al. (yuziye@cea-igp.ac.cn)
All models in this repository are trained on 2009–2019 national seismic network data at 100 Hz. They are designed for direct inference on continuous three-component waveforms (E/N/Z) for automatic phase picking.
Key notes:
- Training primarily covers stations within ~800 km and includes local/regional P/S phases (Pg/Sg).
- PhaseNet / RNN / LPPN-style models have been validated on ChinArray data with RNN recall ≥ 80% on manually labeled sets.
- Accuracy and speed comparisons are shown in
pickers/speed.jpg.
The pnsn models are designed to detect Pg, Sg, Pn, and Sn on continuous streams using long windows (≈102.4 s) and sliding inference, which improves robustness and reduces operational false triggers compared with short-window pickers.
We provide two major generations:
pickers/pnsn.v1.jit: the first engineering version that has been used in production/engineering workflows.pickers/pnsn.v3.jitandpickers/pnsn.diff.v3.jit: the paper (v3) models used in the manuscript.
Two inference strategies (v3):
- raw:
pickers/pnsn.v3.jit - raw + first-difference (high-pass by differentiation):
pickers/pnsn.diff.v3.jitBoth accept waveforms of arbitrary length and include post-processing (thresholding + NMS) inside the TorchScript graph.
| Model | Size (MB) | P-F1Score | Instrument | Sampling rate | Channels | Max distance | Range | Output phases |
|---|---|---|---|---|---|---|---|---|
| BRNN | 1.9 | 0.857 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| EQTransformer | 3.1 | 0.852 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| PhaseNet (UNet) | 0.8 | 0.815 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| LPPN (Large) | 2.7 | 0.813 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| LPPN (Medium) | 0.4 | 0.808 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| LPPN (Tiny) | 0.3 | 0.757 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| UNet++ | 12 | 0.798 | Broadband | 100 Hz | 3C | 300 km | Global | Pg, Sg |
| pnsn.v1 (engineering) | ~1.9 | – | Broadband, MEMS | 100 Hz | 3C | ~2000 km | Global | Pg, Sg, Pn, Sn |
| pnsn.v3 (paper) | ~1.9 | 0.781 | Broadband, MEMS | 100 Hz | 3C | ~2000 km | Global | Pg, Sg, Pn, Sn |
| pnsn.diff.v3 (paper) | ~1.9 | 0.781 | Broadband, MEMS | 100 Hz | 3C | ~2000 km | Global | Pg, Sg, Pn, Sn |
| tele | ~1.9 | 0.800 | Broadband | 20 Hz | 3C | >3000 km | Global | P |
Important update (deployment recommendation):
- Teleseismic/distant events can also be picked using the pnsn models (especially
pnsn.v3/pnsn.diff.v3) in a unified workflow. - The standalone
tele.jitis not recommended in practice because its performance is not as stable as using pnsn directly on continuous streams (your engineering experience).
TorchScript models under pickers/ include post-processing (confidence thresholding + NMS) and output:
[[phase_type, relative_sample, confidence], ...]
Use the paper model by default:
import numpy as np
import torch
import obspy
import matplotlib.pyplot as plt
mname = "pickers/pnsn.v3.jit" # paper model (v3); use pnsn.v1.jit for legacy engineering pipelines
device = torch.device("cpu")
sess = torch.jit.load(mname).to(device).eval()
stE = obspy.read("data/waveform/XXX.BHE.sac")[0]
stN = obspy.read("data/waveform/XXX.BHN.sac")[0]
stZ = obspy.read("data/waveform/XXX.BHZ.sac")[0]
x = np.stack([stE.data, stN.data, stZ.data], axis=1).astype(np.float32) # [N, 3]
with torch.no_grad():
picks = sess(torch.tensor(x, dtype=torch.float32, device=device)).cpu().numpy()
plt.plot(x[:, 2], alpha=0.5)
for pha, idx, conf in picks:
pha = int(pha)
c = {0:"r", 1:"b", 2:"g", 3:"k"}.get(pha, "k") # 0 Pg, 1 Sg, 2 Pn, 3 Sn
plt.axvline(idx, c=c, alpha=0.8)
plt.show()- Best overall (paper + deployment):
pnsn.v3.jitandpnsn.diff.v3.jit(mobile / dense / fixed networks; Pg/Sg/Pn/Sn). - Legacy engineering compatibility: use
pnsn.v1.jitif you need exact alignment with the original production model behavior. - Speed / small memory: choose LPPN variants.
- Low-recall scenarios: lower the confidence threshold (e.g., to 0.1) and rely on downstream association/QC to control false positives.
- Need per-sample probability traces: use ONNX and apply post-processing externally.
Although a tele.jit model is provided, we recommend using the pnsn model family for distant/teleseismic records as well, to keep a single unified picker and avoid inconsistent behavior between local and distant pipelines (based on operational experience).
Dependencies: torch, numpy, obspy, scipy, matplotlib, tqdm.
Input assumptions:
- 3-component waveform (E/N/Z), typically resampled to 100 Hz for pnsn/most models.
- Typical channel naming:
BHE/BHN/BHZ. - CLI defaults for continuous data traversal are defined in
config/picker.py.
We provide three types of model files:
- .pt files in the ckpt folder, which can be used for transfer learning. Freeze some parameters when adapting to local data.
- Models for picking any length are located in the pickers folder.
- .jit for direct use with PyTorch; post-processing is embedded in the graph and outputs [phase_type, relative_sample, confidence] per pick.
- .onnx for use with onnxruntime, suitable for edge devices.
Use the post functions in picker.onnx.py or picker.py to apply the probability threshold (a) and non-maximum suppression window (b) to the raw prob and time outputs.
- .jit output format: [[Type of phases, relative arrival time, confidence], ...]. Phase types: 0:Pg, 1:Sg, 2:Pn, 3:Sn (Pn/Sn models extend this list).
- .onnx outputs two tensors: prob[i] (per-sample class probabilities, length 3 (indicate Noise, Pg, Sg) or 5 (indicate Noise, Pg, Sg, Pn, Sn)) and time[i] (relative sample index). Combine them with post-processing to form picks.
- Example usage of .jit can be found in picker.jit.py.
- Example usage of .onnx can be found in picker.onnx.py.
When running the phase NMS algorithm with TorchScript (.jit) models, performance bottlenecks may occur; in such cases, it is recommended to use ONNX-based inference instead. A reference implementation is provided in picker.onnx.py, and picker.py can be configured to load and run a .onnx model directly.
For C users, .merge.onnx files combine the time and prob outputs into a single array: [ [time length, number of categories, -, -], [number of categories, noise probability, P-wave probability, S-wave probability], [sample points, noise probability, P-wave probability, S-wave probability], ... ] For example programs in C, contact yuziye@cea-igp.ac.cn.
All TorchScript pickers share the same interface via jit_picker_base.py::SlidingWindowPicker. Each makejit.XXX.py file simply:
- Constructs the underlying network with self.model = UNet()/BRNN()/EQTransformer(), etc.
- Loads a checkpoint whose keys are prefixed with model. (legacy checkpoints are also accepted and will be auto-prefixed).
- Wraps the network with sliding-window preprocessing, softmax, and non-maximum suppression.
- Saves the scripted model into pickers/*.jit.
To rebuild the TorchScript files, run the corresponding script (for example python makejit.unet.py, python makejit.unetpp.py, python makejit.rnn.py, python makejit.pnsn.py, or python makejit.eqt.py). The output .jit files include post-processing, so they return [phase_type, relative_sample, confidence] directly when you call torch.jit.load. Key thresholds baked into the picker interface:
time_sel = torch.masked_select(ot, pc > 0.3) # confidence threshold
selidx = torch.masked_select(selidx, torch.abs(ref - ntime) > 1000) # NMS window (samples)- 0.3 is the default minimum confidence. Lower it to pick more candidates at the cost of extra false triggers.
- 1000 samples (10 seconds at 100 Hz) enforce a single pick per class within that window.
Reduce the window if multiple phases are expected in short succession.
All ONNX pickers share the OnnxSlidingWindowPicker interface defined in onnx_picker_base.py. To regenerate the exported ONNX files:
- Run the corresponding script (for example python makeonnx.unet.py, python makeonnx.unetpp.py, python makeonnx.rnn.py, python makeonnx.pnsn.py, or python makeonnx.eqt.py).
- Each script builds the model (self.model = UNet()/BRNN()/EQTransformer(), etc.), loads checkpoints (auto-prefixing with model. when needed), and wraps it with the shared sliding-window preprocessing.
- Post-processing (probability threshold and NMS) remains outside the ONNX graph; reuse config/picker.py together with the post helpers in picker.onnx.py or picker.py when running inference.
The onnx model can use picker.py for post-processing as it is outside of the model itself
Phase picking provides a more convenient way to directly traverse the directory and pick up all phases.
python picker.py -i path/to/data -o outputname -m pickers/rnn.jit -d device- output file name.txt containing all picked phases
- output file name.log containing processed data information
- output file name.err containing problematic data information The format of the output file is:
#path/to/file
phase name,relative time(s),confident,aboulute time(%Y-%m-%d %H:%M:%S.%f),SNR,AMP,station name,other information
picker.py exposes the -i/--input, -o/--output, -m/--model, and -d/--device arguments (see if name == "main" in the script) and uses the defaults from config/picker.py for details such as channel count (nchannel=3), sampling rate (samplerate=100), probability threshold for ONNX models (prob=0.3), and non-maximum suppression window (nmslen=1000).
The goal of seismic association is to determine the number, location, and timing information of earthquakes from the phase picking results. Currently, there are 3 association algorithms provided:
- REAL methods [reallinker.py]
- LPPN methods [fastlinker.py]
- GaMMA methods [gammalinker.py]
Both models take the picking results as input.
python fastlinker.py -i phase_picking_results.txt -o output_file_name.txt -s station_directoryThe format of the station file is:
network station LOC longitude latitude elevation(m)
For example: SC AXX 00 110.00 38.00 1000.00
The structure of the output association file is:
#EVENT,TIME,LAT,LON,DEP
PHASE,TIME,LAT,LON,TYPE,PROB,STATION,DIST,DELTA,ERROR#
EVENT,2022-04-09 02:28:38.021000,100.6492,25.3660,PICKED_PHASE_TIME_LAT_LON_TYPE_PROB_STATION_DIST_DELTA_ERROR#
PHASE_PICKED_TIME_LAT_LON_TYPE_PROB_STATION_DIST_DELTA_ERROR#
This project provides a Python implementation of the REAL (Rapid Earthquake Association and Location) algorithm for automated earthquake phase association and preliminary event location. The implementation preserves the core logic, decision criteria, and workflow of the original C-based REAL algorithm, while introducing improved modularity, flexible data interfaces, and high-performance acceleration through modern Python tooling. The algorithm follows the canonical REAL workflow of P-phase initiation → grid-based association → multi-phase consistency scoring → event confirmation. All P-phase picks are processed in chronological order using a heap-based scheduler. Each candidate trigger initiates a local three-dimensional grid search (latitude, longitude, depth), where theoretical P- and S-wave travel times are computed under a homogeneous velocity model. Observed picks falling within adaptive time windows are associated, and candidate sources are evaluated using phase count thresholds and robust origin-time scatter metrics.
- High-performance numerical core The computationally intensive grid evaluation and phase matching are implemented using Numba JIT compilation, achieving performance comparable to the original C REAL code while maintaining Python-level readability.
- Flexible pick input formats The engine supports both traditional per-station pick files and modern single-file, multi-station pick formats that include absolute timestamps and confidence scores. All inputs are internally mapped to fixed-size NumPy arrays for efficient vectorized and JIT-compiled processing.
- Confidence-aware association Pick confidence values are preserved throughout the association process and propagated to event-phase outputs, enabling downstream quality control, weighted relocation, or uncertainty analysis.
- Strict consistency with C-REAL criteria Core constraints such as minimum P/S counts, P+S totals, same-station P–S pairing, dtps limits, and the ispeed phase-removal mechanism are implemented to closely match the behavior of the original REAL algorithm.
- Event–phase level output For each detected event, the engine outputs both event-level parameters (origin time, hypocenter, scatter) and detailed phase associations, including pick time, residual, source–receiver distance, confidence, and station identifier.
Overall, this Python-based REAL implementation balances algorithmic fidelity, computational efficiency, and extensibility, making it suitable for both research-oriented seismic analysis and large-scale automated earthquake catalog production.
Station file The station file follows the original REAL format: NET STA COMP STLO STLA ELEV where elevation is given in meters. Example: YN YSW03 BHZ 102.345 24.567 1850
Pick data Two pick input modes are supported:
- Per-station files (C-REAL compatible) Each station has separate P and S files: NET.STA.P.txt NET.STA.S.txt with columns: trigger_time weight amplitude
- Single-file multi-station picks (recommended) A single CSV-style file containing all picks: #data/xxx/YN.YSW03.00.mseed Pg,32640.160,0.936,2021-05-21 09:04:00.165000,...,YN.YSW03.00,... Required fields: * phase name (e.g., Pg, Pn, Sg) * relative trigger time (seconds) * confidence * absolute time string * station identifier (NET.STA.LOC) ###
from real import (
read_station_txt,
load_all_picks_from_singlefile_v2,
FastREAL
)- Load stations
stla, stlo, elev, net, sta = read_station_txt("stations.txt")
- Load picks
MAXTIME = 2.7e6 NNps = 20000
ptrig, strig, pconf, sconf, pabs, sabs = load_all_picks_from_singlefile_v2( pick_file="picks.csv", net=net, sta=sta, max_n=NNps, max_time=MAXTIME, min_conf=0.0, )
- Initialize REAL engine
real = FastREAL( stla, stlo, elev, ptrig, strig, pabs, sabs, pconf, sconf, lat_center_deg=25.0, rx_deg=1.0, rh_km=30.0, dx_deg=0.05, dh_km=2.0, tint_sec=10.0, vp0=6.0, vs0=3.5, s_vp0=6.0, s_vs0=3.5, np0=6, ns0=4, nps0=10, npsboth0=2, std0=1.0, dtps=2.0, nrt=2.0, gcarc0_deg=3.0, ispeed=True, max_time=MAXTIME, net=net, sta=sta, )
- Run association
events, event_phases = real.run( latref0_init=None, lonref0_init=None, max_events=100000 )
Each detected event is returned as: (origin_time, latitude, longitude, depth, scatter, P_count, S_count, P+S_count, PS_both) Associated phase information is stored per event as: (phase, pick_time_str, dt, distance_km, confidence, residual, station_id) Optionally, results can be written to disk in a REAL-compatible text format using:
write_events_real_format(
out_path="events.txt",
events=events,
real=real,
pabs=pabs,
sabs=sabs,
pconf=pconf,
sconf=sconf,
per_event_phase_rows=event_phases,
)- The current implementation assumes a homogeneous velocity model; extension to layered or 3-D models can be achieved by replacing the travel-time kernel.
- Absolute times are treated as naive timestamps and used only for relative timing and output reconstruction.
- For large station counts or dense pick sets, enabling Numba (NUMBA_OK = True) is strongly recommended.
- Research and academic use: released under GPLv3.
- Commercial use / integration / redistribution: please contact the corresponding author to obtain permission and discuss licensing terms (email below).
Yuqi Cai: caiyuqiming@foxmail.com