A comprehensive system for detecting, classifying, and organizing time series events into hierarchical structures suitable for language model training. This enhanced version includes advanced multi-resolution feature extraction with wavelet decomposition, spectral analysis, and complexity measures.
This system transforms raw time series data into rich, hierarchical event annotations that can be used to train language models to understand and describe temporal patterns. The enhanced edition extracts 63 advanced features including wavelet decomposition, spectral analysis, entropy measures, and curvature for superior event detection.
- Multi-scale Analysis: Extracts features over rolling windows of 5, 10, 20, and 50 timesteps
- Hierarchical Structure: Organizes events from micro (single points) to global (full sequence)
- Rich Vocabulary: 64 distinct event labels across 7 categories
- Advanced Features: Wavelets, spectral analysis, entropy, curvature, normalized metrics
- Wavelet Decomposition: Multi-resolution aligned with EventScale hierarchy
- Multiple Text Formats: Generate training text in various formats
- Efficient Processing: Vectorized operations using PyTorch (~26ms per sequence)
- Extensible Design: Easy to add new detectors and event types
- Production Ready: Optimized for one-time large-scale corpus generation
- Second derivatives (curvature): detects acceleration/reversals
- Rolling min/max/range: support/resistance levels
- Normalized slope: volatility-adjusted trend strength
- Spectral features: frequency domain energy (choppy vs smooth)
- Shannon entropy: complexity/chaos detection
- Jump detection: discrete event identification
- Volatility asymmetry: directional risk (bullish vs bearish)
- Wavelet decomposition: multi-resolution analysis perfectly aligned with hierarchy
EventScale.GLOBAL (150+ steps) → Wavelet: Approximation (A)
  └── Overall sequence regime (bullish/bearish/sideways/volatile)
EventScale.MACRO (50-150 steps) → Wavelet: Detail D4+
  └── Major trend segments, long volatility regimes
EventScale.MESO (15-50 steps) → Wavelet: Detail D3
  └── Medium trends, local corrections
EventScale.MINI (5-15 steps) → Wavelet: Detail D2
  └── Short segments, volatility clusters
EventScale.MICRO (1-5 steps) → Wavelet: Detail D1
  └── Spikes, single peaks/troughs
[0-335] SIDEWAYS_REGIME (GLOBAL)
├─ [0-120] UPTREND_LONG (MACRO)
│  ├─ [30-45] DOWNTREND_SHORT (MESO)  ← Nested correction
│  │  └─ [38] SPIKE_DOWN (MICRO)
│  └─ [50-55] VOLATILITY_SPIKE (MINI)
├─ [121-200] FLAT_SEGMENT (MACRO)
└─ [201-335] DOWNTREND_LONG (MACRO)
   └─ [250] LOCAL_PEAK (MICRO)
import torch
from hierarchical_event_labeling import HierarchicalEventDataset
# 1. Prepare your data [batch_size, sequence_length]
x = torch.randn(100, 336) # 100 sequences, 336 timesteps each
# 2. Create enhanced dataset with all features
dataset = HierarchicalEventDataset(
    x,
    use_spectral=True, # Enable spectral features
    use_entropy=True, # Enable entropy features
    use_wavelets=True, # Enable wavelet decomposition (RECOMMENDED!)
    verbose=True
)
# 3. Get annotation for first sequence
ann = dataset[0]
# 4. View hierarchical structure
ann.print_hierarchy()
# 5. Generate training text
text = ann.to_text(format='depth_marked')
print(text)
# 6. Access enhanced features
print(f"Total features extracted: {len(dataset.features)}")
# Output: 63 features including wavelets!

- Step Movements (10 labels)
  - FLAT, UP_SMALL, UP_MEDIUM, UP_LARGE
  - DOWN_SMALL, DOWN_MEDIUM, DOWN_LARGE
  - SPIKE_UP, SPIKE_DOWN
- Trend Segments (7 labels)
  - UPTREND_SHORT, UPTREND_MEDIUM, UPTREND_LONG
  - DOWNTREND_SHORT, DOWNTREND_MEDIUM, DOWNTREND_LONG
  - FLAT_SEGMENT
- Peaks & Troughs (4 labels)
  - LOCAL_PEAK, SHARP_PEAK
  - LOCAL_TROUGH, SHARP_TROUGH
- Volatility Regimes (4 labels)
  - LOW_VOLATILITY, NORMAL_VOLATILITY
  - HIGH_VOLATILITY, VOLATILITY_SPIKE
- Change Points (2 labels)
  - MEAN_SHIFT_UP, MEAN_SHIFT_DOWN
- Global Regimes (4 labels)
  - BULLISH_REGIME, BEARISH_REGIME
  - SIDEWAYS_REGIME, VOLATILE_REGIME
Raw Time Series [B, L]
                       ↓
┌─────────────────────────────────────────────┐
│ Enhanced Multi-Scale Feature Extraction     │
│ • Basic derivatives (dx, ddx)               │
│ • Rolling features (mean, std, slope)       │
│ • Extrema (min, max, range)                 │
│ • Normalized metrics (norm_slope)           │
│ • Spectral features (low/mid/high bands)    │
│ • Entropy (complexity measure)              │
│ • Wavelet decomposition (D1-D4, A)          │
│ • Jump detection & vol asymmetry            │
│ Result: 63 features per timestep            │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ Step-Wise Label Encoding                    │
│ Adaptive quantile thresholding              │
│ Result: [B, L] label tensor                 │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ Event Detection (per sequence)              │
│ • Enhanced Trend Detector (norm slopes)     │
│ • Peak/Trough Detector (alternation)        │
│ • Volatility Regime Detector                │
│ Result: Flat list of events                 │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ Hierarchical Structure Building             │
│ • Scale classification (duration-based)     │
│ • Parent-child relationships                │
│ • Tree construction (containment)           │
│ Result: Hierarchical event tree             │
└─────────────────────────────────────────────┘
                       ↓
Hierarchical Annotation
- EnhancedMultiScaleFeatureExtractor
  - Efficient convolution-based feature extraction
  - Multiple temporal window sizes (5, 10, 20, 50)
  - NEW: Wavelet decomposition (D1-D4 + approximation)
  - NEW: Spectral features (STFT bands)
  - NEW: Shannon entropy (complexity)
  - NEW: Second derivative (curvature)
  - NEW: Normalized slopes (noise-filtered)
  - Fully vectorized (batch processing)
- StepWiseEncoder
  - Adaptive quantile thresholding
  - Step-by-step movement classification
  - Handles varying signal magnitudes
- Enhanced Event Detectors
  - EnhancedTrendSegmentDetector: Uses normalized slopes for robustness
  - PeakTroughDetector: scipy.signal.find_peaks with alternation validation
  - VolatilityRegimeDetector: Rolling std quantiles
- HierarchicalEventBuilder
  - Automatic scale classification
  - Containment-based parent finding
  - Depth-first tree construction
- HierarchicalAnnotation
  - Complete sequence annotation
  - Multiple text format generation
  - Event filtering and querying
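The adaptive quantile thresholding listed for StepWiseEncoder can be pictured with a small pure-Python sketch. The function name `encode_steps`, the 50th/80th/95th percentile cut points, and the reduced label set are assumptions made for illustration; the library's own encoder runs on PyTorch tensors and may choose its thresholds differently.

```python
def encode_steps(series, flat_q=0.5, small_q=0.8, medium_q=0.95):
    """Label each step by comparing |dx| with quantiles of |dx| from the same series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mags = sorted(abs(d) for d in diffs)

    def quantile(q):
        # Nearest-rank quantile on the sorted step magnitudes
        return mags[min(len(mags) - 1, int(q * len(mags)))]

    t_flat, t_small, t_medium = quantile(flat_q), quantile(small_q), quantile(medium_q)

    labels = []
    for d in diffs:
        m = abs(d)
        if m == 0 or m < t_flat:
            labels.append("FLAT")
            continue
        direction = "UP" if d > 0 else "DOWN"
        if m < t_small:
            size = "SMALL"
        elif m < t_medium:
            size = "MEDIUM"
        else:
            size = "LARGE"
        labels.append(f"{direction}_{size}")
    return labels


series = [0, 1, 1, 3, 2, 8, 8, 7, 12, 12]
print(encode_steps(series))
# Rescaling the signal leaves the labels unchanged, because the
# thresholds are derived from the series itself.
print(encode_steps([v * 1000 for v in series]) == encode_steps(series))
```

Because the thresholds come from each sequence's own step distribution, the same label vocabulary applies to signals of very different magnitudes.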
x: torch.Tensor # Shape: [B, L]
# B = batch size (number of sequences)
# L = sequence length (number of timesteps)

annotation = dataset[0]
# Components:
annotation.sequence # [L] Original time series
annotation.step_labels # [L] Step-wise labels (vocab IDs)
annotation.event_roots # List[HierarchicalEvent] - Root nodes
annotation.all_events # List[HierarchicalEvent] - Flattened
# Enhanced: Access all 63 features
dataset.features # Dict with all extracted features

event.start # int: Starting timestep
event.end # int: Ending timestep
event.label # int: Vocabulary ID
event.label_name # str: Human-readable name
event.scale # EventScale: MICRO/MINI/MESO/MACRO/GLOBAL
event.event_type # str: trend/peak/volatility/regime
event.confidence # float: Detection confidence
event.metadata # dict: Additional information
event.parent # HierarchicalEvent or None
event.children # List[HierarchicalEvent]

Natural mapping to EventScale hierarchy:
| Wavelet Level | Event Scale | Duration | What It Captures |
|---|---|---|---|
| D1 (finest detail) | MICRO | 1-5 steps | Spikes, noise, single-point events |
| D2 | MINI | 5-15 steps | Short oscillations, mini-trends |
| D3 | MESO | 15-50 steps | Medium segments, local patterns |
| D4+ (coarse detail) | MACRO | 50-150 steps | Major trends, large structures |
| A (approximation) | GLOBAL | 150+ steps | Overall direction, regime |
Why wavelets are powerful:
- Time-localized (unlike FFT which is global)
- Multi-resolution by design
- Works for non-stationary signals (financial, sensor data)
- Fast computation (O(L) complexity)
- Cleaner peak detection (noise naturally filtered)
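To make the time-localization and multi-resolution points concrete, here is a minimal pure-Python sketch of a multi-level discrete wavelet transform. It deliberately uses the Haar wavelet because it fits in a few lines; the library uses Daubechies-4 via PyWavelets, so the helper names `haar_step` and `haar_decompose` and the coefficient values are illustrative only.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [D1, D2, ..., D_levels, A]; len(signal) must be divisible by 2**levels."""
    coeffs, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.append(detail)  # finer details first (D1), coarser later
    coeffs.append(approx)      # what remains is the approximation
    return coeffs


# A slow ramp with a single spike at index 5
signal = [float(i) for i in range(16)]
signal[5] += 10.0

d1, d2, d3, a = haar_decompose(signal, levels=3)

# The spike shows up as one large D1 coefficient at the pair covering index 5
spike_pos = max(range(len(d1)), key=lambda i: abs(d1[i]))
print("largest D1 coefficient at pair", spike_pos)

# Orthogonality: total energy is preserved across all coefficient sets
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for level in (d1, d2, d3, a) for v in level)
print(abs(energy_in - energy_out) < 1e-9)
```

Each level halves the number of coefficients, which is where the O(L) total cost comes from.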
Features generated:
'wavelet_d1' to 'wavelet_d4' # Detail coefficients [B, L]
'wavelet_a' # Approximation [B, L]
'wavelet_energy_d1' to 'd4' # Energy at each level [B, L]
'wavelet_energy_a' # Approximation energy [B, L]

Frequency-domain analysis:
- spec_low_{w}: Low-frequency energy (smooth trends)
- spec_mid_{w}: Mid-frequency energy (oscillations)
- spec_high_{w}: High-frequency energy (choppy/noise)
Use case: Distinguish between choppy vs smooth market regimes

Complexity measurement:
- entropy_{w}: Shannon entropy in sliding windows
- High entropy → irregular, chaotic, noisy
- Low entropy → regular, predictable, oscillatory
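As a rough illustration of the entropy feature, the sketch below computes Shannon entropy over a histogram of step changes inside each window. Binning the first differences, the 8-bin default, and the names `diff_entropy` and `rolling_entropy` are assumptions; the README does not specify how the library discretizes the signal.

```python
import math
from collections import Counter


def diff_entropy(window, n_bins=8):
    """Shannon entropy (in bits) of the binned step changes in one window."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    lo, hi = min(diffs), max(diffs)
    if hi == lo:
        return 0.0  # every step is identical: fully predictable
    width = (hi - lo) / n_bins
    bins = Counter(min(int((d - lo) / width), n_bins - 1) for d in diffs)
    n = len(diffs)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def rolling_entropy(series, window):
    """Entropy of each length-`window` slice of the series."""
    return [diff_entropy(series[i:i + window]) for i in range(len(series) - window + 1)]


ramp = [float(i) for i in range(9)]             # steady trend
zigzag = [0.0, 1.0] * 4 + [0.0]                 # regular oscillation
irregular = [0, 3, 1, 7, 2, 9, 4, 4, 8, 1, 6, 0, 5, 9, 3, 7, 2]

for name, data in [("ramp", ramp), ("zigzag", zigzag), ("irregular", irregular)]:
    print(name, round(diff_entropy(data), 3))
```

The steady ramp scores zero, the regular zigzag scores one bit, and the irregular series scores higher than both.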
Acceleration detection:
- ddx: Second derivative of signal
Use case: Detect sharp reversals, V-shapes vs U-shapes

Noise-filtered trends:
- norm_slope_{w}: Slope divided by volatility
Benefit: Filters out noise, highlights statistically significant moves
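One way to read "slope divided by volatility" is sketched below: an ordinary least-squares slope over the window, divided by the standard deviation of the step changes in that window. The volatility definition, the `eps` guard, and the function names are assumptions; the library may normalize by a different rolling statistic.

```python
import statistics


def ols_slope(window):
    """Least-squares slope of the window against timestep index 0..n-1."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def norm_slope(window, eps=1e-8):
    """Slope scaled by the volatility of step changes (eps avoids division by zero)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    vol = statistics.pstdev(diffs)
    return ols_slope(window) / (vol + eps)


quiet = [0.0, 1.1, 1.9, 3.0, 4.1, 5.0]    # steady climb, little noise
noisy = [0.0, 4.0, -1.0, 6.0, 1.0, 5.0]   # net climb buried in large swings

print("raw slopes:       ", round(ols_slope(quiet), 3), round(ols_slope(noisy), 3))
print("normalized slopes:", round(norm_slope(quiet), 3), round(norm_slope(noisy), 3))
```

Both windows rise overall, but only the quiet one keeps a large value after normalization, which is the filtering effect described above.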
Support/resistance levels:
- min_{w}, max_{w}, range_{w}: Local envelopes
Use case: Breakout detection, consolidation patterns

Discrete event identification:
- jump_indicator: Binary indicator of sudden level shifts

Directional risk:
- vol_asymmetry: Ratio of upside to downside volatility
- > 1: Bullish volatility (upside moves larger)
- < 1: Bearish volatility (downside moves larger)
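The ratio can be sketched as the root-mean-square size of upward steps divided by that of downward steps. This particular definition, the `eps` guard, and the name `vol_asymmetry` as a standalone function are assumptions for illustration; the library may compute the feature over rolling windows or with a different volatility estimator.

```python
import math


def _rms(values):
    """Root-mean-square magnitude; 0.0 for an empty list."""
    if not values:
        return 0.0
    return math.sqrt(sum(v * v for v in values) / len(values))


def vol_asymmetry(series, eps=1e-8):
    """Upside volatility divided by downside volatility of the step changes."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    ups = [d for d in diffs if d > 0]
    downs = [d for d in diffs if d < 0]
    return (_rms(ups) + eps) / (_rms(downs) + eps)


rallying = [0, 3, 2, 5, 4, 7]     # up-moves of 3, down-moves of 1
selling = [7, 4, 5, 2, 3, 0]      # mirror image
balanced = [0, 2, 0, 2, 0]

for name, data in [("rallying", rallying), ("selling", selling), ("balanced", balanced)]:
    print(name, round(vol_asymmetry(data), 3))
```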
[0-335]SIDEWAYS_REGIME >[0-120]UPTREND_LONG >>[30-45]DOWNTREND_SHORT >>>[38]SPIKE_DOWN
- Depth indicated by > symbols
- Compact representation
- ~150-300 tokens per sequence
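The depth-marked format amounts to a depth-first walk that prefixes each event with one `>` per nesting level. The sketch below uses a stand-in `Node` dataclass and a hypothetical `to_depth_marked` helper rather than the library's `HierarchicalEvent` and `to_text`, so treat it as a reading aid, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    start: int
    end: int
    label_name: str
    children: List["Node"] = field(default_factory=list)


def to_depth_marked(node, depth=0):
    """Render a node and its descendants depth-first, one '>' per nesting level."""
    if node.start == node.end:
        span = f"[{node.start}]"            # single-point events show one index
    else:
        span = f"[{node.start}-{node.end}]"
    parts = [">" * depth + span + node.label_name]
    for child in sorted(node.children, key=lambda c: c.start):
        parts.append(to_depth_marked(child, depth + 1))
    return " ".join(parts)


spike = Node(38, 38, "SPIKE_DOWN")
correction = Node(30, 45, "DOWNTREND_SHORT", [spike])
uptrend = Node(0, 120, "UPTREND_LONG", [correction])
root = Node(0, 335, "SIDEWAYS_REGIME", [uptrend])

print(to_depth_marked(root))
```

For this tree the output reproduces the example line shown at the start of this format's description.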
[0-120]UPTREND_LONG [30-45]DOWNTREND_SHORT [38]SPIKE_DOWN [121-200]FLAT_SEGMENT
- Loses hierarchy information
- Simple sequential list
- ~100-200 tokens per sequence
Overall: sideways regime. 3 major segments detected.
[0-120]: uptrend long (contains: trend, peak).
[30-45]: downtrend short (within uptrend long [0-120]).
- Human-readable
- Includes context
- ~200-400 tokens per sequence
import torch
from hierarchical_event_labeling import HierarchicalEventDataset

# Load your data
x = torch.load('your_timeseries.pt') # [B, L]

# Create enhanced dataset (RECOMMENDED)
dataset = HierarchicalEventDataset(
    x,
    use_spectral=True, # Frequency analysis
    use_entropy=True, # Complexity detection
    use_wavelets=True, # Multi-resolution (CRITICAL!)
    verbose=True
)

# Access annotations
for ann in dataset:
    print(ann.to_text())

# Maximum quality (recommended for corpus generation)
dataset = HierarchicalEventDataset(
    x, use_spectral=True, use_entropy=True, use_wavelets=True
)
# Time: ~26ms/seq, 63 features

# Wavelet-focused (still excellent)
dataset = HierarchicalEventDataset(
    x, use_spectral=False, use_entropy=False, use_wavelets=True
)
# Time: ~8ms/seq, 46 features

# Minimal (baseline)
dataset = HierarchicalEventDataset(
    x, use_spectral=False, use_entropy=False, use_wavelets=False
)
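# Time: ~5ms/seq, 35 features

# Assumes EventScale is exported by the same module as the dataset class
from hierarchical_event_labeling import EventScale

# Get all macro-scale events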
macro_events = ann.get_events_at_scale(EventScale.MACRO)
# Get events in time range
events = ann.get_events_in_range(100, 200)
# Filter by type
trends = [e for e in ann.all_events if e.event_type == 'trend']
peaks = [e for e in ann.all_events if e.event_type == 'peak']

# Check what features were extracted
print(f"Feature count: {len(dataset.features)}")
# Access specific features
dx = dataset.features['dx'] # First derivative
ddx = dataset.features['ddx'] # Curvature
wavelet_d1 = dataset.features['wavelet_d1'] # Finest details
wavelet_a = dataset.features['wavelet_a'] # Global approximation
entropy = dataset.features['entropy_20'] # Complexity

from hierarchical_event_labeling import TextCorpusGenerator
# Generate text for all sequences
text_gen = TextCorpusGenerator()
corpus = text_gen.generate_corpus(dataset, format='depth_marked')
# Save to file
with open('training_corpus.txt', 'w') as f:
    for text in corpus:
        f.write(text + '\n')
# Get statistics
stats = text_gen.estimate_tokens(corpus)
print(f"Total tokens: {stats['total_tokens']:,}")

import torch
from torch.utils.data import DataLoader
def collate_fn(batch):
    # Stack the raw sequences and pair them with their generated text
    sequences = torch.stack([ann.sequence for ann in batch])
    texts = [ann.to_text() for ann in batch]
    return {'sequences': sequences, 'texts': texts}

dataloader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
for batch in dataloader:
    # Train your model
    pass

Per Sequence (L=336):
- Feature Extraction: ~20ms (63 features)
- Event Detection: ~4ms
- Hierarchy Building: ~2ms
- Total: ~26ms per sequence
Batch Processing:
- 1,000 sequences: ~26 seconds
- 10,000 sequences: ~4.3 minutes
- 100,000 sequences: ~43 minutes
- 1,000,000 sequences: ~7.2 hours
For one-time corpus generation, this is excellent!
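The batch figures above follow directly from the per-sequence cost. A small helper (hypothetical name `corpus_time_seconds`, assuming the ~26ms figure holds and processing scales linearly) lets you redo the estimate for your own corpus size or a different feature configuration.

```python
def corpus_time_seconds(n_sequences, ms_per_sequence=26.0):
    """Estimated wall-clock time, assuming cost scales linearly with sequence count."""
    return n_sequences * ms_per_sequence / 1000.0


for n in (1_000, 10_000, 100_000, 1_000_000):
    seconds = corpus_time_seconds(n)
    print(f"{n:>9,} sequences: {seconds:>8.0f} s = {seconds / 60:6.1f} min = {seconds / 3600:5.2f} h")
```

Passing `ms_per_sequence=8.0` or `5.0` gives the corresponding estimates for the wavelet-focused and minimal configurations.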
| Feature Group | Time/Seq | Feature Count |
|---|---|---|
| Basic derivatives | 0.5ms | 2 |
| Rolling features | 4ms | 32 |
| Spectral | 10ms | 12 |
| Entropy | 8ms | 4 |
| Wavelets | 3ms | 11 |
| Other | 0.5ms | 2 |
| Total | 26ms | 63 |
- Raw Data: ~4 bytes/value
- All Features: ~250 bytes/timestep (63 features)
- Events: ~200 bytes/event
- Total: ~20-30 MB per 1000 sequences (L=336)
# Small dataset
dataset = HierarchicalEventDataset(torch.randn(100, 336)) # ~3 seconds
# Medium dataset
dataset = HierarchicalEventDataset(torch.randn(10000, 336)) # ~4 minutes
# Large dataset - process in batches
for batch in data_batches:
    partial = HierarchicalEventDataset(batch)
    # Save intermediate results

from typing import Dict, List

class CustomDetector:
    def __init__(self, threshold: float = 1.0):
        # Energy cutoff; the default is illustrative and should be tuned per dataset
        self.threshold = threshold

    def detect(self, x: torch.Tensor, features: Dict, idx: int) -> List[SimpleSegment]:
        # Your detection logic using enhanced features
        segments = []
        # Example: Use wavelet energy for detection
        wavelet_energy = features['wavelet_energy_d3'][idx]
        high_energy_idx = torch.where(wavelet_energy > self.threshold)[0]
        # ... create segments ...
        return segments
# Integrate into dataset
class CustomEventDataset(HierarchicalEventDataset):
    def __init__(self, x, **kwargs):
        super().__init__(x, **kwargs)
        self.custom_detector = CustomDetector()

    def _build_annotation(self, idx, L):
        # ... add custom events to builder ...
        pass

def custom_text_format(ann):
    parts = []
    # Add metadata
    parts.append(f"LEN:{len(ann.sequence)}")
    # Add scale distribution
    scale_counts = {}
    for event in ann.all_events:
        scale_counts[event.scale] = scale_counts.get(event.scale, 0) + 1
    parts.append(f"SCALES:{scale_counts}")
    # Add events
    for event in ann.all_events:
        parts.append(f"{event.label_name}@{event.start}")
    return " | ".join(parts)

- Time Series Foundation Models
  - Pre-train on diverse time series data with rich labels
  - Learn temporal pattern language with multi-resolution understanding
  - Enhanced features provide richer supervision signal
- EEG/ECG Signal Analysis
  - Detect medical events with wavelet-enhanced precision
  - Hierarchical diagnosis with multi-scale patterns
  - Entropy features detect anomalous brain/heart activity
- Financial Data
  - Market regime detection with spectral features
  - Trading pattern recognition with volatility asymmetry
  - Support/resistance with rolling extrema
- Sensor Networks
  - Anomaly detection with jump indicators
  - System state monitoring with wavelet decomposition
  - Change point detection with curvature
- Climate Data
  - Weather pattern analysis with multi-scale trends
  - Long-term trend identification with approximation coefficients
  - Complexity analysis with entropy features
Wavelet Type: Daubechies-4 (db4)
- Compact support (localized in time)
- Smooth (reduces noise)
- Orthogonal (no redundancy)
Decomposition Levels: Auto-determined (typically 4-5 for L=336)
max_level = pywt.dwt_max_level(L, 'db4')
levels = min(max_level, 5) # Cap at 5 levels

Coefficient Upsampling: Linear interpolation to original length
- Allows aligned features across all scales
- Enables point-wise analysis
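The level cap and the upsampling step can be sketched without PyWavelets. `max_dwt_level` reproduces the formula PyWavelets documents for `dwt_max_level` (db4 has a filter length of 8), and `upsample_linear` is a hypothetical endpoint-aligned interpolation; whether the library aligns endpoints or uses a different interpolation grid is an assumption here.

```python
import math


def max_dwt_level(data_len, filter_len):
    """floor(log2(data_len / (filter_len - 1))), the documented dwt_max_level formula."""
    return int(math.floor(math.log2(data_len / (filter_len - 1))))


def upsample_linear(coeffs, target_len):
    """Linearly interpolate a coefficient vector onto target_len evenly spaced points."""
    if len(coeffs) == 1 or target_len == 1:
        return [coeffs[0]] * target_len
    scale = (len(coeffs) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale                        # fractional index into coeffs
        lo = min(int(pos), len(coeffs) - 2)    # left neighbour, clamped at the end
        frac = pos - lo
        out.append(coeffs[lo] * (1 - frac) + coeffs[lo + 1] * frac)
    return out


L = 336
levels = min(max_dwt_level(L, filter_len=8), 5)
print("levels for L=336 with db4:", levels)

# A coarse coefficient vector stretched back to the original length
coarse = [0.0, 2.0, 4.0]
print(upsample_linear(coarse, 5))
print(len(upsample_linear(coarse, L)))
```

After upsampling, every level has one value per timestep, so coefficients from different scales can be compared at the same index.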
- 5 steps: Micro-patterns, noise filtering
- 10 steps: Local trends, short volatility
- 20 steps: Medium trends, regime detection
- 50 steps: Major trends, global patterns
duration = end - start + 1
if duration <= 5: scale = MICRO # Wavelet D1 range
elif duration <= 15: scale = MINI # Wavelet D2 range
elif duration <= 50: scale = MESO # Wavelet D3 range
elif duration <= 150: scale = MACRO # Wavelet D4+ range
else: scale = GLOBAL # Wavelet approximation

- Sort events by scale (largest first)
- For each event, find smallest containing event as parent
- Build tree structure (parent-child links)
- Sort children by start position
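The four steps above can be sketched in plain Python. The `Event` dataclass and `build_hierarchy` function are stand-ins, not the library's `HierarchicalEvent` or `HierarchicalEventBuilder`, and sorting by duration is used as a proxy for sorting by scale since scale is derived from duration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality avoids recursing through parent/children links
class Event:
    start: int
    end: int
    label_name: str
    parent: Optional["Event"] = None
    children: List["Event"] = field(default_factory=list)

    @property
    def duration(self):
        return self.end - self.start + 1


def build_hierarchy(events):
    # Step 1: largest events first, so potential parents are seen before their children
    ordered = sorted(events, key=lambda e: e.duration, reverse=True)
    roots = []
    for i, ev in enumerate(ordered):
        # Step 2: among strictly longer events that fully contain this one, pick the smallest
        containers = [p for p in ordered[:i]
                      if p.start <= ev.start and ev.end <= p.end and p.duration > ev.duration]
        if containers:
            # Step 3: link parent and child
            ev.parent = min(containers, key=lambda p: p.duration)
            ev.parent.children.append(ev)
        else:
            roots.append(ev)
    # Step 4: order siblings by where they start
    for ev in ordered:
        ev.children.sort(key=lambda c: c.start)
    return roots


events = [
    Event(250, 250, "LOCAL_PEAK"), Event(0, 120, "UPTREND_LONG"),
    Event(38, 38, "SPIKE_DOWN"), Event(0, 335, "SIDEWAYS_REGIME"),
    Event(201, 335, "DOWNTREND_LONG"), Event(30, 45, "DOWNTREND_SHORT"),
    Event(121, 200, "FLAT_SEGMENT"), Event(50, 55, "VOLATILITY_SPIKE"),
]
roots = build_hierarchy(events)
by_name = {e.label_name: e for e in events}

print([r.label_name for r in roots])
print([c.label_name for c in roots[0].children])
print(by_name["SPIKE_DOWN"].parent.label_name)
```

Run on the events from the example tree near the top of this README, the sketch recovers the same nesting.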
pip install torch numpy scipy PyWavelets

- Python 3.8+
- PyTorch 1.9+
- NumPy 1.19+
- SciPy 1.5+
- PyWavelets 1.1+
MIT License - see LICENSE file
Contributions welcome! Areas for improvement:
- Additional event detectors (seasonality, cycles with ACF)
- More sophisticated hierarchy algorithms
- Performance optimizations
- Additional text formats
- Support for multivariate time series
- Custom wavelet families
- Adaptive feature selection
For questions or issues, please open a GitHub issue or contact [e240203@e.ntu.edu.sg].
Version: 2.0.0 (Enhanced Edition)
Last Updated: January 2026
Python: 3.8+
Dependencies: PyTorch, NumPy, SciPy, PyWavelets
Recommended Configuration: All features enabled (spectral + entropy + wavelets)