🧠 Brain-Go-Brr V4: Clinical EEG Seizure Detection

O(N) complexity seizure detection via dual-stack state-space architecture

Current Status (v4.2.0):

🔄 FLA Baseline: Epoch 30+, plateaued at 0.257 for 13 epochs (best: 0.284 @ epoch 9), early stop ~epoch 36
🚀 Exp4 (Cyclic LR): Ready to launch - SGDR restarts to escape local minimum
⏸️ BiMamba2: Paused (focusing on local training due to cost)

📋 The Clinical Problem

50 million people worldwide suffer from epilepsy. Continuous EEG monitoring in ICUs could catch seizures early—but current systems fail at a critical bottleneck: false alarm fatigue.

Above roughly 10 false alarms per 24 hours, clinical staff stop responding. The gold standard? <1 false alarm per day while maintaining >75% seizure detection sensitivity. That's what we're building.

🎯 The Technical Challenge

Seizures aren't just temporal patterns or spatial patterns—they're both simultaneously:

Temporal dynamics: Multi-scale patterns from milliseconds (spike transients) → seconds (rhythmic activity) → minutes (ictal evolution)
Spatial propagation: Time-varying electrode connectivity as seizures propagate through neural networks (e.g., C3 → C4 → P3)

Traditional approaches fail because they treat these as separate problems. We model them jointly via time-then-graph ordering.

🔬 Our Approach: Dual-Stack Research Experiment

Controlled A/B comparison of two state-space architectures on an identical pipeline:

🔷 Stack 1: BiMamba2 (Baseline)

What: Mamba2 with bidirectional processing
Status: ⏸️ PAUSED at Epoch 6 (Modal A100, $1.1k spent, checkpoints backed up in backups/modal_bimamba2_epoch6/)
Foundation: Fast CUDA kernels, selective state propagation (Gu & Dao 2023)
Motivation: Proven SSM architecture with O(N) efficiency

🔶 Stack 2: Gated DeltaNet (Research Variant)

What: FLA (Flash Linear Attention) with gating + delta rule
Status: 📊 Baseline Complete - 0.284 @ 10 FA/24h (epoch 9), 🧪 Exp1 running (testing regularization)
Foundation: Reported to outperform Mamba2 on language modeling (Yang et al., ICLR 2025)
Hypothesis: Better for EEG's abrupt context switches (seizure onsets)
Next: Resume baseline with patience=20 to test "second peak" hypothesis (after exp1 completes)

Why both? Seizures have abrupt onsets (need memory clearing via gating) and persistent patterns (need selective retention via delta rule). Gated Delta theoretically handles both. But does theory match clinical reality? That's what we're testing.

Research transparency: All three outcomes (Gated Delta wins, BiMamba2 wins, or tie) are scientifically valuable. No prior work compares these architectures on clinical EEG analysis. See docs/04-model/flash-linear-attention/FLA_ROADMAP.md for full strategy.

🏗️ Architecture: Theory & Design

🤔 Why Time-Then-Graph?

EvoBrain (NeurIPS 2025) establishes two critical theorems:

Theorem 1 (Dynamic Graphs): Explicit dynamic modeling (time-varying adjacency) is strictly more expressive than implicit (static graphs)
Theorem 2 (Temporal Ordering): time-then-graph > time-and-graph > graph-then-time

Intuition: Temporal features must stabilize before graph operations. Processing graph structure first forces simultaneous learning of both patterns—a harder optimization landscape.

Empirical: EvoBrain achieves 95% AUROC on TUSZ (+23% over baselines).

⚡ Why O(N) Complexity?

Problem scale: 60-second EEG windows at 256Hz = 15,360 samples per channel. Traditional Transformers:

Attention cost: O(N²) = 236M operations per layer
Memory: O(N²) = 900MB just for attention matrices (batch=1)
Inference: 8 Hz/batch (too slow for clinical real-time)

State-space solution: Mamba/GatedDelta achieve O(N) via selective state propagation:

Cost: ~15K operations per layer (~15,000× reduction, since N²/N = N = 15,360)
Memory: O(N) = 60KB per layer
Inference: 128 Hz/batch (EEGMamba, Gui et al. 2024) vs 8 Hz/batch for Transformers
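The sequence-length and memory figures above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming a single fp32 N×N attention matrix (one head, batch of 1) and one fp32 value per timestep for the linear case. The function name and those assumptions are illustrative and are not taken from the codebase.

```python
def attention_vs_linear(sample_rate_hz=256, window_s=60, bytes_per_value=4):
    """Compare quadratic and linear cost for one EEG window.

    Assumes fp32 storage (4 bytes) and counts one operation per
    attention pair (quadratic) or per timestep (linear).
    """
    n = sample_rate_hz * window_s              # samples per channel
    pairwise_ops = n * n                       # O(N^2) attention pairs
    return {
        "n": n,
        "pairwise_ops": pairwise_ops,
        "attention_mib": pairwise_ops * bytes_per_value / 2**20,
        "linear_kib": n * bytes_per_value / 2**10,
        "reduction": pairwise_ops // n,        # N^2 / N = N
    }


if __name__ == "__main__":
    for name, value in attention_vs_linear().items():
        print(f"{name}: {value}")
```

The ratio between the quadratic and linear operation counts is simply N, so the gap grows with window length.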

🔄 Architecture Flow

EEG Input (B, 19 channels, 15360 samples @ 256Hz = 60s)
        │
        ▼
  ┌─────────────────────────────────────────────┐
  │ TCN ENCODER (8 layers, 16× downsampling)    │
  │ → Multi-scale temporal decomposition        │
  │ → Dilations: 1→2→4→8→16→32→64→128           │
  │ → Output: (B, 512, 960) compressed features │
  └─────────────────────────────────────────────┘
        │
        ▼
  ┌─────────────────────────────────────────────┐
  │ PROJECTION → Per-Electrode Features         │
  │ → 512 channels → 19 electrodes × 64 dims    │
  │ → Output: (B, 19, 960, 64)                  │
  └─────────────────────────────────────────────┘
        │
        ├──────────────┬──────────────┐
        ▼              ▼              ▼
   ┌─────────┐   ┌─────────┐   ┌───────────┐
   │  NODE   │   │  EDGE   │   │ ADJACENCY │
   │   SSM   │   │   SSM   │   │ ASSEMBLY  │
   │  (19×)  │   │ (171×)  │   │ (learned) │
   └────┬────┘   └────┬────┘   └─────┬─────┘
        │             │              │
        │             └──────┬───────┘
        │                    ▼
        │          ┌────────────────────────┐
        │          │ DYNAMIC LAPLACIAN PE   │
        │          │ → k=16 eigenvectors    │
        │          │ → Every 5 timesteps    │
        │          └──────────┬─────────────┘
        │                     ▼
        │          ┌────────────────────────┐
        │          │ GNN (2× SSGConv)       │
        │          │ → Spatial aggregation  │
        │          │ → Alpha=0.05           │
        │          └──────────┬─────────────┘
        │                     │
        └─────────────────────┴─► (B, 19, 960, 128)
                                  ▼
                        ┌──────────────────┐
                        │ GATED FUSION     │
                        │ → 4-head combine │
                        │ → Node + spatial │
                        └────────┬─────────┘
                                 ▼
                        ┌──────────────────┐
                        │ DECODER          │
                        │ → Upsample 16×   │
                        │ → Per-sample     │
                        └────────┬─────────┘
                                 ▼
                        (B, 15360) logits

Key: SSM boxes = 🔷 BiMamba2 (Stack 1) or 🔶 Gated DeltaNet (Stack 2)

Everything else is identical—TCN frontend, GNN backend, fusion layer. Only the temporal core changes.
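The tensor shapes in the diagram follow from a handful of constants. The sketch below only does the shape bookkeeping (no model code). The constant and function names are hypothetical, and the count of Laplacian PE updates is derived from the "every 5 timesteps" figure in the diagram.

```python
from math import comb

SAMPLE_RATE_HZ = 256
WINDOW_SECONDS = 60
NUM_ELECTRODES = 19
TCN_DOWNSAMPLE = 16
TCN_CHANNELS = 512
NODE_DIM = 64
LPE_INTERVAL = 5  # timesteps between Laplacian PE recomputations


def stage_shapes(batch):
    """Return the expected shape (or count) at each stage of the flow."""
    n_samples = SAMPLE_RATE_HZ * WINDOW_SECONDS   # 15360
    t = n_samples // TCN_DOWNSAMPLE               # compressed time axis
    return {
        "input": (batch, NUM_ELECTRODES, n_samples),
        "tcn": (batch, TCN_CHANNELS, t),
        "projection": (batch, NUM_ELECTRODES, t, NODE_DIM),
        "edge_pairs": comb(NUM_ELECTRODES, 2),    # unordered electrode pairs
        "lpe_updates": t // LPE_INTERVAL,
        "logits": (batch, n_samples),
    }


if __name__ == "__main__":
    for stage, shape in stage_shapes(batch=4).items():
        print(stage, shape)
```

The 171 edge SSMs correspond to the number of unordered pairs among 19 electrodes.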

💡 Component Justification

1. TCN Encoder: Multi-Scale Temporal Decomposition

Temporal Convolutional Networks (Bai et al. 2018):

Parallelism: Entire 60s window processed simultaneously (vs sequential RNN)
Multi-scale: Dilated convolutions capture patterns at exponentially growing timescales:
- Layer 1 (dilation=1): 50ms receptive field (spike detection)
- Layer 4 (dilation=8): 400ms (rhythmic patterns)
- Layer 8 (dilation=128): 6.4s (ictal evolution)
Stable gradients: Residual connections prevent vanishing gradients

Tradeoff: O(N log N) complexity due to dilation, but negligible for N=15K.
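The dilation schedule and the per-layer timescales quoted above can be tabulated as follows. This sketch assumes the span of a layer scales linearly with its dilation, starting from the 50 ms figure given for layer 1. The true receptive field depends on the kernel size, which is not stated here, so treat the function as illustrative only.

```python
def tcn_layer_spans(num_layers=8, base_span_ms=50.0):
    """List dilation and approximate temporal span for each TCN layer.

    Assumption: span = base_span_ms * dilation, with dilation doubling
    at every layer (1, 2, 4, ..., 128 for 8 layers).
    """
    layers = []
    for index in range(num_layers):
        dilation = 2 ** index
        layers.append({
            "layer": index + 1,
            "dilation": dilation,
            "span_ms": base_span_ms * dilation,
        })
    return layers


if __name__ == "__main__":
    for row in tcn_layer_spans():
        print(f"layer {row['layer']}: dilation={row['dilation']}, "
              f"span={row['span_ms']:.0f} ms")
```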

2. State-Space Models: The Heart of the System

Core innovation: Selective state propagation with data-dependent gates

S_t = α_t ⊙ S_{t-1} + v_t ⊗ k_t^T    # Forget (α) + update (v⊗k)
o_t = S_t q_t                          # Retrieve

Where α_t ∈ (0,1) is a data-dependent gate that sets the memory decay at each timestep, rather than one fixed decay rate shared by all timesteps.
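A minimal pure-Python sketch of the recurrence above may help make the gate's role concrete. It assumes a scalar α_t per timestep and a small dense state matrix. Real kernels (mamba-ssm, FLA) use chunked, fused implementations, and the function name here is hypothetical.

```python
def gated_linear_scan(alphas, keys, values, queries):
    """Run S_t = a_t * S_{t-1} + v_t k_t^T and o_t = S_t q_t over a sequence.

    alphas:  list of floats in [0, 1], one per timestep
    keys:    list of length-d_k vectors
    values:  list of length-d_v vectors
    queries: list of length-d_k vectors
    """
    d_v, d_k = len(values[0]), len(keys[0])
    state = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for a, k, v, q in zip(alphas, keys, values, queries):
        # Decay the old memory, then write the new key/value association.
        state = [[a * state[i][j] + v[i] * k[j] for j in range(d_k)]
                 for i in range(d_v)]
        # Read out with the query.
        outputs.append([sum(state[i][j] * q[j] for j in range(d_k))
                        for i in range(d_v)])
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[2.0], [3.0]]
    queries = [[1.0, 0.0], [1.0, 0.0]]
    print(gated_linear_scan([1.0, 1.0], keys, values, queries))  # memory kept
    print(gated_linear_scan([1.0, 0.0], keys, values, queries))  # memory erased
```

With α_2 = 1 the value written at step 1 is still retrievable at step 2. With α_2 = 0 the whole state is cleared before the new write, which is the "memory clearing" behaviour discussed for seizure onsets.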

🔷 BiMamba2 Architecture (Stack 1)

Node Stream (19 parallel SSMs):

Purpose: Model per-electrode temporal dynamics independently
Config: 6 layers, d_model=64, d_state=16, bidirectional
Example: Rhythmic spiking in C3 electrode evolves independently
Parameters: 7.2M

Edge Stream (171 pairwise SSMs):

Purpose: Model inter-electrode connectivity strength over time
Config: 2 layers, d_model=16, d_state=8, bidirectional
Example: C3-C4 coherence increases during seizure propagation
Parameters: 1.2M

Total SSM: 8.4M parameters, O(N) complexity

🔶 Gated DeltaNet Architecture (Stack 2)

Key difference: Adds delta rule on top of gating

Delta rule: Selective key-value updates without forgetting others

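# Mamba2: Global gate (decays the entire state uniformly)
S_t = α_t ⊙ S_{t-1} + v_t ⊗ k_t^T

# Gated DeltaNet: Targeted update (selective retention)
# old_memory = ((α_t ⊙ S_{t-1}) k_t) ⊗ k_t^T, i.e. what the decayed state already stores under key k_t
S_t = α_t ⊙ S_{t-1} + β_t ⊙ (v_t ⊗ k_t^T - old_memory)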
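To illustrate the difference between the two updates, the sketch below writes a value under one key, a second value under an orthogonal key, and then overwrites the first. It reads "old memory" as the value the decayed state currently returns for k_t, which matches the published Gated DeltaNet rule S_t = α_t S_{t-1} + β_t (v_t − α_t S_{t-1} k_t) k_t^T. Scalar gates and unit-norm keys are assumed, and all function names are hypothetical.

```python
def outer(u, w):
    """Outer product u w^T as a nested list."""
    return [[ui * wj for wj in w] for ui in u]


def matvec(matrix, x):
    """Matrix-vector product."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in matrix]


def additive_step(state, k, v, alpha=1.0):
    """Gate-only update: decay everything, then add v k^T."""
    update = outer(v, k)
    return [[alpha * s + u for s, u in zip(s_row, u_row)]
            for s_row, u_row in zip(state, update)]


def gated_delta_step(state, k, v, alpha=1.0, beta=1.0):
    """Gated delta update: replace what is stored under k, keep the rest."""
    decayed = [[alpha * s for s in row] for row in state]
    old_value = matvec(decayed, k)                     # current content for key k
    error = [beta * (vi - oi) for vi, oi in zip(v, old_value)]
    correction = outer(error, k)
    return [[d + c for d, c in zip(d_row, c_row)]
            for d_row, c_row in zip(decayed, correction)]


if __name__ == "__main__":
    writes = [([1.0, 0.0], [2.0]), ([0.0, 1.0], [3.0]), ([1.0, 0.0], [5.0])]
    s_add = [[0.0, 0.0]]
    s_delta = [[0.0, 0.0]]
    for key, value in writes:
        s_add = additive_step(s_add, key, value)
        s_delta = gated_delta_step(s_delta, key, value)
    print("additive:", s_add)      # first key accumulates both writes
    print("delta:   ", s_delta)    # first key holds only the latest write
```

With α = 1 the gate-only update accumulates both writes to the first key, while the delta update overwrites the first key and leaves the second untouched.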

Configuration:

Node Stream: 6 layers, d_model=512, num_heads=6, headdim=8
Edge Stream: 2 layers, d_model=32, num_heads=3, headdim=8

Total SSM: ~8.4M parameters (matched to BiMamba2), O(N) complexity

Hypothesis: Delta rule handles EEG better because:

Gating clears memory during seizure onset (abrupt context switch)
Delta rule preserves persistent patterns (rhythmic activity continues)
BiMamba2 has only gating → may "forget" ongoing rhythms during onset

Reality check: This is a hypothesis. Full TUSZ training will tell us if it's true.

3. Dynamic Laplacian PE: Time-Evolving Graph Structure

EvoBrain's Theorem 1 proves explicit time-varying adjacency is strictly more expressive than static graphs.

Implementation:

Compute k=16 eigenvectors of normalized graph Laplacian every 5 timesteps
Eigenvectors = fixed positional coordinates in spectral space (like Transformer sinusoidal PE)
Learning happens in GNN layers that process PE, not in PE itself (best practice)

Why top-k=3 neighbors? Keeping each electrode's 3 strongest connections captures 85%+ of spatial variance (validated by EvoBrain on EEG).
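One way to picture the positional-encoding step is the NumPy sketch below: sparsify a connectivity matrix to each node's top-k neighbors, build the symmetric normalized Laplacian, and keep the eigenvectors with the smallest eigenvalues. Choosing the smallest eigenvalues, symmetrizing with an element-wise maximum, and the function names are all assumptions for illustration. The project's actual implementation may differ.

```python
import numpy as np


def topk_symmetric_adjacency(weights, k=3):
    """Keep each node's k strongest connections, then symmetrize."""
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)                       # no self-loops
    n = w.shape[0]
    top_idx = np.argsort(-w, axis=1)[:, :k]        # k largest per row
    mask = np.zeros_like(w, dtype=bool)
    mask[np.arange(n)[:, None], top_idx] = True
    sparse = np.where(mask, w, 0.0)
    return np.maximum(sparse, sparse.T)            # undirected graph


def laplacian_pe(adj, num_vectors=16):
    """Eigenpairs of L = I - D^{-1/2} A D^{-1/2}, smallest eigenvalues first."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)         # ascending eigenvalues
    return eigvals[:num_vectors], eigvecs[:, :num_vectors]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.random((19, 19))
    adjacency = topk_symmetric_adjacency((raw + raw.T) / 2, k=3)
    values, vectors = laplacian_pe(adjacency, num_vectors=16)
    print(vectors.shape)
```

Eigenvectors are only defined up to sign, so downstream layers that consume them typically need to tolerate sign flips between recomputations.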

4. Gated Fusion: Adaptive Feature Combination

Problem: Node stream and GNN produce different feature scales and semantics.

Solution: Multi-head gated fusion learns optimal combination:

g = σ(W_g [node_out; gnn_out])        # Per-feature gates
fused = g ⊙ node_out + (1-g) ⊙ gnn_out  # Weighted merge

This allows the model to emphasize:

Node features when electrodes evolve independently (early seizure)
GNN features when spatial synchronization dominates (propagated seizure)
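The gating equations above can be sketched for a single feature vector as follows. This is a single-head, pure-Python illustration. The bias term b_g and the function name are additions for the example, and the multi-head variant in the diagram is not reproduced here.

```python
import math


def gated_fusion(node_out, gnn_out, w_g, b_g):
    """Fuse two feature vectors with a learned per-feature sigmoid gate.

    node_out, gnn_out: lists of length d
    w_g: d rows of length 2*d, applied to the concatenation [node; gnn]
    b_g: list of length d (bias; an assumption not shown in the equations)
    """
    concat = list(node_out) + list(gnn_out)
    fused = []
    for i, (node_val, gnn_val) in enumerate(zip(node_out, gnn_out)):
        logit = b_g[i] + sum(w * x for w, x in zip(w_g[i], concat))
        gate = 1.0 / (1.0 + math.exp(-logit))      # sigma(.)
        fused.append(gate * node_val + (1.0 - gate) * gnn_val)
    return fused


if __name__ == "__main__":
    node = [1.0, 2.0]
    gnn = [3.0, 6.0]
    zeros = [[0.0] * 4 for _ in range(2)]
    print(gated_fusion(node, gnn, zeros, [0.0, 0.0]))    # equal mix
    print(gated_fusion(node, gnn, zeros, [50.0, 50.0]))  # gate near 1: node path
```

A gate near 1 passes the node stream through almost unchanged, while a gate near 0 passes the GNN output, which is how the model can shift emphasis between the two regimes listed above.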

📊 Model Statistics: Side-by-Side Comparison

Component	BiMamba2 (Stack 1)	Gated DeltaNet (Stack 2)	Complexity
TCN Encoder	12.8M	12.8M (identical)	O(N log N)
Node SSM	7.2M (d_model=64)	7.2M (d_model=512)	O(N)
Edge SSM	1.2M (d_model=16)	1.2M (d_model=32)	O(N)
GNN + LPE	6.2M	6.2M (identical)	O(N·k²)
Fusion	2.1M	2.1M (identical)	O(N)
Decoder	1.0M	1.0M (identical)	O(N)
Total	30.5M	30.5M (matched)	O(N)

🔑 Key: Parameter counts matched for fair comparison. Only Node/Edge SSM layers differ. TCN frontend, GNN backend, fusion, and decoder are 100% identical.

🏥 Dataset: TUSZ Clinical Reality

TUH EEG Seizure Corpus

World's largest open-source seizure dataset (Temple University):

504 hours of continuous EEG from 592 patients
36 hours of seizures (~7% prevalence) → 12:1 class imbalance
19-channel 10-20 montage @ 256Hz (clinical standard)
Patient-based splits (train/dev/eval) → no data leakage

Preprocessing pipeline:

Bandpass filter: 0.5-120Hz
Notch filter: 60Hz (removes powerline noise)
Resample: 256Hz (standardize across recordings)
Windowing: 60s windows, 10s stride (83% overlap)
Normalization: Per-channel z-score + clip to ±10σ (removes outliers)
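The windowing and normalization steps can be sketched in pure Python as below. Filtering and resampling are omitted because they need a signal-processing library. The function names are hypothetical, and using the population standard deviation with a small epsilon is an assumption.

```python
from statistics import mean, pstdev

SAMPLE_RATE_HZ = 256
WINDOW_SECONDS = 60
STRIDE_SECONDS = 10


def window_starts(num_samples, fs=SAMPLE_RATE_HZ,
                  window_s=WINDOW_SECONDS, stride_s=STRIDE_SECONDS):
    """Start indices of all full windows in a recording."""
    window, stride = window_s * fs, stride_s * fs
    if num_samples < window:
        return []
    return list(range(0, num_samples - window + 1, stride))


def zscore_clip(channel, clip=10.0, eps=1e-8):
    """Per-channel z-score followed by clipping to +/- clip sigma."""
    mu = mean(channel)
    sigma = pstdev(channel)
    return [max(-clip, min(clip, (x - mu) / (sigma + eps))) for x in channel]


if __name__ == "__main__":
    starts = window_starts(5 * 60 * SAMPLE_RATE_HZ)   # a 5-minute recording
    overlap = 1 - STRIDE_SECONDS / WINDOW_SECONDS
    print(len(starts), f"{overlap:.0%}")
```

The overlap fraction is 1 − stride/window, which gives the 83% quoted above for a 10 s stride and 60 s window.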

Our cache system (memory-mapped NPY format):

Train: 4667 files → 61,616 balanced windows (34.2% seizure ratio via oversampling)
Dev: 1832 files → 148,224 natural windows (7.7% seizure ratio, real distribution)
Speed: 99.6% faster startup than NPZ (manifest-based loading)
Memory: <1 GB RAM vs 387 GB for NPZ

Why oversample training? Standard ML practice: Train on balanced data (model learns seizure patterns), validate on natural distribution (measures real-world performance). See docs/05-training/training-methodology.md for detailed explanation.

🎯 Performance Targets: Evidence-Based Goals

Based on verified clinical benchmarks and SOTA research (see docs/00-overview/performance-targets.md for comprehensive analysis):

Primary Target (Match Temple Clinical SOTA)

≤4 FA/24h @ ≥50% sensitivity (NEDC OVERLAP scoring)

Temple NEDC verified: 4 FA/24h @ ~50% sensitivity (real clinical deployments)
SeizureTransformer #1: 26.89 FA/24h @ 45.63% sensitivity (TUSZ eval, 2025)
Our goal: Match or beat Temple's verified clinical benchmark

Stretch Goal (Clinical Deployment)

≤10 FA/24h @ ≥75% sensitivity (NEDC OVERLAP scoring)

Enables ICU monitoring with manageable alarm fatigue
Current gap: SeizureTransformer reaches 33.90% sensitivity @ 10 FA/24h (~41-point gap to close)

Additional Metrics (Threshold-Independent)

Metric	Target	Baseline (SeizureTransformer)	Rationale
AUROC	≥0.90	0.902 (TUSZ eval)	Overall discrimination capability
AUPRC	≥0.40	Not reported	Better for 12:1 class imbalance
F1 Score	≥0.45	0.414 (NEDC OVERLAP)	Balanced precision/recall

Realistic Success Criteria

Outcome	Sensitivity @ 4 FA/24h	Publication Tier
Breakthrough	≥60%	Top-tier venue (beats all known systems)
Strong	≥50%	Highly publishable (matches Temple SOTA)
Publishable	≥45%	Solid contribution (architectural novelty)
Minimum	≥40%	Viable if architectural insights clear

Reality check: Temple NEDC research confirms ROC curves are very steep at low FA rates, so a 5% absolute change in sensitivity can correspond to a massive shift in FA rate. Our dual-stack comparison (BiMamba2 vs Gated DeltaNet) provides scientific value regardless of absolute performance.

Scoring impact: Same predictions can yield 3-16× different FA rates depending on scorer (SzCORE vs NEDC OVERLAP vs NEDC TAES). We use NEDC OVERLAP as primary metric. See docs/06-evaluation/TAES_DISAMBIGUATION.md for critical naming collision explanation.
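To make the sensitivity and FA/24h quantities concrete, here is a simplified any-overlap event scorer. It is not the NEDC software and will not reproduce its numbers exactly. It assumes events are (start, end) pairs in seconds and that a predicted event counts as a false alarm only if it overlaps no reference event. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def any_overlap_score(ref_events, hyp_events, total_hours):
    """Return (sensitivity, false alarms per 24 h) under any-overlap matching."""
    hits = sum(
        any(overlaps(ref, hyp) for hyp in hyp_events) for ref in ref_events
    )
    false_alarms = sum(
        not any(overlaps(hyp, ref) for ref in ref_events) for hyp in hyp_events
    )
    sensitivity = hits / len(ref_events) if ref_events else float("nan")
    fa_per_24h = false_alarms * 24.0 / total_hours
    return sensitivity, fa_per_24h


if __name__ == "__main__":
    reference = [(10, 40), (100, 130)]
    predicted = [(35, 50), (200, 210), (300, 305)]
    print(any_overlap_score(reference, predicted, total_hours=12))
```

Because the false-alarm count is normalized by recording duration, the same predictions scored under a stricter matching rule (for example, one that penalizes partial overlap) will produce a different FA/24h figure, which is the effect described above.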

🚀 Quick Start

# 1️⃣ Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2️⃣ Clone repo
git clone https://github.com/clarity-digital-twin/brain-go-brr-v2.git
cd brain-go-brr-v2

# 3️⃣ Setup environment (installs mamba-ssm, PyG)
make setup
make setup-gpu

# Optional: Install FLA for Gated DeltaNet research
make setup-fla

# 4️⃣ Download TUSZ corpus
# Visit: https://isip.piconepress.com/projects/nedc/html/tuh_eeg/index.shtml
# Place in: data_ext4/tusz/edf/

# 5️⃣ Build preprocessing cache (one-time, ~2 hours)
python -m src build-cache \
  --data-dir data_ext4/tusz/edf/train \
  --cache-dir cache/tusz_mmap/train \
  --split train

python -m src build-cache \
  --data-dir data_ext4/tusz/edf/dev \
  --cache-dir cache/tusz_mmap/dev \
  --split dev

# 6️⃣ Smoke test (3 files, 5 minutes)
make smoke-bimamba    # Test BiMamba2 stack
make smoke-fla        # Test Gated DeltaNet stack

# 7️⃣ Full local training (RTX 4090, ~960 hours / 40 days)
export BGB_NAN_DEBUG=1
tmux new -s train
make train-bimamba    # or: make train-fla
# Ctrl+B then D to detach | tmux attach -t train to reattach

Cloud training (Modal A100-80GB) - See docs/05-training/modal.md for details:

# Deploy Modal functions first
modal deploy deploy/modal/app.py

# BiMamba2 production (hands-free, auto-restart)
modal run --detach deploy/modal/app.py \
  --action schedule-training \
  --config configs/modal/train_bimamba.yaml

# Gated DeltaNet production (hands-free, auto-restart)
modal run --detach deploy/modal/app.py \
  --action schedule-training \
  --config configs/modal/train_fla.yaml

# Monitor progress
modal app list
modal app logs <app-id>

🚨 CRITICAL: Use --action schedule-training for 100-epoch production runs (auto-restart every 23h). Use --action train ONLY for smoke tests and experiments.

See docs/01-installation/ and docs/05-training/ for complete setup guides.

📚 Documentation

Getting Started

Quickstart - 5-minute validation
First Training Run - Complete walkthrough

Architecture

V3 Spec - Full implementation details
Laplacian PE - Dynamic graph theory
Stability Evolution - NaN prevention history

Research

FLA Roadmap - Complete A/B strategy
FLA Quick Reference - Config guide
Future Work - Post-training enhancements

Operations

Training Guide - Local + Modal setup
Training Methodology - Why validation has more batches
Modal Timeout Guard - Three-layer defense system
Troubleshooting - Common issues
NaN Prevention - Gradient stability

🤝 Contributing

We welcome contributions! See docs/09-development/ for:

Coding Standards (Ruff, mypy, no comments unless requested)
Testing Strategy (make q before committing)
Technical Debt (currently zero!)

Zero technical debt policy: All P0/P1/P2 issues resolved before major releases.

📖 Citation

@software{brain-go-brr-v4,
  title = {Brain-Go-Brr V4: Clinical EEG Seizure Detection via Dual-Stack State-Space Models},
  author = {Clarity Digital Twin},
  year = {2025},
  version = {4.2.0},
  url = {https://github.com/clarity-digital-twin/brain-go-brr-v2},
  note = {Empirical A/B comparison of BiMamba2 and Flash Linear Attention (BiGatedDeltaNet) architectures on TUSZ}
}

⚖️ License

Apache 2.0 - See LICENSE for full text.

🙏 Acknowledgments

Datasets:

TUH EEG Seizure Corpus (Temple University)
CHB-MIT Scalp EEG Database (Boston Children's Hospital / MIT)

Foundational Papers:

EvoBrain (Kotoge et al., NeurIPS 2025) - Time-then-graph paradigm, dynamic graphs
Mamba (Gu & Dao 2023) - Selective state-space models
Gated DeltaNet (Yang et al., ICLR 2025) - Memory erasure + delta rule
SeizureTransformer (Wu et al. 2025) - SOTA baseline, U-Net + Transformer (EpilepsyBench #1)
EEGMamba (Gui et al. 2024) - Bidirectional Mamba for EEG (speed benchmark)
TCN (Bai et al. 2018) - Temporal convolutional networks
Focal Loss (Lin et al. 2017) - Class imbalance handling

Infrastructure & Libraries:

Modal.com - A100-80GB GPU infrastructure
PyTorch Geometric - Graph neural networks
mamba-ssm (Tri Dao) - Mamba2 implementation
FLA (Songlin Yang) - Gated DeltaNet implementation

Questions? Open an issue • Updates? Watch the repo • Discussion? Start a discussion

Status: v4.2.0 checkpoint resume fixes released • FLA training resumed (Epoch 18/100) • BiMamba2 paused (Epoch 6, backed up) • See STATUS.md for full details
