SAGE — Security Attack Graph Engine

A platform that operationalizes the threat intelligence cycle by integrating external CTI data (STIX 2.1) with internal asset and organizational information. It visualizes and weights attack paths, and delivers actionable outputs to Red, Blue, and IR teams.

The Japanese README is available here.

Out of scope

This system receives data from the following — it does not replace them: real-time SIEM detection, endpoint protection, vulnerability scanning automation.

Features

  • Multi-source ingestion — OpenCTI (STIX 2.1), AWS Security Hub, GCP Security Command Center, TRACE (web/PDF crawler with PIR-driven validation gate), and analyst manual input via API
  • Attack Graph — Models asset connectivity and reachable attack paths. Asset criticality is dynamically adjusted per PIR at ETL time
  • Attack Flow — Tracks TTP time-series transitions as weighted FollowedBy edges
  • PIR cascade — PIR is a first-class graph node; PirPrioritizesActor (TAP), PirPrioritizesTTP (PTTP), and PirWeightsAsset edges materialize the Strategic → Operational → Tactical cascade
  • Identity targeting — Identity SDOs and ActorTargetsIdentity edges capture credential and org-targeting attribution (paired with TRACE)
  • Pluggable database backend — SQLite file (default since 4.0.0; synced via StorageBackend to a local directory or GCS) or Cloud Spanner (SAGE_DB=spanner)
  • Analysis API — Internal REST API (Cloud Run, VPC-internal, IAP-protected) exposing attack paths, choke points, actor TTPs, and asset exposure queries
  • Team outputs — GitHub Enterprise playbook issues, Slack priority alerts, Caldera adversary profiles for red team simulations
  • TLP enforcement — TLP Red objects excluded from storage; only white/green/amber ingested
  • IR feedback loop — Incident records feed back into FollowedBy weights over time
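
To make the FollowedBy idea above concrete, here is a minimal sketch that derives edge weights from ordered TTP sequences. It uses plain transition frequency as the weight, which is an assumption for illustration only; SAGE's actual scoring is described in docs/ir-feedback-flow.md. The function name `followed_by_weights` and the sample incident timelines are hypothetical.

```python
from collections import Counter, defaultdict


def followed_by_weights(sequences):
    """Return {(src_ttp, dst_ttp): weight} from ordered TTP sequences.

    Assumed weighting: the share of observed transitions out of src_ttp
    that went to dst_ttp (not SAGE's documented formula).
    """
    counts = Counter()
    for seq in sequences:
        # Each adjacent pair is one observed "src followed by dst" transition.
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1

    # Total number of transitions leaving each source TTP.
    out_totals = defaultdict(int)
    for (src, _dst), n in counts.items():
        out_totals[src] += n

    return {edge: n / out_totals[edge[0]] for edge, n in counts.items()}


# Hypothetical incident timelines, written as ATT&CK technique IDs.
incidents = [
    ["T1566", "T1059", "T1003"],
    ["T1566", "T1059", "T1021"],
    ["T1566", "T1204"],
]
weights = followed_by_weights(incidents)
```

Under this scheme, feeding a new incident record back in only changes the counts, so weights shift gradually as more incidents accumulate.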

Architecture

[OpenCTI]──STIX 2.1───────┐
[Security Hub]─────────────┤
[SCC]──────────────────────┼──→ [GCS: Landing Zone]
[TRACE: validated STIX]────┤      (PIR-driven L2 gate +
[Analyst Input API]─manual─┘       semantic + stix2-validator)

[BEACON: assets.json / pir_output.json /
         identity_assets.json / user_accounts.json]
       │ (after passing TRACE validation: validate_assets / validate_pir /
       │  validate_identity_assets / validate_user_accounts)
       ▼
[StorageBackend: Local (output/) or GCS]
  ├── stix/        ← TRACE STIX bundles
  ├── assets/      ← BEACON assets outputs
  ├── pir/         ← BEACON PIR outputs
  ├── plans/       ← collection_plan, sources_candidate
  └── db/          ← sage.db (SQLite backend database file)

       │
       ▼
[SAGE: load_assets / load_identity_assets / load_user_accounts /
       PIR ingest]  (falls back to StorageBackend when --input omitted)

        │
        ▼
[ETL Worker — Cloud Run]
  ├── Reads ALL bundles from StorageBackend stix/ category
  ├── STIX parsing + deduplication (including identity SDOs)
  ├── TLP enforcement
  ├── PIR cascade build (TAP/PTTP/WeightsAsset)
  ├── FollowedBy weight recalculation
  └── Graph upsert (via sage.db backend dispatch)

        │
        ▼
[Database — selected by SAGE_DB]
  ├── sqlite (default): sage.db file synced via StorageBackend
  │     local: <base_dir>/db/sage.db in place
  │     gcs:   download on startup → write → upload after ETL
  └── spanner (optional): Spanner Graph ThreatIntelGraph

        │
        ▼
[Analysis API — Cloud Run, VPC-internal, read-only DB access]
  GET /attack-paths   GET /choke-points
  GET /actor-ttps     GET /asset-exposure
  GET /actors         GET /similar-incidents
  GET /threat-summary
  POST /caldera/adversary
  POST /api/incidents GET /api/incidents
  POST /api/annotate

        │
        ▼
[GHE Issues]  [Slack alerts]  [Caldera adversary profiles]
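
The deduplication and TLP enforcement steps of the ETL worker could look roughly like the sketch below. It is not SAGE's implementation. It assumes bundles are already parsed into dicts, that objects are deduplicated by STIX `id` keeping the newest `modified` value, and that unmarked objects are allowed through. `filter_and_dedupe` is a hypothetical name, and the object IDs in the sample data are shortened placeholders rather than valid STIX UUIDs. The TLP:RED marking-definition ID is the one defined in the STIX 2.1 specification.

```python
# STIX 2.1 predefined marking definition for TLP:RED.
TLP_RED = "marking-definition--5e57c739-391a-4eb3-b6be-7d15ca92d5ed"


def filter_and_dedupe(bundles):
    """Merge STIX bundles, dropping TLP:RED objects and duplicate IDs."""
    latest = {}
    for bundle in bundles:
        for obj in bundle.get("objects", []):
            if TLP_RED in obj.get("object_marking_refs", []):
                continue  # TLP:RED objects are never stored
            seen = latest.get(obj["id"])
            # Keep the most recently modified version of each object.
            # Assumption: timestamps share one precision, so string
            # comparison orders them correctly.
            if seen is None or obj.get("modified", "") > seen.get("modified", ""):
                latest[obj["id"]] = obj
    return list(latest.values())


# Placeholder bundles: one actor seen twice, one TLP:RED identity.
bundles = [
    {
        "type": "bundle",
        "objects": [
            {"type": "intrusion-set", "id": "intrusion-set--a",
             "modified": "2024-01-01T00:00:00.000Z", "name": "old"},
            {"type": "identity", "id": "identity--b",
             "modified": "2024-01-01T00:00:00.000Z",
             "object_marking_refs": [TLP_RED]},
        ],
    },
    {
        "type": "bundle",
        "objects": [
            {"type": "intrusion-set", "id": "intrusion-set--a",
             "modified": "2024-06-01T00:00:00.000Z", "name": "new"},
        ],
    },
]
kept = filter_and_dedupe(bundles)
```

Filtering before deduplication means a TLP:RED revision of an object can never overwrite an ingestible one.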

Documentation

Document                   Description
docs/setup.md              Clone, install, configure, first run, testing
docs/deploy.md             Cloud Run deployment and Cloud Scheduler
docs/usage.md              CLI commands, workflows, operations, troubleshooting
docs/data-model.md         Database schema (SQLite default / Spanner optional), node/edge definitions, PIR formulas
docs/ir-feedback-flow.md   IR feedback loop and scoring formulas
docs/structure.md          Project directory layout
docs/dependencies.md       Dependency rationale and licenses
docs/api-stability.md      API stability policy and BC guarantees

Cross-project:

Quick start

git clone https://github.com/sw33t-b1u/sage.git
cd sage
uv sync --extra dev
cp .env.example .env   # defaults run on SQLite with local storage — no GCP values needed
                       # set SAGE_DB=spanner (+ GCP_PROJECT_ID, SPANNER_*) for the Spanner backend

See docs/setup.md for the full setup procedure.
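
As an illustration of how the `SAGE_DB` switch in `.env` might be interpreted, here is a minimal sketch. The function `resolve_backend` is hypothetical and not part of SAGE. The only variable names taken from this README are `SAGE_DB` and `GCP_PROJECT_ID`; the `SPANNER_*` settings are deliberately left out because their exact names are not listed here.

```python
import os


def resolve_backend(env=None):
    """Pick the database backend from environment-style settings.

    Hypothetical helper: defaults to SQLite, and requires a GCP project
    only when the Spanner backend is requested.
    """
    env = os.environ if env is None else env
    backend = env.get("SAGE_DB", "sqlite")
    if backend not in ("sqlite", "spanner"):
        raise ValueError(f"unsupported SAGE_DB value: {backend!r}")
    if backend == "spanner" and not env.get("GCP_PROJECT_ID"):
        raise ValueError("SAGE_DB=spanner requires GCP_PROJECT_ID")
    return backend


# With no settings at all, the SQLite default applies.
default_backend = resolve_backend({})
spanner_backend = resolve_backend(
    {"SAGE_DB": "spanner", "GCP_PROJECT_ID": "example-project"}
)
```

Failing fast on a missing project ID surfaces misconfiguration at startup rather than during the first ETL run.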

Project Structure

See docs/structure.md for the full directory layout and design criteria.

Development

make check     # lint + test + audit (full quality gate)
make vet       # ruff check
make lint      # ruff format --check
make format    # ruff format + fix
make test      # pytest
make audit     # pip-audit

PIR Methodology References

SAGE consumes PIR JSON produced by BEACON, validated by TRACE before ingestion. The PIR model follows:

PIRs cascade into Operational TAP (Threat Actor Prioritization) and Tactical PTTPs (Priority TTPs). This cascade is materialized in the graph as PIR nodes plus PirPrioritizesActor / PirPrioritizesTTP / PirWeightsAsset edges (added in 0.4.1, generalized in 0.5.0).
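
The cascade described above can be pictured as one PIR node fanning out into three kinds of edges. The sketch below is illustrative only: the edge labels come from this README, but the `Edge` class, the `cascade_edges` function, and the PIR field names (`pir_id`, `actors`, `ttps`, `asset_weights`) are assumptions, not BEACON's actual PIR schema or SAGE's data model (see docs/data-model.md for the real definitions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str            # edge label, as named in this README
    src: str             # PIR node id
    dst: str             # actor, TTP, or asset id
    weight: float = 1.0  # assumed default when no weight is given


def cascade_edges(pir):
    """Expand one PIR record into its three kinds of outgoing edges."""
    pid = pir["pir_id"]
    # Strategic -> Operational: prioritized threat actors (TAP).
    edges = [Edge("PirPrioritizesActor", pid, a) for a in pir.get("actors", [])]
    # Operational -> Tactical: priority TTPs (PTTP).
    edges += [Edge("PirPrioritizesTTP", pid, t) for t in pir.get("ttps", [])]
    # Asset criticality adjustments attached to this PIR.
    edges += [
        Edge("PirWeightsAsset", pid, asset, w)
        for asset, w in pir.get("asset_weights", {}).items()
    ]
    return edges


# Hypothetical PIR record with made-up identifiers.
sample_pir = {
    "pir_id": "pir-001",
    "actors": ["actor-x"],
    "ttps": ["T1566", "T1059"],
    "asset_weights": {"asset-db-01": 1.5},
}
edges = cascade_edges(sample_pir)
```

Keeping the PIR as the source of every edge makes it possible to trace any actor, TTP, or asset weighting back to the requirement that caused it.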

License

Apache-2.0 — see LICENSE
