Podman-based orchestration for SAST, SCA, and DAST: automated security scanning and reporting for Git repositories.
GeoToolKit is a comprehensive, offline software assurance toolkit designed to scan open-source Git repositories for malicious code and vulnerabilities. It orchestrates industry-standard security scanning tools, running each in secure, isolated Podman containers for maximum safety and reliability.
- Secure Container Execution: All scanning tools run in locked-down, rootless Podman containers with restrictive seccomp profiles
- Multi-Language Support: Scans projects in Python, JavaScript, TypeScript, Java, Go, Ruby, C#, PHP, and more
- Comprehensive Analysis:
  - SAST (Static Application Security Testing) with Semgrep
  - SCA (Software Composition Analysis) with Trivy and OSV-Scanner
  - DAST (Dynamic Application Security Testing) with OWASP ZAP
- Offline Operation: Designed for air-gapped environments with offline vulnerability databases
- Professional Reporting: Generates structured Markdown reports with risk levels and compliance mapping
- Automated Workflow: Clone, scan, report; fully automated from start to finish
+-----------------+     +--------------+     +-----------------+
|    Git Repos    |---->|  GeoToolKit  |---->| Security Report |
+-----------------+     +--------------+     +-----------------+
                               |
           +-------------------+-------------------+
           |                   |                   |
           v                   v                   v
     +-----------+       +-----------+       +-----------+
     |  Semgrep  |       |   Trivy   |       | OSV-Scan  |
     | Container |       | Container |       | Container |
     +-----------+       +-----------+       +-----------+
                               |
                               v
                         +-----------+
                         | OWASP ZAP |
                         | Container |
                         +-----------+
- Linux Host (Fedora, openSUSE, Ubuntu, etc.)
- Podman installed and configured
- Python 3.11+
- uv package manager (recommended)
If you just want to use GeoToolKit without development setup:
# Install from PyPI (when published)
pip install geotoolkit
# Or install from GitHub releases (replace the * with the exact wheel file name; pip does not expand wildcards in URLs)
pip install https://github.com/GeoDerp/GeoToolKit/releases/latest/download/geotoolkit-*.whl
# Run the CLI
geotoolkit --input projects.json --output report.md --database-path data/offline-db.tar.gz

You can start the Model Context Protocol (MCP) server to manage projects.json and trigger scans programmatically.
From an installed environment (preferred):
# Start built-in MCP server (default host 127.0.0.1, default port 9000)
# --mcp-server enables MCP mode; --mcp-host/--mcp-port override listen address.
geotoolkit --mcp-server --mcp-host 127.0.0.1 --mcp-port 9000 --database-path data/offline-db.tar.gz

Quick API example:
# Trigger a scan by POSTing a projects.json (server returns report text)
curl -s -X POST "http://127.0.0.1:9000/runScan" \
-H "Content-Type: application/json" \
  --data-binary @projects.json > security-report.md

Example MCP prompts:
- createProjects from a text list: generates projects.json from a list of GitHub repos
- runScan: runs the GeoToolKit scanner with the projects.json file
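
The same request can be issued from Python. The sketch below is a minimal illustration that mirrors the curl call above, using only the standard library; the helper names `build_run_scan_request` and `run_scan` and the default timeout are assumptions for the example and are not part of GeoToolKit.

```python
import json
import urllib.request


def build_run_scan_request(
    projects: dict, base_url: str = "http://127.0.0.1:9000"
) -> urllib.request.Request:
    """Build (but do not send) a POST equivalent to the curl example."""
    body = json.dumps(projects).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/runScan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scan(
    projects: dict,
    base_url: str = "http://127.0.0.1:9000",
    timeout: float = 3600.0,  # assumed value; scans can take a long time
) -> str:
    """Send the request and return the report text from the response body."""
    request = build_run_scan_request(projects, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the MCP server running, `run_scan(json.load(open("projects.json")))` would return the report text, which can then be written to `security-report.md`.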
For development:
1. Clone the repository:
   git clone https://github.com/GeoDerp/GeoToolKit.git
   cd GeoToolKit
2. Set up Python environment:
   # Using uv (recommended)
   uv venv
   source .venv/bin/activate
   uv sync --extra mcp
3. Prepare offline database (optional but recommended):
   mkdir -p data
   # Use the automated builder script
   python scripts/build_offline_db.py --output data/offline-db.tar.gz --simulate
4. Configure projects in projects.json (with optional inline network allowlist for DAST):
   {
     "projects": [
       {
         "url": "https://github.com/fastapi/fastapi",
         "name": "fastapi",
         "language": "Python",
         "description": "Modern, fast web framework",
         "ports": ["8000"],
         "network_allow_hosts": ["127.0.0.1:8000", "localhost:8000"],
         "network_allow_ip_ranges": ["127.0.0.1/32"]
       }
     ]
   }
5. Run the scanner:
   python src/main.py \
     --input projects.json \
     --output security-report.md \
     --database-path data/offline-db.tar.gz
GeoToolKit has been validated in production-like environments with the following results:
- SAST (Semgrep): ✅ Fully functional across Python, JavaScript, Go
- SCA (Trivy): ✅ Functional with graceful offline fallback
- SCA (OSV-Scanner): ✅ Functional with graceful offline fallback
- DAST (OWASP ZAP): ⚠️ Container starts successfully, API limitations documented below
Semgrep (SAST):
- Works reliably across all tested languages (Python, JavaScript, TypeScript, Java, Go, Ruby)
- Uses custom rulesets from rules/semgrep/
- No network connectivity required
Trivy (SCA):
- Offline Mode: Set GEOTOOLKIT_TRIVY_OFFLINE=1 to prevent network attempts
- Cache Setup: For true offline operation, pre-populate Trivy cache:
  # On a networked machine, run Trivy once to create cache
  podman run --rm -v trivy-cache:/root/.cache/trivy:rw \
    docker.io/aquasec/trivy fs --download-db-only
  # Then set in your environment
  export TRIVY_CACHE_DIR=/path/to/trivy-cache
  export GEOTOOLKIT_TRIVY_OFFLINE=1
- Graceful Degradation: If offline mode is enabled without a cache, Trivy scan is skipped with a clear message
OSV-Scanner (SCA):
- Offline Mode: Set GEOTOOLKIT_OSV_OFFLINE=1 with GEOTOOLKIT_OSV_OFFLINE_DB=/path/to/db
- Network Fallback: Gracefully handles DNS/network errors in air-gapped environments
- Image Selection: Use export OSV_IMAGE=ghcr.io/google/osv-scanner:latest
- Expected Behavior: Reports "No package sources found" for repos without package manifests (this is normal)
OWASP ZAP (DAST):
- Container Execution: ✅ Starts successfully with proper network isolation
- API Limitation: The ghcr.io/zaproxy/zaproxy:latest image (v2.16.1) has spider add-on API limitations
- Workaround: Scanner gracefully handles 404 errors and continues with active scan
- Production Recommendation: Consider using owasp/zap2docker-stable for better add-on support
- Timeout Configuration:
  export ZAP_SPIDER_TIMEOUT=120
  export ZAP_ASCAN_TIMEOUT=600
  export ZAP_READY_TIMEOUT=300
For a full multi-language scan with DAST support:
# 1. Start DAST targets (if testing live web applications)
python scripts/start_dast_targets.py validation/configs/container-projects.json
# 2. Configure offline/network settings
export OSV_IMAGE=ghcr.io/google/osv-scanner:latest
export TRIVY_CACHE_DIR=/path/to/trivy-cache # if available
export GEOTOOLKIT_TRIVY_OFFLINE=1 # if truly offline
export GEOTOOLKIT_OSV_OFFLINE=0 # disable if no offline DB
export ZAP_SPIDER_TIMEOUT=120
export ZAP_ASCAN_TIMEOUT=600
export ZAP_READY_TIMEOUT=300
# 3. Run scan
python src/main.py \
--input projects.json \
--output security-report.md \
  --database-path data/offline-db.tar.gz

The repository ships with curated configs under validation/configs/:
- enhanced-projects.json: a trimmed multi-language suite with DAST metadata for quick validation
- container-projects.json: container definitions consumed by start_dast_targets.py

Both files include dast_targets, allowlists, and health endpoints so the workflow exercises SAST/SCA runners plus ZAP inside the isolated Podman network.
- Trivy Offline Database: The data/offline-db.tar.gz bundle contains NVD JSON files but not the trivy.db database file (a BoltDB file, not SQLite) that Trivy expects. Follow docs/OFFLINE.md to create a proper Trivy cache.
- OSV Offline Database: The bundled offline DB is a placeholder. For production offline scanning, obtain a real OSV database.
- ZAP Spider API: Some ZAP images have incomplete spider add-on support. Active scanning continues even if the spider fails.
For detailed troubleshooting and offline artifact preparation, see docs/OFFLINE.md and docs/CONTAINER_SECURITY.md.
An optional FastMCP server is provided to programmatically manage projects.json and run scans. It can interpret network_config blocks into explicit allowlists for DAST. See the full guide in mcp_server/README.md.
To use the MCP server, first ensure you have the required dependencies installed:
uv sync --extra mcp

The recommended way to run the server is via the main CLI:
# Start built-in MCP server (requires mcp dependencies)
# The --mcp-server flag enables MCP mode.
geotoolkit --mcp-server --mcp-host 127.0.0.1 --mcp-port 9000 --database-path data/offline-db.tar.gz

For development, you can also run the server script directly:
# Start MCP server directly (requires fastmcp)
uv run python mcp_server/mcp_server.py

Tools:
- createProjects(projects, outputPath?): writes projects.json and normalizes any network_config into network_allow_hosts, network_allow_ip_ranges, and ports.
- normalizeProjects(inputPath?, outputPath?): reads an existing projects.json and derives explicit allowlists from network_config.
- runScan(inputPath?, outputPath?, databasePath?): runs the scan and returns the report text as a string.
Quick note on network_config interpretation:
- allowed_egress.external_hosts are turned into host:port using network_config.ports or protocol defaults (80/443)
- Other keys under allowed_egress are treated as hostnames/IPs; keys containing / are treated as CIDRs and added to network_allow_ip_ranges
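
To make these rules concrete, here is a rough sketch of how such a normalization could look. It is not the MCP server's implementation: the function name `normalize_network_config`, the optional `protocol` field, and the choice to pair non-CIDR keys with the same port list are all assumptions made for illustration.

```python
def normalize_network_config(network_config: dict) -> dict:
    """Derive explicit allowlists from a network_config block (illustrative only)."""
    ports = [str(port) for port in network_config.get("ports", [])]

    # Assumption: a "protocol" field picks the default port when none are listed.
    default_port = "80" if network_config.get("protocol") == "http" else "443"
    effective_ports = ports or [default_port]

    hosts: list[str] = []
    ip_ranges: list[str] = []
    for key, value in network_config.get("allowed_egress", {}).items():
        if key == "external_hosts":
            # Each external host becomes one host:port entry per port.
            for host in value:
                hosts.extend(f"{host}:{port}" for port in effective_ports)
        elif "/" in key:
            # Keys containing "/" are treated as CIDR ranges.
            ip_ranges.append(key)
        else:
            # Remaining keys are treated as hostnames or IPs (assumed same ports).
            hosts.extend(f"{key}:{port}" for port in effective_ports)

    return {
        "network_allow_hosts": hosts,
        "network_allow_ip_ranges": ip_ranges,
        "ports": ports,
    }
```

The output keys match the explicit fields accepted in projects.json, so the result can be merged straight into a project entry.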
Try with the included test-projects.json to validate Python and Go scanning quickly:
# Quick CLI smoke test using included test-projects.json
python src/main.py --input test-projects.json --output quick-report.md --database-path data/offline-db.tar.gz
# End-to-end validation helpers
python scripts/quick_validation.py
python scripts/validation_executor.py

- If containers fail to start due to seccomp paths, ensure the profiles exist at seccomp/*.json and you have Podman installed.
- If image pulls are blocked (e.g., corporate network), pre-pull required images:
  - docker.io/semgrep/semgrep
  - docker.io/aquasec/trivy
  - ghcr.io/ossf/osv-scanner:latest
  - ghcr.io/zaproxy/zaproxy:latest
- For strictly offline environments, consider mirroring images to a local registry and using Podman's --registries-conf.
If you interrupt a scan (Ctrl+C) or encounter Podman commands hanging:
1. Check for stuck processes:
   ps aux | grep podman | grep -v grep
2. Kill stuck Podman processes:
   # Kill any hanging podman commands
   pkill -9 podman
3. Restart Podman socket:
   systemctl --user restart podman.socket
4. Clean up containers:
   # Remove stopped containers
   podman container prune -f
   # Remove all containers (if safe to do so)
   podman rm -f $(podman ps -aq)
5. Verify Podman is working:
   podman ps
   podman version
Prevention: Always allow scans to complete gracefully. The ZAP runner includes timeout protection (configurable via ZAP_SPIDER_TIMEOUT, ZAP_ASCAN_TIMEOUT, and ZAP_READY_TIMEOUT environment variables) to prevent infinite hangs.
- View the results: Open security-report.md in your favorite editor
The projects.json file supports the following format:
{
  "projects": [
    {
      "url": "https://github.com/owner/repo",
      "name": "project-name",
      "language": "Programming Language",
      "description": "Brief description (optional)",
      "dast_targets": ["http://127.0.0.1:3000/"],
      "network_allow_hosts": ["127.0.0.1:3000", "localhost:3000"],
      "ports": ["3000"]
    }
  ]
}

Key fields for DAST readiness:
- dast_targets: Explicit HTTP/HTTPS endpoints to probe once the application is running (typically on localhost via start_dast_targets.py).
- network_allow_hosts / network_allow_ip_ranges: host:port or CIDR entries that explicitly permit egress from the ZAP container; DAST runs fail closed if the target is not included.
- ports: Used to derive default allowlist entries and to stand up local containers for validation.
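
Because DAST fails closed, it can help to check a project entry before starting a scan. The sketch below is a hypothetical pre-flight check, not part of GeoToolKit: it only compares each `dast_targets` URL against `network_allow_hosts` and ignores `network_allow_ip_ranges`.

```python
from urllib.parse import urlsplit

# Default ports used when a target URL does not specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def target_host_port(target_url: str) -> str:
    """Reduce a target URL to the host:port form used in allowlists."""
    parts = urlsplit(target_url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.hostname}:{port}"


def unlisted_dast_targets(project: dict) -> list[str]:
    """Return the DAST targets whose host:port is absent from network_allow_hosts."""
    allowed = set(project.get("network_allow_hosts", []))
    return [
        target
        for target in project.get("dast_targets", [])
        if target_host_port(target) not in allowed
    ]
```

An empty result means every target has a matching allowlist entry; any URL returned would be expected to be rejected by the fail-closed behavior described above.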
For Dynamic Application Security Testing with OWASP ZAP, create a network-allowlist.txt:
localhost:8080
api.example.com:443
database.internal:5432
Then run with:
python src/main.py \
--input projects.json \
--output report.md \
--database-path data/offline-db.tar.gz \
  --network-allowlist network-allowlist.txt

GeoToolKit automatically looks for the Podman network defined in GEOTOOLKIT_DAST_NETWORK (defaults to gt-dast-net). When present, the ZAP container joins this isolated bridge so it can only talk to the explicitly allowed target containers. When scanning localhost services, ZAP falls back to slirp4netns:allow_host_loopback=true to keep traffic sandboxed while still reaching 127.0.0.1.
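
The network-allowlist.txt format shown above is one host:port entry per line. As a rough illustration of how such a file could be validated before a scan, the sketch below parses it into (host, port) pairs. The function name is hypothetical, and skipping blank lines and `#` comments is an assumption; the source does not say whether GeoToolKit accepts them.

```python
def parse_allowlist(text: str) -> list[tuple[str, int]]:
    """Parse host:port lines into (host, port) tuples, rejecting malformed lines."""
    entries: list[tuple[str, int]] = []
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments are ignored
        # Split on the last colon so the port is always the final component.
        host, separator, port = line.rpartition(":")
        if not separator or not host or not port.isdigit():
            raise ValueError(f"line {line_number}: expected host:port, got {line!r}")
        entries.append((host, int(port)))
    return entries
```

Rejecting malformed lines up front keeps a typo in the allowlist from silently turning into a blocked or misdirected DAST run.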
You can tune container networking and authentication via environment variables. Sensible, secure defaults are used when not set.
- ZAP (DAST)
  - ZAP_API_KEY: If set, the ZAP container is started with API key authentication enabled and the provided key configured.
  - ZAP_DISABLE_API_KEY: Set to 1/true to explicitly disable API key auth. Avoid in production.
  - ZAP_PORT: Local port to expose the ZAP API (default 8080).
  - ZAP_IMAGE: Container image to use (default ghcr.io/zaproxy/zaproxy:latest).
  - ZAP_BASE_URL: Connect to an existing ZAP instance instead of starting a container.
  - ZAP_PODMAN_NETWORK: Podman --network value to use (e.g., bridge). Optional.
  - GEOTOOLKIT_DAST_NETWORK: Name of the isolated Podman network to join when running ZAP (defaults to gt-dast-net if it exists; otherwise falls back to loopback-only access).
  - ZAP_PODMAN_PULL: Podman --pull policy: always|missing|never (default missing).
  - ZAP_PODMAN_ARGS: Extra Podman args appended as-is.
  - CONTAINER_HOST_HOSTNAME: Hostname used inside containers to reach the host (default host.containers.internal). Set for environments where the default isn't available.
- If you encounter registry access errors pulling OSV images, set an explicit image that is reachable from your host, for example:
  export OSV_IMAGE=ghcr.io/google/osv-scanner:latest
- For longer or more thorough ZAP scans, tune these environment variables (defaults used by GeoToolKit):
  export ZAP_SPIDER_TIMEOUT=120   # seconds (default used by toolkit)
  export ZAP_ASCAN_TIMEOUT=600    # seconds (default used by toolkit)
  export ZAP_READY_TIMEOUT=300    # seconds (default used by toolkit)
- Semgrep (SAST)
  - SEMGREP_PACK: When set, runs Semgrep using this config pack.
  - SEMGREP_NETWORK: Podman network mode for Semgrep container. Defaults to --network=none for isolation. Avoid host networking.
Security note: Host networking is intentionally avoided by default. Explicitly opt into networked modes only when required and understood.
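
As an illustration of the "defaults when not set" behavior for the ZAP timeouts, a wrapper script could resolve them as sketched below. The variable names and default values come from this README; the helper itself and its fallback on invalid values are assumptions, not the toolkit's actual code.

```python
import os

# Defaults documented above, in seconds.
ZAP_TIMEOUT_DEFAULTS = {
    "ZAP_SPIDER_TIMEOUT": 120,
    "ZAP_ASCAN_TIMEOUT": 600,
    "ZAP_READY_TIMEOUT": 300,
}


def read_zap_timeouts(environ=None) -> dict[str, int]:
    """Resolve ZAP timeouts from the environment, falling back to the defaults."""
    env = os.environ if environ is None else environ
    timeouts: dict[str, int] = {}
    for name, default in ZAP_TIMEOUT_DEFAULTS.items():
        raw = env.get(name, "").strip()
        try:
            value = int(raw) if raw else default
        except ValueError:
            value = default  # assumed: unparsable values fall back to the default
        # Assumed: zero or negative timeouts are treated as unset.
        timeouts[name] = value if value > 0 else default
    return timeouts
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check in tests or CI dry runs.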
For air-gapped or restricted CI environments, GeoToolKit can operate fully offline using pre-generated artifact bundles. This eliminates network dependencies and significantly speeds up scan execution.
1. Generate artifacts on a networked host:
   bash scripts/prepare_offline_artifacts.sh data/offline-artifacts
   This creates:
   - trivy-cache.tgz - Trivy vulnerability database (~75-80 MB)
   - osv_offline.db - OSV vulnerability database (if available)
2. Extract and configure in your GeoToolKit workspace:
   # Extract Trivy cache
   mkdir -p data/trivy-cache
   tar -xzf data/offline-artifacts/trivy-cache.tgz -C data/trivy-cache
   # Move OSV database (if available)
   mv data/offline-artifacts/osv_offline.db data/osv_offline.db
3. Set environment variables (already configured in run_production_test.sh):
   # Trivy offline mode
   export TRIVY_CACHE_DIR="$(pwd)/data/trivy-cache"
   export GEOTOOLKIT_TRIVY_OFFLINE=1
   # OSV offline mode (if database available)
   export GEOTOOLKIT_OSV_OFFLINE=1
   export GEOTOOLKIT_OSV_OFFLINE_DB="$(pwd)/data/osv_offline.db"
4. Run scans - GeoToolKit will automatically use offline databases:
   bash run_production_test.sh
   # Or directly:
   python -m src.main --input projects.json --output report.md --database-path data/offline-db.tar.gz

For CI environments, upload artifacts to your CI storage and extract before scanning:
# In your CI pipeline (e.g., GitHub Actions, GitLab CI)
mkdir -p /workspace/trivy-cache
tar -xzf trivy-cache.tgz -C /workspace/trivy-cache
export TRIVY_CACHE_DIR=/workspace/trivy-cache
export GEOTOOLKIT_TRIVY_OFFLINE=1
# If OSV database available:
export GEOTOOLKIT_OSV_OFFLINE=1
export GEOTOOLKIT_OSV_OFFLINE_DB=/workspace/osv/osv_offline.db

Confirm your offline setup is working correctly:
# Check Trivy cache structure
ls -lh data/trivy-cache/db/
# Should show: trivy.db (~780 MB) and metadata.json
# Check OSV database (optional)
ls -lh data/osv_offline.db

- No network access required - Scans work in fully air-gapped environments
- Faster execution - No database downloads during scans (saves ~5-10 seconds per project)
- Predictable results - Same vulnerability data across all runs until you update artifacts
- Reduced failures - No network timeouts or registry access issues
Regenerate artifacts periodically (weekly/monthly) to get latest vulnerability data:
# On networked host
bash scripts/prepare_offline_artifacts.sh data/offline-artifacts
# Distribute updated artifacts to your CI/development environments

Trivy complains about missing database:
- Verify data/trivy-cache/db/trivy.db and data/trivy-cache/db/metadata.json exist
- Ensure TRIVY_CACHE_DIR points to the extracted cache directory (not the .tgz file)
- Check file permissions - Trivy needs read access to cache files
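
The Trivy checks above can be automated with a few lines of Python. This is a minimal sketch, assuming the cache layout described in this README (`db/trivy.db` and `db/metadata.json` under the cache directory); the function name is hypothetical.

```python
import os
from pathlib import Path


def trivy_cache_problems(cache_dir: str) -> list[str]:
    """Return human-readable problems found in a Trivy cache directory."""
    root = Path(cache_dir)
    if not root.is_dir():
        # Covers the common mistake of pointing at the .tgz archive itself.
        return [f"{root} is not a directory (point TRIVY_CACHE_DIR at the extracted cache)"]

    problems: list[str] = []
    for name in ("trivy.db", "metadata.json"):
        path = root / "db" / name
        if not path.is_file():
            problems.append(f"missing {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
    return problems
```

Calling `trivy_cache_problems(os.environ.get("TRIVY_CACHE_DIR", "data/trivy-cache"))` before a scan gives an empty list when the cache looks usable.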
OSV Scanner skips scanning:
- This is expected if data/osv_offline.db doesn't exist
- OSV artifact generation requires specific OSV Scanner versions - check scripts/prepare_offline_artifacts.sh output
- You can still get SCA coverage from Trivy without OSV
Performance still slow:
- Verify environment variables are set correctly before running scans
- Check that large repos aren't being cloned repeatedly (use local paths if testing)
- Consider using validation/configs/production-mcp-projects.json instead of full production-projects.json for faster iteration
For detailed offline operation documentation, see docs/OFFLINE.md.
For optimal security and performance in air-gapped environments, GeoToolKit supports offline vulnerability databases.
You can automatically assemble an offline database bundle combining multiple vulnerability sources:
# Create a comprehensive offline database
python scripts/build_offline_db.py \
--output data/offline-db.tar.gz \
--years 2023 2024 2025
# For air-gapped environments (simulation mode)
python scripts/build_offline_db.py --output data/offline-db.tar.gz --simulate

Options:
- --simulate: Create a placeholder bundle without network calls (useful for CI or air-gapped environments)
- --no-osv or --no-ghsa: Skip specific vulnerability sources
- Set GITHUB_TOKEN environment variable to enable GitHub Security Advisory export
For manual database configuration:
1. National Vulnerability Database (NVD)
   - Download: https://nvd.nist.gov/vuln/data-feeds
2. OSV Database
   - Command: osv-scanner --experimental-download-offline-databases
3. GitHub Security Advisories
   - Download: https://github.com/advisories
Contributing and development notes are available in CONTRIBUTING.md.
If your shell is fish (the default for some developers), use syntax compatible with fish when following examples in CI snippets (for example, use env VAR=value command or set -x VAR value; command).
# Run all tests
python -m pytest tests/
# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html
# Run specific test suite
python -m pytest tests/unit/
python -m pytest tests/integration/

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Type checking
uv run mypy src/

To add a new scanning tool:
1. Create a new runner in src/orchestration/runners/
2. Implement the run_scan() method
3. Add parsing logic in src/orchestration/parser.py
4. Add appropriate seccomp profile in seccomp/
5. Update workflow in src/orchestration/workflow.py
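
To give a feel for the first three steps, here is a deliberately simplified runner skeleton. Only the `run_scan()` method name comes from this README; `ExampleRunner`, `Finding`, `parse_findings`, and the JSON output shape are hypothetical, and the real runners and parser in the repository may be structured differently.

```python
import json
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    tool: str
    severity: str
    message: str


def parse_findings(tool: str, raw_json: str) -> list[Finding]:
    """Convert a JSON array of {"severity", "message"} objects into findings."""
    return [
        Finding(tool, str(item.get("severity", "UNKNOWN")).upper(), str(item.get("message", "")))
        for item in json.loads(raw_json)
    ]


class ExampleRunner:
    """Hypothetical runner: executes a scanner command and parses its JSON output."""

    tool_name = "example-scanner"

    def __init__(self, command: list[str]) -> None:
        # In practice this would be the full `podman run ...` invocation.
        self.command = command

    def run_scan(self, project_path: str) -> list[Finding]:
        completed = subprocess.run(
            [*self.command, project_path],
            capture_output=True,
            text=True,
            check=False,
        )
        # Note: some scanners exit non-zero when they find issues; adapt as needed.
        if completed.returncode != 0:
            raise RuntimeError(f"{self.tool_name} failed: {completed.stderr.strip()}")
        return parse_findings(self.tool_name, completed.stdout)
```

Keeping parsing in a separate function mirrors the split between runners and parser.py described above and lets the parsing logic be tested without starting a container.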
- Rootless execution - All containers run without root privileges
- Network isolation - SAST/SCA tools run with --network=none
- Read-only filesystems - Containers cannot modify their base images
- Capability dropping - All Linux capabilities dropped by default
- Seccomp profiles - Restrictive syscall filtering for each tool
- Temporary filesystems - Limited tmpfs for scratch space only
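
The controls listed above map onto standard `podman run` options. The sketch below assembles such a command line as an illustration; the helper name, the `/src` mount point, and the tmpfs size are assumptions, and the exact flags GeoToolKit passes may differ. Rootless execution comes from invoking Podman as an unprivileged user rather than from a flag.

```python
def hardened_podman_command(
    image: str,
    tool_args: list[str],
    source_dir: str,
    seccomp_profile: str,
    tmpfs_size: str = "64m",  # assumed size for scratch space
) -> list[str]:
    """Build a `podman run` argument list applying the isolation controls above."""
    return [
        "podman", "run", "--rm",
        "--network=none",                             # network isolation
        "--read-only",                                # read-only root filesystem
        "--cap-drop=ALL",                             # drop all Linux capabilities
        f"--security-opt=seccomp={seccomp_profile}",  # per-tool syscall filter
        f"--tmpfs=/tmp:rw,size={tmpfs_size}",         # limited scratch space
        f"--volume={source_dir}:/src:ro",             # source mounted read-only (assumed path)
        image,
        *tool_args,
    ]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with repository paths.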
| Language | SAST | SCA | Package Managers |
|---|---|---|---|
| Python | ✅ | ✅ | pip, poetry, pipenv, uv |
| JavaScript | ✅ | ✅ | npm, yarn, pnpm |
| TypeScript | ✅ | ✅ | npm, yarn, pnpm |
| Java | ✅ | ✅ | maven, gradle |
| Go | ✅ | ✅ | go modules |
| Ruby | ✅ | ✅ | bundler, gems |
| C# | ✅ | ✅ | nuget, paket |
| PHP | ✅ | ✅ | composer |
| Rust | ✅ | ✅ | cargo |
| C/C++ | ✅ | ✅ | conan, vcpkg |
The generated reports include:
- Executive Summary - High-level findings overview
- Project Details - Scanned repositories and metadata
- Vulnerability Analysis - Detailed findings with severity levels
- Compliance Mapping - NIST, OWASP Top 10, ISM alignment
- Recommendations - Actionable remediation steps
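
As a small illustration of the kind of severity roll-up an executive summary contains, the sketch below renders a Markdown table of finding counts. The input shape (a list of dicts with a `severity` key), the severity names, and the function name are assumptions for the example; the actual report generator may work differently.

```python
from collections import Counter

# Assumed severity scale, highest first.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def executive_summary(findings: list[dict]) -> str:
    """Render a Markdown table counting findings per severity level."""
    counts = Counter(str(f.get("severity", "UNKNOWN")).upper() for f in findings)
    lines = ["| Severity | Findings |", "|---|---|"]
    for severity in SEVERITY_ORDER:
        lines.append(f"| {severity} | {counts.get(severity, 0)} |")
    # Anything outside the known scale is grouped so no finding is dropped.
    other = sum(n for level, n in counts.items() if level not in SEVERITY_ORDER)
    if other:
        lines.append(f"| OTHER | {other} |")
    return "\n".join(lines)
```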
# Install development dependencies
uv sync --dev
# Run pre-commit hooks
uv run ruff check --fix src/ tests/
uv run ruff format src/ tests/

This project is licensed under the MIT License - see the LICENSE file for details.
Current version: v0.1.0 (Beta)
- Issues: GitHub Issues
- Security Issues: Please report privately via email
Big thanks to the open-source security scanners and projects that make GeoToolKit possible:
- Semgrep: powerful, fast SAST (https://semgrep.dev)
- Trivy: container and dependency SCA from Aqua Security (https://github.com/aquasecurity/trivy)
- OSV-Scanner: offline scanner that uses Google's OSV (Open Source Vulnerabilities) data to detect known vulnerabilities in packages and SBOMs (https://github.com/ossf/osv-scanner)
- OWASP ZAP: DAST tooling from the OWASP project (https://www.zaproxy.org/)
If you've contributed integrations for other scanners or tools, thank you; please add them to this list by submitting a PR.
This project was developed with assistance from AI tools to speed up development and help generate documentation and examples. All code and contributions were reviewed by human maintainers. If you have questions about any part of the codebase or believe an AI-assisted change needs clarification, please open an issue or a pull request so maintainers can review and address it.