🚀 Access the live dashboard here: https://ecfr-web-369420849740.us-central1.run.app
An analytics dashboard for exploring the regulatory complexity of the Code of Federal Regulations (eCFR). The system calculates the RSCS (Regulatory Complexity Score) metric per 1,000 words, tracks LSA (List of Sections Affected) activity from the Federal Register, and provides AI-generated summaries for titles and sections.
Tech Stack:
- Backend: Go 1.24+ with Chi router, clean architecture
- Frontend: Nuxt 3 + Vue 3, USWDS design system
- Storage: SQLite (metadata), Parquet (daily snapshots)
- Analytics: DuckDB for fast aggregations
- AI: Vertex AI / Anthropic for summaries
Prerequisites:
- Go: 1.24+
- Node.js: 18+ with npm (or bun)
- Docker (optional): For containerized deployment
Copy the environment template:
```bash
cp .env.template .env
```
For local development, the defaults work out of the box:
- `ENV=local` — Uses the local filesystem instead of GCS
- `DATA_DIR=./data` — Location of the SQLite DB and Parquet files
- `DUCKDB_UI=1` — Enables the DuckDB Web UI at `:4213`
The `data/` directory must contain:
- `ecfr.db` — SQLite database with metadata
- Date-partitioned Parquet files (e.g., `parquet/2025-01-15/*.parquet`)
To populate data, run the ETL pipeline (see ETL_GUIDE.md).
Terminal 1 — API Server:
```bash
go run ./cmd/api
```
The API runs on http://localhost:8080.
Terminal 2 — Frontend Dev Server:
```bash
cd web
npm install
npm run dev
```
The UI runs on http://localhost:3000 with hot reload.
```bash
docker-compose up
```
- UI: http://localhost:3000
- API: http://localhost:8080
- DuckDB UI: http://localhost:4213 (if `DUCKDB_UI=1`)
The dashboard provides an intuitive interface to explore federal regulations:
- Agency Overview: Start here to see a list of all federal agencies.
  - Metrics: View total word counts, average RSCS scores, and LSA activity for each agency.
  - Sort & Filter: Use table headers to sort by complexity or volume; filter by CFR Title.
  - Hierarchy: Expand departments to see their sub-agencies.
- Key Metrics Explained:
  - RSCS (Regulatory Complexity Score): Measures complexity based on word count, definitions, cross-references, and modal verbs ("shall", "must", etc.) per 1,000 words (see the sketch after this list).
  - LSA Activity: Counts of proposed rules, final rules, and notices from the Federal Register API (last 30 days).
- AI Summaries: Click the "AI Summaries" button to view machine-generated summaries of titles and sections.
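For intuition about the RSCS, here is a minimal sketch of a per-1,000-words score. It counts only obligation terms; the production scorer also weights definitions and cross-references, so the term list and formula below are illustrative, not the backend's actual implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Obligation markers counted by this sketch. The real RSCS also factors in
// definitions and cross-references; this list is illustrative only.
var modalTerms = []string{"shall", "must", "may not", "required", "prohibited"}

// rscsPer1K returns obligation-term hits normalized per 1,000 words.
func rscsPer1K(text string) float64 {
	words := len(strings.Fields(text))
	if words == 0 {
		return 0
	}
	lower := strings.ToLower(text)
	hits := 0
	for _, term := range modalTerms {
		re := regexp.MustCompile(`\b` + regexp.QuoteMeta(term) + `\b`)
		hits += len(re.FindAllString(lower, -1))
	}
	return float64(hits) / float64(words) * 1000
}

func main() {
	text := "The agency shall publish reports. Applicants must comply with this part."
	fmt.Printf("RSCS per 1k (sketch): %.1f\n", rscsPer1K(text))
}
```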
Refresh the regulatory data by running the ETL pipeline:
```bash
go run ./cmd/etl
```
The pipeline:
- Fetches changed titles from the eCFR API
- Downloads and parses XML from GovInfo
- Computes RSCS metrics for each section
- Collects agency-level LSA data from Federal Register
- Writes snapshots to Parquet and SQLite
For full details, see ETL_GUIDE.md.
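As a mental model only, the five stages above can be pictured as a fail-fast sequence. The stage names and error handling here are assumptions, not the actual cmd/etl code:

```go
package main

import (
	"context"
	"log"
)

// stage is one illustrative pipeline step; cmd/etl wires the real adapters
// (eCFR client, GovInfo parser, RSCS scorer, LSA collector, writers).
type stage struct {
	name string
	run  func(ctx context.Context) error
}

func runPipeline(ctx context.Context, stages []stage) error {
	for _, s := range stages {
		log.Printf("etl: running %s", s.name)
		if err := s.run(ctx); err != nil {
			return err // fail fast (assumed; see ETL_GUIDE.md for actual behavior)
		}
	}
	return nil
}

func main() {
	noop := func(ctx context.Context) error { return nil }
	stages := []stage{
		{"fetch-changed-titles", noop},
		{"parse-govinfo-xml", noop},
		{"compute-rscs", noop},
		{"collect-lsa", noop},
		{"write-snapshots", noop},
	}
	if err := runPipeline(context.Background(), stages); err != nil {
		log.Fatal(err)
	}
}
```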
Backend:
```bash
go build -o api ./cmd/api   # Build API
go build -o etl ./cmd/etl   # Build ETL
go test ./...               # Run all tests
```
Frontend:
```bash
cd web
npm install        # Install dependencies
npm run dev        # Dev server with hot reload
npm run build      # Production build
npm run test       # Unit tests (Vitest)
npm run test:e2e   # E2E tests (Playwright)
```
The DuckDB web UI provides an interactive SQL interface for analysts and developers to explore eCFR regulatory data directly.
- Ensure `DUCKDB_UI=1` is set in your `.env` file (enabled by default)
- Run the ETL pipeline at least once to populate Parquet data: `go run ./cmd/etl`
- Ensure `data/ecfr.db` exists (SQLite database)
```bash
# Start the API server (DuckDB UI starts automatically)
go run ./cmd/api

# Or with Docker
docker-compose up api
```
Access the UI at http://localhost:4213.
These queries replicate the backend API data shown on the dashboard.
Note: SQLite tables are attached as `ecfr.*` (e.g., `ecfr.agencies`). Parquet columns use PascalCase (e.g., `SnapshotDate`, `RSCSPer1K`).
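To reproduce that attachment outside the bundled UI, here is a sketch using the community go-duckdb driver. This is an assumption for illustration: the backend's duck adapter may do this differently, and DuckDB's SQLite extension must be available:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb" // registers the "duckdb" driver (assumed dependency)
)

func main() {
	db, err := sql.Open("duckdb", "") // in-memory DuckDB
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Attach the SQLite metadata DB under the alias "ecfr", which is why
	// the sample queries below refer to ecfr.agencies, ecfr.sections, etc.
	if _, err := db.Exec(`ATTACH 'data/ecfr.db' AS ecfr (TYPE sqlite)`); err != nil {
		log.Fatal(err)
	}

	var n int
	if err := db.QueryRow(`SELECT count(*) FROM ecfr.agencies`).Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("agencies:", n)
}
```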
```sql
-- Agency overview: word counts, average RSCS, and latest LSA counts per agency
SELECT
    a.id, a.name, a.parent_id,
    COALESCE(SUM(s.word_count), 0) AS total_words,
    COALESCE(AVG(s.rscs_per_1k), 0) AS avg_rscs,
    COALESCE(lsa.total_documents, 0) AS lsa_counts
FROM ecfr.agencies a
LEFT JOIN ecfr.agency_cfr_references acr ON acr.agency_id = a.id
LEFT JOIN ecfr.sections s ON s.title = CAST(acr.title AS TEXT) AND s.agency_id = acr.chapter
LEFT JOIN (
    SELECT * FROM ecfr.agency_lsa
    WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM ecfr.agency_lsa)
) lsa ON lsa.agency_id = a.id
GROUP BY a.id, a.name, a.parent_id, lsa.total_documents
ORDER BY total_words DESC;
```

```sql
-- Browse raw Parquet snapshots.
-- Use filename filter to exclude _diffs files, union_by_name for schema compatibility
SELECT * FROM read_parquet('data/parquet/*/*.parquet', filename=true, union_by_name=true)
WHERE filename NOT LIKE '%_diffs%'
LIMIT 100;
```

```sql
-- All AI-generated summaries
SELECT kind, key, text, model, created_at FROM ecfr.summaries ORDER BY key;
```

```sql
-- Compare RSCS scores across snapshot dates by CFR Title
SELECT
    "SnapshotDate" AS snapshot_date,
    "Title" AS title,
    COUNT(*) AS section_count,
    ROUND(AVG("RSCSPer1K"), 2) AS avg_complexity,
    SUM("WordCount") AS total_words
FROM read_parquet('data/parquet/*/*.parquet', filename=true, union_by_name=true)
WHERE filename NOT LIKE '%_diffs%'
  AND "SnapshotDate" IS NOT NULL
GROUP BY "SnapshotDate", "Title"
ORDER BY "SnapshotDate" DESC, avg_complexity DESC
LIMIT 15;
```

```sql
-- Top 10 agencies by average RSCS
SELECT
    a.name,
    COUNT(DISTINCT s.id) AS section_count,
    SUM(s.word_count) AS total_words,
    ROUND(AVG(s.rscs_per_1k), 2) AS avg_rscs,
    ROUND(MAX(s.rscs_per_1k), 2) AS max_rscs
FROM ecfr.agencies a
JOIN ecfr.agency_cfr_references acr ON acr.agency_id = a.id
JOIN ecfr.sections s ON s.title = CAST(acr.title AS TEXT)
GROUP BY a.name
ORDER BY avg_rscs DESC
LIMIT 10;
```

```sql
-- LSA activity over the last 30 days
SELECT
    agency_name,
    proposed_rules,
    final_rules,
    notices,
    total_documents,
    snapshot_date
FROM ecfr.agency_lsa
WHERE CAST(snapshot_date AS DATE) >= current_date - INTERVAL 30 DAY
ORDER BY total_documents DESC
LIMIT 10;
```
The system follows Clean Architecture in Go:
```
internal/
├── domain/       # Core entities (Agency, Section, Summary, etc.)
├── usecase/      # Business logic (Ingest, Metrics, Snapshot, Summaries)
├── adapter/      # External integrations
│   ├── ecfr/     # eCFR API client
│   ├── govinfo/  # GovInfo XML/GCS client
│   ├── parquet/  # Local & GCS Parquet storage
│   ├── sqlite/   # SQLite repository
│   ├── duck/     # DuckDB analytics helper
│   ├── lsa/      # Federal Register API collector
│   ├── anthropic/ # Anthropic Claude client
│   └── vertexai/ # Google Vertex AI client
└── delivery/
    └── http/     # Chi router, handlers, DTOs
```
Data Flow:
- ETL fetches eCFR titles/sections from GovInfo, computes metrics, generates summaries
- Results stored in Parquet (date-partitioned snapshots) and SQLite (metadata mirror)
- API queries DuckDB over Parquet for fast aggregates; SQLite for metadata
- Frontend (Nuxt/Vue) renders USWDS-compliant pages
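To make the dependency direction concrete, here is a small sketch of the layering under assumed, illustrative names (the actual interfaces live in internal/ and will differ):

```go
package main

import (
	"context"
	"fmt"
)

// Agency stands in for the domain entity in internal/domain.
type Agency struct {
	ID   string
	Name string
}

// AgencyRepo is the port a storage adapter (e.g., the sqlite package)
// would implement; the usecase layer depends only on this interface.
type AgencyRepo interface {
	List(ctx context.Context) ([]Agency, error)
}

// listAgencies is usecase-style logic, ignorant of SQLite or DuckDB.
func listAgencies(ctx context.Context, repo AgencyRepo) ([]Agency, error) {
	return repo.List(ctx)
}

// memRepo is a throwaway in-memory adapter standing in for a real one.
type memRepo struct{ data []Agency }

func (m memRepo) List(ctx context.Context) ([]Agency, error) { return m.data, nil }

func main() {
	repo := memRepo{data: []Agency{{ID: "epa", Name: "Environmental Protection Agency"}}}
	agencies, _ := listAgencies(context.Background(), repo)
	fmt.Println(agencies)
}
```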
```
├── cmd/
│   ├── api/            # HTTP API server
│   └── etl/            # ETL pipeline
├── internal/           # Go application code (clean architecture)
├── web/                # Nuxt 3 frontend
│   ├── components/     # Vue components
│   ├── composables/    # Vue composables
│   ├── pages/          # Route pages
│   └── tests/          # Vitest & Playwright tests
├── data/               # SQLite DB & Parquet files (gitignored)
├── infra/              # Terraform IaC
├── openapi.yaml        # API specification
├── docker-compose.yml  # Local multi-service setup
└── Dockerfile.*        # Container definitions
```
| Endpoint | Description |
|---|---|
| `GET /api/agencies` | List agencies with word counts, RSCS, LSA activity |
| `GET /api/agencies?title=12` | Filter agencies by CFR title |
| `GET /api/agencies?include_checksum=true` | Include content checksums |
| `GET /api/titles/{id}` | Title metrics and summary |
| `GET /api/sections/{id}` | Section text, RSCS, and summary |
| `GET /api/summaries` | All AI-generated summaries |
See API.md and openapi.yaml for full specification.
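For a quick smoke test of the endpoints above, a minimal Go client; the response schema is defined in openapi.yaml, so this sketch just prints the raw payload rather than assuming field names:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// List agencies filtered to CFR Title 12 (see the endpoint table above).
	resp, err := http.Get("http://localhost:8080/api/agencies?title=12")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body)) // decode against openapi.yaml for typed access
}
```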
Backend (Go):
```bash
go test ./...                                    # All tests
go test -v ./internal/usecase -run TestMetrics   # Single test
```
Frontend (Vitest + Playwright):
```bash
cd web
npm run test       # Unit tests
npm run test:e2e   # E2E tests
```
- API.md — API endpoints reference
- DEPLOY.md — Deployment guide (local & GCP)
- ETL_GUIDE.md — ETL pipeline details
- SCHEMA.md — Database schema
MIT License — see LICENSE for details.