A scalable system to ingest, evaluate, and visualize F1 fan submissions using:
- Neuro-san (Cognizant AI Lab's multi-agent AI accelerator framework) for intelligent evaluation
- Celery + Redis for massively parallel processing
- SQLAlchemy (SQLite by default, easily swap to Postgres)
- Dash (Plotly) dashboards with a high-performance EvalDataLoader
- uv for fast, reproducible Python packaging
```bash
git clone https://github.com/deepsaia/f1-fan-eval.git
cd f1-fan-eval

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Optional: Install nsflow web UI
uv sync --group ui
```

This will create a virtual environment and install all dependencies automatically.
- macOS (Homebrew)

  ```bash
  brew install redis
  brew services start redis
  ```

- Ubuntu/Debian

  ```bash
  sudo apt-get update && sudo apt-get install -y redis-server
  sudo systemctl enable --now redis-server
  ```

- Docker (Optional)

  ```bash
  docker run -p 6379:6379 redis:7
  ```

- Default: SQLite file at `f1_fans_eval.db`
- Postgres (optional): set env vars in `.env` (see below)
- Input Processing: Reads CSV/JSON sources and writes normalized Submissions to the DB.
- AI Evaluation (Neuro-san): Multi-agent evaluation system using specialized Neuro-san agents:
  - `f1_fan_knowledge`: Evaluates F1 technical knowledge across 10 criteria
  - `f1_fan_enthusiasm`: Measures fan passion and excitement
  - `f1_fan_humor`: Assesses entertainment value

  Each agent returns scores (1-100), brief descriptions, and token/cost metrics (illustrated below).
- Dashboards (Dash): Interactive dashboards (Raw Data, Score Distribution, Radar Comparison, System Performance), backed by a singleton EvalDataLoader that auto-infers schema and caches results.
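To make the result shape concrete, here is a minimal sketch of what one agent's output might look like; the field names are illustrative assumptions, not the project's exact schema.

```python
# Illustrative shape of one agent's result; field names are assumptions,
# not the project's exact schema.
example_result = {
    "sub_id": "f1-0042",          # which submission was evaluated
    "agent": "f1_fan_knowledge",  # which Neuro-san agent produced the score
    "score": 78,                  # 1-100, per the agent's rubric
    "description": "Accurate on regulations, thinner on race strategy.",
    "metrics": {                  # token/cost accounting reported per call
        "prompt_tokens": 512,
        "completion_tokens": 96,
        "cost_usd": 0.004,
    },
}
```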
```text
├── .env.example       # Environment configuration template
├── coded_tools/       # Custom tools for agents
├── dash_app/          # Visualization dashboard (Plotly Dash)
│   ├── db/            # Data loader
│   ├── pages/         # Dashboard pages
│   └── utils/         # Helper utilities
├── db/                # Database layer (SQLAlchemy models)
├── deploy/            # Celery workers & task enqueuers
├── eval/              # Evaluation pipeline
├── input_processor/   # Input ingestion (CSV/JSON → DB)
├── pyproject.toml     # uv packaging & dependencies
├── registries/        # Neuro-san agent definitions (HOCON)
└── run.py             # Neuro-san server runner
```
```mermaid
flowchart TB
subgraph "Input Layer"
A["CSV/JSON Inputs<br/>(samples/f1_sample.csv)"]
end
subgraph "Task Queue (Celery + Redis)"
B["Input Queue"]
F["Evaluation Queue"]
end
subgraph "Processing Workers"
C["Input Worker<br/>(process_inputs.py)"]
G["Evaluation Worker<br/>(process_eval.py)"]
end
subgraph "Neuro-san AI Agents"
K["f1_fan_knowledge<br/>(HOCON)"]
L["f1_fan_enthusiasm<br/>(HOCON)"]
M["f1_fan_humor<br/>(HOCON)"]
N["Neuro-san Server<br/>(run.py)"]
end
subgraph "Data Layer"
D["Database<br/>(SQLite/Postgres)"]
end
subgraph "Visualization"
H["Dash App"]
I["Score Distribution"]
J["Radar Comparison"]
O["System Performance"]
end
A -->|enqueue| B
B -->|process| C
C -->|write submissions| D
D -->|read pending| F
F -->|async eval| G
G -->|call via HTTP| N
N -->|route to| K
N -->|route to| L
N -->|route to| M
K & L & M -->|scores + metrics| G
G -->|write evaluations| D
D -->|load data| H
H --> I & J & O
```
- Neuro-san AI Agents (`registries/*.hocon`): Multi-agent networks that evaluate submissions
  - `f1_fan_knowledge`: Evaluates F1 technical knowledge across 10 criteria
  - `f1_fan_enthusiasm`: Measures fan passion and excitement
  - `f1_fan_humor`: Assesses entertainment value
- Neuro-san Server (`run.py`): Backend server that hosts and orchestrates the AI agents
- Evaluation Pipeline (`eval/process_eval.py`): Async orchestrator that sends submissions to agents
- Task Queue (Celery + Redis): Enables parallel processing of inputs and evaluations
- Database (SQLite/Postgres): Stores submissions and evaluation results (sketched below)
- Visualization (Dash): Interactive dashboards for exploring results
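For orientation, here is a minimal sketch of what the SQLAlchemy layer behind this might look like; table and column names are assumptions, and the real models live in `db/`.

```python
# Minimal sketch of the SQLAlchemy layer; table and column names are
# assumptions, and the real models live in db/.
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Submission(Base):
    __tablename__ = "submissions"
    id = Column(Integer, primary_key=True)
    sub_id = Column(String, unique=True, index=True)  # stable external id
    description = Column(String)                      # the fan's submission text

class Evaluation(Base):
    __tablename__ = "evaluations"
    id = Column(Integer, primary_key=True)
    submission_id = Column(Integer, ForeignKey("submissions.id"))
    score_type = Column(String)   # knowledge / enthusiasm / humor
    score = Column(Float)         # 1-100 per the agent's rubric
    description = Column(String)  # brief rationale from the agent

engine = create_engine("sqlite:///f1_fans_eval.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
```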
All evaluation agents are defined in `registries/*.hocon` files using Neuro-san's HOCON format.

- `manifest.hocon`: Registry of available agents (which agents are served)
- `f1_fan_knowledge.hocon`: Knowledge evaluation agent (10 sub-criteria)
- `f1_fan_enthusiasm.hocon`: Enthusiasm evaluation agent
- `f1_fan_humor.hocon`: Humor evaluation agent
- `aaosa.hocon`: Shared configuration for all agents
Each agent HOCON file defines:
- LLM configuration: Model selection (e.g., `gpt-4o`, `claude-3-5-sonnet`)
- Agent instructions: System prompts and evaluation rubrics
- Tools: Functions the agent can call (e.g., `evaluate_score`, `manage_eval`)
- Scoring criteria: Sub-dimensions and their weights
Example: Changing the evaluation rubric

Edit `registries/f1_fan_knowledge.hocon`:

```hocon
"grounding_instructions": """
Follow this rubric when evaluating F1-related responses.

### RUBRIC
**1-30: Poor** - Lacks factual accuracy, generic
**31-50: Below Average** - Some relevant info, but superficial
**51-70: Good** - Solid understanding, accurate
**71-89: Strong** - Knowledgeable, insightful
**90-100: Exceptional** - Expert-level insight
"""
```

Example: Changing the model

```hocon
"llm_config": {
    "use_model": "claude-3-5-sonnet"  # or "gpt-4o", "gpt-4o-mini"
}
```

For detailed HOCON syntax and agent configuration options, see the Neuro-san Agent HOCON Reference.
For local development, copy `.env.example` to `.env` and customize as needed:

```bash
cp .env.example .env
```

Important: In production, use proper environment variables instead of `.env` files for better security and configuration management.

Key configuration sections:
- Neuro-san Server: Connection type, host, and port (HTTP: 8080)
- Redis: For the Celery task queue
- Database: PostgreSQL (optional) or SQLite (default)
- Processing: Concurrency limits and timeouts
- nsflow UI (optional): Web interface settings

If `POSTGRES_PASSWORD` is empty, the code falls back to SQLite: `sqlite:///f1_fans_eval.db`
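The fallback behaves roughly like the sketch below. `POSTGRES_PASSWORD` is documented above; the other env var names are assumptions standing in for whatever `.env.example` defines.

```python
import os

def build_db_url() -> str:
    """Sketch of the fallback: prefer Postgres when configured, else SQLite.

    Only POSTGRES_PASSWORD is documented above; the other env var names
    are assumptions standing in for whatever .env.example defines.
    """
    password = os.getenv("POSTGRES_PASSWORD", "")
    if not password:
        return "sqlite:///f1_fans_eval.db"
    user = os.getenv("POSTGRES_USER", "postgres")
    host = os.getenv("POSTGRES_HOST", "localhost")
    port = os.getenv("POSTGRES_PORT", "5432")
    name = os.getenv("POSTGRES_DB", "f1_fans_eval")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```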
The `run.py` script starts the Neuro-san AI agent server backend:

```bash
# Start server only (recommended for production)
python run.py

# Start server + nsflow web UI (if installed)
python run.py --with-ui

# Custom ports
python run.py --http-port 8080

# With UI on custom port
python run.py --with-ui --ui-port 4173
```

The server will be available at:
- HTTP: `localhost:8080`
- nsflow UI (if enabled): `http://localhost:4173`
```bash
python deploy/enqueue_input_tasks.py \
  --input-source samples/f1_sample.csv
```

```bash
celery -A deploy.tasks_inputs worker --loglevel=INFO --concurrency=8
```

- Reads the input CSV/JSON
- Writes Submissions into the DB (see the ingestion sketch below)
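A condensed, hypothetical sketch of that ingestion step; the import path, helper, and column names are assumptions, and the real logic lives in `input_processor/process_inputs.py`.

```python
# Condensed, hypothetical ingestion sketch; the real logic lives in
# input_processor/process_inputs.py and may differ.
import pandas as pd
from sqlalchemy.orm import Session

from db.models import Submission  # assumed import path for the ORM model

def ingest(input_source: str, session: Session) -> int:
    """Read a CSV or JSON source and write normalized Submission rows."""
    if input_source.endswith(".json"):
        df = pd.read_json(input_source)
    else:
        df = pd.read_csv(input_source)
    for row in df.itertuples(index=False):
        # 'sub_id' and 'description' are assumed column names.
        session.add(Submission(sub_id=str(row.sub_id), description=row.description))
    session.commit()
    return len(df)
```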
```bash
python deploy/enqueue_eval_tasks.py
# optional filters:
#   --filter-source path/to/ids.csv|.json|.txt
#   --range 0-100
#   --sid <comma-separated-sub_ids>
#   --override   # re-evaluate even if present
#   --granular   # use per-score-type clients (still single 'description' input)
```

```bash
celery -A deploy.tasks_eval worker --loglevel=INFO --concurrency=10
```

Concurrency is up to you; bump it based on CPU/IO and model server capacity.
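For reference, a worker task with a retry policy might be wired up as in this sketch; the task and helper names are hypothetical, and the real tasks live in `deploy/tasks_eval.py`.

```python
# Hypothetical sketch of an evaluation task with a retry policy; the real
# tasks live in deploy/tasks_eval.py and may be structured differently.
from celery import Celery

app = Celery("f1_eval", broker="redis://localhost:6379/0")

def run_evaluation(sub_id: str) -> None:
    """Placeholder for the real call into the evaluation pipeline."""
    ...

@app.task(bind=True, max_retries=3, default_retry_delay=5)
def evaluate_submission(self, sub_id: str) -> None:
    try:
        run_evaluation(sub_id)
    except Exception as exc:
        # Re-enqueue with a delay instead of dropping the submission.
        raise self.retry(exc=exc)
```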
```bash
python input_processor/process_inputs.py \
  --input-source samples/f1_sample.csv
```

```bash
python -m eval.process_eval --override
# flags:
#   --db-url sqlite:///f1_fans_eval.db
#   --filter-source <csv/json/txt> or inline JSON list
#   --concurrency 8
#   --local-test
```

```bash
python dash_app/app.py
# Opens http://127.0.0.1:8050
```
- EvalDataLoader (`dash_app/db/eval_data_loader.py`): Singleton, lru-cached loader. Infers columns from the SQLAlchemy models and guarantees consistent DataFrames.
- Reusable utils
  - `utils/chart_helper.py` - histograms, boxplots, time-series, radar
  - `utils/layout_helper.py` - common layout scaffolding
  - `utils/data_helpers.py` - data cache & DataFrame builders
  - `utils/ui_helpers.py` - tables and small UI bits
- Pages
  - Raw Data (`pages/raw_data.py`)
  - Score Distribution (`pages/score_dist.py`)
  - Radar Comparison (`pages/radar_comp.py`)
  - System Performance (`pages/system_perf.py`)

The app uses component-local `dcc.Store` caches and the shared EvalDataLoader for a snappy UX even with large tables.
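Conceptually, the loader pattern looks something like the sketch below; apart from the `EvalDataLoader` name, the details are assumptions about the real implementation.

```python
# Sketch of a singleton, cache-backed loader; everything except the
# EvalDataLoader name is an assumption about the real implementation.
from functools import lru_cache

import pandas as pd
from sqlalchemy import create_engine

class EvalDataLoader:
    _instance = None

    def __new__(cls):
        # One shared instance means one engine and one cache for all pages.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.engine = create_engine("sqlite:///f1_fans_eval.db")
        return cls._instance

    @lru_cache(maxsize=8)
    def load_table(self, table_name: str) -> pd.DataFrame:
        # Memoized per table name, so repeated dashboard callbacks are cheap.
        return pd.read_sql_table(table_name, self.engine)
```

Because every page gets the same instance, repeated callbacks share one cache instead of re-querying the database.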
```bash
# 1) Setup (first time only)
cp .env.example .env
uv sync

# 2) Start Redis
brew services start redis        # macOS
# OR: sudo systemctl start redis # Linux

# 3) Start Neuro-san AI server (in terminal 1)
python run.py

# 4) Start Celery workers (in separate terminals)
# Terminal 2: Input processing worker
celery -A deploy.tasks_inputs worker --loglevel=INFO --concurrency=6
# Terminal 3: Evaluation worker
celery -A deploy.tasks_eval worker --loglevel=INFO --concurrency=10

# 5) Enqueue tasks (in another terminal)
python deploy/enqueue_input_tasks.py --input-source samples/f1_sample.csv
python deploy/enqueue_eval_tasks.py --override

# 6) Launch the Dash visualization app
python dash_app/app.py
```
- Managed with uv via `pyproject.toml`
- Install dev dependencies:

  ```bash
  uv sync --group dev

  # Run code quality tools
  black .
  pylint f1-fans-eval
  pytest
  ```

- Install optional UI dependencies:

  ```bash
  uv sync --group ui
  ```

Key runtime deps: `celery`, `redis`, `sqlalchemy`, `pandas`, `dash`, `dash-bootstrap-components`, `neuro-san`

Optional UI deps: `nsflow`, `uvicorn`
- Neuro-san Agents: All evaluation logic is defined in `registries/*.hocon` files. You can customize rubrics, models, and scoring criteria without touching Python code. See the Customizing AI Agents section above.
- SQLite vs Postgres: SQLite is great for local/dev. For scale, set the Postgres env vars in `.env`. The code auto-builds DB URLs.
- Safety & Retries: Workers include retry policies. Evaluation calls use `MAX_RETRIES`, `RETRY_DELAY`, and a semaphore to protect AI servers from overload (see the sketch after this list).
- Multi-Agent Architecture: Each evaluation dimension (knowledge, enthusiasm, humor) has its own specialized Neuro-san agent. The `process_eval.py` orchestrator calls all three agents and aggregates their scores.
- Dash Performance: DataTables use virtualization and `EvalDataLoader` caches to scale. No fixed row limits.
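As a rough illustration of the semaphore pattern mentioned under Safety & Retries (the names and the cap here are hypothetical):

```python
# Hypothetical sketch of the semaphore pattern; names and the cap are
# illustrative, with the real limit presumably coming from .env.
import asyncio

MAX_CONCURRENT_CALLS = 8
_semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

async def send_to_agent(agent: str, payload: dict) -> dict:
    """Placeholder for the HTTP call to the Neuro-san server."""
    return {}

async def call_agent(agent: str, payload: dict) -> dict:
    # Bound the number of in-flight requests so the AI server stays healthy.
    async with _semaphore:
        return await send_to_agent(agent, payload)
```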