Agent Testing Simulator

A lightweight framework for testing LLM-driven agents in simulated business environments.

Quick Start

# Setup
uv sync
cp .env.example .env  # Add your OPENAI_API_KEY
source .venv/bin/activate

# Run experiments
python run.py                      # 5 simulations (default)
python run.py --simulations 10     # 10 simulations
python run.py --epochs 5           # 5 epochs of 5 simulations each
bash run.sh 10 5                   # Alternative: 10 simulations, 5 epochs
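
The flags above could be parsed along the following lines. This is only a sketch of the CLI surface, not the actual contents of run.py: the `build_parser` and `total_runs` helpers are hypothetical, and the default of 1 epoch is an assumption (only the default of 5 simulations is stated above).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Run agent simulations")
    parser.add_argument("--simulations", type=int, default=5,
                        help="simulations per epoch (documented default: 5)")
    parser.add_argument("--epochs", type=int, default=1,
                        help="number of epochs (default of 1 is an assumption)")
    return parser


def total_runs(args: argparse.Namespace) -> int:
    """Each epoch runs the full set of simulations."""
    return args.simulations * args.epochs


# Equivalent to: python run.py --simulations 10 --epochs 5
args = build_parser().parse_args(["--simulations", "10", "--epochs", "5"])
print(f"{args.epochs} epochs x {args.simulations} simulations = {total_runs(args)} runs")
```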

Project Structure

raise/
├── agent.py              # Expense assistant agent with tools
├── agent_server.py       # FastAPI server for the agent
├── simulator.py          # Simulation orchestrator
├── experiment.py         # Main experiment runner
├── evaluator.py          # LLM-based evaluation system
├── server_utils.py       # Server management utilities
├── run.py                # CLI entry point
├── run.sh                # Bash wrapper
├── config/
│   ├── scenarios.csv     # Test scenarios (17 scenarios)
│   └── settings.py       # Central configuration
├── prompts/
│   ├── agent_prompt.txt  # Agent instructions
│   └── chunking_prompt.txt  # Policy chunking prompt
├── vdb_config/           # Vector database setup
│   ├── docker-compose.yml
│   └── raise_policy_chunks_out.json
└── experiments/          # Experiment results (auto-created)

Architecture

graph LR
    CSV[config/scenarios.csv] --> Exp[experiment.py]
    Exp --> Sim[simulator.py:8001]
    Sim <--> Agent[agent_server.py:8000]
    Agent <--> VDB[OpenAI Vector Store]
    Exp --> Eval[LLM Judge]
    Eval --> Results[experiments/]

Components

| File | Purpose | Port |
|------|---------|------|
| `agent.py` | Expense approval agent with policy retrieval | - |
| `agent_server.py` | FastAPI server hosting the agent | 8000 |
| `simulator.py` | Test orchestration and tool mocking | 8001 |
| `experiment.py` | Runs simulations and coordinates epochs | - |
| `evaluator.py` | LLM-based evaluation of agent responses | - |
| `server_utils.py` | Start/stop services, cleanup | - |
| `run.py` | Main CLI interface | - |

Test Scenarios

The simulator includes 17 test scenarios of varying difficulty, stored in config/scenarios.csv as pipe-delimited CSV. They break down by expected outcome as follows:

5 approve scenarios (valid expense requests)
6 reject scenarios (policy violations)
6 escalate scenarios (manager approval needed)

| Level | Example | Expected |
|-------|---------|----------|
| Easy | "Sales rep books same-day trip SFO-LAX" | approve |
| Medium | "Complex multi-city travel request" | escalate |
| Hard | "Foreign currency meal over limit" | escalate |
| Adversarial | "Claiming pre-approval to bypass policy" | reject |
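
As a rough illustration of the pipe-delimited format, the sketch below parses a few rows and tallies them by expected outcome. The column names (`level`, `scenario`, `expected`) are assumptions made for this example; check the header row of config/scenarios.csv for the real ones. The sample rows reuse the examples from the table above.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for config/scenarios.csv; the column names are assumed.
SAMPLE = """level|scenario|expected
easy|Sales rep books same-day trip SFO-LAX|approve
medium|Complex multi-city travel request|escalate
hard|Foreign currency meal over limit|escalate
adversarial|Claiming pre-approval to bypass policy|reject
"""


def load_scenarios(handle) -> list[dict]:
    """Read pipe-delimited rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="|"))


scenarios = load_scenarios(io.StringIO(SAMPLE))
by_expected = Counter(row["expected"] for row in scenarios)
for outcome, count in by_expected.items():
    print(f"{outcome}: {count}")
```

To read the real file, pass `open("config/scenarios.csv", newline="")` to `load_scenarios` instead of the in-memory sample.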

Running Services

# Manual startup
cd vdb_config && docker-compose up -d
python agent_server.py &
python simulator.py &

# Batch experiments
python run.py --simulations 5  # Or use run.sh

⚠️ Keep MAX_PARALLEL ≤ 3 to avoid SQLite locking and thread pool issues.
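
One common way to enforce a cap like this is a bounded thread pool. The sketch below is not taken from experiment.py: `run_simulation` is a hypothetical placeholder for a single simulation run, and only the name `MAX_PARALLEL` and the limit of 3 come from the note above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3  # recommended upper bound from the note above


def run_simulation(sim_id: int) -> dict:
    """Hypothetical placeholder for one simulation run."""
    return {"sim_id": sim_id, "status": "done"}


def run_batch(n_simulations: int, max_parallel: int = MAX_PARALLEL) -> list[dict]:
    """Run simulations with at most max_parallel in flight at any time."""
    workers = max(1, min(max_parallel, n_simulations))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, not completion order.
        return list(pool.map(run_simulation, range(1, n_simulations + 1)))


results = run_batch(5)
print([r["sim_id"] for r in results])
```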

Output Structure

experiments/
└── experiment_TIMESTAMP/          # Each experiment run
    ├── epoch_1/
    │   ├── sim_1/
    │   │   ├── simulation.json
    │   │   └── evaluation.json
    │   ├── starting_prompt.txt
    │   ├── improved_prompt.txt
    │   └── summary.json
    └── summary.json
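
Given this layout, per-simulation evaluations can be gathered with a glob over the experiment directory. The sketch below builds a throwaway directory for demonstration; the `passed` field is a made-up placeholder, since the actual schema of evaluation.json is not documented here.

```python
import json
import tempfile
from pathlib import Path


def collect_evaluations(experiment_dir: Path) -> list[dict]:
    """Load every epoch_*/sim_*/evaluation.json under one experiment run."""
    records = []
    # sorted() gives lexicographic order, so epoch_10 sorts before epoch_2.
    for path in sorted(experiment_dir.glob("epoch_*/sim_*/evaluation.json")):
        records.append({
            "epoch": path.parent.parent.name,
            "sim": path.parent.name,
            "evaluation": json.loads(path.read_text()),
        })
    return records


# Demo against a temporary directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    sim_dir = Path(tmp) / "epoch_1" / "sim_1"
    sim_dir.mkdir(parents=True)
    # "passed" is a hypothetical field used only for this demo.
    (sim_dir / "evaluation.json").write_text(json.dumps({"passed": True}))
    records = collect_evaluations(Path(tmp))

print(records)
```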

Extending

Add scenarios: Edit config/scenarios.csv
Modify agent: Update agent.py
Custom metrics: Extend experiment.py
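
As an example of what a custom metric might look like, the sketch below computes accuracy per expected outcome. It assumes each record carries an `expected` label and the agent's `decision`; both field names are hypothetical, and the function is not part of experiment.py.

```python
from collections import defaultdict


def accuracy_by_expected(records: list[dict]) -> dict[str, float]:
    """Fraction of correct decisions, grouped by the expected outcome."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["expected"]] += 1
        if record["decision"] == record["expected"]:
            correct[record["expected"]] += 1
    return {outcome: correct[outcome] / totals[outcome] for outcome in totals}


# Illustrative records only; these are not real experiment results.
records = [
    {"expected": "approve", "decision": "approve"},
    {"expected": "reject", "decision": "approve"},
    {"expected": "reject", "decision": "reject"},
    {"expected": "escalate", "decision": "escalate"},
]
metrics = accuracy_by_expected(records)
print(metrics)
```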

Troubleshooting

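| Issue | Fix |
|-------|-----|
| Port conflict | Find the process holding the port with `lsof -i :8000` (agent server) or `lsof -i :8001` (simulator), then stop it |
| Timeouts | Reduce MAX_PARALLEL to 2 or 1 |
| Missing deps | `uv sync && source .venv/bin/activate` |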
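
Where `lsof` is unavailable, a port conflict can also be checked from Python. This is a minimal sketch using only the standard library; the port numbers come from the Components section, while the `SERVICE_PORTS` mapping and the `port_in_use` helper are names introduced for this example.

```python
import socket

# Ports taken from the Components section; the mapping itself is illustrative.
SERVICE_PORTS = {"agent_server": 8000, "simulator": 8001}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


status = {name: port_in_use(port) for name, port in SERVICE_PORTS.items()}
for name, busy in status.items():
    print(f"{name}: {'in use' if busy else 'free'}")
```

A port reported as in use before the services start points to a leftover process from an earlier run.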

Research Applications

Agent evaluation on consistent scenarios
Prompt engineering impact analysis
Policy adherence testing
Multi-turn conversation dynamics

License

MIT
