Skip to content

leelige/opera

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OPERA

OPERA is a learned execution-time predictor that uses optimizer cost signals and operator-level cost-formula features as priors; the method details are described in the paper.The release is organized as a runnable Python project: one main entry point, operator-cost feature generation utilities, and the support code needed to run OPERA on user-provided workloads.

Repository Layout

  • run_opera.py: main command-line entry for loading data, training OPERA, evaluating on the selected test split, and writing metrics.
  • requirements.txt: minimal Python dependency list.
  • data/: place user-provided train/test trace files here. Each row should contain SQL text, a PostgreSQL JSON plan, observed runtime, and database configuration fields.
  • data/schemainfo/: generated schema and configuration metadata cache for local workloads.
  • cost_file/: PostgreSQL operator cost-formula hooks used to derive operator-level prior features.
  • common/: shared parsing, plan traversal, catalog lookup, and runtime cost-model helpers.
  • db_info/: schema metadata loading and optional live PostgreSQL connection helpers.
  • tools/build_node_cost_cache.py: generates the operator-cost feature artifact from selected traces and cost_file/ hooks.
  • tools/validate_query_total_runtime_based_cost.py: support validator used by the cache builder.

This release intentionally excludes raw trace files, generated schema caches, operator-cost feature artifacts, plotting code, historical ablation scripts, generated run directories, and paper-table post-processing scripts.

Setup

The release was prepared with the following software environment:

Python: 3.11.15
PostgreSQL: 13.3

PostgreSQL 13.3 is recommended when collecting plans, rebuilding schema metadata, or preparing operator-cost features for a custom workload. The included cost hooks and schema-statistics helpers follow PostgreSQL 13 behavior.

Install the Python dependency set before training or rebuilding artifacts:

pip install -r requirements.txt

requirements.txt includes the packages needed for model training, operator-cost feature preparation, and PostgreSQL metadata access:

numpy==2.4.4
torch==2.11.0+cu128
psycopg2-binary==2.9.12
paramiko==4.0.0
pandas==3.0.2

Prepare Operator-Cost Features

Place your train/test traces under data/, then generate an operator-cost feature artifact for the same train/test workload selection. The builder reads the selected trace files, applies the PostgreSQL operator formula hooks in cost_file/, and writes a compressed NPZ file consumed by run_opera.py.

python tools/build_node_cost_cache.py \
  --train-dbs imdb,tpch,tpcds \
  --test-dbs imdb,tpch,tpcds \
  --force

The --jobs option is optional; it only controls parallel worker count during feature generation.

Train And Evaluate

python run_opera.py \
  --device cuda \
  --train-dbs imdb,tpch,tpcds \
  --test-dbs imdb,tpch,tpcds \
  --outdir outputs/opera_retrain

The command trains on the selected train splits, evaluates on the selected test splits, and writes:

outputs/opera_retrain/metrics.json

The supported dataset names in the release entry point are imdb, tpch, and tpcds.

Database Metadata

Schema and configuration metadata can be loaded from data/schemainfo/ when matching cache files are available.

If the schema cache for a workload is missing, db_info/database_info.py falls back to a live PostgreSQL connection. In that path, db_info/infos.py reads connection settings from environment variables:

OPERA_DB_HOST
OPERA_DB_USER
OPERA_DB_PASSWORD
OPERA_DB_NAME
OPERA_DB_PORT
OPERA_PG_COMMAND_CTRL
OPERA_SSH_USER
OPERA_SSH_PASSWORD
OPERA_SSH_PORT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages