🟢🍋 LimeQO: Low Rank Learning for Offline Query Optimization

🌟 Overview

LimeQO is a novel offline query optimizer that leverages low-rank structure in the workload matrix to optimize query performance efficiently.

What is the workload matrix?

Each row represents a query.
Each column represents a hint (e.g., disable Nested Loop Join).
Each cell represents the runtime of the query under the given hint (in seconds).

Offline query optimization?

What: Effectively explore alternative query execution plans offline, in a way that minimizing total workload runtime while minimizing offline computation time.
Why: Focus on repetitive workloads and prevent regression.
How: Explores optimal plans within the available query hint space by steering the existing query optimizer.

How LimeQO works in one sentence?

LimeQO uses ALS (Alternating Least Squares) algorithm to predict the runtime, then leverages the completed workload matrix to efficiently select query hints for offline exploration.

🚩Features:

✅ Simple implementation: Utilizes a straightforward matrix factorization algorithm (ALS). DO NOT need any featurization!
✅ Low overhead: Negligible additional computation time.
✅ High performance: Achieves 2x speedup within just one hour.

🔗 Dataset

Download dataset from dropbox. This dataset contains the four workload we used in the experiments: ceb, job, stack, and dsb. Each zip includes query runtimes and corresponding EXPLAIN plans for all queries and hints.

wget -O dataset/qo_dataset.zip "https://www.dropbox.com/scl/fo/y4e88tmcx7ywo4ou1unnh/ABN6iqV1t_ecktO51gsPKRc?rlkey=hedjnmkpak3r3gxjzx48s9etu&st=uxnr4s17&dl=1"
unzip dataset/qo_dataset.zip -d dataset/
rm dataset/qo_dataset.zip

📂 Project Structure

.
├── dataset/              # Workload datasets (CEB, JOB, Stack, DSB)
├── src/                  
│   ├── data/             # Data loading and preprocessing
│   ├── models/           # Core LimeQO implementation
│   ├── utils/            # Helper functions and utilities
│   ├── strategies/       # Optimization strategies: Random, Greedy, LimeQO, LimeQO+
│   └── run_experiment.py # Run main experiments
├── draw/                 # Figures and visualizations
├── experiment/           # Experiment results
└── limeqo.ipynb          # Interactive demo notebook

🚀 Try LimeQO!

Check out the interactive demo in limeqo.ipynb to see how LimeQO works! Achieve 2x speedup in just less than 1 hour! ⬇️

🧐 Detailed instructions on reproducing the results in the paper:

Download code and set up environment

git clone https://github.com/zixy17/LimeQO.git
pip install -e .

Data and Experiment Results

Experiment results
The directory experiment/ already contains all results used in the paper.
You can directly reproduce every figure from the paper using these results, without downloading any dataset.
Datasets
The directory dataset/ contains partially packaged datasets.
To fully re-run experiments, you need to download the complete datasets (including all EXPLAIN plans) from Dropbox:
```
wget -O dataset/qo_dataset.zip "https://www.dropbox.com/scl/fo/y4e88tmcx7ywo4ou1unnh/ABN6iqV1t_ecktO51gsPKRc?rlkey=hedjnmkpak3r3gxjzx48s9etu&st=uxnr4s17&dl=1" 
unzip dataset/qo_dataset.zip -d dataset/ 
rm dataset/qo_dataset.zip
```
- The EXPLAIN plans are required only for LimeQO+ (training requires the plan trees).
- They are not needed for LimeQO or for reproducing figures.

Reproduce Figures from Experiment Results

All experiments results are stored in experiment/:
- ceb/, job/, stack/, dsb/ correspond to each dataset.
- These results were produced by src/run_experiment.py and are used for figure generation.
To reproduce the plots from the paper, run the notebooks in draw/.
- Example: Figure 5 (the first figure in the experiments section) can be reproduced by running: draw_ceb_fig1.ipynb, draw_job_fig1.ipynb, draw_stack_fig1.ipynb, draw_dsb_fig1.ipynb

Reproduce ALL experiment results

Download datasets as described above.
Run experiments: python src/run_experiment.py --dataset ceb
- --dataset can be one of ceb, job, stack, dsb.
- This will run all methods, including baselines (QOAdvisor, Random, Greedy) and ours (LimeQO, LimeQO+).
- Each baseline is repeated 20 times; LimeQO+ is repeated 5 times (since it involves training a neural network).
- Running each dataset may take more than 1 hour. For example, the LimeQO+ method on CEB dataset took ~2 hours per run (on CPU), so the total time is ~10 hours.
Figures can then be regenerated by running the notebooks in draw/.

Demo: limeqo.ipynb

This demo reproduces LimeQO results on the CEB benchmark (one figure from the paper).
It is designed as a quick exploratory example and does not reproduce baselines or LimeQO+.
No dataset download is required for this demo.
Running the demo may produce a RuntimeWarning: overflow encountered in expm1. This does not affect correctness of the results or the reproduced figure.

❤️ If you find our data, code, or the paper useful, please cite our paper 📑:

@article{yi2025low,
  title={Low Rank Learning for Offline Query Optimization},
  author={Yi, Zixuan and Tian, Yao and Ives, Zachary G and Marcus, Ryan},
  journal={Proceedings of the ACM on Management of Data},
  volume={3},
  number={3},
  pages={1--26},
  year={2025},
  publisher={ACM New York, NY, USA}
}

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
dataset		dataset
draw		draw
experiment		experiment
fig		fig
result		result
src		src
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
limeqo.ipynb		limeqo.ipynb
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🟢🍋 LimeQO: Low Rank Learning for Offline Query Optimization

🌟 Overview

🚩Features:

🔗 Dataset

📂 Project Structure

🚀 Try LimeQO!

🧐 Detailed instructions on reproducing the results in the paper:

❤️ If you find our data, code, or the paper useful, please cite our paper 📑:

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🟢🍋 LimeQO: Low Rank Learning for Offline Query Optimization

🌟 Overview

🚩Features:

🔗 Dataset

📂 Project Structure

🚀 Try LimeQO!

🧐 Detailed instructions on reproducing the results in the paper:

❤️ If you find our data, code, or the paper useful, please cite our paper 📑:

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages