- 03/01/2025
- Expanded DW queries so that in total there are 209 queries.
- For a streamlined and rapid setup, we've translated all DW queries from Oracle SQL to MySQL dialect. A MySQL version of the DW database is also available in Google Drive. Please refer to Setup for more information.
- Each DW query comes with a SQL statement in both Oracle and MySQL dialects, allowing for benchmarking tasks such as SQL dialect translation.
- We are planning to release more queries and databases from a new source in the coming weeks.
- 01/19/2025: 191 queries and 463 tables across 6 databases are all available.
- 10/28/2024: A subset of 49 queries and 99 tables is available.
If you find our data or the paper helpful, please cite the paper
@article{chen2024beaver,
title={BEAVER: an enterprise benchmark for text-to-sql},
author={Chen, Peter Baile and Wenz, Fabian and Zhang, Yi and Yang, Devin and Choi, Justin and Tatbul, Nesime and Cafarella, Michael and Demiralp, {\c{C}}a{\u{g}}atay and Stonebraker, Michael},
journal={arXiv preprint arXiv:2409.02038},
year={2024}
}
Queries and databases are collected from two sources: DW and NW.
dev_xx.jsonincludes- natural language question, gold SQL statement
- tables and join keys used in the gold SQL statement
- column mappings from topics mentioned in the user question to columns used in the gold SQL statement.
- SQLs in
dev_dw.jsoninclude both MySQL and Oracle versions, making them useful for benchmarking tasks like dialect translation.
dev_tables.jsonincludes the table schema, and key-foreign-key relationships.- join keys for tables in database
dwcan be found indw_join_keys.json.
- join keys for tables in database
- Google drive folder contains the anonymized database content, provided in both CSV format and as a MySQL dump..
NWfolder includes data for each non-DWdatabase.
We have created a unified MySQL version for our dataset. A free MySQL installation can be found here. After the installation, import the MySQL dump files from the google drive to your local MySQL databases using
mysql -u root -p < `xxx.sql`
To execute a SQL statement, you can either log in to the MySQL interface or you can do it via mysql-connector-python.
If you want to use the Oracle version of DW queries, you can download the free oracle database and import the CSVs.
Your support in improving this dataset is greatly appreciated! If you have any questions or feedback, please send an email to peterbc@mit.edu or create an Issue.