SQL-IQ: A Unified Benchmark for Holistic SQL Intelligence Evaluation

Overview

SQL-IQ is a unified evaluation framework for assessing the holistic SQL intelligence of large language models (LLMs). Current evaluations focus heavily on single-turn Text-to-SQL generation, but real-world database management demands a broader set of capabilities, including error classification, cross-dialect translation, and deep query comprehension.

SQL-IQ systematically evaluates models across four core dimensions of SQL intelligence, encompassing seven distinct tasks:

Dimension	Task	Description
Generation	Text-to-SQL	Translate natural language questions into SQL queries
Generation	Conversational SQL	Multi-turn context-aware SQL generation
Comprehension	SQL Equivalence Judge	Determine if two SQL queries are semantically equivalent
Comprehension	SQL Judge	Select the correct SQL from two candidates
Debugging	SQL Error Classification	Detect and classify error types in SQL queries
Debugging	SQL Debugging†	Fix buggy SQL queries given error messages
Adaptation	SQL Translation	Cross-dialect SQL translation (Oracle, PostgreSQL, ClickHouse, Druid, MSSQL)

†SQL Debugging is sourced from BIRD-CRITIC. Please refer to their repository for data and evaluation code.

Installation

Requirements

Python >= 3.9
An LLM serving endpoint compatible with the OpenAI API format (e.g., vLLM, TGI, or OpenAI API)

Install Dependencies

# Clone the repository
git clone https://github.com/SQL-IQ/SQL-IQ.git
cd SQL-IQ

# Install core dependencies
pip install -r requirements.txt

Note: Database drivers (psycopg2, oracledb, pymssql, clickhouse-connect) are only required for the SQL Translation task's execution-based evaluation. If you only need tasks without SQL execution (sql_judge, sql_equ_judge, sql_err_class), core dependencies are sufficient.
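To see which of the optional drivers are already present before attempting the SQL Translation task, a minimal sketch along these lines may help. The helper name `missing_drivers` is hypothetical and not part of SQL-IQ; the import names are the standard module names for the packages listed in the note.

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> module name (as imported).
DRIVER_MODULES = {
    "psycopg2": "psycopg2",
    "oracledb": "oracledb",
    "pymssql": "pymssql",
    "clickhouse-connect": "clickhouse_connect",
}


def missing_drivers(modules=DRIVER_MODULES):
    """Return the package names whose module cannot be found in this environment."""
    return sorted(pkg for pkg, mod in modules.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_drivers()
    if missing:
        print("Missing optional drivers:", ", ".join(missing))
    else:
        print("All optional database drivers are installed.")
```

An empty result only means the modules are importable; it does not confirm that the target databases are reachable.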

Database Setup (for execution-based evaluation)

Tasks that require SQL execution (text2sql, conversational_sql, sql_trans) depend on the BIRD benchmark's SQLite database files.

Download BIRD databases:

Visit https://bird-bench.github.io/ and download the dev databases.
Extract the database files to a local directory, e.g.:

/path/to/bird_databases/
├── california_schools/
│   └── california_schools.sqlite
├── card_games/
│   └── card_games.sqlite
├── ...
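
The layout above can be checked with a short script before running any execution-based task. This is a sketch that assumes every database follows the `<name>/<name>.sqlite` pattern shown in the tree; the function names are hypothetical and not part of SQL-IQ.

```python
import sqlite3
from pathlib import Path


def find_bird_databases(db_dir):
    """Map each database name to its .sqlite file, assuming <name>/<name>.sqlite."""
    found = {}
    for sub in sorted(Path(db_dir).iterdir()):
        candidate = sub / f"{sub.name}.sqlite"
        if sub.is_dir() and candidate.is_file():
            found[sub.name] = candidate
    return found


def count_tables(sqlite_path):
    """Open the file read-only and count its tables as a basic sanity check."""
    uri = Path(sqlite_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
        return row[0]
    finally:
        conn.close()
```

Calling `find_bird_databases("/path/to/bird_databases")` and then `count_tables` on each result gives a quick signal that the archive was extracted completely; subdirectories without a matching `.sqlite` file are simply skipped.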

In configs/tasks_config.yaml, replace every <YOUR_BIRD_DB_DIR> placeholder with the actual path:

text2sql:
  db_dir: /path/to/bird_databases

conversational_sql:
  db_dir: /path/to/bird_databases

sql_trans:
  db_dir: /path/to/bird_databases
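
The placeholder substitution can also be scripted. The sketch below uses plain text replacement rather than a YAML parser, so comments and key order in the config are left untouched; it assumes the placeholder string appears literally as `<YOUR_BIRD_DB_DIR>`, and the function names are hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "<YOUR_BIRD_DB_DIR>"


def fill_db_dir(config_text, db_dir):
    """Return the config text with every placeholder replaced, plus the count replaced."""
    count = config_text.count(PLACEHOLDER)
    return config_text.replace(PLACEHOLDER, str(db_dir)), count


def update_config(config_path, db_dir):
    """Rewrite the config file in place; returns how many placeholders were filled."""
    path = Path(config_path)
    new_text, count = fill_db_dir(path.read_text(encoding="utf-8"), db_dir)
    if count:
        path.write_text(new_text, encoding="utf-8")
    return count
```

For example, `update_config("configs/tasks_config.yaml", "/path/to/bird_databases")` returns the number of entries it changed; a return value of 0 suggests the file was already edited.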

For the SQL Translation task: execution-based evaluation requires all five target dialect databases (Oracle, PostgreSQL, ClickHouse, MSSQL, Druid) to be running. See db_setup/README.md for Docker configurations and data migration scripts.

Usage

Quick Start

# Run a single task
python -m sql_iq \
    --tasks_config configs/tasks_config.yaml \
    --tasks text2sql \
    --api_base http://localhost:8000/v1 \
    --model_name your-model-name \
    --api_key your-api-key \
    --run_name my_experiment

# Run multiple tasks
python -m sql_iq \
    --tasks_config configs/tasks_config.yaml \
    --tasks text2sql,sql_judge,sql_equ_judge \
    --api_base http://localhost:8000/v1 \
    --model_name your-model-name \
    --api_key your-api-key

# Run all six tasks evaluated in this repository (SQL Debugging is evaluated via BIRD-CRITIC)
python -m sql_iq \
    --tasks_config configs/tasks_config.yaml \
    --tasks text2sql,conversational_sql,sql_judge,sql_equ_judge,sql_err_class,sql_trans \
    --api_base http://localhost:8000/v1 \
    --model_name your-model-name \
    --api_key your-api-key
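
When the same invocation is repeated across several models or task sets, it can be convenient to assemble the command programmatically. The following is a sketch that only uses the flags shown in the commands above; `build_command` is a hypothetical helper, not part of the `sql_iq` package.

```python
import subprocess
import sys


def build_command(tasks, api_base, model_name, api_key, run_name=None,
                  tasks_config="configs/tasks_config.yaml"):
    """Assemble the argv list for `python -m sql_iq` from the documented flags."""
    cmd = [
        sys.executable, "-m", "sql_iq",
        "--tasks_config", tasks_config,
        "--tasks", ",".join(tasks),   # tasks are passed as one comma-separated value
        "--api_base", api_base,
        "--model_name", model_name,
        "--api_key", api_key,
    ]
    if run_name:
        cmd += ["--run_name", run_name]
    return cmd


def run_benchmark(**kwargs):
    """Run the benchmark as a subprocess; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting issues with API keys or model names that contain special characters.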

Output Structure

Results are saved under the results/ directory:

results/<run_name>/
├── text2sql/
│   ├── predictions.jsonl      # Model predictions with checkpointing
│   └── metrics.json           # Evaluation metrics
├── sql_judge/
│   ├── predictions.jsonl
│   └── metrics.json
└── ...
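
A finished run can be summarized by walking this directory tree. The sketch below assumes only the file names shown above; the keys inside metrics.json are not documented here, so the metrics are treated as an opaque JSON object. The function name `load_run` is hypothetical.

```python
import json
from pathlib import Path


def load_run(run_dir):
    """Collect metrics and prediction counts for each task directory in a run."""
    summary = {}
    for task_dir in sorted(Path(run_dir).iterdir()):
        if not task_dir.is_dir():
            continue
        metrics_file = task_dir / "metrics.json"
        preds_file = task_dir / "predictions.jsonl"
        metrics = None
        if metrics_file.is_file():
            metrics = json.loads(metrics_file.read_text(encoding="utf-8"))
        num_predictions = 0
        if preds_file.is_file():
            with preds_file.open(encoding="utf-8") as fh:
                # One JSON object per non-empty line.
                num_predictions = sum(1 for line in fh if line.strip())
        summary[task_dir.name] = {
            "metrics": metrics,
            "num_predictions": num_predictions,
        }
    return summary
```

A task whose entry has predictions but `metrics` set to `None` most likely did not finish its evaluation step.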
