SEEK-SQL: Self-Optimizing Agent for
Knowledge-Enhanced Text-to-SQL in Real-World
Scenario
Anonymous Submission
No Institute Given
Abstract. Deploying modern Text-to-SQL methods in real-world envi-
ronments faces two challenges: knowledge-gap errors caused by a database
environment lacking critical knowledge and the requirement for rapid
adaptability to unfamiliar environments. To meet these challenges, we
propose SEEK-SQL, a novel knowledge-enhanced self-optimizing multi-agent
framework composed of four agents: a main Manager for planning and
decomposition, and three auxiliary agents: Selector for retrieving the
relevant database schema, Generator for SQL statement generation, and
Sniffer, which for the first time introduces a knowledge retrieval module
into the Text-to-SQL framework. We also propose a new contrastive
self-refinement strategy that not only fixes bugs but also repairs system
defects through batch-wise contrastive self-reflection, enhancing our
framework's generalization. Overall, SEEK-SQL achieves new state-of-the-art
performance among in-context-learning methods on two datasets, Spider and
BIRD, with much lower average token consumption than other multi-agent
frameworks. These results validate our framework's efficacy in addressing
real-world challenges and prepare it for deployment in real environments.
Keywords: Text-to-SQL · Multi-agent Framework.
1 Introduction
The objective of the Text-to-SQL task is to facilitate the automatic translation
of users' natural language inquiries into SQL queries given a database schema [31].
This technology liberates users from the necessity of SQL expertise, facilitating
their interaction with intricate database systems and thereby enabling them to
uncover significant insights, conduct efficient data analysis, reach well-founded
conclusions, produce reports grounded in data, and extract superior features for
machine learning purposes [19, 28]. Moreover, Text-to-SQL systems are instru-
mental in automating sophisticated data analytics and driving conversational
agents, thereby extending their utility beyond the confines of conventional data
retrieval [17]. With the relentless expansion of data, the capacity to efficiently
query databases without profound SQL knowledge is becoming ever more crucial
for a diverse array of applications.
[Figure: the user asks "Which countries' channels are playing some animation by todd casey?". The system's SQL with a knowledge-gap error filters on Directed_by = 'todd caset', while the gold SQL filters on Written_by = 'Todd Casey'. The missing knowledge is that Todd Casey is a famous cartoon director. The database contains the tables TV_Channel, TV_series, and Cartoon.]
Fig. 1: A real-world example of the Text-to-SQL task and a knowledge-gap error
Unlike previous works that use single-step or multi-step strategies with a single
LLM agent [5, 15], recent works explore multi-agent frameworks to generate SQL
statements and refine detected errors [20, 2]. Because generating a completely
correct answer in a single turn is difficult, two straightforward approaches are
usually considered: using an auxiliary agent with a specially fine-tuned LLM to
fix SQL errors [2, 22, 20], and generating top-k candidates and then choosing the
right one [22, 14]. However, when trying these approaches in a real-world
environment, we find two challenges that prevent their use. First, most SQL
refinement methods only search for errors based on the SQL answer and the table
contents, assuming that the database environment is correct and complete. However,
in real environments, and even in some datasets [8], the environment usually does
not provide all the necessary information. This missing information may lead
to incorrect SQL answers, which we call knowledge-gap errors. Such errors
may account for up to 40% of total errors, as analyzed in Section 3.1, so how to
find and analyze external knowledge dynamically has become a key challenge
for real-world applications of Text-to-SQL. Second, using fine-tuned models to
correct errors or generating multiple candidates at a time exposes new drawbacks
in real environments. Fine-tuned models do not perform well in unfamiliar
environments, and frequently fine-tuning models in complex and frequently changing
production environments wastes computing power and is very inefficient. At the
same time, existing algorithms only correct SQL errors and do not learn from
experience, so the same error occurs multiple times. Generating multiple candidates
at a time also consumes more tokens in the reasoning phase, which greatly increases
operating costs and reduces robustness in high-concurrency scenarios. Therefore,
how to build an efficient,
agile, and highly versatile Text-to-SQL system has become another challenge in
the real environment.
Building on the real-world challenges mentioned above, we propose SEEK-
SQL, a knowledge-enhanced self-optimizing multi-agent framework for Text-to-
SQL tasks. To address knowledge-gap errors, we are the first to introduce a knowledge
retrieval module into the Text-to-SQL framework. We propose a new multi-agent
framework that includes a main agent, Manager, responsible for problem
decomposition and task allocation, and three auxiliary agents: Selector,
which is responsible for schema extraction; Generator, which is responsible for
code generation; and Sniffer, a knowledge-enhanced agent with the ability to
retrieve and integrate external, local, and internal knowledge. In our framework,
Manager decomposes complex queries into a series of sub-queries and uses internal
reasoning to dynamically call the auxiliary agents to retrieve knowledge, extract
sub-schemas, and generate SQL when solving each sub-query. Moreover, to extend
self-correction from SQL errors to defects of the system itself, we propose a new
self-correction algorithm. Unlike previous correction methods, we let Generator and
Manager correct errors in the current answer from back to front, i.e., in the
reverse order of generation. After the repair is completed, we adopt a batch-wise
self-reflection mechanism in which Manager reflects over batches of repair histories
and generates guidelines, so that the system learns from its error history and
adapts quickly to unfamiliar environments.
We present comprehensive evaluations on the efficacy of SEEK-SQL on two
datasets [29, 8] and two backbone LLMs [9, 13]. Experiments show that, compared
with other ICL-based single-agent and multi-agent frameworks, our method achieves
a new SOTA on both datasets while consuming far fewer tokens than multi-agent
frameworks, demonstrating its efficacy on real-world challenges. Our
contribution can be summarized as follows:
– We define and highlight the key challenges in real-world Text-to-SQL scenarios.
To address the problem of knowledge-gap errors, we are the first to introduce
the knowledge retrieval task into the Text-to-SQL framework and propose a
knowledge-enhanced framework, SEEK-SQL.
– We propose a new self-refinement strategy that extends self-correction from SQL errors to
system defects. This strategy enables our framework’s self-optimization and
robustness in unfamiliar environments.
– We evaluate our framework on two datasets and two backbone LLMs. Results
demonstrate our method's efficacy with better overall performance and token
efficiency. All code, data, and prompts are available at an anonymous GitHub
repository: https://anonymous.4open.science/r/BC3F.
2 Related Works
Large language models (LLM) have driven significant progress in the Text-to-
SQL task, evolving from prompt engineering to multi-stage frameworks and,
more recently, multi-agent collaboration. Early work focused on leveraging high-
quality prompts to harness LLMs’ potential, as demonstrated by ACT-SQL
[30] and QDecomp [18], enhancing reasoning by incorporating chain-of-thought.
DAIL-SQL [5] further refined prompt design with systematic engineering. As
research progressed, multi-stage frameworks such as DIN-SQL [15] and DEA-
SQL [27] introduced task decomposition, applying tailored prompts to subtasks
while integrating error correction. This trend continued with sophisticated frame-
works like C3-SQL [4] and StructGPT [6], which further structured the process
by incorporating database simplification, SQL generation, and verification into a
zero-shot learning pipeline.
More recently, works such as MAC-SQL [20] and SQLFixAgent [2] have
emerged, integrating multi-stage generation and self-correction modules to im-
prove efficiency and performance. Following the approach of previous work, our
work extends the multi-agent framework by incorporating unstructured knowl-
edge retrieval for the first time, enabling the combined retrieval of structured and
unstructured knowledge bases. Additionally, we have refined the self-correction
mechanism to align SQL fixing with the system’s self-evolution, thereby improv-
ing generalization performance.
3 Preliminaries
3.1 Definition of Knowledge-gap Errors
Fig. 2: Comparison of error counts before and after external knowledge is given
Previous studies on SQL-fixing tasks have focused on defining SQL errors
based on their superficial manifestations rather than the deep-seated causes that
lead to the errors, such as mismatch errors [22] or semantic errors [2]. Our ex-
perience in practice has revealed that a significant portion of such errors stem
from a loss of essential knowledge. Fig. 1 shows an example from [22]: the system
first gives a wrong answer because it does not know the identity of Todd Casey.
When provided with the external knowledge 'Todd Casey is a famous cartoon director',
the system can revise the SQL into the correct answer. We therefore hypothesize the
existence of "knowledge-gap" errors, defined as wrong SQL statements caused by the
absence of certain essential knowledge.
To confirm the existence of such errors, we sampled 100 error cases generated
by GPT-4o [13] from each of two widely used benchmarks, Spider [29] and BIRD [8].
After adding additional useful information to each query, we asked GPT-4o to
regenerate the SQL. Figure 2 shows the result: nearly 40% of the errors are fixed,
supporting the hypothesis of knowledge-gap errors. Under
this premise, we focus on the research problem of "how to retrieve and introduce
external knowledge into real-world Text-to-SQL generation".
3.2 Problem Definition
Given a triple X = (Q, S, D), where Q, S, and D are the natural language question,
the database schema, and the external knowledge corpus, the database schema S is
defined as {T, C}, where T represents the tables {T1, T2, ..., T|T|} and C represents
the columns {C1, C2, ..., C|C|}. The goal of the Text-to-SQL task is to generate the
correct SQL query Y corresponding to the question Q, based on the database schema S
and the external knowledge K retrieved from the knowledge corpus D.
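To make the notation concrete, the following sketch (illustrative Python, not part of the released code; all names are assumptions) represents the task inputs defined above:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DatabaseSchema:
    # S = {T, C}: table names mapped to their column names
    tables: Dict[str, List[str]] = field(default_factory=dict)

@dataclass
class TextToSQLInstance:
    question: str                 # Q: natural language question
    schema: DatabaseSchema        # S: database schema
    knowledge_corpus: List[str]   # D: external knowledge corpus (documents)

# The target is a SQL string Y answering `question` over `schema`,
# optionally using knowledge K retrieved from `knowledge_corpus`.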
4 Methodology
4.1 Framework of SEEK-SQL
[Figure: Manager receives the user query ("Which countries' channels are playing some animation by todd casey?") and decomposes it into sub-queries (SubQ1: filter cartoons by todd casey; SubQ2: filter channels playing the given cartoons). For each sub-query, Sniffer retrieves knowledge (e.g., "todd casey is a famous cartoon director"), Selector extracts a sub-schema, and Generator produces the sub-SQL. The gathered final SQL is repaired, when needed, by the contrastive self-refinement strategy (Algorithm 2).]
Fig. 3: The overall structure of SEEK-SQL
To address the problem of knowledge-gap errors, we propose SEEK-SQL, a
novel multi-agent collaborative framework that innovatively integrates external
knowledge sources into the Text-to-SQL process. SEEK-SQL comprises four agents
for SQL generation: a Selector for database schema extraction, a Sniffer for
external knowledge retrieval, a Generator for SQL statement generation, and a
core Manager for task decomposition and planning. Unlike previous Text-to-
SQL works [1, 20, 26], we use a self-correction and self-evolution strategy instead
of a separate SQL-refining agent for error refinement, as described in Section
4.2. Algorithm 1 shows the generation process in SEEK-SQL, and the detailed
introduction of agents is presented below.
Algorithm 1: SQL Generation Process of SEEK-SQL
Input: Query q; Database S; Knowledge Corpus D
Output: SQL answer SQL
1 subQs = Manager.decompose(q, S);
2 SQLs = [];
3 for subQ in subQs do
4 k = Sniffer.search(subQ, S, D);
5 subS = Selector.select(subQ, S, k);
6 subSQL = Generator.generate(subQ, subS, k);
7 subSQL = Generator.selfCheck(subSQL);
8 SQLs.append(subSQL)
9 end
10 sql = SQLs.gather();
11 ok, err = Execute(sql, S);
12 if ok then
13 return sql
14 else
15 sql = SelfRefine(err, sql); // self-refinement in Algorithm 2
16 return sql
17 end
Selector is responsible for extracting the minimum sub-schema needed to solve
the problem from the overall database schema. Given the input triple X =
(Q, S, D) from Manager and the external knowledge K retrieved from D, the
function of Selector can be described as follows:
S′ = fSelector{Q, S, K}    (1)
where S ′ is one sub-schema extracted from database schema S.
We utilize Agent Selector with motivations similar to [20]: first, introducing
excess irrelevant schema items in prompts increases the LLM's likelihood of
generating extraneous SQL elements, leading to potential errors; second, using
the entire database schema can cause an excessively long context, raising API
costs and possibly exceeding the LLM's context length limit. As shown in Algorithm
1, Selector dynamically extracts a different sub-schema for each sub-query
decomposed by Manager, which further minimizes the scale of each sub-schema.
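As an illustration of Equation 1, the sketch below shows one way the Selector step could be realized; the prompt wording and the call_llm helper are assumptions, not the authors' exact implementation.

import json

def select_sub_schema(call_llm, sub_query: str, schema: dict, knowledge: str) -> dict:
    """Ask the LLM to keep only the tables/columns relevant to the sub-query."""
    prompt = (
        "Given the question, external knowledge, and full database schema, "
        "return a JSON object mapping only the relevant tables to the relevant columns.\n"
        f"Question: {sub_query}\nKnowledge: {knowledge}\n"
        f"Schema: {json.dumps(schema)}\nRelevant sub-schema (JSON):"
    )
    sub_schema = json.loads(call_llm(prompt))
    # Guard against hallucinated items: keep only tables/columns present in the real schema.
    return {
        t: [c for c in cols if c in schema.get(t, [])]
        for t, cols in sub_schema.items() if t in schema
    }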
Manager is the core agent responsible for planning and decomposing the gen-
eral process into a series of intermediate steps, including sub-queries and SQL
sentences. This process can be described as:
P(Y | Q, S, D) = \prod_{j=1}^{L} P(Y^j | Y^{<j}; Q^j, S^j, K^j)    (2)
where Y is the final SQL answer; Q, S, and D are the original complex question,
database schema, and external knowledge corpus; Y^{<j} denotes the SQL statements
generated in previous steps; and Y^j, Q^j, S^j, and K^j are the SQL statement,
sub-query, sub-schema, and external knowledge produced at step j by Generator,
Manager, Selector, and Sniffer, respectively.
Like previous work [1, 26], we adopt the chain-of-thought (CoT) [23] prompting
method and few-shot learning as Manager's working pattern. Specifically, Manager
dynamically assesses the complexity of a user query: if it can be answered with a
simple SQL query, the SQL is generated directly. For more intricate questions, it
breaks them down into sub-queries, generates SQL starting from the simplest
sub-query, and progressively works toward the final SQL that corresponds to the
original question. Moreover, in order to improve generalization in held-out
environments, we also propose a new decomposition example construction method,
Forward-Backward Decomposition Generation, for SEEK-SQL.
Forward-Backward Decomposition Generation For a given tuple (Q, S, D), which
represents the original complex question, database schema, and external knowledge
corpus, and the final SQL answer Y, the goal of decomposition is to generate a
sequence (<Y^1, Q^1, S^1, K^1>, ..., <Y^t, Q^t, S^t, K^t>), where Y^j, Q^j, S^j,
and K^j are the SQL statement, sub-query, sub-schema, and external knowledge used
in the j-th step. A common method is to use an LLM to generate examples for each
step from front to back so as to ensure consistency with the final answer [20].
However, this method may include redundant information in the final example, which
can cause semantic errors in the final SQL statement [2] and reduce external
knowledge retrieval accuracy [25].
To avoid these shortcomings, we propose a new example construction method,
Forward-Backward Decomposition Generation. Specifically, we first use an LLM
(GPT-4o [13] in our experiments) to generate examples for each step from front
to back, as in common practice. Then, we use the LLM to check each step from back
to front, removing redundant information already included in earlier steps, such
as repeatedly generated SQL statements, unused schema parts, and fragments of the
sub-query repeated from previous sub-queries. Finally, we check the executability
of the remaining sequence from front to back. The final output is a sequence in
which each step contains only the minimum information and operations necessary for
that step. Results of the ablation study in Section 5.4 validate the effectiveness
of this method.
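The following sketch summarizes the forward-backward procedure described above; the three callables stand in for LLM-driven steps and are assumptions rather than the released code.

def build_decomposition_example(forward_decompose, strip_redundancy, is_executable,
                                question, schema, corpus):
    # Forward pass: generate <SQL, sub-query, sub-schema, knowledge> steps front to back.
    steps = forward_decompose(question, schema, corpus)

    # Backward pass: walk from the last step to the first, removing information
    # already contained in earlier steps (repeated SQL, unused schema items,
    # sub-query fragments repeated from previous sub-queries).
    for j in range(len(steps) - 1, -1, -1):
        steps[j] = strip_redundancy(steps[j], steps[:j])

    # Final forward check: every remaining step must still execute against the database.
    if all(is_executable(step, schema) for step in steps):
        return steps
    return None  # discard the example if any step no longer runs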
Sniffer is responsible for using various tools to retrieve and extract relevant
information from external knowledge sources to assist Manager in planning and
SQL generation. For an external knowledge corpus D, the function can be described as:
K = fSniffer{Q′, S′, D, N}    (3)
where Q′ and S′ are the (sub-)query and (sub-)schema given by Manager, and N is
optional additional information given by Manager through its CoT process.
There are multiple possible modes for external information retrieval; we implement
three in our system. In Local mode, Sniffer uses a light RAG system built with
LlamaIndex [10] to search for relevant information in a locally built knowledge
base, which is the mode most often chosen in our real-world environment. In Open
World mode, Sniffer uses a search-engine tool to look up relevant information
online and returns a summarized result, which suits environments lacking database
information. In Close World mode, intended only for highly sensitive environments,
Sniffer reflects and answers based solely on its own parametric knowledge, which
largely depends on its base model. Users can also choose a Hybrid mode that lets
Sniffer comprehensively compare information from all of these sources.
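A minimal sketch of Sniffer's mode dispatch is shown below; the local_index, web_search, and call_llm handles are placeholders for the LlamaIndex RAG system, the search-engine tool, and the backbone LLM, respectively, and are assumptions rather than the real APIs.

def sniff(mode, sub_query, sub_schema, local_index=None, web_search=None, call_llm=None):
    if mode == "local":          # RAG over a locally built knowledge base
        return local_index.query(f"{sub_query}\nSchema: {sub_schema}")
    if mode == "open_world":     # search online and return a summarized result
        hits = web_search(sub_query)
        return call_llm(f"Summarize the facts relevant to: {sub_query}\n{hits}")
    if mode == "close_world":    # rely only on the backbone LLM's own knowledge
        return call_llm(f"What background knowledge is needed to answer: {sub_query}?")
    if mode == "hybrid":         # compare and merge evidence from all origins
        parts = [sniff(m, sub_query, sub_schema, local_index, web_search, call_llm)
                 for m in ("local", "open_world", "close_world")]
        return call_llm("Merge the consistent facts:\n" + "\n".join(parts))
    raise ValueError(f"unknown mode: {mode}")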
Generator is responsible for generating the SQL statement for the current sub-query
based on the previous sub-queries and previously generated SQL statements, as in
Equation 2. To avoid accumulating mistakes across steps, we adopt a light
self-refining strategy during generation, which asks Generator itself to check the
answer for syntax and table-mismatch errors after each step. Also, since Generator
focuses purely on SQL generation, users can easily fine-tune SQL-focused code
generation models as its base model in real-world environments, further
reducing the difficulty of adapting to held-out environments.
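The sketch below illustrates Generator's per-step generation with the light self-check described above; the prompts and the call_llm helper are illustrative assumptions.

def generate_sub_sql(call_llm, sub_query, sub_schema, knowledge, prev_sqls):
    sql = call_llm(
        f"Write one SQL statement for the sub-query.\n"
        f"Sub-query: {sub_query}\nSub-schema: {sub_schema}\n"
        f"Knowledge: {knowledge}\nPrevious steps: {prev_sqls}"
    )
    # Light self-check: the same agent reviews its own output for syntax errors
    # and table/column mismatches before it is passed to the next step.
    checked = call_llm(
        f"Check this SQL for syntax errors and for tables or columns that are not "
        f"in the sub-schema; return the corrected statement.\n"
        f"SQL: {sql}\nSub-schema: {sub_schema}"
    )
    return checked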
4.2 Contrastive Self-refinement Strategy
Common approaches to fixing SQL bugs utilize single or multiple agents to
analyze the buggy SQL and generate a correct one [2, 22]. However, such approaches
only focus on fixing SQL bugs in individual cases without considering the underlying
systemic defects. As a result, the same type of error occurs repeatedly, wasting a
large amount of computing resources and time.
To address this challenge, we propose a new self-refinement strategy targeting
not only SQL sentence fixing but also system-level evolution, which involves two
parts: Backward SQL Fixing and Batch-wise Contrastive Reflection, as shown
in Algorithm 2.
Backward SQL Fixing LLM-based Text-to-SQL systems commonly face two sorts of
errors: syntax errors and semantic errors. Syntax errors are execution failures
caused by incorrect syntax or spelling in the generated SQL, mostly arising from
wrong behavior of Generator. Semantic errors, including knowledge-gap errors, are
often caused by redundant, confusing statements or by a lack of understanding of
the context; such queries often execute successfully but produce wrong and
confusing outputs. Contrary to the generation order, we use a
backward two-step method to solve these errors:
Step 1: Syntax Error Checking First, we utilize Generator to review the wrong
SQL based on the error message. Generator will try to identify and repair syntax
errors and re-execute the answer. If the answer is still incorrect, or Generator
cannot identify any syntax error, the issue is handed over to Manager and Step 2
is activated.
Algorithm 2: Contrastive Self-refinement Strategy
Input: Query q; Database S; Knowledge Corpus D; Wrong SQL SQLerror
Output: SQL answer SQL
/* SQL fixing */
1 for count in [0, MaxTryTimes] do
2 err = SyntaxCheck(Generator, SQLerror );
3 if err ≠ ∅ then
4 sql = SyntaxRefine(Generator, SQLerror , err);
5 ok, err = Execute(sql, S);
6 if ok then break;
7 end
8 Solution = SemanticCheck(Manager, err, SQLerror );
9 sql = SemanticRefine(Solution, Manager, err, SQLerror );
10 ok, err = Execute(sql, S);
11 if ok then break;
12 end
/* Batch-wise Contrastive Reflection */
13 ReflectionBatch.append([CorrectTraj, WrongTraj]);
14 if ReflectionBatch.size == SettledSize then
15 Guideline = ContrastiveReflection(Manager, ReflectionBatch);
16 GuidelineUpdate(Agents, Guideline);
17 end
18 MemoryBankUpdate(CorrectTraj);
19 return sql
Step 2: Semantic Error Repair When the error message is sent to Manager,
following [7], we utilize the rubber duck debugging method, in which a programmer
walks through their code step by step, explaining it to an unresponsive item (such
as a rubber duck) to help pinpoint and understand mistakes. Specifically, we ask
Manager to go through each line of the SQL statement to check its alignment with the
sub-query’s intention. When Manager detects mistakes, it calls different agents
to repair the mistakes. To be specific, the Manager calls Sniffer to find relevant
information when additional external information is needed to correct the SQL
answer; it calls Selector to re-extract the sub-schema when the existing sub-
schema does not match the query; and it calls itself to re-decompose the query
when certain sub-queries are not sufficiently detailed.
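For illustration, the backward two-step repair can be sketched as follows (cf. Algorithm 2, lines 1-12); the agent objects and their method names are placeholders, not the released interface.

def backward_fix(generator, manager, execute, wrong_sql, err, max_tries=3):
    sql = wrong_sql
    for _ in range(max_tries):
        # Step 1: Generator first looks for syntax errors based on the error message.
        syntax_issue = generator.find_syntax_error(sql, err)
        if syntax_issue is not None:
            sql = generator.repair_syntax(sql, syntax_issue)
        else:
            # Step 2: Manager walks the SQL line by line ("rubber duck" style),
            # calling Sniffer / Selector / itself to repair the detected mistake
            # before regenerating the statement.
            solution = manager.semantic_check(sql, err)
            sql = manager.semantic_refine(solution, sql, err)
        ok, err = execute(sql)
        if ok:
            return sql
    return sql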
Batch-wise Contrastive Reflection To address the challenge of making the
Text-to-SQL system learn from the bug-fixing process and fix its own systemic flaws,
we propose a new multi-agent self-reflection strategy based on contrastive reasoning.
This strategy involves two steps:
Step 1: Positive and Negative Trajectory Batch Construction After Manager fixes an
error case, it collects the action trajectories of both the correct and the wrong
answer for that case, including sub-queries and the agent-calling history, as
positive and negative samples. Following [24], we adopt a batched strategy: we
sample b trajectories as a batch, with an equal split of positive and negative
trajectories (b/2 each) forming two minibatches for contrastive reasoning.
Experience from [24] shows that such a batched strategy helps generate more general
and comprehensive guidelines for complex trajectories, including problem
decomposition and solutions to sub-queries, because it targets not just individual
samples but a diverse set.
Step 2: Guideline Generation and Memory Bank Construction After constructing the
sample batch, Manager compares the two minibatches based on their key
characteristics, attributes the performance gap to particular actions in the
intricate trajectories, and then generates general instruction guidelines for each
agent to boost overall task performance. The instruction guideline is appended to
each agent's prompt to improve its performance in future tasks.
Moreover, we maintain a memory bank containing the five most recent correct samples
for the agents, inspired by human decision making, in which recent past experiences
are frequently consulted [16]. The memory bank, which stores tuples of action
sequences, instructions from Manager, and the performance of these actions in
recent cases, is provided together with the task instruction and updated after
each case.
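The sketch below illustrates how batch-wise contrastive reflection and the memory bank update could be wired together; the reflection prompt, data layout, and call_llm helper are assumptions rather than the authors' implementation.

from collections import deque

reflection_batch = []            # list of (correct_trajectory, wrong_trajectory) pairs
memory_bank = deque(maxlen=5)    # most recent five correct samples

def after_repair(call_llm, agents, correct_traj, wrong_traj, batch_size=5):
    reflection_batch.append((correct_traj, wrong_traj))
    if len(reflection_batch) == batch_size:
        positives = [p for p, _ in reflection_batch]
        negatives = [n for _, n in reflection_batch]
        guideline = call_llm(
            "Compare the successful and failed trajectories, attribute the gap to "
            "specific actions, and write one short guideline per agent.\n"
            f"Successful: {positives}\nFailed: {negatives}"
        )
        for agent in agents:               # guideline is appended to each agent's prompt
            agent.prompt += "\n" + guideline
        reflection_batch.clear()
    memory_bank.append(correct_traj)       # recent successes are shown in future tasks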
5 Experiments
5.1 Experiment Setup
Datasets We evaluate our framework on two popular benchmarks: Spider [29]
and BIRD [8].
Spider, a comprehensive dataset spanning 200 databases within 138 do-
mains, is extensively utilized to gauge the adaptability and generalization of
Text-to-SQL parsers against unfamiliar database schemas. It provides a sub-
stantial training set of 8,659 samples, a development set of 1,034 samples, and a
test set of 2,147 samples designed to challenge models in navigating diverse and
novel database structures.
BIRD, introduced by Alibaba DAMO Academy, stands as a novel bench-
mark for large-scale, real-world database grounded Text-to-SQL evaluation, en-
compassing 95 large-scale databases and high-quality Text-SQL pairs. With a
substantial data storage volume of 33.4GB across 37 professional domains, BIRD
differentiates itself from Spider by emphasizing the integration of external knowl-
edge reasoning to connect natural language queries with database content and
by addressing the novel challenges associated with SQL efficiency when handling
extensive databases.
Evaluation Metrics Adopting the evaluation frameworks from BIRD [8] and
Test-suite [31], we assess the performance of our text-to-SQL model in real-world
scenarios with large databases using three key metrics: Exact Match Accuracy
(EM), Execution Accuracy (EX), and Valid Efficiency Score (VES). EM, as intro-
duced by Test-Suites [31], evaluates each SQL clause as a set, requiring a perfect
match between the predicted and reference query clauses without considering
values. EX, the proportion of queries where both predicted and ground-truth
execution results are identical, gauges the correctness of the query outcomes.
VES, introduced by BIRD [8], measures the efficiency of valid SQL queries, de-
fined as those whose result sets align with the ground-truth, thus considering
both the accuracy and the efficiency of the model-generated SQL.
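As an illustration of the EX metric, the simplified sketch below treats a prediction as correct when its execution result set matches that of the gold query; the official evaluation uses the BIRD and Test-suite scripts, and the helper layout here is an assumption.

import sqlite3

def execution_accuracy(db_paths, pred_sqls, gold_sqls):
    correct = 0
    for db_path, pred, gold in zip(db_paths, pred_sqls, gold_sqls):
        conn = sqlite3.connect(db_path)
        try:
            pred_rows = set(conn.execute(pred).fetchall())
            gold_rows = set(conn.execute(gold).fetchall())
            correct += int(pred_rows == gold_rows)
        except sqlite3.Error:
            pass  # an unexecutable prediction counts as wrong
        finally:
            conn.close()
    return correct / len(gold_sqls)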
Implementation Details In our evaluation, we separately measure our framework on
two backbone LLMs: Deepseek-V3-671B [9] and GPT-4o [13]. We also use GPT-3.5-turbo
in the token efficiency experiment. For Agent Sniffer, we use E5-v2-base [21] and
Flan-T5-base [3] as the embedding and generation models of the light RAG system in
Local mode; we choose Web-search-pro [11] with the Google engine as the search
engine for Open World mode. The number
of few-shot examples, the size of the memory bank, and the batch size for con-
trastive reflection are all set to 5 to match other baselines. All experiments are
conducted on a server with one NVIDIA A100 40G GPU.
Baselines We choose two groups of previous works as baselines in our experi-
ment:
LLM-based methods use a single LLM with in-context-learning methods
to generate SQL statements from given queries with one-stage or multi-stage
reasoning. These methods include DIN-SQL [15], DAIL-SQL [5], C3-SQL [4], and
ACT-SQL [30].
Multi-agent-based methods use various agents responsible for different
tasks to generate and refine SQL statements. These methods include MAC-SQL [20],
TOOL-SQL [22], SQLFixAgent [2], and CHASE-SQL [14].
5.2 Overall Performance
Results on Spider Table 1 shows SEEK-SQL results with two different back-
bone LLMs and two different Sniffer modes on the Spider benchmark. Results show
that SEEK-SQL outperforms all other baselines and achieves a new state of the art,
confirming the effectiveness of our method. Moreover, our method has a far better
EX score than SQLFixAgent [2], while the EM score is similar. This may be
attributed to our efficient self-refinement strategy, which allows some answers
originally generated with errors to be repaired so that they execute correctly and
return correct results, even though they are not completely identical to the gold
answers. Also, SEEK-SQL performs better in Open World mode than in Close World
mode, which shows that even though the Spider dataset is relatively simple and
equipped with sufficient knowledge, actively searching for external knowledge
online still helps relieve knowledge-gap errors.
Table 1: Evaluation of SEEK-SQL on Spider's dev/test sets.

| Group             | Methods                              | Dev EM% | Dev EX% | Test EM% | Test EX% |
|-------------------|--------------------------------------|---------|---------|----------|----------|
| LLM-based         | DAIL-SQL [5] + GPT-4 + SC            | 68.7    | 83.6    | 66.0     | 86.6     |
|                   | C3 [4] + ChatGPT                     | 71.4    | 81.8    | -        | 82.3     |
|                   | ACT-SQL [30] + GPT-4                 | 61.7    | 82.9    | -        | -        |
|                   | DIN-SQL [15] + GPT-4                 | 60.0    | 85.3    | -        | -        |
| Multi-Agent-based | MAC-SQL [20] + GPT-4                 | 23.5    | 86.8    | 19.3     | 82.8     |
|                   | SQLFixAgent [2] + ChatGPT            | 77.9    | 84.8    | 71.2     | 82.9     |
|                   | Tool-SQL [22] + GPT-4                | -       | 86.9    | -        | 85.6     |
|                   | CHASE-SQL [14] + Gemini-1.5-Pro      | -       | -       | -        | 87.6     |
| Ours              | SEEK-SQL (Close World) + Deepseek-V3 | 77.2    | 87.4    | 72.4     | 87.4     |
|                   | SEEK-SQL (Open World) + Deepseek-V3  | 78.3    | 88.2    | 74.3     | 87.9     |
|                   | SEEK-SQL (Close World) + GPT-4o      | 77.9    | 87.1    | 71.6     | 86.5     |
Table 2: Evaluation of SEEK-SQL on BIRD's dev set.

| Group             | Methods                              | Dev EX% | Dev VES% |
|-------------------|--------------------------------------|---------|----------|
| LLM-based         | DAIL-SQL [5] + GPT-4                 | 54.76   | 56.08    |
|                   | DIN-SQL [15] + GPT-4                 | 50.72   | 59.79    |
|                   | GPT-4 [13]                           | 46.35   | 49.77    |
| Multi-Agent-based | MAC-SQL [20] + GPT-4                 | 59.39   | 66.39    |
|                   | SQLFixAgent [2] + ChatGPT            | 58.67   | 62.19    |
|                   | CHASE-SQL [14] + Gemini-1.5-Pro      | 74.46   | -        |
| Ours              | SEEK-SQL (Close World) + Deepseek-V3 | 72.15   | 68.85    |
|                   | SEEK-SQL (Open World) + Deepseek-V3  | 74.53   | 69.72    |
|                   | SEEK-SQL (Close World) + GPT-4o      | 73.72   | 68.81    |
Results on BIRD We further explore the performance of SEEK-SQL on
a larger and more complex benchmark, BIRD. Results in Table 2 show that
SEEK-SQL outperforms all other baselines and achieves a new state of the art among
ICL methods. Compared with the results on Spider, the performance gap between
SEEK-SQL in Close World mode and Open World mode is larger, indicating how
essential external knowledge acquisition is in complex, realistic environments.
Moreover, compared with other multi-agent self-refinement frameworks such as
SQLFixAgent [2], even SEEK-SQL in Close World mode achieves a large improvement
(14.33%) on BIRD and reaches performance similar to CHASE-SQL [14], which employs
more complex agents in its framework. Such improvements show the effectiveness of
our contrastive self-reflection strategy, which continuously learns from its
failures and makes progress.
5.3 Efficiency Analysis
(a) Dynamics curve of Execution Accuracy (EX). (b) Dynamics curve of the average number of self-refinement iterations needed per query.
Fig. 4: Optimization dynamics of SEEK-SQL agents on the Spider dev set
Iteration Efficiency Figure 4 shows the optimization curves of EX and of the
average number of self-refinement iterations needed per query for SEEK-SQL on the
Spider dev set. Impressively, SEEK-SQL agents show significant performance
improvements, e.g., from 53% to 74% in EX, while the average number of
self-refinement iterations drops from 3.7 to 1.8. This strongly supports the
effectiveness of our contrastive self-reflection strategy, as both generation
accuracy and self-refinement efficiency improve. Additionally, our memory bank,
which stores recent successful trajectories, encourages agents to gradually
converge by the end of the optimization process.
(a) Average token consumption. (b) Average EX value.
Fig. 5: Token consumption and execution accuracy of Text-to-SQL methods on a
sample of Spider's dev set.
Token Efficiency Figure 5 shows the average token consumption and execution
accuracy on a 100-query sample of Spider's dev set. We compare our method with
four advanced methods, STRIKE [12], MAC-SQL, SQLFixAgent, and DAIL-SQL, all
powered by GPT-3.5-turbo for a fair comparison. Results
show that SEEK-SQL achieves the best performance with fewer tokens con-
sumed. Considering that LLM APIs charge by token count and that inference time is
proportional to token length, this result shows that our method
is more efficient and cheaper. Moreover, even though our method shows performance
and token efficiency similar to SQLFixAgent during the first 20 iterations,
SEEK-SQL consumes 45% fewer tokens and achieves better performance in its last 20
iterations, benefiting from the reduced number of refinement iterations shown in
Fig. 4b. SEEK-SQL's final token efficiency is comparable to that of the ICL-based
method DAIL-SQL, showing the system's continuous evolution through contrastive
reflection.
5.4 Ablation Study
Table 3: Ablation study of SEEK-SQL on the Spider and BIRD dev sets.

| Methods     | Spider EM% | Spider EX% | BIRD EX% | BIRD VES% |
|-------------|------------|------------|----------|-----------|
| SEEK-SQL    | 78.3       | 88.2       | 74.53    | 69.72     |
| w/o Sniffer | 76.9       | 86.7       | 72.07    | 67.25     |
| w/o FBD     | 77.8       | 87.2       | 73.83    | 68.27     |
| w/o CRF     | 74.4       | 85.1       | 71.25    | 65.07     |
Table 3 shows the results of our ablation study on the three main parts of our
framework. Starting from the complete framework, we separately remove Sniffer,
replace the forward-backward decomposition samples (FBD) with common GPT-generated
samples, and disable the contrastive reflection self-refinement (CRF) strategy.
Results show that removing any of these parts decreases performance in all
scenarios we tested, confirming the effectiveness and necessity of each component.
6 Conclusion
In this paper, we highlight the key challenges of Text-to-SQL systems in real-
world scenarios. To address these challenges, we propose a new knowledge-enhanced
self-optimizing multi-agent framework, SEEK-SQL, which for the first time
introduces the knowledge retrieval task into the Text-to-SQL framework. Results on
Spider and BIRD show that SEEK-SQL achieves new SOTA performance with lower token
consumption, benefiting from the system-flaw repair mechanism brought by our
contrastive self-refinement strategy and confirming its efficacy in addressing
real-world challenges.
References
1. Askari, A., Poelitz, C., Tang, X.: Magic: Generating self-correction guideline for
in-context text-to-sql. arXiv preprint arXiv:2406.12692 (2024)
2. Cen, J., Liu, J., Li, Z., Wang, J.: Sqlfixagent: Towards semantic-accurate text-to-sql
parsing via consistency-enhanced multi-agent collaboration. In: AAAI (2025)
3. Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X.,
Dehghani, M., Brahma, S., et al.: Scaling instruction-finetuned language models.
Journal of Machine Learning Research 25(70), 1–53 (2024)
4. Dong, X., Zhang, C., Ge, Y., Mao, Y., Gao, Y., Lin, J., Lou, D., et al.: C3: Zero-shot
text-to-sql with chatgpt. arXiv preprint arXiv:2307.07306 (2023)
5. Gao, D., Wang, H., Li, Y., Sun, X., Qian, Y.: Text-to-sql empowered by large
language models: A benchmark evaluation. CoRR abs/2308.15363 (2023)
6. Jiang, J., Zhou, K., Dong, Z., Ye, K., Zhao, X., Wen, J.R.: StructGPT: A general
framework for large language model to reason over structured data. In: Proceedings
of the 2023 Conference on Empirical Methods in Natural Language Processing.
pp. 9237–9251. Association for Computational Linguistics, Singapore (Dec 2023).
https://doi.org/10.18653/v1/2023.emnlp-main.574
7. Lee, C., Xia, C.S., Yang, L., Huang, J.t., Zhu, Z., Zhang, L., Lyu, M.R.: A
unified debugging approach via llm-based multi-agent synergy. arXiv preprint
arXiv:2404.17153 (2024)
8. Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., Wang, B., Qin, B., Geng, R.,
Huo, N., Zhou, X., Chenhao, M., Li, G., Chang, K., Huang, F., Cheng, R., Li, Y.:
Can llm already serve as a database interface? a big bench for large-scale database
grounded text-to-sqls. In: Advances in Neural Information Processing Systems.
vol. 36, pp. 42330–42357. Curran Associates, Inc. (2023)
9. Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C.,
Ruan, C., et al.: Deepseek-v3 technical report. arXiv preprint:2412.19437 (2024)
10. Liu, J.: Llamaindex (2022). https://doi.org/10.5281/zenodo.1234
11. Liu, X., Qin, B., Liang, D., Dong, G., Lai, H., Zhang, H., Zhao, H., Iong, I.L.,
Sun, J., Wang, J., et al.: Autoglm: Autonomous foundation agents for guis. arXiv
preprint arXiv:2411.00820 (2024)
12. Nan, L., Zhao, Y., Zou, W., Ri, N., Tae, J., Zhang, E., Cohan, A., Radev, D.:
Enhancing text-to-SQL capabilities of large language models: A study on prompt
design strategies. In: Findings of the Association for Computational Linguistics:
EMNLP 2023. pp. 14935–14956. Association for Computational Linguistics, Sin-
gapore (Dec 2023). https://doi.org/10.18653/v1/2023.findings-emnlp.996
13. OpenAI, Hurst, A., Lerer, A., Goucher, A.P., et al.: Gpt-4o system card
(2024), https://arxiv.org/abs/2410.21276
14. Pourreza, M., Li, H., Sun, R., Chung, Y., Talaei, S., Kakkar, G.T., Gan, Y., Saberi,
A., Ozcan, F., Arik, S.O.: CHASE-SQL: Multi-path reasoning and preference op-
timized candidate selection in text-to-SQL. In: The Thirteenth International Con-
ference on Learning Representations (2025)
15. Pourreza, M., Rafiei, D.: Din-sql: Decomposed in-context learning of text-to-sql
with self-correction. arXiv preprint arXiv:2304.11015 (2023)
16. Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: lan-
guage agents with verbal reinforcement learning. In: Proceedings of the 37th Inter-
national Conference on Neural Information Processing Systems. NIPS ’23, Curran
Associates Inc., Red Hook, NY, USA (2023)
17. Sun, R., Arik, S.Ö., Muzio, A., Miculicich, L., Gundabathula, S., Yin, P., Dai, H.,
Nakhost, H., Sinha, R., Wang, Z., et al.: Sql-palm: Improved large language model
adaptation for text-to-sql (extended). arXiv preprint arXiv:2306.00739 (2023)
18. Tai, C.Y., Chen, Z., Zhang, T., Deng, X., Sun, H.: Exploring chain of thought
style prompting for text-to-SQL. In: Proceedings of the 2023 Conference on Em-
pirical Methods in Natural Language Processing. pp. 5376–5393. Association for
Computational Linguistics, Singapore (Dec 2023)
19. Wang, B., Shin, R., Liu, X., Polozov, O., Richardson, M.: Rat-sql: Relation-
aware schema encoding and linking for text-to-sql parsers. arXiv preprint
arXiv:1911.04942 (2019)
20. Wang, B., Ren, C., Yang, J., Liang, X., Bai, J., Chai, L., Yan, Z., Zhang, Q.W., Yin,
D., Sun, X., Li, Z.: Mac-sql: A multi-agent collaborative framework for text-to-sql.
arXiv preprint (2024)
21. Wang, L., Yang, N., Huang, X., Jiao, B., Yang, L., Jiang, D., Majumder, R., Wei,
F.: Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint
arXiv:2212.03533 (2022)
22. Wang, Z., Zhang, R., Nie, Z., Kim, J.: Tool-assisted agent on sql inspection and
refinement in real-world scenarios. arXiv preprint (2024)
23. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E.H., Le,
Q.V., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language
models. In: Proceedings of the 36th International Conference on Neural Information
Processing Systems. NIPS ’22, Curran Associates Inc., Red Hook, NY, USA (2022)
24. Wu, S., Zhao, S., Huang, Q., Huang, K., Yasunaga, M., Cao, K., Ioannidis, V.,
Subbian, K., Leskovec, J., Zou, J.Y.: Avatar: Optimizing llm agents for tool usage
via contrastive reasoning. In: Advances in Neural Information Processing Systems.
vol. 37, pp. 25981–26010. Curran Associates, Inc. (2024)
25. Xia, Z., Wu, Y., Xia, Y., Nguyen, C.T.: Momentum posterior regularization for
multi-hop dense retrieval. In: Proceedings of the 31st International Conference on
Computational Linguistics. pp. 8255–8271. Association for Computational Linguis-
tics, Abu Dhabi, UAE (Jan 2025), https://aclanthology.org/2025.coling-main.550/
26. Xie, W., Wu, G., Zhou, B.: Mag-sql: Multi-agent generative approach with soft
schema linking and iterative sub-sql refinement for text-to-sql. arXiv preprint
arXiv:2408.07930 (2024)
27. Xie, Y., Jin, X., Xie, T., Matrixmxlin, M., Chen, L., Yu, C., Lei, C., Zhuo, C., Hu,
B., Li, Z.: Decomposition for enhancing attention: Improving LLM-based text-
to-SQL through workflow paradigm. In: Findings of the Association for Compu-
tational Linguistics: ACL 2024. pp. 10796–10816. Association for Computational
Linguistics, Bangkok, Thailand (Aug 2024)
28. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K.R., Cao, Y.: React:
Synergizing reasoning and acting in language models. In: The Eleventh Interna-
tional Conference on Learning Representations (2023)
29. Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q.,
Roman, S., Zhang, Z., Radev, D.: Spider: A large-scale human-labeled dataset for
complex and cross-domain semantic parsing and text-to-SQL task. In: Proceedings
of the 2018 Conference on Empirical Methods in Natural Language Processing. pp.
3911–3921. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov
2018). https://doi.org/10.18653/v1/D18-1425
30. Zhang, H., Cao, R., Chen, L., Xu, H., Yu, K.: ACT-SQL: In-context learning for
text-to-SQL with automatically-generated chain-of-thought. In: Findings of the As-
sociation for Computational Linguistics: EMNLP 2023. pp. 3501–3532. Association
for Computational Linguistics, Singapore (Dec 2023)
31. Zhong, R., Yu, T., Klein, D.: Semantic evaluation for text-to-SQL with distilled
test suites. In: Proceedings of the 2020 Conference on Empirical Methods in Nat-
ural Language Processing (EMNLP). pp. 396–411. Association for Computational
Linguistics, Online (Nov 2020). https://doi.org/10.18653/v1/2020.emnlp-main.29