Abstract:
Text-to-SQL generation models, capable of converting natural language prompts into SQL queries, offer significant potential for streamlining data analytics tasks. Despite state-of-the-art performance on popular academic benchmarks such as Spider [1], recent large language models, such as GPT-4, exhibit a considerable performance degradation on real-world applications with longer, more convoluted schemas [2]. This disparity raises questions about what factors contribute to this drop and whether existing academic benchmarks effectively represent real-world challenges. To determine these factors, we first examine Text-to-SQL model failures on customer logs. We find that accuracy on customer logs is on average 30% lower than accuracy on Spider. We identify three main challenges in real-world Text-to-SQL applications: long context length, unclear question formulation, and greater query complexity. With these insights, we create a new benchmark built from manually labeled customer logs and evaluate existing open-source and private LLMs to demonstrate the impact of each factor on model performance. The benchmark incorporates 20 non-join queries and 30 join queries, each accompanied by three additional question phrasing variations, for 200 queries in total (50 base queries with four phrasings each). To capture the effects of large schemas, we vary schema size from 5 to over 300 columns while retaining the minimum columns required to answer all questions. We assess the performance of prominent Text-to-SQL models, including GPT-4, GPT-3.5, BigCode's StarCoder [3], and NSQL Llama-2 [4], on both our benchmark and the Spider benchmark for comparative analysis. We use Spider execution accuracy to measure model performance.
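As a rough illustration of the metric named above, execution accuracy can be thought of as the fraction of predicted queries whose results match those of the reference query when both are run against the same database. The sketch below is a simplified stand-in, not the official Spider evaluation script: the function names, the toy `orders` table, and the order-insensitive row comparison are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same multiset of rows.

    Assumption: row order is ignored, which is a simplification of how
    real evaluation scripts treat ORDER BY queries.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


# Hypothetical toy database, used only to exercise the functions above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'a', 10.0), (2, 'b', 25.0), (3, 'a', 5.0);
    """
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT customer FROM orders WHERE amount > 8",
     "SELECT customer FROM orders WHERE amount >= 10"),
    # Both run, but the counts differ: counted as wrong.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(*) FROM orders WHERE customer = 'a'"),
    # Predicted query references a missing column: counted as wrong.
    ("SELECT nope FROM orders", "SELECT id FROM orders"),
    # Same rows in a different order: counted as correct in this sketch.
    ("SELECT id FROM orders ORDER BY id DESC", "SELECT id FROM orders"),
]

accuracy = execution_accuracy(conn, pairs)
```

Comparing result rows rather than query text is what lets two differently written but equivalent queries both count as correct, which matters when each question has several phrasings.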
The evaluation results reveal (a) a consistent decline in execution accuracy for longer schemas, dropping about 0.5 percentage points for every additional 10 columns, indicating that existing Text-to-SQL models struggle with progressively larger tables and schema lengths that are character...
Date of Conference: 13-16 May 2024
Date Added to IEEE Xplore: 23 July 2024