PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency

AI-generated keywords: Text-to-SQL

AI-generated Key Points

  • The paper introduces a two-stage Text-to-SQL framework called PET-SQL
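  • The first stage uses a reference-enhanced prompt representation (schema information plus randomly sampled cell values) and retrieved question-SQL pairs as few-shot demonstrations to guide large language models in generating a preliminary SQL query (PreSQL)
  • The second stage uses the schema linked from PreSQL to simplify the prompt and instruct the model to produce the final SQL query; cross-consistency across different LLMs then serves as post-refinement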
  • PET-SQL achieves state-of-the-art results on the Spider benchmark with an execution accuracy of 87.6%

Authors: Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

License: CC BY-NC-SA 4.0

Abstract: Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first introduce a novel prompt representation, called reference-enhanced representation, which includes schema information and randomly sampled cell values from tables to instruct LLMs in generating SQL queries. Then, in the first stage, question-SQL pairs are retrieved as few-shot demonstrations, prompting the LLM to generate a preliminary SQL (PreSQL). After that, the mentioned entities in PreSQL are parsed to conduct schema linking, which can significantly compact the useful information. In the second stage, with the linked schema, we simplify the prompt's schema information and instruct the LLM to produce the final SQL. Finally, as the post-refinement module, we propose using cross-consistency across different LLMs rather than self-consistency within a particular LLM. Our methods achieve new SOTA results on the Spider benchmark, with an execution accuracy of 87.6%.
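The reference-enhanced representation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' prompt: the function name `build_reference_prompt`, the `n_samples` parameter, the `###` layout, and the toy `singer` table are all assumptions made for the example. It only shows the idea of pairing each table's columns with a few randomly sampled rows.

```python
import sqlite3


def build_reference_prompt(conn, question, n_samples=3):
    """Build a schema-plus-sampled-values prompt (hypothetical layout)."""
    cur = conn.cursor()
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = ["### SQLite tables, with their columns and sampled cell values:"]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [row[1] for row in cur.execute(f'PRAGMA table_info("{table}")')]
        lines.append(f"# {table}({', '.join(columns)})")
        # Randomly sample a few rows so the model sees real value formats.
        rows = cur.execute(
            f'SELECT * FROM "{table}" ORDER BY RANDOM() LIMIT ?', (n_samples,)
        ).fetchall()
        for row in rows:
            lines.append(f"#   sample: {row}")
    lines.append(f"### Question: {question}")
    lines.append("### SQL:")
    return "\n".join(lines)


# Toy database, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO singer VALUES (1, 'Joe Sharp', 'Netherlands');
INSERT INTO singer VALUES (2, 'Rose White', 'France');
""")
prompt = build_reference_prompt(conn, "How many singers are from France?")
print(prompt)
```

In the framework as described, retrieved question-SQL pairs would be prepended to such a prompt as few-shot demonstrations; that retrieval step is omitted here.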

Submitted to arXiv on 13 Mar. 2024

Results of the summarizing process for the arXiv paper: 2403.09732v1

The paper "PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency" proposes a way to improve LLM-based Text-to-SQL systems. Recent work in this field has focused on in-context learning with large language models, but these systems struggle with verbose database information and complex user intentions. To address this, the authors propose a two-stage framework built on a reference-enhanced prompt representation, followed by a cross-consistency post-refinement module. The representation combines schema information with randomly sampled cell values from the tables. In the first stage, question-SQL pairs are retrieved as few-shot demonstrations and the model generates a preliminary SQL query (PreSQL). The entities mentioned in PreSQL are then parsed for schema linking, which condenses the useful information. In the second stage, the linked schema is used to simplify the prompt's schema information, and the model is instructed to produce the final SQL query. As post-refinement, the authors use cross-consistency across different LLMs rather than self-consistency within a single LLM. The authors report state-of-the-art results on the Spider benchmark, with an execution accuracy of 87.6%.
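The schema-linking and cross-consistency steps can be sketched as below. This is one plausible reading under stated assumptions, not the paper's implementation: the abstract does not specify the parsing method or the voting rule, so the regex-based `link_schema` and the majority vote over execution results in `cross_consistency` are illustrative choices, and the candidate queries stand in for outputs from different LLMs.

```python
import re
import sqlite3
from collections import Counter


def link_schema(pre_sql, schema):
    """Keep only tables and columns whose names appear in the preliminary SQL."""
    # Crude tokenization (assumption): string literals are tokenized too.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", pre_sql.lower()))
    linked = {}
    for table, columns in schema.items():
        if table.lower() in tokens:
            linked[table] = [c for c in columns if c.lower() in tokens]
    return linked


def cross_consistency(conn, candidates):
    """Run each candidate SQL and return one whose result most candidates share."""
    executed = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # drop candidates that fail to execute
        # Sort rows by repr so row order does not affect agreement (simplification).
        executed.append((sql, tuple(sorted(rows, key=repr))))
    if not executed:
        return None
    winner, _ = Counter(res for _, res in executed).most_common(1)[0]
    return next(sql for sql, res in executed if res == winner)


# Toy schema and data, assumed purely for illustration.
schema = {"singer": ["singer_id", "name", "country"],
          "concert": ["concert_id", "venue"]}
pre_sql = "SELECT count(*) FROM singer WHERE country = 'France'"
print(link_schema(pre_sql, schema))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO singer VALUES (1, 'Joe Sharp', 'Netherlands');
INSERT INTO singer VALUES (2, 'Rose White', 'France');
""")
candidates = [  # stand-ins for final SQL produced by three different LLMs
    "SELECT count(*) FROM singer WHERE country = 'France'",
    "SELECT count(singer_id) FROM singer WHERE country = 'France'",
    "SELECT count(*) FROM singer",
]
print(cross_consistency(conn, candidates))
```

Voting on execution results rather than on SQL text lets syntactically different but equivalent queries count as agreeing, which is why the first two candidates above form the majority.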
Created on 15 Jul. 2024
