PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (Technical Report)

AI-generated keywords: Approximate Query Processing

AI-generated Key Points

  • Authors Yuxuan Zhu, Tengjun Jin, Stefanos Baziotis, Chengsong Zhang, Charith Mendis, and Daniel Kang introduce two techniques, TAQA and BSAP, to address challenges in approximate query processing (AQP).
  • TAQA is a two-stage online AQP algorithm that provides user-specified error guarantees, eliminates maintenance overheads, and avoids modifications to database management systems.
  • BSAP enables block-level sampling with statistical guarantees within the algorithm to enhance the efficiency of TAQA.
  • The authors develop a prototype middleware system called PilotDB to implement these techniques and achieve a priori error guarantees and substantial speedups on various DBMSs.
  • Evaluation of PilotDB on PostgreSQL, SQL Server, and DuckDB shows significant speedups of up to 126 times when running with a 5% guaranteed error.
  • Contributions include the proposal of TAQA, which simultaneously provides user-specified error guarantees, eliminates maintenance overheads, and avoids DBMS modifications (P1); the development of BSAP, which enables block sampling for nested and join queries (P2); and the construction and evaluation of PilotDB, which implements both techniques (P3).
  • This research addresses limitations in existing literature related to approximate query processing by introducing novel algorithms and statistical techniques that improve performance while maintaining error guarantees across different DBMSs.

Authors: Yuxuan Zhu, Tengjun Jin, Stefanos Baziotis, Chengsong Zhang, Charith Mendis, Daniel Kang

SIGMOD 2025
23 pages, 19 figures
License: CC BY 4.0

Abstract: After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126X speedups when running with a 5% guaranteed error.
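
The abstract describes TAQA only as a two-stage online algorithm, so the following is a minimal sketch of the general two-stage idea rather than the paper's method. It assumes an in-memory list standing in for a table, an AVG aggregate, and a plain CLT plug-in formula for the sample size; the names `plan_sample_size` and `two_stage_avg` are hypothetical. A pilot sample estimates the mean and spread, and those estimates decide how many rows the final sample needs for a target relative error.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% normal quantile (assumed confidence level)


def plan_sample_size(pilot, rel_error, z=Z_95):
    """Rows needed so that z * s / sqrt(n) <= rel_error * |mean|.

    This is a simple plug-in rule based on the pilot statistics; it ignores
    the pilot's own sampling error, which a real guarantee must account for.
    """
    mean = statistics.fmean(pilot)
    std = statistics.stdev(pilot)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero pilot mean")
    needed = math.ceil((z * std / (rel_error * abs(mean))) ** 2)
    return max(len(pilot), needed)


def two_stage_avg(table, rel_error, pilot_size=200, seed=0):
    """Stage 1: pilot sample to plan. Stage 2: final sample to answer."""
    rng = random.Random(seed)
    pilot = rng.sample(table, min(pilot_size, len(table)))
    n = min(len(table), plan_sample_size(pilot, rel_error))
    final = rng.sample(table, n)
    return statistics.fmean(final), n


if __name__ == "__main__":
    data = [float(v) for v in range(1, 1001)]
    estimate, rows_read = two_stage_avg(data, rel_error=0.05)
    print(f"estimate={estimate:.1f} using {rows_read} of {len(data)} rows")
```

In this sketch both stages sample individual rows, which corresponds to the row-level sampling that the abstract notes can be slower than running the exact query.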

Submitted to arXiv on 27 Mar. 2025


Results of the summarizing process for the arXiv paper: 2503.21087v1

In their technical report "PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees," authors Yuxuan Zhu, Tengjun Jin, Stefanos Baziotis, Chengsong Zhang, Charith Mendis, and Daniel Kang introduce two techniques, TAQA and BSAP, to address challenges faced by existing methods in approximate query processing (AQP). The goal is to provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems, all at the same time. The first technique, TAQA, is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it may be slower than exact queries when using standard row-level sampling. To make TAQA efficient, the authors propose BSAP, which enables block-level sampling with statistical guarantees within the algorithm. They implement both techniques in a prototype middleware system called PilotDB, which is compatible with all database management systems supporting efficient block-level sampling. The evaluation of PilotDB on PostgreSQL, SQL Server, and DuckDB using real-world benchmarks demonstrates speedups of up to 126 times when running with a 5% guaranteed error. The contributions of this work are the proposal of TAQA, which achieves the three properties simultaneously (P1); the development of BSAP, which enables block sampling to answer approximate nested and join queries (P2); and the construction and evaluation of PilotDB, which implements both techniques (P3). Overall, this research addresses limitations in the existing AQP literature by introducing algorithms and statistical techniques that improve performance while maintaining error guarantees across different database management systems.
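
To illustrate why block-level sampling needs its own statistics, here is a minimal sketch of a textbook cluster-sampling estimator for a SUM aggregate. It is not BSAP itself, whose details the summary does not give. The block layout, the helper names `make_blocks` and `block_sample_sum`, and the 95% normal quantile are all assumptions for illustration. The key point is that whole blocks, not rows, are the sampling units, so the variance is computed from per-block sums.

```python
import math
import random
import statistics


def make_blocks(rows, block_size):
    """Split rows into consecutive fixed-size blocks (a stand-in for pages)."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def block_sample_sum(blocks, rate, seed=0, z=1.96):
    """Estimate SUM over all rows from a uniform sample of whole blocks.

    Returns (estimate, half_width), where half_width is a CLT-based
    confidence half-width computed from the variance of block sums.
    """
    total_blocks = len(blocks)
    if total_blocks < 2:
        raise ValueError("need at least two blocks to estimate variance")
    m = min(total_blocks, max(2, round(rate * total_blocks)))
    rng = random.Random(seed)
    block_sums = [sum(block) for block in rng.sample(blocks, m)]
    # Scale the mean block sum up to the whole table.
    estimate = total_blocks * statistics.fmean(block_sums)
    # Standard error of the total, with a finite population correction
    # because blocks are drawn without replacement.
    fpc = math.sqrt(1 - m / total_blocks)
    std_err = total_blocks * statistics.stdev(block_sums) / math.sqrt(m) * fpc
    return estimate, z * std_err


if __name__ == "__main__":
    rows = [float(v) for v in range(1000)]
    est, half_width = block_sample_sum(make_blocks(rows, 100), rate=0.3)
    print(f"SUM ~ {est:.0f} +/- {half_width:.0f} (exact {sum(rows):.0f})")
```

Because rows inside a block are often correlated, treating sampled rows as independent would understate the error; working with block sums avoids that in this simple single-table case.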
Created on 28 Apr. 2025

