MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering

AI-generated keywords: MultiTabQA Tabular QA Semantic Parsing Neural Models Table Reasoning

AI-generated Key Points

Recent advances in tabular question answering (QA) have been limited to answering questions over a single table, which constrains their coverage and does not involve common table operations such as set operations, Cartesian products (joins), or nested queries.
Two major directions have been explored to address this gap: semantic parsing-based techniques and end-to-end neural models.
The proposed model is called MultiTabQA, which answers questions over multiple tables and generates tabular answers.
A pre-training dataset comprising 132,645 SQL queries and tabular answers was built for effective training.
MultiTabQA outperforms state-of-the-art single-table QA models adapted to multi-table QA settings by finetuning on three datasets (Spider, Atis, and GeoQuery) in terms of accuracy and generalization ability.
The approach is particularly useful for non-normalized tables from sources other than relational databases such as web tables or tables in text documents.
Novel evaluation metrics that assess the quality of generated tables at different levels of granularity based on their structural properties such as column names and data types associated with the generated values are introduced.


Authors: Vaishali Pal, Andrew Yates, Evangelos Kanoulas, Maarten de Rijke

arXiv: 2305.12820v2 (cs.CL)

Accepted at ACL-2023

License: CC BY-NC-SA 4.0

Abstract: Recent advances in tabular question answering (QA) with large language models are constrained in their coverage and only answer questions over a single table. However, real-world queries are complex in nature, often over multiple tables in a relational database or web page. Single table questions do not involve common table operations such as set operations, Cartesian products (joins), or nested queries. Furthermore, multi-table operations often result in a tabular output, which necessitates table generation capabilities of tabular QA models. To fill this gap, we propose a new task of answering questions over multiple tables. Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers. To enable effective training, we build a pre-training dataset comprising 132,645 SQL queries and tabular answers. Further, we evaluate the generated tables by introducing table-specific metrics of varying strictness assessing various levels of granularity of the table structure. MultiTabQA outperforms state-of-the-art single table QA models adapted to a multi-table QA setting by finetuning on three datasets: Spider, Atis and GeoQuery.

Submitted to arXiv on 22 May 2023


Results of the summarizing process for the arXiv paper: 2305.12820v2


Recent advances in tabular question answering (QA) have been limited to answering questions over a single table, which constrains their coverage and does not involve common table operations such as set operations, Cartesian products (joins), or nested queries. To address this gap, researchers have explored two major directions: semantic parsing-based techniques that transform natural language questions into logical forms used to query a relational database, and end-to-end neural models that combine question understanding with table reasoning. Our work focuses on the latter direction and proposes a new task of answering questions over multiple tables using our model called MultiTabQA. This model not only answers questions over multiple tables but also generates tabular answers. To enable effective training, we build a pre-training dataset comprising 132,645 SQL queries and tabular answers. Furthermore, we evaluate the generated tables by introducing table-specific metrics of varying strictness assessing various levels of granularity of the table structure. Compared to state-of-the-art single-table QA models adapted to multi-table QA settings by finetuning on three datasets (Spider, Atis, and GeoQuery), MultiTabQA outperforms them in terms of accuracy and generalization ability. Our approach is particularly useful for non-normalized tables from sources other than relational databases such as web tables or tables in text documents. Moreover, our work highlights the importance of generating accurate tabular outputs for multi-table QA tasks since they often result in a tabular output. We introduce novel evaluation metrics that assess the quality of generated tables at different levels of granularity based on their structural properties such as column names and data types associated with the generated values. 
In summary, our proposed MultiTabQA model addresses the limitations of existing tabular QA systems by enabling accurate answer generation for complex multi-table queries from diverse sources beyond relational databases while providing an effective training methodology and novel evaluation metrics for generated tables.


There is a new way to answer questions about tables that can use more than one table. This is called MultiTabQA. It works better than other ways that can only use one table. They made a big list of questions and answers to help teach the computer how to do this well. This is helpful for tables that are not organized in a normal way, like ones on websites or in documents. They also made new ways to check if the answers are good or not.

Definitions

- Tabular question answering (QA): A type of computer program that helps people find information in tables.
- Semantic parsing-based techniques: A way for computers to understand what people mean when they ask questions.
- End-to-end neural models: A type of computer program that uses artificial intelligence to learn how to do things by itself.
- Cartesian products (joins): A way of combining two or more tables into one big table.
- Nested queries: A type of question that asks about information within another question's answer.
- Pre-training dataset: A large set of questions and answers used to teach a computer program how to do something.
- SQL queries: A type of language used to talk with databases and get information from them.
- Accuracy: How correct something is.
- Generalization ability: How well something can work with different types of information or situations.
- Non-normalized tables: Tables that are not organized in a usual or expected way.
- Web tables: Tables found on websites.
- Evaluation metrics

Recent Advances in Tabular Question Answering (QA)

Recent advances in tabular question answering have been limited to models that answer questions over a single table. This limitation restricts the coverage of these systems and leaves out common table operations such as set operations, Cartesian products (joins), and nested queries. To address this gap, two major directions have been explored: semantic parsing-based techniques that transform natural language questions into logical forms used to query a relational database, and end-to-end neural models that combine question understanding with table reasoning.
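To make the three operation types concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `city` and `river` tables, their columns, and all values are toy data invented for illustration, not taken from the paper or its datasets. The point is only that each query needs more than one table and that each answer is itself a table.

```python
import sqlite3

def run(conn, sql):
    """Execute a query and return the answer as (column_names, rows)."""
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

# Toy in-memory database; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT, state TEXT);
    CREATE TABLE river (name TEXT, state TEXT, length_km INTEGER);
    INSERT INTO city VALUES ('Austin', 'Texas'), ('Dallas', 'Texas'), ('Boise', 'Idaho');
    INSERT INTO river VALUES ('Rio Grande', 'Texas', 3000),
                             ('Snake', 'Idaho', 1700),
                             ('Hudson', 'New York', 500);
""")

# Join: pair every city with the rivers of its state.
join_answer = run(conn, """
    SELECT city.name AS city, river.name AS river
    FROM city JOIN river ON city.state = river.state
    ORDER BY city.name
""")

# Set operation: states that have a river but no listed city.
set_answer = run(conn, "SELECT state FROM river EXCEPT SELECT state FROM city")

# Nested query: cities in states that have a river longer than 2000 km.
nested_answer = run(conn, """
    SELECT name FROM city
    WHERE state IN (SELECT state FROM river WHERE length_km > 2000)
    ORDER BY name
""")

for header, rows in (join_answer, set_answer, nested_answer):
    print(header, rows)
```

None of these answers can be produced from a single input table, which is the coverage limitation described above.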

MultiTabQA: A Model for Answering Questions Over Multiple Tables

Our work focuses on the latter direction and proposes a new task of answering questions over multiple tables using our model called MultiTabQA. This model not only answers questions over multiple tables but also generates tabular answers. To enable effective training, we build a pre-training dataset comprising 132,645 SQL queries and tabular answers. Furthermore, we evaluate the generated tables by introducing table-specific metrics of varying strictness assessing various levels of granularity of the table structure. Compared to state-of-the-art single-table QA models adapted to multi-table QA settings by finetuning on three datasets (Spider, Atis, and GeoQuery), MultiTabQA outperforms them in terms of accuracy and generalization ability.
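As a rough illustration of what one pre-training example of "SQL query and tabular answer" might look like, the sketch below executes a query over several tables and flattens both the input tables and the answer table into strings. The linearization format (`<table_name> : ... col : ... row i : ...`), the helper names `linearize` and `make_example`, and the toy `singer` and `concert` tables are all assumptions made for this example; the page does not specify the format the authors used.

```python
import sqlite3

def linearize(name, header, rows):
    """Flatten one table into a single string (hypothetical format)."""
    parts = [f"<table_name> : {name}", "col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)

def make_example(conn, sql, table_names):
    """Build one (source, target) pair: query plus input tables -> answer table."""
    inputs = []
    for t in table_names:
        # Table names are trusted here; this is a sketch, not production code.
        cur = conn.execute(f"SELECT * FROM {t}")
        inputs.append(linearize(t, [d[0] for d in cur.description], cur.fetchall()))
    cur = conn.execute(sql)
    target = linearize("answer", [d[0] for d in cur.description], cur.fetchall())
    source = sql + " " + " ".join(inputs)
    return source, target

# Toy database; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER, name TEXT);
    CREATE TABLE concert (singer_id INTEGER, venue TEXT);
    INSERT INTO singer VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO concert VALUES (1, 'Hall A'), (2, 'Hall B');
""")

sql = ("SELECT singer.name AS name, concert.venue AS venue "
       "FROM singer JOIN concert ON singer.id = concert.singer_id "
       "ORDER BY venue")
source, target = make_example(conn, sql, ["singer", "concert"])
print(source)
print(target)
```

A sequence-to-sequence model would then be trained to map `source` to `target`; for fine-tuning on natural language questions, the SQL string would be replaced by the question text.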

Benefits Beyond Relational Databases

Our approach is particularly useful for non-normalized tables from sources other than relational databases such as web tables or tables in text documents. Moreover, our work highlights the importance of generating accurate tabular outputs for multi-table QA tasks since they often result in a tabular output. We introduce novel evaluation metrics that assess the quality of generated tables at different levels of granularity based on their structural properties such as column names and data types associated with the generated values.
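One plausible reading of "metrics of varying strictness at different levels of granularity" is to compare a generated table with the reference table as a whole, then row by row, column by column, and cell by cell. The sketch below follows that reading with precision, recall, and F1 computed from multiset overlap. The function names and the exact matching rules are assumptions for illustration and may differ from the definitions in the paper.

```python
from collections import Counter

def prf(pred_items, gold_items):
    """Precision, recall, and F1 from the multiset overlap of hashable items."""
    pred, gold = Counter(pred_items), Counter(gold_items)
    overlap = sum((pred & gold).values())
    p = overlap / sum(pred.values()) if pred else 0.0
    r = overlap / sum(gold.values()) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def rows_of(table):
    header, rows = table
    return [tuple(r) for r in rows]

def columns_of(table):
    # A column matches only if its name and all of its values match.
    header, rows = table
    return [(name, tuple(r[i] for r in rows)) for i, name in enumerate(header)]

def cells_of(table):
    # A cell is identified by its column name and its value.
    header, rows = table
    return [(name, r[i]) for r in rows for i, name in enumerate(header)]

def evaluate(pred, gold):
    """Score a predicted table against a gold table, strictest metric first."""
    same_table = list(pred[0]) == list(gold[0]) and rows_of(pred) == rows_of(gold)
    return {
        "table_exact_match": 1.0 if same_table else 0.0,
        "row": prf(rows_of(pred), rows_of(gold)),
        "column": prf(columns_of(pred), columns_of(gold)),
        "cell": prf(cells_of(pred), cells_of(gold)),
    }

# Toy tables as (header, rows); the prediction gets one cell wrong.
gold = (["name", "venue"], [("Ana", "Hall A"), ("Bo", "Hall B")])
pred = (["name", "venue"], [("Ana", "Hall A"), ("Bo", "Hall C")])
scores = evaluate(pred, gold)
print(scores)
```

In this toy case a single wrong cell drives table exact match to 0, while the row and column scores drop to 0.5 and the cell score only to 0.75, which shows how the finer-grained metrics give partial credit.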

Conclusion

In summary, our proposed MultiTabQA model addresses the limitations of existing tabular QA systems by enabling accurate answer generation for complex multi-table queries from diverse sources beyond relational databases while providing an effective training methodology and novel evaluation metrics for generated tables.

Created on 25 Jun. 2023

