Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

AI-generated keywords: Large Language Models, Structured Data, Benchmark, Self-Augmentation, Tabular Tasks

AI-generated Key Points

  • Large language models (LLMs) are increasingly used as few-shot reasoners for Natural Language (NL)-related tasks
  • LLMs' ability to process structured data like tables is relatively unexplored
  • A benchmark has been introduced to evaluate LLMs' structural understanding capabilities through tasks such as cell lookup, row retrieval, and size detection
  • Performance of advanced LLMs such as GPT-3.5 and GPT-4 varies with input choices, including table input format, content order, role prompting, and partition marks
  • A proposed "self-augmentation" method, which uses the LLM's own internal knowledge for structural prompting, improves performance on tabular tasks (e.g., +2.31% on TabFact and +5.68% on ToTTo)
  • Different table storage formats (CSV, JSON, XML, markdown, HTML) impact LLM comprehension abilities
  • Accurate partitioning of data is important for downstream tasks involving tabular datasets paired with external knowledge sources
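The table storage formats mentioned above differ only in how the same cells are laid out as text. As a minimal sketch (the toy table and the function names are invented for illustration and are not taken from the paper's code), one table can be serialized into several of these formats with the Python standard library:

```python
import csv
import io
import json
from html import escape

# Toy table, invented for illustration (not from the paper's benchmark).
HEADER = ["name", "year", "score"]
ROWS = [["alpha", 2021, 0.81], ["beta", 2022, 0.77]]


def to_csv(header, rows):
    """Serialize the table as comma-separated values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(header, rows):
    """Serialize the table as a JSON list with one object per row."""
    return json.dumps([dict(zip(header, row)) for row in rows], indent=2)


def to_markdown(header, rows):
    """Serialize the table as a pipe-delimited markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def to_html(header, rows):
    """Serialize the table as a minimal HTML <table> element."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


if __name__ == "__main__":
    for name, fn in [("CSV", to_csv), ("JSON", to_json),
                     ("markdown", to_markdown), ("HTML", to_html)]:
        print(f"--- {name} ---")
        print(fn(HEADER, ROWS))
```

Each function returns a string that could be placed in a prompt; which format an LLM handles best is the empirical question the benchmark studies, and this sketch makes no claim about the outcome.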

Authors: Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

This paper has been accepted as a full paper at WSDM 2024. Explore the MS research blog of our work at https://www.microsoft.com/en-us/research/blog/improving-llm-understanding-of-structured-data-and-exploring-advanced-prompting-methods/
License: CC BY-NC-SA 4.0

Abstract: Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, their capability to process structured data like tables remains under-explored. While tables can be serialized as input for LLMs, there is a lack of comprehensive studies on whether LLMs genuinely comprehend this data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs through seven distinct tasks, e.g., cell lookup, row retrieval, and size detection. Specifically, we perform a series of evaluations on the most advanced recent LLMs, GPT-3.5 and GPT-4, and observe that performance varies with different input choices, including table input format, content order, role prompting, and partition marks. Drawing from the insights gained through the benchmark evaluations, we propose $\textit{self-augmentation}$ for effective structural prompting, such as critical value / range identification using internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact($\uparrow2.31\%$), HybridQA($\uparrow2.13\%$), SQA($\uparrow2.72\%$), Feverous($\uparrow0.84\%$), and ToTTo($\uparrow5.68\%$). We believe that our open source benchmark and proposed prompting methods can serve as a simple yet generic selection for future research. The code and data of this paper will be temporarily released at https://anonymous.4open.science/r/StructuredLLM-76F3/README.md and will be replaced with an official one at https://github.com/microsoft/TableProvider later.
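To make the benchmark tasks named in the abstract concrete, the following is a minimal sketch of how question and ground-truth pairs for size detection, cell lookup, and row retrieval could be generated from a table. The question wording, the function names, the toy table, and the exact-match scoring are all assumptions made for illustration; they are not the paper's actual prompts or evaluation code.

```python
import random

# Toy table, invented for illustration (not from the paper's benchmark).
HEADER = ["name", "year", "score"]
ROWS = [["alpha", 2021, 0.81], ["beta", 2022, 0.77]]


def size_detection(header, rows):
    """Ask for the table dimensions; the answer comes from the data itself."""
    question = "How many data rows and columns does the table have?"
    return question, f"{len(rows)} rows, {len(header)} columns"


def cell_lookup(header, rows, rng):
    """Ask for the value of one randomly chosen cell."""
    r = rng.randrange(len(rows))
    c = rng.randrange(len(header))
    question = f"What is the value in data row {r + 1}, column '{header[c]}'?"
    return question, str(rows[r][c])


def row_retrieval(header, rows, rng):
    """Ask for all cell values of one randomly chosen row."""
    r = rng.randrange(len(rows))
    question = f"List the cell values of data row {r + 1}, separated by '|'."
    return question, "|".join(str(c) for c in rows[r])


def exact_match(prediction, answer):
    """Hypothetical scoring rule: case-insensitive exact string match."""
    return prediction.strip().lower() == answer.strip().lower()


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled questions are repeatable
    print(size_detection(HEADER, ROWS))
    print(cell_lookup(HEADER, ROWS, rng))
    print(row_retrieval(HEADER, ROWS, rng))
```

Because every answer is derived directly from the table, such probes can be scored automatically, which is what makes structural tasks of this kind convenient as a benchmark.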

Submitted to arXiv on 22 May. 2023

Results of the summarizing process for the arXiv paper: 2305.13062v5

Large language models (LLMs) are increasingly used as few-shot reasoners for Natural Language (NL)-related tasks, but their ability to process structured data such as tables remains relatively unexplored. Tables can be serialized as input for LLMs, yet there is a lack of comprehensive studies on whether LLMs truly comprehend this type of data. To address this gap, the paper introduces a benchmark that evaluates the structural understanding capabilities of LLMs through seven distinct tasks, such as cell lookup, row retrieval, and size detection.

The study evaluates GPT-3.5 and GPT-4 across different input choices, including table input format, content order, role prompting, and partition marks, and finds that performance varies with these choices. Drawing on these insights, the paper proposes "self-augmentation", a structural prompting method that uses the LLM's own internal knowledge. Combining self-augmentation with carefully selected input choices improves performance on several tabular tasks: TabFact (+2.31%), HybridQA (+2.13%), SQA (+2.72%), Feverous (+0.84%), and ToTTo (+5.68%).

The paper also examines how different table storage formats (CSV, JSON, XML, markdown, HTML) affect LLM comprehension and explores the importance of accurate data partitioning for downstream tasks that pair tabular datasets with external knowledge sources. Overall, the study sheds light on the potential of LLMs to understand structured data like tables and offers guidance for optimizing their performance on tabular information, with the open-source benchmark and prompting methods intended as a basis for future research.
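The self-augmentation idea described above can be read as a two-stage prompt: first ask the model to state what it knows about the table (for example, critical values and ranges), then feed that text back alongside the actual question. The sketch below illustrates that reading only. The prompt wording is invented, and `llm` is a hypothetical callable that takes a prompt string and returns a completion string; neither is the paper's actual prompt or any real library's API.

```python
from typing import Callable


def self_augmented_answer(table_text: str, question: str,
                          llm: Callable[[str], str]) -> str:
    """Two-stage prompting sketch; `llm` is a hypothetical text-in, text-out callable."""
    # Stage 1: ask the model to surface structural knowledge about the table.
    probe_prompt = (
        "Here is a table:\n\n"
        f"{table_text}\n\n"
        "Identify the critical values and value ranges in this table."
    )
    hints = llm(probe_prompt)

    # Stage 2: answer the real question with the model's own hints appended.
    final_prompt = (
        "Here is a table:\n\n"
        f"{table_text}\n\n"
        f"Additional notes about the table:\n{hints}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(final_prompt)


if __name__ == "__main__":
    # Stand-in model for demonstration: it only reports the prompt length.
    def dummy_llm(prompt: str) -> str:
        return f"(model output for a {len(prompt)}-character prompt)"

    table = "name,year,score\nalpha,2021,0.81\nbeta,2022,0.77\n"
    print(self_augmented_answer(table, "Which row has the highest score?", dummy_llm))
```

The design point is that no external knowledge source is involved: the second prompt is augmented only with text the same model produced in the first call.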
Created on 30 Sep. 2024
