An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

AI-generated keywords: Automatic Text Summarization, Long Documents, Neural Architectures, Benchmark Datasets, Evaluation Metrics

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Automatic text summarization systems are important for condensing lengthy documents like academic articles and business reports.
  • Recent advancements in the field include the integration of neural architectures.
  • Challenges exist in extending these systems to long documents, highlighting the need for further research.
  • The survey provides an overview of research on long document summarization, covering benchmark datasets, models, and evaluation metrics.
  • The authors conduct an empirical analysis to assess current progress and suggest potential directions for future exploration.
  • This survey is a valuable resource for researchers and practitioners interested in automatic text summarization for long documents.

Authors: Huan Yee Koh, Jiaxin Ju, Ming Liu, Shirui Pan

Accepted for publication by ACM Computing Surveys

Abstract: Long documents such as academic articles and business reports have been the standard format to detail out important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Submitted to arXiv on 03 Jul. 2022

Results of the summarizing process for the arXiv paper: 2207.00939v1

In their paper titled "An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics," authors Huan Yee Koh, Jiaxin Ju, Ming Liu, and Shirui Pan explore the significance of automatic text summarization systems for condensing lengthy documents such as academic articles and business reports. These systems play a crucial role in aiding readers' comprehension by extracting key information. The authors highlight recent advancements in this field, particularly with the integration of neural architectures. They also discuss challenges associated with extending these systems to long documents and emphasize the need for further research. This survey provides a comprehensive overview of research on long document summarization, focusing on benchmark datasets, summarization models, and evaluation metrics. The authors organize existing literature within this context and conduct an empirical analysis to assess current progress. Their findings suggest potential directions for future exploration to advance this technology. This survey serves as a valuable resource for researchers and practitioners interested in automatic text summarization for long documents.
Created on 09 Mar. 2024
