Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
AI-generated Key Points
- Lay summarisation is important for making scientific texts more understandable for non-experts.
- Automatic approaches are needed to broaden access to scientific literature and facilitate interdisciplinary knowledge sharing.
- Current corpora for lay summarisation are limited, hindering effective data-driven approaches.
- Two new datasets have been introduced: PLOS (large-scale) and eLife (medium-scale), containing biomedical journal articles paired with expert-written lay summaries.
- The researchers characterise the lay summaries in detail and benchmark the datasets using mainstream summarisation approaches.
- A manual evaluation with domain experts demonstrates the utility of the datasets and highlights the key challenges of lay summarisation.
- The code and datasets are available through a GitHub repository.
- The paper gives background on PLOS, an open-access publisher hosting influential peer-reviewed journals across various scientific fields, and on eLife, an open-access journal focusing on the biomedical and life sciences.
- Analyses comparing technical abstracts with lay summaries from both datasets were conducted to quantify differences in readability, rhetorical structure, vocabulary sharing, and abstractiveness.
- By providing these datasets, the study supports further research on data-driven lay summarisation.
Authors: Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton
Abstract: Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling a greater degree of both interdisciplinary knowledge sharing and public understanding when it comes to research findings. However, current corpora for this task are limited in their size and scope, hindering the development of broadly applicable data-driven approaches. Aiming to rectify these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating their utility and casting light on the key challenges of this task.
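The abstract refers to differing levels of abstractiveness between the two datasets. As a rough illustration of what such a measurement can look like, the sketch below computes the fraction of summary n-grams that do not appear in the source article, a common proxy for abstractiveness. This is an assumption for illustration only: the page does not state which metrics the authors used, and the function names (`tokenize`, `ngrams`, `novel_ngram_fraction`) and the toy strings are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_fraction(article, summary, n=1):
    """Fraction of distinct summary n-grams that never occur in the article.

    Higher values suggest a more abstractive summary. This is an illustrative
    proxy, not necessarily the measure used in the paper.
    """
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = ngrams(tokenize(article), n)
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)


# Toy strings invented for illustration (not taken from PLOS or eLife).
article = "The protein binds to the receptor and triggers apoptosis."
summary = "The protein binds to the receptor and kills the cell."

unigram_novelty = novel_ngram_fraction(article, summary, n=1)
bigram_novelty = novel_ngram_fraction(article, summary, n=2)
print(f"novel unigrams: {unigram_novelty:.2f}, novel bigrams: {bigram_novelty:.2f}")
```

Averaging such a score over all article-summary pairs in each dataset would give one way to compare how heavily the lay summaries rephrase their source articles.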