Sentence Simplification Using Paraphrase Corpus for Initialization

AI-generated keywords: Neural sentence simplification, Sequence-to-sequence framework, Paraphrase corpus, Initialization strategies, NLP tasks

AI-generated Key Points

  • Neural sentence simplification methods based on the sequence-to-sequence framework are predominant for sentence simplification (SS)
  • Current methods are limited by the scarcity of parallel SS corpora
  • Two proposed strategies use paraphrase corpus data to initialize neural SS methods and reduce reliance on parallel corpora
  • Paraphrase corpora contain a large proportion of sentence pairs that also qualify as SS pairs
  • Large-scale pseudo-parallel SS data can be constructed by keeping the paraphrase pairs with a higher complexity difference
  • Three neural SS methods were implemented: LSTM network with soft attention layer, Transformer model based on attention mechanisms, and BART model trained with denoising as pretraining objective
  • Experimental results show significant improvements in simplification performance using proposed initialization strategies for all three models
  • Enhanced performance seen across different conditions compared to models without initialization, as shown by evaluation metrics like SARI scores
  • Automatic sentence simplification is a complex NLP task aimed at making texts more accessible and enhancing various NLP tasks and systems
  • Various approaches including hand-crafted rules, supervised and unsupervised methods leveraging resources like English Wikipedia and Simple English Wikipedia are explored in related work
  • Neural machine translation techniques have potential in simplifying sentences through deletion and reordering operations

Authors: Kang Liu, Jipeng Qiang

arXiv admin note: substantial text overlap with arXiv:2109.00165
License: CC ZERO 1.0

Abstract: Neural sentence simplification methods based on the sequence-to-sequence framework have become the mainstream approach to the sentence simplification (SS) task. Unfortunately, these methods are currently limited by the scarcity of parallel SS corpora. In this paper, we focus on how to reduce the dependence on parallel corpora by carefully initializing neural SS methods from a paraphrase corpus. Our work is motivated by the following two findings: (1) A paraphrase corpus includes a large proportion of sentence pairs that belong to an SS corpus. (2) We can construct large-scale pseudo-parallel SS data by keeping those sentence pairs with a higher complexity difference. Therefore, we propose two strategies to initialize neural SS methods using a paraphrase corpus. We train three different neural SS methods with our initialization, and each obtains substantial improvements on the available WikiLarge data compared with the same method trained without initialization.

Submitted to arXiv on 31 May 2023

Results of the summarizing process for the arXiv paper: 2305.19754v1

Neural sentence simplification methods based on the sequence-to-sequence framework have emerged as the predominant approach to sentence simplification (SS). However, these methods are currently constrained by the scarcity of parallel SS corpora. In this study, we propose two strategies that initialize neural SS methods from paraphrase corpus data in order to reduce reliance on parallel corpora. The motivation stems from two key findings: first, paraphrase corpora contain a large proportion of sentence pairs that also qualify as SS pairs; second, large-scale pseudo-parallel SS data can be constructed by retaining the paraphrase pairs with a higher complexity difference.

We implement three neural SS methods using the fairseq toolkit: an LSTM network with a soft attention layer, a Transformer model based solely on attention mechanisms, and a BART model pretrained with a denoising objective. Our experimental results show substantial improvements in simplification performance for all three models when they are trained with the proposed initialization strategies. Evaluation metrics such as SARI show better performance across different conditions than the same models trained without initialization.

In related work, automatic sentence simplification is described as a complex NLP task that aims to make texts more accessible and to benefit other NLP tasks and systems. The approaches surveyed include hand-crafted rules as well as supervised and unsupervised methods that draw on resources such as English Wikipedia and Simple English Wikipedia. Neural machine translation techniques have also shown promise for simplifying sentences through deletion and reordering operations.

Overall, our study presents a novel approach to initializing neural SS methods from paraphrase corpora, which reduces dependence on parallel corpus data and improves performance on text simplification tasks.
Created on 15 Mar. 2024
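
The idea of building pseudo-parallel SS data by keeping paraphrase pairs with a higher complexity difference can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `complexity` function is a toy proxy (token count plus mean token length), and the names `build_pseudo_ss_pairs` and `min_gap` are hypothetical. The paper's own complexity measure and threshold may differ.

```python
import re
from typing import Iterable, List, Tuple


def complexity(sentence: str) -> float:
    """Toy complexity proxy (an assumption, not the paper's measure):
    number of word tokens plus their mean length in characters."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    if not tokens:
        return 0.0
    return len(tokens) + sum(len(t) for t in tokens) / len(tokens)


def build_pseudo_ss_pairs(
    pairs: Iterable[Tuple[str, str]],
    min_gap: float = 2.0,  # hypothetical threshold on the complexity gap
) -> List[Tuple[str, str]]:
    """Keep paraphrase pairs whose complexity gap is at least `min_gap`,
    oriented as (complex sentence, simple sentence)."""
    selected = []
    for a, b in pairs:
        ca, cb = complexity(a), complexity(b)
        if abs(ca - cb) < min_gap:
            continue  # too similar in complexity to serve as an SS pair
        selected.append((a, b) if ca > cb else (b, a))
    return selected


if __name__ == "__main__":
    paraphrases = [
        ("The group then tried to fix it.",
         "The committee subsequently endeavoured to ameliorate the situation."),
        ("He went home.", "He walked home."),
    ]
    for source, target in build_pseudo_ss_pairs(paraphrases):
        print(f"complex: {source}\nsimple:  {target}")
```

Pairs selected this way could serve as source/target training data for a first training stage, with the resulting model then trained further on a real parallel SS corpus such as WikiLarge. How the two proposed strategies actually use the data is described in the paper, not in this sketch.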
