Sequence to Sequence Learning with Neural Networks
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
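- Deep Neural Networks (DNNs) work well when large labeled training sets are available, but they cannot be used directly to map sequences to sequences
- Proposed approach: a general end-to-end method built on multilayered Long Short-Term Memory (LSTM) networks that makes minimal assumptions about sequence structure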
- One LSTM network maps input sequence into a fixed-dimensional vector
- Another deep LSTM network decodes target sequence from the vector
- Evaluation on English to French translation task using WMT-14 dataset
- The LSTM's translations achieve a BLEU score of 34.7 on the entire test set, even though the score is penalized for out-of-vocabulary words
- A strong phrase-based Statistical Machine Translation (SMT) system achieves a BLEU score of 33.3 on the same dataset, 1.4 points below the LSTM
- Using the LSTM to rerank the 1000 hypotheses produced by the SMT system raises the BLEU score to 36.5, surpassing the previous state of the art
- The LSTM handles long sentences without difficulty
- It learns sensible phrase and sentence representations that are sensitive to word order and relatively invariant to active versus passive voice
- Reversing the word order of the source sentences (but not the target sentences) improves performance markedly, because it introduces many short-term dependencies between source and target that make the optimization problem easier
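The source-reversal trick in the last key point can be illustrated with a small sketch. It is not the authors' code. The helper names `reverse_source` and `alignment_lags` are hypothetical, and the lag calculation assumes a toy setting: source and target have equal length, word i of the source aligns with word i of the target, and the target is read immediately after the source.

```python
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def reverse_source(pairs: List[Pair]) -> List[Pair]:
    """Reverse the token order of each source sentence; targets are unchanged."""
    return [(src[::-1], tgt) for src, tgt in pairs]


def alignment_lags(src_len: int, reversed_source: bool) -> List[int]:
    """Distance in time steps between source word i and target word i.

    Assumption (illustrative only): one-to-one monotone alignment, with the
    target sequence fed right after the source sequence.
    """
    lags = []
    for i in range(src_len):
        src_pos = (src_len - 1 - i) if reversed_source else i
        tgt_pos = src_len + i
        lags.append(tgt_pos - src_pos)
    return lags


pairs = [(["a", "b", "c"], ["alpha", "beta", "gamma"])]
reversed_pairs = reverse_source(pairs)
# reversed_pairs == [(["c", "b", "a"], ["alpha", "beta", "gamma"])]

lags_original = alignment_lags(3, reversed_source=False)  # [3, 3, 3]
lags_reversed = alignment_lags(3, reversed_source=True)   # [1, 3, 5]
```

In this toy setting the average lag is the same in both cases, but after reversal the first source words sit right next to the first target words, which is one way to picture the "short-term dependencies" the key point mentions.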
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
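The reranking step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `rerank`, `toy_model_scores`, the example hypotheses, and all scores are made up, and the interpolation weight of 0.5 is an illustrative choice rather than a value given in the abstract. In a real system `model_log_prob` would be the log probability the trained LSTM assigns to a hypothesis given the source sentence.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (translation text, baseline system log-score)


def rerank(
    hypotheses: List[Hypothesis],
    model_log_prob: Callable[[str], float],
    weight: float = 0.5,
) -> List[str]:
    """Order an n-best list by a weighted mix of baseline and model scores.

    `weight` is the share given to the rescoring model (assumed, not sourced).
    """
    scored = [
        (weight * model_log_prob(text) + (1.0 - weight) * base_score, text)
        for text, base_score in hypotheses
    ]
    # Highest combined score first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]


# Hypothetical n-best list: the baseline prefers the second hypothesis.
n_best = [
    ("the cat sat", -1.0),
    ("cat the sat", -0.8),
    ("sat cat the", -2.5),
]

# Stand-in for a trained model: a lookup table of made-up log probabilities.
toy_model_scores = {
    "the cat sat": -0.5,
    "cat the sat": -3.0,
    "sat cat the": -4.0,
}

reranked = rerank(n_best, toy_model_scores.__getitem__)
best = reranked[0]  # "the cat sat"
```

Here the baseline alone would pick "cat the sat", while the combined score promotes "the cat sat", which is the kind of reordering a rescoring model is meant to provide.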