Long-range Language Modeling with Self-retrieval
AI-generated Key Points
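- Retrieval-augmented language models (LMs) have received much attention recently, but the retriever is typically added to an already-pretrained LM rather than trained jointly with it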
- Retrieval-Pretrained Transformer (RPT) is proposed to jointly train a retrieval-augmented LM from scratch for long-range language modeling tasks
- RPT computes query representations for a recently generated text chunk in a long document and uses them to retrieve earlier chunks, located potentially tens of thousands of tokens before
- Information from retrieved chunks is fused into the LM representations to predict the next target chunk
- The retriever component is trained with a semantic objective that aims to retrieve chunks that increase the probability of the next chunk according to a reference LM
- RPT is evaluated on four long-range language modeling tasks spanning books, code, and mathematical writing; documents are generally long in all four datasets
- The RPT model has 12 layers, hidden dimension d=1024, and eight attention heads with head dimension 128; chunked cross-attention (CCA) is applied every two layers, and two neighbors are used unless mentioned otherwise
- RPT improves retrieval quality and, in turn, perplexity on long-range language modeling tasks compared to strong baselines
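The retrieval step described in the key points can be illustrated with a minimal sketch. It is not the RPT implementation: the bag-of-words `embed` function stands in for the learned query and chunk representations, and the names `split_into_chunks`, `retrieve_earlier`, and the chunk length of 4 are assumptions chosen for illustration. The only details taken from the summary are that retrieval looks at earlier chunks of the same document and that two neighbors are used by default.

```python
import math
from collections import Counter

def split_into_chunks(tokens, chunk_len):
    """Split a token list into consecutive fixed-length chunks."""
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]

def embed(chunk):
    """Toy stand-in for a learned representation: bag-of-words counts."""
    return Counter(chunk)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_earlier(chunks, query_idx, k=2):
    """Return indices of the k chunks before query_idx most similar to it."""
    query = embed(chunks[query_idx])
    scored = [(cosine(query, embed(chunks[j])), j) for j in range(query_idx)]
    # Highest similarity first; ties are broken by the earlier chunk index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:k]]

# Hypothetical toy document: four chunks of four tokens each.
doc = ("alice met bob today the weather was cold "
       "stocks fell very sharply bob greeted alice warmly").split()
chunks = split_into_chunks(doc, 4)
neighbors = retrieve_earlier(chunks, query_idx=3, k=2)
print(neighbors)
```

Restricting candidates to `range(query_idx)` is what makes this self-retrieval causal: a chunk can only draw on text that precedes it in the same document.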
Authors: Ohad Rubin, Jonathan Berant
Abstract: Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling long texts. Given a recently generated text chunk in a long document, the LM computes query representations, which are then used to retrieve earlier chunks in the document, located potentially tens of thousands of tokens before. Information from retrieved chunks is fused into the LM representations to predict the next target chunk. We train the retriever component with a semantic objective, where the goal is to retrieve chunks that increase the probability of the next chunk, according to a reference LM. We evaluate RPT on four long-range language modeling tasks, spanning books, code, and mathematical writing, and demonstrate that RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
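The training signal for the retriever, as the abstract describes it, favors earlier chunks that raise the probability of the next chunk under a reference LM. The sketch below shows one way such a score could be computed. It is a toy under stated assumptions: `toy_reference_logprob` is a smoothed unigram model standing in for a real reference LM, and `score_candidates`, the example sentences, and the choice to measure gain relative to the query alone are all hypothetical, not details from the paper.

```python
import math
from collections import Counter

def toy_reference_logprob(target, context, vocab_size, alpha=1.0):
    """Toy stand-in for a reference LM: add-alpha smoothed unigram model
    estimated from the context tokens. Returns log p(target | context)."""
    counts = Counter(context)
    total = len(context) + alpha * vocab_size
    return sum(math.log((counts[t] + alpha) / total) for t in target)

def score_candidates(candidates, query, target, vocab_size):
    """Gain in target log-probability from prepending each candidate chunk
    to the query chunk, relative to conditioning on the query alone."""
    base = toy_reference_logprob(target, query, vocab_size)
    return [
        toy_reference_logprob(target, cand + query, vocab_size) - base
        for cand in candidates
    ]

# Hypothetical example: the most recent chunk, the next chunk to predict,
# and two earlier chunks that the retriever could return.
query = "bob greeted alice warmly".split()
target = "alice met bob again".split()
candidates = [
    "alice met bob today".split(),
    "stocks fell very sharply".split(),
]
vocab = {tok for seq in candidates + [query, target] for tok in seq}

gains = score_candidates(candidates, query, target, len(vocab))
best = max(range(len(candidates)), key=lambda i: gains[i])
print(best, gains)
```

In this toy setup the first candidate gets a positive gain because it shares tokens with the target, while the unrelated candidate gets a negative one. Scores of this kind could then serve as supervision for ranking earlier chunks.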