M+: Extending MemoryLLM with Scalable Long-Term Memory
AI-generated Key Points
- Introduction of M+, a memory-augmented model for enhancing long-term information retention in large language models (LLMs)
- Three-stage training process: Continual Training of MemoryLLM, Long-Context Modeling with Long Documents, and Training with Long-Term Memory
- Utilization of backbone model Llama-3.1-8B with memory tokens in each layer
- Training on short documents from the fineweb-edu dataset followed by longer documents to improve long-context modeling abilities
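- Integration of a long-term memory mechanism with a co-trained retriever that dynamically retrieves relevant information during text generation
- Experimental results showing M+ outperforming MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead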
- Evaluation on various benchmarks for long-context understanding and knowledge retention tasks, demonstrating superior performance
- Future work aimed at reducing CPU-GPU communication overhead for more efficient generation with M+
- Impact on education, research, and industry as well as concerns about AI safety, reliability, fairness, bias propagation, and ethical considerations
Authors: Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, Rogerio Feris, Zexue He
Abstract: Equipping large language models (LLMs) with latent-space memory has attracted increasing attention because it can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, dynamically retrieving relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead.
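The idea of pairing a bounded memory pool with a retrievable long-term store can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMemory`, its method names, the oldest-first eviction rule, and the dot-product scoring are all assumptions made for illustration. In M+ the retriever is co-trained with the model and the memory lives in per-layer hidden states, neither of which is modeled here.

```python
# Toy sketch (hypothetical names): a fixed-size short-term pool whose evicted
# vectors move to a long-term store that is searched at generation time.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class ToyMemory:
    def __init__(self, short_capacity):
        self.short_capacity = short_capacity
        self.short_term = []  # bounded pool, stands in for the latent memory pool
        self.long_term = []   # unbounded store for evicted vectors

    def update(self, new_vectors):
        """Add new vectors; move the oldest overflow into long-term memory.

        Oldest-first eviction is a simplification chosen for determinism.
        """
        self.short_term.extend(new_vectors)
        overflow = len(self.short_term) - self.short_capacity
        if overflow > 0:
            self.long_term.extend(self.short_term[:overflow])
            self.short_term = self.short_term[overflow:]

    def retrieve(self, query, k):
        """Return the k long-term vectors with the highest dot-product score."""
        ranked = sorted(self.long_term, key=lambda v: dot(query, v), reverse=True)
        return ranked[:k]


mem = ToyMemory(short_capacity=2)
mem.update([[1.0, 0.0], [0.0, 1.0]])
mem.update([[0.5, 0.5]])   # evicts [1.0, 0.0] into long-term memory
mem.update([[0.9, 0.1]])   # evicts [0.0, 1.0] into long-term memory
top = mem.retrieve([1.0, 0.0], k=1)
```

The short-term pool stays at a fixed size, so the working footprint does not grow with the amount of text seen, while older information stays reachable through retrieval. That is the general trade-off the abstract describes when it reports longer retention with similar GPU memory overhead.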