Many-Shot In-Context Learning

AI-generated keywords: Large Language Models (LLMs), few-shot learning, many-shot learning, in-context learning (ICL), natural language understanding

AI-generated Key Points

  • Capabilities of Large Language Models (LLMs) in few-shot and many-shot in-context learning (ICL)
  • Filling newly expanded context windows with hundreds or thousands of examples leads to significant performance gains
  • Availability of human-generated examples can be a limiting factor for many-shot ICL
  • Introduction of Reinforced and Unsupervised ICL settings as alternatives to human examples
  • Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples, while Unsupervised ICL removes rationales and prompts the model only with domain-specific questions
  • Both Reinforced and Unsupervised ICL are effective for complex reasoning tasks in the many-shot regime
  • Unlike few-shot learning, many-shot learning can override pretraining biases and learn high-dimensional functions with numerical inputs
  • Next-token prediction loss is not always a reliable indicator of downstream ICL performance
  • Impact of scaling examples on abstractive summarization tasks using XSum dataset, showing improved performance up to 50 shots before deterioration
  • Comparison with models fine-tuned for summarization like PEGASUS and mT5, which show continuous improvement with more shots from XSum
  • Evaluation of commonsense planning abilities of LLMs in the Logistics domain, where many-shot ICL shows promise for generating simple plans that use trucks within cities and airplanes between cities
  • Learning code verifiers in-context, a form of reward modeling, is explored as a way to enhance reasoning through test-time verification
  • Many-shot learning enhances LLM performance across various tasks and domains, advancing natural language understanding and reasoning capabilities

Authors: Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Stephanie Chan, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

License: CC BY 4.0

Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.

Submitted to arXiv on 17 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.11018v1

In this study, we explore the capabilities of Large Language Models (LLMs) in few-shot and many-shot in-context learning (ICL). We find that filling newly expanded context windows with hundreds or thousands of examples, the many-shot regime, leads to significant performance gains across a variety of generative and discriminative tasks.

However, the availability of human-generated examples can be a limiting factor for many-shot ICL. To address this limitation, we introduce two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales instead of human examples, while Unsupervised ICL removes rationales altogether and prompts the model only with domain-specific questions. Our experiments show that both settings are effective in the many-shot regime, particularly for complex reasoning tasks.

Additionally, we demonstrate that many-shot learning, unlike few-shot learning, can override pretraining biases and learn high-dimensional functions with numerical inputs. Our analysis also reveals that next-token prediction loss is not always a reliable indicator of downstream ICL performance.

We also investigate how scaling the number of in-context examples affects abstractive summarization on the XSum dataset. Performance improves as examples are added up to 50 shots and deteriorates beyond that point. In contrast, models fine-tuned for summarization such as PEGASUS and mT5 typically show continuous improvement with more shots from XSum.

We further examine the commonsense planning abilities of LLMs on planning problems in the Logistics domain. Our results indicate that many-shot ICL can improve their ability to generate simple plans that use trucks within cities and airplanes between cities.

Lastly, we explore reward modeling by having LLMs learn code verifiers in-context, with the aim of enhancing reasoning through test-time verification. Overall, our study highlights the effectiveness of many-shot learning in enhancing LLM performance across various tasks and domains, showcasing its potential for advancing natural language understanding and reasoning capabilities.
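
The three prompting settings described in this summary (standard many-shot ICL, Reinforced ICL, and Unsupervised ICL) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration: the `Example` type, the `Problem:` / `Solution:` / `Final answer:` prompt layout, and the helper functions are not taken from the paper. In particular, the summary does not say how model-generated rationales are selected, so keeping only rationales whose final answer matches a reference is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    rationale: str  # worked solution, human-written or model-generated
    answer: str     # final answer


def build_many_shot_prompt(shots: List[Example], query: str) -> str:
    """Standard ICL: each shot carries a problem, a rationale, and an answer."""
    blocks = [
        f"Problem: {ex.question}\nSolution: {ex.rationale}\nFinal answer: {ex.answer}"
        for ex in shots
    ]
    blocks.append(f"Problem: {query}\nSolution:")
    return "\n\n".join(blocks)


def build_unsupervised_prompt(questions: List[str], query: str) -> str:
    """Unsupervised ICL: the prompt holds only questions, no rationales or answers."""
    blocks = [f"Problem: {q}" for q in questions]
    blocks.append(f"Problem: {query}\nSolution:")
    return "\n\n".join(blocks)


def collect_reinforced_shots(
    problems: Iterable[Tuple[str, str]],
    sample_rationales: Callable[[str], Iterable[Tuple[str, str]]],
    is_correct: Callable[[str, str], bool],
) -> List[Example]:
    """Reinforced ICL: build shots from model-generated rationales.

    `problems` yields (question, reference_answer) pairs.
    `sample_rationales` is a hypothetical stand-in for a model call that
    returns (rationale, final_answer) candidates for a question.
    ASSUMPTION: a candidate is kept only if `is_correct` accepts its answer.
    """
    kept: List[Example] = []
    for question, reference in problems:
        for rationale, answer in sample_rationales(question):
            if is_correct(answer, reference):
                kept.append(Example(question, rationale, answer))
    return kept
```

Shots returned by `collect_reinforced_shots` can be passed straight to `build_many_shot_prompt`, so the Reinforced setting differs from the standard one only in where the rationales come from. The many-shot regime then corresponds to passing hundreds or thousands of shots rather than a handful, subject to the model's context window.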
Created on 08 Jul. 2024
