LIMA: Less Is More for Alignment

AI-generated keywords: Language Model, Pretraining, Instruction Tuning, LIMA, Human Study

AI-generated Key Points

  • Large language models are trained in two stages: unsupervised pretraining, followed by large-scale instruction tuning and reinforcement learning.
  • Almost all knowledge in large language models is learned during pretraining, so only limited instruction tuning data is needed to produce high-quality output.
  • LIMA, a 65B parameter LLaMa language model, was trained with only 1,000 carefully curated prompts and responses without any reinforcement learning or human preference modeling.
  • In a controlled human study, LIMA's responses were equivalent or strictly preferred to those of GPT-4 in 43% of cases, Bard in 58%, and DaVinci003 in 65%.
  • Limitations include the mental effort required for constructing examples and the possibility of weak responses due to unlucky samples or adversarial prompts.
  • Scaling up input diversity and output quality has positive effects on alignment while scaling up quantity alone might not.
  • Fine-tuning a strong pretrained language model on a small set of carefully curated examples can produce remarkable results on a wide range of prompts.

Authors: Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

License: CC BY 4.0

Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
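
The "standard supervised loss" mentioned in the abstract is commonly implemented as token-level cross-entropy. The following is a minimal sketch of that idea, not the authors' training code. The function name `sft_loss`, the example probabilities, and the choice to score only response tokens (masking out the prompt) are assumptions made for illustration.

```python
import math

def sft_loss(token_probs, is_response):
    """Mean negative log-likelihood over response tokens.

    token_probs[i] is the probability the model assigned to the i-th
    target token; is_response[i] is True for tokens in the response.
    Masking out prompt tokens is an assumption of this sketch.
    """
    if len(token_probs) != len(is_response):
        raise ValueError("token_probs and is_response must have equal length")
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    if not losses:
        raise ValueError("no response tokens to score")
    return sum(losses) / len(losses)

# Hypothetical 5-token sequence: 2 prompt tokens, then 3 response tokens.
probs = [0.9, 0.8, 0.5, 0.25, 0.5]
mask = [False, False, True, True, True]
loss = sft_loss(probs, mask)
print(f"loss = {loss:.4f}")
```

In a real training loop the probabilities would come from the model's softmax output, and the loss would be averaged over a batch and minimized by gradient descent.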

Submitted to arXiv on 18 May 2023


Results of the summarizing process for the arXiv paper: 2305.11206v1

Large language models are typically trained in two stages: unsupervised pretraining from raw text to learn general-purpose representations, followed by large-scale instruction tuning and reinforcement learning to better align with end tasks and user preferences. This study argues that almost all knowledge in large language models is learned during pretraining, and that only limited instruction tuning data is needed to teach models to produce high-quality output. The authors trained LIMA, a 65B parameter LLaMa language model, with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA performed remarkably well in a controlled human study: its responses were either equivalent or strictly preferred to those of GPT-4 in 43% of cases, and this figure rose to 58% against Bard and 65% against DaVinci003, which was trained with human feedback. The approach has limitations. Constructing the examples takes significant mental effort, which is difficult to scale up, and an unlucky sample during decoding or an adversarial prompt can often lead to a weak response. Ablation experiments on training data diversity, quality, and quantity showed that scaling up input diversity and output quality had measurable positive effects on alignment, while scaling up quantity alone might not. Overall, these findings suggest that fine-tuning a strong pretrained language model on a small set of carefully curated examples can produce remarkable results on a wide range of prompts. Further research is needed to address the limitations noted above.
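
The headline figures count responses that are "either equivalent or strictly preferred", so ties are added to wins. A small sketch of that tally follows. The label names (`"win"`, `"tie"`, `"loss"`), the function `win_or_tie_rate`, and the ten example judgments are hypothetical and are not data from the paper.

```python
from collections import Counter

VALID_LABELS = {"win", "tie", "loss"}

def win_or_tie_rate(labels):
    """Fraction of comparisons where the model won or tied."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return (counts["win"] + counts["tie"]) / len(labels)

# Hypothetical judgments for ten prompts (illustrative only).
judgments = ["win", "tie", "loss", "loss", "win",
             "tie", "loss", "loss", "loss", "tie"]
rate = win_or_tie_rate(judgments)
print(f"equivalent or preferred: {rate:.0%}")
```

Reporting wins and ties together means a figure such as 43% does not by itself say how often the model was strictly preferred.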
Created on 07 Jun. 2023

