Textbooks Are All You Need II: phi-1.5 technical report

AI-generated keywords: Small Language Model (LLM), GPT-4, Synthetic Data, Common Sense Reasoning, Toxic Content Generation

AI-generated Key Points

  • Investigation into the power of smaller Transformer-based language models
  • Building upon the success of previous models such as TinyStories and phi-1
  • Exploring how small a Large Language Model (LLM) can be while still achieving high levels of capability
  • Questions about the necessity of large-scale models like GPT-4
  • Cost, energy consumption, and controllability concerns with large models
  • Tackling the challenging task of common sense reasoning in natural language
  • Introduction of phi-1.5, a 1.3 billion parameter model trained on a dataset of 30 billion tokens
  • Comparable benchmark results in common sense reasoning to larger models trained on larger datasets
  • Dataset consists almost exclusively of synthetically generated data
  • Synthetic data generation implications for controlling toxic and biased content generation with LLMs

Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

License: CC BY 4.0

Abstract: We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.

Submitted to arXiv on 11 Sep. 2023


Results of the summarizing process for the arXiv paper: 2309.05463v1

In this work, we continue the investigation into the power of smaller Transformer-based language models. We build upon previous models such as TinyStories and phi-1, which demonstrated, respectively, the ability to generate coherent English and Python coding performance close to the state of the art. Our goal is to explore how small a Large Language Model (LLM) can be while still achieving high levels of capability.

The latest generation of LLMs, exemplified by GPT-4, has shown remarkable improvements over its predecessors. However, these models are extremely large in scale, with trillions of parameters and tokens. This raises important questions about whether such scale is necessary for advanced capabilities. The cost of training and maintaining large models is substantial, and concerns about energy consumption and controllability also come into play. By investigating whether similar capabilities can be achieved at a smaller scale, we aim to provide insights into intelligent system architectures and development.

The previous work in this line focused on producing fluent English (TinyStories) and coding simple Python functions (phi-1). In this work, we tackle common sense reasoning in natural language, which has long been a difficult task for AI systems. We introduce phi-1.5, a new 1.3 billion parameter model trained on a dataset of 30 billion tokens. Remarkably, phi-1.5 achieves benchmark results in common sense reasoning that are comparable to models ten times its size trained on datasets more than ten times larger.

One key aspect of our approach is that our dataset consists almost exclusively of synthetically generated data. This follows the methodology of the earlier phi-1 work on coding, which used existing LLMs to generate "textbook quality" data. Synthetic data generation also has important implications for controlling toxic and biased content generation with LLMs, a notorious challenge in this field.
Created on 18 Sep. 2023

