Textbooks Are All You Need II: phi-1.5 technical report

AI-generated keywords: Small Language Model (LLM), GPT-4, Synthetic Data, Common Sense Reasoning, Toxic Content Generation

AI-generated Key Points

- Investigation into the power of smaller Transformer-based language models
- Building upon the success of previous models such as TinyStories and phi-1
- Exploring how small a Large Language Model (LLM) can be while still achieving high levels of capability
- Questions about the necessity of large-scale models like GPT-4
- Cost, energy consumption, and controllability concerns with large models
- Tackling the challenging task of common sense reasoning in natural language
- Introduction of phi-1.5, a 1.3 billion parameter model trained on a dataset of 30 billion tokens
- Comparable benchmark results in common sense reasoning to larger models trained on larger datasets
- Dataset consists almost exclusively of synthetically generated data
- Synthetic data generation implications for controlling toxic and biased content generation with LLMs

Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

arXiv: 2309.05463v1 (cs.CL)

License: CC BY 4.0

Abstract: We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.

Submitted to arXiv on 11 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.05463v1

Comprehensive Summary

In this work, we continue the investigation into the power of smaller Transformer-based language models. We build on TinyStories and phi-1, which demonstrated, respectively, that a 10 million parameter model can generate coherent English and that a 1.3 billion parameter model can reach Python coding performance close to the state of the art. Our goal is to explore how small a Large Language Model (LLM) can be while still achieving high levels of capability.

The latest generation of LLMs, exemplified by GPT-4, has shown remarkable improvements over its predecessors. However, these models are extremely large, approaching trillions of parameters and trillions of training tokens. This raises important questions about whether such scale is necessary for advanced capabilities. The cost of training and maintaining large models is substantial, and concerns about energy consumption and controllability also come into play. By investigating whether similar capabilities can be achieved at a smaller scale, we aim to provide insights into intelligent system architectures and development.

Previous research on small models has focused on tasks such as producing fluent English and coding simple functions in Python. In this work, we tackle common sense reasoning in natural language, which has long been a difficult task for AI systems. We introduce phi-1.5, a new 1.3 billion parameter model trained on a dataset of 30 billion tokens. Remarkably, phi-1.5 achieves benchmark results in common sense reasoning that are comparable to those of models ten times its size trained on datasets more than ten times larger.

One key aspect of our approach is that our dataset consists almost exclusively of synthetically generated data. This follows the methodology proposed in the previous work on phi-1, which used LLMs to generate training data for coding tasks. Synthetic data generation has important implications for controlling toxic and biased content generation with LLMs, a notorious challenge in this field.

Layman's Summary

Researchers are studying smaller language models to see how powerful they can be. They build on earlier successful models such as TinyStories and phi-1. They want to find out whether very large models like GPT-4 are really necessary, given their cost, energy use, and the difficulty of controlling them. They also focus on getting small models to handle common sense reasoning in natural language. They introduce a model called phi-1.5, which has 1.3 billion parameters and was trained on a dataset of 30 billion tokens. On common sense benchmarks, this model performs about as well as larger models trained on bigger datasets. The training dataset is made up almost entirely of artificially created data, which has implications for controlling harmful or biased content generated by these models.

Definitions

- Transformer-based language model: A type of computer program that helps computers understand and generate human-like text.
- Capability: How well something can do a task or perform a function.
- Large Language Model (LLM): A very big language model with many parameters that helps computers understand and generate text.
- Benchmark results: Measuring how well something performs compared to other things in a standard test or competition.
- Dataset: A collection of information used for training or testing a computer program.
- Tokens: Small units of text, like words or characters.
- Synthetic data generation: Creating artificial data instead of using real-world examples.
- Toxic content: Harmful or dangerous information that can cause harm to people who read it.
- Biased
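To illustrate the "Tokens" definition above, here is a minimal sketch of two naive ways to split text into units: by words and by characters. This is an illustration only. The function names are hypothetical, and this is not the tokenizer used by phi-1.5, which this summary does not describe.

```python
from typing import List


def word_tokens(text: str) -> List[str]:
    """Split text on whitespace, so each word becomes one token."""
    return text.split()


def char_tokens(text: str) -> List[str]:
    """Treat every character (including spaces) as one token."""
    return list(text)


sentence = "Small models can reason"
print(word_tokens(sentence))       # word-level units
print(len(char_tokens(sentence)))  # number of character-level units
```

The same sentence yields very different token counts depending on the unit chosen, which is why a dataset size quoted in tokens is not the same as a size quoted in words.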

Exploring the Power of Smaller Transformer-Based Language Models

The development of Large Language Models (LLMs) has been a major breakthrough in AI research. LLMs, exemplified by GPT-4, have demonstrated remarkable improvements over their predecessors and can perform complex tasks such as natural language understanding and common sense reasoning. However, these models are extremely large, approaching trillions of parameters and trillions of training tokens, which raises important questions about whether such scale is necessary for advanced capabilities. In this work, we explore how small an LLM can be while still achieving high levels of capability.

Previous Work on TinyStories and phi-1

Our investigation builds upon the success of previous models such as TinyStories and phi-1. TinyStories, a 10 million parameter model, showed that coherent English can be generated at a far smaller scale than traditional language models. phi-1 achieved Python coding performance close to the state of the art with only 1.3 billion parameters, a small fraction of the 175 billion parameters of GPT-3.

Introducing phi-1.5: Achieving Benchmark Results in Common Sense Reasoning

We introduce phi-1.5, a new 1.3 billion parameter model trained on a dataset of 30 billion tokens that consists almost exclusively of synthetically generated data. Remarkably, phi-1.5 achieves benchmark results in common sense reasoning that are comparable to those of models ten times its size trained on datasets more than ten times larger.
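To make the scale comparison concrete, the following sketch works through the arithmetic using only the figures quoted in this summary (1.3 billion parameters, 30 billion tokens, and the "ten times" comparison). The function and variable names are illustrative assumptions, and the "reference" model is a hypothetical point obtained by multiplication, not a specific published model.

```python
from typing import Tuple

# Figures quoted in the summary.
PHI_1_5_PARAMS = 1_300_000_000   # 1.3 billion parameters
PHI_1_5_TOKENS = 30_000_000_000  # 30 billion tokens in the training dataset


def tokens_per_parameter(tokens: int, params: int) -> float:
    """Dataset tokens available per model parameter."""
    return tokens / params


def scaled_reference(params: int, tokens: int, factor: int) -> Tuple[int, int]:
    """Hypothetical comparison point: a model and dataset `factor` times larger."""
    return params * factor, tokens * factor


ratio = tokens_per_parameter(PHI_1_5_TOKENS, PHI_1_5_PARAMS)
ref_params, ref_tokens = scaled_reference(PHI_1_5_PARAMS, PHI_1_5_TOKENS, 10)

print(f"tokens per parameter: {ratio:.1f}")
print(f"10x reference: {ref_params / 1e9:.0f}B parameters, {ref_tokens / 1e9:.0f}B tokens")
```

Under these assumptions, phi-1.5 has roughly 23 dataset tokens per parameter, and a model "ten times its size" trained on a dataset "ten times larger" corresponds to about 13 billion parameters and 300 billion tokens. Since the summary says "more than ten times larger" for the datasets, 300 billion tokens is a lower bound for that side of the comparison.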

Implications for Controlling Toxic Content Generation

Synthetic data generation has important implications for controlling toxic and biased content generation with LLMs, a notorious challenge in this field. Large models such as the GPT and BERT families are trained on vast amounts of data, which can introduce unintended bias if it is not carefully monitored during training. By exploring whether similar capabilities can be achieved at a smaller scale with synthetic datasets, we aim to provide insights into intelligent system architectures that may reduce the cost, energy consumption, and controllability issues associated with large language models while maintaining high levels of capability.

Created on 18 Sep. 2023
