Large Language Models: A Survey

AI-generated keywords: Large Language Models, GPT, LLaMA, PaLM, NLMs

AI-generated Key Points

  • Thorough review of Large Language Models (LLMs) and their advancements
  • Current state of LLMs and focus on three prominent families: GPT, LLaMA, and PaLM
  • Overview of early pre-trained neural language models (NLMs), including those based on RNNs (LSTM and GRU) and on the Transformer architecture
  • Use of self-attention mechanisms in Transformers for efficient pre-training of large language models
  • Techniques used to enhance LLMs for real-world applications
  • Survey of popular datasets prepared for LLM training, fine-tuning, and evaluation
  • Review of widely used evaluation metrics for LLMs
  • Comparison of performance among several popular models on representative benchmarks
  • Discussion of open challenges and future research directions in the field of LLMs

Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

arXiv admin note: substantial text overlap with arXiv:2401.14423
License: CC BY 4.0

Abstract: Since the release of ChatGPT in November 2022, Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.
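The abstract attributes LLM capabilities to scaling laws. For orientation only, one widely cited form is the parametric loss fit of Hoffmann et al. \cite{hoffmann2022training}. It models the final pre-training loss as a function of the parameter count N and the number of training tokens D. Here E is the irreducible loss and A, B, α, β are constants fitted empirically. The survey itself may present these laws in a different form.

```latex
\hat{L}(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under this form, increasing either N or D lowers the predicted loss with diminishing returns, which is why model size and data are scaled together.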

Submitted to arXiv on 09 Feb. 2024


AI-generated summary of arXiv paper 2402.06196v1

In this paper, the authors provide a thorough review of Large Language Models (LLMs) and their advancements. They begin by discussing the current state of LLMs and focus on three prominent families: GPT, LLaMA, and PaLM. The authors also give an overview of early pre-trained neural language models that laid the groundwork for LLMs. These include NLMs based on recurrent neural networks (RNNs), such as LSTM and GRU, as well as the revolutionary Transformer architecture. The paper then delves into how LLMs are built, highlighting the use of self-attention mechanisms in Transformers to efficiently pre-train large language models on massive amounts of data. The authors also discuss techniques used to enhance LLMs for real-world applications. Next, they survey popular datasets prepared for LLM training, fine-tuning, and evaluation. The authors review widely used evaluation metrics for LLMs and compare the performance of several popular models on representative benchmarks. Finally, they conclude by discussing open challenges and future research directions in the field of LLMs.
Created on 13 Feb. 2024
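The summary highlights self-attention as the mechanism that lets Transformers pre-train efficiently on large corpora. The following is a minimal illustration, not code from the survey. It is a pure-Python sketch of single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, as defined in the original Transformer paper. The helper names (`matmul`, `transpose`, `softmax`, `self_attention`) and the toy weight matrices are hypothetical. Real models use learned projections, multiple heads, causal masking, and optimized tensor libraries.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    # Project every token embedding into queries, keys and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Each token scores every token (n x n), scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Row-wise softmax turns scores into attention weights summing to 1.
    weights = [softmax(row) for row in scaled]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v)

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projection weights (hypothetical)
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d token embeddings
    for row in self_attention(tokens, identity, identity, identity):
        print([round(val, 3) for val in row])
```

Because every token attends to every other token through a few matrix products, the whole sequence is processed in parallel. This contrasts with the step-by-step recurrence of the RNN-based models (LSTM, GRU) mentioned above.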

