A Survey of Large Language Models

AI-generated keywords: Artificial Intelligence, Language Modeling, Pre-trained Models, Large Language Models, Natural Language Processing

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Language modeling in artificial intelligence is a key research area for enhancing machines' understanding and generation of human language.
  • Pre-trained language models (PLMs) developed by training Transformer models on extensive data sets have shown remarkable proficiency in natural language processing tasks.
  • Increasing model size significantly boosts performance levels, leading to the development of large language models (LLMs) with unique capabilities beyond conventional limits.
  • The launch of ChatGPT has marked a significant milestone in the field, attracting widespread interest from society and revolutionizing AI algorithm development and utilization.
  • This comprehensive survey focuses on pre-training techniques, adaptation tuning strategies, utilization methods, and capacity evaluation mechanisms for large language models (LLMs).
  • The exploration and advancement of LLMs have garnered substantial attention from academic and industrial circles, shaping future research directions.

Authors: Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen

ongoing work; 58 pages

Abstract: Language is essentially a complex, intricate system of human expressions governed by grammatical rules. Developing capable AI algorithms for comprehending and grasping a language poses a significant challenge. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they have gone on to study the scaling effect by increasing the model size even further. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size. Recently, research on LLMs has been greatly advanced by both academia and industry, and one remarkable advance is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, and it would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. We also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

Submitted to arXiv on 31 Mar. 2023

Results of the summarizing process for the arXiv paper: 2303.18223v10

This paper's license doesn't allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In artificial intelligence, language modeling has emerged as a pivotal area of research, aiming to enhance machines' ability to comprehend and generate human language. Over the past two decades, the field has shifted from statistical language models to neural language models. One recent breakthrough is the development of pre-trained language models (PLMs), which are built by pre-training Transformer models on large-scale corpora. These PLMs have shown strong proficiency in a variety of natural language processing (NLP) tasks. Researchers have observed that scaling up model size can improve performance, which led to further investigation of the effect of increasing the parameter count. Notably, once these enlarged language models exceed a certain parameter scale, they not only perform significantly better but also show special abilities that are absent in smaller-scale models. To distinguish models by parameter scale, the research community refers to PLMs of significant size as large language models (LLMs). The advancement of LLMs has drawn substantial attention from both academia and industry. In particular, the launch of ChatGPT marked a significant milestone, attracting widespread interest from society at large. The continuing evolution of LLMs is expected to change how AI algorithms are developed and used. This survey reviews recent advances in LLMs by introducing their background, key findings, and mainstream techniques. It focuses on four major aspects: pre-training, adaptation tuning, utilization, and capacity evaluation. It also summarizes the available resources for developing LLMs and discusses remaining issues for future research.
Authored by a team comprising Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, and others, this survey covers recent developments in LLMs and underscores their potential impact on the AI community.
Created on 15 Jul. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.