Small Language Models: Survey, Measurements, and Insights

AI-generated keywords: Small Language Models, Decoder-only Transformer Architecture, Pre-training Process, Performance Levels, Machine Intelligence

AI-generated Key Points

  • Surge in development of small language models (SLMs)
  • SLMs based on decoder-only transformer architecture like GPT-2
  • Meticulously selected 59 SLMs with open weights and parameter sizes ranging from 100M to 5B
  • Focus on base knowledge acquired during pre-training process
  • Variability in performance levels across tasks due to differences in hyperparameters and training datasets
  • Subjectivity of "small" definition, with a limit of 5B parameters set for SLMs in the study
  • Benchmarking inference latency and memory footprints to provide insights for advancing research
  • Identification of key innovations and potential research topics for future developments in machine intelligence accessibility, affordability, and efficiency

Authors: Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu

License: CC BY 4.0

Abstract: Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the capabilities of LLMs in the pursuit of artificial general intelligence, SLM research aims to make machine intelligence more accessible, affordable, and efficient for everyday tasks. Focusing on transformer-based, decoder-only language models with 100M-5B parameters, we survey 59 state-of-the-art open-source SLMs, analyzing their technical innovations across three axes: architectures, training datasets, and training algorithms. In addition, we evaluate their capabilities in various domains, including commonsense reasoning, in-context learning, mathematics, and coding. To gain further insight into their on-device runtime costs, we benchmark their inference latency and memory footprints. Through in-depth analysis of our benchmarking data, we offer valuable insights to advance research in this field.

Submitted to arXiv on 24 Sep. 2024


Results of the summarizing process for the arXiv paper: 2409.15790v1

In recent years, there has been a surge in the development of small language models (SLMs), which have garnered increasing attention from both the research and industrial communities. These SLMs are based on a decoder-only transformer architecture, like GPT-2, and are designed for superior performance and real-world deployment. To understand their capabilities and costs, we selected 59 SLMs with open weights and parameter counts ranging from 100M to 5B. Our review focuses on the base knowledge acquired during the pre-training process of these SLMs. We exclude fine-tuned models and variants of transformers to keep the analysis consistent.

While all selected SLMs share similar architectures, they vary in hyperparameters and training datasets, some of which remain closed-source. These differences result in varying performance across tasks such as commonsense reasoning, mathematics, and coding. The definition of "small" is subjective and relative, as device memory capacities evolve over time; for this study, we set an upper limit of 5B parameters.

By benchmarking the inference latency and memory footprints of these models, we aim to provide insights for advancing research on small language models. Through this investigation, we have identified key innovations and potential research topics that could shape future developments in making machine intelligence accessible, affordable, and efficient for everyday tasks. By making our results and benchmark tools publicly available, we hope to facilitate further advances in SLM research and contribute to the broader goal of enhancing machine intelligence for practical applications.
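To make the link between parameter count and device memory concrete, the following is a minimal back-of-envelope sketch, not the survey's benchmarking method. It estimates only the memory needed to hold the weights at a given numeric precision. The function names, the precision table, and the inclusive treatment of the 100M-5B bounds are assumptions made for illustration; real footprints also include the KV cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate (assumption: ignores KV cache, activations,
# and runtime overhead, so measured footprints will be larger).

# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str = "fp16") -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def is_slm(num_params: float, low: float = 100e6, high: float = 5e9) -> bool:
    """Check whether a parameter count falls in the 100M-5B range.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return low <= num_params <= high


if __name__ == "__main__":
    # Print weight sizes at the two ends and the middle of the surveyed range.
    for n in (100e6, 1e9, 5e9):
        for dtype in ("fp16", "int4"):
            size = weight_memory_gib(n, dtype)
            print(f"{n / 1e9:.1f}B params, {dtype}: {size:.2f} GiB")
```

Under these assumptions, a 5B-parameter model stored in fp16 needs roughly 9.3 GiB for its weights alone, which illustrates why the threshold for "small" depends on how much memory target devices have.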
Created on 26 Sep. 2024

