How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

AI-generated keywords: Supervised Fine-tuning

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Study focuses on impact of supervised fine-tuning (SFT) data composition on large language models (LLMs)
  • LLMs have diverse capabilities: math reasoning, code generation, instruction following
  • Open-source LLMs enhanced through ad-hoc SFT targeting individual capabilities, whereas proprietary LLMs are versatile across skills
  • Four research questions proposed to explore association between model performance and factors like data amount, composition ratio, model size, SFT strategies
  • Different capabilities of LLMs scale differently; larger models generally show superior performance with the same amount of data
  • Mathematical reasoning and code generation consistently improve with increasing data amount
  • General human-aligning abilities plateau after approximately a thousand samples
  • Data composition can enhance various abilities under limited data conditions but may lead to performance conflicts when data is abundant
  • Amount of composition data has greater influence on performance than composition ratio
  • Sequentially learning multiple skills risks catastrophic forgetting
  • Dual-stage Mixed Fine-tuning (DMT) strategy proposed as solution for learning multiple abilities with different scaling patterns

Authors: Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou

Abstract: Large language models (LLMs) with enormous numbers of pre-training tokens and parameters exhibit diverse abilities, including math reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). While the open-source community has explored ad-hoc SFT for enhancing individual capabilities, proprietary LLMs exhibit versatility across various skills. Therefore, understanding the facilitation of multiple abilities via SFT is paramount. In this study, we specifically focus on the interplay of data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size, and SFT strategies. Our experiments reveal that distinct capabilities scale differently and that larger models generally show superior performance with the same amount of data. Mathematical reasoning and code generation consistently improve with increasing data amount, whereas general abilities plateau after roughly a thousand samples. Moreover, we observe that data composition appears to enhance various abilities under limited data conditions, yet can lead to performance conflicts when data is plentiful. Our findings also suggest that the amount of composition data influences performance more than the composition ratio. In our analysis of SFT strategies, we find that sequentially learning multiple skills risks catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy offers a promising solution for learning multiple abilities with different scaling patterns.

Submitted to arXiv on 09 Oct. 2023

Results of the summarizing process for the arXiv paper: 2310.05492v3

This study focuses on the impact of supervised fine-tuning (SFT) data composition on the abilities of large language models (LLMs). LLMs have diverse capabilities, such as math reasoning, code generation, and instruction following. Open-source LLMs have been enhanced through ad-hoc SFT, while proprietary LLMs exhibit versatility across various skills. Understanding how SFT facilitates multiple abilities is crucial for improving model performance. The researchers propose four research questions to explore the association between model performance and factors like data amount, composition ratio, model size, and SFT strategies. The experiments reveal that different capabilities of LLMs scale differently, with larger models generally showing superior performance with the same amount of data. Mathematical reasoning and code generation consistently improve with increasing data amount, while general human-aligning abilities plateau after approximately a thousand samples. Data composition can enhance various abilities under limited data conditions but may lead to performance conflicts when there is an abundance of data. Additionally, the amount of composition data has a greater influence on performance than the composition ratio. In analyzing SFT strategies, it is discovered that sequentially learning multiple skills risks catastrophic forgetting. To address this issue, the researchers propose a Dual-stage Mixed Fine-tuning (DMT) strategy that offers a promising solution for learning multiple abilities with different scaling patterns. Overall, this study provides insights into how SFT data composition affects the abilities of large language models and proposes strategies to optimize their performance in various tasks.
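
The summary names a Dual-stage Mixed Fine-tuning (DMT) strategy but does not describe its stages. As a rough illustration only, the sketch below assumes a two-stage recipe: stage 1 uses the specialized data (math and code), and stage 2 uses the general data mixed with a small fraction `k` of the specialized data, which is one plausible way to limit forgetting of the stage 1 skills. That recipe, the function names `sample_subset` and `build_dmt_stages`, and the parameter `k` are all assumptions made for this example and are not taken from the text above.

```python
import random
from typing import Dict, List, Sequence

Example = Dict[str, str]


def sample_subset(data: Sequence[Example], fraction: float,
                  rng: random.Random) -> List[Example]:
    """Return a random subset holding `fraction` of `data` (0 <= fraction <= 1)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    count = round(len(data) * fraction)
    return rng.sample(list(data), count)


def build_dmt_stages(math_data: Sequence[Example],
                     code_data: Sequence[Example],
                     general_data: Sequence[Example],
                     k: float,
                     seed: int = 0) -> Dict[str, List[Example]]:
    """Build the training sets for a hypothetical two-stage schedule.

    Assumed recipe (not confirmed by the summary above):
      stage1: all specialized data (math + code)
      stage2: all general data + a fraction k of the math and code data
    """
    rng = random.Random(seed)

    # Stage 1: specialized skills only.
    stage1 = list(math_data) + list(code_data)
    rng.shuffle(stage1)

    # Stage 2: general data plus a small replay of the specialized data.
    stage2 = (list(general_data)
              + sample_subset(math_data, k, rng)
              + sample_subset(code_data, k, rng))
    rng.shuffle(stage2)

    return {"stage1": stage1, "stage2": stage2}


if __name__ == "__main__":
    # Toy placeholder records; real SFT data would hold prompts and responses.
    math_set = [{"task": "math", "id": str(i)} for i in range(256)]
    code_set = [{"task": "code", "id": str(i)} for i in range(128)]
    general_set = [{"task": "general", "id": str(i)} for i in range(100)]

    stages = build_dmt_stages(math_set, code_set, general_set, k=0.25)
    print(len(stages["stage1"]), len(stages["stage2"]))
```

Setting `k` to 0 in this sketch reduces stage 2 to general data alone, which corresponds to the purely sequential training that the summary says risks catastrophic forgetting, while setting `k` to 1 replays all of the specialized data alongside the general data in stage 2.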
Created on 30 Jan. 2024