Evaluating Large Language Models on Controlled Generation Tasks

AI-generated keywords: Large Language Models (LLMs)

AI-generated Key Points

  • Authors evaluate controllability of large language models (LLMs) on various generation tasks
  • LLMs compared to smaller specialized models
  • Performance analyzed on five tasks and ten benchmarks
  • LLMs struggle with fine-grained hard constraints like numerical planning and paraphrase generation
  • LLMs can generate human-level rationales and conform to coarse control signals like sentiment, topic, and keyword incorporation
  • Automatic rationales generated by LLMs can enhance performance through chain-of-thought reasoning
  • Study has limitations including heavy prompt engineering effort and imperfect automatic evaluations
  • No solutions proposed for the tasks where LLMs struggle; these are left as future work
  • Research provides insights into controllability of large language models in generation tasks
  • Suggested improvement is limited to LLM-generated rationales; the hard-constraint failures are left open

Authors: Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

EMNLP 2023
License: CC BY 4.0

Abstract: While recent studies have looked into the abilities of large language models on various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, few studies have looked into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models. We conclude that **large language models struggle at meeting fine-grained hard constraints**.

Submitted to arXiv on 23 Oct. 2023

Results of the summarizing process for the arXiv paper: 2310.14542v1

In this study, the authors evaluate the controllability of large language models (LLMs) on various generation tasks. They compare LLMs to smaller specialized models and analyze their performance on five tasks and ten benchmarks, including a numerical planning benchmark that is challenging for LLMs but easy for humans. The results show that while LLMs can generate human-level rationales and conform to coarse control signals like sentiment, topic, and keyword incorporation, they struggle with fine-grained hard constraints such as numerical planning and paraphrase generation. The authors suggest that these findings can guide the adoption of LLMs in downstream applications. They propose using automatic rationales generated by LLMs to enhance performance through chain-of-thought reasoning. However, the study has some limitations, including heavy prompt engineering effort and imperfect automatic evaluations. Additionally, no solutions are proposed for the tasks where LLMs struggle; these are left as future work. Overall, this research provides valuable insights into the controllability of large language models in generation tasks, although beyond LLM-generated rationales it offers no remedy for the hard-constraint failures.
Created on 24 Dec. 2023
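
The difference between a fine-grained hard constraint and a coarse control signal can be made concrete with a small checker. The sketch below is illustrative only and is not the paper's evaluation code: the function names, the word-counting rule (whitespace-separated tokens containing at least one alphanumeric character), and the substring-based keyword match are all assumptions made for this example.

```python
import re
from typing import Callable, Iterable


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Assumption: this tokenization rule is chosen for illustration; the
    paper's own counting procedure is not described in this summary.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def meets_word_count(text: str, target: int) -> bool:
    """Hard constraint: the output must contain exactly `target` words."""
    return count_words(text) == target


def includes_keywords(text: str, keywords: Iterable[str]) -> bool:
    """Coarse constraint: every keyword appears somewhere (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def success_rate(outputs: list, check: Callable[[str], bool]) -> float:
    """Fraction of generated outputs that satisfy `check` (0.0 if empty)."""
    if not outputs:
        return 0.0
    return sum(1 for out in outputs if check(out)) / len(outputs)


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(meets_word_count(sample, 5))               # False: the sentence has 6 words
    print(includes_keywords(sample, ["cat", "mat"]))  # True
```

An exact word count is violated by a single extra or missing token, while keyword inclusion tolerates any surrounding wording, which is one way to read why the former is called a hard constraint and the latter a coarse one.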
