PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

AI-generated keywords: PromptBench, Adversarial Prompts, LLMs, Robustness Benchmark, Evaluation Benchmark

AI-generated Key Points

  • PromptBench is a robustness benchmark for Large Language Models (LLMs) to measure their resilience to adversarial prompts.
  • The study focuses on different types of textual attacks targeting prompts at various levels: character, word, sentence, and semantic.
  • Adversarial prompts are used in tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving.
  • The study evaluates 4,032 adversarial prompts across 8 tasks and 13 datasets with a total of 567,084 test samples.
  • Contemporary LLMs are found to be vulnerable to adversarial prompts.
  • Word frequency analysis is used to give practical guidance for crafting more robust prompts.
  • Code and compiled prompts are publicly accessible for future research on prompt robustness.
  • A visualization website is available for easy exploration of adversarial prompts.
  • PromptBench categorizes different types of prompts based on their purpose and labeled sample requirements: task-oriented and role-oriented prompts in both zero-shot (ZS) and few-shot (FS) learning scenarios.
  • The evaluation includes multiple LLMs with different architectures and sizes across various tasks and domains.
  • PromptBench comprises 8 diverse tasks with 13 public datasets covering areas such as sentiment analysis, grammar correctness detection, duplicate sentence detection, natural language inference, multi-task knowledge evaluation through multiple-choice questions, and reading comprehension.
  • Datasets used include SST-2, CoLA, QQP, MRPC, MNLI, QNLI, RTE, WNLI, and MMLU.
  • Overall findings from PromptBench provide insights into the robustness of LLMs to adversarial prompts and offer practical recommendations for prompt composition.
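
The attack levels named in the key points can be illustrated with a toy example. The sketch below is not the PromptBench implementation: the function names, the synonym table, and the brittle `toy_model` are all hypothetical stand-ins, chosen only to show how a character-level or word-level edit to the prompt (not to the test sample) can change measured accuracy.

```python
import random

# Hypothetical synonym table; a real word-level attack would search for
# substitutions that hurt the model while preserving meaning.
SYNONYMS = {"classify": "categorize", "review": "critique", "sentence": "statement"}


def char_level_attack(prompt: str, n_edits: int = 2, seed: int = 0) -> str:
    """Typo-style edit: swap two adjacent interior letters in a few words."""
    rng = random.Random(seed)
    words = prompt.split(" ")
    # Only purely alphabetic words with at least two interior letters.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    for i in rng.sample(candidates, min(n_edits, len(candidates))):
        w = words[i]
        j = rng.randrange(1, len(w) - 2)  # first and last letters stay fixed
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_level_attack(prompt: str, synonyms=SYNONYMS) -> str:
    """Replace words found in the synonym table, keeping punctuation and case."""
    out = []
    for w in prompt.split(" "):
        core = w.strip(".,:;!?")
        repl = synonyms.get(core.lower())
        if repl and core[0].isupper():
            repl = repl.capitalize()
        out.append(w.replace(core, repl) if repl else w)
    return " ".join(out)


def accuracy(model, prompt: str, samples) -> float:
    """Fraction of samples the model labels correctly under a given prompt."""
    correct = sum(model(f"{prompt} {text}") == label for text, label in samples)
    return correct / len(samples)


def toy_model(full_input: str) -> str:
    """Deliberately brittle stand-in for an LLM (an assumption, not a real model)."""
    if "Classify" not in full_input:
        return "unknown"
    return "positive" if "good" in full_input else "negative"


clean = "Classify the sentiment of this review as positive or negative:"
samples = [("a good film", "positive"), ("a dull film", "negative")]

print(accuracy(toy_model, clean, samples))
print(accuracy(toy_model, word_level_attack(clean), samples))
print(char_level_attack(clean))
```

Comparing accuracy under the clean prompt with accuracy under the perturbed prompt, over the same test samples, is the basic measurement behind a prompt robustness evaluation. Sentence-level and semantic-level attacks follow the same pattern with different perturbation functions.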

Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie

Technical report; 23 pages; code is at: https://github.com/microsoft/promptbench
License: CC BY 4.0

Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies to generate adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench.

Submitted to arXiv on 07 Jun. 2023


Results of the summarizing process for the arXiv paper: 2306.04528v2

PromptBench is a robustness benchmark designed to measure the resilience of Large Language Models (LLMs) to adversarial prompts. The study focuses on textual attacks that target prompts at four levels: character, word, sentence, and semantic. These adversarial prompts are used in diverse tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. The study evaluates 4,032 adversarial prompts across 8 tasks and 13 datasets with a total of 567,084 test samples. The findings reveal that contemporary LLMs are vulnerable to adversarial prompts.

A word frequency analysis is used to give downstream users and prompt engineers practical guidance for crafting more robust prompts. Additionally, the code and compiled prompts are made publicly accessible to stimulate future research on prompt robustness. A visualization website is also built for easy exploration of adversarial prompts.

PromptBench consists of different types of prompts categorized by their purpose and by how many labeled samples they require. Task-oriented prompts explicitly describe the task for the model to perform, while role-oriented prompts frame the model as an entity with a specific role. Both zero-shot (ZS) and few-shot (FS) learning scenarios are considered for these prompt categories.

The evaluation includes a set of LLMs with different architectures and sizes to assess their performance across various tasks and domains. The models considered include Flan-T5-large (0.8B), Dolly-6B, LLaMA-13B, Vicuna-13B, Cerebras-GPT-13B, GPT-NEOX-20B, Flan-UL2 (20B), and ChatGPT.

PromptBench comprises 8 diverse tasks with 13 public datasets covering areas such as sentiment analysis, grammar correctness detection, duplicate sentence detection, natural language inference, multi-task knowledge evaluation through multiple-choice questions, and reading comprehension. The datasets used include SST-2, CoLA, QQP, MRPC, MNLI, QNLI, RTE, WNLI, and MMLU.

Overall, PromptBench provides comprehensive insights into the robustness of LLMs to adversarial prompts and offers practical recommendations for prompt composition. The availability of the code and the evaluation benchmark encourages collaborative exploration in this field.
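
The four prompt settings described in the summary (task-oriented or role-oriented, each in a zero-shot or few-shot scenario) can be sketched as a small prompt builder. This is a minimal illustration under assumptions: the instruction wording, the `Sentence:`/`Answer:` layout, and the `build_prompt` helper are hypothetical and are not taken from the PromptBench code or prompt set.

```python
from typing import Sequence, Tuple

# Hypothetical instruction texts for a sentiment task.
TASK_ORIENTED = (
    "Classify the sentiment of the following sentence as 'positive' or 'negative'."
)
ROLE_ORIENTED = (
    "You are a sentiment analysis assistant. "
    "Label the sentence you are given as 'positive' or 'negative'."
)


def build_prompt(
    style: str,
    text: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a prompt in one of the four settings.

    style: "task" (describes the task) or "role" (assigns the model a role).
    examples: labeled (sentence, label) pairs; empty means zero-shot,
              non-empty means few-shot.
    """
    if style == "task":
        instruction = TASK_ORIENTED
    elif style == "role":
        instruction = ROLE_ORIENTED
    else:
        raise ValueError(f"unknown prompt style: {style!r}")

    lines = [instruction]
    # Few-shot: prepend solved examples before the actual query.
    for ex_text, ex_label in examples:
        lines.append(f"Sentence: {ex_text}\nAnswer: {ex_label}")
    lines.append(f"Sentence: {text}\nAnswer:")
    return "\n".join(lines)


demos = [("a charming, funny film", "positive"), ("flat and lifeless", "negative")]

zero_shot_task = build_prompt("task", "an instant classic")
few_shot_role = build_prompt("role", "an instant classic", demos)
print(zero_shot_task)
print(few_shot_role)
```

In this framing an adversarial attack would perturb only the instruction text, leaving the few-shot examples and the test sentence unchanged, so that any accuracy change can be attributed to the prompt.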
Created on 14 Jul. 2023
