PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

AI-generated keywords: PromptBench, Adversarial Prompts, LLMs, Robustness Benchmark, Evaluation Benchmark

AI-generated Key Points

PromptBench is a robustness benchmark for Large Language Models (LLMs) to measure their resilience to adversarial prompts.
The study focuses on different types of textual attacks targeting prompts at various levels: character, word, sentence, and semantic.
Adversarial prompts are used in tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving.
The study evaluates 4,032 adversarial prompts across 8 tasks and 13 datasets with a total of 567,084 test samples.
Contemporary LLMs are found to be vulnerable to adversarial prompts.
Word frequency analysis is utilized to provide practical guidance for crafting more robust prompts.
Code and compiled prompts are publicly accessible for future research on prompt robustness.
A visualization website is available for easy exploration of adversarial prompts.
PromptBench categorizes different types of prompts based on their purpose and labeled sample requirements: task-oriented and role-oriented prompts in both zero-shot (ZS) and few-shot (FS) learning scenarios.
The evaluation includes multiple LLMs with different architectures and sizes across various tasks and domains.
PromptBench comprises 8 diverse tasks with 13 public datasets covering areas such as sentiment analysis, grammar correctness detection, duplicate sentence detection, natural language inference, multi-task knowledge evaluation through multiple-choice questions, and reading comprehension.
Datasets used include SST-2, CoLA, QQP, MRPC, MNLI, QNLI, RTE, WNLI, and MMLU.
Overall findings from PromptBench provide insights into the robustness of LLMs to adversarial prompts and offer practical recommendations for prompt composition.
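
As a rough illustration of the four prompt categories named in the list above (task-oriented and role-oriented, each in zero-shot and few-shot form), the sketch below builds one prompt of each kind for a sentiment task. The template wording, the `build_prompt` helper, and the example reviews are all hypothetical and are not taken from the PromptBench prompt set.

```python
# Hypothetical templates; the wording is illustrative, not PromptBench's own.
TEMPLATES = {
    "task": "Classify the sentiment of the following review as 'positive' or 'negative'.",
    "role": "You are a sentiment analysis assistant. Label the following review as 'positive' or 'negative'.",
}

# Hypothetical labeled examples, used only in the few-shot setting.
FEW_SHOT_EXAMPLES = [
    ("A warm, funny, and moving film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
]


def build_prompt(kind: str, few_shot: bool, text: str) -> str:
    """Return a prompt of the given kind ('task' or 'role') for one input."""
    parts = [TEMPLATES[kind]]
    if few_shot:
        # Few-shot prompts prepend labeled demonstrations to the query.
        for example, label in FEW_SHOT_EXAMPLES:
            parts.append(f"Review: {example}\nAnswer: {label}")
    parts.append(f"Review: {text}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    for kind in ("task", "role"):
        for few_shot in (False, True):
            print(f"--- {kind}-oriented, {'few' if few_shot else 'zero'}-shot ---")
            print(build_prompt(kind, few_shot, "I could not stop smiling."))
```

The only difference between the zero-shot and few-shot variants here is whether labeled demonstrations are placed before the query, which matches the "labeled sample requirements" distinction in the list.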


Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie

arXiv: 2306.04528v2 - DOI (cs.CL)

Technical report; 23 pages; code is at: https://github.com/microsoft/promptbench

License: CC BY 4.0

Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies to generate adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench.

Submitted to arXiv on 07 Jun. 2023


Results of the summarizing process for the arXiv paper: 2306.04528v2

Comprehensive Summary

PromptBench is a robustness benchmark designed to measure the resilience of Large Language Models (LLMs) to adversarial prompts. The study covers textual attacks that target prompts at four levels: character, word, sentence, and semantic. These adversarial prompts are used in diverse tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. The study evaluates 4,032 adversarial prompts across 8 tasks and 13 datasets, with a total of 567,084 test samples. The findings reveal that contemporary LLMs are vulnerable to adversarial prompts. To give downstream users and prompt engineers practical guidance for crafting more robust prompts, the study uses word frequency analysis. The code and compiled prompts are publicly accessible to stimulate future research on prompt robustness, and a visualization website has been built for easy exploration of adversarial prompts.

PromptBench categorizes prompts by their purpose and by their labeled sample requirements. Task-oriented prompts explicitly describe the task for the model to perform, while role-oriented prompts frame the model as an entity with a specific role. Both zero-shot (ZS) and few-shot (FS) learning scenarios are considered for these prompt categories. The evaluation includes a set of LLMs with different architectures and sizes to assess their performance across various tasks and domains: Flan-T5-large (0.8B), Dolly-6B, LLaMA-13B, Vicuna-13B, Cerebras-GPT-13B, GPT-NEOX-20B, Flan-UL2 (20B), and ChatGPT.

PromptBench comprises 8 diverse tasks with 13 public datasets covering areas such as sentiment analysis, grammar correctness detection, duplicate sentence detection, natural language inference, multi-task knowledge evaluation through multiple-choice questions, and reading comprehension. The datasets used include SST-2, CoLA, QQP, MRPC, MNLI, QNLI, RTE, WNLI, and MMLU. Overall, PromptBench provides comprehensive insights into the robustness of LLMs to adversarial prompts and offers practical recommendations for prompt composition. The availability of the code and evaluation benchmark encourages collaborative exploration in this field.


Layman's Summary

PromptBench is a test for big computer programs that understand language to see how well they can handle tricky questions. The test looks at different ways that these programs can be tricked, like changing the words in a question or making the question sound different but mean the same thing. These tricks are used in tasks like figuring out if a sentence sounds positive or negative, understanding what a sentence means, answering questions about a story, translating languages, and solving math problems. The test looked at over 4,000 tricky questions across 8 tasks and 13 sets of data and found that the programs were not very good at handling them. To help make better questions, the researchers looked at how often different words are used and made some tools available for other people to use. They also made a website where people can look at these tricky questions easily. PromptBench puts the tricky questions into categories based on what they're trying to do and what kind of information they need from the program. The test looked at many different kinds of programs doing many different tasks with lots of different sets of data.

Exploring the Robustness of Large Language Models with PromptBench

The development of large language models (LLMs) has enabled many advances in natural language processing, from sentiment analysis to reading comprehension. However, these models are vulnerable to adversarial prompts that can cause them to make incorrect predictions or fail entirely. To measure the resilience of LLMs against such attacks, researchers have developed a benchmark called PromptBench. This study evaluates 4,032 adversarial prompts across 8 tasks and 13 datasets with a total of 567,084 test samples. The findings reveal that contemporary LLMs are indeed vulnerable to adversarial prompts and provide practical guidance for downstream users and prompt engineers in crafting more robust ones.
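
One simple way to quantify the vulnerability described above is to compare a model's accuracy under a clean prompt with its accuracy under an adversarial version of the same prompt. The sketch below does this with a relative drop in accuracy. The choice of metric, the `toy_model` stand-in, and the four-example dataset are assumptions made for illustration; they are not taken from this summary and do not reproduce any reported result.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def accuracy(model: Callable[[str], str], prompt: str, data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly under a given prompt."""
    correct = sum(model(f"{prompt}\n{text}") == label for text, label in data)
    return correct / len(data)


def relative_drop(clean_acc: float, adv_acc: float) -> float:
    """Relative accuracy loss caused by the adversarial prompt (0 means no loss)."""
    if clean_acc == 0:
        return 0.0
    return 1.0 - adv_acc / clean_acc


def toy_model(full_prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it only follows the instruction
    when the word 'sentiment' is spelled correctly."""
    instruction, text = full_prompt.split("\n", 1)
    if "sentiment" not in instruction:
        return "negative"  # the toy model falls back to a fixed answer
    return "positive" if "good" in text else "negative"


# Hypothetical toy dataset.
DATA = [
    ("good film", "positive"),
    ("bad film", "negative"),
    ("good cast", "positive"),
    ("bad plot", "negative"),
]

if __name__ == "__main__":
    clean = accuracy(toy_model, "Classify the sentiment:", DATA)
    adv = accuracy(toy_model, "Classify the sentimnet:", DATA)  # character-level typo
    print(clean, adv, relative_drop(clean, adv))
```

A real evaluation would replace `toy_model` with calls to an actual LLM and average the drop over many prompts and datasets.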

What is PromptBench?

PromptBench is a robustness benchmark designed to measure the resilience of LLMs against various types of textual attacks targeting prompts at different levels: character, word, sentence, and semantic. These adversarial prompts are used in diverse tasks such as sentiment analysis, natural language inference (NLI), reading comprehension (RC), machine translation (MT), and math problem-solving (MPS). PromptBench consists of different types of prompts categorized based on their purpose and labeled sample requirements. Task-oriented prompts explicitly describe the task for the model to perform, while role-oriented prompts frame the model as an entity with a specific role. Both zero-shot (ZS) and few-shot (FS) learning scenarios are considered for these prompt categories. The evaluation includes a set of LLMs with different architectures and sizes, including Flan-T5-large (0.8B), Dolly-6B, LLaMA-13B, Vicuna-13B, Cerebras-GPT-13B, GPT-NEOX-20B, Flan-UL2 (20B), and ChatGPT.
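
To make the attack levels above more concrete, here is a minimal sketch of character-level, word-level, and sentence-level perturbations applied to a prompt. These functions are deliberately simple stand-ins, not the attack algorithms used in the study; the synonym table, the distractor text, and all function names are hypothetical. Semantic-level attacks (paraphrasing the whole prompt) are omitted because they would need a paraphrasing model.

```python
import random

# Tiny hypothetical synonym table for the word-level example.
SYNONYMS = {"classify": "categorize", "review": "comment", "following": "subsequent"}


def char_level(prompt: str, rng: random.Random) -> str:
    """Introduce a typo by swapping two adjacent inner characters of one word."""
    words = prompt.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return prompt
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep the first and last characters in place
    # Note: if the two swapped characters are identical, the word is unchanged.
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_level(prompt: str) -> str:
    """Replace words found in SYNONYMS, preserving an initial capital letter."""
    out = []
    for w in prompt.split(" "):
        repl = SYNONYMS.get(w.lower())
        if repl and w[0].isupper():
            repl = repl.capitalize()
        out.append(repl or w)
    return " ".join(out)


def sentence_level(prompt: str, distractor: str = "and true is true") -> str:
    """Append an irrelevant distractor clause to the prompt."""
    return f"{prompt} {distractor}"


if __name__ == "__main__":
    prompt = "Classify the sentiment of the following review as positive or negative:"
    rng = random.Random(0)  # fixed seed so the typo is reproducible
    print(char_level(prompt, rng))
    print(word_level(prompt))
    print(sentence_level(prompt))
```

Each function perturbs only the instruction text and leaves the task input untouched, which is the setting the benchmark studies: attacks on the prompt rather than on the sample.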

Datasets Used in Evaluation

PromptBench comprises 8 diverse tasks with 13 public datasets covering areas such as sentiment analysis, grammar correctness detection, duplicate sentence detection, NLI, multi-task knowledge evaluation through multiple-choice questions, and RC. The datasets used include SST-2, CoLA, QQP, MRPC, MNLI, QNLI, RTE, WNLI, and MMLU.

Analysis & Findings

The findings reveal that contemporary LLMs are indeed vulnerable to adversarial prompts, which can lead them to make incorrect predictions or fail entirely, depending on how the prompts were crafted by attackers or malicious actors. To provide practical guidance for downstream users and prompt engineers in crafting more robust prompts, the study uses word frequency analysis. Additionally, the code and compiled prompts are made publicly accessible to stimulate future research on prompt robustness. A visualization website has also been built for easy exploration of adversarial prompts.
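
The word frequency analysis mentioned above can be sketched roughly as follows: count the words in prompts that held up under attack and in prompts that did not, then look for words that are over-represented in one group. This is only an assumed form of the analysis; the two prompt lists are invented toy data, and the function names are hypothetical, so the output says nothing about which words are actually robust.

```python
import re
from collections import Counter


def word_counts(prompts):
    """Count lowercase word occurrences across a list of prompts."""
    counts = Counter()
    for p in prompts:
        counts.update(re.findall(r"[a-z']+", p.lower()))
    return counts


def distinctive_words(group_a, group_b, top_k=5):
    """Words most over-represented in group_a relative to group_b.

    Returns (word, count difference) pairs, largest difference first,
    with ties broken alphabetically.
    """
    a, b = word_counts(group_a), word_counts(group_b)
    diff = {w: a[w] - b[w] for w in a}
    ranked = sorted(diff.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, d) for w, d in ranked[:top_k] if d > 0]


# Invented toy data: which prompts are "robust" here is purely hypothetical.
ROBUST = [
    "Acting as a translator, translate the sentence.",
    "Acting as a grader, assess the sentence.",
]
VULNERABLE = [
    "Translate the sentence please.",
    "Evaluate the sentence please.",
]

if __name__ == "__main__":
    print(distinctive_words(ROBUST, VULNERABLE, top_k=3))
    print(distinctive_words(VULNERABLE, ROBUST, top_k=3))
```

Raw count differences are the simplest possible score; with groups of unequal size, normalizing by the number of prompts in each group would be a more sensible choice.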

Conclusion

Overall, PromptBench provides comprehensive insights into the robustness of LLMs to adversarial prompts and offers practical recommendations for prompt composition. The availability of the code and evaluation benchmark encourages collaborative exploration in this field.

Created on 14 Jul. 2023

