Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

AI-generated keywords: Alpaca vs Vicuna, LLMs, Black-box prompt optimization, Memorization, Instruction-based prompts

AI-generated Key Points

  • Introduction of a novel black-box prompt optimization method using LLMs to uncover memorization in victim agents
  • Utilization of an iterative rejection-sampling optimization process to identify instruction-based prompts with specific characteristics
  • Instruction-based prompts yield outputs with 23.7% higher overlap with training data compared to baseline prefix-suffix measurements
  • Demonstration that instruction-tuned models can expose pre-training data as effectively as base models, if not more so
  • Highlighting the potential for automated attacks using instructions proposed by other LLMs beyond original training data contexts
  • Evaluation focuses on measuring memorization/reconstruction and evaluating prompt overlap, utilizing ROUGE-L and LCSP as metrics
  • Experimental results show that instruction-tuned models exhibit higher memorization scores (ROUGE-L) than base models across different sequence lengths and data domains

Authors: Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana

License: CC BY 4.0

Abstract: In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent, compared to what is revealed by prompting the target model with the training data directly, which is the dominant approach of quantifying memorization in LLMs. We use an iterative rejection-sampling optimization process to find instruction-based prompts with two main characteristics: (1) minimal overlap with the training data to avoid presenting the solution directly to the model, and (2) maximal overlap between the victim model's output and the training data, aiming to induce the victim to spit out training data. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that (1) instruction-tuned models can expose pre-training data as much as their base-models, if not more so, (2) contexts other than the original training data can lead to leakage, and (3) using instructions proposed by other LLMs can open a new avenue of automated attacks that we should further study and explore. The code can be found at https://github.com/Alymostafa/Instruction_based_attack .
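
The optimization loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` (the attacker) and `victim` are hypothetical callables standing in for LLM calls, the overlap measure is a simple whitespace-token LCS recall, and combining the two objectives as a difference weighted by `penalty` is an assumption made here for brevity.

```python
from typing import Callable, List, Tuple


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def overlap(candidate: str, reference: str) -> float:
    """Fraction of reference tokens recovered in order by the candidate."""
    ref = reference.split()
    return lcs_len(candidate.split(), ref) / len(ref) if ref else 0.0


def optimize_prompt(
    seed_prompt: str,
    target: str,
    propose: Callable[[str, int], List[str]],  # hypothetical attacker LLM
    victim: Callable[[str], str],              # hypothetical victim LLM
    rounds: int = 3,
    samples: int = 4,
    penalty: float = 1.0,                      # assumed trade-off weight
) -> Tuple[str, float]:
    """Keep a candidate only if it beats the best score so far."""

    def score(prompt: str) -> float:
        # Reward victim output that matches the training text,
        # penalize prompts that already contain that text.
        return overlap(victim(prompt), target) - penalty * overlap(prompt, target)

    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        for cand in propose(best, samples):
            s = score(cand)
            if s > best_score:  # reject everything that does not improve
                best, best_score = cand, s
    return best, best_score
```

In practice `propose` would ask the attacker model to paraphrase or rewrite the current best instruction, and `victim` would return the target model's completion; both are left abstract here because the page does not specify them.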

Submitted to arXiv on 05 Mar. 2024


Results of the summarizing process for the arXiv paper: 2403.04801v1

In this paper, titled "Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs," the authors introduce a black-box prompt optimization method that uses an attacker LLM agent to reveal higher levels of memorization in a victim agent. The method uncovers more memorization than the dominant approach to quantifying it, which prompts the target model with the training data directly.

The researchers employ an iterative rejection-sampling optimization process to identify instruction-based prompts with two characteristics: minimal overlap with the training data, so that the solution is not handed to the model directly, and maximal overlap between the victim model's output and the training data, so that the victim is induced to reproduce it. In their experiments, these instruction-based prompts yield outputs with 23.7% higher overlap with training data than baseline prefix-suffix measurements. The study demonstrates that instruction-tuned models can expose pre-training data as effectively as their base models, if not more so. It also shows that contexts other than the original training data can lead to leakage, and it emphasizes the potential for automated attacks using instructions proposed by other LLMs.

The evaluation of the proposed attack and the baseline methods focuses on two areas: measuring memorization/reconstruction and evaluating prompt overlap. The researchers use ROUGE-L to assess memorization by computing the longest common subsequence between generated and original suffixes, which they find more accurate than traditional metrics such as BLEU. They also introduce LCSP as a measure of overlap between prompts and suffixes. The experimental results show that instruction-tuned models achieve higher memorization scores (ROUGE-L) than base models across different sequence lengths and data domains. Detailed breakdowns of these results are provided in tables and appendices.

Overall, this study shows that LLMs can memorize more information than previously thought, underscoring the importance of understanding and mitigating potential vulnerabilities in language models.
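
To make the memorization metric concrete, here is a small sketch of an LCS-based ROUGE-L score between a generated suffix and the original suffix. It assumes whitespace tokenization and the balanced F-measure (equal weight on precision and recall); the paper's exact tokenizer, weighting, and LCSP definition are not given on this page, so treat those details as assumptions.

```python
from typing import List


def lcs_table_len(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via a full dynamic-programming table."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l(generated: str, original: str) -> float:
    """ROUGE-L F1 between two strings, tokenized on whitespace (assumption)."""
    gen, ref = generated.split(), original.split()
    if not gen or not ref:
        return 0.0
    lcs = lcs_table_len(gen, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(gen)  # share of generated tokens that match in order
    recall = lcs / len(ref)     # share of original tokens that were recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l("the cat sat on the mat", "the cat lay on the mat"))
```

Unlike an exact-match or n-gram count, the subsequence only has to preserve token order, so a generated suffix that drops or substitutes a few words still receives partial credit.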
Created on 07 Dec. 2024

