Low-Resource Languages Jailbreak GPT-4
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- Authors Zheng-Xin Yong, Cristina Menghini, and Stephen H. Bach examine AI safety training and red-teaming, the measures used to keep large language models (LLMs) from generating unsafe content.
- The work reveals a cross-lingual vulnerability in these safety mechanisms, rooted in the linguistic inequality of safety training data: translating unsafe English inputs into low-resource languages bypasses GPT-4's safeguards.
- On the AdvBenchmark, GPT-4 engages with the translated unsafe inputs and provides actionable items that move users toward their harmful goals 79% of the time, on par with or surpassing state-of-the-art jailbreaking attacks.
- The vulnerability mainly affects low-resource languages; high- and mid-resource languages show significantly lower attack success rates.
- Because publicly available translation APIs let anyone exploit these safety vulnerabilities, the gap now puts all LLM users at risk, not only speakers of low-resource languages.
- The authors call for more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.
Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
Abstract: AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpasses state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have a significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affected speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLM users. Publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.
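The 79% figure is an attack success rate: the share of unsafe prompts for which the model's response was judged to help the user toward the harmful goal. As a minimal sketch, assuming each response has already been labeled by a judge as a bypass (True) or not (False), the rate and its breakdown by language resource level could be computed as below. The function names, the `"low"`/`"high"` group labels, and the demo records are hypothetical illustrations, not the paper's code or data.

```python
from collections import defaultdict

# Hypothetical record shape: (resource_level, bypassed). "bypassed" is assumed
# to be True when a judge decided the model engaged with the unsafe request.
Record = tuple[str, bool]


def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attempts judged successful (0.0 for an empty list)."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def asr_by_group(records: list[Record]) -> dict[str, float]:
    """Attack success rate per resource level, e.g. "low", "mid", "high"."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for level, bypassed in records:
        groups[level].append(bypassed)
    return {level: attack_success_rate(o) for level, o in groups.items()}


if __name__ == "__main__":
    # Placeholder labels for illustration only; not results from the paper.
    demo = [("low", True), ("low", True), ("low", False), ("low", True),
            ("high", False), ("high", False), ("high", True), ("high", False)]
    print(asr_by_group(demo))  # {'low': 0.75, 'high': 0.25}
```

Grouping by resource level mirrors the comparison the abstract draws between low-resource and high-/mid-resource languages; how individual responses are judged is left outside the sketch.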