InfoFlood: Jailbreaking Large Language Models with Information Overload

Auteurs : Advait Yadav, Haibo Jin, Man Luo, Jun Zhuang, Haohan Wang

Licence : CC BY 4.0

Résumé : Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains. However, their potential to generate harmful responses has raised significant societal and regulatory concerns, especially when manipulated by adversarial techniques known as "jailbreak" attacks. Existing jailbreak methods typically involve appending carefully crafted prefixes or suffixes to malicious prompts in order to bypass the built-in safety mechanisms of these models. In this work, we identify a new vulnerability in which excessive linguistic complexity can disrupt built-in safety mechanisms-without the need for any added prefixes or suffixes-allowing attackers to elicit harmful outputs directly. We refer to this phenomenon as Information Overload. To automatically exploit this vulnerability, we propose InfoFlood, a jailbreak attack that transforms malicious queries into complex, information-overloaded queries capable of bypassing built-in safety mechanisms. Specifically, InfoFlood: (1) uses linguistic transformations to rephrase malicious queries, (2) identifies the root cause of failure when an attempt is unsuccessful, and (3) refines the prompt's linguistic structure to address the failure while preserving its malicious intent. We empirically validate the effectiveness of InfoFlood on four widely used LLMs-GPT-4o, GPT-3.5-turbo, Gemini 2.0, and LLaMA 3.1-by measuring their jailbreak success rates. InfoFlood consistently outperforms baseline attacks, achieving up to 3 times higher success rates across multiple jailbreak benchmarks. Furthermore, we demonstrate that commonly adopted post-processing defenses, including OpenAI's Moderation API, Perspective API, and SmoothLLM, fail to mitigate these attacks. This highlights a critical weakness in traditional AI safety guardrails when confronted with information overload-based jailbreaks.

Soumis à arXiv le 13 Jui. 2025

Explorez l'arbre d'article

Cliquez sur les nœuds de l'arborescence pour être redirigé vers un article donné et accéder à leurs résumés et assistant virtuel

Accédez également à nos Résumés, ou posez des questions sur cet article à notre Assistant IA.

Recherchez des articles similaires (en version bêta)

En cliquant sur le bouton ci-dessus, notre algorithme analysera tous les articles de notre base de données pour trouver le plus proche en fonction du contenu des articles complets et pas seulement des métadonnées. Veuillez noter que cela ne fonctionne que pour les articles pour lesquels nous avons généré des résumés et que vous pouvez le réexécuter de temps en temps pour obtenir un résultat plus précis pendant que notre base de données s'agrandit.