Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey

AI-generated keywords: Artificial Intelligence, Explainable AI, Adversarial Machine Learning, Security, Fairness

AI-generated Key Points

⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

- Explainable AI (XAI) methods are seen as a solution for debugging and building trust in statistical and deep learning models
- XAI methods offer insights into model predictions, but recent advances in adversarial machine learning have exposed limitations and vulnerabilities in state-of-the-art explanations
- Concerns arise about the security and trustworthiness of XAI explanations because evidence of a model's reasoning can be manipulated, fooled, or fairwashed
- A concise survey of over 50 research papers summarizes adversarial attacks on explanations of machine learning models, as well as on fairness metrics
- The survey discusses how to defend against these attacks and how to design robust interpretation methods
- The survey contributes a list of existing insecurities in XAI and outlines emerging research directions in adversarial XAI (AdvXAI)
- Authored by Hubert Baniecki and Przemyslaw Biecek, the survey titled "Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey" will be presented at the IJCAI 2023 Workshop on XAI

Authors: Hubert Baniecki, Przemyslaw Biecek

arXiv: 2306.06123v1 (cs.CR)

To be presented at the IJCAI 2023 Workshop on XAI

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning highlight the limitations and vulnerabilities of state-of-the-art explanations, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model's reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This concise survey of over 50 papers summarizes research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI).

Submitted to arXiv on 06 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.06123v1

Comprehensive Summary

Explainable AI (XAI) methods have been presented as a remedy for debugging and building trust in statistical and deep learning models, and for interpreting their predictions. However, recent advances in adversarial machine learning have revealed limitations and vulnerabilities in state-of-the-art explanations, which casts doubt on their security and trustworthiness. The possibility of manipulating, fooling, or fairwashing evidence of a model's reasoning poses significant risks in high-stakes decision-making and knowledge discovery.

This concise survey covers over 50 research papers on adversarial attacks on explanations of machine learning models, as well as on fairness metrics. It discusses how to defend against such attacks and how to design robust interpretation methods. It also contributes a list of existing insecurities in XAI and outlines emerging research directions in adversarial XAI (AdvXAI).

Authored by Hubert Baniecki and Przemyslaw Biecek, the survey, titled "Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey", is slated for presentation at the IJCAI 2023 Workshop on XAI. Its examination of adversarial threats to XAI and of defense mechanisms informs ongoing discussions about the robustness and integrity of interpretable AI.

Layman's Summary

Summary

Explainable AI (XAI) helps us understand and trust computer models. But some bad people can trick the explanations. Researchers are studying how to make XAI more secure and fair. They want to protect against attacks that could change the explanations in a bad way.

Definitions

- Explainable AI (XAI): Methods that help us understand how computer models work.
- Adversarial machine learning: Techniques used to trick or manipulate machine learning models.
- Vulnerabilities: Weaknesses or flaws in something that can be exploited.
- Manipulation: Changing something in a dishonest or unfair way.
- Resilient: Able to withstand or recover from difficult situations.

Blog article

In recent years, the field of artificial intelligence (AI) has seen a surge in interest and development. With advances in statistical and deep learning models, AI has become increasingly capable of making complex decisions and predictions. However, as these models become more sophisticated, they also become less transparent to human understanding. This lack of transparency can lead to mistrust of AI systems, especially when they are used in critical decision-making processes.

To address this issue, researchers have turned to explainable AI (XAI) methods as a solution for debugging and fostering trust in machine learning models. These methods aim to provide insights into how a model makes its predictions, allowing humans to understand the reasoning behind its decisions. However, recent advances in adversarial machine learning have revealed limitations and vulnerabilities in the state-of-the-art explanations generated by XAI techniques. This raises concerns about the security and reliability of XAI systems. The potential for manipulating or fooling evidence of a model's reasoning poses significant risks in critical decision-making and knowledge discovery.

To shed light on these issues, Hubert Baniecki and Przemyslaw Biecek have conducted a concise survey of over 50 research papers on adversarial attacks on explanations of machine learning models. Titled "Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey", this study is slated for presentation at the IJCAI 2023 Workshop on XAI. Its examination of adversarial threats to XAI systems and of proposed defense mechanisms informs ongoing discussions about the robustness and integrity of interpretable AI technologies.

The authors begin by providing an overview of explainable AI methods currently used in domains such as healthcare, finance, and criminal justice, highlighting their benefits but also acknowledging their limitations under adversarial attack. They then turn to the types of adversarial attacks identified in the literature, including data poisoning, model inversion, and input perturbation attacks. These attacks aim to manipulate or deceive XAI systems by exploiting their vulnerabilities.

The survey also considers fairness metrics in the context of adversarial attacks on XAI systems. This matters because these systems are often used in decision-making processes that can significantly affect individuals or groups. The authors discuss how adversarial attacks can lead to biased decisions and suggest ways to incorporate fairness considerations into defense mechanisms.

One of the key contributions of this study is its exploration of strategies for strengthening defenses against adversarial attacks on explanations. These include techniques such as robust feature selection, model distillation, and ensemble methods. The authors also discuss the limitations and challenges associated with these defense mechanisms.

In addition to listing existing insecurities within XAI frameworks, the survey outlines emerging research directions in adversarial XAI (AdvXAI). It highlights areas where further research is needed to develop more robust and secure explainable AI methods.

Overall, Baniecki and Biecek's survey sheds light on a critical issue facing interpretable AI technologies: their vulnerability to adversarial attacks. By providing an overview of existing research in this area, it raises awareness of potential threats and offers insights into developing more resilient interpretation methods that guard against malicious manipulation.

In conclusion, while explainable AI methods have been hailed as a solution for fostering trust in machine learning models, they are not immune to the security risks posed by adversarial attacks. This survey is a reminder that we must continue to examine and strengthen our understanding of AdvXAI if interpretable AI technologies are to be reliable tools for decision-making.
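To make the idea of an input perturbation attack on an explanation more concrete, here is a minimal sketch. It is not taken from the survey: the toy logistic model, the gradient-times-input attribution, and the helper names (`predict_proba`, `attribution`, `perturb_explanation`) are all assumptions chosen for illustration. The sketch nudges an input so that a different feature appears most important, while the model's output stays the same.

```python
import numpy as np

def predict_proba(w, b, x):
    """Toy logistic model (an assumption): probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def attribution(w, x):
    """Gradient-times-input attribution for the logit of a linear model: w_i * x_i."""
    return w * x

def perturb_explanation(w, x, target, eps):
    """Return a perturbed copy of x that pushes feature `target` towards the top
    of the attribution ranking while leaving the logit unchanged.

    The perturbation has an infinity norm of at most `eps`. Whether the ranking
    actually flips depends on `eps` and on how close the attributions are.
    """
    top = int(np.argmax(attribution(w, x)))
    if top == target:
        return x.copy()
    # Raise the target feature's attribution and lower the current top one.
    d = np.zeros_like(x)
    d[target] = np.sign(w[target])
    d[top] = -np.sign(w[top])
    # Remove the component along w so that w @ d == 0 (the logit is preserved).
    d -= (w @ d) / (w @ w) * w
    scale = np.max(np.abs(d))
    if scale == 0.0:
        raise ValueError("no logit-preserving direction found for this input")
    return x + eps * d / scale

# Hypothetical weights and input, chosen only for illustration.
w = np.array([2.0, 1.0, 0.5])
b = -1.0
x = np.array([0.5, 0.9, 0.2])
x_adv = perturb_explanation(w, x, target=1, eps=0.1)

print("top feature before:", int(np.argmax(attribution(w, x))))
print("top feature after: ", int(np.argmax(attribution(w, x_adv))))
print("prediction before: ", predict_proba(w, b, x))
print("prediction after:  ", predict_proba(w, b, x_adv))
```

In this toy setting the perturbation is found in closed form because the model is linear in its logit. Attacks on deep models typically need iterative optimization, and the sketch says nothing about how well any particular defense would hold up.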

Created on 24 Jul. 2025
