An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models

AI-generated keywords: SHAP, Machine Learning, Deep Learning, MIMIC-III, Stability

Authors: Han Yuan, Mingxuan Liu, Lican Kang, Chenkui Miao, Ying Wu

arXiv: 2204.11351v3 - DOI (cs.LG)

License: CC BY 4.0

Abstract: Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of such inferences. Some ML models like the decision tree possess inherent interpretability that can be directly comprehended by humans. Others like artificial neural networks (ANN), however, rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanations (SHAP) is one such external method, which requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanations, SHAP values and variable rankings, fluctuate when using different background datasets acquired from random sampling, indicating that users cannot unquestioningly trust the one-shot interpretation from SHAP. Luckily, such fluctuation decreases with the increase of the background dataset size. Also, we notice a U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables compared to moderately important ones. Overall, our results suggest that users should take into account how background data affects SHAP results, with improved SHAP stability as the background sample size increases.

Submitted to arXiv on 24 Apr. 2022

Results of the summarizing process for the arXiv paper: 2204.11351v3

Comprehensive Summary

The interpretation of machine learning (ML) models is becoming increasingly important, and SHapley Additive exPlanations (SHAP) is one external method that provides both instance-level and model-level explanations for deep learning (DL) models. SHAP requires a background dataset, usually drawn at random from the training data, yet the effect of the background dataset's size on SHAP's stability had not been explored. In an empirical study using the MIMIC-III dataset, the researchers found that SHAP values and variable rankings fluctuate across different randomly sampled background datasets, so users cannot blindly trust a single SHAP interpretation. The fluctuations decrease as the background dataset grows. The stability assessment of SHAP variable rankings also follows a U-shape: SHAP ranks the most and least important variables more reliably than moderately important ones. Overall, the study suggests that users should consider how background data affects SHAP results and prefer larger background datasets to reduce these fluctuations. The code used in this study is publicly accessible.

Understanding the Impact of Background Dataset Size on SHAP's Stability

Interpreting machine learning (ML) models has become increasingly important in recent years, and one external method that provides both instance-level and model-level explanations is SHapley Additive exPlanations (SHAP). SHAP needs a background dataset, typically a random sample of the training data, but the effect of that sample's size on SHAP's stability had not been studied. In a study submitted to arXiv in April 2022, researchers used the MIMIC-III dataset to investigate how background data affects SHAP results. The findings suggest that users should account for the background data when reading SHAP output and prefer larger background datasets when possible.

Background Data Affects SHAP Results

The research team conducted an empirical study using the MIMIC-III dataset to explore how different sizes of background datasets affect the stability of SHAP values and variable rankings. They found that both the values and the rankings produced by SHAP fluctuated from one randomly sampled background dataset to the next. A single SHAP run is therefore not enough to trust an interpretation; the background sample, and in particular its size, has to be taken into account.
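The fluctuation described above can be reproduced in miniature without a neural network. The sketch below is not the authors' code and does not use MIMIC-III or the shap library: it assumes a hypothetical linear model on synthetic Gaussian data, for which the SHAP value of variable j under an independent background is exactly w_j * (x_j - mean of variable j in the background). The weights, data, sample sizes, and function names are all illustrative assumptions.

```python
import random
import statistics

def linear_shap(weights, x, background):
    """Exact SHAP values for a linear model f(x) = w.x + b when features are
    treated as independent: w_j * (x_j - background mean of feature j)."""
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    return [w * (xj - m) for w, xj, m in zip(weights, x, means)]

def shap_spread(train, weights, x, size, repeats, rng):
    """Std. dev. of each feature's SHAP value across repeated background draws."""
    runs = [linear_shap(weights, x, rng.sample(train, size)) for _ in range(repeats)]
    return [statistics.pstdev([run[j] for run in runs]) for j in range(len(weights))]

rng = random.Random(0)
weights = [2.0, -1.0, 0.5]   # hypothetical model coefficients
# Synthetic stand-in for a training set (assumption, not MIMIC-III).
train = [[rng.gauss(0, 1) for _ in weights] for _ in range(5000)]
x = [1.0, 1.0, 1.0]          # the instance being explained

# Repeat the random background draw 50 times at each size and compare spreads.
spread_by_size = {size: shap_spread(train, weights, x, size, 50, rng)
                  for size in (10, 100, 1000)}
for size, spread in spread_by_size.items():
    print(size, [round(s, 3) for s in spread])
```

In this toy setting the spread of each SHAP value across draws shrinks as the background sample grows, because the background means are estimated more precisely. For a deep model the relationship is not this simple, so the sketch only illustrates the mechanism.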

Larger Background Datasets Provide More Reliable Results

The researchers also found that the fluctuations decreased as the size of the background dataset increased. Additionally, they observed a U-shape in the stability assessment of variable rankings: SHAP was more reliable at ranking the most and least important variables than the moderately important ones. Overall, the study suggests that larger background datasets yield more reliable SHAP interpretations than smaller ones.
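One way to see why rankings can be stable at the extremes and unstable in the middle is a toy example with several variables of nearly equal importance. The sketch below is an illustration under stated assumptions, not the paper's stability metric: it uses a hypothetical linear model on synthetic data, takes the mean absolute SHAP value as global importance, and scores each variable by how often it lands on its most common rank across repeated background draws. The three middle weights are deliberately chosen close together, so the pattern is built into the example.

```python
import random
from collections import Counter

def global_importance(weights, instances, background):
    """Mean |SHAP| per variable for a linear model, where the exact SHAP
    value of variable j is w_j * (x_j - background mean of variable j)."""
    n_bg = len(background)
    means = [sum(row[j] for row in background) / n_bg for j in range(len(weights))]
    return [
        sum(abs(w * (row[j] - means[j])) for row in instances) / len(instances)
        for j, w in enumerate(weights)
    ]

def ranks(importance):
    """Rank 1 = most important variable."""
    order = sorted(range(len(importance)), key=lambda j: -importance[j])
    result = [0] * len(importance)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result

def rank_stability(train, instances, weights, size, repeats, rng):
    """Per variable: share of repeats in which it lands on its most common rank."""
    all_ranks = [ranks(global_importance(weights, instances, rng.sample(train, size)))
                 for _ in range(repeats)]
    return [Counter(r[j] for r in all_ranks).most_common(1)[0][1] / repeats
            for j in range(len(weights))]

rng = random.Random(1)
# Hypothetical weights: one dominant, three near-tied, one negligible variable.
weights = [3.0, 1.0, 0.95, 0.9, 0.05]
train = [[rng.gauss(0, 1) for _ in weights] for _ in range(2000)]  # synthetic data
instances = train[:200]  # instances whose explanations are aggregated

stability = rank_stability(train, instances, weights, size=10, repeats=100, rng=rng)
print([round(s, 2) for s in stability])
```

The dominant and negligible variables keep the same rank in every draw, while the near-tied middle variables can swap places when the background sample changes. Rerunning with a larger `size` is a simple way to check how much of that swapping disappears.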

Conclusion & Accessible Code

To conclude, this research shows why users should consider how background data affects SHAP interpretations of ML models before relying on the output. Larger background datasets give more stable results than smaller ones, so choosing a larger background sample helps reduce the fluctuations seen across runs. The code used in this study is publicly accessible, so anyone interested can replicate or build on it.

Created on 21 May. 2023

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.