A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

AI-generated keywords: Data Poisoning Attacks Graph-based Semi-supervised Learning Framework Algorithms Security

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The paper presents a unified framework for data poisoning attacks in graph-based semi-supervised learning (G-SSL)
  • Two specialized algorithms are introduced to address critical scenarios: poisoning regression tasks under $\ell_2$-norm constraint and classification tasks under $\ell_0$-norm constraint
  • For poisoning regression tasks, the authors tackle it as a non-convex trust region problem and demonstrate that their gradient-based algorithm can effectively identify the globally optimal perturbation
  • Addressing classification tasks under $\ell_0$-norm constraint presents an NP-hard integer programming challenge, which is overcome by a probabilistic solver proposed by the authors
  • Experiments on real datasets show that even flipping two labeled data points in a binary classification problem using MNIST data can significantly reduce model performance
  • The study contributes valuable insights into enhancing the security and reliability of G-SSL models against data poisoning attacks, offering practical solutions for safeguarding machine learning systems
Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Xuanqing Liu, Si Si, Xiaojin Zhu, Yang Li, Cho-Jui Hsieh

NeurIPS 2019 camera-ready

Abstract: In this paper, we proposed a general framework for data poisoning attacks to graph-based semi-supervised learning (G-SSL). In this framework, we first unify different tasks, goals, and constraints into a single formula for data poisoning attack in G-SSL, then we propose two specialized algorithms to efficiently solve two important cases --- poisoning regression tasks under $\ell_2$-norm constraint and classification tasks under $\ell_0$-norm constraint. In the former case, we transform it into a non-convex trust region problem and show that our gradient-based algorithm with delicate initialization and update scheme finds the (globally) optimal perturbation. For the latter case, although it is an NP-hard integer programming problem, we propose a probabilistic solver that works much better than the classical greedy method. Lastly, we test our framework on real datasets and evaluate the robustness of G-SSL algorithms. For instance, on the MNIST binary classification problem (50000 training data with 50 labeled), flipping two labeled data is enough to make the model perform like random guess (around 50\% error).

Submitted to arXiv on 30 Oct. 2019

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 1910.14147v1

This paper's license doesn't allow us to build upon its content and the summarizing process is here made with the paper's metadata rather than the article.

The paper "A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning" by Xuanqing Liu, Si Si, Xiaojin Zhu, Yang Li, and Cho-Jui Hsieh presents a comprehensive framework for conducting data poisoning attacks in graph-based semi-supervised learning (G-SSL). This framework unifies various tasks, goals, and constraints into a single formula. The authors introduce two specialized algorithms designed to efficiently address two critical scenarios: poisoning regression tasks under $\ell_2$-norm constraint and classification tasks under $\ell_0$-norm constraint. For the former case of poisoning regression tasks, the authors tackle it as a non-convex trust region problem. They demonstrate that their gradient-based algorithm can effectively identify the globally optimal perturbation with careful initialization and update schemes. In contrast, addressing classification tasks under $\ell_0$-norm constraint presents an NP-hard integer programming challenge. To overcome this complexity, the authors propose a probabilistic solver that outperforms traditional greedy methods significantly. Furthermore, the researchers validate their framework by conducting experiments on real datasets to assess the robustness of G-SSL algorithms. For instance, in a binary classification problem using MNIST data with 50 labeled samples out of 50,000 training instances, simply flipping two labeled data points is enough to reduce model performance to chance levels (around 50% error rate). This study contributes valuable insights into enhancing the security and reliability of graph-based semi-supervised learning models against data poisoning attacks. The proposed framework and algorithms offer practical solutions for safeguarding machine learning systems from malicious manipulations in real-world applications.
Created on 19 Feb. 2024

Assess the quality of the AI-generated content by voting

Score: 0

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.

The license of this specific paper does not allow us to build upon its content and the summarizing tools will be run using the paper metadata rather than the full article. However, it still does a good job, and you can also try our tools on papers with more open licenses.

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.