Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

AI-generated keywords: Model Extraction Marich Query Selection Mutual Information Distributional Equivalence

AI-generated Key Points

The paper focuses on black-box model stealing attacks where an attacker can only query a machine learning model through publicly available APIs.
The authors aim to design a black-box model extraction attack that uses a minimal number of queries to create an informative and distributionally equivalent replica of the target model.
They define distributionally equivalent and max-information model extraction attacks and reduce both attacks into a variational optimization problem.
The attacker solves this problem to select the most informative queries that simultaneously maximize entropy and reduce the mismatch between the target and stolen models, leading to an active sampling-based query selection algorithm called Marich.
Marich is evaluated on various text and image datasets and on different models, including BERT and ResNet18. The extracted models achieve 69-96% of the true model's accuracy using 1,070-6,950 samples from publicly available query datasets that differ from the private training datasets.
Models extracted by Marich yield prediction distributions that are approximately 2-4 times closer to the target's distribution than those obtained with existing active sampling-based algorithms.
Additionally, these extracted models lead to 85-95% accuracy under membership inference attacks.
In the related work discussion, the authors review previous model extraction research and aim to mitigate some of its limitations: they generalize three existing notions of model extraction (task accuracy, fidelity, and functional equivalence) through a novel definition of distributional equivalence.
They also introduce a novel information-theoretic objective for model extraction which maximizes mutual information between target and extracted models over the whole data domain.
Furthermore, they discuss different classes of target models such as linear models or neural networks as well as types of query feedback used in learning-based attack algorithms such as probability vectors or gradients of last layers in neural networks.
Finally, they highlight different types of query datasets used in previous research such as synthetically generated samples, adversarially perturbed private datasets, and publicly available datasets. The authors use publicly available datasets in Marich so that the attack does not depend on access to, or knowledge of, the private dataset or any perturbed version of it.
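
The entropy-driven part of the query selection described in the key points above can be illustrated with a small sketch. It is not the Marich algorithm itself: it only ranks candidate queries from a public pool by the entropy of the extracted model's current predictions and keeps the most uncertain ones, and it leaves out the term that reduces the mismatch with the target model. The names `prediction_entropy`, `select_queries`, `pool_probs`, and `budget` are hypothetical and chosen for illustration, and the probability vectors are made-up toy values.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` candidates with the highest entropy.

    pool_probs: list of probability vectors, one per candidate in the
    public query pool, as predicted by the current extracted model.
    (Hypothetical interface, assumed for this sketch.)
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: prediction_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy pool of four candidates with three classes each (made-up values).
pool = [
    [0.98, 0.01, 0.01],   # confident prediction, low entropy
    [0.40, 0.30, 0.30],   # uncertain prediction
    [0.70, 0.20, 0.10],   # moderately confident
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
]
print(select_queries(pool, budget=2))
```

In an active sampling loop, the selected inputs would then be sent to the target API, and the returned outputs used to retrain the extracted model before the next round of selection.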

Authors: Pratik Karmakar, Debabrota Basu

arXiv: 2302.08466v1 (cs.LG)

Presented in the Privacy-Preserving AI (PPAI) workshop at AAAI 2023 as a spotlight talk

License: CC BY-SA 4.0

Abstract: We study black-box model stealing attacks where the attacker can query a machine learning model only through publicly available APIs. Specifically, our aim is to design a black-box model extraction attack that uses minimal number of queries to create an informative and distributionally equivalent replica of the target model. First, we define distributionally equivalent and max-information model extraction attacks. Then, we reduce both the attacks into a variational optimisation problem. The attacker solves this problem to select the most informative queries that simultaneously maximise the entropy and reduce the mismatch between the target and the stolen models. This leads us to an active sampling-based query selection algorithm, Marich. We evaluate Marich on different text and image data sets, and different models, including BERT and ResNet18. Marich is able to extract models that achieve $69-96\%$ of true model's accuracy and uses $1,070 - 6,950$ samples from the publicly available query datasets, which are different from the private training datasets. Models extracted by Marich yield prediction distributions, which are $\sim2-4\times$ closer to the target's distribution in comparison to the existing active sampling-based algorithms. The extracted models also lead to $85-95\%$ accuracy under membership inference attacks. Experimental results validate that Marich is query-efficient, and also capable of performing task-accurate, high-fidelity, and informative model extraction.

Submitted to arXiv on 16 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.08466v1

In this paper, the authors focus on black-box model stealing attacks where an attacker can only query a machine learning model through publicly available APIs. The goal is to design a black-box model extraction attack that uses a minimal number of queries to create an informative and distributionally equivalent replica of the target model. To achieve this, the authors first define distributionally equivalent and max-information model extraction attacks and then reduce both attacks into a variational optimization problem. The attacker solves this problem to select the most informative queries that simultaneously maximize entropy and reduce the mismatch between the target and stolen models. This leads to an active sampling-based query selection algorithm called Marich.

The authors evaluate Marich on various text and image datasets and on different models, including BERT and ResNet18. They find that Marich is able to extract models that achieve 69-96% of the true model's accuracy using 1,070-6,950 samples from publicly available query datasets that differ from the private training datasets. Models extracted by Marich yield prediction distributions that are approximately 2-4 times closer to the target's distribution than those obtained with existing active sampling-based algorithms. Additionally, these extracted models lead to 85-95% accuracy under membership inference attacks.

In the related work discussion, the authors review previous model extraction research and aim to mitigate some of its limitations. They generalize three existing notions of model extraction (task accuracy, fidelity, and functional equivalence) through a novel definition of distributional equivalence. They also introduce a novel information-theoretic objective for model extraction which maximizes mutual information between target and extracted models over the whole data domain.

Furthermore, they discuss different classes of target models such as linear models or neural networks as well as types of query feedback used in learning-based attack algorithms such as probability vectors or gradients of last layers in neural networks. Finally, they highlight different types of query datasets used in previous research such as synthetically generated samples, adversarially perturbed private datasets, and publicly available datasets. The authors use publicly available datasets in Marich so that the attack does not depend on access to, or knowledge of, the private dataset or any perturbed version of it.
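
The evaluation quantities mentioned in this summary can be made concrete with a short sketch. It assumes that "a percentage of the true model's accuracy" is the ratio of the extracted model's accuracy to the target's accuracy, that fidelity is label agreement between the two models, and that closeness of prediction distributions is measured by KL divergence (a common choice, assumed here rather than taken from the summary). All function names and all numbers below are hypothetical toy values, not results from the paper.

```python
import math


def agreement(predictions, references):
    """Fraction of positions where two label sequences agree."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def mean_kl(target_probs, extracted_probs):
    """Average KL divergence from target to extracted predictions."""
    pairs = list(zip(target_probs, extracted_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Made-up labels for five test inputs.
true_labels = [0, 1, 1, 0, 1]
target_labels = [0, 1, 1, 0, 0]      # predictions of the target model
extracted_labels = [0, 1, 0, 0, 0]   # predictions of the extracted model

target_accuracy = agreement(target_labels, true_labels)
extracted_accuracy = agreement(extracted_labels, true_labels)
relative_accuracy = extracted_accuracy / target_accuracy  # share of target accuracy
fidelity = agreement(extracted_labels, target_labels)     # agreement with target

# Made-up probability vectors for two inputs and two classes.
target_probs = [[0.9, 0.1], [0.2, 0.8]]
extracted_probs = [[0.8, 0.2], [0.3, 0.7]]
distribution_gap = mean_kl(target_probs, extracted_probs)

print(relative_accuracy, fidelity, distribution_gap)
```

Under these assumptions, a statement such as "2-4 times closer" would correspond to the ratio between the `distribution_gap` of a baseline extraction and that of the compared extraction.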

Summary: The paper talks about how someone can steal a machine learning model without having access to it. The authors made a tool called Marich that can copy the model by asking it questions and using the answers to make a new one. They tested Marich on different models and datasets, and it worked well. They also talked about other ways people have tried to steal models before.

Definitions:
- Black-box model stealing attacks: when someone tries to copy a machine learning model without having access to it.
- APIs: a way for different computer programs to talk to each other.
- Distributionally equivalent: when two things have the same pattern or distribution of information.
- Variational optimization problem: a type of math problem where you try to find the best answer from many possible options.
- Active sampling-based query selection algorithm: a tool that helps choose which questions to ask in order to get the most useful information.
- Membership inference attacks: when someone tries to figure out if their data was used in creating a particular machine learning model.
- Fidelity: how closely something matches another thing.
- Information-theoretic objective: a way of measuring how much information is shared between two things.
- Neural networks: computer programs that are designed to learn from data like humans do.

Black-Box Model Stealing Attacks

"In this paper, the authors focus on black-box model stealing attacks where an attacker can only query a machine learning model through publicly available APIs. The goal is to design a black-box model extraction attack that uses a minimal number of queries to create an informative and distributionally equivalent replica of the target model."

Reducing Attack into Variational Optimization Problem

"To achieve this, the authors first define distributionally equivalent and max-information model extraction attacks and then reduce both attacks into a variational optimization problem. The attacker solves this problem to select the most informative queries that simultaneously maximize entropy and reduce the mismatch between the target and stolen models. This leads to an active sampling-based query selection algorithm called Marich."

Evaluating Marich on Various Text & Image Datasets

"The authors evaluate Marich on various text and image datasets and on different models, including BERT and ResNet18. They find that Marich is able to extract models that achieve 69-96% of the true model's accuracy using 1,070-6,950 samples from publicly available query datasets that differ from the private training datasets. Models extracted by Marich yield prediction distributions that are approximately 2-4 times closer to the target's distribution than those obtained with existing active sampling-based algorithms. Additionally, these extracted models lead to 85-95% accuracy under membership inference attacks."

Previous Research in Model Extraction Literature & Mitigating Limitations

"In the related work discussion, the authors review previous model extraction research and aim to mitigate some of its limitations. They generalize three existing notions of model extraction (task accuracy, fidelity, and functional equivalence) through a novel definition of distributional equivalence. They also introduce a novel information-theoretic objective for model extraction which maximizes mutual information between target and extracted models over the whole data domain."

Types Of Query Datasets Used In Previous Research

"Finally, they highlight different types of query datasets used in previous research such as synthetically generated samples, adversarially perturbed private datasets, and publicly available datasets."

Created on 26 Mar. 2023
