Generating Fake Cyber Threat Intelligence Using Transformer-Based Models

AI-generated keywords: Cyber defense, CTI, data poisoning, AI-based systems, GPT-2

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Cyber-defense systems can automatically ingest Cyber Threat Intelligence (CTI) containing semi-structured data and/or text to populate knowledge graphs
  • Fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems
  • Adversaries can use fake CTI examples as training input to subvert cyber defense systems, forcing the model to learn incorrect inputs to serve their malicious needs
  • The researchers automatically generate fake CTI text descriptions using transformers and demonstrate that, given an initial prompt sentence, a fine-tuned public language model such as GPT-2 can generate plausible CTI text capable of corrupting cyber-defense systems
  • The researchers then utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus, introducing adverse impacts such as returning incorrect reasoning outputs, representation poisoning and corruption of other dependent AI-based cyber defense systems
  • In a human evaluation study, professional threat hunters were equally likely to consider the fake generated CTI as true
  • The study highlights the need for increased vigilance about the vulnerability of cyber-defense systems to data poisoning attacks

Authors: Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, Tim Finin

In Proceedings of International Joint Conference on Neural Networks 2021 (IJCNN 2021), July 2021
License: CC BY-NC-ND 4.0

Abstract: Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber defense systems, forcing the model to learn incorrect inputs to serve their malicious needs. In this paper, we automatically generate fake CTI text descriptions using transformers. We show that given an initial prompt sentence, a public language model like GPT-2 with fine-tuning, can generate plausible CTI text with the ability of corrupting cyber-defense systems. We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The poisoning attack introduced adverse impacts such as returning incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber defense systems. We evaluate with traditional approaches and conduct a human evaluation study with cybersecurity professionals and threat hunters. Based on the study, professional threat hunters were equally likely to consider our fake generated CTI as true.

Submitted to arXiv on 08 Feb. 2021


Results of the summarizing process for the arXiv paper: 2102.04351v3

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. This automation carries a risk: fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber-defense systems, forcing the model to learn incorrect inputs that serve their malicious needs.

To explore this risk, researchers Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, and Tim Finin automatically generate fake CTI text descriptions using transformers. They demonstrate that, given an initial prompt sentence, a fine-tuned public language model such as GPT-2 can generate plausible CTI text capable of corrupting cyber-defense systems. They then use the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The attack introduced adverse impacts such as incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber-defense systems.

The researchers evaluated their approach with traditional methods and with a human evaluation study involving cybersecurity professionals and threat hunters. In that study, professional threat hunters were equally likely to consider the fake generated CTI as true. The study highlights the need for increased vigilance about the vulnerability of cyber-defense systems to data poisoning attacks. As adversaries grow more sophisticated, those who develop these systems must keep identifying potential vulnerabilities and taking steps to mitigate them.
Created on 08 Apr. 2023
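
The "incorrect reasoning outputs" mentioned in the summary can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a hypothetical in-memory triple store (`ToyKnowledgeGraph`) and hand-written, invented triples in place of a real CKG and of text extracted from CTI reports. It only shows that a graph which ingests claims without checking their source will return a poisoned claim alongside the trusted one.

```python
from collections import defaultdict


class ToyKnowledgeGraph:
    """Hypothetical stand-in for a CKG: stores (subject, relation, object) triples."""

    def __init__(self):
        self._objects = defaultdict(set)

    def ingest(self, triples):
        """Add triples with no provenance or trust check (the weakness shown here)."""
        for subject, relation, obj in triples:
            self._objects[(subject, relation)].add(obj)

    def query(self, subject, relation):
        """Return every object linked to `subject` by `relation`, sorted for stable output."""
        return sorted(self._objects.get((subject, relation), set()))


# Hand-written, invented triples; "ExampleMalware" is not a real threat.
trusted_triples = [
    ("ExampleMalware", "targets", "Windows"),
    ("ExampleMalware", "uses", "phishing email"),
]
# Invented claim, as might be extracted from a plausible-looking fake report.
fake_triples = [
    ("ExampleMalware", "targets", "Linux"),
]

graph = ToyKnowledgeGraph()
graph.ingest(trusted_triples)
before = graph.query("ExampleMalware", "targets")

graph.ingest(fake_triples)
after = graph.query("ExampleMalware", "targets")

print("before poisoning:", before)
print("after poisoning: ", after)
```

After the second `ingest` call, the same query returns the fake object together with the trusted one, and nothing in the result distinguishes them. A system that tracked the source of each triple could at least filter or down-weight untrusted claims; that design choice is outside what the paper metadata describes.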

