Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

AI-generated keywords: NLG, Machine-generated Text Detection, Responsible AI License (RAIL), Abuse Prevention, Open Problems

AI-generated Key Points

  • Advances in natural language generation (NLG) have made machine-generated text increasingly difficult to distinguish from human-authored text.
  • Access to generative models is becoming more democratized due to open-source models and user-friendly tools.
  • Detecting machine-generated text is crucial to counteract the risk of abuse.
  • The survey provides a comprehensive analysis of threat models posed by contemporary NLG systems.
  • It offers the most complete review of machine-generated text detection methods to date and highlights the need for attention to open problems in this area.
  • Existing detection methodologies often do not reflect realistic settings or incorporate sufficient transparency and fairness methods, so detection systems can themselves cause harm.
  • Collaboration between AI researchers, cybersecurity professionals, and non-technical experts is necessary to prevent widespread abuse of NLG models.
  • Usage and disclosure policies for online platforms, such as bans or enforced rules mandating public disclosure of AI-generated content, can help address the issue.
  • Adjusting licenses for released models to require disclosure can also be beneficial.
  • Adoption or development of licenses like the Responsible AI License (RAIL) can improve best practices around handling powerful NLG models.
  • Coordinated efforts across technical and social domains are essential to harnessing the benefits of high-capacity NLG systems while minimizing societal damage caused by their misuse.

Authors: Evan Crothers, Nathalie Japkowicz, Herna Viktor

Manuscript submitted to ACM Special Session on Trustworthy AI
License: CC BY 4.0

Abstract: Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.

Submitted to arXiv on 13 Oct. 2022

Results of the summarizing process for the arXiv paper: 2210.07321v1

Advances in natural language generation (NLG) have produced machine-generated text that is increasingly difficult to distinguish from human-authored text. Powerful open-source models and user-friendly tools are democratizing access to generative models. The great potential of state-of-the-art NLG systems therefore comes with a risk of abuse, and detecting machine-generated text is a key countermeasure. This survey provides a comprehensive analysis of the threat models posed by contemporary NLG systems and offers the most complete review of machine-generated text detection methods to date. It places machine-generated text within its cybersecurity and social context and provides guidance for future work on the most critical threat models.

The survey highlights the need for attention to open problems in machine-generated text detection. Existing detection methodologies often do not reflect realistic settings or incorporate sufficient transparency and fairness methods, so detection systems can themselves cause harm. Preventing widespread abuse of NLG models requires collaboration between AI researchers, cybersecurity professionals, and non-technical experts.

For future development, usage and disclosure policies for online platforms are worth considering. These policies can take the form of bans or enforced rules that mandate public disclosure of AI-generated content. Researchers can also adjust the licenses of released models to require disclosure, and the adoption or development of licenses like the Responsible AI License (RAIL) can help improve best practices around handling powerful NLG models.

Overall, the survey emphasizes that these open problems need urgent attention in order to provide suitable defenses against widely available NLG models. Coordinated efforts across technical and social domains are essential to harnessing the benefits of high-capacity NLG systems while minimizing the societal damage caused by their misuse.
Created on 21 Dec. 2023
