Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Authors: Lingyi Yang, Feng Jiang, Haizhou Li

License: CC BY 4.0

Abstract: The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have incited awe and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies, including HC3, have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio" method, an innovative measure of ChatGPT's involvement in text generation based on editing distance. It provides a mechanism to measure the degree of human originality in the resulting text. Our experimental results show our proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we proposed offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement, which indicates that a Polish Ratio value greater than 0.2 signifies ChatGPT involvement and a value exceeding 0.6 implies that ChatGPT generates most of the text.

Submitted to arXiv on 21 Jul. 2023

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2307.11380v1

The summary is not ready yet
Created on 02 Aug. 2023

Assess the quality of the AI-generated content by voting

Score: 0

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.