How to detect AI: generated content, writing, plagiarism

ChatGPT is a great tool with many uses. Unfortunately, one of its main appeals is its ability to produce content quickly, and fairly convincingly, from only a few prompts. This has led to AI tools being used to write essays, plagiarise content and create realistic artwork that is then passed off as genuine human work. In this article we look at the tools that have been developed to detect whether something was produced by an AI or by a human.

ChatGPT, the conversational AI developed by OpenAI, never ceases to amaze and panic the Internet in equal measure. Many fields, such as journalism, medicine, cybersecurity and even psychiatry, are currently studying the possibilities offered by an artificial intelligence capable of generating precise, detailed text in natural language from a simple request. They are just as attentive to the abuses and potential problems it can cause.

Education has been caught particularly off guard by ChatGPT. Widely used by students since its release, the tool has proven so effective at writing homework and summaries that it is difficult, even impossible, to tell its output apart from content produced by a human. This is a real problem, in terms of both ethics and transparency. Students see it as a valuable time saver and a way to get a good grade without much effort. It is one problem among many, and it has pushed the tool's developers to look for a way to embed a signature in the text so that content produced by the AI can be distinguished from content written by a human.

In the meantime, teachers find themselves helpless in the face of a technology that is beyond them. ChatGPT has also been banned in New York schools, among other places, where any use of it for homework may result in the student's expulsion from school, or even from higher education. It is one thing to prohibit it, but the rules still have to be enforced, which means being able to detect when an AI tool has been used. Several researchers and developers have already built tools to determine whether a text was produced by an artificial intelligence or by the student.

DetectGPT

A team of Stanford researchers has tackled the problem that many teachers are currently facing and developed a method called DetectGPT. It determines whether or not a text was machine-generated without training a separate classifier or "collecting a dataset of actual or generated passages" to compare the text against. To achieve this, the method detects text sampled from pretrained language models "using the local curvature of the model's logarithmic probability function". In practice, text generated by a model tends to sit near a local peak of that model's log probability, so slightly rewritten versions of it score noticeably lower, while human-written text does not show this pattern as strongly. The researchers did not give more information, except that DetectGPT is currently only a prototype and that a public version will take some time.
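To make the curvature idea more concrete, here is a toy Python sketch of a "perturbation discrepancy" score: the log probability of a text minus the average log probability of slightly perturbed copies. It is an illustration under heavy assumptions, not the researchers' code: a hypothetical unigram model (`VOCAB_PROBS`, `toy_log_prob`) stands in for a real language model, a hypothetical word swap (`toy_perturb`) stands in for the mask-filling rewrites used by DetectGPT, and dividing by the spread of the perturbed scores is a choice made for this sketch.

```python
import math
import random
import statistics
from typing import Callable

# Hypothetical toy "language model": a unigram distribution over a tiny vocabulary.
# A real DetectGPT setup would query the source LLM for log probabilities instead.
VOCAB_PROBS = {
    "the": 0.30, "cat": 0.20, "sat": 0.20, "on": 0.10,
    "mat": 0.10, "quietly": 0.05, "zebra": 0.05,
}
UNKNOWN_PROB = 1e-4  # probability given to words outside the toy vocabulary


def toy_log_prob(text: str) -> float:
    """Sum of per-word log probabilities under the toy unigram model."""
    return sum(math.log(VOCAB_PROBS.get(w, UNKNOWN_PROB)) for w in text.split())


def toy_perturb(text: str, rng: random.Random) -> str:
    """Replace one randomly chosen word with a random vocabulary word.

    DetectGPT itself rewrites spans with a mask-filling model such as T5.
    """
    words = text.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice(sorted(VOCAB_PROBS))
    return " ".join(words)


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Gap between the text's log probability and the mean over perturbed
    copies, divided by the spread of the perturbed scores."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    spread = statistics.pstdev(perturbed) or 1.0  # avoid dividing by zero
    return (original - statistics.mean(perturbed)) / spread


likely = "the the the the"                # every word is the toy model's top choice
unlikely = "zebra quietly zebra quietly"  # every word is among its least likely
print(perturbation_discrepancy(likely, toy_log_prob, toy_perturb))
print(perturbation_discrepancy(unlikely, toy_log_prob, toy_perturb))
```

The first string sits at a local maximum of the toy model, so any perturbation can only lower or keep its score and its discrepancy comes out positive; the second gets a negative value. A detector built on this idea would flag texts whose score exceeds some chosen threshold.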

ChatGPT detector - GPT3

The Franco-Canadian start-up Draft & Goal has just released a free online detector that can tell, in just a few seconds, whether a text was written by an artificial intelligence or a real person, with a claimed reliability of 93%. Simply copy a text of at least 400 characters, paste it into the interface and click the "Analyze" button. The tool then returns a score between 0 and 100. Vincent Terrasi, co-founder of Draft & Goal, explains that "when the result is greater than 60, there is a very high probability that the content comes from the artificial intelligence of ChatGPT. If the result is less than 40, then it is probably the result of human work." Scores in the gray zone between 40 and 60 usually indicate a mix of the two, a technique students often use to reduce the risk of getting caught. The start-up's tool is currently in beta and only works with English texts.
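The score bands described by Vincent Terrasi amount to a simple decision rule. The Python sketch below is purely illustrative: `interpret_score`, `long_enough` and the label strings are hypothetical names, not part of any Draft & Goal interface, and the sketch assumes that the boundary values 40 and 60 themselves fall in the gray zone, since the quote only speaks of results "greater than 60" and "less than 40".

```python
MIN_CHARS = 400  # minimum input length stated for the Draft & Goal detector


def long_enough(text: str) -> bool:
    """Check the 400-character minimum before submitting a text."""
    return len(text) >= MIN_CHARS


def interpret_score(score: float) -> str:
    """Map a 0-100 detector score to the reading given by the co-founder.

    Hypothetical helper: 40 and 60 are treated as part of the gray zone.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 60:
        return "very likely AI-generated"
    if score < 40:
        return "probably human-written"
    return "gray zone: likely a mix of human and AI text"


print(interpret_score(72))
print(interpret_score(51))
```

Keeping the thresholds in one function makes it easy to adjust them if the tool's guidance changes.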

GPTZero

Edward Tian, a computer science and journalism student at Princeton University in the United States, has developed an application whose algorithm can identify whether a text was produced by ChatGPT or written by a human. Called GPTZero, it assesses the text's complexity, its randomness relative to a language model, and its uniformity. Combined, these measures make it possible, in theory, to determine whether the content was created by a human or by an AI.

GPTZero can be accessed for free from any web browser at gptzero.me. Note that the tool is still in beta, so it is often buggy and slow. You may have to wait a few minutes between steps for the page to load, and you should refresh it if an error message appears. To use it, simply copy and paste the text into the corresponding field, then press Ctrl + Enter to have the tool analyze it. Wait a few more minutes for all the results to appear, then click "Get GPTZero Result".

To determine whether a text was written by an artificial intelligence or a human, GPTZero measures its perplexity, which refers to "the randomness of the text". It is "a measure of the ability of a language model like ChatGPT to predict a sample of text. In other words, it measures how well the computer model likes the text." The higher a text's perplexity, the more likely it is to have been written by a human.
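Perplexity has a standard definition: the exponential of the average negative log probability that a language model assigns to each token of a text. The Python sketch below computes it from a list of per-token probabilities. The probability lists are hypothetical stand-ins for what a real model would return, and GPTZero's own implementation is not described here, so this illustrates the metric rather than the tool.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0 < p <= 1 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities a model might assign to two texts.
predictable = [0.9, 0.8, 0.95, 0.85]  # the model expected almost every token
surprising = [0.2, 0.05, 0.3, 0.1]    # the model was often surprised
print(perplexity(predictable))
print(perplexity(surprising))
```

The predictable sequence gets a perplexity just above 1, while the surprising one scores several times higher, which by the reasoning above would point towards human writing.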
