How to detect AI: images, writing, plagiarism

ChatGPT is a great tool with a number of uses. One of its main appeals is its ability to produce content quickly and fairly convincingly from only a few prompts. This has led to the tool being used to write essays, plagiarise content and create realistic artwork that is then passed off as genuine. In this article we will look at the tools that have been developed to tell whether something was produced by AI or by a human.

ChatGPT, the conversational AI developed by OpenAI, never ceases to amaze and panic the Internet in equal measure. Many fields, such as journalism, medicine, cybersecurity and even psychiatry, are currently studying the possibilities offered by an artificial intelligence capable of generating precise and detailed texts in natural language from a simple request. They are equally attentive to the excesses and potential problems that it can generate.

The teaching environment has been particularly taken by surprise by ChatGPT, a tool already widely used by students since its appearance. It has proven so effective at writing homework and summaries that it is difficult, even impossible, to tell its output from content produced by a human. This is a real problem, both in terms of ethics and transparency: students see it as a valuable time saver and a way to obtain a good grade without too much effort. It is one problem among many, and it is pushing the tool's developers to look for a way to embed a signature in the text so that AI-generated content can be told apart from human writing.

In the meantime, teachers find themselves helpless in the face of a technology that is beyond them. AI has also been banned in New York universities, among others, where any use for homework may result in the student's exclusion from school, or even from higher education. Prohibiting is one thing, but the rules still have to be enforced, which means being able to detect the use of an AI tool. Researchers and developers have already built several tools to determine whether a text was produced by an artificial intelligence or by the student.

How to detect AI-generated text?

DetectGPT

A team of Stanford researchers has tackled the problem that many teachers are currently facing and has developed a method called DetectGPT. It estimates whether or not a text is machine-generated without training a separate classifier or "collecting a dataset of actual or generated passages" to compare the text against. To achieve this, the method detects samples from pretrained language models "using the local curvature of the model's logarithmic probability function". In other words, text generated by a model tends to sit at a local peak of that model's probability, so slightly reworded versions of it score noticeably lower, whereas rewording human-written text changes the score much less. The researchers did not give more information, except that DetectGPT is currently only at the prototype stage and that a public version is yet to come.
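The curvature idea can be illustrated with a minimal sketch. It is not the researchers' implementation: the function name `perturbation_discrepancy`, the toy scoring table and the hand-written "reworded" variants are all assumptions made for illustration. In practice the log-probabilities would come from the language model under suspicion and the rewordings from a second model.

```python
from statistics import mean
from typing import Callable, Sequence


def perturbation_discrepancy(
    log_prob: Callable[[str], float],
    text: str,
    perturbed_texts: Sequence[str],
) -> float:
    """Gap between the log-probability of a text and the average
    log-probability of slightly reworded versions of it.

    A large positive gap suggests the text sits at a local peak of the
    model's probability, which is the signal described above.
    """
    if not perturbed_texts:
        raise ValueError("at least one perturbed text is required")
    return log_prob(text) - mean(log_prob(p) for p in perturbed_texts)


# Toy stand-in for a real language model (hypothetical values).
toy_scores = {
    "the cat sat on the mat": -10.0,
    "the cat sat upon the mat": -14.0,
    "a cat sat on the mat": -16.0,
}

gap = perturbation_discrepancy(
    toy_scores.__getitem__,
    "the cat sat on the mat",
    ["the cat sat upon the mat", "a cat sat on the mat"],
)
```

With these toy values the original text scores 5.0 higher than its rewordings on average. Where to set the decision threshold on that gap is a separate choice that this sketch does not cover.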

ChatGPT detector - GPT3

The Franco-Canadian start-up Draft & Goal has created a free online detector that can tell, in just a few seconds, whether a text was written by an artificial intelligence or a real person, with a claimed reliability rate of 93%. Simply copy the text you wish to submit (at least 400 characters), paste it into the interface and click the "Analyze" button. The tool then returns a score between 0 and 100. Vincent Terrasi, the co-founder of Draft & Goal, explains that "when the result is greater than 60, there is a very high probability that the content is from artificial intelligence of ChatGPT. If the result is less than 40, then it is probably the result of human work." In the gray zone between 40 and 60, the text is usually a mixture of the two, a technique often used by students to reduce the risk of getting caught. The start-up's tool is currently in beta and only works with English texts.
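The score bands quoted above can be written down as a small helper. This is only a sketch of how a reader might interpret the number the tool returns. The function name `interpret_score` and the label strings are assumptions, not part of Draft & Goal's product.

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 detector score to the bands quoted in the article."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 60:
        return "probably AI-generated"
    if score < 40:
        return "probably human-written"
    # 40 to 60 inclusive: the gray zone, often a mix of both.
    return "gray zone (possibly mixed)"


labels = [interpret_score(s) for s in (12, 40, 55, 60, 87)]
```

Scores of exactly 40 and 60 fall in the gray zone here, because the quote only speaks of results "greater than 60" and "less than 40".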

GPTZero

Edward Tian, a computer science and journalism student at Princeton University in the United States, has developed an application whose algorithm can identify whether a text was produced by the chatbot or written by a human. Called GPTZero, it analyzes the text to assess its complexity, its randomness compared to a text model, and its uniformity. Combined, these elements make it possible, in theory, to determine whether the content was created by a human or by an AI.

GPTZero can be accessed for free from any web browser at gptzero.me. Note that the tool is still in beta, so it is often buggy and not very fast. You may have to wait a few minutes between steps for the page to load, and you should refresh it if an error message appears. To use it, paste the text into the corresponding field, then press Ctrl + Enter to start the analysis. Wait a few minutes for all the results to be displayed, then click "Get GPTZero Result".

To determine whether the text was written by an artificial intelligence or a human, GPTZero measures the text's perplexity, which refers to "the randomness of the text". It is "a measure of the ability of a language model like ChatGPT to predict a sample of text. In other words, it measures how well the computer model likes the text." The higher the perplexity of a text, the more likely it is to have been written by a human.
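Perplexity has a standard definition: the exponential of the average negative log-probability that a language model assigns to each token of the text. The sketch below computes it from a list of per-token probabilities. The probabilities are made-up values for illustration. GPTZero's own model and thresholds are not public in this article, so nothing here should be read as its implementation.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a text given the probability a model assigned
    to each of its tokens: exp(-mean(log p))."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in the interval (0, 1]")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities from some language model.
predictable = perplexity([0.9, 0.8, 0.95, 0.85])  # model finds the text easy to predict
surprising = perplexity([0.2, 0.05, 0.1, 0.3])    # model is often surprised
```

The second text gets the higher perplexity, which, following the reasoning above, would make it the more likely of the two to be human-written.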

How to detect AI-generated images?

During its I/O 2023 conference, Google emphasized its commitment to developing ethical and transparent AI. As part of its collaboration with Adobe Firefly, Google will now include an "AI Generated with Google" label and metadata on images produced by its AI in Google Images. Users will soon have the option to add a similar watermark to their own generated illustrations across various platforms. Bing Image Creator renderings can be identified by a Bing icon.

Furthermore, Google aims to enable internet users to easily verify the origin of images through a comprehensive history feature. By clicking the "About this image" button in Google's reverse image search, users can find out when and where the image first surfaced on the web. This information allows for a better assessment of its authenticity, whether it originated from a reputable source or emerged from the depths of the internet. The tool will first be available in the United States during the summer, supporting English-language searches, before being released globally. For example, when searching for "moon landing" and selecting the "About this image" button, the tool may reveal that an image first appeared on Reddit, attributed to "Midjourney". Conversely, it may indicate that the image was first published on credible news websites.

What is being done to prevent the spread of false images?

Legislative efforts are slowly emerging to address the challenges posed by social media, but their implementation is still pending. In response to the issue of fake news, TikTok has introduced updated rules targeting deepfakes and manipulated media. The platform now includes a dedicated section on "synthetic and manipulated media" and encourages the use of stickers or captions indicating synthetic, fake, unreal, or modified content to promote transparency. TikTok strictly prohibits deepfakes involving private individuals, although exceptions are made for public figures with significant roles, such as politicians, business leaders, or celebrities, unless the content serves promotional purposes or violates other policies. By contrast, Twitter is committed to "freedom of expression" and maintains a permissive approach, a stance epitomised by Elon Musk.

Europe has actively responded to the rise of artificial intelligence. The European Parliament recently approved the AI Act, legislation aimed at regulating AI activities and addressing associated abuses. MEPs specifically emphasize the need for clear identification of content generated by tools like ChatGPT and Midjourney, with the aim of combating illegal content and information manipulation and protecting copyright. How this regulation will be enforced remains a crucial question to be settled.