Students having their essays written by artificial intelligence. Scammers using it to write phishing emails. Having stunned the internet with its ability to write human-like text, the ChatGPT AI was quickly put to fraudulent or malicious use.
New York schools have even banned the AI, created by the Californian startup OpenAI. But how can you tell whether a text was generated by ChatGPT? Is it even possible?
Clues visible on reading
Even without specialized software, certain turns of phrase can give you a clue: for example, the recurring presence of generic, impersonal words instead of rare words or expressions, such as "the," "it," or "is" in English texts.
The reason? Contrary to appearances, ChatGPT is not really a chatbot but an algorithm that calculates the most likely continuation of a text. If you ask it a question, it infers that the most likely continuation is an answer, but that answer is built from the words most likely to appear given its training data; hence repetitive, common words like "this" or "that," as MIT Technology Review explains.
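To make the "most likely continuation" idea concrete, here is a toy sketch in Python. The tiny bigram table below is an invented stand-in for a real model's training data, not anything ChatGPT actually uses; real models work on far larger contexts and vocabularies.

```python
from collections import Counter, defaultdict

# A tiny, hand-built corpus standing in for a model's training data (pure assumption).
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count bigrams: for each word, how often each next word follows it.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def most_likely_continuation(word):
    """Return the most frequent next word, the way a language model picks its continuation."""
    return bigrams[word].most_common(1)[0][0]

print(most_likely_continuation("the"))  # "cat": it follows "the" most often in the corpus
```

The same mechanism, scaled up to billions of parameters and whole conversations as context, is what makes ChatGPT's answers both plausible and built from statistically common words.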
Another hint: since ChatGPT works with probabilities, if several people ask it the exact same question, it will generate the same answer, the most probable one, for each of them, give or take a few details. A good tip for teachers: if they are grading several papers that look strangely alike, with the same grammatical constructions, the same reasoning, the same examples, those papers may have been generated by an algorithm. This is what caught the eye of a professor in Lyon, half of whose master's students had used ChatGPT to write their papers.
But unlike many humans, ChatGPT makes no mistakes in French. If the text you are reading contains agreement or grammar errors, there is a higher chance it was written by a human.
Detection software
As if to ride the explosion of ChatGPT, more and more sites claim to detect the origin of a text with impressive accuracy. But without an explanation of their method, the promises often sound too good to be true.
One of the most transparent and popular tools is GPTZero. The site, developed by computer science student Edward Tian over his Christmas break, relies on an approach already used against earlier AIs: if an algorithm created a text, a similar algorithm can learn to recognize it.
To tell you whether a text comes from an AI or a human, GPTZero runs it through a model older than ChatGPT, called GPT-2. "It estimates 'perplexity': does GPT-2 find the text familiar? Or is it surprised by sentence lengths or expressions that don't match the probabilities it has learned?" Edward Tian explains to Tech&Co.
So all you have to do is paste your text on the site and hit enter: if the perplexity is high, the text is more likely to be human-written. Added to this is "burstiness," which measures how much that perplexity varies across the text: sentence lengths in an AI-generated text vary little, while human sentences are more irregular.
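The two signals can be sketched in a few lines. To keep the example self-contained, the snippet below is not GPTZero's code: it scores perplexity against a simple word-frequency model instead of GPT-2, and measures burstiness as the spread of sentence lengths. Only the two names come from the article; the implementation is an illustrative assumption.

```python
import math
from collections import Counter

def unigram_model(reference_text):
    """Word probabilities from a reference text (a crude stand-in for GPT-2's learned probabilities)."""
    counts = Counter(reference_text.lower().split())
    total = sum(counts.values())
    # Give unseen words a tiny floor probability so log() is always defined.
    return lambda w: counts.get(w, 0.01) / total

def perplexity(text, prob):
    """Average per-word surprise: low means the model finds the text familiar."""
    words = text.lower().split()
    return math.exp(-sum(math.log(prob(w)) for w in words) / len(words))

def burstiness(text):
    """Standard deviation of sentence lengths: human writing tends to vary more."""
    lengths = [len(s.split()) for s in text.split(".") if s.strip()]
    mean = sum(lengths) / len(lengths)
    return (sum((n - mean) ** 2 for n in lengths) / len(lengths)) ** 0.5

model = unigram_model("the cat sat on the mat " * 50)
print(perplexity("the cat sat on the mat", model))           # low: familiar wording
print(perplexity("quantum flux harmonics resonate", model))  # high: surprising wording
```

A detector combines both scores: text that is uniformly low-perplexity and low-burstiness looks machine-generated, while surprising, unevenly paced text looks human.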
No quick fix
But GPTZero cannot confirm with 100% certainty that a text comes from a human or a machine, for a simple reason: that is currently impossible.
All current detection tools are imperfect and make mistakes more or less often. For example, we asked ChatGPT to write like a 4-year-old (in English): the software alternated between classic sentences and shorter, exclamatory ones, a text varied enough to fool GPTZero.
It is also easy to manually add errors to an AI-written text, or to rephrase it slightly to make it undetectable. The process can even be automated: on Twitter, a computer scientist explains that he built a program that adds invisible spaces in the middle of certain words, turning them into words unknown to GPTZero, which is thrown off and classifies the text as a human creation.
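The invisible-space trick can be reproduced in a few lines. This is a sketch of the general idea, not the actual program from the Twitter thread: it inserts a zero-width space (U+200B), a Unicode character that renders as nothing, into the middle of longer words so that a detector's tokenizer no longer recognizes them.

```python
ZWSP = "\u200b"  # zero-width space: displays as nothing, but changes the underlying string

def sabotage(text, min_len=5):
    """Insert an invisible character mid-word so a detector sees unknown words."""
    words = []
    for word in text.split(" "):
        if len(word) >= min_len:
            mid = len(word) // 2
            word = word[:mid] + ZWSP + word[mid:]
        words.append(word)
    return " ".join(words)

altered = sabotage("This essay was generated automatically")
print(altered)                                              # looks identical on screen
print(altered == "This essay was generated automatically")  # False: the strings differ
```

To a human reader the output is indistinguishable from the original, but to software comparing strings against learned probabilities, "gene​rated" is simply an unknown word.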
Finally, GPTZero is less effective in languages other than English, because GPT-2 was mainly trained on English texts.
Edward Tian is aware of these flaws. He points out that his model is currently designed to detect academic cheating; it matters little if it lets through texts written in the style of a 4-year-old.
"Marking" AI-generated texts upstream
These detection methods have another shortcoming: "To calculate an algorithm's perplexity over a text, you need a lot of information about the algorithm in question," Edward Tian tells Tech&Co. "The model itself, the parameters, the weights…"
GPTZero works thanks to GPT-2, a model released by OpenAI in 2019 but since largely surpassed by ChatGPT. "If other companies create even more sophisticated but non-transparent AIs, detection could become much more complicated," Edward Tian admits to Tech&Co.
That is why one strategy would be to make AI-generated texts clearly identifiable from the moment they are created, by adding a distinctive sign that leaves no room for doubt.
This is the strategy OpenAI is pursuing: the company intends to tweak its upcoming algorithms so that "whenever GPT generates a long text, there is an imperceptible secret signal in its word choices, which you can later use to prove that, yes, this came from GPT," according to researcher Scott Aaronson, who recently joined the startup. On campus as on the internet, the hunt has only just begun.
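Aaronson has described the scheme only at a high level, and OpenAI has published no implementation. A common way to illustrate the general idea (a purely hypothetical sketch, not OpenAI's method) is to bias the generator's word choices toward a secret, pseudorandomly chosen "green list"; a verifier holding the same key can later count how green a text is.

```python
import hashlib

SECRET_KEY = "hypothetical-shared-secret"  # assumption: known only to the generator and the verifier

def is_green(word):
    """Pseudorandomly assign each word to a secret 'green list' covering roughly half the vocabulary."""
    digest = hashlib.sha256((SECRET_KEY + word.lower()).encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text):
    """A watermarked generator would favor green words, pushing this fraction well above 0.5."""
    words = text.split()
    return sum(is_green(w) for w in words) / len(words)

# Ordinary text should hover near 0.5; a biased generator's output would score much higher.
print(green_fraction("the quick brown fox jumps over the lazy dog"))
```

The appeal of such a signal is that it is statistical: invisible in any single sentence, but detectable with high confidence over a long text, provided you hold the key.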
Source: BFM TV
