If you paste the United States' most important legal document, the Constitution, into a tool built to detect text written by artificial intelligence chatbots such as ChatGPT, it will tell you there is up to a 96 percent chance the Constitution was written by AI. Unless James Madison was a time traveler, which is not our main topic right now, something has gone wrong.
Why do AI text detectors give false positives?
Today, AI text detectors have generated headlines and controversy: agitated professors have grown suspicious of entire classes over suspected AI use, and students, even schoolchildren, have landed in trouble after finishing their own assignments, accused of cheating with ChatGPT.
These tools have at times thrown students into an identity crisis, while teachers and professors, whose methods have evolved considerably over the past few decades, are forced to deal with a new problem: beyond assessing students' mastery of a subject through essays, they now have to interrogate the work itself. Did the student write this text, or did artificial intelligence help?
Although leaning on an AI text detector feels as clever as leaning on AI itself, the evidence shows these tools are unreliable. Judging by their frequent wrong answers, tools such as GPTZero, ZeroGPT, and even the AI Text Classifier developed by OpenAI are of little use for detecting the output of large language models (LLMs) such as ChatGPT.
As the screenshot above shows, entering part of the US Constitution into GPTZero produces a verdict that the text is 96 percent likely to be AI-written. Over the past six months, numerous screenshots of similarly confusing and comical results from other detectors have circulated on social networks and in the media.
Attributing the US Constitution to artificial intelligence is just one small example of the confusion; the tools have not even spared the Bible. To understand why they make such mistakes, we first need to understand how they work.
How AI text detectors work
Different AI text detectors use broadly similar methods but apply different logic. At their core are language models trained on large corpora, millions of documents, plus a set of rules for telling human-written text apart from AI-written text.
For example, the heart of GPTZero is a neural network trained on a large, diverse collection of human-written and AI-generated text, with a focus on fluent English prose. The system then evaluates and classifies a text using features such as perplexity and burstiness.
In machine learning, perplexity measures how far a given text deviates from what a language model saw during training, in other words, how surprised the model is by it. Language models such as ChatGPT lean on their best resource, the training data, from the very start: the closer their output stays to that data, the lower its perplexity.
Meanwhile, humans, messy writers though we are, can also produce low-perplexity text. Legal writing and academic or formal prose, in particular, reuse many of the same stock phrases.
Now consider two examples, one ordinary and one strange. We have all seen "I would like a glass of ......" completed with water, tea, or coffee; these fillers match the language model's training data, so the sentence's perplexity is very low. In the second example, "I would like a glass of spider" ignores what belongs in the blank and surprises human and language model alike, so its perplexity is high. As the picture below shows, Google returns about 3.7 million results for "I want a cup of coffee" and only one for "I want a cup of spider".
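To make perplexity concrete, here is a minimal sketch of how it is computed from a model's per-token probabilities. The probability values below are made up for illustration, not real model outputs:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigns to each token in the text."""
    nll = -sum(math.log(p) for p in token_probs)
    return math.exp(nll / len(token_probs))

# Hypothetical per-token probabilities a model might assign.
# A predictable completion is scored confidently...
water = [0.9, 0.8, 0.9, 0.7, 0.6]      # "... a glass of water"
# ...while a surprising final token gets near-zero probability.
spider = [0.9, 0.8, 0.9, 0.7, 0.0001]  # "... a glass of spider"

print(perplexity(water))   # low: the text matches the training data
print(perplexity(spider))  # high: the model is "surprised"
```

The only difference between the two lists is the last token, yet it dominates the score, which is exactly why unusual word choices push perplexity up.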

When the language of a passage is unsurprising given the model's training, its perplexity is low, and detectors grow suspicious, classifying the passage as AI-generated. All of which slowly brings us back to the interesting case of the US Constitution.
In fact, the Constitution's wording is so deeply ingrained in these models that detectors classify it as AI text with high confidence. Edward Tian, the creator of GPTZero, said of the US Constitution:
The US Constitution is fed repeatedly into the training data of many language models. As a result, many large language models have been trained to produce text resembling the Constitution and other frequently repeated texts.
But the main problem is that humans can also write low-perplexity text. If we write with simple verbs, common words, and a plain style, telling human from machine becomes difficult, and a wide range of users will be misled.

Another text feature measured by GPTZero is "burstiness": how much sentence length and structure vary across a text. Overall, burstiness assesses variety and rhythm both within sentences and across the text as a whole.
Human writers use dynamic styles, so the structure and length of their sentences vary. We can write long, complex sentences as well as short ones, cram one sentence full of adjectives and use none at all in the next. This variety is a natural product of human creativity and spontaneity.
AI-generated text, by contrast, has a uniform, formal texture, at least in many cases. Language models, still early in their development, tend to write sentences of similar structure and length. That lack of variety produces a low burstiness score, which detectors read as a sign of AI authorship.
However, burstiness is not an infallible criterion for detecting AI content; like perplexity, it has exceptions. A human author may write in a highly uniform, structured style and earn a low burstiness score. Conversely, an AI model can be trained to vary its sentences more like a human, raising its burstiness score. Language models keep improving, and studies show their output increasingly resembles human writing.
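Detectors do not publish their exact formulas, but one rough, illustrative way to score burstiness is the spread of sentence lengths. This is a toy sketch, with made-up example sentences, of that idea:

```python
import re
import statistics

def burstiness(text):
    """Toy burstiness score: the (population) standard deviation of
    sentence lengths in words. Higher = more varied, more 'bursty'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

# Varied sentence lengths -> higher burstiness.
human_like = ("I was late. The train broke down somewhere outside the city, "
              "stranding hundreds of commuters in the rain. Typical.")
# Uniform sentence lengths -> lower burstiness.
ai_like = ("The train was delayed today. Many commuters were affected badly. "
           "The weather made things worse.")

print(burstiness(human_like) > burstiness(ai_like))  # True
```

Even this crude measure separates the two samples, but it also shows the weakness described above: a human who happens to write evenly sized sentences would score just like the "AI" sample.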
In general, there is no magic formula for telling human text from AI text. Detectors can make educated guesses, but their margin of error is too large for the results to be relied on.
A 2023 study by researchers at the University of Maryland found that AI text detectors are unreliable in many practical scenarios and can perform little better than random classification.
Artificial intelligence researcher Simon Willison said:
In my opinion, these text detectors are snake oil. Everyone wants such a product to exist, and it is easy to sell a product everyone wants, but the harm it can do matters just as much.
In addition, a recent Stanford University study showed that AI text detectors are biased against non-native English writers: their texts are flagged as AI-generated far more often than those of native English speakers.
The cost of false accusations by AI text detectors

Some people, like Ethan Mollick at the Wharton School, embrace AI and even encourage students to use tools like ChatGPT to learn better. In his view, there is no reliable tool for detecting output from Bing, Bard, or ChatGPT, and the current detectors were designed around GPT-3.5.
He also points out that these tools fail easily, with error rates above 10 percent, and that ChatGPT itself cannot tell you whether a given text was written by AI.
In its interview with Ars Technica, GPTZero appeared aware of the criticism and user dissatisfaction, and said it plans to branch out beyond the vanilla detection tool.
Tian added:
Compared to detection tools like Turnitin, we are trying to move away from building that kind of service. The next version of GPTZero will not be a plain detector: rather than simply labeling text as human- or AI-written, it will highlight which parts are which, so that teachers and students can navigate AI as it evolves.
The interviewer then asked Tian what he thought about schools using GPTZero to accuse students, and he said:
We don't want people using our tool to punish students. Instead, among teachers, whether enthusiastic about AI or indifferent to it, it would be better to reduce reliance on detection tools in education. We need to put our technology in front of these communities, get their feedback, and understand what is really going on.
Even though many problems have been raised around AI text detectors and continue to trouble users, GPTZero still offers its tool to teachers and proudly advertises the list of universities that use it.
There is also an odd tension between Tian's stated goal of not punishing students and his desire to profit from his invention. Whatever the intent, the use of these tools has had disastrous effects on students.
One widely discussed recent story in the US involved a student accused of cheating on the strength of a detector's verdict. He eventually proved his innocence with evidence from his recent search history, but the stress of having to defend himself drove him to a nervous breakdown.
AI writing is hard to detect, and that may not change anytime soon
Given the high false-positive rate, and the even higher rates for non-native English writers, it is clear that AI text detection is far from infallible, and that gap will not close soon. Humans can write like machines, and machines can write like humans.
Artificial intelligence is here to stay, and used wisely it can advance many fields. A teacher who knows the subject a student is writing about can gauge the student's understanding simply by asking questions about the topic.
Writing is not only about demonstrating knowledge; part of it is about staking your reputation. If authors cannot stand up and defend every claim they make in their text, they have not used their own intelligence and skills properly.