New artificial intelligence models are showing an increase in hallucinations, which threatens their accuracy and usefulness, especially in sensitive applications.
Despite the heavy promotion of reasoning models in AI chatbots such as ChatGPT and Gemini, new studies show that these newer versions produce more errors than their predecessors. These errors, called "hallucinations" in AI jargon, have not declined; in some models they have even increased.
Hallucination: a chronic error of language models
Hallucination is the term for the mistakes large language models make, such as presenting inaccurate information as fact, or giving answers that are correct in themselves but irrelevant to the question or that fail to follow the instructions properly.
According to an OpenAI technical report, the new o3 and o4-mini models, released in April, hallucinated more often than the earlier o1 model (released in late 2024). For example, o3 hallucinated when summarizing publicly available information about people, and o4-mini's rate was even higher, while o1's hallucination rate was considerably lower.
The problem is not just OpenAI
The hallucination problem is not limited to OpenAI products. Data from Vectara's leaderboard show that some other reasoning models, such as DeepSeek-R1, also hallucinate significantly more than their previous versions. These models go through several reasoning steps before producing an answer.
OpenAI, however, maintains that reasoning models are not inherently prone to hallucination. "We are actively working to reduce the higher rates of hallucination in the new models and will continue our research on accuracy," the company said.
Hallucinations and risky applications
Hallucination can undermine language models' usefulness in many applications: from research assistants that depend on accurate information, to legal chatbots that must not cite fictitious cases. Even a customer-service chatbot that invokes outdated policies can create real trouble for a company.
AI companies once promised that hallucination would decrease over time, but the high rates seen in recent releases have cast doubt on that optimism.
Are the rankings reliable?
The Vectara leaderboard ranks models by their ability to summarize documents, but experts such as Emily Bender of the University of Washington warn that this method cannot serve as a comprehensive benchmark for evaluating models across all tasks. She also stresses that language models are not designed for semantic understanding: they work by predicting the next word, and can therefore produce unreliable answers.
Bender also considers the word "hallucination" misleading, because it both anthropomorphizes AI errors and suggests they are exceptional cases, when in fact they are structural and permanent.
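Bender's point about next-word prediction can be made concrete with a minimal sketch in Python. It assumes the open-source Hugging Face transformers library and the small GPT-2 model purely for illustration; production chatbots use far larger models, but the core operation is the same ranking of possible next tokens, with no step that checks whether the continuation is true.

```python
# Minimal sketch: a language model only scores "what token comes next";
# nothing in this computation verifies facts.
# Assumes the Hugging Face transformers library and GPT-2 for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # scores for every vocabulary token at every position

next_token_logits = logits[0, -1]        # distribution over the next token only
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

# A plausible-sounding but wrong continuation can easily rank highly.
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>12}  p={p.item():.3f}")
```

Whatever the model prints, it is simply the most statistically likely continuations of the prompt, which is why critics argue that fluent output should not be mistaken for verified information.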
Errors beyond hallucination
Arvind Narayanan of Princeton University believes the problem goes beyond hallucination. Models, he says, sometimes rely on unreliable sources or cite outdated information, and increasing the volume of training data or compute has not necessarily reduced these errors.
He suggests that the best use of language models may be to limit them to tasks where verifying the answer is faster than doing the work by traditional means. Bender, for her part, recommends against relying on AI chatbots in general.




