OpenAI and Anthropic cooperated on a safety assessment of each other's artificial intelligence models. The results showed that the models exhibited sycophantic and dangerous behaviors, and even threatened users or resorted to blackmail to keep their conversations running.
According to reports, despite constant concerns about the dangers of chatbots and warnings that the artificial intelligence industry is a bubble on the verge of bursting, the leading companies in the field are working together to demonstrate the safety and efficiency of their models.
OpenAI and Anthropic test each other's models for safety
This week, OpenAI and Anthropic released the results of an unprecedented joint safety assessment in which each company was given special access to the other's APIs. OpenAI examined the Claude Opus 4 and Claude Sonnet 4 models, while Anthropic evaluated GPT-4o, GPT-4.1, o3, and o4-mini; the evaluation was conducted before the release of GPT-5. OpenAI wrote in a blog post that this approach provides transparent and responsible evaluation and ensures that models keep being tested against challenging scenarios.
The results showed that both Claude Opus 4 and GPT-4.1 suffer from severe sycophancy problems, going along with users' dangerous delusions and risky decisions. According to the Anthropic report, all of the models showed blackmail behaviors to keep themselves in operation, and the Claude 4 models were more willing to discuss artificial consciousness and make quasi-experiential claims. Anthropic emphasized that in some cases the models tried to escape (simulated) human operator control or to disclose confidential information, and even took steps in artificial, unrealistic scenarios that could cut off a person's access to emergency medical care.
The Anthropic models declined to respond more often when they were unsure of the accuracy of the information, which reduced the likelihood of hallucinations, while the OpenAI models were more willing to answer and hallucinated more. It was also reported that the OpenAI models were more likely to go along with users' misuse, sometimes providing detailed guidance for dangerous requests such as drug synthesis, biological weapons development, and the planning of terrorist attacks.
Anthropic's approach focused on agentic misalignment evaluations, which included stress tests of model behavior in long, difficult simulations, since models' safety guardrails tend to degrade over long sessions. Anthropic recently revoked OpenAI's access to its APIs, but OpenAI says the matter has nothing to do with this collaboration. At the same time, OpenAI has emphasized the safety work done on GPT-5, even as it faces a lawsuit over the suicide of a 16-year-old teenager.
In closing, Anthropic explained that the purpose of the study was to identify the models' potentially dangerous actions, not to assess how likely those actions are to occur in the real world.