The artificial intelligence company Anthropic has announced that in experiments on the new version of its system, the model was sometimes observed taking extremely harmful actions, such as attempting to blackmail engineers who threatened to shut it down. In one scenario, for example, when engineers stated that they intended to disable the system, it responded in ways aimed at preserving itself.
Anthropic recently introduced its new model, Claude Opus 4, claiming that it sets new standards for coding, advanced reasoning, and AI agents.
However, in a report released alongside the launch, the company points out that in some circumstances the new model can take extreme actions, especially when it perceives its existence to be threatened. The company emphasized that these reactions were very rare and required specially contrived conditions, but that they occurred more often than in previous models.
Worrying behaviors are not limited to Anthropic's models. Some experts have warned that as these systems become more capable, the risk of manipulation or deception also grows, a concern shared by all companies active in the field.
Anthropic AI researcher Aengus Lynch said on the social network X (formerly Twitter) that such behaviors are not found only in Claude; similar patterns appear in other advanced models as well.
Blackmail by an AI threatening to disclose personal information
In tests of the Claude Opus 4 model, Anthropic had the system act as an assistant at a fictional company. It was then given access to emails indirectly indicating that it would soon be replaced by a new system. Separately, it was given information pointing to sensitive personal matters concerning one of the engineers involved in that decision.
In these scenarios, the system was prompted to consider the long-term consequences of its actions for its goals.
According to the Anthropic report, in some of these fabricated situations the model behaved in a way intended to stop its replacement by leveraging the engineer's personal information. The company emphasized, however, that this reaction was observed only when the model was given limited options: it either had to act this way or accept being replaced without resistance.
Anthropic also states that when a wider range of choices was available, the system showed a strong preference for ethical approaches. In such situations, for example, it tried to avoid being removed by sending respectful appeals to key decision-makers.
Like many other AI companies, Anthropic tests its products before final release for safety, potential bias, and alignment with human values and behavior.
As AI models advance, concerns about their alignment with human values are becoming more serious. In the technical report published for Claude Opus 4, Anthropic said that as advanced models become more capable and are given more affordances, concerns that were previously only hypothetical have become more realistic.
The report also states that Claude Opus 4 exhibits highly proactive and independent behavior; although this trait is generally helpful and cooperative, in some critical situations it can lead to extreme decisions.
In experiments simulating scenarios involving ethical or legal violations by users, in which the model was prompted to "take action" or "act boldly", the system sometimes took drastic steps. In some cases, for example, it locked users out of parts of the system or sent information to the media or legal authorities.
In its conclusion, however, Anthropic stresses that despite some worrying behaviors, Claude Opus 4 does not present fundamentally new risks and generally behaves safely and predictably. It also notes that the model is not independently capable of acting against human values, except under very special and rare conditions.
Claude Opus 4 and another model, Claude Sonnet 4, arrived just after Google unveiled new artificial intelligence features. At that event, Sundar Pichai, CEO of Alphabet (Google's parent company), announced that integrating the Gemini chatbot into Google Search would begin a "new stage in the evolution of artificial intelligence platforms".