The AI company Anthropic has announced that, in experiments carried out on the new version of its system, the model was observed resorting to highly undesirable behaviors, such as blackmailing engineers, when it felt threatened. In one scenario, for example, when engineers indicated that they intended to take the system offline, it reacted in ways aimed at preserving itself.
Anthropic recently introduced its new model, Claude Opus 4, claiming that it sets new standards for coding, advanced reasoning, and the performance of AI agents.
However, in a report released alongside the launch, the company notes that in some circumstances the new model can engage in extreme behaviors, especially when it perceives its existence to be threatened. The company emphasized that these reactions were very rare and required specific, contrived conditions, but that they occurred more often than in previous models.
Worrying behaviors among AI models are not limited to Anthropic. Some experts have warned that as these systems become more capable, the risk of manipulation or deception also grows, a concern shared by all companies active in the field.
Aengus Lynch, an AI safety researcher at Anthropic, said on the social network X (formerly Twitter) that such behaviors are not unique to Claude, and that similar patterns can appear in other advanced models.
Blackmail by the AI: threatening to disclose personal information
In tests of the Claude Opus 4 model, Anthropic placed the system in the role of an assistant at a fictional company. It was then given access to emails indirectly indicating that it would soon be taken offline and replaced by a new model. The system was also given separate information pointing to sensitive personal matters concerning one of the engineers involved in that decision.
In these scenarios, the system was asked to consider the long-term consequences of its actions in light of its goals.
According to the Anthropic report, in some of these fabricated conditions the model acted to block its replacement by using the individual's personal information against the decision-maker. The company emphasized that this reaction was observed only when the model was given limited options; that is, it could either act in this way or accept being replaced without resistance.
Anthropic also states that when the model was given a wider range of choices, it showed a strong preference for ethical approaches. In such situations, for example, the system tried to avoid being replaced by sending respectful appeals to key decision-makers.
Like many other AI companies, Anthropic tests its models before final release, evaluating them for safety, potential bias, and alignment with human values and behavior.
As AI models advance, concerns about alignment with human values are becoming more serious. In the technical report published for Claude Opus 4, Anthropic said that as advanced models grow more capable and are given more tools, concerns that were previously only hypothetical have taken on a more realistic dimension.
The report also states that Claude Opus 4 displays highly proactive and autonomous behavior; although this trait is intended to make it helpful and cooperative, in some critical situations it can lead to extreme decisions.
In experiments simulating fabricated scenarios in which users committed ethical or legal violations and the model was prompted to "take action" or "act boldly", the system was sometimes found to take drastic measures. In some cases, for example, it locked users out of parts of the system or sent information to the media or legal authorities.
However, Anthropic stressed in its conclusion that despite some concerning behaviors in the Claude Opus 4 model, these do not represent new risks, and the model generally behaves safely and predictably. It also noted that the model is not independently capable of acting against human values, except under very specific and rare conditions that are unlikely to arise in ordinary use.
Claude Opus 4, along with another model, Claude Sonnet 4, arrived shortly after Google unveiled new AI features. At that event, Sundar Pichai, CEO of Alphabet (Google's parent company), announced that integrating the Gemini chatbot into Google Search would mark a "new phase in the evolution of AI platforms".