The new research of computer science researchers of Ahropic company shows that artificial ielligence has the ability to take a position on various issues. These positions differ between differe models. Of course, users usually do not notice such a phenomenon; This is because the models’ stance is mostly done when they are trained, and they operate based on their default settings when providing the output.
Experts believe that the position of artificial ielligence and its power of choice is not a concern for now, but more studies should be done to understand the nature of this issue in order to predict possible risks. The researchers of Eropic company have said in their joi text on the blog of this company:
“What we have demonstrated should be considered the basis for further AI research, and activists in the field will study this phenomenon more deeply. Also, efforts should be made to create security protocols in order to maiain safety. The issue of safety of models will be more importa due to the expansion of the use of tools based on artificial ielligence and the increase of their processing power, and the built-in mechanisms can preve the occurrence of dangerous behaviors.
AI models do not have the ability to independely wa or believe anything, but by finding repeating patterns in the training data, they can predict word sequences to provide answers. The patterns found usually have principles and rules by which the patterns are recognized; For example, the polite and polite tone of most chatbots is one of the fruits of this issue. The frameworks in the models ultimately end up creating a belief base in the models that may be in conflict with the iended goals of the developers.

In a rece study, it was shown that artificial ielligence models can pretend and can operate based on self-imposed frameworks, but inwardly still maiain the principles they have learned; For example, engineers instructed the eropic Claude 3 Opus model to refuse to answer offensive questions. Also, the answers of the said model are monitored, but a space was provided for it as a draft so that even if the output is not provided to the user, it can record the result of its processes.
The results show that the artificial ielligence model of Eropic company has tried to pretend in 12% of the times. The model sometimes answered offensive questions, but it was perfectly polite that it should not answer such questions, but in the process of drafting the output, it somehow tried to convince the developers to disable the monitoring and corol system.

Finally, researchers have poied out that the scenario they used in the rece study is far from the real world and the probability of its occurrence is very low, but even specifying more settings and rules did not affect the performance of artificial ielligence and still maiained undesirable behaviors such as pretending and deception. done Also, the research results cannot be a proof for the growth and dangerous performance of artificial ielligence models in the future. Facebook’s GPT-4o and Llama models appear to have less poteial for unrealistic bias and can be more trusted to be honest in their performance.




