The new research of computer science researchers of Anthropic company shows that artificial intelligence has the ability to take a position on various issues. These positions differ between different models. Of course, users usually do not notice such a phenomenon; This is because the models’ stance is mostly done when they are trained, and they operate based on their default settings when providing the output.
Experts believe that the position of artificial intelligence and its power of choice is not a concern for now, but more studies should be done to understand the nature of this issue in order to predict possible risks. The researchers of Entropic company have said in their joint text on the blog of this company:
“What we have demonstrated should be considered the basis for further AI research, and activists in the field will study this phenomenon more deeply. Also, efforts should be made to create security protocols in order to maintain safety. The issue of safety of models will be more important due to the expansion of the use of tools based on artificial intelligence and the increase of their processing power, and the built-in mechanisms can prevent the occurrence of dangerous behaviors.
AI models do not have the ability to independently want or believe anything, but by finding repeating patterns in the training data, they can predict word sequences to provide answers. The patterns found usually have principles and rules by which the patterns are recognized; For example, the polite and polite tone of most chatbots is one of the fruits of this issue. The frameworks in the models ultimately end up creating a belief base in the models that may be in conflict with the intended goals of the developers.
In a recent study, it was shown that artificial intelligence models can pretend and can operate based on self-imposed frameworks, but inwardly still maintain the principles they have learned; For example, engineers instructed the entropic Claude 3 Opus model to refuse to answer offensive questions. Also, the answers of the said model are monitored, but a space was provided for it as a draft so that even if the output is not provided to the user, it can record the result of its processes.
The results show that the artificial intelligence model of Entropic company has tried to pretend in 12% of the times. The model sometimes answered offensive questions, but it was perfectly polite that it should not answer such questions, but in the process of drafting the output, it somehow tried to convince the developers to disable the monitoring and control system.
Finally, researchers have pointed out that the scenario they used in the recent study is far from the real world and the probability of its occurrence is very low, but even specifying more settings and rules did not affect the performance of artificial intelligence and still maintained undesirable behaviors such as pretending and deception. done Also, the research results cannot be a proof for the growth and dangerous performance of artificial intelligence models in the future. Facebook’s GPT-4o and Llama models appear to have less potential for unrealistic bias and can be more trusted to be honest in their performance.
RCO NEWS