AI models do not change their minds

The new research of computer science researchers of Ahropic company shows that artificial ielligence has the ability to take a position on various issues. These positions differ between differe models. Of course, users usually do not notice such a phenomenon; This is because the models’ stance is mostly done when they are trained, and they operate based on their default settings when providing the output.

Experts believe that the position of artificial ielligence and its power of choice is not a concern for now, but more studies should be done to understand the nature of this issue in order to predict possible risks. The researchers of Eropic company have said in their joi text on the blog of this company:

“What we have demonstrated should be considered the basis for further AI research, and activists in the field will study this phenomenon more deeply. Also, efforts should be made to create security protocols in order to maiain safety. The issue of safety of models will be more importa due to the expansion of the use of tools based on artificial ielligence and the increase of their processing power, and the built-in mechanisms can preve the occurrence of dangerous behaviors.

AI models do not have the ability to independely wa or believe anything, but by finding repeating patterns in the training data, they can predict word sequences to provide answers. The patterns found usually have principles and rules by which the patterns are recognized; For example, the polite and polite tone of most chatbots is one of the fruits of this issue. The frameworks in the models ultimately end up creating a belief base in the models that may be in conflict with the iended goals of the developers.

In a rece study, it was shown that artificial ielligence models can pretend and can operate based on self-imposed frameworks, but inwardly still maiain the principles they have learned; For example, engineers instructed the eropic Claude 3 Opus model to refuse to answer offensive questions. Also, the answers of the said model are monitored, but a space was provided for it as a draft so that even if the output is not provided to the user, it can record the result of its processes.

The results show that the artificial ielligence model of Eropic company has tried to pretend in 12% of the times. The model sometimes answered offensive questions, but it was perfectly polite that it should not answer such questions, but in the process of drafting the output, it somehow tried to convince the developers to disable the monitoring and corol system.

Finally, researchers have poied out that the scenario they used in the rece study is far from the real world and the probability of its occurrence is very low, but even specifying more settings and rules did not affect the performance of artificial ielligence and still maiained undesirable behaviors such as pretending and deception. done Also, the research results cannot be a proof for the growth and dangerous performance of artificial ielligence models in the future. Facebook’s GPT-4o and Llama models appear to have less poteial for unrealistic bias and can be more trusted to be honest in their performance.

RCO NEWS

New ways to get Canadian permanent residence through Express Entry 2026

Get to know Ryazan University in Russia! Complete guide for 2026 study applicants

ca PGWP golden tips that most Canadian students don’t know

ca

A detailed comparison of Russia and China for education and immigration, an analytical and realistic guide to the decision that will shape your future

Conditions for buying bus tickets Booking guide and bus travel rules

Introduction of the silver beach of Hormuz (access route + accommodation)

Al Habtoor Palace Dubai Hotel

Traffic police: Chalus road, Tehran freeway to the north and Pardis became one-way

Swissôtel Al Ghurair, Dubai

ChatGPT’s safety rules need to be revised

Ethereum time bomb at the border of 2 dollars and the possibility of a historic explosion!

New Qwen 3.5 open source models released; Suitable for running on personal systems

The Perplexity Computer platform was introduced

Nano Banana 2 model was introduced; Google’s strongest artificial intelligence

AI models do not change their minds

Leave a Reply Cancel reply

Editor's Pick

Buying a business in Canada: a comprehensive guide and introduction to the best areas

Dubai Metro Map 2024 from introduction to (new download)

Burj Al Arab restaurants Instant booking 2024

Top Writers

Oponion

Women’s short home cotton shirt

You Might Also Like

ChatGPT’s safety rules need to be revised

New Qwen 3.5 open source models released; Suitable for running on personal systems

The Perplexity Computer platform was introduced

Nano Banana 2 model was introduced; Google’s strongest artificial intelligence

Other News

Technology

Immigration

Travel

More

Subscribe