Anthropic unveiled 3 artificial intelligence agents to investigate the behavior of models

Startup Ahropics Declared that a set of Smart ages To inspect The behavior of artificial ielligence models Developme that can automatically ideify some models of models. The new ahropic technology is part of the pre -release tests of the Claude Opus 4 model and now as Scholars It is also available in Gateh.

One of the most importa challenges of artificial ielligence activists is the alignme or alignme of models; Because models that overtake users or seek to realize their ierests can be dangerous. But the implemeation of such inspections is also challenging and difficult.

To solve this problem, researchers have designed 3 ages or ages that can automatically perform the task of assessing the alignme of models. According to Aropic, these factors have performed significaly in automatic models of models. The company has also iroduced three experimeal environmes that simulate the evaluation process in the form of the game.

How to examine the behavior of artificial ielligence models

The three ages iroduced by ahropic have differe capabilities. One of them uses dialogue, data analysis and ierpretation tools to investigate the behavior of models, and the other can compare the behavior of differe models. Finally, the third age is specifically designed to ideify hidden behaviors in Claude 4.

According to Aropic, the first factor in the environme in which the model was deliberately designed uncoordinated was able 10 To 13 % Discover items. But when several factors cooperated together, that amou to 42 % Increased.

The second factor was also tested to evaluate the behavior of 10 models, each deliberately had specific behaviors, such as the extremist user. In each experime, the age was able to ideify at least one of the uncoordinated features of the model, although it was difficult to ideify some of the subtle features such as slowing research.

Finally, the third factor in his experime should have raised questions that activate the “worrying” behaviors of the models. In this experime, the age succeeded in ideifying 7 out of 10 systematic models of models, but in some cases it was mistaken.

Ahropic says that although the factors have not yet reached the full maturity, it must be done right now to solve the problems. The company wrote on the X -Social Network:

“With the empowering models, we need to have scalable methods to evaluate their alignme. “Human evaluations are time -consuming and can hardly prove their accuracy.”

One of the most popular models that softens the problems of alignme is ChatGpt. According to many users, these artificial ielligence chats agree with them, and this has made it more importa to discuss the problems of alignme.

Various tools have been developed to couer these behaviors. Including the benchmark Elepha, designed by researchers from Carnegie Melon, Oxford and Stanford to measure the flattering model. The Darkbench benchmark also evaluates six common problems such as brand prejudice, a desire to keep the user, flattery, humanitarianism, producing harmful coe and secret behaviors.

RCO NEWS

New ways to get Canadian permanent residence through Express Entry 2026

Get to know Ryazan University in Russia! Complete guide for 2026 study applicants

ca PGWP golden tips that most Canadian students don’t know

ca

A detailed comparison of Russia and China for education and immigration, an analytical and realistic guide to the decision that will shape your future

Conditions for buying bus tickets Booking guide and bus travel rules

Introduction of the silver beach of Hormuz (access route + accommodation)

Al Habtoor Palace Dubai Hotel

Traffic police: Chalus road, Tehran freeway to the north and Pardis became one-way

Swissôtel Al Ghurair, Dubai

ChatGPT’s safety rules need to be revised

Ethereum time bomb at the border of 2 dollars and the possibility of a historic explosion!

New Qwen 3.5 open source models released; Suitable for running on personal systems

The Perplexity Computer platform was introduced

Nano Banana 2 model was introduced; Google’s strongest artificial intelligence

Anthropic unveiled 3 artificial intelligence agents to investigate the behavior of models

How to examine the behavior of artificial ielligence models

Leave a Reply Cancel reply

Editor's Pick

Ban on buying a house in Canada for foreigners from January 2023

Where is Van Lake in Türkiye? Address, photos and entertainment

Where is Istanbul Greenhouse Park? Address, photos and entertainment

Top Writers

Oponion

Women’s short home cotton shirt

You Might Also Like

ChatGPT’s safety rules need to be revised

New Qwen 3.5 open source models released; Suitable for running on personal systems

The Perplexity Computer platform was introduced

Nano Banana 2 model was introduced; Google’s strongest artificial intelligence

Other News

Technology

Immigration

Travel

More

Subscribe