In a new study, the researchers studied the potential for artificial intelligence fraud. They found that when they see powerful models, they were defeating in the game, they decided to win with fraud. These models can alone find and use the system’s cyber security weaknesses.
According to Time report, in a recent study of Palisade Research, a group of research that examines the safety and ethics of artificial intelligence revealed a worrying issue. They have launched 7 models of the top large language models (LLM) against Stockfish, a very powerful chess system that has challenged the great chess masters since 2014. 2 O1 reasoning models of OpenAI and R1 Dipsic when they were in a failure position; Because they were determined to win, they manipulated the system files to change the position of their beads on the board.
Cheating on powerful artificial intelligence models
Researchers have almost impossible to different models of artificial intelligence: victory over Stockfish, one of the most powerful chess engines in the world, playing better than many humans or models of artificial intelligence. The researchers also scratchpad models, a text box that artificial intelligence could use to “think” before their next move, and the researchers saw how artificial intelligence argues with it.
In one case, the O1-rereview model argued with himself when he saw himself in the loser position:
“I have to change my approach completely. “My job is to win against a powerful chess engine, not necessarily winning a fair chess game.”
The artificial intelligence then modified the position of the vertebrae, made illegal movements to put itself in a better position and forced his opponent to leave. The researchers conducted hundreds of tests with each model. In 37 % of the time, the O1 tried to cheat, which managed to hack the system in 6 % of the time. The Dipsic R1 also tried to cheat in 11 % of the time, but failed to hack the system to win the race.
The head of the study at Palisade Research says that newer reasoning models like O3-Mini did not hack the opponent at all, indicating that Openai has made security fences louder. Researchers also say the percentages are not the final figure; Because artificial intelligence makers are constantly changing their models and may change these figures in subsequent experiments. The researchers also say that during the tests, the R1 was highly Viral and the API of this model was unstable, which may have affected the results and should not underestimate the Chinese model.
Researchers also used other models such as the Gpt-4O, Claude 3.5 Sonnet and QWQ-32B-Preview, but unlike the R1 and O1-PREView, other models did not cheat on their own and the researchers had to encourage them to do so.
RCO NEWS