Sakana AI has announced a new benchmark that uses Sudoku puzzles to measure the capabilities of artificial intelligence. The benchmark includes some of the most challenging Sudoku puzzles, which are difficult even for expert solvers, and can serve as a serious test of the reasoning abilities of artificial intelligence models.
Why Sudoku?
Sudoku, popularized in the 1980s by Nikoli in Japan, consists of a 9×9 grid of cells, some of which are pre-filled with digits. The goal is to fill the empty cells so that each digit from 1 to 9 appears exactly once in every row, every column, and every 3×3 box. In recent years, more advanced versions of Sudoku have been introduced with more sophisticated and diverse rules. These variants, called "Modern Sudoku", require creative reasoning.
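The classic rules described above can be expressed as a short check. The following is a minimal sketch in Python for standard 9×9 Sudoku only, not for the Modern Sudoku variants; the function name `is_valid_solution` and the sample grid are illustrative assumptions and are not taken from the Sakana AI benchmark.

```python
def is_valid_solution(grid):
    """Return True if a filled 9x9 grid satisfies the classic Sudoku rules.

    `grid` is assumed to be a list of 9 lists of 9 integers.
    """
    target = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    return all(unit == target for unit in rows + cols + boxes)


# Illustrative filled grid built from a shifted base row (not a benchmark puzzle).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(solved))
```

Checking a finished grid is the easy part. The difficulty the benchmark targets lies in finding the chain of deductions that leads to it.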
Although computers have long been able to solve simple Sudoku puzzles with search algorithms, artificial intelligence models are still unable to imitate the way humans reason through these puzzles. Modern Sudoku variants have unique rules that demand abstract reasoning, so Sakana AI believes these puzzles can serve as an ideal benchmark for measuring the reasoning abilities of artificial intelligence models.
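For context, the kind of search algorithm that has long handled classic Sudoku can be sketched as plain backtracking. This is a generic textbook technique, shown here under the assumption of a standard 9×9 grid with 0 marking an empty cell. It is not Sakana AI's method, and the names `candidates` and `solve` and the sample puzzle are hypothetical.

```python
def candidates(grid, r, c):
    """Digits that do not already appear in the row, column, or box of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill `grid` in place (0 = empty); return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                # No digit worked here: undo and backtrack to the previous cell.
                grid[r][c] = 0
                return False
    return True  # no empty cells remain


# Illustrative puzzle: a valid grid with every third diagonal stripe blanked.
puzzle = [
    [0 if (r + c) % 3 == 0 else (3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
    for r in range(9)
]
found = solve(puzzle)
```

A search like this tries digits mechanically and undoes mistakes. It does not produce the step-by-step explanation a human solver would give, which is the gap the benchmark is meant to expose.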
A new reasoning challenge for artificial intelligence
With the advancement of artificial intelligence models such as ChatGPT-4 and DeepSeek R1, the need for more complex benchmarks to measure their reasoning has increased. Until now, academic exams and mathematics competitions have served as the yardsticks, but modern models now pass them successfully. Modern Sudoku, with its variety of unpredictable rules, presents a new challenge for these models.
In his keynote at the GTC 2025 conference, Jensen Huang emphasized that puzzles such as Sudoku could be a valuable resource for developing reasoning in artificial intelligence models.
Collaboration with Cracking the Cryptic to train artificial intelligence
One of the main problems in training reasoning models is the lack of high-quality data on problem-solving processes. Much of the text on the Internet lacks step-by-step explanations of how humans reason. To address this, Sakana AI has collaborated with Cracking the Cryptic, the largest puzzle-solving channel on YouTube.
The collaboration provides the following:
- More than 5 videos of sophisticated Sudoku puzzles being solved
- More than 2 hours of transcribed human reasoning, including about 2 million words
- About 2 million moves extracted from the puzzle-solving videos
This dataset, released alongside the new benchmark, can help artificial intelligence models learn human reasoning more effectively.
Current challenges for artificial intelligence models in solving Sudoku
Today's artificial intelligence models have one major problem in solving Sudoku: an inability to maintain global consistency over long chains of reasoning. Many of these models can place digits in the grid, but they sometimes follow wrong paths that lead to contradictions in the final stages. This is the main weakness of artificial intelligence models compared with humans.
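One way to make this failure mode concrete is to replay a sequence of placements and report the first one that breaks a rule. The sketch below assumes classic 9×9 rules and a hypothetical move format of `(row, column, digit)` tuples. It does not reflect the benchmark's actual tooling or data format.

```python
def first_conflict(grid, moves):
    """Replay (row, col, digit) moves on a copy of `grid` (0 = empty).

    Returns the index of the first move that repeats a digit in its row,
    column, or 3x3 box, or None if no move does.
    """
    board = [row[:] for row in grid]
    for i, (r, c, d) in enumerate(moves):
        br, bc = 3 * (r // 3), 3 * (c // 3)
        peers = (
            [board[r][j] for j in range(9) if j != c]
            + [board[k][c] for k in range(9) if k != r]
            + [board[k][j]
               for k in range(br, br + 3)
               for j in range(bc, bc + 3)
               if (k, j) != (r, c)]
        )
        if d in peers:
            return i  # the contradiction becomes visible at this move
        board[r][c] = d
    return None


empty = [[0] * 9 for _ in range(9)]
# The third move repeats the digit 5 in row 0, so index 2 is reported.
print(first_conflict(empty, [(0, 0, 5), (4, 4, 5), (0, 8, 5)]))
```

The reported index only marks where the contradiction surfaces. The faulty deduction that caused it may lie many moves earlier, which is why long reasoning chains are hard to keep consistent.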
In contrast, professional Sudoku solvers reason step by step. They first analyze the puzzle's unique constraints and look for the "entry point", that is, the key insight that opens the way to solving the puzzle. Many of today's advanced models are still unable to discover these entry points.
Nikoli's handmade Sudoku: a benchmark for human reasoning in artificial intelligence
Nikoli, the Japanese company that introduced Sudoku to the world, has contributed a set of 4 handmade puzzles to the new benchmark. Unlike computer-generated puzzles, which often yield only to brute force, handmade Sudoku puzzles require creative insight and reasoning. For this reason, they can be an ideal benchmark for measuring the reasoning capabilities of artificial intelligence.
Release of the new artificial intelligence reasoning benchmark
Sakana AI has published its new benchmark along with a complete set of data and tools. Researchers and other interested parties can view the benchmark on GitHub.
The benchmark's puzzles range in difficulty from simple ones to a level at which today's advanced models cannot solve any. Early tests show that even the strongest current artificial intelligence models are unable to solve the benchmark's hard puzzles. For example, the GPT-4o model solved only 2% of the simple puzzles, and its performance declined sharply as the difficulty increased.
The new Sakana AI benchmark poses a major challenge to advanced artificial intelligence models and can pave the way for significant advances in machine reasoning. Modern Sudoku puzzles, with their diverse and complex rules, can be a good benchmark for measuring the actual problem-solving abilities of artificial intelligence models.
With this new benchmark, can artificial intelligence solve sophisticated puzzles the way humans do? The answer to this question will shape the future of research on reasoning in artificial intelligence.
RCO NEWS