OpenAI's o3 system recently scored 85% on the ARC-AGI benchmark, the highest score recorded by an artificial intelligence system. The previous best AI score was 55%, and the new result is on par with the average human score. o3 has also scored well on a very difficult mathematics test.
What is the ARC-AGI test?
The ARC-AGI test measures how efficiently an artificial intelligence system adapts to something new: that is, how many examples of a new situation the system needs to see before it understands how that situation works.
Until artificial intelligence systems can learn from a small number of examples and adapt to new situations, they will only be used for routine, repetitive tasks and for cases where occasional failure is acceptable.
The results indicate that the o3 model is highly adaptable and can discover generalizable rules from just a few examples.
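To make "discovering a rule from a few examples" concrete, here is a minimal Python sketch. It is only a toy illustration, not how o3 or the ARC-AGI benchmark actually works: the four candidate rules, the `infer_rules` helper, and the sample grids are all hypothetical, and real ARC-AGI tasks need a far richer space of rules.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]

# Hypothetical candidate rules, chosen only for illustration.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def infer_rules(examples: List[Tuple[Grid, Grid]]) -> List[str]:
    """Return the name of every candidate rule that explains all examples."""
    return [
        name
        for name, rule in CANDIDATE_RULES.items()
        if all(rule(inp) == out for inp, out in examples)
    ]


# Two (input, output) demonstration pairs of an unseen task.
examples = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5, 6]], [[6, 5, 4]]),
]

consistent = infer_rules(examples)

# Apply the surviving rule to a new input the system has never seen.
test_input = [[7, 8], [9, 0]]
prediction = CANDIDATE_RULES[consistent[0]](test_input)
print(consistent, prediction)
```

With only two examples, a single rule survives and can be applied to the new grid. The hard part in practice is that the space of possible rules is not given in advance.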
François Chollet, the French AI researcher who designed the benchmark, believes that o3 searches through various "chains of thought" that describe the steps needed to solve a problem, and then chooses the best one according to some defined rule or heuristic.
This is not unlike the way Google's AlphaGo system searched through different possible sequences of moves to beat the Go world champion.
If o3 works like AlphaGo, then the heuristic itself was presumably produced by a trained AI model. That was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
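The general idea of generating many candidate chains of steps and keeping the one a heuristic rates highest can be sketched in Python. This is a toy under stated assumptions, not o3's actual method, which has not been disclosed: the three "reasoning steps", the hand-written `heuristic`, and the number puzzle are all hypothetical, and in AlphaGo the scoring function was a trained model rather than a fixed formula.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Hypothetical "reasoning steps": each one transforms a number.
STEPS: Dict[str, Callable[[int], int]] = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "sub1": lambda x: x - 1,
}


def run_chain(start: int, chain: Tuple[str, ...]) -> int:
    """Apply each step of the chain in order and return the final value."""
    value = start
    for name in chain:
        value = STEPS[name](value)
    return value


def heuristic(start: int, target: int, chain: Tuple[str, ...]) -> float:
    """Higher is better: reward closeness to the target, lightly penalise length."""
    return -abs(run_chain(start, chain) - target) - 0.01 * len(chain)


def best_chain(start: int, target: int, max_len: int = 3) -> Tuple[str, ...]:
    """Enumerate every chain up to max_len steps and keep the top-rated one."""
    candidates = [
        chain
        for n in range(1, max_len + 1)
        for chain in product(STEPS, repeat=n)
    ]
    return max(candidates, key=lambda c: heuristic(start, target, c))


chain = best_chain(start=2, target=11)
print(chain, run_chain(2, chain))
```

Exhaustive enumeration only works because this toy search space is tiny. Systems such as AlphaGo rely on the scoring model to decide which candidates are worth exploring at all.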
However, almost everything about o3 remains unknown. OpenAI has limited its disclosures to a few media presentations and has restricted early testing to a small number of AI safety researchers, labs, and institutions. As a result, truly understanding o3's potential will require extensive work, including evaluating the system, understanding the distribution of its capabilities, and measuring how often it fails and how often it succeeds.
We will have to wait until o3 is released to get a better idea of whether it is nearly as adaptable as the average human. If it is, the system could have a major economic impact and usher in a new era of accelerated, self-improving intelligence.
RCO NEWS