The gap between the benchmark results presented by OpenAI (the creator of ChatGPT) and those measured by independent institutions for its o3 artificial intelligence model has raised questions about the company's transparency and its model-testing practices. When OpenAI unveiled o3 in December, the company claimed the model could answer more than a quarter of the questions on FrontierMath, a challenging set of math problems. That score far outstripped the competition; the next-best model managed to solve only about 2 percent of the FrontierMath problems.
But as it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing power behind it than the model OpenAI released publicly last week. Epoch AI, the research institute behind FrontierMath, published the results of its own independent benchmark tests of o3 on Friday. Epoch found that o3 scored around 10 percent, well below the highest score OpenAI had announced.
That does not in itself prove OpenAI gave inaccurate information. The benchmark results the company published in December show a lower-bound score that matches the figure Epoch recorded. Epoch also noted that its testing setup likely differs from OpenAI's, and that it used an updated release of FrontierMath for its evaluation.
Epoch wrote in a statement: "The difference between our results and OpenAI's might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time computing, or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs the 290 problems in frontiermath-2025-02-28-private)." Wenda Zhou, a member of OpenAI's technical staff, said in a livestream last week that the production version of o3 was optimized for real-world use cases and for speed, unlike the version demonstrated in December. For that reason, he added, there may be "differences" in benchmark results.
Of course, the fact that the public release of o3 falls short of OpenAI's testing claims is somewhat beside the point, since the o3-mini-high and o4-mini models already outperform o3 on FrontierMath, and OpenAI plans to release a more powerful variant, o3-pro, in the coming weeks. Still, the episode is another reminder that AI benchmark results should not be taken at face value, particularly when they are published by a company with a product to sell.
Benchmarking controversies are becoming commonplace in the AI industry as companies race to attract media attention and users with new models. In January, Epoch was criticized for waiting to disclose that it had received funding from OpenAI until after the company announced o3. Many of the researchers who contributed to FrontierMath were unaware of OpenAI's involvement before it became public.
More recently, Elon Musk's xAI was accused of publishing misleading benchmark charts for its latest AI model, Grok 3. And just this month, Meta acknowledged that it had touted benchmark scores for a version of a model that differed from the one it made available to developers.


Source: TechCrunch