OpenAI has introduced a new test called GDPval, which evaluates how artificial intelligence models perform on real-world, job-based tasks. The results indicate that GPT-5 and Claude Opus 4.1 are approaching a level where they can produce outputs comparable to those of human experts.
According to the company, GDPval includes 1,320 real tasks drawn from 44 occupations, such as software engineering, law, and nursing. The tasks were designed by experts with an average of 14 years of professional experience. The required output format varies, ranging from an engineering blueprint to a legal brief to a nursing care plan.
OpenAI emphasized that, unlike conventional benchmarks, which are often academic, GDPval challenges models with multimodal files and deliverables such as slide decks and documents. In this way, the company has tried to bring model evaluation closer to the actual work of a real workforce.
The test covered OpenAI's GPT-5, o3, o4-mini, and GPT-4o, along with Anthropic's Claude Opus 4.1, Google's Gemini 2.5 Pro, and xAI's Grok 4. Their outputs were then graded by experts.
Performance of AI models on OpenAI's new benchmark
The results showed that Claude Opus 4.1 performed best on the aesthetics and appearance of its outputs, such as slide layout and document formatting. GPT-5, by contrast, showed the greatest precision in finding specialized information and in the accuracy of that information. OpenAI also reported that model performance roughly doubled between the release of GPT-4o in spring 2024 and GPT-5 in summer 2025.
One notable finding concerns savings in time and cost. According to OpenAI, advanced models can complete GDPval tasks about 100 times faster and 100 times more cheaply than human experts. However, these figures cover only model processing time and API costs; they do not include important steps such as human oversight, revision, and integration into real projects.
OpenAI nonetheless acknowledges that GDPval has limitations. The test evaluates only final outputs and cannot measure a model's ability to work through multiple drafts or manage long-term projects. In addition, many real-world tasks are vaguely defined or subject to changing conditions, whereas GDPval focuses on clear, well-specified tasks.
In its conclusion, OpenAI stressed that, despite these limitations, the results show AI models are reaching a level where they can handle a large share of tasks, freeing workers to spend more time on more complex activities.
The company has announced plans to expand GDPval to more industries and to incorporate more difficult and interactive tasks.
RCO NEWS