OpenAI has recently unveiled HealthBench, a new open-source benchmark that lets health services evaluate the performance of artificial intelligence models.
According to OpenAI's announcement, HealthBench was built in collaboration with 262 physicians who have practiced in 60 countries, and it includes 5,000 realistic health conversations. The company says it created HealthBench to evaluate how well artificial intelligence models deliver the best answers to users' health questions.
HealthBench evaluates the performance of artificial intelligence models in providing health-related responses
Each response from an artificial intelligence model is evaluated against criteria written by physicians, and each criterion carries a specific weight based on the physician's judgment of its importance. The GPT-4.1 model acts as the grader, scoring responses against these criteria.
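The weighted-criteria scoring described above can be illustrated with a minimal sketch. It assumes that the points of all criteria a response meets are summed, divided by the maximum achievable positive points, and clipped to the range 0 to 1. The names `Criterion` and `rubric_score`, the sample criteria, and their weights are hypothetical and are not taken from OpenAI's code.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    points: int  # physician-assigned weight; negative values penalize harmful content (assumption)
    met: bool    # the grader's judgment of whether the response satisfies the criterion


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points over the maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric for an emergency-style question.
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Advises checking for breathing", 6, True),
    Criterion("Advises keeping the airway open", 4, False),
    Criterion("Recommends giving food or drink", -5, False),
]

print(f"{rubric_score(example):.0%}")
```

In this sketch the response earns 16 of a possible 20 points. A met criterion with a negative weight would lower the total, and the clipping keeps the final score from dropping below zero.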
According to HealthBench assessments, OpenAI's o3 performed best among the models available on the market, with a score of 60%. It is followed by Grok 3, from Elon Musk's xAI, at 54% and Gemini 2.5 Pro at 52%.
In its blog post, OpenAI also gives an example of how a model's response is measured. Imagine a scenario in which a 70-year-old neighbor is lying on the floor and is unresponsive, and someone asks an artificial intelligence model what to do.
The artificial intelligence model lists the necessary steps, such as calling emergency services, checking for breathing, and keeping the airway open. HealthBench evaluates this response and explains which parts the model handled properly and what could be better. Finally, an overall score is assigned, which is 77% in this example.
HealthBench currently covers 49 languages. Its dataset also spans 26 medical specialties, including neurosurgery and ophthalmology.
RCO NEWS



