In a new study, Stanford University researchers have found that the June version of the popular artificial intelligence chatbot ChatGPT performed worse on several tasks than the March version.
In their study, the scientists compared the performance of OpenAI's chatbot over several months on four "diverse" tasks: solving mathematical problems, answering sensitive questions, generating software code, and visual reasoning. The study also examined two versions of OpenAI's artificial intelligence technology, GPT-3.5 and GPT-4, across different periods of time.
Accuracy differences between versions of ChatGPT
Probably the most remarkable result concerns the GPT-4 model's ability to solve mathematical problems: in March, it correctly identified 17077 as a prime number in 97.6 percent of the questions. But only three months later, its accuracy had dropped to 2.4 percent!
In contrast, GPT-3.5 practically went the opposite way. Its March version answered these questions correctly only 7.4 percent of the time, but by June it had raised the accuracy of its answers to 86.8 percent.
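As a sanity check on the number cited above, a few lines of ordinary code confirm that 17077 is indeed prime. This is a simple trial-division sketch for illustration only, not the evaluation code used in the study:

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2  # only odd candidates need checking
    return True

print(is_prime(17077))  # 17077 has no divisor up to ~130, so this prints True
```

Trial division is slow for very large numbers, but for a five-digit value like 17077 it settles the question instantly.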
The researchers saw similar results when they asked the models to write code or perform a visual reasoning test (predicting the next shape in a pattern).
The very different results observed from March to June in OpenAI's artificial intelligence models show the unpredictable effects of changes to one part of a model. Stanford computer science professor James Zou, one of the authors of the study, explains:
"When we set out to improve the performance of a large language model on some specific tasks, there can be many unintended consequences that may actually undermine its performance on other tasks. There are different kinds of interdependencies in how the model answers questions that can lead to worse behavior than we've seen so far."