In a new study, Stanford University researchers have found that the June version of the popular artificial intelligence chatbot ChatGPT performed worse than the March version on several tasks.
In their study, the researchers compared the performance of OpenAI's chatbot over several months on four diverse tasks: solving mathematical problems, answering sensitive questions, generating software code, and visual reasoning. Two versions of OpenAI's artificial intelligence technology, GPT-3.5 and GPT-4, were evaluated across these periods.
Accuracy differences between ChatGPT versions
The most striking result concerns the GPT-4 model's ability to solve mathematical problems: in March, it correctly identified 17077 as a prime number in 97.6 percent of queries. Just three months later, its accuracy had dropped to 2.4 percent.
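The question the researchers posed to the models can be verified directly. As a minimal sketch (not code from the study), a simple trial-division check in Python confirms that 17077 is indeed prime:

```python
def is_prime(n: int) -> bool:
    """Check primality by trial division up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:  # only divisors up to sqrt(n) need checking
        if n % d == 0:
            return False
        d += 2
    return True

print(is_prime(17077))  # → True: no divisor up to sqrt(17077) ≈ 130.7
```

Since 17077 has no divisor other than 1 and itself, "yes, it is prime" was the correct answer throughout.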
GPT-3.5, in contrast, moved in the opposite direction. Its March version answered these questions correctly only 7.4 percent of the time, but by June its accuracy had risen to 86.8 percent.
The researchers observed similar shifts when they asked the models to write code or perform a visual reasoning test (predicting the next shape in a pattern).
The sharply different results observed from March to June in OpenAI's models show the unpredictable effects of changes to one part of a model. Stanford computer science professor James Zou, a co-author of the study, explains:
“When we set out to improve the performance of a large language model on some specific tasks, there can be many unintended consequences that may actually undermine its performance on other tasks. There are different kinds of interdependencies in how the model answers questions that can lead to worse behavior than we’ve seen so far.”