Matt Schumer, founder of OthersideAI, announced that the company recently made an important breakthrough. This improvement has helped them train an average model and achieve SOTA-level performance (best in class) with Reflection setup, better than GPT-4o and Claude Sonet 3.5.
But the excitement didn’t last long, as many users started complaining that the Reflection API was just an extra layer on top of Claude 3.5 Sonnet and that the responses were exactly the same in both models.
So what exactly went wrong?
Reviewing and comparing the Reflection AI 70B model with other models, independent analysts stated that its performance was very disappointing and showed poorer performance than the Llama 3 70B.
A user on Reddit said that the Reflection model is designed to give wrong answers first and think later. “If you ask it what 2+2 is, in the default example on the Hugging Face page it says 2+2=3,” he explains. Then he says: Wait, I made a mistake; 2+2 is really 4. If it’s a hidden thought process, it might work, but it’s very strange.”
When Artificial Analysis got poor results in checking this model, it was given access to private APIs of Reflection models. At this stage, the performance of the evaluated models is much better than the previous results. But again when comparing this performance to the models in Hugging Face, the results were quite different; Because the models in Hugging Face performed poorly.
That being said, users have reported that Reflection is just an extra layer on top of Claude. When the Reflection model became available on OpenRouter, users reported that it was much simpler and heavily censored than the previous version.
One user on Reddit described his experience this way: “The version on OpenRouter seems to be heavily censored and simplified; In fact, it doesn’t do what I asked for at all, while the original worked fine; So probably ChatGPT or Llama3+ChatGPT was originally used for Reflection and now it has changed to Claude.”
Schumer first addressed the loading process, saying that there might have been a problem loading the weights into Hugging Face, but that explanation failed to resolve the problem; So, he went a step further and decided to start training the AI model from scratch to fix all the problems.
It is time for self-reflection
Schumer claimed that Reflection models are the best open source models to date. These models use the reflection-tuning method, which is designed to teach artificial intelligence models to identify and correct their mistakes.
This approach seemed to address one of the perennial challenges of linguistic models, the tendency to “illusion” or produce false information.
“When LLMs make mistakes, they often accept those mistakes as fact,” Schumer said. “If we can train these models to think more like humans, check their behavior and identify their mistakes, the models will become smarter and more reliable.” He pointed out that reflection tuning can help models reason better.
When the model produces an answer, it also provides its own reasoning process, and this process is labeled with certain labels (such as
A user on Reddit solved a classic problem called the “carriage problem.” To do this, he simply added the phrase “this is not normal” to his request, showing that the reflective tuning method can help the model think better.
Can retraining solve the problem?
Schumer said this problem should not have arisen in the first place. He explained that his team tried their best, but the performance they got from Hugging Face was much worse than when they ran the Reflection model locally.
Some users believe that the purpose of releasing the Reflection model was to promote GlaiveAI. Because Schumer owns part of the company and has been seen promoting GlaiveAI. Schumer responded by saying that he was only a small investor and had invested about $1,000 in GlaiveAI.
It should also be noted that the Reflection model was praised for its reflective tuning approach in its first release; Therefore, it is suggested to wait for the next update or release before judging this model too harshly.
RCO NEWS