Meta has rejected claims that it used test sets while training its Llama 4 models. "We have heard claims that we used test sets in the training process," Ahmad Al-Dahle, Meta's vice president of generative AI, said in a post. "This claim is completely false, and we would never do so."
He added that the models were released as soon as they were ready, and that it may take several days for all public versions to become fully stable. Meta also attributed the inconsistent performance of the models to stability issues in the implementations, not to a flaw in the training process.
New Llama 4 models
Meta recently released two new models in the Llama 4 family, called Scout and Maverick. The Maverick model quickly climbed to second place on LMArena, a platform that ranks AI models. On this platform, users vote for the best responses by comparing models' outputs head to head.
In its press release, Meta pointed to Maverick's Elo score of 1417, which placed it above OpenAI's GPT-4o and slightly below Gemini 2.5 Pro.
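For context, arena-style leaderboards derive these scores from pairwise user votes with an Elo-style rating system, so a rating gap maps to an expected head-to-head win rate. Below is a minimal illustrative sketch using the standard Elo formula; the opponent rating of 1370 is a hypothetical figure chosen for illustration, not a number from the article, and LMArena's actual methodology (a Bradley-Terry fit) may use different constants.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A beats model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# Hypothetical comparison: a model rated 1417 against an opponent rated 1370
# (the 1370 is an illustrative value, not taken from the article).
print(f"{elo_expected_score(1417, 1370):.1%}")  # ~56.7% expected win rate
```

In other words, a difference of a few dozen Elo points corresponds to only a modest edge in pairwise preferences, which is why the composition of the votes matters so much.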
An experimental model and transparency of results
The version of Maverick evaluated on LMArena is not exactly the same version that Meta released publicly. In a blog post, Meta acknowledged that it used an experimental, custom-tuned version designed to improve conversational ability.
The Chatbot Arena platform, run by lmarena.ai (previously LMSYS.org), responded to community concerns by releasing more than 2,000 head-to-head comparison results for public review. These results include user prompts, model responses, and user preferences. The organization said it released the results to ensure full transparency, and it has also updated its leaderboard policies to make future evaluations fairer and more reproducible. It announced that the Hugging Face release of the Llama-4-Maverick model will be added to the Arena soon, with its ranking results to follow.
Rumors around Llama 4
The Llama 4 story became controversial when a viral Reddit post, citing a Chinese report, claimed that a Meta employee had faced internal pressure to mix test sets into the post-training process. The report alleged that company leaders had suggested blending various benchmark test sets into the post-training data in order to hit performance targets across different benchmarks.
The post also claimed that this employee had resigned and asked to be removed from the technical report. Meta sources, however, said the person had not left the company and that the Chinese report was fabricated.
Discrepancies in evaluation results
Still, some AI researchers have pointed to discrepancies between the results Meta reported and the results they observed themselves. One user on X wrote:
"The Llama 4 on LMSYS behaves quite differently from the other Llama 4 versions, even if you use the suggested system prompt. I tried several different prompts myself."
Susan Zhang, a senior research engineer at Google DeepMind, called it "a four-dimensional chess move: using an experimental Llama 4 version to manipulate LMSYS, surface inaccurate preferences, and ultimately discredit the entire ranking system."
Pressure to release Llama 4
Questions were also raised about releasing Llama 4 over a weekend, since large technology companies usually schedule important releases on business days. Meta has also reportedly been under pressure to ship Llama 4 before DeepSeek releases its next reasoning model, expected to be called R2.
Meanwhile, Meta has announced that it will release its own reasoning model soon. Before Llama 4 came out, it was reported that Meta had delayed the release at least twice because the model did not meet expected technical benchmarks, particularly on reasoning and mathematics tasks. There were also concerns that Llama 4 was less capable than OpenAI's models at human-like conversation.
Meta continues to defend its models and is trying to ease the community's concerns by offering clarifications and more information.