While Meta is still fielding questions and criticism over its new Llama 4 family of models, Nvidia has drawn attention by introducing a large language model (LLM) of its own. Named Llama-3.1 Nemotron Ultra, the model builds on Meta's earlier Llama-3.1-405B-Instruct and, according to Nvidia, performs close to the top models currently available.
Llama-3.1-Nemotron-Ultra-253B-v1, with 253 billion parameters, is designed for tasks such as advanced reasoning, instruction following, and acting as an AI assistant.
The model was first introduced at Nvidia's annual GTC conference and is now fully available on the Hugging Face platform, where the model code, weights, and post-training data have also been published.
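For readers who want to try it, a minimal sketch of loading the model through the Hugging Face transformers library might look like the following. The repository id and generation settings are assumptions based on Nvidia's naming rather than verified values, and a model of this size needs a multi-GPU server in practice.

```python
# Minimal loading sketch -- illustrative only; check the model card
# on Hugging Face for the exact repo id and recommended settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, one of the precisions Nvidia cites
    device_map="auto",           # shard layers across available GPUs
    trust_remote_code=True,      # NAS-derived architecture ships custom code
)

messages = [{"role": "user", "content": "Explain chain-of-thought reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```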
The new Nvidia model was developed using a neural architecture search (NAS) process, in which modifications such as removing attention layers, fusing feed-forward networks (FFNs), and applying variable compression ratios are made to the model structure. The resulting architecture is designed to deliver high output quality while reducing memory consumption and compute requirements, and it can run on just eight H100 GPUs.
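To make those ideas concrete, here is an illustrative PyTorch sketch (not Nvidia's actual code) of a transformer block in which NAS has removed the attention sub-layer and set the FFN width through a per-block compression ratio; all names and dimensions are hypothetical.

```python
# Hypothetical sketch of two NAS-style modifications the article describes:
# skippable attention sub-layers and variable FFN compression per block.
import torch
import torch.nn as nn

class NASBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int,
                 skip_attention: bool = False, ffn_ratio: float = 4.0):
        super().__init__()
        self.skip_attention = skip_attention
        if not skip_attention:
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Variable compression: the FFN hidden width differs per block.
        hidden = int(d_model * ffn_ratio)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden), nn.GELU(), nn.Linear(hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.skip_attention:  # a removed layer costs no compute or memory
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.norm2(x))

# Toy stack: attention pruned from one block, FFNs compressed in others.
blocks = nn.Sequential(
    NASBlock(512, 8),
    NASBlock(512, 8, skip_attention=True, ffn_ratio=2.0),
    NASBlock(512, 8, ffn_ratio=1.0),
)
print(blocks(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])
```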
Beyond the H100, the model is also compatible with newer Nvidia hardware such as the B100, as well as other Hopper-class GPUs, and it performs well in both BF16 and FP8 precision modes.
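As a rough illustration of the eight-GPU claim, a serving setup with the open-source vLLM engine might look like this; the repository id, parallelism, and sampling settings are assumptions for illustration, not Nvidia's recommended configuration.

```python
# Hedged sketch of serving the model on an 8x H100 node with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",  # assumed repo id
    tensor_parallel_size=8,   # shard the 253B weights across 8 GPUs
    dtype="bfloat16",         # BF16; FP8 quantization is another option
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.6, max_tokens=512)
print(llm.generate(["What is 17 * 24?"], params)[0].outputs[0].text)
```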
To strengthen the model's capabilities, Nvidia used a multi-stage post-training process that included supervised fine-tuning in areas such as mathematics, code generation, chat, and tool use. The GRPO algorithm (Group Relative Policy Optimization) was also applied to improve instruction following and reasoning.
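The distinguishing idea in GRPO is that, instead of a learned value-function baseline, each sampled response is scored relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative advantage computation, with made-up rewards:

```python
# Core GRPO step: z-score each response's reward within its own group.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) -- one row per prompt's sampled group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each; rewards are illustrative.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```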
How Nvidia's new model performs against competitors

The new Nvidia model posts strong results on several well-known benchmarks. On MATH500, for example, its score jumps from 80.40% to 97% when reasoning mode is enabled. Likewise, its AIME25 score rises from 16.67% to 72.50%, and its LiveCodeBench score from 29.03% to 66.31%.
On GPQA (graduate-level science questions), the model reaches 76.01% with reasoning enabled, surpassing DeepSeek R1 (71.5%). On IFEval it records 89.45% against the competitor's 83.3%, and it also edges out DeepSeek R1 slightly on LiveCodeBench.
It should be noted, however, that DeepSeek R1 still does better on some heavy math tests, including AIME25, where it scores 79.8% to Nemotron Ultra's 72.50%.
The model supports a variety of languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and can be used for applications such as chatbots, AI agents, code generation, and retrieval-augmented generation (RAG).