NVIDIA unveiled Hymba-1.5B-Base, a small language model that combines transformer attention mechanisms with state space models (SSMs). This hybrid architecture is designed to increase efficiency in natural language processing tasks.
Pavel Molchanov, Scientist and Director of Research at NVIDIA, announced the new development on the X platform. “Sharing our team’s new work on Hymba, a compact and efficient language model with a hybrid architecture,” he wrote.
He also published a technical report on the research, explaining the differences between Mamba and attention models and how the two can be combined. He additionally mentioned phenomena such as attention sink and forced-to-attend behavior.
The model uses a dual structure: attention heads handle precise information recall, while SSM heads summarize the broader context efficiently.
Hymba also adds learnable tokens at the beginning of inputs to store important information and reduce the need for additional processing. Finally, to improve memory efficiency and computation speed, Hymba employs techniques such as sharing cached data between layers and a restricted form of attention in which the model focuses only on certain parts of the data and ignores the rest.
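The dual structure described above can be illustrated with a toy sketch. Everything here is an illustrative assumption rather than Hymba's actual implementation: the dimensions, the fixed decay rate of the SSM-style scan, the random (rather than learned) meta tokens, and the normalize-then-average fusion rule are all simplifications chosen to show the idea of running an attention branch and a summarizing state-space branch in parallel over the same input.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_branch(x, d):
    # Toy single-head self-attention: precise token-to-token recall.
    Wq, Wk, Wv = (rng.standard_normal((x.shape[-1], d)) * 0.1 for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(d)) @ v

def ssm_branch(x, d):
    # Toy state-space scan: a decaying running summary of the sequence.
    Wv = rng.standard_normal((x.shape[-1], d)) * 0.1
    v = x @ Wv
    state = np.zeros(d)
    out = np.empty_like(v)
    decay = 0.9  # illustrative fixed decay; a real SSM learns its dynamics
    for t in range(v.shape[0]):
        state = decay * state + (1 - decay) * v[t]
        out[t] = state
    return out

def hybrid_block(x, n_meta=4, d=8):
    # Prepend "meta" tokens (random here, learnable in the real model)
    # that both branches can read from.
    meta = rng.standard_normal((n_meta, x.shape[-1])) * 0.1
    xm = np.concatenate([meta, x], axis=0)
    a = attention_branch(xm, d)
    s = ssm_branch(xm, d)
    # Normalize each branch, then average so both views contribute equally.
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-6)
    s = s / (np.linalg.norm(s, axis=-1, keepdims=True) + 1e-6)
    fused = 0.5 * (a + s)
    return fused[n_meta:]  # drop the meta-token positions from the output

seq = rng.standard_normal((6, 16))  # 6 tokens, embedding dim 16
out = hybrid_block(seq)
print(out.shape)
```

Running the sketch prints an output of shape `(6, 8)`: one fused vector per input token, carrying both the attention branch's recall and the SSM branch's summary.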
An article entitled “Hymba: A Hybrid-head Architecture for Small Language Models” fully explains the design, performance, and applications of the model.
Hymba outperforms Llama-3.2
In a controlled study comparing different architectures under the same conditions, Hymba-1.5B-Base showed significant advantages, surpassing all public models with fewer than 2 billion parameters.
Compared to Llama-3.2-3B, Hymba achieved 1.32% higher accuracy, an 11.67-times smaller cache (temporary memory) footprint, and 3.49-times faster processing.
“Hymba outperforms other small language models such as Meta Llama 3.2 or SmolLM v2, while being trained on only 1.5 trillion tokens,” said Philipp Schmid, Technical Lead at Hugging Face.
Pavel Molchanov added: “I don’t know if we should be proud of training with 1.5 trillion tokens or not, because our goal is to move quickly, and probably in the next two weeks someone will have a better model.”
NVIDIA also provides an environment startup script that facilitates setup and supports CUDA versions 12.1 and 12.4.
A word of caution
NVIDIA disclosed that the model was trained on internet data, which may contain offensive content, unsafe material, and social biases. The Hymba model may therefore reflect these problems: it can give offensive answers to offensive prompts, and may even generate wrong or irrelevant text in response to neutral questions.
Users should set the batch size to one when generating, as the current setup does not fully support padding. Any batch size, however, can be used to train the model.
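The batch-size-one recommendation above simply means running generation on one prompt at a time instead of padding several prompts into one batch. The sketch below assumes the standard Hugging Face `transformers` API (a tokenizer callable that returns tensors, and a model with a `generate()` method); the helper function itself and its name are illustrative, not part of NVIDIA's release.

```python
def generate_one_by_one(tokenizer, model, prompts, max_new_tokens=64):
    """Run generate() on one prompt at a time, i.e. with batch size 1.

    Avoids padded batches, which the current Hymba setup does not
    fully support. `tokenizer` and `model` are assumed to follow the
    Hugging Face transformers interface.
    """
    results = []
    for p in prompts:
        inputs = tokenizer(p, return_tensors="pt")  # a single prompt -> batch of 1
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
        results.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return results
```

Looping like this trades some throughput for correctness: each prompt is processed at its natural length, so no padding tokens ever enter the model.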
The company emphasizes that building trustworthy artificial intelligence is a shared responsibility, and it has established ethical guidelines for the development of this technology. Users are asked to use the model responsibly and to be aware of its limitations.