NVIDIA has unveiled Hymba-1.5B-Base, a small language model that combines transformer attention mechanisms with state space models (SSMs). The hybrid architecture is designed to increase efficiency in natural language processing tasks.
Pavel Molchanov, Scientist and Director of Research at NVIDIA, announced the new development on the X platform (formerly Twitter). “Sharing our team’s new work on Hymba, a compact and efficient language model with a hybrid architecture,” he wrote.
He also shared a technical report on the research, explaining the differences between Mamba and attention, how the two can be combined, and phenomena such as attention sink and forced-to-attend.
The model uses a hybrid-head structure in which attention heads handle precise information recall while SSM heads efficiently summarize the surrounding context.
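As a rough illustration of that idea, the sketch below (a minimal, hypothetical PyTorch module, not NVIDIA’s code; the GRU is only a stand-in for a real Mamba-style SSM scan, and the fusion weights are assumptions) runs attention heads and an SSM-style path over the same input in parallel and mixes their normalized outputs with learnable scales:

```python
import torch
import torch.nn as nn

class HybridHeadBlock(nn.Module):
    """Illustrative sketch of a hybrid-head block: attention and an
    SSM-style mixer see the same input in parallel, and their normalized
    outputs are fused with learnable per-channel weights."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Placeholder for a state-space (Mamba-style) sequence mixer; a real
        # implementation would use a selective SSM scan instead of a GRU.
        self.ssm = nn.GRU(dim, dim, batch_first=True)
        self.norm_attn = nn.LayerNorm(dim)
        self.norm_ssm = nn.LayerNorm(dim)
        self.beta_attn = nn.Parameter(torch.ones(dim))
        self.beta_ssm = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        ssm_out, _ = self.ssm(x)
        # Normalize each path, then combine them with learnable scales and
        # add the result back to the residual stream.
        fused = self.beta_attn * self.norm_attn(attn_out) \
              + self.beta_ssm * self.norm_ssm(ssm_out)
        return x + fused
```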
Hymba also prepends learnable meta tokens to the input sequence to store important information and reduce the burden on the attention mechanism. Finally, to improve memory efficiency and throughput, Hymba shares the key-value cache across layers and uses partial sliding-window attention, in which the model attends only to a local window of tokens rather than the full sequence.
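The meta-token idea can be sketched in the same spirit (again a hypothetical illustration; the token count and initialization are assumptions): a small set of trained embeddings is simply concatenated in front of every input sequence before it enters the model’s blocks:

```python
import torch
import torch.nn as nn

class MetaTokenPrepend(nn.Module):
    """Sketch of learnable meta tokens: trained embeddings that are
    prepended to every sequence so later layers can attend to them."""
    def __init__(self, dim: int, n_meta: int = 128):
        super().__init__()
        # n_meta learned vectors, shared across all inputs.
        self.meta = nn.Parameter(torch.randn(n_meta, dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, dim)
        batch = token_embeddings.size(0)
        meta = self.meta.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([meta, token_embeddings], dim=1)
```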
The design, performance, and applications of the model are described in full in the paper “Hymba: A Hybrid-head Architecture for Small Language Models.”
Hymba outperforms Llama-3.2
In a controlled study comparing different architectures under the same training conditions, Hymba-1.5B-Base showed clear advantages, surpassing all publicly available models with fewer than 2 billion parameters.
Compared with Llama-3.2-3B, Hymba achieved 1.32% higher average accuracy, an 11.67-times smaller cache (the model’s temporary memory), and 3.49-times higher processing speed.
“Hymba outperforms other small language models such as Meta’s Llama 3.2 or SmolLM v2 while being trained on only 1.5 trillion tokens,” said Philipp Schmid, technical lead for large language models at Hugging Face.
Pavel Molchanov responded: “I don’t know if we should be proud of training with 1.5 trillion tokens or not, because our goal is to move quickly, and probably in the next two weeks someone will have a better model.”
NVIDIA also provides a setup script that simplifies environment configuration and supports CUDA versions 12.1 and 12.4.
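For reference, loading the published checkpoint from the Hugging Face Hub would look roughly like the following sketch (the repository id nvidia/Hymba-1.5B-Base and the need for trust_remote_code are assumptions based on how custom architectures are typically shipped):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository id for the base model.
repo_id = "nvidia/Hymba-1.5B-Base"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,   # assumed: custom hybrid architecture code
    torch_dtype=torch.bfloat16,
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```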
A word of caution
NVIDIA notes that the model was trained on internet data, which may contain offensive content, unsafe material, and social biases. Hymba may therefore reflect these problems, return offensive answers to offensive prompts, or even generate inaccurate or irrelevant text in response to neutral questions.
Users should set the batch size to one when generating, as the current implementation does not fully support padded batches during inference. Training, however, works with any batch size.
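In practice, that constraint simply means generating one prompt at a time. A minimal sketch, assuming the model and tokenizer loaded in the earlier example:

```python
# Single-prompt (batch size 1) generation; hyperparameters are illustrative.
prompt = "The key idea behind hybrid-head language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```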
The company emphasizes that building trustworthy artificial intelligence is a shared responsibility, notes that it has established ethical guidelines for developing the technology, and asks users to work with the model responsibly and remain aware of its limitations.