NVIDIA has unveiled Hymba-1.5B-Base, a small language model that combines transformer attention mechanisms with state space models (SSMs). The hybrid architecture is designed to increase efficiency in natural language processing tasks.
Pavel Molchanov, Scientist and Director of Research at NVIDIA, announced the new development on the X platform (formerly Twitter). “Sharing our team’s new work on Hymba, a compact and efficient language model with a hybrid architecture,” he wrote.
He also shared a technical report on the research, explaining the differences between Mamba and attention, how the two can be combined, and phenomena such as attention sink and forced-to-attend.
The model uses a hybrid-head structure in which attention heads handle precise information recall while SSM heads efficiently summarize the surrounding context.
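As a rough illustration of that idea, the sketch below (a minimal, hypothetical PyTorch module, not NVIDIA’s code; the GRU is only a stand-in for a real Mamba-style SSM scan, and the fusion weights are assumptions) runs attention heads and an SSM-style path over the same input in parallel and mixes their normalized outputs with learnable scales:

```python
import torch
import torch.nn as nn

class HybridHeadBlock(nn.Module):
    """Illustrative sketch of a hybrid-head block: attention and an
    SSM-style mixer see the same input in parallel, and their normalized
    outputs are fused with learnable per-channel weights."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Placeholder for a state-space (Mamba-style) sequence mixer; a real
        # implementation would use a selective SSM scan instead of a GRU.
        self.ssm = nn.GRU(dim, dim, batch_first=True)
        self.norm_attn = nn.LayerNorm(dim)
        self.norm_ssm = nn.LayerNorm(dim)
        self.beta_attn = nn.Parameter(torch.ones(dim))
        self.beta_ssm = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        ssm_out, _ = self.ssm(x)
        # Normalize each path, then combine them with learnable scales and
        # add the result back to the residual stream.
        fused = self.beta_attn * self.norm_attn(attn_out) \
              + self.beta_ssm * self.norm_ssm(ssm_out)
        return x + fused
```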
Hymba also prepends learnable meta tokens to the input sequence to store important information and reduce the burden on the attention mechanism. Finally, to improve memory efficiency and throughput, Hymba shares the key-value cache across layers and uses partial sliding-window attention, in which the model attends only to a local window of tokens rather than the full sequence.
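The meta-token idea can be sketched in the same spirit (again a hypothetical illustration; the token count and initialization are assumptions): a small set of trained embeddings is simply concatenated in front of every input sequence before it enters the model’s blocks:

```python
import torch
import torch.nn as nn

class MetaTokenPrepend(nn.Module):
    """Sketch of learnable meta tokens: trained embeddings that are
    prepended to every sequence so later layers can attend to them."""
    def __init__(self, dim: int, n_meta: int = 128):
        super().__init__()
        # n_meta learned vectors, shared across all inputs.
        self.meta = nn.Parameter(torch.randn(n_meta, dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, dim)
        batch = token_embeddings.size(0)
        meta = self.meta.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([meta, token_embeddings], dim=1)
```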
The design, performance, and applications of the model are described in full in the paper “Hymba: A Hybrid-head Architecture for Small Language Models.”
Hymba outperforms Llama-3.2
In a controlled study comparing different architectures under the same training conditions, Hymba-1.5B-Base showed clear advantages, surpassing all publicly available models with fewer than 2 billion parameters.
Compared with Llama-3.2-3B, Hymba achieved 1.32% higher average accuracy, an 11.67-times smaller cache (the model’s temporary memory), and 3.49-times higher processing speed.
“Hymba outperforms other small language models such as Meta’s Llama 3.2 or SmolLM v2 while being trained on only 1.5 trillion tokens,” said Philipp Schmid, technical lead for large language models at Hugging Face.
Pavel Molchanov responded: “I don’t know if we should be proud of training with 1.5 trillion tokens or not, because our goal is to move quickly, and probably in the next two weeks someone will have a better model.”
NVIDIA also provides a setup script that simplifies environment configuration and supports CUDA versions 12.1 and 12.4.
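For reference, loading the published checkpoint from the Hugging Face Hub would look roughly like the following sketch (the repository id nvidia/Hymba-1.5B-Base and the need for trust_remote_code are assumptions based on how custom architectures are typically shipped):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository id for the base model.
repo_id = "nvidia/Hymba-1.5B-Base"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,   # assumed: custom hybrid architecture code
    torch_dtype=torch.bfloat16,
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```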
A word of caution
NVIDIA notes that the model was trained on internet data, which may contain offensive content, unsafe material, and social biases. Hymba may therefore reflect these problems, return offensive answers to offensive prompts, or even generate inaccurate or irrelevant text in response to neutral questions.
Users should set the batch size to one when generating, as the current implementation does not fully support padded batches during inference. Training, however, works with any batch size.
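In practice, that constraint simply means generating one prompt at a time. A minimal sketch, assuming the model and tokenizer loaded in the earlier example:

```python
# Single-prompt (batch size 1) generation; hyperparameters are illustrative.
prompt = "The key idea behind hybrid-head language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```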
The company emphasizes that building trustworthy artificial intelligence is a shared responsibility, notes that it has established ethical guidelines for developing the technology, and asks users to work with the model responsibly and remain aware of its limitations.