In recent years, with the development of services based on artificial intelligence and natural language processing, multilingual language models with Persian language support have also emerged. But the main question for users and developers remains: which language model performs better on Persian-language tasks? Which model should be used to get the desired result?
Large language models cannot demonstrate their performance quality, or earn the trust of the AI ecosystem, until they are evaluated against valid benchmarks. To this end, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology have released a comprehensive evaluation system for Persian LLMs, the Open Persian LLM Leaderboard, which makes it possible to compare Persian language models across various tasks, so that the validity of these models can be carefully assessed and their users can make a more informed choice.
The ranking challenge
One of the fundamental challenges facing the country’s artificial intelligence ecosystem is the set of obstacles standing in the way of measuring Persian language models. Well-known and reliable foreign benchmarks do not adequately support the Persian language, and the local benchmarks offered so far have lacked the comprehensiveness needed to evaluate the models. As a result, assessments of Persian LLMs have not had sufficient validity, and a careful comparison of them has not been possible until today.
To overcome this obstacle, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology, under the supervision of Dr. Saeedeh Mumtazi, a prominent artificial intelligence professor in the country, set out to build a comprehensive evaluation system and succeeded in delivering the most accurate benchmark for Persian LLMs, in order to empower the country’s developers, researchers, and artificial intelligence enthusiasts.
The challenge of quality data
This evaluation system includes more than 40,000 samples, for which a large volume of Persian data has been collected and labeled from scratch to provide the highest quality data for evaluating language models. In addition, the framework includes a number of international benchmarks that the developers have translated into Farsi and localized as needed to fully match the needs of the country’s artificial intelligence ecosystem. It is worth noting that, as the number of samples grows and the system is continuously updated, its performance in evaluating LLMs will keep improving.
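To make the shape of such a benchmark concrete, here is a minimal sketch of how one of its labeled samples might be loaded and inspected with the Hugging Face `datasets` library. The repository name `PartAI/persian-llm-benchmark` and the field names `question`, `choices`, and `answer` are assumptions for illustration only; the actual repository and schema of the Open Persian LLM Leaderboard data may differ.

```python
# Minimal sketch: loading a labeled Persian evaluation sample.
# The repository name and field names are hypothetical placeholders,
# not the official schema of the Open Persian LLM Leaderboard.
from datasets import load_dataset

# Load a hypothetical multiple-choice test split.
dataset = load_dataset("PartAI/persian-llm-benchmark", split="test")

# Each sample pairs a Persian question with candidate answers and a gold label.
sample = dataset[0]
print(sample["question"])  # the Persian question text
print(sample["choices"])   # list of candidate answers
print(sample["answer"])    # index (or text) of the correct choice
```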
Alongside this evaluation framework, a ranking table has also been provided, allowing models to be compared and assessed overall. As the quality of a model improves, its position in the table rises and it attracts more attention from users. This mechanism creates a competitive environment whose results can be seen in the increasing quality of Persian language models and, on a larger scale, in the growth of the country’s artificial intelligence industry. In addition, researchers and developers who intend to enter the LLM market gain a valuable opportunity to introduce their model to thousands of people in the field by earning a place in the ranking table.
Assessment at higher levels
The Persian benchmarks offered so far have only been able to measure models’ abilities up to the level of high-school knowledge, which has prevented large, capable LLMs from showing their full potential. By contrast, this Persian model evaluation system covers master’s-level knowledge in fields such as medicine, economics, industry, law, logic, engineering, and the humanities, and can evaluate models at a professional level. In addition to textual data, the evaluation system also uses numbers and mathematical formulas to measure model performance, so that each LLM can be evaluated along different dimensions.
Cooperation between ecosystem actors
The Part group considers strengthening cooperation between universities and industry one of the most effective ways to meet the challenges and needs of Persian-language developers, and points to the successful release of the Persian LLM evaluation system as proof of this. By providing the necessary infrastructure and an evaluation pipeline in line with Open LLM Leaderboard standards, PART has laid the groundwork for this advanced benchmark and hopes that this fruitful process will lead to the development of more innovative tools in the future.
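The original Open LLM Leaderboard builds on EleutherAI’s lm-evaluation-harness, so a pipeline that follows its standards can plausibly be driven the same way. The sketch below shows how a Persian model might be scored locally with that harness; the model id `your-org/your-persian-llm` and the task name `persian_mmlu` are hypothetical placeholders, since the official task identifiers of this leaderboard are not listed here.

```python
# Minimal sketch of a local run with EleutherAI's lm-evaluation-harness,
# which the original Open LLM Leaderboard builds on. The model id and task
# name are hypothetical placeholders, not the official leaderboard setup.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                         # a Hugging Face transformers model
    model_args="pretrained=your-org/your-persian-llm",  # hypothetical model id
    tasks=["persian_mmlu"],                             # hypothetical Persian task name
    batch_size=8,
)

# Print the aggregated per-task metrics.
print(results["results"])
```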
Thanks to the efforts of Dr. Saeedeh Mumtazi, this benchmark has received the necessary approvals from the “Open LLM Leaderboard”, and the results of assessing Persian models are also recognized as valid by this global authority. As a result, domestic LLMs have the potential to be presented and used globally. You can view the comparison list of Persian language models by visiting the HuggingFace portal of this evaluation system and, if you wish, evaluate your LLM with a part of this system.
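For readers who prefer to pull the rankings programmatically rather than browse the web page, a sketch along the following lines could fetch a results file from the leaderboard’s Hugging Face repository. The repository id `PartAI/open-persian-llm-leaderboard`, the file name `results.json`, and the `average` column are assumptions for illustration; check the actual portal for the real identifiers.

```python
# Minimal sketch: downloading a hypothetical results file from the leaderboard's
# Hugging Face repository and viewing it as a table. Repository id, repo type,
# file name, and column name are placeholders; consult the portal for real ones.
import pandas as pd
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="PartAI/open-persian-llm-leaderboard",  # hypothetical repository id
    filename="results.json",                        # hypothetical results file
    repo_type="dataset",                            # assumed to be a dataset repo
)

# Load the scores and show the ten highest-ranked models.
leaderboard = pd.read_json(path)
print(leaderboard.sort_values("average", ascending=False).head(10))
```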