In recent years, with the development of services based on artificial intelligence and natural language processing, multilingual language models with Persian language support have also emerged. But the main question for users and developers remains: which language model performs better on Persian-language tasks? Which model should be used to get the desired result?
Large language models cannot demonstrate their performance quality, or earn the trust of the AI ecosystem, until they are evaluated against valid benchmarks. To this end, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology have released a comprehensive evaluation system for Persian LLMs, the Open Persian LLM Leaderboard, which makes it possible to compare Persian language models across various tasks, so that the validity of these models can be carefully assessed and their users can make a more informed choice.
The ranking challenge
One of the fundamental challenges facing the country’s artificial intelligence ecosystem is the set of obstacles standing in the way of measuring Persian language models. Well-known and reliable foreign benchmarks do not adequately support the Persian language, and the local benchmarks offered so far have lacked the comprehensiveness needed to evaluate the models. As a result, assessments of Persian LLMs have not had sufficient validity, and a careful comparison of them has not been possible until today.
To overcome this obstacle, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology, under the supervision of Dr. Saeedeh Mumtazi, a prominent artificial intelligence professor in the country, set out to build a comprehensive evaluation system and succeeded in delivering the most accurate benchmark for Persian LLMs, in order to empower the country’s developers, researchers, and artificial intelligence enthusiasts.
The challenge of quality data
This evaluation system includes more than 40,000 samples, for which a large volume of Persian data has been collected and labeled from scratch to provide the highest quality data for evaluating language models. In addition, the framework includes a number of international benchmarks that the developers have translated into Farsi and localized as needed to fully match the needs of the country’s artificial intelligence ecosystem. It is worth noting that, as the number of samples grows and the system is continuously updated, its performance in evaluating LLMs will keep improving.
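To make the shape of such a benchmark concrete, here is a minimal sketch of how one of its labeled samples might be loaded and inspected with the Hugging Face `datasets` library. The repository name `PartAI/persian-llm-benchmark` and the field names `question`, `choices`, and `answer` are assumptions for illustration only; the actual repository and schema of the Open Persian LLM Leaderboard data may differ.

```python
# Minimal sketch: loading a labeled Persian evaluation sample.
# The repository name and field names are hypothetical placeholders,
# not the official schema of the Open Persian LLM Leaderboard.
from datasets import load_dataset

# Load a hypothetical multiple-choice test split.
dataset = load_dataset("PartAI/persian-llm-benchmark", split="test")

# Each sample pairs a Persian question with candidate answers and a gold label.
sample = dataset[0]
print(sample["question"])  # the Persian question text
print(sample["choices"])   # list of candidate answers
print(sample["answer"])    # index (or text) of the correct choice
```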
Alongside this evaluation framework, a ranking table has also been provided, allowing models to be compared and assessed overall. As the quality of a model improves, its position in the table rises and it attracts more attention from users. This mechanism creates a competitive environment whose results can be seen in the increasing quality of Persian language models and, on a larger scale, in the growth of the country’s artificial intelligence industry. In addition, researchers and developers who intend to enter the LLM market gain a valuable opportunity to introduce their model to thousands of people in the field by earning a place in the ranking table.
Assessment at higher levels
The Persian benchmarks offered so far have only been able to measure models’ abilities up to the level of high-school knowledge, which has prevented large, capable LLMs from showing their full potential. By contrast, this Persian model evaluation system covers master’s-level knowledge in fields such as medicine, economics, industry, law, logic, engineering, and the humanities, and can evaluate models at a professional level. In addition to textual data, the evaluation system also uses numbers and mathematical formulas to measure model performance, so that each LLM can be evaluated along different dimensions.
Cooperation between ecosystem actors
The Part group considers strengthening cooperation between universities and industry one of the most effective ways to meet the challenges and needs of Persian-language developers, and points to the successful release of the Persian LLM evaluation system as proof of this. By providing the necessary infrastructure and an evaluation pipeline in line with Open LLM Leaderboard standards, PART has laid the groundwork for this advanced benchmark and hopes that this fruitful process will lead to the development of more innovative tools in the future.
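The original Open LLM Leaderboard builds on EleutherAI’s lm-evaluation-harness, so a pipeline that follows its standards can plausibly be driven the same way. The sketch below shows how a Persian model might be scored locally with that harness; the model id `your-org/your-persian-llm` and the task name `persian_mmlu` are hypothetical placeholders, since the official task identifiers of this leaderboard are not listed here.

```python
# Minimal sketch of a local run with EleutherAI's lm-evaluation-harness,
# which the original Open LLM Leaderboard builds on. The model id and task
# name are hypothetical placeholders, not the official leaderboard setup.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                         # a Hugging Face transformers model
    model_args="pretrained=your-org/your-persian-llm",  # hypothetical model id
    tasks=["persian_mmlu"],                             # hypothetical Persian task name
    batch_size=8,
)

# Print the aggregated per-task metrics.
print(results["results"])
```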
Thanks to the efforts of Dr. Saeedeh Mumtazi, this benchmark has received the necessary approvals from the “Open LLM Leaderboard”, and the results of assessing Persian models are also recognized as valid by this global authority. As a result, domestic LLMs have the potential to be presented and used globally. You can view the comparison list of Persian language models by visiting the HuggingFace portal of this evaluation system and, if you wish, evaluate your LLM with a part of this system.
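For readers who prefer to pull the rankings programmatically rather than browse the web page, a sketch along the following lines could fetch a results file from the leaderboard’s Hugging Face repository. The repository id `PartAI/open-persian-llm-leaderboard`, the file name `results.json`, and the `average` column are assumptions for illustration; check the actual portal for the real identifiers.

```python
# Minimal sketch: downloading a hypothetical results file from the leaderboard's
# Hugging Face repository and viewing it as a table. Repository id, repo type,
# file name, and column name are placeholders; consult the portal for real ones.
import pandas as pd
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="PartAI/open-persian-llm-leaderboard",  # hypothetical repository id
    filename="results.json",                        # hypothetical results file
    repo_type="dataset",                            # assumed to be a dataset repo
)

# Load the scores and show the ten highest-ranked models.
leaderboard = pd.read_json(path)
print(leaderboard.sort_values("average", ascending=False).head(10))
```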