In recent years, alongside the growth of services based on artificial intelligence and natural language processing, multilingual language models with Persian support have also been developed. But the main question for users and developers remains: which language model performs better on Persian-language tasks? Which model should be used to get the desired result?
Large language models cannot demonstrate their performance quality or earn the trust of the AI ecosystem until they are evaluated against valid benchmarks. To this end, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology have released a comprehensive evaluation system for Persian LLMs, the Open Persian LLM Leaderboard, which makes it possible to compare Persian language models across a variety of tasks, so that these models are carefully evaluated and their users can make a more informed choice.
The ranking challenge
One of the fundamental challenges facing the country's artificial intelligence ecosystem is the set of obstacles standing in the way of evaluating Persian language models. Well-known and reliable international benchmarks do not adequately support Persian, and the local benchmarks offered so far have lacked the comprehensiveness needed to evaluate these models. As a result, evaluations of Persian LLMs have not been sufficiently valid, and a careful comparison between them has not been possible until now.
To remove this obstacle, the Part Artificial Intelligence Research Center and the Natural Language Processing Laboratory of Amirkabir University of Technology, under the supervision of Dr. Saeedeh Mumtazi, a prominent artificial intelligence professor in the country, began work on a comprehensive evaluation system and succeeded in delivering the most accurate evaluation of Persian LLMs to date, empowering the country's developers, researchers, and artificial intelligence enthusiasts.
The challenge of quality data
This evaluation system includes more than 40,000 samples, for which a large volume of Persian data was collected and labeled from scratch to provide the highest-quality data for evaluating language models. In addition, the framework includes a number of international benchmarks that the developers have translated into Persian and localized to fully match the needs of the country's artificial intelligence ecosystem. Notably, as the number of samples grows and the system is continuously updated, its performance in evaluating LLMs will keep improving.
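To make the evaluation step concrete, the sketch below shows, in Python, how a multiple-choice sample from such a benchmark might be scored by comparing per-choice log-likelihoods, the approach common evaluation harnesses use. The model ID, dataset ID, and field names (question, choices, answer) are hypothetical placeholders, not the leaderboard's actual schema.

    # Minimal sketch: score multiple-choice samples by per-choice
    # log-likelihood. All repository IDs and field names are hypothetical.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "your-org/your-persian-llm"      # hypothetical model ID
    DATASET_ID = "your-org/persian-benchmark"   # hypothetical dataset ID

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    def choice_logprob(question: str, choice: str) -> float:
        """Sum of token log-probabilities of `choice` given `question`."""
        prompt_ids = tokenizer(question, return_tensors="pt").input_ids
        full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        # Position t predicts token t+1, so shift logits against targets.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        # Approximate prompt/continuation boundary by prompt token count;
        # real harnesses align this more carefully.
        start = prompt_ids.shape[1] - 1
        return sum(log_probs[i, targets[i]].item()
                   for i in range(start, targets.shape[0]))

    dataset = load_dataset(DATASET_ID, split="test")
    correct = 0
    for sample in dataset:
        scores = [choice_logprob(sample["question"], c) for c in sample["choices"]]
        correct += int(scores.index(max(scores)) == sample["answer"])
    print(f"accuracy: {correct / len(dataset):.3f}")

Accuracy is then simply the fraction of samples where the highest-likelihood choice matches the gold answer.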
Alongside this evaluation framework, a ranking table is provided that allows the models to be compared and assessed at a glance. As models improve, their position in the table rises and they attract more attention from users. This mechanism creates a competitive environment whose results can be seen in the rising quality of Persian language models and, on a larger scale, in the growth of the country's artificial intelligence industry. In addition, researchers and developers planning to enter the LLM market gain a valuable opportunity to introduce their model to thousands of people in the field by earning a place in the ranking table.
Assessment at higher levels
The Persian benchmarks offered so far could only measure models' abilities up to roughly high-school-level knowledge, which prevented large, capable LLMs from showing their full potential. By contrast, this Persian evaluation system covers master's-level knowledge in fields such as medicine, economics, industry, law, logic, engineering, and the humanities, and can evaluate models at a professional level. Beyond textual data, the system also uses numbers and mathematical formulas to measure model performance, so that each LLM can be evaluated along multiple dimensions.
Cooperation between ecosystem actors
The Part group considers strengthening cooperation between universities and industry one of the most effective ways to address the challenges and needs of Persian-language developers, and cites the successful release of the Persian LLM evaluation system as evidence of this. By providing the necessary infrastructure and an evaluation pipeline aligned with Open LLM Leaderboard standards, Part has laid the groundwork for this advanced benchmark and hopes this fruitful process will lead to more innovative tools in the future.
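For context, the Open LLM Leaderboard's pipeline is built on EleutherAI's lm-evaluation-harness, so a pipeline following its standards would typically be driven with something like the sketch below. The model ID and the Persian task name are hypothetical placeholders; this is an illustration of the harness's Python entry point, not the leaderboard's actual configuration.

    # Hedged sketch: running lm-evaluation-harness from Python.
    # Install first with: pip install lm-eval
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                                         # Hugging Face backend
        model_args="pretrained=your-org/your-persian-llm",  # hypothetical model
        tasks=["persian_mmlu"],                             # hypothetical task name
        num_fewshot=5,
        batch_size=8,
    )
    print(results["results"])  # per-task metrics, e.g. accuracy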
Thanks to the efforts of Dr. Saeedeh Mumtazi, this benchmark has received the necessary approvals from the Open LLM Leaderboard, and the evaluation results of Persian models are recognized as valid by this global authority. As a result, domestic LLMs have the potential to be presented and used globally. You can view the comparison table of Persian language models by visiting the evaluation system's HuggingFace portal and, if you wish, evaluate your own LLM against part of this suite.
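As a final illustration, the leaderboard can also be located programmatically on the Hugging Face Hub; the search string below is an assumption, since the exact Space ID is not given here.

    # Minimal sketch: search the Hugging Face Hub for the leaderboard Space.
    # The search string is an assumption, not a confirmed Space ID.
    from huggingface_hub import HfApi

    api = HfApi()
    for space in api.list_spaces(search="Persian LLM Leaderboard"):
        print(space.id)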