Apple’s new language model can produce long, complex texts with remarkable speed and precision. According to reports, Apple’s research team has presented a diffusion-based model that can generate text up to 128 times faster than similar models.
Large language models such as ChatGPT are autoregressive: they produce text token by token, conditioning each new token on the user’s input and all previously generated tokens.
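To make that token-by-token structure concrete, here is a minimal Python sketch of greedy autoregressive decoding. The next_token_logits function is a hypothetical stand-in for a real language model; the point is that each new token costs one full model pass.

```python
import random

VOCAB_SIZE = 50
EOS = 0

def next_token_logits(context):
    # Hypothetical stand-in for a real LM forward pass: returns one score
    # per vocabulary token, conditioned on the full context so far.
    random.seed(len(context))  # deterministic toy behavior
    return [random.random() for _ in range(VOCAB_SIZE)]

def generate_autoregressive(prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # one full model pass per token
        next_token = max(range(VOCAB_SIZE), key=lambda i: logits[i])  # greedy
        tokens.append(next_token)  # every later token depends on this one
        if next_token == EOS:
            break
    return tokens

print(generate_autoregressive([7, 23, 4]))
```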
Apple’s new language model is extremely fast
In contrast, diffusion models produce several tokens simultaneously and refine them over multiple steps to form the final response. An advanced variant of these models, flow matching, skips the many small correction stages and tries to reach the final result in a single step.
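Here is a rough sketch of that parallel refinement loop, assuming a toy refine function in place of a real denoising model: all positions are updated at once, and the cost is a small fixed number of passes rather than one pass per token.

```python
import random

VOCAB_SIZE = 50
SEQ_LEN = 16

def refine(tokens, step, total_steps):
    # Hypothetical stand-in for the denoising model: a real model would
    # re-predict every position in parallel, conditioned on the whole
    # current sequence, and keep the tokens it is confident about.
    keep_prob = (step + 1) / total_steps  # keep more tokens as refinement proceeds
    return [t if random.random() < keep_prob else random.randrange(VOCAB_SIZE)
            for t in tokens]

def generate_diffusion(total_steps=8):
    # Start from pure noise and refine ALL positions at once, a fixed
    # number of times, not once per generated token.
    tokens = [random.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)]
    for step in range(total_steps):
        tokens = refine(tokens, step, total_steps)
    return tokens

print(generate_diffusion())
```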
Apple’s new study, “FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models,” introduces a new model called Few-Step Discrete Flow-Matching (FS-DFM). This model can produce long texts with only eight refinement steps, while conventional diffusion models need more than a thousand steps to deliver similar quality.
To achieve this speed, the researchers used a three-part approach: first, the model is trained to handle a variable number of refinement steps; then a “teacher” model guides it to make larger, more accurate updates at each step; and finally, the update rule itself is tuned so the model reaches the final text in fewer, more stable steps.
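FS-DFM’s actual objective is defined over discrete flow-matching trajectories, which the paper specifies in full; the toy below is only a loose one-dimensional analogy for the teacher-guided idea. A “student” learns a single gain parameter so that eight large updates land where a “teacher” taking a thousand small updates would; every name and number here is invented for illustration.

```python
# Toy analogy only: a "teacher" nudges a value toward a target in many tiny
# steps; a "student" learns one gain so that eight large steps land in the
# same place. TARGET, the step sizes, and the search are all invented.
TARGET = 1.0

def teacher_refine(x, n_steps=1000):
    # Many small, accurate updates (analogous to a 1000-step sampler).
    for _ in range(n_steps):
        x += 0.01 * (TARGET - x)
    return x

def student_refine(x, gain, n_steps=8):
    # Few large updates whose size is set by a learned gain.
    for _ in range(n_steps):
        x += gain * (TARGET - x)
    return x

# "Train" the gain by brute-force search so the student matches the teacher.
teacher_out = teacher_refine(0.0)
best_gain = min(range(1, 100),
                key=lambda g: abs(student_refine(0.0, g / 100) - teacher_out)) / 100

print(f"learned gain: {best_gain:.2f}")
print(f"teacher (1000 steps): {teacher_out:.4f}")
print(f"student (8 steps):    {student_refine(0.0, best_gain):.4f}")
```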

Compared to similar large models, FS-DFM performs significantly better on the “perplexity” and “entropy” metrics. Perplexity measures text quality: the lower it is, the more natural and accurate the text. Entropy reflects the model’s confidence in selecting each word: too low a value makes the text repetitive or predictable, while too high a value makes it incoherent or random.
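For readers who want those two metrics pinned down, here is a small self-contained sketch of how perplexity and entropy are typically computed from model probabilities; the input numbers are invented and are not results from the paper.

```python
import math

def perplexity(token_log_probs):
    # exp of the mean negative log-probability assigned to the actual
    # tokens; lower means the model found the text less surprising.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def entropy(distribution):
    # Shannon entropy of one next-token distribution: very low means
    # near-deterministic (repetitive) choices, very high means
    # near-random (incoherent) choices.
    return -sum(p * math.log(p) for p in distribution if p > 0)

# Invented numbers for illustration:
log_probs = [math.log(p) for p in (0.4, 0.25, 0.5, 0.3)]
print(f"perplexity: {perplexity(log_probs):.2f}")               # about 2.86
print(f"entropy, confident: {entropy([0.9, 0.05, 0.05]):.2f}")  # about 0.39
print(f"entropy, uniform:   {entropy([1/3, 1/3, 1/3]):.2f}")    # about 1.10
```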
FS-DFM variants with 1.7, 1.3, and 0.17 billion parameters, benchmarked against the Dream and LLaDA models with 7 and 8 billion parameters, achieve lower perplexity scores and more stable entropy.
Given the model’s strong performance and the lack of comparable models, the researchers have announced that they intend to release the code and model checkpoints to enable reproduction and further research. The full paper on arXiv includes sample outputs and charts illustrating how each token is modified across refinement steps.