Meta continues to advance its research into new forms of generative AI, and today it unveiled its latest effort: a model called CM3leon, which the company claims outperforms competing systems, including OpenAI's DALL-E 2.
According to Hoshio, Meta claims its new AI model, CM3leon, is a best-in-class text-to-image tool. CM3leon is a multimodal foundation model for both text-to-image and image-to-text generation, which makes it especially useful for automatically captioning images: given an image as input, the model can produce a descriptive caption that accurately reflects the image's content.
AI-generated images are, of course, nothing new at this point, with popular tools such as Stable Diffusion, DALL-E, and Midjourney widely available. What is new are the techniques Meta used to build CM3leon and the performance Meta claims its foundation model can achieve.
Most current text-to-image systems rely on a class of AI models called diffusion models to generate images. CM3leon takes a different approach: instead of a diffusion model, it uses a token-based autoregressive model. This means CM3leon breaks the input text into small units called tokens and then generates its output one token at a time, each conditioned on the tokens that came before, until the full sequence can be decoded into the final image.
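To make the idea concrete, here is a minimal sketch of autoregressive token generation. This is a hypothetical toy, not Meta's actual CM3leon code: a real model would use a transformer to predict next-token probabilities, while here a hard-coded lookup table stands in for the learned model.

```python
def toy_next_token(context):
    """Stand-in for a learned model: map the last token to the next one.
    (Hypothetical lookup table; a real model predicts probabilities.)"""
    table = {"<start>": "sky", "sky": "blue", "blue": "<end>"}
    return table[context[-1]]

def generate(prompt_tokens, max_steps=10):
    """Generate tokens one at a time, each conditioned on all previous ones."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        nxt = toy_next_token(tokens)
        if nxt == "<end>":  # stop token ends the sequence
            break
        tokens.append(nxt)
    return tokens

print(generate(["<start>"]))  # ['<start>', 'sky', 'blue']
```

In CM3leon's case the generated tokens would represent image content rather than words, and a decoder would turn the finished token sequence into pixels.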
“Diffusion models have recently become popular for image generation due to their strong performance and relatively low computational cost,” the Meta research team wrote in a research paper titled Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning. “In contrast, token-based autoregressive models can produce results with more coherent images, but are much more expensive to train and to use for inference.”
What Meta's researchers have shown with CM3leon is that a token-based autoregressive model can be more efficient than a diffusion-based approach.
“CM3leon achieves ‘superior performance’ in text-to-image generation despite being trained with only a fifth of the compute used by previous transformer-based methods,” Meta researchers wrote in a blog post.
CM3leon is a large model, with roughly 7 billion parameters, nearly twice as many as OpenAI's DALL-E 2. It also uses a technique called supervised fine-tuning (SFT), which has helped boost its capability. SFT has already proved effective in text models such as ChatGPT, and its results have been very promising for image models as well.
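The core idea of supervised fine-tuning is to take a pretrained model and continue training it on a small, curated set of labeled examples. The sketch below illustrates that idea only; the one-weight "model" and the data are hypothetical stand-ins, and none of it reflects Meta's actual training setup.

```python
def predict(w, x):
    """Trivial one-weight 'model': output = w * x."""
    return w * x

def sft(w, examples, lr=0.01, epochs=100):
    """Continue training the pretrained weight w on supervised (x, y) pairs."""
    for _ in range(epochs):
        for x, y in examples:
            grad = 2 * (predict(w, x) - y) * x  # gradient of squared error
            w -= lr * grad
    return w

pretrained_w = 0.5                  # weight left over from "pretraining"
curated = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical curated labeled pairs
tuned_w = sft(pretrained_w, curated)
print(round(tuned_w, 2))  # converges to 2.0 after fine-tuning
```

In practice the "model" is a multi-billion-parameter network and the curated pairs are instruction-response examples, but the loop is the same: supervised targets steer an already-trained model toward the desired behavior.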