The Chinese technology giant Alibaba has released Wan 2.1, an artificial intelligence model for video generation, as open source and made it publicly available. The model can generate complex motion that accurately simulates real-world physics.
Alibaba said in its blog post that Wan 2.1 outperforms existing open-source models across standard benchmarks.
The company has introduced a suite of models optimized for video generation, covering tasks such as text-to-video, image-to-video, video editing, text-to-image and video-to-audio. The suite includes three main models:
- Wan2.1-I2V-14B
- Wan2.1-T2V-14B
- Wan2.1-T2V-1.3B
The I2V-14B model generates videos at 480p and 720p resolution and can produce complex visual scenes and motion patterns.
The T2V-14B model supports similar resolutions and, according to Alibaba, is the only video model that can render both Chinese and English text.
The T2V-1.3B model is designed for consumer-grade GPUs: it requires 8.19 GB of VRAM and can generate a 5-second 480p video in about 4 minutes on an RTX 4090.
Better performance than OpenAI's Sora
The model outperformed OpenAI's Sora on the VBench leaderboard, which evaluates video generation quality on criteria such as subject identity consistency, motion smoothness, temporal flickering and spatial relationships.
Wan 2.1 technical innovations
Alibaba has announced that the technical advances of this model are based on several key technologies, including:
- A new spatio-temporal variational autoencoder (VAE) for video generation
- Scalable pre-training strategies
- Large-scale data construction
- Automated evaluation metrics
With its new 3D causal VAE architecture, the model reduces memory usage while preserving the temporal order of motion (temporal causality).
Faster performance than HunyuanVideo
Performance tests show that Wan 2.1's VAE reconstructs video at 2.5 times the speed of the HunyuanVideo model on an A800 GPU. “This speed advantage will be further demonstrated at higher resolutions due to the small size design of our VAE model and the feature cache mechanism,” says Alibaba.
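To make the idea of a causal VAE with a feature cache more concrete, here is a minimal sketch in plain Python. It is not Alibaba's implementation: the function name `causal_temporal_conv`, the toy two-value "frames" and the kernel weights are all assumptions chosen for illustration. It shows a convolution over the time axis in which each output frame depends only on the current and earlier frames, and it shows that a long video can be processed chunk by chunk, carrying over only the last few frames as a cache, with the same result as processing everything at once.

```python
from typing import List, Optional, Tuple

Frame = List[float]  # toy stand-in for one frame's features


def causal_temporal_conv(
    frames: List[Frame],
    kernel: List[float],
    cache: Optional[List[Frame]] = None,
) -> Tuple[List[Frame], List[Frame]]:
    """Causal 1D convolution over time (hypothetical helper, illustration only).

    Output frame t uses input frames t-(k-1) .. t and never a later frame.
    kernel[0] weights the oldest frame in the window, kernel[-1] the current one.
    `cache` holds the last k-1 input frames of the previous chunk; for the
    first chunk it defaults to zero frames (causal zero padding).
    Returns (outputs, new_cache).
    """
    k = len(kernel)
    width = len(frames[0])
    if cache is None:
        cache = [[0.0] * width for _ in range(k - 1)]
    padded = cache + frames
    outputs = []
    for t in range(len(frames)):
        window = padded[t:t + k]
        outputs.append(
            [sum(w * f[c] for w, f in zip(kernel, window)) for c in range(width)]
        )
    # Keep only the last k-1 input frames for the next chunk.
    new_cache = padded[len(padded) - (k - 1):]
    return outputs, new_cache


video = [[float(t), float(t * t)] for t in range(8)]  # 8 toy frames
kernel = [0.25, 0.25, 0.5]

# Process the whole clip at once.
full, _ = causal_temporal_conv(video, kernel)

# Process the same clip in chunks of 3 frames, passing the cache along.
chunked: List[Frame] = []
cache = None
for start in range(0, len(video), 3):
    out, cache = causal_temporal_conv(video[start:start + 3], kernel, cache)
    chunked.extend(out)

print(chunked == full)
```

Because only the last k-1 frames are carried between chunks, memory use in this sketch depends on the chunk size rather than on the length of the video, which is the general appeal of a cache of this kind.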
Key technologies used in Wan 2.1
- The Flow Matching framework within the Diffusion Transformer (DiT) architecture
- Integration of the T5 encoder to process multilingual text input through cross-attention
- Curation and deduplication of a dataset of 1.5 billion videos and 10 billion images to improve the quality of model training
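For readers unfamiliar with Flow Matching, the following is a minimal sketch of one common linear-path formulation, written in plain Python. Whether it matches the exact parameterization used in Wan 2.1 is an assumption; the names `flow_matching_pair` and `euler_sample` are hypothetical, and a tiny list of numbers stands in for a video latent. The idea is that a network learns the velocity that moves a noise sample toward a data sample, and generation then follows that velocity from noise to data.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def flow_matching_pair(noise: Vector, data: Vector, t: float) -> Tuple[Vector, Vector]:
    """Build one training pair for linear-path flow matching (illustration only).

    x_t = (1 - t) * noise + t * data   point on the straight path at time t
    v   = data - noise                 constant velocity along that path
    A network v_theta(x_t, t, text) would be trained to regress v.
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    velocity = [d - n for n, d in zip(noise, data)]
    return x_t, velocity


def euler_sample(
    noise: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    steps: int = 10,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
data = [1.0, -2.0, 0.5]                      # toy "data" sample
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian noise sample

x_half, target_v = flow_matching_pair(noise, data, t=0.5)

# With the true velocity standing in for a trained network,
# Euler integration carries the noise sample to the data sample.
sample = euler_sample(noise, lambda x, t: target_v, steps=10)
print(max(abs(s - d) for s, d in zip(sample, data)) < 1e-9)
```

In a real model the lambda above would be replaced by the trained transformer, conditioned on the text embedding; the constant oracle velocity is used here only so the sketch runs without any training.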
Investment of $52 billion in artificial intelligence
Alibaba recently released QwQ-Max-Preview, a new model in its Qwen AI family. The company plans to invest more than $52 billion in cloud computing and artificial intelligence over the next three years.
Competition among video generation models seems to grow more exciting by the day!
RCO NEWS