FantasyTalking, developed by Chinese researchers, can produce realistic talking videos from a single portrait image and an audio file.
Built on a video diffusion transformer architecture with a two-stage strategy, the model first generates the overall motion of the face, body, and background, and then, using dedicated masks, refines the lip movements in each frame so they stay synchronized with the audio.
These features allow FantasyTalking to produce high-quality talking avatars while preserving the identity of the face.
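The two-stage idea described above can be pictured as a coarse whole-frame pass followed by a mask-limited lip refinement pass. The sketch below is a minimal, hypothetical illustration of that flow; the function names (`generate_coarse_motion`, `refine_lip_region`, `make_mouth_mask`) and the stand-in numpy logic are assumptions for clarity, not FantasyTalking's actual API or model.

```python
# Minimal sketch of a two-stage "coarse motion, then masked lip refinement" pipeline.
# All names and computations are illustrative placeholders, not the real model.
import numpy as np

def make_mouth_mask(h: int, w: int) -> np.ndarray:
    """Assumed mouth-region mask: 1.0 inside a lower-face box, 0.0 elsewhere."""
    mask = np.zeros((h, w), dtype=np.float32)
    mask[int(0.6 * h):int(0.85 * h), int(0.3 * w):int(0.7 * w)] = 1.0
    return mask

def generate_coarse_motion(portrait: np.ndarray, frames: int) -> np.ndarray:
    """Stage 1 stand-in: produce whole-frame motion for face, body and background."""
    rng = np.random.default_rng(0)
    drift = rng.normal(scale=0.01, size=(frames, *portrait.shape)).astype(np.float32)
    return portrait[None] + np.cumsum(drift, axis=0)  # slowly varying frames

def refine_lip_region(video: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: re-synthesize only the masked lip area in sync with audio."""
    lip_update = 0.05 * np.sin(np.linspace(0, 8 * np.pi, len(video)))[:, None, None]
    return video * (1 - mask) + (video + lip_update) * mask

if __name__ == "__main__":
    portrait = np.zeros((64, 64), dtype=np.float32)   # stand-in grayscale portrait
    coarse = generate_coarse_motion(portrait, frames=25)
    final = refine_lip_region(coarse, make_mouth_mask(64, 64))
    print(final.shape)  # (25, 64, 64): global motion plus mask-limited lip changes
```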
The model also uses control modules to adjust the intensity of facial expressions and body movements, and it can generate videos with a variety of framings (close-up, half-body, full-body), in different visual styles (realistic or cartoon), and even with animal subjects.
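The controls mentioned here amount to a small set of generation knobs. The configuration below is a hypothetical sketch of what such settings might look like; the field names and value ranges are assumptions for illustration, not FantasyTalking's real interface.

```python
# Hypothetical generation settings: motion-intensity, framing, and style controls.
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    face_motion_intensity: float = 1.0   # scales facial expression strength
    body_motion_intensity: float = 0.5   # scales head/body movement strength
    framing: str = "half_body"           # "close_up" | "half_body" | "full_body"
    style: str = "realistic"             # "realistic" | "cartoon"

# Example: a close-up cartoon-style avatar with lively expressions but little body motion.
config = GenerationConfig(face_motion_intensity=1.2, body_motion_intensity=0.3,
                          framing="close_up", style="cartoon")
print(config)
```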
Compared with advanced methods such as OmniHuman-1, FantasyTalking delivers stronger realism, motion coherence, and audio-visual alignment, producing more natural results thanks to its face-focused mechanisms.
This technology marks a major step forward in computer graphics and computer vision.