FantasyTalking, developed by Chinese researchers, can produce realistic talking-head videos from a single fixed portrait image and an audio file.
Built on a video diffusion transformer architecture with a two-stage strategy, the model first coordinates the overall face, body, and background motion with the audio, and then, using lip-region masks, refines the mouth movements frame by frame so they match the speech.
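To make the two-stage idea concrete, here is a toy sketch of mask-guided refinement: a coarse full-frame pass is blended with a lip-focused second pass through a soft mouth mask, so the lip region can be adjusted independently of the rest of the frame. This is a hypothetical illustration with stand-in random "frames", not FantasyTalking's actual implementation; all function names and parameters are invented for the example.

```python
import numpy as np

# Hypothetical sketch of two-stage, mask-guided synthesis.
# Stage 1 stands in for the coarse audio-driven full-frame pass;
# stage 2 stands in for the lip-focused refinement pass.

def stage1_coarse_frame(h, w, seed=0):
    """Stand-in for the coarse full-frame generation (random values here)."""
    rng = np.random.default_rng(seed)
    return rng.random((h, w))

def stage2_lip_refinement(h, w, seed=1):
    """Stand-in for the lip-region refinement pass (random values here)."""
    rng = np.random.default_rng(seed)
    return rng.random((h, w))

def lip_mask(h, w, center, radius):
    """Soft circular mask around the mouth region, 1 at the center, 0 outside."""
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.hypot(yy - center[0], xx - center[1])
    return np.clip(1.0 - d / radius, 0.0, 1.0)

def blend(coarse, refined, mask):
    """Replace only the masked lip area; keep the coarse frame elsewhere."""
    return mask * refined + (1.0 - mask) * coarse

h, w = 64, 64
coarse = stage1_coarse_frame(h, w)
refined = stage2_lip_refinement(h, w)
mask = lip_mask(h, w, center=(48, 32), radius=8)
frame = blend(coarse, refined, mask)
# Outside the mask, the blended frame equals the coarse pass exactly.
```

The key design point this illustrates is separation of concerns: global motion is committed first, and the audio-sensitive lip region is corrected afterwards without disturbing the body or background.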
These capabilities allow FantasyTalking to produce high-quality spokesperson avatars while preserving facial identity.
The model uses control modules to adjust the intensity of face and body motion, and supports videos at a variety of framings (close-up, half-body, full-body), in different graphic styles (realistic or cartoon), and even with animal subjects.
Compared with advanced methods such as OmniHuman-1, FantasyTalking is superior in realism, motion coherence, and audio-visual alignment, and delivers more natural results thanks to its face-focused mechanisms.
This technology marks a major step forward in computer graphics and vision.




