Google plans to combine its two powerful artificial intelligence models, Gemini and Veo.
Demis Hassabis, CEO of DeepMind, said Google plans to eventually combine its Gemini artificial intelligence models with its Veo video models to improve those systems' understanding of the physical world.
“We have designed Gemini as a multimodal model, because our goal was to build a universal digital assistant, a smart assistant who can really help you in the real world,” he said.
The artificial intelligence industry is moving towards “omni” models, which can understand and produce different types of content such as text, images, audio and video, and Google is likewise trying to expand its advanced models.
The new versions of the Gemini model can now produce audio, images and text, while OpenAI's default model in ChatGPT can also produce images (including Studio Ghibli-style artwork). Amazon has also announced that it will unveil an “any-to-any” model by the end of this year.
These comprehensive models require a huge amount of varied training data, including images, video, audio and text. According to Hassabis, the Veo video model learns the rules of the real world mainly from YouTube videos. “Veo can recognize real-world physics by watching a lot of videos on YouTube,” he said.
Google had earlier said that its models “may be” trained on “some” YouTube content, under its agreement with YouTube creators. Reports also indicate that the company changed its terms of service last year so that it could use more data to train its artificial intelligence models.




