Researchers have developed a new artificial intelligence system that can produce accurate images of a place from a sound recording made there. In the study, sounds recorded on the streets of cities around the world were fed to the AI model, which then produced accurate images of those streets.
According to published reports, a team of researchers from the University of Texas set out to answer whether artificial intelligence can understand the visual characteristics of an environment from audio clips alone, a skill once thought to be unique to humans.
The ability of artificial intelligence to understand an environment from recorded sound
They explain in their paper that they first collected 100 YouTube video and audio clips from cities in North America, Asia, and Europe. They then used these clips to train an artificial intelligence model that can produce high-resolution images of different environments based on audio inputs.
Next, the AI was fed 10-second audio clips and asked to generate high-resolution images of what the environment looked like.
To determine the accuracy of the images, a group of people took part in the research as judges. The judges were shown the AI's output and played the sound on which the images were based, then asked to identify which image corresponded to the sound. On average, the judges chose correctly 80% of the time.
According to a statement published by the University of Texas, the accuracy of the images created by this artificial intelligence model shows that machines can closely simulate the human connection between the audio and visual perception of environments.
Yuhao Kang, one of the authors of this study, says:
“Our research shows that acoustic environments contain enough visual cues to produce recognizable images of streetscapes in which different locations are accurately represented. That means you can transform acoustic environments into vivid visual displays, and more effectively transform sounds into sights.”
RCO NEWS