A new study finds that current artificial intelligence models are still weaker than humans at describing and interpreting social interactions in a moving scene. This skill is essential for cars, robots, and other technologies that rely on AI systems to operate in the real world.
Researchers at Johns Hopkins University say artificial intelligence systems fail to grasp the social dynamics and context needed to interact with people, a problem that may be rooted in the underlying architecture of these systems.
Poor performance of artificial intelligence in understanding social interactions
Leyla Isik, lead author of the study and an assistant professor of cognitive science at Johns Hopkins University, says:
“For example, the AI in a self-driving car must identify the intentions, goals, and actions of drivers and pedestrians. It needs to know which way a pedestrian is about to step, or whether two people are talking or about to cross the street. Whenever AI interacts with humans, it must recognize what people are doing. I think these systems are currently unable to do that.”
The researchers say it is not enough to see an image and identify objects and faces. That was the first step, and it carried AI models a long way, but real life is not static: AI must be able to understand what is happening in a scene as it unfolds. Ultimately, the technology must grasp the relationships, context, and dynamics of social interactions. Overall, the researchers point to a blind spot in the development of AI models.
To compare the performance of AI models with humans, the researchers asked human participants to watch three-second video clips. The clips showed people interacting with each other, performing activities side by side, or acting on their own. Participants rated the interactions in each clip. The researchers then asked about 350 language, video, and image AI models to predict how humans would judge the clips. For the large language models, the researchers had the AI evaluate short, human-written captions of the videos.
Participants largely agreed on what was happening in the videos, but the AI models, regardless of their size or the data they were trained on, showed no such agreement. Video models could not accurately describe what people were doing in the clips, and they also failed to predict whether people were communicating. The language models, for their part, were better at predicting human behavior.
The researchers believe this happens because AI neural networks are inspired by the part of the brain responsible for processing static images, which differs from the brain region that processes dynamic social scenes.
The findings of this study were presented at the International Conference on Learning Representations (ICLR).