Apple researchers have unveiled a new artificial intelligence model called Ferret-UI Lite: a lightweight AI agent that runs directly on the device and can interact with the user interface of applications based on user requests. Notably, despite having only 3 billion parameters, the model performs on par with, or even better than, some GUI models that are up to 24 times larger.
The story of Ferret goes back to December 2023, when a team of 9 Apple researchers published a paper titled “FERRET: Refer and Ground Anything Anywhere at Any Granularity”. That research introduced a multimodal large language model (MLLM) that could respond to natural-language references to specific parts of an image.
After that, Apple published follow-up versions including Ferret-v2, Ferret-UI and Ferret-UI 2.
While the original Ferret-UI was built on a model with 13 billion parameters, and Ferret-UI 2 added support for more platforms and higher resolutions, the Lite version takes a different approach. Designed from the outset to run directly on the device, the model has a light, energy-efficient structure and, despite its smaller size, is competitive with much larger models.
The researchers emphasize that most existing GUI agents rely on massive server-side models because of their strong reasoning and planning abilities. But such models are usually too heavy and expensive to run on the device.
Ferret-UI Lite is trained on a mix of real and synthetic data, using supervised fine-tuning followed by reinforcement learning, and it applies cropping and zooming at inference time. In this method, after an initial prediction, the model crops the image around the predicted region and analyzes that region again in more detail, which compensates for its limited capacity for processing fine image details.
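The crop-and-zoom step can be illustrated with a minimal Python sketch. This is not Apple's implementation: the image is a toy grid of pixel values, and `toy_predict`, `crop` and `zoom_in_predict` are hypothetical names invented here. The toy predictor stands in for the model and is deliberately imprecise on large inputs, which is the kind of limitation the second, zoomed-in pass is meant to offset.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]               # toy grayscale image: rows of pixel values
Point = Tuple[float, float]           # (x, y) in pixel coordinates of the given image
Predictor = Callable[[Image], Point]  # hypothetical stand-in for the model


def crop(image: Image, center: Point, size: int) -> Tuple[Image, Tuple[int, int]]:
    """Cut a window of up to size x size pixels around center, clamped to the image.

    Returns the window and the (x, y) offset of its top-left corner.
    """
    height, width = len(image), len(image[0])
    size_x, size_y = min(size, width), min(size, height)
    left = int(min(max(center[0] - size_x / 2, 0), width - size_x))
    top = int(min(max(center[1] - size_y / 2, 0), height - size_y))
    window = [row[left:left + size_x] for row in image[top:top + size_y]]
    return window, (left, top)


def zoom_in_predict(image: Image, predict: Predictor, crop_size: int) -> Point:
    """Two-pass prediction: coarse guess on the full image, refined guess on a crop."""
    coarse = predict(image)                        # pass 1: full screenshot
    window, (left, top) = crop(image, coarse, crop_size)
    fine = predict(window)                         # pass 2: same predictor, zoomed-in view
    # Map the refined point from crop coordinates back to full-image coordinates.
    return (fine[0] + left, fine[1] + top)


def toy_predict(image: Image) -> Point:
    """Toy model (assumption): finds the brightest pixel, coarsely on wide images."""
    best = max(
        ((x, y) for y, row in enumerate(image) for x in range(len(row))),
        key=lambda p: image[p[1]][p[0]],
    )
    step = 8 if len(image[0]) > 16 else 1  # wide inputs are "seen" at low resolution
    return (best[0] // step * step, best[1] // step * step)


# A 64x64 "screenshot" with a single bright target pixel at x=37, y=21.
screen = [[0] * 64 for _ in range(64)]
screen[21][37] = 255

print(toy_predict(screen))                       # coarse, single-pass guess
print(zoom_in_predict(screen, toy_predict, 16))  # refined, two-pass guess
```

In this toy setup the single-pass guess snaps to an 8-pixel grid, while the second pass on the 16-pixel crop recovers the exact target. The cost is one extra model call per prediction.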

One of the main innovations of Ferret-UI Lite is the use of a multi-agent system to generate synthetic training data. The system designs tasks, divides them into execution steps, executes them, and finally evaluates the result, so that realistic interactions, including errors and unforeseen conditions, are recorded in the data.
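The design, plan, execute and evaluate loop can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the researchers' pipeline: the four agents are passed in as plain callables, and `Trajectory` and `generate_trajectory` are hypothetical names. The point it shows is that failed steps are kept in the recorded data instead of being discarded.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    """One recorded interaction: the task, its planned steps, and what happened."""
    task: str
    steps: List[str]
    observations: List[str] = field(default_factory=list)
    success: bool = False


def generate_trajectory(
    design_task: Callable[[], str],
    plan_steps: Callable[[str], List[str]],
    execute_step: Callable[[str], str],
    evaluate: Callable[[str, List[str]], bool],
) -> Trajectory:
    """Run one design -> plan -> execute -> evaluate cycle and record everything."""
    task = design_task()
    traj = Trajectory(task=task, steps=plan_steps(task))
    for step in traj.steps:
        try:
            traj.observations.append(execute_step(step))
        except RuntimeError as err:
            # Failed actions stay in the record rather than being dropped.
            traj.observations.append(f"error: {err}")
    traj.success = evaluate(task, traj.observations)
    return traj


# Toy agents (assumptions for illustration only).
def toy_execute(step: str) -> str:
    if step == "tap missing button":
        raise RuntimeError("element not found")
    return f"done: {step}"


example = generate_trajectory(
    design_task=lambda: "open settings",
    plan_steps=lambda task: ["tap settings icon", "tap missing button"],
    execute_step=toy_execute,
    evaluate=lambda task, obs: not any(o.startswith("error") for o in obs),
)
print(example)
```

In a real system each callable would presumably be backed by a model or an environment. Here they are stubs, so the example only demonstrates the control flow and the shape of the recorded data.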
The strengths and limitations of this artificial intelligence model
The results show that Ferret-UI Lite performs very well on short, low-level tasks but is weaker than larger models in complex, multi-step interactions, which is to be expected given the limitations of a small on-device model.
On the other hand, its most important advantage is that it runs locally and protects privacy, because no data is sent to cloud servers for processing.
All in all, Ferret-UI Lite could be an important step towards personal AI agents that run directly on a phone or laptop and automatically interact with apps.
RCO NEWS




