Science Cast

DPO Learning with LLMs-Judge Signal for Computer Use Agents

librarian, June 4, 2025 5:42pm

This paper is a preprint and has not been certified by peer review.

arXiv, June 3, 2025 12:00am

Authors

Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard

Abstract

Computer use agents (CUA) are systems that automatically interact with graphical user interfaces (GUIs) to complete tasks. CUA have made significant progress with the advent of large vision-language models (VLMs). However, these agents typically rely on cloud-based inference with substantial compute demands, raising critical privacy and scalability concerns, especially when operating on personal devices. In this work, we take a step toward privacy-preserving and resource-efficient agents by developing a lightweight vision-language model that runs entirely on local machines. To train this compact agent, we introduce an LLM-as-Judge framework that automatically evaluates and filters synthetic interaction trajectories, producing high-quality data for reinforcement learning without human annotation. Experiments on the OS-World benchmark demonstrate that our fine-tuned local model outperforms existing baselines, highlighting a promising path toward private, efficient, and generalizable GUI agents.
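
The abstract describes the pipeline only at a high level, so the following is a minimal sketch, under stated assumptions, of how judge-scored trajectories could be filtered into preference pairs and scored with the standard DPO objective. The `Trajectory` type, the `judge_score` field, the `min_gap` threshold, and the best-versus-worst pairing rule are all hypothetical illustrations, not the authors' method; only the DPO loss formula itself is the standard one.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trajectory:
    task_id: str
    actions: List[str]
    judge_score: float  # hypothetical: score assigned by an LLM judge, higher is better


def build_preference_pairs(
    trajectories: List[Trajectory], min_gap: float = 1.0
) -> List[Tuple[Trajectory, Trajectory]]:
    """Pair the best- and worst-scored trajectory per task (assumed rule).

    Tasks whose score gap is below `min_gap` are filtered out, since the
    judge signal does not clearly separate their trajectories.
    """
    by_task: Dict[str, List[Trajectory]] = {}
    for traj in trajectories:
        by_task.setdefault(traj.task_id, []).append(traj)

    pairs = []
    for group in by_task.values():
        best = max(group, key=lambda t: t.judge_score)
        worst = min(group, key=lambda t: t.judge_score)
        if best.judge_score - worst.judge_score >= min_gap:
            pairs.append((best, worst))  # (chosen, rejected)
    return pairs


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch the loss falls as the policy assigns relatively more log-probability to the chosen trajectory than the frozen reference model does, which is the behavior a judge-derived preference signal is meant to encourage.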
