CUA-Skill

Abstract

Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world desktop tasks. However, existing agentic systems remain difficult to scale and continue to lag behind human performance. A key limitation is the absence of reusable and structured skill abstractions that capture how humans interact with graphical user interfaces. We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills coupled with parameterized execution graphs. CUA-Skill is a large-scale library of carefully engineered skills spanning common Windows applications, serving as a practical infrastructure and tool substrate for scalable, reliable agent development. Built upon this skill base, we construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument instantiation, and memory-aware failure recovery. Our results demonstrate that CUA-Skill substantially improves execution success rates and robustness on challenging end-to-end agent benchmarks, establishing a strong foundation for future computer-using agent development. On WindowsAgentArena, CUA-Skill Agent achieves a state-of-the-art success rate of 57.5% (best of three) while being significantly more efficient than prior and concurrent approaches.

Computer-Using Agentic Skills

CUA-Skill consists of three components: (i) a skill cell that captures a minimal user intent, (ii) a parameterized execution graph that specifies concrete realizations of the skill through GUI-grounded interactions and executable scripts, and (iii) a skill composition graph that encodes how individual skills are typically chained together.
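As a rough illustration of how these three components could fit together, the sketch below models them as plain Python data structures. It is not the CUA-Skill implementation: every class, field, skill name, and step string here (`SkillCell`, `ExecutionPath`, `save_as`, the `save_document` command) is a hypothetical stand-in, and the execution graph is simplified to a list of alternative step sequences.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SkillCell:
    """(i) A minimal user intent plus the arguments it needs."""
    name: str
    intent: str
    parameters: List[str] = field(default_factory=list)


@dataclass
class ExecutionPath:
    """One concrete realization of a skill, written as step templates."""
    kind: str          # "gui" or "script"; labels are illustrative
    steps: List[str]   # each step may contain {placeholders}

    def instantiate(self, args: Dict[str, str]) -> List[str]:
        # Substitute concrete argument values into every step template.
        return [step.format(**args) for step in self.steps]


@dataclass
class Skill:
    cell: SkillCell
    # (ii) Parameterized execution graph, simplified here to a list of
    # alternative paths rather than a full graph.
    paths: List[ExecutionPath]


# Hypothetical skill with one GUI realization and one script realization.
save_as = Skill(
    cell=SkillCell("save_as", "Save the active document under a new name", ["filename"]),
    paths=[
        ExecutionPath("gui", ["press F12", "type {filename}", "press Enter"]),
        ExecutionPath("script", ["run save_document --path {filename}"]),  # made-up command
    ],
)

# (iii) Skill composition graph: skill name -> skills that typically follow it.
composition: Dict[str, List[str]] = {
    "open_document": ["edit_text", "save_as"],
    "edit_text": ["save_as"],
    "save_as": [],
}

gui_steps = save_as.paths[0].instantiate({"filename": "report.docx"})
print(gui_steps)
```

Keeping the intent (the cell) separate from its realizations means the same skill can be carried out through either GUI interactions or a script, and the composition map gives a planner a cheap hint about which skills plausibly come next.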

Figure: CUA-Skill and graph construction example.

CUA-Skill Agent

CUA-Skill Agent supports flexible, long-horizon task completion via dynamic skill selection and execution. Given a natural-language user instruction, the agent incrementally selects and executes skills from the CUA-Skill library, conditioning each decision on the current UI state, execution history, and accumulated memory. At each step, an LLM, denoted M_p, serves as the planner, determining both which skill to invoke next and how to instantiate its arguments.
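The control loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' agent: the planner is passed in as an ordinary callable standing in for the LLM, and `run_agent`, `toy_planner`, `toy_execute`, and the skill names are all hypothetical. Skill retrieval is omitted, and memory is reduced to a list of failure notes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A plan is (skill name, arguments), or None when the planner considers the task done.
Plan = Optional[Tuple[str, Dict[str, str]]]


def run_agent(
    instruction: str,
    planner: Callable[[str, str, List[str], List[str]], Plan],
    execute: Callable[[str, Dict[str, str]], bool],
    observe: Callable[[], str],
    max_steps: int = 10,
) -> List[str]:
    history: List[str] = []  # skills that executed successfully, in order
    memory: List[str] = []   # failure notes fed back to the planner
    for _ in range(max_steps):
        ui_state = observe()
        # The planner sees the instruction, UI state, history, and memory.
        plan = planner(instruction, ui_state, history, memory)
        if plan is None:
            break
        skill, args = plan
        if execute(skill, args):
            history.append(skill)
        else:
            # Record the failure so the next decision can avoid repeating it.
            memory.append(f"{skill} failed with {args}")
    return history


# Toy stand-ins: the menu click always fails, so the planner falls back to a hotkey.
attempts: List[str] = []


def toy_execute(skill: str, args: Dict[str, str]) -> bool:
    attempts.append(skill)
    return skill != "click_menu"


def toy_planner(instruction: str, ui_state: str,
                history: List[str], memory: List[str]) -> Plan:
    if "open_app" not in history:
        return ("open_app", {"name": "notepad"})
    if "save_file" in history:
        return None
    if any("click_menu" in note for note in memory):
        return ("save_file", {"via": "hotkey"})
    return ("click_menu", {"item": "Save"})


result = run_agent("save the note", toy_planner, toy_execute, lambda: "screen")
print(result)
```

In this toy run the failed `click_menu` attempt is written to memory, and the next planner call reads that note and picks a different skill, which is the memory-aware recovery behavior in miniature.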

Experimental Results

Statistics of the CUA-Skill execution graph across applications. The GUI primitive statistics show how the number of GUI primitives per atomic skill is distributed. (Right) Bar plot of success rate across applications.

Synthesized User Task Success Rate. The success rate of CUA-Skill is noticeably higher than that of Ultra-CUA (Yang et al., 2025b), by 1.7x, and that of Operator, by 3.64x.

Success Rate by Application Category of CUA-Skill Agent on WindowsAgentArena (Bonatti et al., 2024).

Overall comparison of system performance on WAA

BibTeX

@article{chen2025cuaskill,
  title={CUA-Skill: Develop Skills for Computer Using Agent},
  author={Chen, Tianyi and Li, Yinheng and Solodko, Michael and Wang, Sen and Jiang, Nan and Hao, Junheng and Cui, Tingyuan and Ko, Jongwoo and Abdali, Sara and Zheng, Suzhen and Fan, Hao and Cameron, Pashmina and Wagle, Justin and Koishida, Kazuhito},
  journal={arXiv preprint arXiv:2601.21123},
  year={2026}
}
