
type: concept
tags: [benchmark, gui-agent, smartphone, evaluation, personalization]
related: [[mobile-agent-framework]], [[secagent-mobile-gui]], [[clawmobile-agentic]]
sources:
  - url: https://arxiv.org/abs/2603.29318v1
    title: "PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent"
    date: 2026-03
created: 2026-04-14


PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

Core Problem

Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration.

Method / Architecture

Based on the paper's abstract, the method's key contribution is:

  • The paper presents PSPA-Bench, a benchmark dedicated to evaluating personalization in smartphone GUI agents.

Experimental Results

The paper reports the following main results:

  • PSPA-Bench comprises over 12,855 personalized instructions aligned with real-world user behaviors, spanning 10 representative daily-use scenarios and 22 mobile apps.
  • It introduces a structure-aware process evaluation method that measures agents' personalized capabilities at a fine-grained level.
  • The authors benchmark 11 state-of-the-art GUI agents on PSPA-Bench.
  • Current methods perform poorly under personalized settings; even the strongest agent achieves only limited success.
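
The abstract does not say how the structure-aware process evaluation is computed. As a purely illustrative sketch, and not the paper's metric, one way to score an agent at the process level rather than only on final success is to count how many reference steps its trajectory reproduces in order (a longest-common-subsequence ratio). The `Step` fields, the `process_score` function, and the example task below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One GUI action. Hypothetical schema, not taken from the paper."""
    action: str  # e.g. "tap", "type"
    target: str  # e.g. a UI element label or typed text


def process_score(reference: list[Step], trajectory: list[Step]) -> float:
    """Fraction of reference steps the agent reproduced in order.

    Computed as LCS(reference, trajectory) / len(reference).
    This is an assumed stand-in for a fine-grained process metric.
    """
    if not reference:
        return 1.0
    # Standard dynamic-programming table for longest common subsequence.
    rows, cols = len(reference) + 1, len(trajectory) + 1
    lcs = [[0] * cols for _ in range(rows)]
    for i, ref_step in enumerate(reference, start=1):
        for j, agent_step in enumerate(trajectory, start=1):
            if ref_step == agent_step:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[-1][-1] / len(reference)


# Hypothetical personalized instruction: "order my usual latte".
reference = [
    Step("tap", "search_bar"),
    Step("type", "oat latte"),
    Step("tap", "usual_store"),  # the personalized step
    Step("tap", "order"),
]
trajectory = [
    Step("tap", "search_bar"),
    Step("type", "oat latte"),
    Step("tap", "nearest_store"),  # agent ignored the user's preference
    Step("tap", "order"),
]

print(process_score(reference, trajectory))  # 0.75
```

In this toy case the agent fails the task outright under a binary success metric, while the step-level score shows that only the personalized step went wrong.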

Why It Matters

The work matters for the following reasons:

  • The authors' analysis highlights three directions for advancing personalized GUI agents: (1) reasoning-oriented models consistently outperform general LLMs, (2) perception remains a simple yet critical capability, and (3) reflection and long-term memory mechanisms are key to improving adaptation.
  • Together, these findings position PSPA-Bench as a foundation for systematic study of, and future progress in, personalized GUI agents.

Related

Based on the paper's content and research area, this work relates to the following concepts:

  • [[mobile-agent-framework]]

References

  • Original paper: https://arxiv.org/abs/2603.29318