---
type: concept
tags: [benchmark, gui-agent, smartphone, evaluation, personalization]
related: ["[[mobile-agent-framework]]", "[[secagent-mobile-gui]]", "[[clawmobile-agentic]]"]
sources:
  - url: https://arxiv.org/abs/2603.29318v1
    title: "PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent"
    date: 2026-03
created: 2026-04-14
---
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
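Core Problem
Smartphone GUI agents execute tasks by operating directly on app interfaces, which gives them a path to broad capability without deep system integration. The open question this paper targets is personalization: whether such agents can adapt to individual users' behaviors and preferences, a capability that had no dedicated benchmark.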
Method / Architecture
Based on the paper's abstract, the key contribution is:
- The authors present PSPA-Bench, a benchmark dedicated to evaluating personalization in smartphone GUI agents.
Experimental Results
The paper reports the following main results:
- PSPA-Bench comprises over 12,855 personalized instructions aligned with real-world user behaviors across 10 representative daily-use scenarios and 22 mobile apps. It also introduces a structure-aware process evaluation method that measures agents' personalization capabilities at a fine-grained level.
- Using PSPA-Bench, the authors benchmark 11 state-of-the-art GUI agents.
- Results reveal that current methods perform poorly under personalized settings, with even the strongest agent achieving limited success.
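The abstract does not explain how the structure-aware process evaluation works. As a purely hypothetical illustration of why process-level scoring is finer-grained than binary task success, the sketch below scores an agent trajectory against a set of milestones that carry prerequisite ordering. The `Milestone` schema, `matches`, `process_score`, the weights, and the coffee-ordering example are all assumptions made for this note, not PSPA-Bench's actual method or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    """One checkable sub-goal of a personalized task (HYPOTHETICAL schema)."""
    name: str
    action: str                      # e.g. "tap", "type"
    target: str                      # UI element or value the action must hit
    requires: tuple[str, ...] = ()   # milestones that must be reached first
    weight: float = 1.0


def matches(m: Milestone, step: dict) -> bool:
    """A step satisfies a milestone if its action and target both match."""
    return step.get("action") == m.action and step.get("target") == m.target


def process_score(milestones: list[Milestone], trajectory: list[dict]) -> float:
    """Weighted fraction of milestones reached in an order consistent with
    their prerequisites. Returns a value in [0, 1] (HYPOTHETICAL metric)."""
    reached: set[str] = set()
    for step in trajectory:
        for m in milestones:
            if m.name in reached:
                continue
            # Credit a milestone only if every prerequisite was already reached.
            if matches(m, step) and all(r in reached for r in m.requires):
                reached.add(m.name)
                break  # each step credits at most one milestone
    total = sum(m.weight for m in milestones)
    if total == 0:
        return 0.0
    return sum(m.weight for m in milestones if m.name in reached) / total


# Hypothetical task: "order my usual coffee" (user prefers a large oat latte).
milestones = [
    Milestone("open_app", "tap", "CoffeeApp"),
    Milestone("pick_drink", "tap", "Oat Latte", requires=("open_app",)),
    Milestone("pick_size", "tap", "Large", requires=("pick_drink",)),
    Milestone("checkout", "tap", "Pay", requires=("pick_size",)),
]
trajectory = [
    {"action": "tap", "target": "CoffeeApp"},
    {"action": "tap", "target": "Oat Latte"},
    {"action": "tap", "target": "Medium"},  # ignores the user's preferred size
    {"action": "tap", "target": "Pay"},
]
print(process_score(milestones, trajectory))  # 0.5
```

Under this toy metric the trajectory fails the task outright but still earns partial credit for the steps it completed in order; the payment tap is not credited because its prerequisite (choosing the preferred size) was never reached.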
Why It Matters
The significance of this work lies in:
- The authors' analysis highlights three directions for advancing personalized GUI agents: (1) reasoning-oriented models consistently outperform general-purpose LLMs; (2) perception remains a simple yet critical capability; and (3) reflection and long-term memory mechanisms are key to improving adaptation.
- The authors position PSPA-Bench, on the strength of these findings, as a foundation for systematic study of and future progress in personalized GUI agents.
Related
Based on the paper's content and research area, this work relates to the following concepts:
- [[mobile-agent-framework]]
Resources
- Original paper: https://arxiv.org/abs/2603.29318