type: concept
tags: [benchmark, gui-agent, smartphone, evaluation, personalization]
related: [[mobile-agent-framework]], [[secagent-mobile-gui]], [[clawmobile-agentic]]
sources:
  - url: https://arxiv.org/abs/2603.29318v1
    title: "PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent"
    date: 2026-03
created: 2026-04-14

PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

Core Problem

Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration.

Method / Architecture

Based on the paper's abstract, the key contribution is as follows:

To address the gap in personalization-focused evaluation, we present PSPA-Bench, a benchmark dedicated to evaluating personalization in smartphone GUI agents.

Experimental Results

The paper reports the following main experimental results:

PSPA-Bench comprises over 12,855 personalized instructions aligned with real-world user behaviors across 10 representative daily-use scenarios and 22 mobile apps, and introduces a structure-aware process evaluation method that measures agents' personalized capabilities at a fine-grained level.
Through PSPA-Bench, we benchmark 11 state-of-the-art GUI agents.
Results reveal that current methods perform poorly under personalized settings, with even the strongest agent achieving limited success.
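
The abstract does not spell out how the structure-aware process evaluation is computed, so the following is only a minimal sketch of the general idea of process-level scoring, as opposed to a binary success flag. Everything in it is an assumption made for illustration: the `PersonalizedTask` record, its field names, the step labels, and the `process_score` function are hypothetical and are not taken from PSPA-Bench.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalizedTask:
    """Hypothetical record for one personalized instruction (field names are assumptions)."""
    instruction: str
    scenario: str
    app: str
    required_steps: list[str] = field(default_factory=list)


def process_score(required_steps: list[str], agent_actions: list[str]) -> float:
    """Fraction of required steps completed in order.

    Walks the agent trace once and counts how many required steps,
    starting from the first, appear in the trace in the required order.
    Extra agent actions are ignored. This is an illustrative stand-in,
    not the metric defined in the paper.
    """
    if not required_steps:
        return 1.0
    matched = 0
    for action in agent_actions:
        if matched < len(required_steps) and action == required_steps[matched]:
            matched += 1
    return matched / len(required_steps)


# Hypothetical example: the agent skips the user's preferred payment method.
task = PersonalizedTask(
    instruction="Order my usual lunch and pay the way I normally do",
    scenario="food delivery",
    app="example-delivery-app",
    required_steps=[
        "open_app",
        "select_usual_address",
        "choose_preferred_payment",
        "confirm_order",
    ],
)
actions = ["open_app", "scroll", "select_usual_address", "confirm_order"]

score = process_score(task.required_steps, actions)
print(score)  # 0.5: two of the four required steps were matched in order
```

A binary success metric would score this trace as a plain failure, while the step-level score shows that the agent got halfway and failed specifically at the personalized step. That is the kind of fine-grained signal a process evaluation is meant to expose.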

Why It Matters

The significance of this work lies in the following:

Our analysis further highlights three directions for advancing personalized GUI agents: (1) reasoning-oriented models consistently outperform general LLMs, (2) perception remains a simple yet critical capability, and (3) reflection and long-term memory mechanisms are key to improving adaptation.
Together, these findings establish PSPA-Bench as a foundation for systematic study and future progress in personalized GUI agents.

Related

Based on the paper's content and research area, this work relates to the following concepts:

[[mobile-agent-framework]]

References

Paper: https://arxiv.org/abs/2603.29318