type: concept
tags: [gui-agent, mobile, semantic, automation, efficiency]
related: [[mobile-agent-framework]], [[pspa-bench-gui-agent]], [[clawmobile-agentic]]
sources:
  - url: https://arxiv.org/abs/2603.08533v2
    title: "SecAgent: Efficient Mobile GUI Agent with Semantic Context"
    date: 2026-03
created: 2026-04-14


SecAgent: Efficient Mobile GUI Agent with Semantic Context

Core Problem

Mobile Graphical User Interface (GUI) agents powered by multimodal large language models have demonstrated promising capabilities in automating complex smartphone tasks.

Method / Architecture

Based on the paper's abstract, the method includes the following key contributions:

  • The paper presents SecAgent, an efficient mobile GUI agent at 3B scale.
  • We first construct a human-verified Chinese mobile GUI dataset with 18k grounding samples and 121k navigation steps across 44 applications, along with a Chinese navigation benchmark featuring multi-choice action annotations.
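
The abstract does not say how "multi-choice action annotations" are scored. One plausible reading is that a navigation step can have several acceptable actions, and a prediction counts as correct if it matches any of them. The sketch below illustrates that reading only; the `Action` and `Step` types, their fields, and `step_accuracy` are hypothetical names, not the benchmark's actual schema or metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Simplified GUI action; the fields are illustrative assumptions."""
    kind: str         # e.g. "click", "type", "scroll"
    target: str = ""  # e.g. an element label or the typed text


@dataclass
class Step:
    """One navigation step plus every action annotated as acceptable."""
    instruction: str
    acceptable: frozenset  # frozenset of Action


def step_correct(pred: Action, step: Step) -> bool:
    # Assumed rule: matching any one annotated action is enough.
    return pred in step.acceptable


def step_accuracy(preds: list, steps: list) -> float:
    """Fraction of steps whose prediction matches an acceptable action."""
    if not steps:
        return 0.0
    hits = sum(step_correct(p, s) for p, s in zip(preds, steps))
    return hits / len(steps)


# Two hypothetical steps; the first accepts either of two routes.
steps = [
    Step("Open Wi-Fi settings",
         frozenset({Action("click", "Wi-Fi"), Action("click", "Network")})),
    Step("Go back", frozenset({Action("click", "Back")})),
]
preds = [Action("click", "Network"), Action("scroll", "down")]
accuracy = step_accuracy(preds, steps)
```

Under this assumed rule the first prediction is accepted even though it differs from the other annotated route, which is the practical point of annotating more than one valid action per step.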

Experimental Results

The paper reports the following main results:

  • Building upon this dataset, we propose a semantic context mechanism that distills history screenshots and actions into concise, natural language summaries, significantly reducing computational costs while preserving task-relevant information.
  • Through supervised and reinforcement fine-tuning, SecAgent outperforms similar-scale baselines and achieves performance comparable to 7B-8B models on our and public navigation benchmarks.
  • Our dataset is available at https://huggingface.co/datasets/alibabagroup/CMGUI.
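
The semantic context mechanism listed above replaces past screenshots and actions with short natural-language summaries. A minimal sketch of that idea follows, assuming each step's summary arrives as a plain string. The `SemanticContext` class, its `max_steps` cap, and the prompt layout are illustrative assumptions, not SecAgent's actual implementation, and the abstract does not specify how the summaries are produced.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticContext:
    """Rolling text history that stands in for past screenshots."""
    task: str
    summaries: list = field(default_factory=list)
    max_steps: int = 8  # assumed cap on retained steps; must be >= 1

    def record(self, summary: str) -> None:
        """Store one past step as a short sentence instead of an image."""
        self.summaries.append(summary)
        # Drop the oldest entries so the prompt length stays bounded.
        del self.summaries[:-self.max_steps]

    def render(self) -> str:
        """Build the text part of the prompt for the next decision."""
        lines = [f"Task: {self.task}", "History:"]
        lines += [f"{i}. {s}" for i, s in enumerate(self.summaries, 1)]
        return "\n".join(lines)


ctx = SemanticContext(task="Turn on Wi-Fi", max_steps=2)
ctx.record("Opened the Settings app from the home screen.")
ctx.record("Tapped 'Network'; the network page is now shown.")
ctx.record("Tapped 'Wi-Fi'; the Wi-Fi toggle is visible and off.")
prompt_text = ctx.render()
```

In this sketch only the current screenshot would still be sent to the model as an image. Everything earlier is carried as a few lines of text, so the per-step input does not grow with the number of past screenshots.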

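Why It Matters

This work matters for the following reasons:

  • It provides a human-verified Chinese mobile GUI dataset (18k grounding samples and 121k navigation steps across 44 applications) as a resource for related research.
  • It contributes a Chinese navigation benchmark with multi-choice action annotations for evaluating mobile GUI agents.
  • It reduces computational cost by summarizing history into natural language, which makes practical deployment of a 3B-scale agent more feasible.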

Related

Based on the paper's content and research area, this work relates to the following concepts:

  • [[mobile-agent-framework]]

References

  • Paper: https://arxiv.org/abs/2603.08533