
type: concept
tags: [gui-agent, benchmark, trajectory, mobile, real-world]
related: [[turing-test-mobile-gui]], [[pspa-bench-gui-agent]], [[gui-agent-privacy]]
sources:
  - "[arXiv] MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion"
created: 2026-04-14


MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

Core problem

Mobile agents can autonomously complete user-assigned tasks through GUI interactions.

Method / Architecture

Based on the paper's abstract, the approach rests on the following key points:

  • Motivation: in real-world mobile-agent scenarios, many third-party applications do not expose system-level APIs for determining whether a task has succeeded. This creates a mismatch between benchmarks and real-world usage and makes it difficult to evaluate model performance accurately.
  • To address these issues, the paper proposes MobiFlow, an evaluation framework built on tasks drawn from arbitrary third-party applications.

Experimental results

The paper reports the following main results:

  • Using an efficient graph-construction algorithm based on multi-trajectory fusion, MobiFlow can effectively compress the state space, support dynamic interaction, and better align with real-world third-party application scenarios.
  • MobiFlow covers 20 widely used third-party applications and comprises 240 diverse real-world tasks, with enriched evaluation metrics.
  • Compared with AndroidWorld, MobiFlow's evaluation results show higher alignment with human assessments and can guide the training of future GUI-based models under real workloads.
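The abstract does not spell out the graph-construction algorithm, so the following is only a minimal sketch of the general idea of multi-trajectory fusion, not the paper's implementation. It assumes each recorded trajectory is a list of `(state_key, action)` steps, that two screens with the same `state_key` count as the same state, and that transitions are deterministic. The names `fuse_trajectories`, `reaches_goal`, and the demo trajectories are hypothetical.

```python
def fuse_trajectories(trajectories):
    """Merge recorded trajectories into one state graph.

    Each trajectory is a list of (state_key, action) steps; the last step
    carries action None. Steps sharing a state_key collapse into one node
    (assumption: equal keys mean equivalent screens).
    """
    states = set()
    transitions = {}  # (state_key, action) -> next state_key
    for traj in trajectories:
        states.update(state for state, _ in traj)
        for (state, action), (next_state, _) in zip(traj, traj[1:]):
            transitions[(state, action)] = next_state
    return states, transitions


def reaches_goal(transitions, start, actions, goals):
    """Replay an agent's actions on the fused graph, without any system API."""
    state = start
    for action in actions:
        key = (state, action)
        if key not in transitions:
            return False  # the agent left every known path
        state = transitions[key]
    return state in goals


# Two hypothetical demonstrations of the same task via different routes.
demo_a = [("home", "tap_search"), ("search", "type_query"),
          ("results", "tap_item"), ("detail", None)]
demo_b = [("home", "tap_category"), ("category", "tap_item"),
          ("detail", None)]

states, transitions = fuse_trajectories([demo_a, demo_b])
raw_steps = len(demo_a) + len(demo_b)
print(f"{raw_steps} recorded steps fused into {len(states)} states")
print(reaches_goal(transitions, "home", ["tap_category", "tap_item"], {"detail"}))
```

In this toy setting the shared `home` and `detail` screens are stored once, which is the sense in which fusion compresses the state space, and an agent is judged by whether its actions lead to a goal state along any fused path. A real system would also need a way to decide when two screens are equivalent, which this sketch simply assumes.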

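Why it matters

The significance of this work lies in:

  • It establishes a standardized evaluation benchmark for mobile agents on real third-party applications, which can help move the field forward.
  • Its efficient graph construction compresses the state space, making evaluation under real workloads more practical.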

Related

Based on the paper's content and research area, this work relates to the following concepts:

  • [[turing-test-mobile-gui]]

References

  • Paper: https://arxiv.org/abs/2604.09587