Multimodal Memory
Unified storage and retrieval of vision, language, and action trajectories
27 papers in total, auto-generated on 2026-05-13
Paper List
- "Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation" — arXiv:2512.18987
- "Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation" — arXiv:2603.24576
- "Long-Term Memory for VLA-based Agents in Open-World Task Execution" — arXiv:2604.15671
- "CMMR-VLN: Continual Multimodal Memory Retrieval for Vision-and-Language Navigation" — arXiv:2603.07997
- "Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making" — arXiv:2604.07392
- "GEMS: Agent-Native Multimodal Generation with Memory and Skills" — arXiv:2603.28088
- "HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding" — arXiv:2504.10739
- "Human-Inspired Context-Selective Multimodal Memory for Social Robots" — arXiv:2604.12081
- "IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory" — arXiv:2604.20136
- "LifeMem: Evaluating Memory Capability in Continuous Lifelog Scenario" — arXiv:2604.11182
- "M2A: Multimodal Memory Agent with Dual-Layer Hybrid Memory for Long-Term Personalized Interactions" — arXiv:2602.07624
- "M3: 3D-Spatial MultiModal Memory" — arXiv:2503.16413
- "Bridging Modalities, Spanning Time: Structured Memory for Ultra-Long Agentic Video Reasoning" — arXiv:2605.08271
- "Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents" — arXiv:2601.03515
- "MEM: Multi-Scale Embodied Memory for Vision Language Action Models" — arXiv:2603.03596
- "MemCompiler: Compile, Don't Inject — State-Conditioned Memory for Embodied Agents" — arXiv:2605.07594
- "MemoryDiorama: Generating Dynamic 3D Diorama from Everyday Photos for Memory Recall" — arXiv:2604.06773
- "MemVerse: Multimodal Memory for Lifelong Learning Agents" — arXiv:2512.03627
- "From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck" — arXiv:2603.01455
- "MMA: Multimodal Memory Agent" — arXiv:2602.16493
- "Advancing Multimodal Agent Reasoning with Long-Term Neuro-Symbolic Memory" — arXiv:2603.15280
- "Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory" — arXiv:2604.01007
- "OVAL: Open-Vocabulary Augmented Memory Model for Lifelong Object Goal Navigation" — arXiv:2604.12872
- "Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs" — arXiv:2605.00814
- "Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding" — arXiv:2605.07897
- "TeleMem: Building Long-Term and Multimodal Memory for Agentic AI" — arXiv:2601.06037
- "VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory" — arXiv:2603.04910