2024-01-18 Paper Report: AppAgent: Multimodal Agents as Smartphone Users

Report title: AppAgent: Multimodal Agents as Smartphone Users

Authors: Chi Zhang∗, Zhao Yang∗, Jiaxuan Liu∗, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu†

Affiliation: Tencent

Presenter: Wang Xufei (王旭飞)

Date: January 18, 2024

Venue: Room 621, Boxue Building, Guizhou University

Abstract: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent’s functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent’s proficiency in handling a diverse array of high-level tasks.

