Frontier Agent Research in 2025

User 6393

Modified October 24, 2025

Must-Read ①

Must-Read ②

Surveys (6)

Agent AI: Surveying the Horizons of Multimodal Interaction

Stanford University, Microsoft Research, UCLA, University of Washington, et al.

Link: https://arxiv.org/pdf/2401.03568

The Rise and Potential of Large Language Model Based Agents: A Survey

Fudan University NLP Group, miHoYo

Link: https://xuanjing-huang.github.io/files/agent.pdf

https://github.com/woooodyy/llm-agent-paper-list

A survey on large language model based autonomous agents

Research team, Gaoling School of Artificial Intelligence, Renmin University of China

Link: https://link.springer.com/content/pdf/10.1007/s11704-024-40231-1.pdf

https://github.com/paitesanshi/llm-agent-survey

https://github.com/melih-unsal/demogpt

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Southern University of Science and Technology, et al.

Link: https://arxiv.org/pdf/2402.01680

https://github.com/taichengguo/llm_multiagents_survey_papers

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Institute for AI Industry Research (AIR), Tsinghua University

Link: https://arxiv.org/pdf/2401.05459

https://github.com/mobilellm/personal_llm_agents_survey

Understanding the planning of LLM agents: A survey

University of Science and Technology of China, Huawei Noah's Ark Lab

Link: https://arxiv.org/pdf/2402.02716

Construction, Applications, Evaluation

LLM Agent Paper Collection

Compiled by 深度之眼

CVPR 2025

Single-Agent (21)

Font-Agent: Enhancing Font Understanding with Large Language Models

Paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Lai_Font-Agent_Enhancing_Font_Understanding_with_Large_Language_Models_CVPR_2025_paper.pdf

ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting

Paper: https://arxiv.org/abs/2504.01603

Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

Project page: https://feature4x.github.io/

Paper: https://arxiv.org/abs/2503.20776

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Ziliotto_TANGO_Training-free_Embodied_AI_Agents_for_Open-world_Tasks_CVPR_2025_paper.pdf

Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback

Paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Khan_Sketchtopia_A_Dataset_and_Foundational_Agents_for_Benchmarking_Asynchronous_Multimodal_CVPR_2025_paper.pdf

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Code: https://github.com/aim-uofa/SegAgent

Paper: https://arxiv.org/abs/2503.08625

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Paper: https://arxiv.org/html/2503.17709v1