Frontier Agent Research, 2025
User 6393
Last modified October 24, 2025
Surveys (6)
Agent AI: Surveying the Horizons of Multimodal Interaction
Stanford University, Microsoft Research, University of California, Los Angeles, University of Washington, and others
Links:
https://arxiv.org/pdf/2401.03568
The Rise and Potential of Large Language Model Based Agents: A Survey
Fudan University NLP Group, miHoYo
Links:
https://xuanjing-huang.github.io/files/agent.pdf
https://github.com/woooodyy/llm-agent-paper-list
A survey on large language model based autonomous agents
Research team at the Gaoling School of Artificial Intelligence, Renmin University of China
Links:
https://link.springer.com/content/pdf/10.1007/s11704-024-40231-1.pdf
https://github.com/paitesanshi/llm-agent-survey
https://github.com/melih-unsal/demogpt
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Southern University of Science and Technology and others
Links:
https://arxiv.org/pdf/2402.01680
https://github.com/taichengguo/llm_multiagents_survey_papers
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Institute for AI Industry Research (AIR), Tsinghua University
Links:
https://arxiv.org/pdf/2401.05459
https://github.com/mobilellm/personal_llm_agents_survey
Understanding the planning of LLM agents: A survey
University of Science and Technology of China, Huawei Noah's Ark Lab
Links:
https://arxiv.org/pdf/2402.02716
Construction, Application, Evaluation
LLM Agent Paper Collection
Compiled by 深度之眼
CVPR2025
Single-Agent (21)
Font-Agent: Enhancing Font Understanding with Large Language Models
Paper:
https://openaccess.thecvf.com/content/CVPR2025/papers/Lai_Font-Agent_Enhancing_Font_Understanding_with_Large_Language_Models_CVPR_2025_paper.pdf
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Paper:
https://arxiv.org/abs/2504.01603
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Code:
https://feature4x.github.io/
Paper:
https://arxiv.org/abs/2503.20776
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Paper:
https://openaccess.thecvf.com/content/CVPR2025/papers/Ziliotto_TANGO_Training-free_Embodied_AI_Agents_for_Open-world_Tasks_CVPR_2025_paper.pdf
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
Paper:
https://openaccess.thecvf.com/content/CVPR2025/papers/Khan_Sketchtopia_A_Dataset_and_Foundational_Agents_for_Benchmarking_Asynchronous_Multimodal_CVPR_2025_paper.pdf
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Code:
https://github.com/aim-uofa/SegAgent
Paper:
https://arxiv.org/abs/2503.08625
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Paper:
https://arxiv.org/html/2503.17709v1