Building Generalizable Sequential Decision-Making Systems: Multi-Agent Reinforcement Learning in the Era of LLMs

Posted by: Liang Huili (梁慧丽) | Date posted: 2024-12-09

TIME

Tuesday, December 3, 2024, 14:00-15:00

VENUE

Conference Room 308, School of Information Management (信管学院)

SPEAKER

Muning Wen (温睦宁) is currently a third-year Ph.D. student at Shanghai Jiao Tong University, supervised by Professor Weinan Zhang. He has extensive theoretical and practical experience in reinforcement learning, multi-agent systems, and LLM agents. His recent work focuses on developing RL/MARL algorithms that enhance the sequential decision-making capabilities of LLM agents in dynamic environments, and on applying these algorithms in fields such as data science, mathematics, and embodied intelligence. In the past three years, he has published over ten papers at top-tier academic conferences, including NeurIPS, ICML, and ICLR, and since 2023 he has served as a reviewer for these conferences.

PERSONAL HOMEPAGE

https://scholar.google.com/citations?user=Zt1WFtQAAAAJ

TITLE

Building Generalizable Sequential Decision-Making Systems: Multi-Agent Reinforcement Learning in the Era of LLMs

ABSTRACT

In this talk, the speaker will discuss the feasibility of building sequential decision-making systems with strong generalization abilities, drawing on his research in multi-agent reinforcement learning (MARL) and LLM agents. He will first introduce the Multi-Agent Advantage Decomposition Theorem and its application in MARL. The theorem allows the MARL problem to be recast as a sequence modeling problem, which can then be optimized with sequence models such as Transformers. He will then present his latest work on improving the performance of LLM agents, including a framework for LLM agent reinforcement learning, Action Decomposition-based Bellman Update and Policy Optimization (BAD and POAD), which aims to bridge the theoretical gap between reinforcement learning and language model optimization and to improve learning efficiency. Lastly, he will examine how multi-agent sequence modeling methods align with the current generative paradigm of language agents, and discuss the potential and challenges of applying MARL to systems involving multiple language agents.
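
For readers unfamiliar with the Multi-Agent Advantage Decomposition Theorem named in the abstract, the following is a minimal sketch of the idea and is not material from the talk itself. It assumes a toy setting of our own choosing: a single state, three agents with two actions each, independent per-agent policies, and a random joint Q-table. Under those assumptions it checks numerically that the joint advantage equals the sum of per-agent advantages, where each agent's advantage is conditioned on the actions already chosen by the agents before it in some order. The helper names (`marginal_q`, `decomposed_advantage`, `random_policy`) are hypothetical and introduced only for this illustration.

```python
import itertools
import random


def marginal_q(q, policies, fixed):
    """Expected joint Q when the agents in `fixed` play the given actions
    and every other agent samples from its own policy."""
    n = len(policies)
    total = 0.0
    for joint in itertools.product(*(range(len(p)) for p in policies)):
        # Skip joint actions that disagree with the fixed agents' choices.
        if any(joint[i] != a for i, a in fixed.items()):
            continue
        prob = 1.0
        for i in range(n):
            if i not in fixed:
                prob *= policies[i][joint[i]]
        total += prob * q[joint]
    return total


def decomposed_advantage(q, policies, joint_action, order):
    """Per-agent advantage terms, each conditioned on the actions of the
    agents that come earlier in `order`."""
    fixed = {}
    terms = []
    for i in order:
        before = marginal_q(q, policies, fixed)
        fixed[i] = joint_action[i]
        after = marginal_q(q, policies, fixed)
        terms.append(after - before)
    return terms


def random_policy(rng, k):
    """A random probability distribution over k actions."""
    weights = [rng.random() + 0.1 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy problem (all values are assumptions made for the illustration).
rng = random.Random(0)
n_agents, n_actions = 3, 2
policies = [random_policy(rng, n_actions) for _ in range(n_agents)]
q = {
    joint: rng.uniform(-1.0, 1.0)
    for joint in itertools.product(range(n_actions), repeat=n_agents)
}

joint_action = (1, 0, 1)
# Joint advantage: Q of the joint action minus the state value.
joint_adv = q[joint_action] - marginal_q(q, policies, {})
terms = decomposed_advantage(q, policies, joint_action, order=[2, 0, 1])

print("joint advantage:   ", joint_adv)
print("sum of agent terms:", sum(terms))
```

The two printed values agree up to floating-point error for any agent order, because the per-agent terms telescope. This sequential structure is what makes it natural to treat the agents' decisions as a sequence, which is the link to sequence models that the abstract describes.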

