Speaker
Yun Hua
Shanghai Jiao Tong University
Time
Tuesday, April 7, 2026
14:00-15:00
Venue
Conference Room 104, College
Abstract
As AI agents evolve from traditional reinforcement learning to a new generation of increasingly capable agents powered by large language models (LLMs), a key question is how to ensure these agents act in line with human expectations. This challenge can be framed as a credit assignment problem, with reward serving as the primary learning signal for shaping agent behavior. In single-agent settings, sparse and delayed rewards make it difficult to learn long-horizon dependencies. In multi-agent environments, the problem becomes one of attributing individual contributions to shared outcomes, and failures of attribution often lead to free-riding and unstable cooperation.
My research addresses these challenges through reward shaping. In single-agent settings, I develop methods for cross-domain reward transfer, enabling agents to transfer knowledge from previously learned tasks and improve learning efficiency. In multi-agent and LLM-based systems, I draw on concepts from economics, including externalities and the Shapley value, to model each agent’s contribution to collective outcomes, and develop tractable approximations that enable reward shaping for structural credit assignment, thereby promoting cooperation among agents. Overall, these efforts aim to improve the learning efficiency, generalization, and cooperation of AI agents, contributing to the development of more capable and reliable AI agent systems.
Biography
Yun Hua is currently a Postdoctoral Fellow at Antai College of Economics and Management, Shanghai Jiao Tong University, working with Prof. Jun Luo. He received his Ph.D. in Computer Science and Technology from East China Normal University under the supervision of Prof. Xiangfeng Wang, where he also obtained his B.S. in Software Engineering. During his doctoral studies, he was a visiting research assistant at The Chinese University of Hong Kong, Shenzhen, collaborating with Prof. Hongyuan Zha.
His research focuses on credit assignment for AI agents, particularly in reinforcement learning and LLM-based agents, with an emphasis on using reward shaping to improve learning efficiency and enable effective cooperation. He has published 10 papers in this area, including 5 as first, co-first, or co-corresponding author at ICML, NeurIPS, ICLR, and KDD. He is a recipient of the Shanghai Postdoctoral Excellence Program.