Evaluation in the Era of LLM

Posted by: 梁慧丽 | Published: 2025-11-28

Speaker

Chenghua Lin

The University of Manchester

Time

Friday, July 25, 2025

10:00-11:00 AM

Venue

Lecture Hall 308

Abstract

Evaluating large language models (LLMs) remains a significant challenge, particularly in multimodal settings and because LLM-based evaluation is sensitive to prompt wording, which undermines its robustness. In this talk, I will present two efforts aimed at addressing these issues. First, I introduce OmniBench, a benchmark designed to evaluate omni-language models, that is, models capable of reasoning jointly over visual, acoustic, and textual modalities. Our findings reveal that current models struggle with such tri-modal tasks, motivating the development of OmniInstruct, a large-scale instruction-tuning dataset to enhance multimodal reasoning. Second, while LLM-based evaluation is a scalable alternative to human judgment, it is highly sensitive to prompt design. To mitigate this, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Together, these works contribute towards more rigorous and trustworthy evaluation.

Biography

Chenghua Lin is a Full Professor and Chair in Natural Language Processing in the Department of Computer Science at The University of Manchester. His research focuses on integrating machine learning and NLP for language generation and understanding. He currently serves as Chair of the ACL SIGGEN Board and as a member of the IEEE Speech and Language Processing Technical Committee, and is a founding advisor of the Multimodal Art Projection community. He has received several prizes and awards for his research, including the CIKM Test-of-Time Award and the INLG Best Paper Runner-up Award. He has also held numerous program and chairing roles for *ACL conferences, including Documentation Chair for ACL’25, Publication Chair for ACL’23, Workshop Chair for AACL-IJCNLP’22, Program Chair for INLG’19, and Senior Area Chair for NAACL’25, IJCNLP-AACL’25, ACL’23, EACL’23, ACL’22, and EMNLP’20.
