Speaker
Chenghao Xiao
Durham University
Time
Tuesday, October 14, 2025
14:00-15:00
Venue
Meeting Room 602
Abstract
Modern AI search systems typically treat retrieval, reasoning, and generation as separate modules, leading to fragmented pipelines, error propagation, and missed opportunities for deep semantic understanding. In this talk, I present a unifying vision: representation learning as alignment with latent generative knowledge. I will show how this principle bridges the long-standing divide between generative and representation models across text, image, video, and audio.
Throughout the talk, I will discuss key advances from my recent work, including 1) RAR-b, where I proposed the “reasoning as retrieval” paradigm, leading to an early conceptualization of representation learning as “alignment of models’ representation capabilities with their generative capabilities”; 2) MIEB, where I introduced the largest multimodal embedding benchmark, which demonstrates that multimodal generative models achieve superior representational performance with orders of magnitude less contrastive activation than the CLIP paradigm; and 3) LCO-Embedding, a language-centric training paradigm for omni-modal representation models that I led at Alibaba DAMO Academy, where I also introduced the “Generation-Representation Scaling Law”.
Finally, I outline frontier directions such as 1) reinforcement learning for representation learning (RL for RL); 2) exploring non-autoregressive generative backbones (e.g., diffusion language models) for representation learning; and 3) environment-aware AI search. Together, these steps pave the way toward truly unified omni-modal AI search systems, where retrieval emerges not as a separate component but as an intrinsic capability of generative intelligence.
Biography
Chenghao Xiao is a final-year PhD candidate at Durham University, UK. His research focuses primarily on unifying representation learning and generative models, and has resulted in 20 publications in top-tier conferences and journals such as NeurIPS, ACL, ICCV, ICLR, EMNLP, NAACL, and TACL. He proposed the “reasoning as retrieval” paradigm, a revolutionary paradigm that conceptualizes training representation models as an alignment of models’ representation capabilities with their generative capabilities. He led and was a core contributor to widely adopted embedding benchmarks such as RAR-b, MIEB, and MMTEB. He also led LCO-Embedding, a language-centric omni-modal representation model, at Alibaba DAMO Academy.




