Speaker
Junfan Li
Harbin Institute of Technology (Shenzhen)
Time
Thursday, May 14, 2026
14:00-15:00
Venue
Conference Room 104, School
Abstract
Transformer architectures rely on the self-attention mechanism, whose inference-time complexity is quadratic in the input length, which restricts their deployment in long-context inference tasks. It has been proved that the self-attention mechanism is essentially a kernel model. Inspired by this connection, we investigate the problem of computationally budgeted online kernel learning, explore the fundamental trade-off between learning performance and computational budget, and characterize the conditions under which this inherent trade-off can be broken, together with the corresponding algorithms. If the eigenvalues of the kernel matrix decay fast, then it is possible to maintain optimal or nearly optimal learning performance while significantly reducing the computational cost. Finally, we show how the proposed online kernel learning algorithms can be applied to efficient LLM inference, achieving a nearly linear acceleration in computational efficiency.
Biography
Junfan Li is currently a Postdoctoral Fellow at Harbin Institute of Technology (Shenzhen), where he works with Prof. Liqiang Nie and Prof. Zenglin Xu. He received his Ph.D. and M.S. degrees in Computer Science from Tianjin University, advised by Prof. Shizhong Liao, and his B.S. degree in Biomedical Engineering from the University of Electronic Science and Technology of China. His research interests lie in machine learning under resource constraints, such as constraints on computation, communication, attributes, and privacy, with a particular focus on resource-limited online (kernel) learning, federated learning, and efficient LLM inference. He has published first-author papers in COLT, ICML, NeurIPS, AAAI, ECML-PKDD, ML, and JCST.





