Live Session
Session 9: Sequential Recommendation 2
Industry
Short-form Video Needs Long-term Interests: An Industrial Solution for Serving Large User Sequence Models
Yuening Li (Google), Diego Uribe (Google), Chuan He (Google), Jiaxi Tang (Google DeepMind), Qingyun Liu (Google DeepMind), Junjie Shan (Google), Ben Most (Google), Kaushik Kalyan (Google), Shuchao Bi (Google), Xinyang Yi (Google DeepMind), Lichan Hong (Google DeepMind), Ed Chi (Google DeepMind) and Liang Liu (Google).
Abstract
Sequential models are invaluable for powering personalized recommendation systems. In short-form video (SFV) feeds, where user behavior histories are typically longer, a system is needed to capture users' long-term interests. However, deploying large sequence models in web-scale applications is challenging because of their high serving cost. To address this, we propose an industrial framework for efficiently serving large user sequence models. Specifically, the proposed infrastructure decouples the serving of the user sequence model from that of the main recommendation model: the user sequence model is served offline (asynchronously) and refreshed periodically. The infrastructure is also model-agnostic, so it can support any type of user sequence model (even LLMs) at a controllable cost. Empirical results show that large user models deployed with our framework significantly and consistently improve the quality of the main recommendation model, with only a minimal increase in serving cost.
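The decoupled serving pattern described in the abstract can be illustrated with a small sketch. This is a hypothetical illustration under stated assumptions, not the authors' implementation. The names EmbeddingCache, offline_refresh, online_score, is_stale and toy_user_model are invented for this example. Mean-pooling of item embeddings stands in for the large user sequence model, and a dot product stands in for the main recommendation model. The key idea is that the expensive model runs only in a periodic batch job, while the request path reads a precomputed, possibly stale, user embedding from a cache.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Embedding = List[float]

# Hypothetical toy item embedding table (assumption, for illustration only).
ITEM_EMB: Dict[int, Embedding] = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}


@dataclass
class EmbeddingCache:
    """Hypothetical key-value store for precomputed user embeddings."""
    table: Dict[str, Embedding] = field(default_factory=dict)
    refreshed_at: Dict[str, int] = field(default_factory=dict)

    def put(self, user_id: str, emb: Embedding, ts: int) -> None:
        self.table[user_id] = emb
        self.refreshed_at[user_id] = ts

    def get(self, user_id: str) -> Optional[Embedding]:
        return self.table.get(user_id)


def toy_user_model(history: Sequence[int]) -> Embedding:
    """Stand-in for an expensive sequence model: mean-pools item embeddings."""
    pooled = [0.0, 0.0]
    for item in history:
        for d, x in enumerate(ITEM_EMB[item]):
            pooled[d] += x / len(history)
    return pooled


def offline_refresh(histories: Dict[str, Sequence[int]],
                    user_model: Callable[[Sequence[int]], Embedding],
                    cache: EmbeddingCache, now: int) -> None:
    """Periodic batch job, off the request path: encode each user's long history."""
    for user_id, history in histories.items():
        cache.put(user_id, user_model(history), now)


def is_stale(cache: EmbeddingCache, user_id: str, now: int,
             refresh_period: int) -> bool:
    """True if the user has no embedding or it is older than one refresh period."""
    ts = cache.refreshed_at.get(user_id)
    return ts is None or now - ts >= refresh_period


def online_score(user_id: str, item_emb: Embedding, cache: EmbeddingCache,
                 dim: int) -> float:
    """Request path: reads the cached embedding; never calls the sequence model."""
    user_emb = cache.get(user_id)
    if user_emb is None:
        user_emb = [0.0] * dim  # fallback for users not yet processed offline
    return sum(u * v for u, v in zip(user_emb, item_emb))


if __name__ == "__main__":
    cache = EmbeddingCache()
    offline_refresh({"u1": [1, 3], "u2": [2]}, toy_user_model, cache, now=0)
    print(online_score("u1", ITEM_EMB[3], cache, dim=2))
    print(is_stale(cache, "u1", now=10, refresh_period=10))
```

In this sketch the serving cost of the online path does not depend on the size of the user model or on history length, which is what makes the approach model-agnostic; the trade-off is embedding staleness bounded by the refresh period.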