Live Session
Session 7: Cold Start
Main Track
A Multi-modal Modeling Framework for Cold-start Short-video Recommendation
Gaode Chen (Kuaishou Technology), Ruina Sun (Kuaishou Technology), Yuezihan Jiang (Kuaishou Technology), Jiangxia Cao (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Han Li (Kuaishou Technology), Kun Gai (Kuaishou Technology) and Xinghua Zhang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China)
Abstract
Short video has grown rapidly on multimedia platforms over the past few years. To keep content fresh, platforms receive a large number of user-uploaded videos every day, which causes collaborative filtering-based recommenders to suffer from the item cold-start problem (e.g., newly uploaded videos struggle to compete with existing videos). Consequently, a growing body of work tackles the cold-start issue from the content perspective, focusing on modeling users' multi-modal preferences, which lets new and existing videos compete on fair terms. However, recent studies ignore the gap between multi-modal embedding extraction and user interest modeling, as well as the differing intensities of user preferences across modalities. In this paper, we propose M3CSR, a multi-modal modeling framework for cold-start short-video recommendation. Specifically, we preprocess content-oriented multi-modal features for items and obtain trainable category IDs by clustering. In each modality, we combine the modality-specific cluster ID embedding and the mapped original modality feature into a modality-specific representation of the item to address the gap. Meanwhile, M3CSR measures the user's modality-specific intensity based on the correlation between modality-specific interest and behavioral interest, and employs a pairwise loss to further decouple the user's multi-modal interests. Extensive experiments on four real-world datasets demonstrate the superiority of our proposed model. The framework has been deployed on a billion-user-scale short-video application and has shown improvements in various commercial metrics in cold-start scenarios.
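To make the two mechanisms in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that cluster centroids are already available (here they are fixed, hypothetical values rather than the output of a clustering step), that the cluster ID embedding and the linearly mapped feature are combined by summation, that "correlation" is taken as cosine similarity, and that intensities are normalized with a softmax. All function names, parameters, and numbers are illustrative. The pairwise loss is not shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def nearest_cluster(feature, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, centroid))
             for centroid in centroids]
    return dists.index(min(dists))


def item_representation(feature, centroids, cluster_emb, proj):
    """Cluster-ID embedding plus mapped raw feature.

    Summation as the combination rule is an assumption of this sketch.
    """
    cid = nearest_cluster(feature, centroids)
    mapped = [dot(row, feature) for row in proj]  # linear map of the raw feature
    return [e + m for e, m in zip(cluster_emb[cid], mapped)]


def modality_intensities(modal_interests, behavior_interest):
    """Softmax over cosine similarities between each modality-specific
    interest vector and the behavioral interest vector (assumed form)."""
    sims = {m: cosine(v, behavior_interest) for m, v in modal_interests.items()}
    z = sum(math.exp(s) for s in sims.values())
    return {m: math.exp(s) / z for m, s in sims.items()}


if __name__ == "__main__":
    # Hypothetical toy values for a single modality with two clusters.
    centroids = [[0.0, 0.0], [1.0, 1.0]]
    cluster_emb = [[0.1, 0.2], [0.3, 0.4]]   # would be trainable in practice
    proj = [[1.0, 0.0], [0.0, 1.0]]          # identity mapping for illustration
    print(item_representation([0.9, 0.8], centroids, cluster_emb, proj))

    interests = {"visual": [1.0, 0.0], "text": [0.0, 1.0]}
    print(modality_intensities(interests, [1.0, 0.0]))
```

In this toy setup the modality whose interest vector aligns more closely with the behavioral interest receives the larger weight, which is the intuition behind weighting modalities per user rather than uniformly.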