Live Session
Teatro Petruzzelli
Paper
16 Oct 2024, 15:15 to 16:20 CEST
Session 10: Graph Learning
Main Track

A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation

Zixuan Yi (University of Glasgow) and Iadh Ounis (University of Glasgow)

Abstract

With the rapid development of online multimedia services, especially in e-commerce platforms, there is a pressing need for personalised recommendation systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality modelling. Such isolated processes can harm the recommendation performance. First, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendation, potentially incorporating irrelevant information, which is harmful to item representations. Second, an isolated modality modelling process produces disjointed embeddings for item modalities due to the individual processing of each modality, which leads to a suboptimal fusion of user/item representations for user preference prediction. We hypothesise that the use of a unified model for addressing both aforementioned isolated processes will enable the consistent extraction and cohesive fusion of joint multi-modal features, thereby enhancing the effectiveness of multi-modal recommender systems. In this paper, we propose a novel model, called Unified Multi-modal Graph Transformer (UGT), which first leverages a multi-way transformer to extract aligned multi-modal features from raw data for top-k recommendation. Subsequently, we build a unified graph neural network in our UGT model to jointly fuse the user/item representations with their corresponding multi-modal features. Using the graph transformer architecture of our UGT model, we show that the UGT model can achieve significant effectiveness gains, especially when jointly optimised with the commonly used multi-modal recommendation losses. Our extensive experiments conducted on three benchmark datasets demonstrate the superiority of our proposed UGT model over seven existing state-of-the-art recommendation approaches.
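
To make the pipeline in the abstract more concrete, the following is a minimal toy sketch of its general shape: fuse per-item modality features, propagate them over the user-item graph, then score items for top-k recommendation with a pairwise loss. It is not the UGT model. It assumes the modality vectors are already aligned to one dimension (the step the paper assigns to a multi-way transformer), substitutes plain averaging for the learned fusion, and uses the BPR loss as one example of a commonly used recommendation loss. All function names and the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vectors(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fuse_modalities(visual, textual):
    # Assumption: both vectors are already aligned to the same dimension.
    # Averaging stands in for a learned fusion.
    return [(v + t) / 2.0 for v, t in zip(visual, textual)]

def propagate(user_emb, item_emb, interactions):
    # One layer of mean aggregation on the user-item bipartite graph.
    # interactions maps a user id to the list of item ids they interacted with.
    item_users = {}
    for u, item_ids in interactions.items():
        for i in item_ids:
            item_users.setdefault(i, []).append(u)
    new_user = {}
    for u, vec in user_emb.items():
        neigh = [item_emb[i] for i in interactions.get(u, [])]
        new_user[u] = mean_vectors([vec, mean_vectors(neigh)]) if neigh else list(vec)
    new_item = {}
    for i, vec in item_emb.items():
        neigh = [user_emb[u] for u in item_users.get(i, [])]
        new_item[i] = mean_vectors([vec, mean_vectors(neigh)]) if neigh else list(vec)
    return new_user, new_item

def bpr_loss(user_vec, pos_vec, neg_vec):
    # Pairwise ranking loss: -log(sigmoid(score_pos - score_neg)).
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    return math.log1p(math.exp(-diff))

def top_k(user_vec, item_emb, k, seen=()):
    # Rank unseen items by dot-product score, highest first.
    candidates = [i for i in item_emb if i not in seen]
    candidates.sort(key=lambda i: dot(user_vec, item_emb[i]), reverse=True)
    return candidates[:k]

# Toy data (hypothetical): three items with 2-d visual and textual features.
visual = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
textual = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
item_emb = {i: fuse_modalities(visual[i], textual[i]) for i in visual}
user_emb = {"u1": [0.0, 0.0]}
interactions = {"u1": ["a"]}

users, items = propagate(user_emb, item_emb, interactions)
recommended = top_k(users["u1"], items, k=1, seen={"a"})
loss = bpr_loss(users["u1"], items["c"], items["b"])
```

In this toy setup the user embedding moves towards the fused features of the item they interacted with, so unseen items with similar multi-modal content rank higher. A trained model would instead learn the embeddings and the fusion by minimising the loss over many sampled (user, positive item, negative item) triples.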
