Live Session
Chamber of Commerce
Poster
15 Oct
 
8:00
CEST
Tuesday Posters
Tuesday Posters: 2024-10-15, 08:00 am to 05:30 pm (Europe/Rome). Tuesday Posters is taking place on the RecSys Hub: https://recsyshub.org
Research

Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items

Aleksandr Petrov (University of Glasgow), Craig Macdonald (University of Glasgow) and Nicola Tonellotto (University of Pisa)

Abstract

Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method for reducing the models' memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction in memory consumption by a factor of up to 50x, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Analysing RecJPQ's scoring algorithm, we find that its efficiency is limited by its use of item score accumulators, which prevent parallelisation. On the other hand, LightRec (a non-sequential method that uses a similar idea of sub-IDs) reported large inference efficiency improvements using an algorithm we call PQTopK. We show that it is also possible to improve RecJPQ-based models' inference efficiency using the PQTopK algorithm. In particular, we speed up RecJPQ-enhanced SASRec by a factor of 4.5x compared to the original SASRec's inference method and by a factor of 1.56x compared to the method implemented in the RecJPQ code on a large-scale Gowalla dataset with more than a million items. Further, using simulated data, we show that PQTopK remains efficient with catalogues of up to tens of millions of items, removing one of the last obstacles to using Transformer-based models in production environments with large catalogues.
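
As a rough illustration of the sub-item ID scoring idea described in the abstract, the Python/NumPy sketch below first computes one score per sub-item ID and then, for every item, sums the scores of the sub-item IDs assigned to it, with no per-item accumulator loop. It is not the authors' implementation: the function name `pq_topk`, the array layout (`centroids`, `codes`), and the assumption that an item embedding is the concatenation of its sub-item embeddings scored by dot product are all hypothetical choices made for this example.

```python
import numpy as np


def pq_topk(query, centroids, codes, k):
    """Return the indices and scores of the k highest-scoring items.

    Assumed (hypothetical) layout:
      query     -- (m * d_sub,) sequence embedding produced by the model
      centroids -- (m, b, d_sub) sub-item embeddings, b per split
      codes     -- (n_items, m) integer sub-item IDs for each item
    """
    m, b, d_sub = centroids.shape
    q = query.reshape(m, d_sub)

    # One score per (split, sub-item ID): sub_scores[j, c] = q[j] . centroids[j, c]
    sub_scores = np.einsum("jd,jcd->jc", q, centroids)

    # Vectorised gather-and-sum over all items at once:
    # item_scores[i] = sum_j sub_scores[j, codes[i, j]]
    item_scores = sub_scores[np.arange(m), codes].sum(axis=1)

    # Partial selection of the k best items, then sort only those k.
    k = min(k, item_scores.shape[0])
    top = np.argpartition(-item_scores, k - 1)[:k]
    top = top[np.argsort(-item_scores[top])]
    return top, item_scores[top]


# Usage with small random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
example_centroids = rng.normal(size=(4, 16, 8))       # m=4 splits, b=16 sub-IDs
example_codes = rng.integers(0, 16, size=(1000, 4))   # 1000 items
example_query = rng.normal(size=4 * 8)
top_items, top_scores = pq_topk(example_query, example_centroids, example_codes, k=10)
```

The cost of the sub-ID scoring step depends on the number of sub-item IDs rather than on the catalogue size; only the gather-and-sum and the top-k selection touch every item, and both are array operations that parallelise readily.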

