Live Session
Session 4: Collaborative Filtering
Main Track
The Role of Unknown Interactions in Implicit Matrix Factorization — A Probabilistic View
Joey De Pauw (University of Antwerp) and Bart Goethals (University of Antwerp)
Abstract
Matrix factorization is a well-known and effective methodology for top-k list recommendation. It became widely known during the Netflix challenge in 2006, and since then, many adapted and improved versions have been published. A particularly interesting matrix factorization algorithm called iALS (for implicit Alternating Least Squares) adapts the method for implicit feedback, i.e., a setting where only a very small number of positive labels are available along with a majority of unknown labels. Compared to the classical task of rating prediction, learning from implicit feedback is applicable to many more domains, as the data is more abundant and requires less effort to elicit from users. However, the sparsity, imbalance, and implicit nature of the signal also pose unique challenges to retrieving the most relevant items to recommend. We revisit the role of unknown interactions in implicit matrix factorization. Traditionally, all unknowns are interpreted as negative samples and their importance in the training objective is then down-weighted to balance them out with the known, positive interactions. Interestingly, by adopting a probabilistic view of matrix factorization, we can retain the unknown nature of these interactions by modelling them as either positive or negative. With this new formulation that better fits the underlying data, we gain improved performance on the downstream recommendation task without any computational overhead compared to the popular iALS method. This paper outlines the key insights needed to adapt iALS to use logistic regression. Furthermore, the popular full-rank EASE model is identified as a special case of iALS. With this knowledge, EASE was trivially adapted to use logistic regression as well. An extensive experimental evaluation on several real-world datasets demonstrates the effectiveness of our approach. Additionally, a discrepancy in the need for weighting between factorization and regression models is discovered, leading towards a better understanding of these methods.
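To make the traditional treatment of unknowns concrete, the following is a minimal sketch of a weighted squared-error objective in which every unknown entry is scored against a target of 0 and down-weighted. It is an illustration only, not the authors' implementation: the function name `weighted_squared_loss`, the parameters `alpha` (weight on unknowns) and `lam` (L2 regularization strength), and the toy data are all assumptions, and the logistic variant proposed in the paper is not reproduced here.

```python
def weighted_squared_loss(R, U, V, alpha=0.1, lam=0.01):
    """Weighted squared error with unknowns treated as down-weighted negatives.

    R: binary interaction matrix (list of rows), 1 = known positive, 0 = unknown.
    U: user factors, one list of floats per user (hypothetical layout).
    V: item factors, one list of floats per item (hypothetical layout).
    alpha: assumed weight applied to unknown entries (positives get weight 1).
    lam: assumed L2 regularization strength.
    """
    loss = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            # Predicted score is the dot product of user and item factors.
            score = sum(a * b for a, b in zip(U[u], V[i]))
            # Unknowns are scored against 0, but with a smaller weight.
            weight = 1.0 if r > 0 else alpha
            loss += weight * (r - score) ** 2
    # L2 penalty over all user and item factors.
    reg = sum(x * x for vec in U + V for x in vec)
    return loss + lam * reg


# Toy data: 2 users, 3 items, one positive per user.
R = [[1, 0, 0], [0, 1, 0]]
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_loss = weighted_squared_loss(R, U, V, alpha=0.5, lam=0.0)
```

In this toy setup the two positives are reconstructed exactly, so the only contribution to the loss comes from the third item, which both users are predicted to like although it is unknown; `alpha` controls how heavily that is penalized.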