Informed Dataset-Selection with Algorithm-Performance-Spaces
Joeran Beel (University of Siegen), Lukas Wegmeth (University of Siegen), Lien Michiels (University of Antwerp) and Steffen Schulz (University of Siegen)
Abstract
When designing recommender-systems experiments, central questions are how many and which datasets to use. So far, the community has not answered these questions. We argue that the informed selection of datasets for recommender-system research is a crucial aspect of the design of offline experiments. Ultimately, the goal of evaluating a recommender-system algorithm offline is to estimate how well the algorithm will perform on future, unknown data compared to another algorithm. In this paper, we propose one method for strategically selecting datasets for recommender-system experiments so that results generalize well to new data. Specifically, we introduce the idea of "Algorithm Performance Spaces", in which datasets are plotted based on how algorithms perform on them. This allows us to identify diverse datasets, where "diverse" means that algorithms perform differently on them. We do not claim to have found the final answer. We see the proposed method as one suggestion that will hopefully initiate a discussion in the community and eventually lead to accepted best practices for dataset selection.
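To make the idea concrete, here is a minimal sketch of how one might build such an Algorithm Performance Space and pick a diverse subset of datasets from it. Every name and number below is an illustrative assumption (the dataset and algorithm names, the made-up performance scores, and the greedy max-min selection heuristic); the paper's actual construction and selection procedure may differ.

```python
# Illustrative sketch: datasets as points in an "Algorithm Performance Space"
# (one coordinate per algorithm's offline performance on that dataset),
# followed by a greedy max-min (farthest-point) pick of a diverse subset.
# All names and scores are placeholders, not results from the paper.
import numpy as np

datasets = ["MovieLens-1M", "Amazon-Books", "Yelp", "Gowalla", "LastFM"]
algorithms = ["ItemKNN", "BPR-MF", "LightGCN"]

# Rows: candidate datasets; columns: performance (e.g., nDCG@10) of each algorithm.
performance = np.array([
    [0.31, 0.28, 0.35],   # made-up scores for illustration only
    [0.12, 0.15, 0.18],
    [0.09, 0.11, 0.14],
    [0.16, 0.19, 0.22],
    [0.21, 0.20, 0.26],
])

def select_diverse(points: np.ndarray, k: int) -> list[int]:
    """Greedily select k rows that are spread out in the performance space."""
    chosen = [int(np.argmax(points.sum(axis=1)))]  # deterministic starting point
    while len(chosen) < k:
        # Distance of every dataset to its nearest already-chosen dataset.
        dists = np.min(
            np.linalg.norm(points[:, None, :] - points[None, chosen, :], axis=-1),
            axis=1,
        )
        dists[chosen] = -np.inf          # never re-pick a chosen dataset
        chosen.append(int(np.argmax(dists)))
    return chosen

picked = select_diverse(performance, k=3)
print("Selected datasets:", [datasets[i] for i in picked])
```

The design choice here is that "diversity" is measured purely by how differently the algorithms score on each dataset, so two datasets that yield near-identical algorithm rankings add little information to an offline experiment.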