Live Session
Teatro Petruzzelli
Paper
16 Oct 2024, 17:20-18:45 CEST
Session 11: Optimisation and Evaluation 1
This session is taking place on the RecSys Hub: https://recsyshub.org
Reproducibility

Reproducibility and Analysis of Scientific Dataset Recommendation Methods


Ornella Irrera (Department of Information Engineering, University of Padua, Italy), Matteo Lissandrini (Department of Foreign Languages and Literatures, University of Verona, Italy), Daniele Dell’Aglio (Department of Computer Science, Aalborg University, Denmark) and Gianmaria Silvello (Department of Information Engineering, University of Padua, Italy)

Abstract

Datasets play a central role in scholarly communication. However, scholarly graphs are often incomplete, particularly because links between publications and datasets are missing. As a result, dataset recommendation (identifying relevant datasets for a scientific paper, an author, or a textual query) is becoming increasingly important. Although various methods have been proposed for this task, their reproducibility remains unexplored, which makes it difficult to compare them with new approaches.

We reviewed current recommendation methods for scientific datasets, focusing on the most recent and competitive approaches: an SVM-based model, a bi-encoder retriever, a method leveraging co-author and citation network embeddings, and a heterogeneous variational graph autoencoder. These approaches underwent a comprehensive analysis under consistent experimental conditions.

Our reproducibility efforts show that three of the methods can be reproduced, whereas the variational graph autoencoder is difficult to reproduce because its code and test datasets are unavailable. We therefore re-implemented this method and performed a component-based analysis to examine its strengths and limitations. Furthermore, our study indicated that three of the four methods considered produce subpar results when applied to real-world data rather than to specialized datasets with ad-hoc features.

Join the Conversation

Head to Slido and select the paper's assigned session to join the live discussion.
