Live Session
Teatro Petruzzelli
Paper
17 Oct, 14:30 CEST
Session 16: Large Language Models 2
Reproducibility

Reproducibility of LLM-based Recommender Systems: the case study of P5 paradigm

Pasquale Lops (University of Bari Aldo Moro), Antonio Silletti (University of Bari Aldo Moro), Marco Polignano (University of Bari Aldo Moro), Cataldo Musto (University of Bari Aldo Moro) and Giovanni Semeraro (University of Bari Aldo Moro)

Abstract

The recommender systems field may greatly benefit from the availability of pretrained Large Language Models (LLMs), which can serve as the core mechanism to generate recommendations based on detailed user and item data, such as textual descriptions, user reviews, and metadata. On the one hand, this new generation of LLM-based recommender systems paves the way to addressing traditional limitations, such as cold start and data sparsity; on the other hand, it poses fundamental challenges for their accountability. Reproducing experiments in the new context of LLM-based recommender systems is very challenging for several reasons. New approaches are published at an unprecedented pace, which makes it difficult to have a clear picture of the main protocols and good practices in the experimental evaluation. Moreover, the lack of proper frameworks for developing and evaluating LLM-based recommendation makes the process of benchmarking models complex and uncertain. In this work, we discuss the main issues encountered when trying to reproduce P5 (Pretrain, Personalized Prompt, and Prediction Paradigm), one of the first works unifying different recommendation tasks in a shared language modeling and natural language generation framework. Starting from this study, we have developed OurFramework4LLM (anonymized name), a framework for training and evaluating LLMs specifically for the recommendation task. We have used it to perform several experiments assessing the impact of different LLMs, personalization, and a novel set of more informative prompts on the overall recommendation performance, in a fully reproducible environment.
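To make the unification idea concrete, below is a minimal Python sketch (not the paper's or P5's actual code) of the core intuition: distinct recommendation tasks are all expressed as natural language prompts and handled by one pretrained text-to-text model. The checkpoint name, user/item IDs, and prompt templates are illustrative assumptions, not P5's real prompt set, and an untuned t5-small will not produce meaningful recommendations; the snippet only shows the shared prompt-in, text-out interface.

```python
# Sketch of the P5 idea: casting different recommendation tasks as
# text-to-text prompts handled by a single pretrained seq2seq LLM.
# Templates and IDs below are illustrative, not P5's actual prompt set.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # P5 builds on T5-style backbones; t5-small keeps the demo light
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Different tasks, one shared format: natural language in, generated text out.
prompts = {
    "sequential":  "user_42 has interacted with item_7, item_19, item_3. "
                   "Predict the next item the user will interact with.",
    "rating":      "How would user_42 rate item_55 on a scale of 1 to 5?",
    "explanation": "Explain why item_55 would be recommended to user_42.",
}

for task, prompt in prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(task, "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))
```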
