Live Session
Teatro Petruzzelli
Paper
17 Oct, 12:30 CEST
Session 15: Off-policy Learning
Session 15: Off-policy Learning takes place on the RecSys Hub: https://recsyshub.org
Main Track

Optimal Baseline Corrections for Off-Policy Contextual Bandits

View on ACM Digital Library

Shashank Gupta (University of Amsterdam, The Netherlands), Olivier Jeunen (ShareChat), Harrie Oosterhuis (Radboud University) and Maarten de Rijke (University of Amsterdam)

View Paper PDF · View Poster
Abstract

The off-policy learning paradigm allows recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation). Our work unifies these approaches by proposing a single framework built on their equivalence in learning scenarios. The foundation of our framework is the derivation of an equivalent baseline correction for all of the existing control variates. Consequently, our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it. This optimal estimator brings significantly improved performance in both evaluation and learning, and minimizes data requirements. Empirical observations corroborate our theoretical findings.
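The estimators the abstract contrasts can be sketched numerically. The following is a minimal illustration on synthetic logged-bandit data, not the paper's exact derivation or closed-form solution: vanilla inverse propensity scoring (IPS), an additive baseline correction, self-normalised IPS (SNIPS) as the multiplicative control variate, and the textbook variance-minimising coefficient for the additive control variate w − 1. The policies, reward model, and the simple baseline choice are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 5

# Synthetic logged bandit feedback: logging policy pi0 is uniform over k actions.
pi0 = np.full(k, 1.0 / k)
actions = rng.integers(0, k, size=n)
# Hypothetical Bernoulli reward model: higher-index actions pay off more often.
rewards = rng.binomial(1, 0.2 + 0.1 * actions).astype(float)

# Target policy pi_e prefers higher-index actions.
pi_e = np.arange(1, k + 1, dtype=float)
pi_e /= pi_e.sum()

w = pi_e[actions] / pi0[actions]  # importance weights; E[w] = 1 by construction

# 1. Vanilla IPS: unbiased, but variance grows with the weights.
ips = np.mean(w * rewards)

# 2. Additive baseline correction: subtract a constant from the rewards and
#    add it back. Unbiased for any fixed beta, since E[w] = 1.
beta = rewards.mean()  # a simple (generally suboptimal) baseline
ips_baseline = np.mean(w * (rewards - beta)) + beta

# 3. Multiplicative control variate: self-normalised IPS (SNIPS).
snips = np.sum(w * rewards) / np.sum(w)

# 4. Additive control variate (w - 1) with the standard variance-minimising
#    coefficient beta* = Cov(w*r, w) / Var(w).
beta_star = np.cov(w * rewards, w)[0, 1] / np.var(w, ddof=1)
ips_cv = np.mean(w * rewards) - beta_star * (np.mean(w) - 1.0)
```

All four quantities estimate the same target-policy value (here ≈ 7/15 under the assumed reward model); they differ only in estimation variance, which is the trade-off the paper's framework analyses.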

Join the Conversation

Head to Slido and select the paper's assigned session to join the live discussion.

Conference Agenda

View Full Agenda →