Live Session
Wednesday Posters
Industry Poster
Off-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)
Katarzyna Siudek-Tkaczuk (Samsung R&D Institute Poland), Sławomir Kapka (Samsung R&D Institute Poland), Jędrzej Alchimowicz (Samsung R&D Institute Poland), Bartłomiej Swoboda (Samsung R&D Institute Poland) and Michał Romaniuk (Samsung R&D Institute Poland)
Abstract
Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained using offline Reinforcement Learning. In this work, we describe our custom OPS method and its successful application in Samsung Instant Plays for optimizing ad delivery timing. We propose a custom OPS method because traditional Off-Policy Evaluation (OPE) methods often exhibit enormous variance, which leads to unreliable results. We applied our OPS method to initialize policies for our custom pseudo-online training pipeline. The final policy resulted in a substantial 49% lift in the number of watched ads while maintaining a similar retention rate.