Interpretable Policies for Dynamic Product Recommendations

Marek Petrik Ronny Luss
IBM T.J. Watson Research Center IBM T.J. Watson Research Center
Yorktown Heights, NY 10598 Yorktown Heights, NY 10598
mpetrik@us.ibm.com rluss@us.ibm.com

Abstract

In many applications, it may be better to compute a good interpretable policy instead of a complex optimal one.
For example, a recommendation engine might perform better when accounting for user profiles, but in the
absence of such loyalty data, assumptions would have to be made that increase the complexity of the
recommendation policy. A simple greedy recommendation could be implemented based on aggregated user
data, but another simple policy can improve on this by accounting for the fact that users come from different
segments of a population. In this paper, we study the problem of computing an optimal policy that is
interpretable. In particular, we consider a policy to be interpretable if the decisions (e.g., recommendations)
depend only on a small number of simple state attributes (e.g., the currently viewed product). This novel model
is a general Markov decision problem with action constraints over states . We show that this problem is NP hard
and develop a MILP formulation that gives an exact solution when policies are restricted to being deterministic.
We demonstrate the effectiveness of the approach on a real-world business case for a European tour operator's
recommendation engine.

This poster from the UAI 2016 conference was given as an invited presentation at the Bayesian Modeling Applications
Workshop

BMAW 2016 - Page 59 of 59