Interpretable Policies for Dynamic Product Recommendations Marek Petrik Ronny Luss IBM T.J. Watson Research Center IBM T.J. Watson Research Center Yorktown Heights, NY 10598 Yorktown Heights, NY 10598 mpetrik@us.ibm.com rluss@us.ibm.com Abstract In many applications, it may be better to compute a good interpretable policy instead of a complex optimal one. For example, a recommendation engine might perform better when accounting for user profiles, but in the absence of such loyalty data, assumptions would have to be made that increase the complexity of the recommendation policy. A simple greedy recommendation could be implemented based on aggregated user data, but another simple policy can improve on this by accounting for the fact that users come from different segments of a population. In this paper, we study the problem of computing an optimal policy that is interpretable. In particular, we consider a policy to be interpretable if the decisions (e.g., recommendations) depend only on a small number of simple state attributes (e.g., the currently viewed product). This novel model is a general Markov decision problem with action constraints over states . We show that this problem is NP hard and develop a MILP formulation that gives an exact solution when policies are restricted to being deterministic. We demonstrate the effectiveness of the approach on a real-world business case for a European tour operator's recommendation engine. This poster from the UAI 2016 conference was given as an invited presentation at the Bayesian Modeling Applications Workshop BMAW 2016 - Page 59 of 59