Optimal Set Recommendations based on Regret

                              Paolo Viappiani                                      Craig Boutilier
                      Department of Computer Science                       Department of Computer Science
                           University of Toronto                                University of Toronto
                           Toronto, ON, Canada                                  Toronto, ON, Canada
                           paolo@cs.toronto.edu                                 cebly@cs.toronto.edu


                             Abstract                               a user’s score: the latter tends to produce results that are
                                                                    very similar one to each other, and thus not offer much ac-
     Current conversational recommender systems do not of-          tual “choice” for a user. This is especially true when we
     fer guarantees on the quality of their recommendations,
     either because they do not maintain a model of a user’s
                                                                    recognize that estimated scores or preferences are likely to
     utility function, or do so in an ad hoc fashion. In this pa-   be very crude. Diversity is also important in practice: we
     per, we propose an approach to recommender systems             cannot generally predict how patient a user will be. They
     that incorporates explicit utility models into the rec-        may terminate the exploration of product space at any time,
     ommendation process in a decision-theoretically sound          hence the recommender system should be able to provide
     fashion. The system maintains explicit constraints on          anytime recommendations, reflecting the best recommen-
     the user’s utility based on the semantics of the prefer-       dations given the information provided by the user so far.
     ences revealed by the user’s actions. In particular, we        This characteristic of conversational recommenders is simi-
     propose and investigate a new decision criterion, set-         lar to the exploration-exploitation dilemma in reinforcement
     wise maximum regret, for constructing optimal recom-           learning. Since we do not know know how much time the
     mendation sets. This new criterion extends the mathe-
     matical notion of maximum regret used in decision the-
                                                                    user is willing to spend in order to improve the recommenda-
     ory and preference elicitation to sets. We develop com-        tion, we want to show products that are both: (a) expected to
     putational procedures for computing setwise max re-            be rated highly given the current information about the user;
     gret. We also show that the criterion suggests choice          and (b) are maximally informative should the user critique
     sets for queries that are myopically optimal: that is, it      (or otherwise provide feedback on) them.
     refines knowledge of a user’s utility function in a way           Many authors have considered the importance of diversity
     that reduces max regret more quickly than any other            in the recommendations. For example, researchers in case-
     choice set. Thus setwise max regret acts both as guar-         based reasoning have proposed techniques based on greedy
     antee on the quality of our recommendations and as a
     driver for further utility elicitation.
                                                                    maximization of diversity (McSherry 2002), defined as an
     Our simulation results suggest that this utility-
                                                                    aggregate of a distance metric, or as a weighted tradeoff be-
     theoretically sound approach to user modeling allows           tween diversity and the recommendation score (Smyth and
     much more effective navigation of a product space than         McClave 2001). However diversity and dissimilarity mea-
     traditional approaches based on, for example, heuristic        sures does not consider the information that we have about
     utility models and product similarity measures.                the user’s preference. While they guarantee that the set con-
                                                                    tains alternatives that differ in their features, they do not use
                         Introduction                               at all the information about a user’s preferences available
                                                                    from previous user actions and feedback. It has been argued
Recommender systems can help users navigate product                 that diversity should be instead tailored to the system’s be-
spaces and make decisions involving very large sets of al-          lief about the user (Price and Messinger 2005).1
ternatives. Conversational recommender systems rely on                 To maximize the information presented to the user in a
mixed-initiative interactions, with both the user and the sys-      recommendation set, and recommend a set of optimal rec-
tem taking an active role in the decision process. User feed-       ommendations, it is necessary to maintain an explicit rep-
back can be entered in many forms, for instance, as direct          resentation of the uncertainty in the preference model and
answers to queries, or critique of the options displayed by         a sound decision-theoretic semantics of the interaction in
the system.                                                         the first place. In fact, most practical conversational recom-
   Many recommender systems employ some form of diver-              mender systems (especially those using critiquing) do not
sity to show a set of products that might be appealing to the       use an explicit model of a user’s preferences, or only main-
user. Intuitively, diversity overcomes a key problem with
presentation of the top-k items based on some estimate of              1
                                                                        Indeed, the natural decision theoretic account of set recom-
Copyright c 2009, Association for the Advancement of Artificial     mendations immediately suggests diversity w.r.t. belief about a
Intelligence (www.aaai.org). All rights reserved.                   user’s preferences (Boutilier et al. 2003).
tain such a model in an ad hoc, heuristic fashion. In this pa-     set of recommended alternatives using setwise minimax re-
per, we develop an approach to set-based recommendations           gret. In Sec. 3 we discuss computation of setwise max re-
with an explicit utility model. We represent the uncertainty       gret and minimax regret, both for configuration problems
w.r.t. the user model with constraints of her utility func-        modeled as a constraint satisfaction problem (CSP) and for
tion induced by choices or critiques. To construct a suitable      product databases, while in Sec. 4 we briefly discuss the
recommendation set, we develop a novel criterion, setwise          performance of elicitation. Finally, in Sec. 5, we perform
maximum regret, that captures the idea of providing a set of       simulations of complete critiquing-based recommender sys-
jointly optimal recommendations. Our qualitative model of          tems, comparing our regret-based approach to state of the art
uncertainty has two key advantages over probabilistic mod-         critiquing algorithms such as dynamic critiquing and incre-
els (Price and Messinger 2005): relatively simple prior infor-     mental critiquing.
mation in the form of bounds or constraints on user prefer-
ences can be exploited (rather than probabilistic priors); and            Regret-based Recommendation Systems
exact computation is much more tractable (in contrast with         We begin this section by presenting our formalization of the
probabilistic models of utility that generally require reason-     decision problem, reviewing minimax regret for robust rec-
ing with densities that have no closed form (Boutilier 2002;       ommendation and elicitation, and then defining our key con-
Chajewska and Koller 2000)).                                       cepts of setwise max regret and setwise minimax regret.
   To make this model effective, user actions should be asso-
ciated with a precise, sound semantics. For instance, a user       Underlying Decision Problem
critique is assumed to reveal some aspect of the user’s pref-      We assume a recommendation system is charged with the
erences and this is used to update an explicit utility model.      task of recommending an option to a user in a multi at-
More precisely, in our work, unit critiques and compound           tribute space (e.g., computers, cars, apartment rental, etc.).
critiques places linear constraints on a user utility function.    Products are characterized by a finite set of attributes X =
The advantage of this approach is that we can use decision-        {X1 , ...Xn }, each with finite domains Dom(Xi ). Let X ⊆
theoretically sound criteria to:                                   Dom(X ) denote the set of feasible configurations. For in-
1. suggest or recommend a product;                                 stance, attributes may correspond to the features of vari-
                                                                   ous apartments, such as size, neighborhood, distance from
2. bound the difference in the quality of a recommended            public transportation, etc., with X defined either by con-
   product and the optimal option for the user;                    straints on attribute combinations (e.g., constraints on com-
3. determine which options and critiques carry the most in-        puter components that can be put together), or by an explicit
   formation to help speed up the navigation process; and          database of feasible configurations (e.g., a rental database).
                                                                      The user has a utility function u : Dom(X ) → R. In what
4. suggest to the user when to terminate the process (i.e.,        follows we will assume either a linear or additive utility
   when further interaction will offer only modest improve-        function depending on the nature of the attributes (Keeney
   ment in recommendation quality).                                and Raiffa 1976). In both additive and linear models, we
   We adopt the notion of minimax regret (Boutilier et al.         assume that u can be decomposed as follows:
2006a) to make product suggestions in the face of utility                                 X               X
function uncertainty. This robust decision criterion allows                      u(x) =        fi (xi ) =   λi vi (xi )
us to bound the loss (difference from optimal) of any rec-                                 i              i
ommendation. We propose and investigate a new decision             where each local utility function fi assigns a value to each
criterion, setwise maximum regret, for constructing optimal        element of Dom(Xi ). In classical utility elicitation, these
recommendation sets. This new criterion extends maximum            values can be determined by assessing local value func-
regret to sets of products rather than a single product. We de-    tions vi over Dom(Xi ) that are normalized       on the interval
fine set maximum regret, argue that minimizing setwise max                                                P
                                                                   [0, 1], and importance weights λi ( i λi = 1) for each at-
regret is the best means for constructing a set of options for     tribute (Keeney and Raiffa 1976; Fishburn 1967). This sets
a user, and develop effective computational procedures for         fi (xi ) = λi vi (xi ) and ensures that global utility is normal-
computing optimal recommendation sets for setwise regret.          ized on the interval [0, 1]. A simple additive model in the
   We present critiquing as a possible application domain.         rental domain might be:
While user-controlled exploration in traditional critiquing
systems does not offer any guarantees (practical, empirical,              u(Apt) = f1 (Size) + f2 (Distance) + f3 (Nbrhd )
or theoretical) of either sufficient or efficient exploration of
the space (A user may cycle through a set of similar products      When Dom(Xi ) is drawn from some real-valued set, we of-
or converge at a product far from optimal), our regret-based       ten assume that vi (hence fi ) is linear in Xi .2
recommender allows us to provide guarantees on the quality            Since a user’s utility function is not generally known, we
(utility) of the recommended product vis-à-vis feasible al-       often write u(x; w) to emphasize the dependence of u on
ternatives. We also show with simulations that regret-based           2
                                                                       Our approach relies considerably on the additive assumption,
critiquing can lead to much more efficient exploration of the      though can easily be generalized to more general models such
product space and lead to better decisions in practice.            as GAI (Fishburn 1967; Bacchus and Grove 1995; Braziunas and
   In Sec. 2 we introduce our model of regret-based recom-         Boutilier 2007a). The assumption of linearity is simply a conve-
mendation and describe our strategy for selection of a joint       nience; nothing critical depends on it.
user-specific parameters. In the additive case, the values         other words, it minimizes setwise max regret given the cur-
fi (xi ) over ∪i {Dom(Xi )} serve as a sufficient parametriza-     rent information about the user’s utility function, making it
tion of u (for linear attributes, a more succinct representation   extremely robust in the presence of utility function uncer-
is possible). The optimal product for the user with utility pa-    tainty (in a way to be made precise below). Second, max re-
rameters w is that x ∈ X that maximizes u(x; w). Our goal          gret is a well-defined progress metric that lets the user know
is to recommend, or help the user find, an optimal product,        the cost and benefit of further exploration of product space.
or one whose utility is near optimal.                              Finally, the information contained in user selection of some
                                                                   choice from the recommended set is maximally informative
Regret-based Recommendation                                        (in a sense defined below).
In probabilistic approaches to recommendation, a distribu-
tion over preferences—typically in the form a density over
                                                                   Minimax Regret
utility function parameters—is maintained, and the option          Minimax regret has been advocated as a means for robust
with highest expected utility is recommended (Chajewska et         optimization (Kouvelis and Yu 1997), and has more re-
al. 2000; Boutilier 2002; Boutilier et al. 2003). When a set       cently been used for decision making with utility uncer-
of alternatives need to be recommended, the expectimax or          tainty (Boutilier et al. 2001; Salo and Hämäläinen 2001;
EMAX criterion can be used (Boutilier et al. 2003; Price and       Boutilier et al. 2006a).
Messinger 2005). One difficulty with probabilistic models is          Assume that through some interactions with a user, and
that one requires probabilistic prior information over utility     possibly using some prior knowledge, we determine that her
models, which can be difficult to formulate and represent.         utility function w lies in some set W . Following (Boutilier
Another is that exact computation can often be computation-        et al. 2006a) we define:
ally intense; this is especially true since (arguably) natural
density models for utility functions are rarely closed under       Definition 1 Given a set of feasible utility functions W , we
the type of evidence provided by user interaction (e.g., be-       define the pairwise max regret MR(x, y; W ) of x, y ∈ X;
havioral observation or answers to queries) (Boutilier 2002;       the the max regret MR(x; W ) of x ∈ X; the minimax regret
Chajewska and Koller 2000)); as a result, computationally          MMR(W ) of W ; and the minimax optimal configuration
demanding fitting of (say) mixture models is required after        x∗W as follows:
every model update.                                                        MR(x, y; W ) = max u(y; w) − u(x; w)                 (1)
   Instead, we propose the use of minimax regret to generate                                   w∈W

recommendation sets. As we will see, this obviates the need                        MR(x; W ) = max MR(x, y; W )                 (2)
                                                                                                     y∈X
to complex probabilistic reasoning, yet can offer robust rec-
ommendations and provide very effective guidance for the                              MMR(W ) = min MR(x, W )                   (3)
                                                                                                   x∈X
user. In traditional regret-based approaches, a single recom-                              ∗
                                                                                          xW = arg min MR(x, W )                (4)
mendation is made using the minimax regret (Savage 1954)                                           x∈X

criterion. For multiple joint recommendations, we develop
the notion of setwise minimax regret (defined below). We           Intuitively, MR(x; W ) is the worst-case loss associated with
can summarize the correspondence between the Bayesian              recommending configuration x; i.e., by assuming an adver-
and the regret-based approach with the following table:            sary will choose the user’s utility function w from W to
                                                                   maximize the difference in utility between the optimal con-
                                                                   figuration (under w) and x. The minimax optimal configu-
     Probabilistic approach         Regret approach                ration x∗W minimizes this potential loss. MR(x, W ) bounds
        Expected Utility            Minimax Regret                 the loss associated with x, and is zero iff x is optimal for all
    Expected Max (EMAX)          Minimax Setwise Regret            w ∈ W . Any choice that is not minimax optimal has strictly
                                                                   greater loss than x∗W for some w ∈ W .
   In this paper we propose a framework that maintains a set          Minimax regret has proven to be an effective tool in utility
W of feasible utility models, and at each step, the system         elicitation in a variety of domains. A decision support or
shows a set of recommendations that are jointly optimal with       recommender system can query (or otherwise interact with)
respect to minimax regret. At a very high level, our regret-       a user providing additional constraints on the utility set W
based recommender works as follows:                                until minimax regret reaches some acceptable level (possibly
                                                                   optimality), elicitation costs become too high, or some other
1. The set W is initialized given some initial constraints;        termination criterion is met.
2. The current recommendations are determined (using the           Example Consider the following example, where the op-
   setwise minimax regret);                                        tions oi are defined using two features/coordinates x1 and
3. After each user action, W is refined to reflect the new         x2 :
   constraints imposed by the user’s feedback;                                                   x1         x2
4. The process repeats (steps 2 and 3) until the user is satis-                           o1    0.35       0.68
   fied or minimax regret reaches some target threshold.                                  o2     0.9       0.2
                                                                                          o3      0        0.75
This process is appealing for two reasons. First, the current                             o4      1         0
recommendation (i.e., set of options) is always optimal; in                               o5     0.5       0.3
              1
                                                                                   M R(oi , oj )    o1     o2     o3     o4     o5     M R(oi )
             0.9                                                                      o1             0    0.55   0.07   0.65   0.15     0.65
                                                                                      o2           0.48     0    0.55    0.1    0.1     0.55
             0.8
                                                                                      o3           0.35    0.9    0     1.00    0.5       1
             0.7                                                                      o4           0.68    0.2   0.75     0     0.3     0.75
                                                                                      o5           0.38    0.4   0.45    0.5     0       0.5
             0.6
   utility


             0.5
                             o1                                                      Minimax regret is 0.5, and the minimax optimal recom-
             0.4
                             o2                                                   mendation is option o5 ; its max regret occurs at adversarial
                                                                                  choice of utility w1 = 1, and choice of option o4 . It can be
             0.3             o3                                                   easily shown that regret is maximized at one of the vertexes
             0.2             o4                                                   of the feasible region W when W is a bounded, convex poly-
             0.1             o5                                                   tope (such a polytope induced by the interactions we discuss
                                                                                  later).
              0
                   0   0.1    0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1      Now imagine that, perhaps as the result of interactions
                                    utility parameter w1                          with the user, we learn that 0.2 ≤ w1 ≤ 0.6. The new min-
                                                                                  imax regret value for this constrained case is 0.138. This
Figure 1: Each options is represented by a single line in the                     value corresponds to recommendation o1 , and adversarial
utility space.                                                                    option o2 and utility w1 = 0.6. In this case, the constraints
                                                                                  0.2 ≤ w1 and w1 ≤ 0.6 decrease minimax regret signifi-
                                                                                  cantly. However usually constraints are not added directly as
                                                                                  such, but result from the acquisition of knowledge acquired
                                                                                  through a variety of interaction modalities, such as direct
We assume linear utility: u(x; w) = w1 x1 + w2 x2 where w                         user preference queries, or passive observation of user be-
is vector of tradeoff weights, with w2 = 1−w1 , 0 ≤ w1 ≤ 1;                       havior. For instance, comparison queries ask the user which
and the local value functions for each coordinate are identity                    of two proposed options is preferred. The impact of the in-
functions. (i.e., vi (xi ) = xi ). Given these assumptions,                       formation acquired depends greatly on the comparison, as
utility is one-dimensional; it can be written as u(x; w) =                        different options can lead to different degrees of regret re-
(x1 − x2 )w1 + x2 . So we deal only with the uncertainty on                       duction.
the single parameter w1 .                                                            A natural meta-heuristic for generating elicitation queries
                                                                                  is the current solution strategy (CSS), first described in
   This simple example is convenient because it is easy to                        (Boutilier et al. 2006a). This strategy would ask the user
visualize option utilities as a 1D function of w1 graphically.                    whether she prefers the minimax regret option x∗W or the
The utility of the different options are shown in Fig. 1 with                     adversary option xw = MRAdv (x∗W , W ).
respect to the parameter w1 . We notice that, for some values                        In our example, starting from the unconstrained space W ,
of w1 , each of the options o1 , o2 , o3 and o4 is optimal, but                   CSS would select {o4 , o5 } (the minimax regret option and
not so o5 . When considering a particular value of w1 (a                          the adversarial option) and ask the user to compare them.
particular utility function) the actual regret (or real loss) is                  Now, let’s assume that the user asserts that she prefers o4
the difference between the utility of the best option given                       over o5 ; then a new constraint u(o4 ; w1 ) ≥ u(o5 ; w1 ),
w1 and the utility of our recommendation in that case. For                        equivalent to w1 ≥ 0.375, is added to our model. In the
instance, for w1 = 0.9, the best option is o4 with utility 0.9,                   space W o4 ≻o5 resulting from the incorporation of this con-
while o1 has utility 0.38, so the actual regret of o1 would be                    straint, the minimax regret is 0.1, resulting from recom-
0.9 − 0.38 = 0.52. Max regret accounts for the uncertainty                        mendation o2 and adversarial option o4 . Option o4 , even
over w1 and it is the maximum of the possible actual regret                       if known to be better than o5 , has max regret of 0.18 (at
values (for o1 is 0.65 when w1 = 1).                                              w1 = 0.374, with adversarial option o1 ). Therefore, option
                                                                                  o2 will be recommended.
   When the options are few as in this case, we can compute                          Minimax regret offers recommendations that are robust
the max regret of each choice by explicitly enumerating the                       given the uncertainty of the preference model. In this ex-
maximum the pairwise regret of that choice against any pos-                       ample, o5 is recommended (in the unconstrained setting)
sible adversarial choice of option. The table below illustrates                   even though it cannot possibly be optimal for any user util-
this, where each row corresponds to a recommendation, each                        ity function; this is so because it prevents “disastrous” sit-
column to an adversarial choice, and we display the pairwise                      uations, such as would occur if options o1 or o2 are recom-
max regret (allowing the adversary to choose utility) in the                      mended when w1 is very low (despite the fact that for a good
cells. The max regret of an option is shown in the last col-                      part of utility space, these options are optimal. Note, that as
umn, and corresponds to the maximum value in its row. In                          knowledge of user utility increases, more accurate recom-
the unconstrained situation (where w1 can take any value                          mendations are made; for example, recommending o2 when
between 0 and 1), we have the following values:                                   we learn that o4 is preferred to o5 .
Optimal Recommendation Sets: Setwise Regret                       set of utilities such that xi has greater utility than any option
In most cases the value of a set of recommendations is de-        in Z.
pendent on the elements of the set jointly, not on each in-
dividually. If the user is going to benefit from only one of      W Z→xi = {w ∈ W : u(xi ; w) > u(xj ; w) ∀j 6= i, 1 ≤ j ≤ k}
the recommendations (example: recommending apartments)
then the utility of the set is then the maximum utility among
the individual options, i.e., the one the user will pick from        The set of all W Z→xi for any xi ∈ Z partitions W (we
the set.                                                          ignore the possibility of ties over full-dimensional subsets of
   The problem of set recommendations has been addressed          W , which can easily be dealt with, but complicate the pre-
using probabilistic expectation: Price and Messinger (Price       sentation marginally). An important observation (that will
and Messinger 2005) optimize set recommendations using            be used later) is that we can rewrite the setwise max-regret
the EMAX criterion, defined as the expectation of the max-        SMR as the aggregate maximum of the (individual) max-
imum utility among the options in the set.                        regret considering a partition of the utility space according
   In order to retrieve optimal set recommendations, we de-       to which option has higher utility.
fine the notion of setwise max regret. The setwise max re-
gret of a recommendations set can be seen as the equivalent       Observation 1 Given Z = {x1 , . . . , xk } and, for 1 ≤ i ≤
of EMAX in our non-probabilistic framework. Suppose we            k,
have a slate of k options to present to the user and want to
quantify the possible loss by restricting the user’s decision     SMR(Z, W ) = max[MR(x1 , W Z→x1 ), . . . , MR(xk , W Z→xk )]
to options in that slate. Intuitively, the user may select any                                                            (6)
of the k options as being “optimal.” An adversary want-
ing to maximize regret should do so assuming the any such
choice is possible—unlike max regret, we allow the user to        Example (continued) We now consider setwise max re-
select from among any of the set of k options. In this for-
malization, we choose the set of k options first, but delay       gret for the example introduced above. Let the number of
the final choice from the slate only after the adversary has      options in a recommendation set be k = 2. The following
chosen a utility function w. The regret of a set is then the      combinations are ranked best according to the setwise regret
minimum difference between the utility of the best config-        criterion.
uration under w and the utility of the options in the slate.              Set         SMR Adversary Adversary W
Specifically, define the setwise maximum regret of option set          {o1 , o4 } 0.07            o3          w1 = 0
Z = {x1 , . . . , xj } to be:                                          {o1 , o2 }      0.1        o4          w1 = 1
                                                                       {o3 , o2 }      0.1        o4          w1 = 1
      SMR(Z; W ) = max max min u(x′ ; w) − u(x; w)                     {o3 , o4 } 0.11            o1        w1 = 0.42
                    ′ x ∈X w∈W x∈Z
                                                                     The set {o1 , o4 } is the best choice for a joint recommen-
  SMR-Adv (Z; W ) = arg max max min u(x′ ; w) − u(x; w)           dation of two options, corresponding to a value of regret of
                         ′x ∈X w∈W x∈Z
                                                                  0.07. Other combinations, such as {o1 , o2 }, {o3 , o2 } and
   Setwise max regret has some intuitive properties. First,       {o3 , o4 }, also have a relative low value of regret.
adding new items to a set cannot increase setwise max re-            A set recommendation can often have dramatically lower
gret: SMR(A ∪ B, W ) ≤ SMR(A, W ). At the same time               regret than the minimax optimal single recommendation (in
incorporating options that are known to be dominated given        this case, o5 ).
W does not change setwise max regret: in other words, if             It is interesting the fact that the optimal recommendation
u(a, w) > u(b, w) for some a ∈ Z and all w ∈ W , then             set is composed of two options, o1 and o4 that, when con-
SMR(Z ∪ {b}, W ) = SMR(Z, W ). Finally, the max regret            sidered alone, are associated with high regret. Any set in-
associated with recommending the entire product set is zero:      cluding o5 , the single best recommendation, is ranked poorly
SMR(X, W ) = 0. This is the equivalent to asking the user         with respect to setwise regret.
to directly choose the best option from the space of available       We now consider the case of larger sets. If we need to
options—obviously, a task of with extreme cognitive cost,         select a slate of three options (k = 3), the regret will be
and one that runs counter to the spirit of recommendation         0.04 and the recommendation would be {o1 , o3 , o4 }; in this
assistance! But should the user be able to answer correctly,      case the adversary would pick o2 , and the value w1 = 0.51
it guarantees optimality.                                         (intersection point of o1 and o4 ).
   Setwise max regret can be equivalently written in as fol-         In the case of four options to be selected (k = 4), the set
lows:                                                             {o1 , o2 , o3 , o4 } would be recommended and it would be as-
                                                                  sociated to a setwise regret of 0: the slate includes all the
  SMR(Z, W ) = max max [u(y; w) − max u(x; w)]             (5)    options that can ever become optimal (considering Observa-
                    y∈X w∈W                x∈Z
                                                                  tion 1, it follows that for any W Z→oi that partitions W , the
 This captures the intuition that, given w, the option (among     max regret has to be 0).
those in Z) that determines setwise max regret is that with          Now we consider how setwise regret changes when new
highest utility with respect to w. In fact, it can be useful to   information is included. We consider a slate of two options
explicitly partition utility space with respect to which option   to be selected (k = 2) and we suppose that the user asserts
in Z is maximal. We define the utility subset W Z→xi as the       the preference of o4 over o5 . The recommendation set is still
                                                                repeated computation of the pairwise regret between a can-
                                                                didate recommendation and an adversarial option in order to
                                                                identify the option with minimax regret. For ease of presen-
                                                                tation, assume a linear utility function as above, defined by
                                                                weights wi over m attributes. Pairwise regret MR(x, y, W )
                                                                of recommendation x and adversarial option y is readily
                                                                computed with the following LP:

                                                                                                   X
                                                                                  max                     wi (yi − xi )         (7)
                                                                                w:wi ∈[0,1]
                                                                                                  1≤i≤m
                                                                                              X
                                                                                      s.t.         wi = 1                       (8)
                                                                                              i
                                                                                             w∈W                                (9)
Figure 2: Alpha beta pruning can speed up the search, de-
pending on the evaluation order. In this case, x1 has regret       Here we assume the feasible parameter set W is captured
0.5 against x3 , that is worse than the value 0.4 (max regret   by linear constraints. A similar LP can be formulated for
of x2 ), so we do not need to test x1 against x2 .              discrete-valued attributes, without assuming linearity, just
                                                                additivity. Hybrid models with continuous and discrete at-
                                                                tributes can easily be represented with a combination of
{o1 , o4 } but with a much lower value of (setwise) regret:     these two representations. Generalized additive utility mod-
only 0.04.                                                      els models (Fishburn 1967; Braziunas and Boutilier 2006;
   We conclude the discussion of the example with some re-      2007b) can also be easily represented in this framework.
marks on the optimization process. The adversary’s utility      This means that pairwise regret can be computed extremely
does not necessarily corresponds to one the vertex of the       efficiently (e.g., in a few milliseconds using CPLEX on the
feasible region, as in the single recommendation case; it       types of problems discussed below).
may also lie in any intersection of the hyperplanes associ-        Minimax regret computation is more complex because
ated with the options. For instance (in the unconstrained       we need to maximize over all possible adversarial choices,
case) the setwise regret of {o3 , o4 } is maximized for the     and minimize over all possible recommendations. A naive
value w1 = 0.42 (the utility that makes o3 and o4 equally       approach would consider every pair of options, requiring
preferred).                                                     O(n2 ) pairwise regret computations for a database of size
                                                                n, where each of these computations requires the solution of
   Computation of Setwise Minimax Regret                        an LP of size proportional to the number of utility parame-
In this section we discuss how to efficiently compute regret-   ters.
based recommendations. We first discuss how to compute             However, since minimax regret can be seen as a game be-
minimax regret for single recommendations and then de-          tween the recommender and an adversary, the computation
scribe how to modify these procedures to compute setwise        can be greatly improved in practice by formulating the op-
minimax regret for recommendation sets. We distinguish          timization as a minimax search and using standard pruning
two settings: configuration problems, where options are         techniques. Unlike typical games, the search tree has very
defined by variables and configuration constraints (i.e., as    limited depth: only two ply, one choice of recommendation
solutions to a constraint satisfaction problem (CSP)); and      by the MIN player (attempting to minimize regret) and one
database problems, where options are enumerated in a prod-      choice of adversarial option by the MAX player (attempting
uct database.                                                   to maximize the regret of the recommendation).3 Note that
                                                                the game has a large number of actions, once per product in
Computing Minimax Optimal Single                                the database. The MIN player (recommender) moves first,
Recommendations                                                 the MAX player (adversary) second. The leaves of the mini-
Configuration problems In configuration problems, opti-         max tree are labeled with the pairwise max regret of the two
mization over product space X is formulated as a constraint     choices on its path.
optimization problem or MIP. In such domains, minimax re-          A full evaluation of the tree requires the solution of
gret computation can be formulated as a MIP, and solved         n(n − 1) pairwise regret LPs (noting that the MIN player’s
practically for large problems using techniques such as Ben-    choice need not be explicitly evaluated or even represented
der’s decomposition and constraint generation. We refer         as a possible MAX choice, since it must yield pairwise regret
to (Boutilier et al. 2006a; 2004; Braziunas and Boutilier       of 0). However, it is generally not necessary to evaluate ev-
2007a) for more details. Our MIP formulations for setwise       ery node of the tree as Alpha-beta pruning (see (Russell and
minimax regret below will draw heavily on these techniques,     Norvig 2003) for an introductory description) can be used to
but necessitate important modifications.                        eliminate branches from evaluation.
Database problems When options are enumerated in a                 3
                                                                     The choice of the utility function by the adversary is dictated
product database, minimax regret computation requires to        by pair of options, so it need not be modeled as a move.
   Alpha-beta pruning is simple in such a simple game tree:        size       attributes       constraints     num of pairwise checks
during the tree evaluation, we maintain an upper bound               40            4               0                     41
UB (initially +Inf) at the root, representing the max regret        200            5               0                    207
of the best solution found so far (from the perspective of          400            7               10                   492
MIN), and lower bounds LB (n) at each MAX node, one                1000           10               0                   1003
for each possible MIN choice (or recommendation). Ev-              1000           10               60                  1998
ery time we evaluate a leaf node, we compute pairwise re-          1000           15               30                   999
gret MR(omin , omax , W ) of MIN’s choice omin and MAX’s
choice omax on the path. We update the lower bound at             Table 1: Number of pairwise regret checks to compute min-
the corresponding MAX node, and prune (α cut 4 ) when-            max regret on some sample datasets. We evaluate the search
ever LB (n) ≥ U B. This is because MR(omin , omax , W ) ≤         tree with our heuristics of reference utilities.
MR(omin , W ). At the same time, whenever we complete
the evaluation of a MAX node n, we update the upper bound
UB to min(U B, v(n)) where v(n), the value of the node, is        experiments run on synthetic data for illustrative purposes.)
the maximum value among the leafs.                                Set Recommendations: Setwise Minimax Regret
   The efficiency of this pruning depends on the order in
                                                                  We now consider the modification of the techniques above
which nodes are evaluated (Russell and Norvig 2003); this
                                                                  for setwise minimax regret. Naturally, setwise max regret
is especially true given the very shallow, broad nature of our
                                                                  is more computationally demanding, requiring selection of
tree. Pruning is most effective when, at each node, the best
                                                                  a set of options. However, it is still possible to formulate the
children (with respect to the relevant node evaluation, MIN
                                                                  computation in a MIP for configuration problems. Database
or MAX) are evaluated first. Figure 2 shows, in our simple
                                                                  problems are more challenging: the adversarial search pre-
example, that in the best case only 5 nodes out of 9 need
                                                                  sented above for single-item recommendation can be ap-
to be evaluated (i.e., 5 pairwise regret maximization). To
                                                                  plied directly, with replacement of a single move by the rec-
speed up the search, we consider a heuristic that first evalu-
                                                                  ommender (MIN player) by k moves, corresponding to the
ates choices at the MIN (recommender) node that are likely
                                                                  choice of k options for the slate. However, performance can
to be good candidates for minimizing max regret; and we
                                                                  take a dramatic hit as the size of the desired recommenda-
first evaluate at at MAX (adversarial) nodes options that are
                                                                  tion set increases. However, we develop a simple heuristic
likely to induce high regret against the given MIN choice.
                                                                  hill-climbing strategy that seems to provide very good rec-
These heuristics give us an evaluation order for both MIN
                                                                  ommendation sets in practice.
and MAX choices and can lead to considerable pruning. We
discuss each in turn.                                             Configuration problems: MIP formulation For config-
   For the MIN node, we note that the regret of any option        uration problems we formulate the problem of setwise min-
is maximized at one of the vertices of the feasible region        imax regret following the general strategy for single-option
W . Thus we sample t vertices (for instance, by consider-         minimax regret, formulating as a (MIP) minimization with
ing extreme weights that maximize the importance of one           exponentially many constraints. We use a constraint gen-
of the attributes) and refer to the the w so-sampled as refer-    eration procedure to prevent enumeration of the entire con-
ence utilities. These are used to initialize the lower bounds     straint set (Boutilier et al. 2006a; 2004). However, there
LB (n): we simply compute the actual regret with respect to       are some critical differences in the formulation, which we
these utilities for the option that leads to (MAX node) n. We     describe here.
then evaluate MIN’s children n in increasing order of initial        Setwise minimax regret for configuration problems can be
lower bound LB (n).                                               formulated as the following MIP.
   To order the children of MAX node, for each MAX node
n, we consider the feasible utility function w− that mini-                min            M
mizes the utility the MIN choice. (This requires a simple                 j
                                                                      M,Iw ,Xj ,Vw
                                                                                   j

optimization.) The option that maximizes utility at w− (i.e.,                                  X
                                                                                s.t. M ≥             Vwj     ∀w ∈ Vert            (10)
the optimal choice under w− ) is likely to give a high value
                                                                                             1≤j≤k
of for pairwise regret and thus represents a potentially good
adversary. Moreover, once we have generated w− , we can                                Vwj ≥ w · (x∗w − Xj ) + (Iw
                                                                                                                 j
                                                                                                                   − 1)mbig       (11)
use it to update the lower bound by considering the actual                                  ∀j ∈ [1, k] ∧ ∀w ∈ Vert               (12)
regret for each option. MAX choices are evaluated in order                              X j
                                                                                              Iw = 1 ∀w ∈ Vert                    (13)
of decreasing utility under w− .
                                                                                       1≤j≤k
   In practice, these heuristics can significantly speed up the                         j
computation of minimax regret in product databases. Table 1                            Iw ∈ {0, 1}                                (14)
shows that number of pairwise regret checks (LPs) is almost                            Vwj ≥ 0    ∀j ∈ [1, k], ∀w ∈ Vert          (15)
linear in the number of options in the database; indeed, with
these orderings, MAX nodes are often pruned immediately
without even considering an adversarial choice. (These are          This MIP minimizes M by: (a) choosing k options (or
                                                                  configurations xj designated by variables Xj (where each
   4
       beta cuts are not possible given the depth of the tree     Xj is a vector of n attributes) for the recommendation set;
(b) selecting, for each adversary w, one of those options                 to the problem of setwise minimax regret for database prob-
(the jth option) as the choice that has minimum max regret                lems, scaling is sometimes a concern. We now present a
against an adversary, and ensuring that M is greater than the             heuristic hill-climbing strategy that scales much more effec-
true regret of the jth option relative to every possible choice           tively. We describe it in the context of database problems,
of adversary utility function and option.                                 but it can also be used directly for configuration problems.
   Note however that this constraint need not be applied to                  The central idea is that is possible to modify a given
(continuously many) utility functions or exponentially many               recommendation set Z in such a way that setwise max re-
adversarial choices. In the MIP, we post these constraints                gret cannot increase, and usually decreases until a high
only for each vertex of W (i.e., w ∈ Vert(W )) and for                    quality set is found. We define the MMR-transformation
the optimal product choice x∗w for that vertex. This relies               T to be a mapping that refines a recommendation set Z
on the observation that regret maximized at vertices of W ,               by partitioning the current feasible utility space W into
and, for any adversarial choice of w, the adversarial option              {W Z→xi }, ∀xi ∈ Z, as discussed in Observation 1. In each
that maximizes the pairwise regret for any user choice is the             partition we compute the single recommendation that has
optimal option for w.5                                                    minimax regret in that region of utility space, and define the
   However, this MIP still requires (potentially) exponen-                new set recommendation T (Z) to be the collection of these
tially many constraints, one for each element of Vert(W ).                (single) minimax-optimal recommendations.
We can make computation much more effective by applying
constraint generation, observing that at the optimal solution,            Definition 2 Define the MMR-transformation T : Z → Z′ ,
very few of these constraints are likely to be active. Our                where Z = {x1 , . . . , xk }, to be T (Z) = {x′1 , . . . , x′k } such
procedure works as follows: we solve a relaxed version of                 that for all 1 ≤ i ≤ k:
the MIP above—the master problem—using only the con-
straints corresponding to a small subset Gen ⊂ Vert(W )                                      x′i = MMR-Opt(W Z→xi )
of the constraints in the MIP above. We then test whether
                                                                          We can show that T cannot increase setwise max regret.
any unexpressed constraints are violated at the current so-
lution. This involves computing the true setwise max re-
gret of the slate generated by the master problem. If the                 Observation 2 For any              set     recommendation         Z
true setwise max regret is of the slate is greater than δ, we             SMR(T (Z)) ≤ SMR(Z)
know that a constraint has been violated. Specifically, the
                                                                          We use the MMR-transformation to define our heuristic
computation of setwise max regret will produce the element
                                                                          search strategy to produce good recommendation sets; in-
w ∈ Vert(W ) and optimal product x∗w that corresponds to
                                                                          tuitively, we repeatedly T until a fixed point (with respect to
the maximally violated constraint at the current master so-
                                                                          setwise max regret, not the set itself) is found.
lution. So if a constraint is violated, we add this maximally
violated constraint to Gen, tightening the MIP relaxation,
and repeat; if not, we are assured that the current solution              Alg 1 Hill-climbing-T algorithm (HCT)
                                                                          The algorithm considers an initial set Z, and rewrites Z us-
minimizes setwise max regret.6                                            ing T until a fixed-point is found.
   In the formulation, mbig is an arbitrary big number, that
we need to encode the fact that, for any given w, only the                • Repeat Z := T(Z)
option with the highest utility (among those in the slate) with           • Until SMR(T (Z), W ) = SMR(Z, W )
respect to w contributes to the actual setwise regret.
                                                                             We initialize the slate Z using the current solution strategy
   The SMR maximization subproblem can be also encoded
                                                                          (CSS), empirically, this seems to produce the most promis-
with a MIP, similar to (Boutilier et al. 2006b). The optimiza-
                                                                          ing recommendation sets. For k = 2, this means that the
tion makes use of a decision variable to explicitly represent
                                                                          initial set is Z = {x∗W , xw }, where x∗ = MMR-Opt (W ),
the setwise regret, M , to be maximized and we constrain
                                                                          and xw = MRAdv (W ).
M to be greater than the single max regret M R(xj , W ), for                 For larger sets (k > 2), there is not a standard definition
each option in the slate.                                                 of the CSS. We propose to use the following strategy, that
Database Problems: A Hill-climbing Strategy As dis-                       we call chain of adversaries, to generate the initial slate. We
                                                                          start from {x∗W , xw } and repeatedly maximize setwise max
cussed above, while minimax search can be applied directly                regret given the current set, in some sense maximizing the
    5 j
                                                                          diversity of choices from perspective of utility space. This
      Vw is the actual regret of option Xj of the slate with respect to   gives the set {x1 , . . . , xk } where:
                                          j
the utility w when the corresponding Iw     is activated. For any w,
                     j
one and only one Iw is set to 1. In order to minimize M , the op-                     x1 =    x∗W
                                                                                  
                                j
timization will activate the Iw   corresponding to the xj with lower                  xi =    Adv ({x1 , .., xi−1 }, W ) 2 ≤ i ≤ k
actual regret. The first constraint (10) captures the idea that for a
slate of options, given w, the regret of the joint slate is the mini-
mum among the individual values of regret (in the summation, all             The chain of adversaries requires to solve single minimax
but one term are zeros).                                                  regret once, and then k − 2 setwise regret maximizations.
    6
      Note that the adding a new constraint requires the introduction     The chain of adversaries can be seen as a generalization of
of new variables to the master problem. Every time we add a new           CSS to sets of any size, and could also be considered as an
w to Gen, k new variables I and V are necessary.                          alternative, faster strategy to select recommendations.
                     0.6                                      CSS
                                                                           We performed some preliminary experiments in order to
                                                              HCT       evaluate our recommendation strategy from the prospective
                     0.5
                                                                        of elicitation. We are interested in quantifying the reduc-
                                                                        tion of regret in practice. In Figure 3 we compare the ef-
                     0.4
                                                                        ficiency of an elicitation based on our SMR criterion and
                                                                        the current solution strategy (CSS), considering a syhnthetic
        max regret


                     0.3
                                                                        dataset with 5000 options and 10 attributes. We plot max
                                                                        regret in function of the number of queries (steps). SMR is
                     0.2
                                                                        optimized using the hill-climbing strategy (HCT).
                                                                                          Example Critiquing
                     0.1

                                                                        As an evaluation setting, we apply our regret-based recom-
                      0
                                                                        mender to example-critiquing. This domain is interesting
                           1   2     3    4           5   6   7     8
                                              steps                     because current systems usually rely on heuristics and we
                                                                        expect that an utility-based approach can be greatly benefi-
Figure 3: The hillclimbing strategy based on setwise regret             cial.
outperforms the current solution strategy in this experiment               Critiquing is a setting where the user expresses feedback
(20 runs).                                                              on options that the system shows to her. In particular we
                                                                        consider a particular version of critiquing, often called the
                                                                        dynamic critiquing model (Reilly et al. 2005), where a cur-
                                   Myopic Elicitation                   rent product or recommendation is displayed, and the user is
                                                                        invited to move to a different product by choosing particu-
In addition to produce final recommendations, our criterion             lar actions (laid out in the interface) that change the product.
can also be used as a driver for further elicitation of the util-       They can include unit critiques, which request modification
ity function of the user. In fact, whenever we consider a slate         of a particular product attribute; e.g., “give me a laptop that
of recommendations, the user may give us some feedback,                 is lighter than the current one.”
perhaps selecting the option that she prefers among those in               Often alternative suggestions or compound critiques are
the set. This information is very valuable, and as we have              used in which multiple attributes (“lighter and faster pro-
seen in the initial example, can be used to reduce regret. It           cessor, but more expensive”) are tweaked, or in which a se-
is therefore interesting to assess the value of a recommenda-           lection is made from a system-suggested set of alternative
tion set also with respect to the possible feedback.                    products (“let me see laptop 3 instead of the current one”).
   An important observation is that in the case of compari-                The set of possible critiques is generated by the system,
son queries (the user selects the preferred option in a slate),         and the user chooses one of the possible actions. At each in-
the set of k optimal recommendations that minimize setwise              teraction, the user may choose to critique the current product
regret is also the optimal choice set for a comparison query            if she is not completely satisfied with it, or simply because
with respect to myopic worst case regret (WR), a measure of             she wishes to explore the product space in more depth.
the value of information of a query.                                       In general those systems use heuristics to generate the set
   The Worst-case Regret (WR) of a comparison query based               of possible critiques. However, we expect that better perfor-
on a choice set Z = {x1 , .., xk } is defined as                        mance can be obtained if critiquing suggestions are selected
                                                                        according to a decision-theoretically sound criterion as our
W R({x1 , .., xk }) = max[MMR(W Z→x1 ), .., MMR(W Z→xk )]               setwise minimax regret.
                                                    (16)                   In order to implement our approach, it is necessary to
                                                                        give a precise semantics to each of the critiquing actions.
   WR considers the “single” max regret in each possible                We identify two main reasons a user will critique an option.
scenario. It is possible to verify that WR(Z) ≤ SMR(Z);                 First, she may want to explore the product space in an effort
the worst case regret is always lower (or equal) than the set-          to better understand either the space of feasible options or
wise max regret.                                                        her own preferences. This latter desire makes sense espe-
   The optimality of minimax setwise recommendations (we                cially when one adopts the view commonly held in behav-
omit the full proof for reasons of space) with respect to W R           ioral economics that decision support systems should help
is based on the consideration (an extension of Observation 2)           people construct their preferences (not just articulate them)
that the transformation T introduced in the previous section            (Slovic 1995). Second, she may wish to improve the current
is also such that SMR(T (Z)) ≤ WR(Z) (the proof requires                product, making tradeoffs among her preferences for differ-
considering the different partitions imposed by Observa-                ent attributes. It is this latter exploitive or improvement mode
tions 1 and compare the two expression componentwise).                  that critiquing systems fail to account for adequately when
We call Z∗ the optimal recommendation set according to                  deciding on appropriate product suggestions. In this evalua-
setwise regret. A set Z′ such that W R(Z′ ) < W R(Z∗ ) but
SMR(Z′ ) > SMR(Z∗ ) leads to contradiction.7                            that SMR(Z̄) ≤ WR(Z′ ) < WR(Z∗ ) ≤ SMR(Z∗ ) but this
                                                                        means that SMR(Z̄) < SMR(Z∗ ), contradicting the optimality of
   7
       If we apply the transformation T to Z′ we obtain a set Z̄ such   Z∗ with respect to SMR.
tion we use critiques of the latter type to constrain the set of   and a comparison operators from the set: <,>,¬,=. An ex-
possible user utility functions.                                   ample pattern might be: {[Price >], [ProcessorSpeed >]}.
   In the following, we describe our simulation setting and        Second, the algorithm uses APriori to find recurrent cri-
present our results.                                               tiquing patterns; a compound critique based on a pattern
                                                                   is then presented to the user it has sufficient support in the
Experiments                                                        product database. In our experiments, the support threshold
To validate our regret-based approach to critiquing we de-         is set to 0.3 and selection of compound critiques corresponds
signed a framework that simulates a full interaction of a user     to the low-support strategy in (Reilly et al. 2004).
with a user interface. As in a real system, each simulation
comprises a number of cycles of interaction, each showing a        Incremental Critiquing
current product which the user can critique using either unit      Incremental critiquing (Reilly et al. 2005) (IC) improves
or the selection of one of the suggested recommendations.          the basic dynamic critiquing model by incorporating a user
   The simulated user continues the critiquing process until       model. While suggestions are still based on the APriori al-
the perceived increase in utility is lower than some thresh-       gorithm (as above), the retrieval of the next product associ-
old. We assume that among all possible critiquing actions,         ated with a critiquing action is based on a quality metric that
the one with highest perceived improvement will be chosen          values both the score given to the product by the preference
by the user.                                                       model and its similarity to the current product.
   In our experiment, at each interaction the system displays:        In the implementation we developed for our experiments,
• the current product,                                             we take advantage of the fact that the preference ordering
                                                                   over attributes is known: the score is dictated by a linear
• a choice of unit critiques of the current product (they re-      utility function that gives equal weight to all attributes. The
  quest the modification of a particular product attribute),       initial product is the option with maximum utility. When
• a set of suggestions, alternative options that can change        retrieving the next example from the set of products that sat-
  focus for the search, presented as such or labeled as com-       isfy the user-chosen critique, we select the product x that
  pound critiques (“lighter and faster processor, but more         maximizes score(x)·Similarity (x, y), where y is the prod-
  expensive”)                                                      uct recommended at the previous cycle, and score is the
                                                                   heuristic utility function.
  At each step, the user can choose to either select the cur-
rent product (and finish the interaction) or to tweak it in or-    Incremental Critiquing: MAUT
der to improve it and get better recommendations in the next
cycle.                                                             Another implementation of incremental critiquing (Reilly et
  We compare our regret-based approach to three other ap-          al. 2007) uses a simple multi attribute utility (MAUT) model
proaches that use compound critiques. In our case, we use          to make recommendations and generate compound critiques
the generation of a set of recommendations (based on set-          (rather than similarity). In this approach, a simple additive
wise regret) to display as alternatives. One is selected as        utility model u is generated, initially giving equal weight to
current product and the others are displayed as suggestions.       all attributes; each time an attribute is critiqued, its weight
  We briefly review the different critiquing approaches and        is multiplied by a constant (and all weights renormalized).
then we present the experimental results.                          The original design of this algorithm makes use of param-
                                                                   eterized value functions for each attribute, where the value
Dynamic Critiquing                                                 taken by the current option is considered preferred. Since
The dynamic critiquing model (Reilly et al. 2004) makes            our experimental set up assumes that the local preference
use of a particular similarity metric to retrieve the current      ordering over attribute values is known, we instead assume
product and uses the APriori datamining algorithm to pro-          a linear utility model.
pose alternative compound critiques. The algorithm dynam-             Suggestions are generated using optimization with respect
ically generates compound critiques by discovering com-            to the estimated utility model, and the k best products are
mon feature patterns among the set of products. Essentially,       presented as alternative cases. A limitation of this approach
each compound critique describes a set of products in terms        is its reliance on a fixed utility model (as opposed to rea-
of the features they have in common. For example in the PC         soning with the space of possible user utilities). Moreover,
domain, a typical compound critique might be “Faster CPU           options that all have high value in a single utility sample
and a Larger Hard Drive.” Whenever a product is shown to           are unlikely to be diverse or informative enough to generate
the user as the current product, the APriori datamining algo-      useful distinctions.
rithm is used to quickly discover these patterns and convert
                                                                   Regret-based critiquing
them into a set of suggested compound critiques. Each com-
pound critique corresponds to a product that is, among all         Our version of dynamic critiquing exploits setwise minimax
products satisfying the pattern most similar to the current        regret using the ideas above. Specifically, at any point in
one.                                                               the interaction cycle, we generate the current optimal rec-
   The generation of suggestions consists of two steps. First,     ommendation set (with respect to minimax setwise regret),
each product is matched against the current product to pro-        and propose one of these options a current product. The re-
duce lists of critique patterns, each comprising an attributes     mainder of the set is used to display suggestions.
                   1
                                                 Regret−based                           0.3                              Regret−based
                  0.9
                                                 IC: wsim                                                                IC: wsim
                                                                                                                         IC: maut
                                                 IC: maut                              0.25
                  0.8                                                                                                    DC
                                                 DC
                  0.7
                                                                                        0.2


                                                                           real loss
     max regret


                  0.6

                                                                                       0.15
                  0.5


                  0.4                                                                   0.1


                  0.3
                                                                                       0.05

                  0.2


                                                                                         0
                  0.1                                                                         2   4     6     8     10     12    14
                                                                                                            steps
                   0
                        2   4   6     8     10     12    14
                                    steps
                                                                   Figure 5: True regret (real loss) of the recommended op-
                                                                   tion at each step for four algorithms: regret-based cri-
Figure 4: Maximum regret of the recommended op-                    tiquing (regret), dynamic critiquing with compound cri-
tion at each step for four algorithms: regret-based cri-           tiques generated with APriori (DC), incremental critiquing
tiquing (regret), dynamic critiquing with compound cri-            with weighted similarity and APriori (IC) and incremental
tiques generated with APriori (DC), incremental critiquing         critiquing with a multiattribute utility model (IC maut).
with weighted similarity and APriori (IC) and incremental
critiquing with a multiattribute utility model (IC maut).
                                                                   gret (i.e., worst-case loss); so one might wonder whether
                                                                   other techniques find better products despite being unable to
Empirical Results                                                  “prove” that they are good. Fig. 5 shows this not to be the
In the experiments we compare the four different versions of       case. While other critiquing techniques recommend prod-
dynamic critiquing discussed above: the original dynamic           ucts that are much better than their regret-bounds suggest,
critiquing algorithm (similarity plus APriori), incremental        regret-based critiquing is able to consistently find the opti-
critiquing, incremental critiquing with MAUT, and regret-          mal product (and find a near-optimal product in as few as
based critiquing. We used a C implementation of the APri-          five or six interaction cycles). By contrast, the other three
ori algorithm (Bodon 2003). All systems make available             methods are unable to identify the optimal option at conver-
unit critiques of any attribute (and user’s adopt an expected      gence.
improvement semantics). We evaluate the performance of
all algorithms with respect to recommendation efficiency                                              Conclusions
and offer some speculative examination of regret-based cri-        In this paper we presented a novel formalization of recom-
tiquing in terms the tradeoff between cognitive cost and           mendations of a joint set of alternatives based on the notion
number of compound critique options presented at each in-          of regret. The criterion that we propose, setwise max regret,
teraction.                                                         represents an intuitive extension of the traditional regret cri-
   We evaluate the different critiquing methods by compar-         terion for single recommendations.
ing the quality of the recommendations with respect to max            We show how optimal recommendation sets (with re-
regret. We tested the methods on a real database of 200            spect to our criterion) can be computed with mixed integer
apartments, using randomly drawn utility functions (as de-         programming (MIP) methods and the constraint generation
scribed above), and k = 3 suggested products at each in-           technique when options are constructed from a set of config-
teraction cycle. All results are averaged over 20 simulated        uration constraints. Alternatively, set recommendations can
users. Fig. 4 shows the maximum regret of the recom-               be obtained using a hill-climbing strategy interleaved with
mended product at each stage of the interaction. We note           adversarial search in discrete settings.
that regret-based critiquing outperforms the other methods            We discuss the problem of utility elicitation, showing
of generating compound critiques by a wide margin. This            that our recommendation strategy reduces max regret more
is true when considering both the “anytime” profile of the         quickly than any other possible choice. Finally we present
method (i.e., the degree to which minimax drops) and its fi-       an application of these principles for critiquing systems.
nal convergence: our technique converges on a product who             Our reliance on explicit utility modeling and minimax
max regret is about 3% on average, while the MAUT incre-           regret provides a powerful new means of generating good
mental critiquing settles at about 10% (and the others worse,      critiques and making good product recommendations. Our
with dynamic critiquing unable to reduce max regret to less        regret-based critiquing recommender can often lead to opti-
than 18%).                                                         mal recommendations using very few, say, compound cri-
   More interesting is the fact that regret-based critiquing of-   tiquing interactions, and outperforms other dynamic cri-
fers better “actual” recommendations, as measured by true          tiquing techniques both in speed of convergence and the
regret (difference from the true optimal recommendation).          quality of the final recommendations.
Regret-based critiquing is designed to attack bounds on re-           The incorporation of noisy feedback is an important next
step; we are currently considering the possibility of a clar-         Urszula Chajewska and Daphne Koller. Utilities as random vari-
ification dialogue. The idea is to verify information that is         ables: Density estimation and structure discovery. In Proceedings
sensitive with respect to regret.                                     of the Sixteenth Conference on Uncertainty in Artificial Intelli-
   Largely unaddressed in our critiquing model is the need            gence (UAI-00), pages 63–71, Stanford, 2000.
for users to explore the product space, one of the main ad-           Urszula Chajewska, Daphne Koller, and Ronald Parr. Making ra-
vantages of critiquing. We are currently developing hybrid            tional decisions using adaptive utility elicitation. In Proceedings
models in which the system and/or user explicitly distin-             of the Seventeenth National Conference on Artificial Intelligence
                                                                      (AAAI-00), pages 363–369, Austin, TX, 2000.
guishes exploratory actions from improving actions. Even
with such a distinction, there is still the interesting question      Peter C. Fishburn. Interdependence and additivity in multivariate,
of modeling user search processes in a way that would allow           unidimensional expected utility theory. International Economic
                                                                      Review, 8:335–342, 1967.
insight into preferences to be drawn during exploration as
well. Finally, the development of models of cognitive costs           Ralph L. Keeney and Howard Raiffa. Decisions with Multiple
using techniques from behavioral economics, decision the-             Objectives: Preferences and Value Trade-offs. Wiley, New York,
                                                                      1976.
ory and psychology remains an important avenue of future
research.                                                             Panos Kouvelis and Gang Yu. Robust Discrete Optimization and
                                                                      Its Applications. Kluwer, Dordrecht, 1997.
                         References                                   David McSherry. Diversity-conscious retrieval. In ECCBR
                                                                      ’02: Proceedings of the 6th European Conference on Advances
 Fahiem Bacchus and Adam Grove. Graphical models for prefer-          in Case-Based Reasoning, pages 219–233, London, UK, 2002.
 ence and utility. In Proceedings of the Eleventh Conference on       Springer-Verlag.
 Uncertainty in Artificial Intelligence (UAI-95), pages 3–10, Mon-
 treal, 1995.                                                         Robert Price and Paul R. Messinger. Optimal recommendation
                                                                      sets: Covering uncertainty over user preferences. In Proceedings
 Ferenc Bodon. A fast apriori implementation. In Bart Goethals        of the Twentieth National Conference on Artificial Intelligence
 and Mohammed J. Zaki, editors, Proceedings of the IEEE               (AAAI’05), pages 541–548, 2005.
 ICDM Workshop on Frequent Itemset Mining Implementations
 (FIMI’03), volume 90 of CEUR Workshop Proceedings, Mel-              James Reilly, Kevin McCarthy, Lorraine McGinty, and Barry
 bourne, Florida, USA, 19. November 2003.                             Smyth. Dynamic critiquing. In Peter Funk and Pedro A.
                                                                      González-Calero, editors, ECCBR, volume 3155 of Lecture Notes
 Craig Boutilier, Fahiem Bacchus, and Ronen I. Brafman. UCP-          in Computer Science, pages 763–777. Springer, 2004.
 Networks: A directed graphical representation of conditional util-
 ities. In Proceedings of the Seventeenth Conference on Uncer-        James Reilly, Kevin McCarthy, Lorraine McGinty, and Barry
 tainty in Artificial Intelligence (UAI-01), pages 56–64, Seattle,    Smyth. Incremental critiquing. Knowl.-Based Syst., 18(4-5):143–
 2001.                                                                151, 2005.
 Craig Boutilier, Richard S. Zemel, and Benjamin Marlin. Active       James Reilly, Jiyong Zhang, Lorraine McGinty, Pearl Pu, and
 collaborative filtering. In Christopher Meek and Uffe Kjærulff,      Barry Smyth. Evaluating compound critiquing recommenders:
 editors, UAI, pages 98–106. Morgan Kaufmann, 2003.                   a real-user study. In Jeffrey K. MacKie-Mason, David C. Parkes,
                                                                      and Paul Resnick, editors, ACM Conference on Electronic Com-
 Craig Boutilier, Tuomas Sandholm, and Rob Shields. Eliciting         merce, pages 114–123. ACM, 2007.
 bid taker non-price preferences in (combinatorial) auctions. In
                                                                      Stuart Russell and Peter Norvig. Artificial Intelligence: A Mod-
 Proceedings of the Nineteenth National Conference on Artificial
                                                                      ern Approach. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition
 Intelligence (AAAI-04), pages 204–211, San Jose, CA, 2004.
                                                                      edition, 2003.
 Craig Boutilier, Relu Patrascu, Pascal Poupart, and Dale Schu-
                                                                      Ahti Salo and Raimo P. Hämäläinen. Preference ratios in multi-
 urmans. Constraint-based optimization and utility elicitation us-
                                                                      attribute evaluation (PRIME)–elicitation and decision procedures
 ing the minimax decision criterion. Artifical Intelligence, 170(8–
                                                                      under incomplete information. IEEE Trans. on Systems, Man and
 9):686–713, 2006.
                                                                      Cybernetics, 31(6):533–545, 2001.
 Craig Boutilier, Relu Patrascu, Pascal Poupart, and Dale Schuur-
                                                                      Leonard J. Savage. The Foundations of Statistics. Wiley, New
 mans. Constraint-based optimization and utility elicitation using
                                                                      York, 1954.
 the minimax decision criterion. Artif. Intell., 170(8-9):686–713,
 2006.                                                                Paul Slovic. The construction of preference. American Psychol-
                                                                      ogist, 50(5):364–371, 1995.
 Craig Boutilier. A POMDP formulation of preference elicitation
 problems. In Proceedings of the Eighteenth National Conference       Barry Smyth and Paul McClave. Similarity vs. diversity. In
 on Artificial Intelligence (AAAI-02), pages 239–246, Edmonton,       David W. Aha and Ian Watson, editors, ICCBR, volume 2080
 2002.                                                                of Lecture Notes in Computer Science, pages 347–361. Springer,
                                                                      2001.
 Darius Braziunas and Craig Boutilier. Preference elicitation and
 generalized additive utility. In AAAI, 2006.
 Darius Braziunas and Craig Boutilier. Minimax regret-based elic-
 itation of generalized additive utilities. In Proceedings of the
 Twenty-third Conference on Uncertainty in Artificial Intelligence
 (UAI-07), pages 25–32, Vancouver, 2007.
 Darius Braziunas and Craig Boutilier. Minimax regret based elic-
 itation of generalized additive utilities. In Proceedings of the
 Twenty-third Conference on Uncertainty in Artificial Intelligence
 (UAI-07), pages 25–32, Vancouver, 2007.