Optimal Set Recommendations based on Regret Paolo Viappiani Craig Boutilier Department of Computer Science Department of Computer Science University of Toronto University of Toronto Toronto, ON, Canada Toronto, ON, Canada paolo@cs.toronto.edu cebly@cs.toronto.edu Abstract a user’s score: the latter tends to produce results that are very similar one to each other, and thus not offer much ac- Current conversational recommender systems do not of- tual “choice” for a user. This is especially true when we fer guarantees on the quality of their recommendations, either because they do not maintain a model of a user’s recognize that estimated scores or preferences are likely to utility function, or do so in an ad hoc fashion. In this pa- be very crude. Diversity is also important in practice: we per, we propose an approach to recommender systems cannot generally predict how patient a user will be. They that incorporates explicit utility models into the rec- may terminate the exploration of product space at any time, ommendation process in a decision-theoretically sound hence the recommender system should be able to provide fashion. The system maintains explicit constraints on anytime recommendations, reflecting the best recommen- the user’s utility based on the semantics of the prefer- dations given the information provided by the user so far. ences revealed by the user’s actions. In particular, we This characteristic of conversational recommenders is simi- propose and investigate a new decision criterion, set- lar to the exploration-exploitation dilemma in reinforcement wise maximum regret, for constructing optimal recom- learning. Since we do not know know how much time the mendation sets. This new criterion extends the mathe- matical notion of maximum regret used in decision the- user is willing to spend in order to improve the recommenda- ory and preference elicitation to sets. We develop com- tion, we want to show products that are both: (a) expected to putational procedures for computing setwise max re- be rated highly given the current information about the user; gret. We also show that the criterion suggests choice and (b) are maximally informative should the user critique sets for queries that are myopically optimal: that is, it (or otherwise provide feedback on) them. refines knowledge of a user’s utility function in a way Many authors have considered the importance of diversity that reduces max regret more quickly than any other in the recommendations. For example, researchers in case- choice set. Thus setwise max regret acts both as guar- based reasoning have proposed techniques based on greedy antee on the quality of our recommendations and as a driver for further utility elicitation. maximization of diversity (McSherry 2002), defined as an Our simulation results suggest that this utility- aggregate of a distance metric, or as a weighted tradeoff be- theoretically sound approach to user modeling allows tween diversity and the recommendation score (Smyth and much more effective navigation of a product space than McClave 2001). However diversity and dissimilarity mea- traditional approaches based on, for example, heuristic sures does not consider the information that we have about utility models and product similarity measures. the user’s preference. While they guarantee that the set con- tains alternatives that differ in their features, they do not use Introduction at all the information about a user’s preferences available from previous user actions and feedback. It has been argued Recommender systems can help users navigate product that diversity should be instead tailored to the system’s be- spaces and make decisions involving very large sets of al- lief about the user (Price and Messinger 2005).1 ternatives. Conversational recommender systems rely on To maximize the information presented to the user in a mixed-initiative interactions, with both the user and the sys- recommendation set, and recommend a set of optimal rec- tem taking an active role in the decision process. User feed- ommendations, it is necessary to maintain an explicit rep- back can be entered in many forms, for instance, as direct resentation of the uncertainty in the preference model and answers to queries, or critique of the options displayed by a sound decision-theoretic semantics of the interaction in the system. the first place. In fact, most practical conversational recom- Many recommender systems employ some form of diver- mender systems (especially those using critiquing) do not sity to show a set of products that might be appealing to the use an explicit model of a user’s preferences, or only main- user. Intuitively, diversity overcomes a key problem with presentation of the top-k items based on some estimate of 1 Indeed, the natural decision theoretic account of set recom- Copyright c 2009, Association for the Advancement of Artificial mendations immediately suggests diversity w.r.t. belief about a Intelligence (www.aaai.org). All rights reserved. user’s preferences (Boutilier et al. 2003). tain such a model in an ad hoc, heuristic fashion. In this pa- set of recommended alternatives using setwise minimax re- per, we develop an approach to set-based recommendations gret. In Sec. 3 we discuss computation of setwise max re- with an explicit utility model. We represent the uncertainty gret and minimax regret, both for configuration problems w.r.t. the user model with constraints of her utility func- modeled as a constraint satisfaction problem (CSP) and for tion induced by choices or critiques. To construct a suitable product databases, while in Sec. 4 we briefly discuss the recommendation set, we develop a novel criterion, setwise performance of elicitation. Finally, in Sec. 5, we perform maximum regret, that captures the idea of providing a set of simulations of complete critiquing-based recommender sys- jointly optimal recommendations. Our qualitative model of tems, comparing our regret-based approach to state of the art uncertainty has two key advantages over probabilistic mod- critiquing algorithms such as dynamic critiquing and incre- els (Price and Messinger 2005): relatively simple prior infor- mental critiquing. mation in the form of bounds or constraints on user prefer- ences can be exploited (rather than probabilistic priors); and Regret-based Recommendation Systems exact computation is much more tractable (in contrast with We begin this section by presenting our formalization of the probabilistic models of utility that generally require reason- decision problem, reviewing minimax regret for robust rec- ing with densities that have no closed form (Boutilier 2002; ommendation and elicitation, and then defining our key con- Chajewska and Koller 2000)). cepts of setwise max regret and setwise minimax regret. To make this model effective, user actions should be asso- ciated with a precise, sound semantics. For instance, a user Underlying Decision Problem critique is assumed to reveal some aspect of the user’s pref- We assume a recommendation system is charged with the erences and this is used to update an explicit utility model. task of recommending an option to a user in a multi at- More precisely, in our work, unit critiques and compound tribute space (e.g., computers, cars, apartment rental, etc.). critiques places linear constraints on a user utility function. Products are characterized by a finite set of attributes X = The advantage of this approach is that we can use decision- {X1 , ...Xn }, each with finite domains Dom(Xi ). Let X ⊆ theoretically sound criteria to: Dom(X ) denote the set of feasible configurations. For in- 1. suggest or recommend a product; stance, attributes may correspond to the features of vari- ous apartments, such as size, neighborhood, distance from 2. bound the difference in the quality of a recommended public transportation, etc., with X defined either by con- product and the optimal option for the user; straints on attribute combinations (e.g., constraints on com- 3. determine which options and critiques carry the most in- puter components that can be put together), or by an explicit formation to help speed up the navigation process; and database of feasible configurations (e.g., a rental database). The user has a utility function u : Dom(X ) → R. In what 4. suggest to the user when to terminate the process (i.e., follows we will assume either a linear or additive utility when further interaction will offer only modest improve- function depending on the nature of the attributes (Keeney ment in recommendation quality). and Raiffa 1976). In both additive and linear models, we We adopt the notion of minimax regret (Boutilier et al. assume that u can be decomposed as follows: 2006a) to make product suggestions in the face of utility X X function uncertainty. This robust decision criterion allows u(x) = fi (xi ) = λi vi (xi ) us to bound the loss (difference from optimal) of any rec- i i ommendation. We propose and investigate a new decision where each local utility function fi assigns a value to each criterion, setwise maximum regret, for constructing optimal element of Dom(Xi ). In classical utility elicitation, these recommendation sets. This new criterion extends maximum values can be determined by assessing local value func- regret to sets of products rather than a single product. We de- tions vi over Dom(Xi ) that are normalized on the interval fine set maximum regret, argue that minimizing setwise max P [0, 1], and importance weights λi ( i λi = 1) for each at- regret is the best means for constructing a set of options for tribute (Keeney and Raiffa 1976; Fishburn 1967). This sets a user, and develop effective computational procedures for fi (xi ) = λi vi (xi ) and ensures that global utility is normal- computing optimal recommendation sets for setwise regret. ized on the interval [0, 1]. A simple additive model in the We present critiquing as a possible application domain. rental domain might be: While user-controlled exploration in traditional critiquing systems does not offer any guarantees (practical, empirical, u(Apt) = f1 (Size) + f2 (Distance) + f3 (Nbrhd ) or theoretical) of either sufficient or efficient exploration of the space (A user may cycle through a set of similar products When Dom(Xi ) is drawn from some real-valued set, we of- or converge at a product far from optimal), our regret-based ten assume that vi (hence fi ) is linear in Xi .2 recommender allows us to provide guarantees on the quality Since a user’s utility function is not generally known, we (utility) of the recommended product vis-à-vis feasible al- often write u(x; w) to emphasize the dependence of u on ternatives. We also show with simulations that regret-based 2 Our approach relies considerably on the additive assumption, critiquing can lead to much more efficient exploration of the though can easily be generalized to more general models such product space and lead to better decisions in practice. as GAI (Fishburn 1967; Bacchus and Grove 1995; Braziunas and In Sec. 2 we introduce our model of regret-based recom- Boutilier 2007a). The assumption of linearity is simply a conve- mendation and describe our strategy for selection of a joint nience; nothing critical depends on it. user-specific parameters. In the additive case, the values other words, it minimizes setwise max regret given the cur- fi (xi ) over ∪i {Dom(Xi )} serve as a sufficient parametriza- rent information about the user’s utility function, making it tion of u (for linear attributes, a more succinct representation extremely robust in the presence of utility function uncer- is possible). The optimal product for the user with utility pa- tainty (in a way to be made precise below). Second, max re- rameters w is that x ∈ X that maximizes u(x; w). Our goal gret is a well-defined progress metric that lets the user know is to recommend, or help the user find, an optimal product, the cost and benefit of further exploration of product space. or one whose utility is near optimal. Finally, the information contained in user selection of some choice from the recommended set is maximally informative Regret-based Recommendation (in a sense defined below). In probabilistic approaches to recommendation, a distribu- tion over preferences—typically in the form a density over Minimax Regret utility function parameters—is maintained, and the option Minimax regret has been advocated as a means for robust with highest expected utility is recommended (Chajewska et optimization (Kouvelis and Yu 1997), and has more re- al. 2000; Boutilier 2002; Boutilier et al. 2003). When a set cently been used for decision making with utility uncer- of alternatives need to be recommended, the expectimax or tainty (Boutilier et al. 2001; Salo and Hämäläinen 2001; EMAX criterion can be used (Boutilier et al. 2003; Price and Boutilier et al. 2006a). Messinger 2005). One difficulty with probabilistic models is Assume that through some interactions with a user, and that one requires probabilistic prior information over utility possibly using some prior knowledge, we determine that her models, which can be difficult to formulate and represent. utility function w lies in some set W . Following (Boutilier Another is that exact computation can often be computation- et al. 2006a) we define: ally intense; this is especially true since (arguably) natural density models for utility functions are rarely closed under Definition 1 Given a set of feasible utility functions W , we the type of evidence provided by user interaction (e.g., be- define the pairwise max regret MR(x, y; W ) of x, y ∈ X; havioral observation or answers to queries) (Boutilier 2002; the the max regret MR(x; W ) of x ∈ X; the minimax regret Chajewska and Koller 2000)); as a result, computationally MMR(W ) of W ; and the minimax optimal configuration demanding fitting of (say) mixture models is required after x∗W as follows: every model update. MR(x, y; W ) = max u(y; w) − u(x; w) (1) Instead, we propose the use of minimax regret to generate w∈W recommendation sets. As we will see, this obviates the need MR(x; W ) = max MR(x, y; W ) (2) y∈X to complex probabilistic reasoning, yet can offer robust rec- ommendations and provide very effective guidance for the MMR(W ) = min MR(x, W ) (3) x∈X user. In traditional regret-based approaches, a single recom- ∗ xW = arg min MR(x, W ) (4) mendation is made using the minimax regret (Savage 1954) x∈X criterion. For multiple joint recommendations, we develop the notion of setwise minimax regret (defined below). We Intuitively, MR(x; W ) is the worst-case loss associated with can summarize the correspondence between the Bayesian recommending configuration x; i.e., by assuming an adver- and the regret-based approach with the following table: sary will choose the user’s utility function w from W to maximize the difference in utility between the optimal con- figuration (under w) and x. The minimax optimal configu- Probabilistic approach Regret approach ration x∗W minimizes this potential loss. MR(x, W ) bounds Expected Utility Minimax Regret the loss associated with x, and is zero iff x is optimal for all Expected Max (EMAX) Minimax Setwise Regret w ∈ W . Any choice that is not minimax optimal has strictly greater loss than x∗W for some w ∈ W . In this paper we propose a framework that maintains a set Minimax regret has proven to be an effective tool in utility W of feasible utility models, and at each step, the system elicitation in a variety of domains. A decision support or shows a set of recommendations that are jointly optimal with recommender system can query (or otherwise interact with) respect to minimax regret. At a very high level, our regret- a user providing additional constraints on the utility set W based recommender works as follows: until minimax regret reaches some acceptable level (possibly optimality), elicitation costs become too high, or some other 1. The set W is initialized given some initial constraints; termination criterion is met. 2. The current recommendations are determined (using the Example Consider the following example, where the op- setwise minimax regret); tions oi are defined using two features/coordinates x1 and 3. After each user action, W is refined to reflect the new x2 : constraints imposed by the user’s feedback; x1 x2 4. The process repeats (steps 2 and 3) until the user is satis- o1 0.35 0.68 fied or minimax regret reaches some target threshold. o2 0.9 0.2 o3 0 0.75 This process is appealing for two reasons. First, the current o4 1 0 recommendation (i.e., set of options) is always optimal; in o5 0.5 0.3 1 M R(oi , oj ) o1 o2 o3 o4 o5 M R(oi ) 0.9 o1 0 0.55 0.07 0.65 0.15 0.65 o2 0.48 0 0.55 0.1 0.1 0.55 0.8 o3 0.35 0.9 0 1.00 0.5 1 0.7 o4 0.68 0.2 0.75 0 0.3 0.75 o5 0.38 0.4 0.45 0.5 0 0.5 0.6 utility 0.5 o1 Minimax regret is 0.5, and the minimax optimal recom- 0.4 o2 mendation is option o5 ; its max regret occurs at adversarial choice of utility w1 = 1, and choice of option o4 . It can be 0.3 o3 easily shown that regret is maximized at one of the vertexes 0.2 o4 of the feasible region W when W is a bounded, convex poly- 0.1 o5 tope (such a polytope induced by the interactions we discuss later). 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Now imagine that, perhaps as the result of interactions utility parameter w1 with the user, we learn that 0.2 ≤ w1 ≤ 0.6. The new min- imax regret value for this constrained case is 0.138. This Figure 1: Each options is represented by a single line in the value corresponds to recommendation o1 , and adversarial utility space. option o2 and utility w1 = 0.6. In this case, the constraints 0.2 ≤ w1 and w1 ≤ 0.6 decrease minimax regret signifi- cantly. However usually constraints are not added directly as such, but result from the acquisition of knowledge acquired through a variety of interaction modalities, such as direct We assume linear utility: u(x; w) = w1 x1 + w2 x2 where w user preference queries, or passive observation of user be- is vector of tradeoff weights, with w2 = 1−w1 , 0 ≤ w1 ≤ 1; havior. For instance, comparison queries ask the user which and the local value functions for each coordinate are identity of two proposed options is preferred. The impact of the in- functions. (i.e., vi (xi ) = xi ). Given these assumptions, formation acquired depends greatly on the comparison, as utility is one-dimensional; it can be written as u(x; w) = different options can lead to different degrees of regret re- (x1 − x2 )w1 + x2 . So we deal only with the uncertainty on duction. the single parameter w1 . A natural meta-heuristic for generating elicitation queries is the current solution strategy (CSS), first described in This simple example is convenient because it is easy to (Boutilier et al. 2006a). This strategy would ask the user visualize option utilities as a 1D function of w1 graphically. whether she prefers the minimax regret option x∗W or the The utility of the different options are shown in Fig. 1 with adversary option xw = MRAdv (x∗W , W ). respect to the parameter w1 . We notice that, for some values In our example, starting from the unconstrained space W , of w1 , each of the options o1 , o2 , o3 and o4 is optimal, but CSS would select {o4 , o5 } (the minimax regret option and not so o5 . When considering a particular value of w1 (a the adversarial option) and ask the user to compare them. particular utility function) the actual regret (or real loss) is Now, let’s assume that the user asserts that she prefers o4 the difference between the utility of the best option given over o5 ; then a new constraint u(o4 ; w1 ) ≥ u(o5 ; w1 ), w1 and the utility of our recommendation in that case. For equivalent to w1 ≥ 0.375, is added to our model. In the instance, for w1 = 0.9, the best option is o4 with utility 0.9, space W o4 ≻o5 resulting from the incorporation of this con- while o1 has utility 0.38, so the actual regret of o1 would be straint, the minimax regret is 0.1, resulting from recom- 0.9 − 0.38 = 0.52. Max regret accounts for the uncertainty mendation o2 and adversarial option o4 . Option o4 , even over w1 and it is the maximum of the possible actual regret if known to be better than o5 , has max regret of 0.18 (at values (for o1 is 0.65 when w1 = 1). w1 = 0.374, with adversarial option o1 ). Therefore, option o2 will be recommended. When the options are few as in this case, we can compute Minimax regret offers recommendations that are robust the max regret of each choice by explicitly enumerating the given the uncertainty of the preference model. In this ex- maximum the pairwise regret of that choice against any pos- ample, o5 is recommended (in the unconstrained setting) sible adversarial choice of option. The table below illustrates even though it cannot possibly be optimal for any user util- this, where each row corresponds to a recommendation, each ity function; this is so because it prevents “disastrous” sit- column to an adversarial choice, and we display the pairwise uations, such as would occur if options o1 or o2 are recom- max regret (allowing the adversary to choose utility) in the mended when w1 is very low (despite the fact that for a good cells. The max regret of an option is shown in the last col- part of utility space, these options are optimal. Note, that as umn, and corresponds to the maximum value in its row. In knowledge of user utility increases, more accurate recom- the unconstrained situation (where w1 can take any value mendations are made; for example, recommending o2 when between 0 and 1), we have the following values: we learn that o4 is preferred to o5 . Optimal Recommendation Sets: Setwise Regret set of utilities such that xi has greater utility than any option In most cases the value of a set of recommendations is de- in Z. pendent on the elements of the set jointly, not on each in- dividually. If the user is going to benefit from only one of W Z→xi = {w ∈ W : u(xi ; w) > u(xj ; w) ∀j 6= i, 1 ≤ j ≤ k} the recommendations (example: recommending apartments) then the utility of the set is then the maximum utility among the individual options, i.e., the one the user will pick from The set of all W Z→xi for any xi ∈ Z partitions W (we the set. ignore the possibility of ties over full-dimensional subsets of The problem of set recommendations has been addressed W , which can easily be dealt with, but complicate the pre- using probabilistic expectation: Price and Messinger (Price sentation marginally). An important observation (that will and Messinger 2005) optimize set recommendations using be used later) is that we can rewrite the setwise max-regret the EMAX criterion, defined as the expectation of the max- SMR as the aggregate maximum of the (individual) max- imum utility among the options in the set. regret considering a partition of the utility space according In order to retrieve optimal set recommendations, we de- to which option has higher utility. fine the notion of setwise max regret. The setwise max re- gret of a recommendations set can be seen as the equivalent Observation 1 Given Z = {x1 , . . . , xk } and, for 1 ≤ i ≤ of EMAX in our non-probabilistic framework. Suppose we k, have a slate of k options to present to the user and want to quantify the possible loss by restricting the user’s decision SMR(Z, W ) = max[MR(x1 , W Z→x1 ), . . . , MR(xk , W Z→xk )] to options in that slate. Intuitively, the user may select any (6) of the k options as being “optimal.” An adversary want- ing to maximize regret should do so assuming the any such choice is possible—unlike max regret, we allow the user to Example (continued) We now consider setwise max re- select from among any of the set of k options. In this for- malization, we choose the set of k options first, but delay gret for the example introduced above. Let the number of the final choice from the slate only after the adversary has options in a recommendation set be k = 2. The following chosen a utility function w. The regret of a set is then the combinations are ranked best according to the setwise regret minimum difference between the utility of the best config- criterion. uration under w and the utility of the options in the slate. Set SMR Adversary Adversary W Specifically, define the setwise maximum regret of option set {o1 , o4 } 0.07 o3 w1 = 0 Z = {x1 , . . . , xj } to be: {o1 , o2 } 0.1 o4 w1 = 1 {o3 , o2 } 0.1 o4 w1 = 1 SMR(Z; W ) = max max min u(x′ ; w) − u(x; w) {o3 , o4 } 0.11 o1 w1 = 0.42 ′ x ∈X w∈W x∈Z The set {o1 , o4 } is the best choice for a joint recommen- SMR-Adv (Z; W ) = arg max max min u(x′ ; w) − u(x; w) dation of two options, corresponding to a value of regret of ′x ∈X w∈W x∈Z 0.07. Other combinations, such as {o1 , o2 }, {o3 , o2 } and Setwise max regret has some intuitive properties. First, {o3 , o4 }, also have a relative low value of regret. adding new items to a set cannot increase setwise max re- A set recommendation can often have dramatically lower gret: SMR(A ∪ B, W ) ≤ SMR(A, W ). At the same time regret than the minimax optimal single recommendation (in incorporating options that are known to be dominated given this case, o5 ). W does not change setwise max regret: in other words, if It is interesting the fact that the optimal recommendation u(a, w) > u(b, w) for some a ∈ Z and all w ∈ W , then set is composed of two options, o1 and o4 that, when con- SMR(Z ∪ {b}, W ) = SMR(Z, W ). Finally, the max regret sidered alone, are associated with high regret. Any set in- associated with recommending the entire product set is zero: cluding o5 , the single best recommendation, is ranked poorly SMR(X, W ) = 0. This is the equivalent to asking the user with respect to setwise regret. to directly choose the best option from the space of available We now consider the case of larger sets. If we need to options—obviously, a task of with extreme cognitive cost, select a slate of three options (k = 3), the regret will be and one that runs counter to the spirit of recommendation 0.04 and the recommendation would be {o1 , o3 , o4 }; in this assistance! But should the user be able to answer correctly, case the adversary would pick o2 , and the value w1 = 0.51 it guarantees optimality. (intersection point of o1 and o4 ). Setwise max regret can be equivalently written in as fol- In the case of four options to be selected (k = 4), the set lows: {o1 , o2 , o3 , o4 } would be recommended and it would be as- sociated to a setwise regret of 0: the slate includes all the SMR(Z, W ) = max max [u(y; w) − max u(x; w)] (5) options that can ever become optimal (considering Observa- y∈X w∈W x∈Z tion 1, it follows that for any W Z→oi that partitions W , the This captures the intuition that, given w, the option (among max regret has to be 0). those in Z) that determines setwise max regret is that with Now we consider how setwise regret changes when new highest utility with respect to w. In fact, it can be useful to information is included. We consider a slate of two options explicitly partition utility space with respect to which option to be selected (k = 2) and we suppose that the user asserts in Z is maximal. We define the utility subset W Z→xi as the the preference of o4 over o5 . The recommendation set is still repeated computation of the pairwise regret between a can- didate recommendation and an adversarial option in order to identify the option with minimax regret. For ease of presen- tation, assume a linear utility function as above, defined by weights wi over m attributes. Pairwise regret MR(x, y, W ) of recommendation x and adversarial option y is readily computed with the following LP: X max wi (yi − xi ) (7) w:wi ∈[0,1] 1≤i≤m X s.t. wi = 1 (8) i w∈W (9) Figure 2: Alpha beta pruning can speed up the search, de- pending on the evaluation order. In this case, x1 has regret Here we assume the feasible parameter set W is captured 0.5 against x3 , that is worse than the value 0.4 (max regret by linear constraints. A similar LP can be formulated for of x2 ), so we do not need to test x1 against x2 . discrete-valued attributes, without assuming linearity, just additivity. Hybrid models with continuous and discrete at- tributes can easily be represented with a combination of {o1 , o4 } but with a much lower value of (setwise) regret: these two representations. Generalized additive utility mod- only 0.04. els models (Fishburn 1967; Braziunas and Boutilier 2006; We conclude the discussion of the example with some re- 2007b) can also be easily represented in this framework. marks on the optimization process. The adversary’s utility This means that pairwise regret can be computed extremely does not necessarily corresponds to one the vertex of the efficiently (e.g., in a few milliseconds using CPLEX on the feasible region, as in the single recommendation case; it types of problems discussed below). may also lie in any intersection of the hyperplanes associ- Minimax regret computation is more complex because ated with the options. For instance (in the unconstrained we need to maximize over all possible adversarial choices, case) the setwise regret of {o3 , o4 } is maximized for the and minimize over all possible recommendations. A naive value w1 = 0.42 (the utility that makes o3 and o4 equally approach would consider every pair of options, requiring preferred). O(n2 ) pairwise regret computations for a database of size n, where each of these computations requires the solution of Computation of Setwise Minimax Regret an LP of size proportional to the number of utility parame- In this section we discuss how to efficiently compute regret- ters. based recommendations. We first discuss how to compute However, since minimax regret can be seen as a game be- minimax regret for single recommendations and then de- tween the recommender and an adversary, the computation scribe how to modify these procedures to compute setwise can be greatly improved in practice by formulating the op- minimax regret for recommendation sets. We distinguish timization as a minimax search and using standard pruning two settings: configuration problems, where options are techniques. Unlike typical games, the search tree has very defined by variables and configuration constraints (i.e., as limited depth: only two ply, one choice of recommendation solutions to a constraint satisfaction problem (CSP)); and by the MIN player (attempting to minimize regret) and one database problems, where options are enumerated in a prod- choice of adversarial option by the MAX player (attempting uct database. to maximize the regret of the recommendation).3 Note that the game has a large number of actions, once per product in Computing Minimax Optimal Single the database. The MIN player (recommender) moves first, Recommendations the MAX player (adversary) second. The leaves of the mini- Configuration problems In configuration problems, opti- max tree are labeled with the pairwise max regret of the two mization over product space X is formulated as a constraint choices on its path. optimization problem or MIP. In such domains, minimax re- A full evaluation of the tree requires the solution of gret computation can be formulated as a MIP, and solved n(n − 1) pairwise regret LPs (noting that the MIN player’s practically for large problems using techniques such as Ben- choice need not be explicitly evaluated or even represented der’s decomposition and constraint generation. We refer as a possible MAX choice, since it must yield pairwise regret to (Boutilier et al. 2006a; 2004; Braziunas and Boutilier of 0). However, it is generally not necessary to evaluate ev- 2007a) for more details. Our MIP formulations for setwise ery node of the tree as Alpha-beta pruning (see (Russell and minimax regret below will draw heavily on these techniques, Norvig 2003) for an introductory description) can be used to but necessitate important modifications. eliminate branches from evaluation. Database problems When options are enumerated in a 3 The choice of the utility function by the adversary is dictated product database, minimax regret computation requires to by pair of options, so it need not be modeled as a move. Alpha-beta pruning is simple in such a simple game tree: size attributes constraints num of pairwise checks during the tree evaluation, we maintain an upper bound 40 4 0 41 UB (initially +Inf) at the root, representing the max regret 200 5 0 207 of the best solution found so far (from the perspective of 400 7 10 492 MIN), and lower bounds LB (n) at each MAX node, one 1000 10 0 1003 for each possible MIN choice (or recommendation). Ev- 1000 10 60 1998 ery time we evaluate a leaf node, we compute pairwise re- 1000 15 30 999 gret MR(omin , omax , W ) of MIN’s choice omin and MAX’s choice omax on the path. We update the lower bound at Table 1: Number of pairwise regret checks to compute min- the corresponding MAX node, and prune (α cut 4 ) when- max regret on some sample datasets. We evaluate the search ever LB (n) ≥ U B. This is because MR(omin , omax , W ) ≤ tree with our heuristics of reference utilities. MR(omin , W ). At the same time, whenever we complete the evaluation of a MAX node n, we update the upper bound UB to min(U B, v(n)) where v(n), the value of the node, is experiments run on synthetic data for illustrative purposes.) the maximum value among the leafs. Set Recommendations: Setwise Minimax Regret The efficiency of this pruning depends on the order in We now consider the modification of the techniques above which nodes are evaluated (Russell and Norvig 2003); this for setwise minimax regret. Naturally, setwise max regret is especially true given the very shallow, broad nature of our is more computationally demanding, requiring selection of tree. Pruning is most effective when, at each node, the best a set of options. However, it is still possible to formulate the children (with respect to the relevant node evaluation, MIN computation in a MIP for configuration problems. Database or MAX) are evaluated first. Figure 2 shows, in our simple problems are more challenging: the adversarial search pre- example, that in the best case only 5 nodes out of 9 need sented above for single-item recommendation can be ap- to be evaluated (i.e., 5 pairwise regret maximization). To plied directly, with replacement of a single move by the rec- speed up the search, we consider a heuristic that first evalu- ommender (MIN player) by k moves, corresponding to the ates choices at the MIN (recommender) node that are likely choice of k options for the slate. However, performance can to be good candidates for minimizing max regret; and we take a dramatic hit as the size of the desired recommenda- first evaluate at at MAX (adversarial) nodes options that are tion set increases. However, we develop a simple heuristic likely to induce high regret against the given MIN choice. hill-climbing strategy that seems to provide very good rec- These heuristics give us an evaluation order for both MIN ommendation sets in practice. and MAX choices and can lead to considerable pruning. We discuss each in turn. Configuration problems: MIP formulation For config- For the MIN node, we note that the regret of any option uration problems we formulate the problem of setwise min- is maximized at one of the vertices of the feasible region imax regret following the general strategy for single-option W . Thus we sample t vertices (for instance, by consider- minimax regret, formulating as a (MIP) minimization with ing extreme weights that maximize the importance of one exponentially many constraints. We use a constraint gen- of the attributes) and refer to the the w so-sampled as refer- eration procedure to prevent enumeration of the entire con- ence utilities. These are used to initialize the lower bounds straint set (Boutilier et al. 2006a; 2004). However, there LB (n): we simply compute the actual regret with respect to are some critical differences in the formulation, which we these utilities for the option that leads to (MAX node) n. We describe here. then evaluate MIN’s children n in increasing order of initial Setwise minimax regret for configuration problems can be lower bound LB (n). formulated as the following MIP. To order the children of MAX node, for each MAX node n, we consider the feasible utility function w− that mini- min M mizes the utility the MIN choice. (This requires a simple j M,Iw ,Xj ,Vw j optimization.) The option that maximizes utility at w− (i.e., X s.t. M ≥ Vwj ∀w ∈ Vert (10) the optimal choice under w− ) is likely to give a high value 1≤j≤k of for pairwise regret and thus represents a potentially good adversary. Moreover, once we have generated w− , we can Vwj ≥ w · (x∗w − Xj ) + (Iw j − 1)mbig (11) use it to update the lower bound by considering the actual ∀j ∈ [1, k] ∧ ∀w ∈ Vert (12) regret for each option. MAX choices are evaluated in order X j Iw = 1 ∀w ∈ Vert (13) of decreasing utility under w− . 1≤j≤k In practice, these heuristics can significantly speed up the j computation of minimax regret in product databases. Table 1 Iw ∈ {0, 1} (14) shows that number of pairwise regret checks (LPs) is almost Vwj ≥ 0 ∀j ∈ [1, k], ∀w ∈ Vert (15) linear in the number of options in the database; indeed, with these orderings, MAX nodes are often pruned immediately without even considering an adversarial choice. (These are This MIP minimizes M by: (a) choosing k options (or configurations xj designated by variables Xj (where each 4 beta cuts are not possible given the depth of the tree Xj is a vector of n attributes) for the recommendation set; (b) selecting, for each adversary w, one of those options to the problem of setwise minimax regret for database prob- (the jth option) as the choice that has minimum max regret lems, scaling is sometimes a concern. We now present a against an adversary, and ensuring that M is greater than the heuristic hill-climbing strategy that scales much more effec- true regret of the jth option relative to every possible choice tively. We describe it in the context of database problems, of adversary utility function and option. but it can also be used directly for configuration problems. Note however that this constraint need not be applied to The central idea is that is possible to modify a given (continuously many) utility functions or exponentially many recommendation set Z in such a way that setwise max re- adversarial choices. In the MIP, we post these constraints gret cannot increase, and usually decreases until a high only for each vertex of W (i.e., w ∈ Vert(W )) and for quality set is found. We define the MMR-transformation the optimal product choice x∗w for that vertex. This relies T to be a mapping that refines a recommendation set Z on the observation that regret maximized at vertices of W , by partitioning the current feasible utility space W into and, for any adversarial choice of w, the adversarial option {W Z→xi }, ∀xi ∈ Z, as discussed in Observation 1. In each that maximizes the pairwise regret for any user choice is the partition we compute the single recommendation that has optimal option for w.5 minimax regret in that region of utility space, and define the However, this MIP still requires (potentially) exponen- new set recommendation T (Z) to be the collection of these tially many constraints, one for each element of Vert(W ). (single) minimax-optimal recommendations. We can make computation much more effective by applying constraint generation, observing that at the optimal solution, Definition 2 Define the MMR-transformation T : Z → Z′ , very few of these constraints are likely to be active. Our where Z = {x1 , . . . , xk }, to be T (Z) = {x′1 , . . . , x′k } such procedure works as follows: we solve a relaxed version of that for all 1 ≤ i ≤ k: the MIP above—the master problem—using only the con- straints corresponding to a small subset Gen ⊂ Vert(W ) x′i = MMR-Opt(W Z→xi ) of the constraints in the MIP above. We then test whether We can show that T cannot increase setwise max regret. any unexpressed constraints are violated at the current so- lution. This involves computing the true setwise max re- gret of the slate generated by the master problem. If the Observation 2 For any set recommendation Z true setwise max regret is of the slate is greater than δ, we SMR(T (Z)) ≤ SMR(Z) know that a constraint has been violated. Specifically, the We use the MMR-transformation to define our heuristic computation of setwise max regret will produce the element search strategy to produce good recommendation sets; in- w ∈ Vert(W ) and optimal product x∗w that corresponds to tuitively, we repeatedly T until a fixed point (with respect to the maximally violated constraint at the current master so- setwise max regret, not the set itself) is found. lution. So if a constraint is violated, we add this maximally violated constraint to Gen, tightening the MIP relaxation, and repeat; if not, we are assured that the current solution Alg 1 Hill-climbing-T algorithm (HCT) The algorithm considers an initial set Z, and rewrites Z us- minimizes setwise max regret.6 ing T until a fixed-point is found. In the formulation, mbig is an arbitrary big number, that we need to encode the fact that, for any given w, only the • Repeat Z := T(Z) option with the highest utility (among those in the slate) with • Until SMR(T (Z), W ) = SMR(Z, W ) respect to w contributes to the actual setwise regret. We initialize the slate Z using the current solution strategy The SMR maximization subproblem can be also encoded (CSS), empirically, this seems to produce the most promis- with a MIP, similar to (Boutilier et al. 2006b). The optimiza- ing recommendation sets. For k = 2, this means that the tion makes use of a decision variable to explicitly represent initial set is Z = {x∗W , xw }, where x∗ = MMR-Opt (W ), the setwise regret, M , to be maximized and we constrain and xw = MRAdv (W ). M to be greater than the single max regret M R(xj , W ), for For larger sets (k > 2), there is not a standard definition each option in the slate. of the CSS. We propose to use the following strategy, that Database Problems: A Hill-climbing Strategy As dis- we call chain of adversaries, to generate the initial slate. We start from {x∗W , xw } and repeatedly maximize setwise max cussed above, while minimax search can be applied directly regret given the current set, in some sense maximizing the 5 j diversity of choices from perspective of utility space. This Vw is the actual regret of option Xj of the slate with respect to gives the set {x1 , . . . , xk } where: j the utility w when the corresponding Iw is activated. For any w, j one and only one Iw is set to 1. In order to minimize M , the op- x1 = x∗W  j timization will activate the Iw corresponding to the xj with lower xi = Adv ({x1 , .., xi−1 }, W ) 2 ≤ i ≤ k actual regret. The first constraint (10) captures the idea that for a slate of options, given w, the regret of the joint slate is the mini- mum among the individual values of regret (in the summation, all The chain of adversaries requires to solve single minimax but one term are zeros). regret once, and then k − 2 setwise regret maximizations. 6 Note that the adding a new constraint requires the introduction The chain of adversaries can be seen as a generalization of of new variables to the master problem. Every time we add a new CSS to sets of any size, and could also be considered as an w to Gen, k new variables I and V are necessary. alternative, faster strategy to select recommendations. 0.6 CSS We performed some preliminary experiments in order to HCT evaluate our recommendation strategy from the prospective 0.5 of elicitation. We are interested in quantifying the reduc- tion of regret in practice. In Figure 3 we compare the ef- 0.4 ficiency of an elicitation based on our SMR criterion and the current solution strategy (CSS), considering a syhnthetic max regret 0.3 dataset with 5000 options and 10 attributes. We plot max regret in function of the number of queries (steps). SMR is 0.2 optimized using the hill-climbing strategy (HCT). Example Critiquing 0.1 As an evaluation setting, we apply our regret-based recom- 0 mender to example-critiquing. This domain is interesting 1 2 3 4 5 6 7 8 steps because current systems usually rely on heuristics and we expect that an utility-based approach can be greatly benefi- Figure 3: The hillclimbing strategy based on setwise regret cial. outperforms the current solution strategy in this experiment Critiquing is a setting where the user expresses feedback (20 runs). on options that the system shows to her. In particular we consider a particular version of critiquing, often called the dynamic critiquing model (Reilly et al. 2005), where a cur- Myopic Elicitation rent product or recommendation is displayed, and the user is invited to move to a different product by choosing particu- In addition to produce final recommendations, our criterion lar actions (laid out in the interface) that change the product. can also be used as a driver for further elicitation of the util- They can include unit critiques, which request modification ity function of the user. In fact, whenever we consider a slate of a particular product attribute; e.g., “give me a laptop that of recommendations, the user may give us some feedback, is lighter than the current one.” perhaps selecting the option that she prefers among those in Often alternative suggestions or compound critiques are the set. This information is very valuable, and as we have used in which multiple attributes (“lighter and faster pro- seen in the initial example, can be used to reduce regret. It cessor, but more expensive”) are tweaked, or in which a se- is therefore interesting to assess the value of a recommenda- lection is made from a system-suggested set of alternative tion set also with respect to the possible feedback. products (“let me see laptop 3 instead of the current one”). An important observation is that in the case of compari- The set of possible critiques is generated by the system, son queries (the user selects the preferred option in a slate), and the user chooses one of the possible actions. At each in- the set of k optimal recommendations that minimize setwise teraction, the user may choose to critique the current product regret is also the optimal choice set for a comparison query if she is not completely satisfied with it, or simply because with respect to myopic worst case regret (WR), a measure of she wishes to explore the product space in more depth. the value of information of a query. In general those systems use heuristics to generate the set The Worst-case Regret (WR) of a comparison query based of possible critiques. However, we expect that better perfor- on a choice set Z = {x1 , .., xk } is defined as mance can be obtained if critiquing suggestions are selected according to a decision-theoretically sound criterion as our W R({x1 , .., xk }) = max[MMR(W Z→x1 ), .., MMR(W Z→xk )] setwise minimax regret. (16) In order to implement our approach, it is necessary to give a precise semantics to each of the critiquing actions. WR considers the “single” max regret in each possible We identify two main reasons a user will critique an option. scenario. It is possible to verify that WR(Z) ≤ SMR(Z); First, she may want to explore the product space in an effort the worst case regret is always lower (or equal) than the set- to better understand either the space of feasible options or wise max regret. her own preferences. This latter desire makes sense espe- The optimality of minimax setwise recommendations (we cially when one adopts the view commonly held in behav- omit the full proof for reasons of space) with respect to W R ioral economics that decision support systems should help is based on the consideration (an extension of Observation 2) people construct their preferences (not just articulate them) that the transformation T introduced in the previous section (Slovic 1995). Second, she may wish to improve the current is also such that SMR(T (Z)) ≤ WR(Z) (the proof requires product, making tradeoffs among her preferences for differ- considering the different partitions imposed by Observa- ent attributes. It is this latter exploitive or improvement mode tions 1 and compare the two expression componentwise). that critiquing systems fail to account for adequately when We call Z∗ the optimal recommendation set according to deciding on appropriate product suggestions. In this evalua- setwise regret. A set Z′ such that W R(Z′ ) < W R(Z∗ ) but SMR(Z′ ) > SMR(Z∗ ) leads to contradiction.7 that SMR(Z̄) ≤ WR(Z′ ) < WR(Z∗ ) ≤ SMR(Z∗ ) but this means that SMR(Z̄) < SMR(Z∗ ), contradicting the optimality of 7 If we apply the transformation T to Z′ we obtain a set Z̄ such Z∗ with respect to SMR. tion we use critiques of the latter type to constrain the set of and a comparison operators from the set: <,>,¬,=. An ex- possible user utility functions. ample pattern might be: {[Price >], [ProcessorSpeed >]}. In the following, we describe our simulation setting and Second, the algorithm uses APriori to find recurrent cri- present our results. tiquing patterns; a compound critique based on a pattern is then presented to the user it has sufficient support in the Experiments product database. In our experiments, the support threshold To validate our regret-based approach to critiquing we de- is set to 0.3 and selection of compound critiques corresponds signed a framework that simulates a full interaction of a user to the low-support strategy in (Reilly et al. 2004). with a user interface. As in a real system, each simulation comprises a number of cycles of interaction, each showing a Incremental Critiquing current product which the user can critique using either unit Incremental critiquing (Reilly et al. 2005) (IC) improves or the selection of one of the suggested recommendations. the basic dynamic critiquing model by incorporating a user The simulated user continues the critiquing process until model. While suggestions are still based on the APriori al- the perceived increase in utility is lower than some thresh- gorithm (as above), the retrieval of the next product associ- old. We assume that among all possible critiquing actions, ated with a critiquing action is based on a quality metric that the one with highest perceived improvement will be chosen values both the score given to the product by the preference by the user. model and its similarity to the current product. In our experiment, at each interaction the system displays: In the implementation we developed for our experiments, • the current product, we take advantage of the fact that the preference ordering over attributes is known: the score is dictated by a linear • a choice of unit critiques of the current product (they re- utility function that gives equal weight to all attributes. The quest the modification of a particular product attribute), initial product is the option with maximum utility. When • a set of suggestions, alternative options that can change retrieving the next example from the set of products that sat- focus for the search, presented as such or labeled as com- isfy the user-chosen critique, we select the product x that pound critiques (“lighter and faster processor, but more maximizes score(x)·Similarity (x, y), where y is the prod- expensive”) uct recommended at the previous cycle, and score is the heuristic utility function. At each step, the user can choose to either select the cur- rent product (and finish the interaction) or to tweak it in or- Incremental Critiquing: MAUT der to improve it and get better recommendations in the next cycle. Another implementation of incremental critiquing (Reilly et We compare our regret-based approach to three other ap- al. 2007) uses a simple multi attribute utility (MAUT) model proaches that use compound critiques. In our case, we use to make recommendations and generate compound critiques the generation of a set of recommendations (based on set- (rather than similarity). In this approach, a simple additive wise regret) to display as alternatives. One is selected as utility model u is generated, initially giving equal weight to current product and the others are displayed as suggestions. all attributes; each time an attribute is critiqued, its weight We briefly review the different critiquing approaches and is multiplied by a constant (and all weights renormalized). then we present the experimental results. The original design of this algorithm makes use of param- eterized value functions for each attribute, where the value Dynamic Critiquing taken by the current option is considered preferred. Since The dynamic critiquing model (Reilly et al. 2004) makes our experimental set up assumes that the local preference use of a particular similarity metric to retrieve the current ordering over attribute values is known, we instead assume product and uses the APriori datamining algorithm to pro- a linear utility model. pose alternative compound critiques. The algorithm dynam- Suggestions are generated using optimization with respect ically generates compound critiques by discovering com- to the estimated utility model, and the k best products are mon feature patterns among the set of products. Essentially, presented as alternative cases. A limitation of this approach each compound critique describes a set of products in terms is its reliance on a fixed utility model (as opposed to rea- of the features they have in common. For example in the PC soning with the space of possible user utilities). Moreover, domain, a typical compound critique might be “Faster CPU options that all have high value in a single utility sample and a Larger Hard Drive.” Whenever a product is shown to are unlikely to be diverse or informative enough to generate the user as the current product, the APriori datamining algo- useful distinctions. rithm is used to quickly discover these patterns and convert Regret-based critiquing them into a set of suggested compound critiques. Each com- pound critique corresponds to a product that is, among all Our version of dynamic critiquing exploits setwise minimax products satisfying the pattern most similar to the current regret using the ideas above. Specifically, at any point in one. the interaction cycle, we generate the current optimal rec- The generation of suggestions consists of two steps. First, ommendation set (with respect to minimax setwise regret), each product is matched against the current product to pro- and propose one of these options a current product. The re- duce lists of critique patterns, each comprising an attributes mainder of the set is used to display suggestions. 1 Regret−based 0.3 Regret−based 0.9 IC: wsim IC: wsim IC: maut IC: maut 0.25 0.8 DC DC 0.7 0.2 real loss max regret 0.6 0.15 0.5 0.4 0.1 0.3 0.05 0.2 0 0.1 2 4 6 8 10 12 14 steps 0 2 4 6 8 10 12 14 steps Figure 5: True regret (real loss) of the recommended op- tion at each step for four algorithms: regret-based cri- Figure 4: Maximum regret of the recommended op- tiquing (regret), dynamic critiquing with compound cri- tion at each step for four algorithms: regret-based cri- tiques generated with APriori (DC), incremental critiquing tiquing (regret), dynamic critiquing with compound cri- with weighted similarity and APriori (IC) and incremental tiques generated with APriori (DC), incremental critiquing critiquing with a multiattribute utility model (IC maut). with weighted similarity and APriori (IC) and incremental critiquing with a multiattribute utility model (IC maut). gret (i.e., worst-case loss); so one might wonder whether other techniques find better products despite being unable to Empirical Results “prove” that they are good. Fig. 5 shows this not to be the In the experiments we compare the four different versions of case. While other critiquing techniques recommend prod- dynamic critiquing discussed above: the original dynamic ucts that are much better than their regret-bounds suggest, critiquing algorithm (similarity plus APriori), incremental regret-based critiquing is able to consistently find the opti- critiquing, incremental critiquing with MAUT, and regret- mal product (and find a near-optimal product in as few as based critiquing. We used a C implementation of the APri- five or six interaction cycles). By contrast, the other three ori algorithm (Bodon 2003). All systems make available methods are unable to identify the optimal option at conver- unit critiques of any attribute (and user’s adopt an expected gence. improvement semantics). We evaluate the performance of all algorithms with respect to recommendation efficiency Conclusions and offer some speculative examination of regret-based cri- In this paper we presented a novel formalization of recom- tiquing in terms the tradeoff between cognitive cost and mendations of a joint set of alternatives based on the notion number of compound critique options presented at each in- of regret. The criterion that we propose, setwise max regret, teraction. represents an intuitive extension of the traditional regret cri- We evaluate the different critiquing methods by compar- terion for single recommendations. ing the quality of the recommendations with respect to max We show how optimal recommendation sets (with re- regret. We tested the methods on a real database of 200 spect to our criterion) can be computed with mixed integer apartments, using randomly drawn utility functions (as de- programming (MIP) methods and the constraint generation scribed above), and k = 3 suggested products at each in- technique when options are constructed from a set of config- teraction cycle. All results are averaged over 20 simulated uration constraints. Alternatively, set recommendations can users. Fig. 4 shows the maximum regret of the recom- be obtained using a hill-climbing strategy interleaved with mended product at each stage of the interaction. We note adversarial search in discrete settings. that regret-based critiquing outperforms the other methods We discuss the problem of utility elicitation, showing of generating compound critiques by a wide margin. This that our recommendation strategy reduces max regret more is true when considering both the “anytime” profile of the quickly than any other possible choice. Finally we present method (i.e., the degree to which minimax drops) and its fi- an application of these principles for critiquing systems. nal convergence: our technique converges on a product who Our reliance on explicit utility modeling and minimax max regret is about 3% on average, while the MAUT incre- regret provides a powerful new means of generating good mental critiquing settles at about 10% (and the others worse, critiques and making good product recommendations. Our with dynamic critiquing unable to reduce max regret to less regret-based critiquing recommender can often lead to opti- than 18%). mal recommendations using very few, say, compound cri- More interesting is the fact that regret-based critiquing of- tiquing interactions, and outperforms other dynamic cri- fers better “actual” recommendations, as measured by true tiquing techniques both in speed of convergence and the regret (difference from the true optimal recommendation). quality of the final recommendations. Regret-based critiquing is designed to attack bounds on re- The incorporation of noisy feedback is an important next step; we are currently considering the possibility of a clar- Urszula Chajewska and Daphne Koller. Utilities as random vari- ification dialogue. The idea is to verify information that is ables: Density estimation and structure discovery. In Proceedings sensitive with respect to regret. of the Sixteenth Conference on Uncertainty in Artificial Intelli- Largely unaddressed in our critiquing model is the need gence (UAI-00), pages 63–71, Stanford, 2000. for users to explore the product space, one of the main ad- Urszula Chajewska, Daphne Koller, and Ronald Parr. Making ra- vantages of critiquing. We are currently developing hybrid tional decisions using adaptive utility elicitation. In Proceedings models in which the system and/or user explicitly distin- of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pages 363–369, Austin, TX, 2000. guishes exploratory actions from improving actions. Even with such a distinction, there is still the interesting question Peter C. Fishburn. Interdependence and additivity in multivariate, of modeling user search processes in a way that would allow unidimensional expected utility theory. International Economic Review, 8:335–342, 1967. insight into preferences to be drawn during exploration as well. Finally, the development of models of cognitive costs Ralph L. Keeney and Howard Raiffa. Decisions with Multiple using techniques from behavioral economics, decision the- Objectives: Preferences and Value Trade-offs. Wiley, New York, 1976. ory and psychology remains an important avenue of future research. Panos Kouvelis and Gang Yu. Robust Discrete Optimization and Its Applications. Kluwer, Dordrecht, 1997. References David McSherry. Diversity-conscious retrieval. In ECCBR ’02: Proceedings of the 6th European Conference on Advances Fahiem Bacchus and Adam Grove. Graphical models for prefer- in Case-Based Reasoning, pages 219–233, London, UK, 2002. ence and utility. In Proceedings of the Eleventh Conference on Springer-Verlag. Uncertainty in Artificial Intelligence (UAI-95), pages 3–10, Mon- treal, 1995. Robert Price and Paul R. Messinger. Optimal recommendation sets: Covering uncertainty over user preferences. In Proceedings Ferenc Bodon. A fast apriori implementation. In Bart Goethals of the Twentieth National Conference on Artificial Intelligence and Mohammed J. Zaki, editors, Proceedings of the IEEE (AAAI’05), pages 541–548, 2005. ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), volume 90 of CEUR Workshop Proceedings, Mel- James Reilly, Kevin McCarthy, Lorraine McGinty, and Barry bourne, Florida, USA, 19. November 2003. Smyth. Dynamic critiquing. In Peter Funk and Pedro A. González-Calero, editors, ECCBR, volume 3155 of Lecture Notes Craig Boutilier, Fahiem Bacchus, and Ronen I. Brafman. UCP- in Computer Science, pages 763–777. Springer, 2004. Networks: A directed graphical representation of conditional util- ities. In Proceedings of the Seventeenth Conference on Uncer- James Reilly, Kevin McCarthy, Lorraine McGinty, and Barry tainty in Artificial Intelligence (UAI-01), pages 56–64, Seattle, Smyth. Incremental critiquing. Knowl.-Based Syst., 18(4-5):143– 2001. 151, 2005. Craig Boutilier, Richard S. Zemel, and Benjamin Marlin. Active James Reilly, Jiyong Zhang, Lorraine McGinty, Pearl Pu, and collaborative filtering. In Christopher Meek and Uffe Kjærulff, Barry Smyth. Evaluating compound critiquing recommenders: editors, UAI, pages 98–106. Morgan Kaufmann, 2003. a real-user study. In Jeffrey K. MacKie-Mason, David C. Parkes, and Paul Resnick, editors, ACM Conference on Electronic Com- Craig Boutilier, Tuomas Sandholm, and Rob Shields. Eliciting merce, pages 114–123. ACM, 2007. bid taker non-price preferences in (combinatorial) auctions. In Stuart Russell and Peter Norvig. Artificial Intelligence: A Mod- Proceedings of the Nineteenth National Conference on Artificial ern Approach. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition Intelligence (AAAI-04), pages 204–211, San Jose, CA, 2004. edition, 2003. Craig Boutilier, Relu Patrascu, Pascal Poupart, and Dale Schu- Ahti Salo and Raimo P. Hämäläinen. Preference ratios in multi- urmans. Constraint-based optimization and utility elicitation us- attribute evaluation (PRIME)–elicitation and decision procedures ing the minimax decision criterion. Artifical Intelligence, 170(8– under incomplete information. IEEE Trans. on Systems, Man and 9):686–713, 2006. Cybernetics, 31(6):533–545, 2001. Craig Boutilier, Relu Patrascu, Pascal Poupart, and Dale Schuur- Leonard J. Savage. The Foundations of Statistics. Wiley, New mans. Constraint-based optimization and utility elicitation using York, 1954. the minimax decision criterion. Artif. Intell., 170(8-9):686–713, 2006. Paul Slovic. The construction of preference. American Psychol- ogist, 50(5):364–371, 1995. Craig Boutilier. A POMDP formulation of preference elicitation problems. In Proceedings of the Eighteenth National Conference Barry Smyth and Paul McClave. Similarity vs. diversity. In on Artificial Intelligence (AAAI-02), pages 239–246, Edmonton, David W. Aha and Ian Watson, editors, ICCBR, volume 2080 2002. of Lecture Notes in Computer Science, pages 347–361. Springer, 2001. Darius Braziunas and Craig Boutilier. Preference elicitation and generalized additive utility. In AAAI, 2006. Darius Braziunas and Craig Boutilier. Minimax regret-based elic- itation of generalized additive utilities. In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence (UAI-07), pages 25–32, Vancouver, 2007. Darius Braziunas and Craig Boutilier. Minimax regret based elic- itation of generalized additive utilities. In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence (UAI-07), pages 25–32, Vancouver, 2007.