Explanations in Proactive Recommender Systems in Automotive Scenarios

Roland Bader (1,2), Andreas Karitnig (3), Wolfgang Woerndl (2), and Gerhard Leitner (3)

1 BMW Group Research and Technology, 80992 Munich, Germany, roland.bader@bmw.de
2 Technische Universitaet Muenchen, 85748 Garching, Germany, woerndl@in.tum.de
3 Alpen-Adria Universitaet Klagenfurt, 9020 Klagenfurt, Austria, Gerhard.Leitner@uni-klu.ac.at, andreas.karitnig@gmx.at

Abstract. Recommender techniques are commonly used to ease selection and support decisions when large quantities of items such as products, media or restaurants are involved. Typically, recommender systems are used in contexts where users can give the system their full attention. This is not the case in automotive scenarios; we therefore want to provide recommendations proactively to reduce driver distraction while searching for information. Our application scenario is a gas station recommender. Proactively delivered recommendations may not be accepted if the user does not understand why something was recommended to her. Therefore, our goal in this paper is to enhance the transparency of proactively delivered recommendations by means of explanations. We focus on explaining items to convince the user of the relevance of the items and to enable efficient item selection while driving. We describe a method based on knowledge- and utility-based recommender systems to extract explanations automatically. Our evaluation shows that explanations enable fast decision making for items with reduced information provided to the user.

1 Introduction

In recent years, more and more information has become digitally available. Due to the availability of Internet connections in many state-of-the-art cars, this information can be made accessible to drivers.
As searching for information is not the primary task during driving, providing information as recommendations in a proactive manner seems to be a reasonable approach to reduce information overload and driver distraction [2]. As the user does not request recommendations herself, it is important to present the recommendations in a way that lets her quickly recognize why this information is relevant for her.

The goal of this paper is to investigate the applicability of explanation techniques to make proactive recommendations comprehensible for drivers with a limited amount of information. Explanations are already the focus of research in other areas of recommender systems, e.g. product recommendations ([9], [6]). To our knowledge there is no existing work on explanations for mobile proactive recommender systems. The challenge is to provide as little information as possible to make proactive decisions transparent without information overload. Our application scenario is a gas station recommender for drivers, already presented in [1]. The contribution of this paper is, first, an investigation of the requirements on explanations in our application scenario; second, a method to generate short explanations for items out of the recommendation process described in [1]; and third, an evaluation of the generated explanations. Note that the scope of this paper is limited to an offline investigation to lay the groundwork for an in-field study in a car.

The remainder of the paper is organized as follows. In Section 2 we describe fundamentals of explanations in recommender systems. Section 3 summarizes a preliminary study. In Section 4 we describe how explanations are generated out of the recommendation process, and Section 5 includes a prototype evaluation of the presented method. Section 6 closes with conclusions and future work.

2 Fundamentals and Related Work

Recommender systems suggest items such as products or restaurants to an active user.
Proactively delivered recommendations should have high relevance, be non-intrusive, and the system should have a long-term memory [7]. We have already developed methods for proactivity in recommender systems in [2] and [1]. Based on this work, we observed that proactively delivered recommendations lack user acceptance if the user does not know why something was recommended to her. Transparency and comprehensibility are two aspects a proactive system should fulfil to be accepted [5]. Our goal in this paper is to avoid this loss of acceptance by providing explanations in our existing proactive recommender for gas stations.

An explanation is a set of arguments that describe a certain aspect, e.g. an item or a situation. An argument is a statement containing a piece of information related to the aspect which should be explained, e.g. "The gas station is inexpensive" or "Gas level is low". In an item explanation, arguments can be for (positive) or against (negative) an item, or neutral.

In [9], seven generalizable goals for explanations in recommender systems are provided. Which goals are accomplished by an explanation depends on the field of application. Giving the user the chance to correct the system (scrutability) and delivering effective recommendations are important for recommender systems in general. For proactive recommender systems in a car, we think that transparency (Why was this recommended to me?), persuasiveness (Are the recommended items relevant for me?) and efficiency (Can I make a decision with little interaction?) are the most important goals. If they are fulfilled, trust and satisfaction can also be positively influenced.

The work described in [6] contains design principles for explanations in recommender systems. The principles focus on categorizing alternative items and explaining the categories.
Due to the limited number of items presented in a proactive recommendation, we think that categorization can hardly be applied in our application domain. This applies to many explanation methods created for desktop systems, where the user can turn her attention fully to the interface. Hence, the challenge in proactive recommender systems is to convince the user quickly of the usefulness of the recommended items. As we want to explain utility- and knowledge-based recommendations based on [2], a utility-based approach for explanations seems reasonable. The work in [4] presents a method based on the utility of a whole explanation to select and rank explanations. Instead of the utility of the whole explanation, [3] measures the performance of a single argument and combines arguments into structured explanations. We combine ideas from both works in our proposed method.

3 Preliminary Study

Before we implemented our methods for explanations in proactive recommender systems, we conducted a user survey to find out the main requirements for the generation of arguments in our application scenario of a gas station recommender. The user survey was conducted on the basis of an online questionnaire. The subjects had to rate different kinds of arguments and structures on a 5-point Likert scale ranging from "very useful" to "not useful at all". We focused on aspects we found in [9], [6] and [3]. The most important question was what kind of arguments should be used for explaining items in our application domain. Arguments are built on either context-based (e.g. gas level, opening times) or preference-based (e.g. gas brand or price preference) criteria. Moreover, we wanted to know how many arguments to use and how to combine and structure them (independent vs. comparative to other items vs. comparative to an average).
We also asked the respondents about the usefulness of other types of information, such as situation explanations, status information and the reliability of item attributes and context data.

The survey had 81 respondents who completed the questionnaire. The group of participants consisted of 64 males and 17 females with an average age of 29 years. The most important aspects influencing the decision for a certain gas station seem to be gas price, detour and gas level at the gas station. Following this pattern, arguments including detour, price and gas level were mostly rated very highly. Ratings for gas station context data, like opening times or a free soft drink, varied depending on the content of an argument. Arguments more closely related to the task of refilling, e.g. opening times, were rated better.

The subjects showed no clear favourite for the structure of an explanation; independent and comparative argumentation were rated equally well. Two arguments seem to be a good size for an explanation in the case of gas stations. Given the desired number of items in a gas station recommendation, which ranges from 3 to 5, two arguments seem to be sufficient to distinguish them. Arguments concerning the situations leading to a recommendation were rated differently. Situations which are directly connected to the task and have an impact on the recommendation were rated best, e.g. "Only gas stations along the route were recommended because you do not have much time" or "Just a few gas stations are available in this area". Status information as well as data reliability were not interesting for the subjects.

4 Our Approach for Explanations in Proactive Recommender Systems

Based on the results of the preliminary study, there are obviously two major aspects which should be explained to the user. First, we have to explain what the crucial situation for a recommendation was.
A low gas level is an obvious situation for a gas station recommendation, but there are more situations which may lead to a recommendation: a particularly good gas station along the route, e.g. a very low-priced one; a deserted area with few gas stations; or an important appointment, which leads to a recommendation with only gas stations on the route. Without explanation, a proactive recommendation in these situations may result in misunderstanding. Second, it should be clear to the user why the recommended items are relevant for her based on her user profile. In this paper we focus on explanations for items.

Our explanation method is designed for a small set of recommended items, because many items overwhelm the user if they are provided proactively. There are two main goals we try to accomplish. First, we want to enable efficiency, because item selection is not the primary task while driving and is much harder than in situations where users can focus their attention on the system (e.g. while parking). Second, the user should be persuaded that the items are relevant.

We use a ramping strategy similar to [8] to explain recommendations, i.e. explanations are distributed over several levels of detail. The lowest level (first phase) is provided automatically with the recommendations. Then, gradually, more information becomes accessible to the user manually. The elements in the first phase are short explanations for the situation and for the items. More detailed levels include a comparison of items, a list of all items, or item details. The first phase is the most important one in the ramping strategy, as the user has to recognize quickly why the recommendation is relevant for her. The following description mainly covers this phase.

The arguments for items in the first phase are structured independently, i.e. no comparative explanations are used. The preliminary study showed that this makes no difference to the user, but an independent structure allows for shorter arguments.
We use preference- as well as context-based arguments, starting with a positive argument in the first place and adding a second one if necessary. A maximum of 2 arguments is used for every item.

The information in an argument can be either an interpreted attribute value, e.g. "gas level is low", or a fact, e.g. "gas level is 32 liters". An interpretation is a mapping from a specific value to a discrete interval. We use a generic nominal interval with the values One, Very High, High, Medium, Low, Very Low and Null. Two kinds of values can be mapped. A utility interpretation maps the utility of an item: e.g. a gas level of 32 liters at a gas station can be mapped to Null, because most people do not refill at this level, so the utility on that decision dimension is 0. Interpreting the attribute and context values directly leads to different results: e.g. a gas level of 32 liters is Medium if the tank has a capacity of 65 liters. This is called attribute interpretation.

4.1 Argument Assessment

Our argument generation method for items is based on a context-aware recommender system for gas stations presented in our previous work [1]. It uses Multi-Criteria Decision Making methods (MCDM) to assess items I on multiple decision dimensions D by means of utility functions. Dimensions are, for example, price or detour. First, all item attributes and context values that belong together (level 1) are aggregated to local scores LS_I,D in the range [0, 1] on every dimension D (level 2). On level 3, all dimensions are aggregated to a global score GS_I. Users are able to set their preferences for the item dimensions explicitly, which results in a weight w_D for every dimension D.

The argument assessment uses two additional scores. The explanation score ES_I,D describes the explaining performance of an item dimension, and the information score IS_D measures the amount of information in a dimension.
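The two interpretation mappings described above can be sketched as follows. This is a minimal illustration only: the function names and the exact interval boundaries are our assumptions, not taken from the recommender in [1].

```python
# Generic nominal interval from the text, ordered from empty to full.
LEVELS = ["Null", "Very Low", "Low", "Medium", "High", "Very High", "One"]

def attribute_interpretation(value, max_value):
    """Attribute interpretation: map a raw value (e.g. gas level in liters)
    onto the nominal interval relative to its maximum (e.g. tank capacity).
    The even split of (0, 1) into five bins is an assumption."""
    ratio = value / max_value
    if ratio <= 0.0:
        return "Null"
    if ratio >= 1.0:
        return "One"
    return LEVELS[1 + min(int(ratio * 5), 4)]

def utility_interpretation(utility):
    """Utility interpretation: map a utility in [0, 1] onto the same interval."""
    return attribute_interpretation(utility, 1.0)

# Example from the text: 32 liters in a 65 liter tank is "Medium" by attribute
# interpretation, while the refill utility at this level may be 0, i.e. "Null".
print(attribute_interpretation(32, 65))   # -> Medium
print(utility_interpretation(0.0))        # -> Null
```

The same lookup could of course be driven by per-dimension thresholds instead of a uniform split; the paper leaves this open.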
The explanation score is calculated by multiplying the weight of a dimension w_D with the performance of item I in that dimension: ES_I,D = LS_I,D · w_D. This way, badly performing dimensions as well as aspects not important for the user are neglected. The score corresponds to the product of user interest in a dimension with the utility of an explanation for that dimension described in [4]. Instead of a whole explanation, we measure the performance of the dimension directly.

The problem with using only this score is that if every item performs well on a dimension and this dimension is important for the user, every item would be explained by the same information. This reduces the opportunity to make an effective decision, as the items are not distinguishable. Therefore, the information score measures the amount of information in a dimension relative to an item set. It is calculated by IS_D = (R + I) / 2. The value R = max(x) - min(x) is the range of x in the set. The information I can either be Shannon's entropy I = -sum_{i=1}^{n} p(x_i) log_n p(x_i), or simply I = (n - h) / (n - 1), where n is the number of items in the set and h is the frequency of the most frequent x in the set. Taking x = LS_I,D is a good choice if local scores have a small value range; otherwise the utility interpretation of LS_I,D performs better. The information score is low if either all x are similar (R is low) or the same x appears frequently (I is low), e.g. all gas stations are average priced.

4.2 Explanation Process

Figure 1 shows the process used to select arguments based on the scores described in the previous section. It follows the framework for explanation generation described in [3] by dividing the process into the selection and organization of the explanation content and the transformation into a human-understandable output.
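The scores above can be written out as a small sketch; the function names and example values are illustrative assumptions, only the formulas come from the text.

```python
import math

def explanation_score(local_score, weight):
    """ES_I,D = LS_I,D * w_D: weighted performance of item I on dimension D."""
    return local_score * weight

def frequency_information(values):
    """I = (n - h) / (n - 1), where h is the frequency of the most frequent x."""
    n = len(values)
    h = max(values.count(v) for v in set(values))
    return (n - h) / (n - 1)

def shannon_information(values):
    """Shannon's entropy I = -sum p(x) log_n p(x), with log base n so that
    n equally frequent values yield I = 1 (assumes n > 1)."""
    n = len(values)
    probs = [values.count(v) / n for v in set(values)]
    return -sum(p * math.log(p, n) for p in probs)

def information_score(values, info=frequency_information):
    """IS_D = (R + I) / 2 with range R = max(x) - min(x)."""
    r = max(values) - min(values)
    return (r + info(values)) / 2

# If all gas stations are average priced, the price dimension carries
# no information: R = 0 and I = 0, so IS_D = 0.0.
print(information_score([0.5, 0.5, 0.5]))
```

With three pairwise distinct local scores, `shannon_information` returns 1.0, i.e. maximal information, which matches the intent of the frequency-based variant.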
[Fig. 1. Comparing scores to retrieve an explanation: content selection compares the explanation score (1), global score (2), information score (3) and second-argument explanation score (4) against thresholds to build an abstract explanation of up to two arguments; surface generation then resolves it (5) against an explanation database of interpretations, facts and structures.]

In content selection, our argumentation strategy selects arguments for every item I separately. A positive argument is selected first to help the user instantly recognize why this item is relevant. For this, the best-performing dimension D based on the explanation score ES_I,D is compared to a threshold α (1). A value larger than α means the dimension is good enough for a first argument. The threshold α should be chosen so that the first argument is positive. If no dimension is larger than α, and thus no first argument can be selected, we look at the global score GS_I (2). If this score is larger than β, the item is a good average; otherwise we suppose that the recommender could not find better alternatives. With a first argument selected, we look at the information score of its dimension (3). A small information score (lower than γ) means that this dimension provides little information; therefore a second argument is selected by means of the explanation score: the explanation score ES_I,D of the second argument must be larger than µ to make sure the second argument is meaningful enough (4). Generally, µ < α because the requirements on the second argument are lower. With the thresholds µ and γ, the amount of information can be controlled.

The result of the content selection is an abstract explanation, which needs to be resolved into something the user understands. This is done in the surface generation. We map a key-value pair, like (gaslevel, low), to human-understandable information, e.g. textual phrases or icons (5).
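The selection steps (1)-(4) above can be sketched as follows. The threshold values, the data layout and the fallback phrases are assumptions for illustration; only the comparison logic follows the process described in the text.

```python
def select_arguments(item, alpha=0.7, beta=0.6, gamma=0.3, mu=0.5):
    """Return up to two argument dimensions for one item, following steps (1)-(4).
    item = {"gs": global score, "dimensions": {name: {"es": ..., "is": ...}}}."""
    dims = item["dimensions"]
    best = max(dims, key=lambda d: dims[d]["es"])
    # (1) first argument: the best dimension must exceed alpha
    if dims[best]["es"] <= alpha:
        # (2) no first argument: fall back to the overall assessment via GS_I
        return ["good average"] if item["gs"] > beta else ["no better alternative found"]
    args = [best]
    # (3) does the first argument's dimension carry little information?
    if dims[best]["is"] < gamma:
        # (4) add a second argument if its explanation score exceeds mu (mu < alpha)
        rest = {d: s for d, s in dims.items() if d != best}
        if rest:
            second = max(rest, key=lambda d: rest[d]["es"])
            if rest[second]["es"] > mu:
                args.append(second)
    return args

# All stations are cheap, so "price" alone carries little information (low IS)
# and a second argument ("detour") is added.
station = {"gs": 0.8, "dimensions": {
    "price":  {"es": 0.9, "is": 0.2},
    "detour": {"es": 0.6, "is": 0.7},
}}
print(select_arguments(station))  # -> ['price', 'detour']
```

A real implementation would then hand the returned dimension names to the surface generation step, which maps them to phrases or icons.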
Either facts or attribute interpretations can be used as values. Human-understandable explanation information is stored uniquely in a database, e.g. in XML format. The structure of an explanation (icon, independent phrase, comparative phrase etc.) can also be defined here.

5 Evaluation

To evaluate our generated explanations, we set up a user study with a desktop prototype. The prototype is a combination of a street map viewer and an explanation view. The map view is based on a street map from OpenStreetMap.com and is able to visualize a user's route, icons for recommended gas stations and detour routes for the gas stations. The displayed content depends on the current phase of the ramping strategy. The view for the first phase, which is shown to the user automatically, provides a list of at most 3 gas station recommendations, 1 or 2 arguments for every gas station and a situation explanation. Due to the shortness constraints of an explanation, negative arguments are avoided. From here, the subject can access the views for the second phase with item details and the third phase with a list of all gas stations prefiltered along the route.

We conducted a user interview with 20 participants with an average age of 29, 17 male and 3 female. For that, we created 6 different scenarios (2 short, 3 average and 1 long route). In every phase, the subjects were asked for missing and relevant information in the explanation as well as on the map. Persuasiveness was measured by asking the subjects for their satisfaction with a selection in the first phase and whether they needed more information. Looking at how often the subjects needed to switch to deeper phases with more information accounts for efficiency. The explanations were all text-based. For example, a set of 3 gas stations could be explained by (1) very low priced, (2) on the route, (3) low priced, little detour. Acoustic and tactile modalities are out of the scope of this survey.
The recommendations were generated by the methods presented in [1], and every subject was asked to state her preferences for gas price, detour, brand and preferred gas level at the gas station.

5.1 Results

The number of items provided by the recommender was rated as the right number by 14 subjects on average. The number of arguments was rated as too few by 7 subjects and as exactly right by 8 subjects. Too few arguments were criticized when two items could not be distinguished. Presenting the arguments either as facts or interpreted was rated differently: 11 subjects preferred facts, 9 interpretations. This may change in a real driving scenario, depending on which kind of argument imposes more cognitive effort.

Almost all information in the first phase was rated as useful by most of the subjects. In regular scenarios, most subjects could make a satisfying decision with this information alone. Interestingly, the predicted gas level at the gas station was useless for most subjects, although it is an important decision dimension for most of them. This may indicate that the user's expectations also play an important role: in our case, users only expect to get gas station recommendations if their gas level is low. The second phase contained only useful information and was selected if special details were needed, e.g. an ATM or a shop. At the beginning of the interview, some subjects used the second phase to check the matching of interpreted values. The list of all items along the route was rarely selected, and only if the recommendations did not correspond to user expectations. In 70% of the cases the map played an important role in the decision process.

6 Conclusions and Future Work

We conclude that the presented explanation strategy worked well offline. Most of the subjects were satisfied with the items based on the explanations provided in the first phase. Therefore we think that the amount of information was enough to convince the subjects of the relevance of the items.
Further phases were rarely used and, if needed, they were quickly accessible; therefore the selection could also be made efficiently. At this stage of the project it could not be determined whether users prefer interpreted or specific information in an argument. Next, we will investigate whether the results are transferable to a driving scenario with real proactive recommendations. In our further research, we will also adjust the parameters based on the results of the study. Furthermore, we want to use Shannon's entropy on the whole prefiltered set of items to meet user expectations better. To further increase persuasiveness, we plan to integrate a dominance check like [6] over all arguments presented to the user to better distinguish items.

References

1. Bader, R., Neufeld, E., Woerndl, W., Prinz, V.: Context-aware POI recommendations in an automotive scenario using multi-criteria decision making methods. In: Workshop on Context-awareness in Retrieval and Recommendation. pp. 23–30. ACM Press, Palo Alto, CA (2011)
2. Bader, R., Woerndl, W., Prinz, V.: Situation Awareness for Proactive In-Car Recommendations of Points-Of-Interest (POI). In: Workshop on Context Aware Intelligent Assistance. Karlsruhe, Germany (2010)
3. Carenini, G., Moore, J.D.: Generating and evaluating evaluative arguments. Artificial Intelligence 170(11), 925–952 (Aug 2006)
4. Felfernig, A., Gula, B., Leitner, G., Maier, M., Melcher, R., Teppan, E.: Persuasion in Knowledge-Based Recommendation. In: 3rd International Conference on Persuasive Technology. pp. 71–82. Springer, Oulu, Finland (2008)
5. Myers, K., Yorke-Smith, N.: Proactive Behavior of a Personal Assistive Agent. In: Workshop on Metareasoning in Agent-Based Systems. Honolulu, HI (2007)
6. Pu, P., Chen, L.: Trust building with explanation interfaces. In: 11th International Conference on Intelligent User Interfaces. pp. 93–100. ACM Press, Sydney, Australia (2006)
7. Puerta Melguizo, M.C., Bogers, T., Boves, L., Deshpande, A., Bosch, A.V.D., Cardoso, J., Cordeiro, J., Filipe, J.: What a Proactive Recommendation System Needs: Relevance, Non-Intrusiveness, and a New Long-Term Memory. In: 9th International Conference on Enterprise Information Systems. vol. 6, pp. 86–91. Madeira, Portugal (Apr 2007)
8. Rhodes, B.J.: Just-In-Time Information Retrieval. PhD thesis, MIT Media Lab (2000)
9. Tintarev, N., Masthoff, J.: Designing and Evaluating Explanations for Recommender Systems. pp. 479–510 (2011)