=Paper= {{Paper |id=None |storemode=property |title=Interactive Explanations in Mobile Shopping Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-1253/paper3.pdf |volume=Vol-1253 }} ==Interactive Explanations in Mobile Shopping Recommender Systems== https://ceur-ws.org/Vol-1253/paper3.pdf
                                    Interactive Explanations in Mobile
                                    Shopping Recommender Systems

                     Béatrice Lamche                                     Uğur Adıgüzel                     Wolfgang Wörndl
                      TU München                                        TU München                            TU München
                    Boltzmannstr. 3                                   Boltzmannstr. 3                       Boltzmannstr. 3
                85748 Garching, Germany                           85748 Garching, Germany               85748 Garching, Germany
                   lamche@in.tum.de                                 adiguzel@in.tum.de                    woerndl@in.tum.de

ABSTRACT
This work presents a concept featuring interactive explanations for mobile shopping recommender systems in the domain of fashion. It combines previous research on explanations in recommender systems and critiquing systems. It is tailored to a modern smartphone platform, exploits the benefits of the mobile environment and incorporates a touch-based interface for convenient user input. Explanations have the potential to be more conversational when the user can change the system behavior by interacting with them. In traditional recommender systems, however, explanations are used for one-way communication only. We therefore design a system that generates personalized interactive explanations using the current state of the user's inferred preferences and the mobile context. Following the proposed concept, an Android application was developed and evaluated. It outperformed the previous version, which lacked interactive and personalized explanations, in terms of transparency, scrutability, perceived efficiency and user acceptance.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Interaction styles, User-centered design

General Terms
Design, Experimentation, Human Factors.

Keywords
mobile recommender systems, explanations, user interaction, Active Learning, content-based, scrutability

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IntRS 2014, October 6, 2014, Silicon Valley, CA, USA.
Copyright 2014 by the author(s).

1.   INTRODUCTION
  In today's world, we constantly deal with complex information spaces in which we often have trouble finding what we want or making decisions. Mobile recommender systems address this problem in a mobile environment by providing their users with potentially useful suggestions that support their decisions, help them find what they are looking for, or let them discover new, interesting things. Explanations of recommendations help users make better decisions compared to recommendations without explanations, while also increasing the transparency between the system and the user [8]. However, recommender systems employing explanations have so far not leveraged their potential for interactivity. Touch-based interfaces on smartphones reduce the effort of giving input and can thus make explanations truly interactive. This work has two main goals. The first is to study whether a mobile recommender model with interactive explanations leads to more user control and transparency in critique-based mobile recommender systems. The second is to develop a strategy for generating interactive explanations in a content-based recommender system. A mobile shopping recommender system is chosen as the application scenario. The rest of the paper is organized as follows. We first introduce some definitions relevant for explanations in recommender systems and summarize related work. The next section explains the reasoning behind and the path towards the final mobile application, detailing the vision guiding the process. The user study evaluating the developed system is discussed in section 4. We close by suggesting opportunities for future research.

2.   BACKGROUND & RELATED WORK
  An important aspect of explanations is the benefit they can bring to a system. Tintarev et al. define the following seven goals for explanations in recommender systems [8]: 1. Transparency to help users understand how the recommendations are generated and how the system works. 2. Scrutability to help users correct wrong assumptions made by the system. 3. Trust to increase users' confidence in the system. 4. Persuasiveness to convince users to try or buy items and enhance user acceptance of the system. 5. Effectiveness to help users make better decisions. 6. Efficiency to help users decide faster which recommended item is best for them. 7. Satisfaction to increase the user's satisfaction with the system. Meeting all these criteria is unlikely, however; some of the aims, such as persuasiveness and effectiveness, even contradict each other. Choosing which criteria to improve is therefore a trade-off. Explanations also differ in their degree of personalization. While non-personalized explanations use general information to indicate the relevance of a recommendation, personalized explanations clarify how a user might relate to a recommended item [8].
  Due to the benefits of explanations in mobile recommender systems, much research has been conducted in this context. Since our work focuses on explanations aiming to improve transparency and scrutability in a recommender system, we investigated previous research in these two areas.
  The work of Vig et al. [9] separates justification from transparency. While transparency should give an honest statement of how the recommendation set is generated and how the system works in general, justification can be decoupled from the recommendation algorithm and simply explain why a recommendation was selected. Vig et al. developed the web-based Tagsplanations system, in which recommendations are justified using the relevance of tags. As the authors noted, their approach lacked the ability to let users override their inferred tag preferences.
  Cramer et al. [3] applied transparent explanations in the web-based CHIP (Cultural Heritage Information Personalization) system, which recommends artworks based on the individual user's ratings of artworks. The main goal of the work was to make the criteria the system uses to recommend artworks more transparent. It did so by showing users the criteria on which the system based its recommendations. The authors argue that transparency increased the acceptance of the system.
  An interesting approach to increasing scrutability was taken by Czarkowski [4]. The author developed SASY, a web-based holiday recommender system with scrutinization tools that aim not only to enable users to understand what is going on in the system, but also to let them take control over recommendations by allowing them to modify the data that is stored about them.
  TasteWeights is a web-based social recommender system developed by Knijnenburg et al. [5] that aims at increasing inspectability and control. The system provides inspectability by displaying a graph of the user's items, friends and recommendations, and it allows control over recommendations by letting users adjust the weights of their items and friends. The authors evaluated the system with 267 participants. Their results showed that users appreciated the inspectability and control over recommendations, and that the control given via weighting of items and friends made the system more understandable. Finally, the authors concluded that such interactive control results in scrutability.
  Wasinger et al. [10] apply scrutinization in a mobile restaurant recommender system named Menu Mentor. In this system, users can see the personalized score of a recommended restaurant and the details of how the system computed that score. However, users can change the recommendation behavior only by critiquing presented items via meal star ratings; no granular control over meal content is provided. A user study showed that participants perceived enhanced personal control over the given recommendations.
  In summary, although previous research focused on increasing either scrutability or transparency in recommender systems, no research has investigated how interactive explanations can increase both transparency and scrutability in mobile recommender systems.

3.   DESIGNING THE PROTOTYPE
  Our system aims at offering shoppers a way to find nearby shopping locations with interesting clothing items while also supporting them in decision making by providing interactive explanations. Mobile recommender systems use a lot of situational information to generate recommendations, so it might not always be clear to the user how the recommendations are generated. Introducing transparency can help solve this problem. However, mobile devices require additional considerations in design and development (e.g. due to the small display size), which should also be taken into account when generating transparent explanations. Moreover, the explanation framework should generate textual explanations that make clear to the user how her preferences are modeled. In order not to bore the user, explanations must be concise and include variations in wording. Furthermore, introducing transparency alone might not be enough, because users often want to feel in control of the recommendation process. The explanation goal scrutability addresses this issue by letting users correct system mistakes. There have been several approaches to incorporating scrutable explanations into traditional web-based recommender systems, but more investigation is required in the area of mobile recommender systems. First of all, the system should highlight the areas of textual explanations that can be interacted with. Second, the system should allow the user to easily make changes and get new recommendations. While transparent and scrutable explanations are the main focus of this work, there are also side goals such as satisfaction and efficiency.

3.1    The Baseline
  Shopr, a previously developed mobile recommender system, serves as the baseline in our user study [6]. The system uses a conversation-based Active Learning strategy that involves users in ongoing recommendation sessions by getting feedback on one of the items in each session. In this way, the system learns the user's preferences in the current context. An important point is that the system initially recommends very diverse items without asking its users to input initial preferences. After a recommendation set is presented, the user is expected to give feedback on one of the items in the form of a like or dislike of item features (e.g. the item's price or color) and can state which features in particular she likes or dislikes. If the user submits positive feedback, the refine algorithm shows more similar items. Otherwise, the system concludes that negative progress has been made, refocuses on another item region and shows more diverse items. The algorithm keeps the previously critiqued item in the new recommendation set so that the user can critique it further for better recommendations. The explanation strategy used in this system is very simple: an explanation text placed above all items tries to convey the current profile of the user's preferences. It allows the user to observe the effect of her critiques and to compare the current profile against the actually displayed items. An example of such an explanation text is "avoid grey, only female, preferably shirt/dress".

3.2    How Explicit Feedback Affects Weights
  The modeling of the user's preferences is an important part of the proposed explanation generation strategy and feedback model and is adapted from the approach of Shopr [6], described in the Baseline section. Preferences are modeled as a search query q with weights for the values of features (e.g. red is a possible value of the feature color). For each feature, there is a weight vector that allows the prioritization of one feature value over another.
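This weight-vector representation of a query can be sketched as follows (a minimal Python sketch; the function and variable names are ours, not taken from the Shopr implementation):

```python
def make_query_weights(features):
    """features: dict mapping a feature name to its possible values.
    Initially no value is prioritized, so weights are uniform and sum to 1.0."""
    return {
        feature: {value: 1.0 / len(values) for value in values}
        for feature, values in features.items()
    }

def apply_explicit_feedback(query_weights, feature, liked_values):
    """Re-initialize one feature's weight vector after the user states her
    actual preference: the liked values share weight 1.0 equally, all other
    values drop to 0.0."""
    vector = query_weights[feature]
    share = 1.0 / len(liked_values)
    for value in vector:
        vector[value] = share if value in liked_values else 0.0
    return query_weights
```

Starting from a three-color vector and stating interest in blue and green yields the weights (0.0, 0.5, 0.5), i.e. the system focuses on the stated values and avoids the rest.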
A query q for a user looking for only red dresses from open shops within 2000 m would look like this (we assume here that each item has only the two features 'color' and 'type'):

    q = ((distance ≤ 2000m) ∧ (time open = now + 30min)),
        {color_{red,blue,green}(1.0, 0, 0),
         type_{blouse,dress,trousers}(0, 1.0, 0)}                  (1)

  Our system uses two types of user feedback. One is critiquing the recommended items on their features (already provided in the baseline system, see section 3.1). The other is correcting mistakes regarding the user's preferences via explicit preference statements. Explanations are designed to be interactive, so that the user can state her actual preference over feature values after tapping on the explanation. If the user states interest in some feature values, a new value vector is initialized for the query, with all values of interest assigned equal weights summing to 1.0 and the remaining values assigned a weight of 0.0. That means the system will focus on the stated feature values, whereas the other values will be avoided. For example, if a user interacts with the explanation associated with the query presented in equation 1 and states that she is actually only interested in blue and green, the resulting new weight vector (assuming that we only distinguish between three colors) would look like the following, which will influence the search query and thus the new recommendations:

    feedback_{positive}(blue, green):
    color_{red,blue,green}(0.0, 0.5, 0.5)                          (2)

3.3    Generating Interactive Explanations
  The main vision behind interactive explanations is to use them not only as a booster for the transparency and understandability of the recommendation process but also as an enabler of user control. To explain the current state of the user model (which stores the user's preferences) and the reasoning behind recommendations, two types of explanations are defined: recommendation explanations and preference explanations.

3.3.1    Interactive Recommendation Explanations
  Recommendation explanations are interactive textual explanations. Their first aim is to justify why an item in the recommendation set is relevant for the user. Second, they let the user make direct changes to her inferred preferences. Their generation is based on the set of recommended items, the user model and the mobile context.

Argument Assessment.
  Argument assessment is used to determine the quality of every possible argument about an item. The argument assessment method is based on the method described in [1]. It uses Multi-Criteria Decision Making (MCDM) methods to assess items I on multiple decision dimensions D (e.g. features that an item can have) by means of utility functions. Dimensions in the context of this recommender system are features and contexts. The method described in [1] uses four scores, which lay a good foundation for the method in this work. However, their calculations have to be adapted to the underlying recommendation infrastructure to produce meaningful explanations.
  Local score LS_{I,D} measures the performance of a dimension without taking into account how much the user values that dimension. Our system uses feature value weight vectors to represent both item features and the features in a query, which represents the current preferences of the user. The local score of a feature is the scalar product of the weight vector for that feature in the query with the respective weight vector in the item's representation. It is formalized as below, where w_{I,D} denotes the feature value weight vector for item dimension D, w_{Q,D} the feature value weight vector for query dimension D, and n the number of feature values for that dimension:

    LS_{I,D} = Σ_{i=0}^{n-1} w_{I,D}(i) · w_{Q,D}(i)               (3)

  Explanation score ES_{I,D} describes the explaining performance of a dimension. The weight of each dimension is calculated dynamically using a function that dampens the effect of the number of feature values in each dimension. It is formalized as follows, where length_{w_D} denotes the number of feature values in a specific dimension D and length_{total attribute values} the total number of feature values over all dimensions. Using the square root produced good results, since it limits the effect of the number of feature values on the calculation of the weights:

    w_D = √(length_{w_D} / length_{total attribute values})        (4)

  With this dynamically calculated weight for a dimension, the explanation score of the dimension is calculated by multiplying it with the local score of that dimension:

    ES_{I,D} = LS_{I,D} · w_D                                      (5)

  Information score IS_D measures the amount of information provided by a dimension. The calculation of the information score suggested by [1] is preserved, as it already lays a good foundation for reasoning about whether explaining an item from a given dimension provides good value. It is defined as follows, where R denotes the range of explanation scores of that dimension over all recommended items and I the information that the dimension provides for an item:

    IS_D = (R + I) / 2                                             (6)

  The range R is calculated as the difference between the maximum and minimum explanation score of the given dimension over all recommended items, namely R = max(ES_{I,D}) − min(ES_{I,D}). The information I, however, is calculated quite differently from the strategy proposed by [1]. In their system, a dimension provides less and less information as the number of items explained from the same dimension increases. This does not apply to the context of the clothing recommender developed for this work: an item can still provide good information even if other items are explained from the same dimension, as long as they are explained from different feature values. For instance, it is still informative to explain an item from the color blue, although another item is also explained from the same dimension (color) but from a different value, say green. Therefore, I is calculated as a function of the size of the recommendation set (n) and the number of items in the set that have the same value for a dimension (h): I = (n − h) / (n − 1).
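Taken together, the scores above can be sketched as follows (a Python sketch under our own naming; the paper's actual implementation is an Android application and may differ):

```python
from math import sqrt

def local_score(item_vec, query_vec):
    """Equation 3: scalar product of the item and query weight vectors."""
    return sum(i * q for i, q in zip(item_vec, query_vec))

def dimension_weight(n_values_in_dim, n_values_total):
    """Equation 4: the square root dampens the effect of dimension size."""
    return sqrt(n_values_in_dim / n_values_total)

def explanation_score(item_vec, query_vec, n_values_total):
    """Equation 5: local score weighted by the dimension weight."""
    return local_score(item_vec, query_vec) * dimension_weight(len(item_vec), n_values_total)

def information(n_items, n_same_value):
    """I = (n - h) / (n - 1): high when few other items share the value."""
    return (n_items - n_same_value) / (n_items - 1)

def information_score(scores_for_dim, n_items, n_same_value):
    """Equation 6: mean of the range R and the information I."""
    r = max(scores_for_dim) - min(scores_for_dim)
    return (r + information(n_items, n_same_value)) / 2
```

For example, an item whose color vector exactly matches the query's (local score 1.0) in a dimension with 3 of 6 total feature values gets an explanation score of √0.5 ≈ 0.71 for that dimension.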
  Global score GS_I measures the overall quality of an item across all dimensions. It is the mean of the explanation scores of all of its dimensions. The following formula demonstrates how it is formalized, where n denotes the total number of dimensions and ES_{I,D_i} the explanation score of an item on the i-th dimension:

    GS_I = (Σ_{i=0}^{n-1} ES_{I,D_i}) / n                          (7)

  The methods defined above for calculating explanation and information scores are only valid for item features. Explanations should also include relevant context arguments. To support this, every context instance that is captured and used by the system in the computation of the recommendation set is assessed as well. The explanation score of a context dimension is calculated using domain knowledge: the most important context values get the highest explanation score, which decreases as the relevance of the context value decreases. For example, for the location context, the explanation score is inversely proportional to the distance between the current location of the user and the shop where the explained item is sold; the explanation score gets higher as the distance gets lower. The information score is calculated with the same formula defined earlier for features, IS_D = (R + I) / 2, but the information I changes slightly. As proposed earlier, it is calculated using the formula I = (n − h) / (n − 1), but in this case h stands for the number of items with a similar explanation score.

Argument Types.
  To generate explanations with convincing arguments, different argument aspects are defined following the guidelines for evaluative arguments described in [2]. Moreover, the types of arguments described in [1] are taken as a basis. First of all, arguments can be either positive or negative. While positive arguments are used to convince the user of the relevance of recommendations, negative arguments are computed so that the system can give an honest statement about the quality of the recommended item. The second aspect of an argument is the type of dimension it explains: feature or context. Lastly, arguments can be primary or supporting. Primary arguments alone are used to generate concise explanations; combinations of primary and supporting arguments are used to generate detailed explanations. We distinguish between five argument types: strong primary feature arguments, weak primary feature arguments, supporting feature arguments, context arguments and negative arguments.

Explanation Process.
  The explanation process is based on the approach described in [1], but it is adapted to use the previously defined argument types. Unlike the system in [1], explanations are designed to contain multiple positive arguments on features. Negative arguments are generated but only displayed when necessary, using a ramping strategy. Figure 1 shows the process to select arguments. It follows the framework for explanation generation described in [2], as the process is divided into the selection and organization of explanation content and its transformation into a human-readable form.

Figure 1: Generation of explanations.

  Content Selection. The argumentation strategy selects arguments for every item I separately. One or more primary arguments are selected first to help the user instantly recognize why the item is relevant. There are four alternative ways to select the primary arguments (alternatives 1 to 4 in figure 1). The first alternative is that the item is in the recommendation set because it was the last critique and was carried over (1). Another is that the system has enough strong arguments to explain the item (2). If there are no strong arguments, the strategy checks whether there are any weak arguments (3). If there are one or more weak arguments, the system also adds supporting arguments to make the explanation more convincing. Finally, if there are no weak arguments either, the system checks whether the item is a good average by comparing its global score GS_I to a threshold β (4). If so, similar to alternative (3), supporting arguments are added to increase the competence of the explanation. Otherwise, the strategy assumes that the recommended item is serendipitous and was added to the set to explore the user's preferences. Once one or more primary arguments are selected, the system checks whether there are any negative arguments and context arguments to add (5 and 6).
  Surface Generation. The result of the content selection is an abstract explanation, which needs to be resolved into something the user understands. This is done in the surface generation phase. Various explanation sentence templates are decorated with either feature values or context values (7 and 8). Explanation templates are sentences with placeholders for feature and context values, stored in XML format. The previously determined primary argument type is used to determine which type of explanation template to use.
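The content selection cascade described above can be sketched as follows (a simplified Python sketch; the item structure, argument labels and the default threshold β are our assumptions, not the paper's code):

```python
def select_primary_arguments(item, beta=0.5):
    """Pick primary arguments following alternatives 1-4 of figure 1.
    `item` is assumed to carry precomputed argument lists and scores."""
    if item.get("was_last_critique"):                    # alternative (1)
        return ["last-critique"]
    if item.get("strong_arguments"):                     # alternative (2)
        return list(item["strong_arguments"])
    if item.get("weak_arguments"):                       # alternative (3)
        # weak arguments alone are not convincing, so add supporting ones
        return list(item["weak_arguments"]) + list(item.get("supporting_arguments", []))
    if item.get("global_score", 0.0) >= beta:            # alternative (4)
        return ["good-average"] + list(item.get("supporting_arguments", []))
    return ["serendipitous"]                             # exploration fallback

def build_explanation(item, beta=0.5):
    """After the primary arguments, append negative and context
    arguments when present (steps 5 and 6 in figure 1)."""
    args = select_primary_arguments(item, beta)
    args += item.get("negative_arguments", [])
    args += item.get("context_arguments", [])
    return args
```

The cascade mirrors the ramping idea: the most convincing available argument type wins, and weaker alternatives are padded with supporting arguments.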
Table 1: Text templates for recommendation explanations.               Table 2: Text templates for preference explanations.

 Text template         Example phrase                                Text template       Example phrase
 Strong argument       “Mainly because you currently like X.”        Only some val-      “You are currently interested only in
 Weak argument         “Partially as you are currently inter-        ues                 X, Y [...].” The word “only” in the
                       ested in X.”                                                      text is emphasized in bold.
 Supporting argu-      “Also, slightly because of your current       Avoiding    some    “You are currently avoiding X, Y
 ment                  interest in X.”                               values              [...].” The word “avoiding” is empha-
 Location context      “And it is just Y meters away from                                sized in bold.
                       you.”                                         Preferably some     “It seems, you currently prefer X, Y
 Average item          “An average item but might be inter-          values              [...].”
                       esting for you.”                              Indifferent  to     “You are currently indifferent to X
 Last critique         “Kept so that you can keep track of           feature             feature”.
                       your critiques.”
 Serendipity           “This might help us discovering your
                       preferences.” or “A serendipitous item
                       that you perhaps like.”
 Negative      argu-   “However, it has the following fea-
 ment                  ture(s) you don’t like: X, Y [...].”


is used to determine which type of explanation template to
use. Feature values in the generated textual output are then
highlighted and their interaction endpoints are defined (9).
The resulting output is a textual explanation, highlighted
in the parts where feature values are mentioned. They are
interactive such that, after the user taps on the highlighted
areas, she can specify what she exactly wants.
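To illustrate the decoration step, the following Python sketch fills the template for a previously determined primary argument type and records the character offsets that become the tappable highlights. The template strings come from table 1; all identifiers are hypothetical, since the actual system is an Android/Java app with templates stored in XML.

```python
# Hypothetical sketch of steps (7)-(9): decorate a sentence template and
# record the interaction endpoints. The real system stores its templates
# in XML on Android/Java; the names below are invented.
TEMPLATES = {
    "strong argument": "Mainly because you currently like {feature}.",
    "weak argument": "Partially as you are currently interested in {feature}.",
    "location context": "And it is just {distance} meters away from you.",
}

def render_explanation(primary_argument: str, values: dict) -> tuple[str, list[tuple[int, int]]]:
    """Select the template for the primary argument type, fill in its
    placeholders, and return the sentence plus the (start, end) offsets
    of the inserted values, which the UI highlights and makes tappable."""
    sentence = TEMPLATES[primary_argument].format(**values)
    spans = []
    for value in values.values():
        start = sentence.index(str(value))
        spans.append((start, start + len(str(value))))
    return sentence, spans
```

For example, `render_explanation("strong argument", {"feature": "t-shirt"})` produces the sentence “Mainly because you currently like t-shirt.” together with one span covering “t-shirt”, which the interface can render as a highlighted, tappable area.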

3.3.2    Interactive Preference Explanations
   Preference explanations have two main goals. First, they aim to let the user inspect the current state of the system’s understanding of the user’s preferences. Second, they intend to let the user make direct changes to those preferences. Two main types of preference explanations are defined: interactive textual explanations and interactive visual explanations.

Figure 2: Recommendation list (a) and explicit preference feedback screen (b).

Generating Textual Preference Explanations.
   The only input to the textual preference explanation generation algorithm is the user model. For each dimension D, the algorithm can generate interactive explanations. Dimensions are features that an item can have. The algorithm distinguishes between four feature value weight vectors, indicating different user preferences: First, the user is indifferent to any feature value. Second, the user is only interested in a set of feature values. Third, the user is avoiding a set of feature values. And fourth, the user prefers a set of feature values over others.

Generating Visual Preference Explanations.
   Visual preference explanations are also generated from the user model, more specifically by making use of the array of feature value weight vectors, which represents the user’s current preferences. For each feature, there is a feature value weight vector indicating the user’s priorities among the feature values. All those weights lie between 0.0 and 1.0 and sum up to 1.0. They can be scaled to percentages to generate charts showing the distribution of interest over the feature values.
   In order to generate charts, it is also necessary to determine the color and description with which a feature value is represented in a chart. To support that, a feature value appearing in the chart is modeled with its weight (scaled to a percentage), color and description in the user interface. Figure 5 illustrates this chart representation.

3.3.3    Using Text Templates Supporting Variation
   XML templates are used to generate explanation sentences in English for the different user preference types. Those templates contain placeholders for feature and context values, which are replaced during the explanation generation process. For recommendation explanations, there are a few sentence variations for almost every type of argument. See table 1 for examples of the different text templates for recommendation explanations. These templates can be combined with each other; for example, supporting arguments can support a weak argument. In such cases, the argument sentences are connected using conjunctions.
   A similar mechanism is used for the preference explanations. However, to keep it simple, no variation is provided, as the number of features to explain is already limited. See table 2 for selected examples of the text templates for preference explanations.

3.4     Interaction and Interface Design
   The first issue was to clarify how to integrate the interaction process with textual explanations. It was envisioned to give the user the opportunity to tap on the highlighted areas of the explanation text to state her actual preferences on a feature.
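When the user states her preferences this way, the system has to re-derive which of the four preference cases described above applies and how to scale the weights for the charts. A possible Python sketch follows; the decision rules and thresholds are assumptions, as the paper does not spell them out:

```python
# Hypothetical sketch: map a feature value weight vector (weights in
# [0.0, 1.0], summing to 1.0) to one of the four preference cases, and
# scale weights to percentages for the chart. Thresholds are assumptions.
def classify_preference(weights: dict[str, float]) -> tuple[str, list[str]]:
    """Return the preference case plus the feature values to mention
    in the corresponding Table 2 explanation sentence."""
    n = len(weights)
    if all(abs(w - 1.0 / n) < 1e-9 for w in weights.values()):
        return "indifferent", []                  # uniform weights
    zero = [v for v, w in weights.items() if w < 1e-9]
    nonzero = [v for v, w in weights.items() if w >= 1e-9]
    if zero and len(nonzero) <= len(zero):
        return "only", nonzero                    # interested only in a few values
    if zero:
        return "avoiding", zero                   # avoiding a few values
    return "preferably", [v for v, w in weights.items() if w > 1.0 / n]

def to_chart(weights: dict[str, float]) -> dict[str, int]:
    """Scale the weights to whole percentages for the detail chart."""
    return {value: round(w * 100) for value, w in weights.items()}
```

For instance, a color weight vector of 0.5 blue, 0.5 red and 0.0 for everything else would be classified as the “only some values” case and rendered as a chart with two 50% segments.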
This leads to a two-step process. First, the user sees an item with an explanation including highlighted words (highlighted words are always associated with a feature, see figure 2a) and taps on one of them (e.g. in figure 2b, “t-shirt” was tapped). Then the system directs the user to the screen where she can make changes. In this second step, she specifies which feature values she is currently interested in. Lastly, the system updates the list of recommendations, which completes a recommendation cycle. Note that the critiquing process and associated screens from the project Shopr, which is taken as a basis (see section 3.1), are kept in the developed system. Eventually, the interaction is a hybrid of critiquing and explicitly stating current preferences. On top of each explicit feedback screen, a text description of what is expected from the user is given.
   Due to the applied ramping strategy mentioned in section 3.3.1, extra arguments in explanations that are not important are not shown in the list of recommendations but on the screen where item details are presented. Tapping on an item picture accesses that screen. Here, the user can also browse through several pictures of an item by swiping the current picture from right to left (see figure 3b). In order to make it obvious for the user, sentences with positive arguments always start with a green “+” sign. Sentences with negative arguments, on the other hand, always start with a red “-” sign (see figure 3).

Figure 3: Detailed information screens of items.

   The next issue was to implement preference explanations, which we call the Mindmap feature. The Mindmap feature is the way the system explains its mental map of the user’s preferences. The overview screen for the mindmap was designed to quickly show the system’s assumptions about the user’s current preferences. To keep it simple yet usable, only textual explanations are used for each feature (see figure 4b). In order to make it easy for the user to notice what is important, the feature values used in the explanation text are highlighted. Moreover, every element representing a feature is made interactive. This lets the user access the explicit feedback screen to provide her actual preferences.
   The user should also be able to get more detailed visual information for all the features. In order to achieve that, a different “drill down” screen was developed for each feature as part of the Mindmap feature. Figure 5 shows the mindmap detail screens for the clothing color feature. The user’s preferences on feature values are represented as a chart, with every feature value displayed in a different color. One of the most important properties is that the highlighted parts of the explanation texts and the charts are interactive as well, which lets the user access the explicit feedback screen to provide her actual preferences.
   The full source code and resources for the Android app and the algorithm are available online¹.

¹ https://github.com/adiguzel/Shopr

4.     USER STUDY
   The three main goals of the evaluation are: first, to find out whether transparency and user control can be improved by feature-based personalized explanations supported by scrutable interfaces in recommender systems; second, to find out whether side goals such as higher satisfaction are achieved; and lastly, to see whether other important system goals such as efficiency are not harmed.

4.1     Setup
   The test hardware is a 4.3 inch, 480 x 800 resolution Android smartphone (Samsung Galaxy S2) running the Jelly Bean version of the Android operating system (4.1.2).
   Two variants of the system are put to the test. In order to exclude the effects of different recommender algorithms, both variants use the same recommendation algorithm, which uses diversity-based Active Learning [6]. Moreover, the critiquing and item details interfaces are exactly the same. The difference lies in the explanations: The EXP variant refers to the proposed system, described in the previous section. In order to test the value of the developed explanations and scrutinization tools, a baseline (BASE variant) to compare against is needed (see subsection 3.1). The study is designed as within-subject to keep the number of testers at a reasonable size. Thus one group of people tests both variants. Which system was tested first was alternated between subjects so that a bias due to learning effects could be reduced.
   In order to create a realistic setup, it is necessary to generate a data set that represents real-world items. For that purpose, we developed a data set creation tool as an open-source project². The tool crawls clothing items from a well-known online clothing retailer website. To keep the amount of work reasonable, items were associated with an id, one of 19 types of clothing, one of 18 colors, one of 5 brands, the price (in Euro), the gender (male, female or unisex) and a list of image links for the item. The resulting set is 2318 items large, with 1141 items for the male and 1177 for the female gender.

² https://github.com/adiguzel/pickpocket

   For the study, participants of various ages, educational backgrounds and current professions were sought. Overall 30 people participated, of whom 33% were female and 67% were male.
   The actual testing procedure used in the evaluation was structured as follows: We first asked the participants to provide background information about themselves, such as demographic information and their knowledge about mobile systems and recommender systems. Next, the idea of the system was introduced and the purpose of the user study was made clear.
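The item schema produced by the data set creation tool described in the setup above can be sketched as a simple record. The field names below are hypothetical; only the attribute set (id, clothing type, color, brand, price in Euro, gender, image links) comes from the paper:

```python
# Sketch of a crawled clothing item; field names are invented, the
# attributes mirror the data set described in the setup.
from dataclasses import dataclass, field

@dataclass
class ClothingItem:
    item_id: int
    clothing_type: str              # one of 19 types, e.g. "t-shirt"
    color: str                      # one of 18 colors
    brand: str                      # one of 5 brands
    price: float                    # in Euro
    gender: str                     # "male", "female" or "unisex"
    image_urls: list[str] = field(default_factory=list)
```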
We chose a realistic scenario instead of asking users to find an item they could like:
   Task: Imagine you want to buy yourself new clothes for an event on a summer evening. You believe that the following types of clothes would be appropriate for this event: shirt, t-shirt, polo shirt, dress, blouse or top. As per color, you consider shades of blue, green, white, black and red. You have a budget of up to €100. You use the Shopr app to look for a product you might want to purchase.

Figure 4: Navigation Drawer (a) and Overview (b).

   After introducing them to the task, users were given hands-on time to familiarize themselves with the user interface and grasp how the app works. After selecting and confirming the choice for a product, the task was completed. Then testers were asked to rate statements about transparency, user control, efficiency and satisfaction based on their experience with the system on a five-point Likert scale (from 1, strongly disagree, to 5, strongly agree) and to offer any general feedback and observations. After having tested both variants, participants stated which variant they preferred and why that was the case.

4.2   Results
   The testing framework applied in the user study covers a subset of the aspects relevant for critiquing recommenders and for explanations in critiquing recommenders. It follows the user-centric approach presented in [7]. The measured data is divided into four areas: transparency, user control, efficiency and satisfaction.
   The means of the measured values for the most important metrics of the two systems, BASE denoting the variant using only simple non-interactive explanations, EXP the version with interactive explanations, are shown in table 3. Next to the mean, the standard deviation is shown, the last column denoting the p-value of a one-tail paired t-test with 29 degrees of freedom (30 participants - 1).
   In order to measure actual understanding after using a variant, users were asked to describe how the underlying recommendation system of that variant works. In general, almost all of the participants could explain for both recommenders that the system builds a model of the user’s preferences in each cycle and uses it to generate recommendations that can be interesting for the user.

Figure 5: Mindmap detail screens for color.

   On average, when asked if a tester understands the system’s reasoning behind its recommendations, EXP performs better than BASE (a mean of 4.63 compared to 4.3 on a 1-5 Likert scale). Further analysis suggests that the variant with interactive explanations (EXP) is perceived as significantly more transparent than the variant with baseline explanations (one-tail t-test, p<0.05 with p=0.018).
   Users were asked about the ease of telling the system what they want in order to measure the overall user control they perceived. The average rating of participants was better with EXP (4.33 versus 3.23). In a further analysis, EXP proved significantly better in terms of perceived overall control than BASE (one-tail t-test, p<0.05 with p=0.0003).
   When asked about the ease of correcting system mistakes, EXP performs considerably better than BASE (a mean of 4.36 compared to 3 on a 1-5 Likert scale). Further analysis reveals that EXP is significantly better in terms of perceived scrutability than BASE (one-tail t-test, p<0.05 with p=6.08E-06).
   Participants completed their task on average in one cycle less using EXP than BASE (6.5 with EXP, 7.46 with BASE). However, a one-tail t-test shows that EXP is not significantly better than BASE (p>0.05 with p=0.14).
   The next part of measuring objective effort is done by tracking the time it took each participant from seeing the initial set of recommendations until the target item was selected. On average, BASE seems to be better with a mean session length of 160 seconds against 165 seconds. However, it was found not to be significantly more time efficient (one-tail t-test, p>0.05 with p=0.39). One reason for this could be that although EXP gives its users tools to update preferences over several features quickly, it has more detailed explanations. Thus, users spent more time reading.
   Users were asked about the ease of finding information and the effort required to use the system in order to get an idea about the system’s efficiency. The participants’ average rating was better with EXP with 4.33 against 3.43 with BASE. Further analysis revealed that users perceived EXP as significantly more efficient than BASE (one-tail t-test, p<0.05 with p=0.0003).
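The significance figures above and below come from one-tail paired t-tests over the 30 participants’ paired ratings (29 degrees of freedom). A minimal sketch of the test statistic, using illustrative numbers rather than the study’s data:

```python
# Paired t-test statistic: mean of per-participant differences divided
# by its standard error, with n - 1 degrees of freedom. The data in the
# test below is illustrative, not the study's.
import math

def paired_t_statistic(base: list[float], exp: list[float]) -> float:
    diffs = [e - b for b, e in zip(base, exp)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

For a one-tailed test at p<0.05 with 29 degrees of freedom, the result is significant when the statistic exceeds the critical value of roughly 1.699; the p-values reported in the text correspond to the tail probability of the t-distribution at the computed statistic.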
Table 3: The means of some important measured values comparing both variants of the system.

                              BASE mean  stdev   EXP mean  stdev   p value
  Perceived transparency         4.3     0.70      4.63    0.49    0.018
  Perceived overall control      3.23    1.04      4.33    0.71    0.0003
  Scrutability                   3       1.31      4.36    0.85    6.08E-06
  Cycles                         7.46    3.64      6.5     3.28    0.14
  Time consumption               160 s   74        165 s   83      0.39
  Perceived efficiency           3.43    1.13      4.33    0.75    0.0003
  Satisfaction                   3.76    0.85      4.43    0.56    0.0004

   When asked how satisfied participants were with the system overall, EXP performs better with 4.43 against 3.76. A one-tail t-test suggests that this is a significant result (p<0.05 with p=0.0004).
   Finally, participants were asked to pick a favorite from the two evaluated variants. 90% preferred the variant with interactive explanations (EXP) over the variant with simple non-interactive explanations (BASE), mostly because of the increased perception of control over the recommendations.

5.   CONCLUSION AND FUTURE WORK
   This work investigated the development and impact of a concept featuring interactive explanations for Active Learning critique-based mobile recommender systems in the fashion domain. The developed concept proposes the generation of explanations to make the system more transparent while also using them as an enabler for user control in the recommendation process. Furthermore, the concept defines the user feedback as a hybrid of critiquing and explicit statements of current interests. A method is developed to generate explanations based on a content-based recommendation approach. The explanations are always made interactive to give the user a chance to correct possible system mistakes. In order to measure the applicability of the concept, a mobile Android app using the proposed concept and the explanation generation algorithm was developed. Several aspects regarding the display and interaction design of explanations in mobile recommender systems are discussed, and solutions to the problems faced during the development process are summarized. The prototype was evaluated in a study with 30 real users. The proposed concept performed significantly better than the approach with simple non-interactive explanations in terms of our main goals of increasing transparency and scrutability and our side goals of increasing perceived efficiency and satisfaction. Overall, the developed interactive explanations approach demonstrated the users’ appreciation of transparency and control over the recommendation process in a conversation-based Active Learning mobile recommender system tailored to a modern smartphone platform. Some changes, such as increasing the number of recommendations, skipping to the next list of recommendations without critiquing and having more item attributes for critiquing, could make the application even more appealing.
   Future development may also include the creation of more complex recommendation scenarios to test the capability of the proposed concept even further. One can add more item features to critique and also take the user’s mobile context (e.g. mood and seasonal conditions) into account during the recommendation process. Furthermore, future research might study the generation of interactive explanations for systems with rather complex recommendation algorithms. Interactive explanations might make adjustable parts of the algorithm transparent and allow the user to change them.

6.   REFERENCES
 [1] R. Bader, W. Woerndl, A. Karitnig, and G. Leitner. Designing an explanation interface for proactive recommendations in automotive scenarios. In Proceedings of the 19th International Conference on Advances in User Modeling, UMAP’11, pages 92–104, Berlin, Heidelberg, 2012. Springer-Verlag.
 [2] G. Carenini and J. D. Moore. Generating and evaluating evaluative arguments. Artif. Intell., 170(11):925–952, Aug. 2006.
 [3] H. Cramer, V. Evers, S. Ramlal, M. Someren, L. Rutledge, N. Stash, L. Aroyo, and B. Wielinga. The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction, 18(5):455–496, Nov. 2008.
 [4] M. Czarkowski. A Scrutable Adaptive Hypertext. PhD thesis, University of Sydney, 2006.
 [5] B. P. Knijnenburg, S. Bostandjiev, J. O’Donovan, and A. Kobsa. Inspectability and control in social recommenders. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ’12, pages 43–50, New York, NY, USA, 2012. ACM.
 [6] B. Lamche, U. Trottman, and W. Wörndl. Active learning strategies for exploratory mobile recommender systems. In Proceedings of the CaRR Workshop, 36th European Conference on Information Retrieval, Amsterdam, Netherlands, Apr. 2014.
 [7] P. Pu, L. Chen, and R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pages 157–164, New York, NY, USA, 2011. ACM.
 [8] N. Tintarev and J. Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):399–439, Oct. 2012.
 [9] J. Vig, S. Sen, and J. Riedl. Tagsplanations: Explaining recommendations using tags. In Proceedings of the 14th International Conference on Intelligent User Interfaces, IUI ’09, pages 47–56, New York, NY, USA, 2009. ACM.
[10] R. Wasinger, J. Wallbank, L. Pizzato, J. Kay, B. Kummerfeld, M. Böhmer, and A. Krüger. Scrutable user models and personalised item recommendation in mobile lifestyle applications. In User Modeling, Adaptation, and Personalization, volume 7899 of Lecture Notes in Computer Science, pages 77–88. Springer Berlin Heidelberg, 2013.