Designing Explanation Interfaces for Transparency and Beyond

Chun-Hua Tsai, University of Pittsburgh, Pittsburgh, USA, cht77@pitt.edu
Peter Brusilovsky, University of Pittsburgh, Pittsburgh, USA, peterb@pitt.edu

ABSTRACT
In this work-in-progress paper, we present a participatory process of designing explanation interfaces for a social recommender system with multiple explanatory goals. We went through four stages to identify the key components of the recommendation models: the expert mental model, the user mental model, and the target mental model. We report the results of an online survey of current system users (N=14) and a controlled user study with a group of target users (N=15). Based on the findings, we proposed five sets of explanation interfaces for the five recommendation models (25 interfaces in total) and discuss the users' preferences among the interface prototypes.

CCS CONCEPTS
• Information systems → Recommender systems; • Human-centered computing → HCI design and evaluation methods.

KEYWORDS
Social Recommendation; Explanation; Mental Model; User Interface

ACM Reference Format:
Chun-Hua Tsai and Peter Brusilovsky. 2019. Designing Explanation Interfaces for Transparency and Beyond. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 11 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
Enhancing explainability in recommender systems has drawn increasing attention in the field of Human-Computer Interaction (HCI). Furthermore, the newly initiated European Union General Data Protection Regulation (GDPR) requires the owner of any data-driven application to maintain a "right to explanation" of algorithmic decisions [1], which urges transparency in all existing intelligent systems. Self-explainable recommender systems have been shown to improve user perception of system transparency [17], trust [13], and acceptance of system suggestions [7]. Instead of offline performance improvements, a growing body of research focuses on evaluating the system from the user-experience perspective, i.e., what is the user's perception of the explanation interfaces?

Explaining recommendations (i.e., enhancing system explainability) can serve different explanatory goals, such as helping users make better decisions or persuading them to accept the suggestions from a system [14, 16]. We follow the seven explanatory goals proposed by Tintarev and Masthoff [17]: Transparency, Scrutability, Trust, Persuasiveness, Effectiveness, Efficiency, and Satisfaction. Since it is hard for a single explanation interface to achieve all these goals equally well, the designer needs to make a trade-off while choosing or designing the form of the interface [17]. For instance, an interactive interface can be adapted to increase user trust and satisfaction, but it may prolong the decision and exploration process while using the system (i.e., decrease efficiency) [19].

Over the past few years, several approaches have been proposed to enhance explainability in recommender systems. These approaches can be summarized by their styles, reasoning models, paradigms, and information [2]. 1) Styles: Kouki et al. [8] conducted an online user survey to explore user preferences over nine explanation styles. They found Venn diagrams outperformed all other visual and text-based interfaces. 2) Reasoning Models: Vig et al. [24] used tags to explain the recommended item and the user's profile. The approach emphasized why a specific recommendation is plausible, instead of revealing the recommendation process or data. 3) Paradigms: Herlocker et al. [5] presented a model for explanations based on the user's conceptual model of the collaborative recommendation process. The result of the evaluation indicates that two interfaces - "Histogram with grouping" and "Presenting past performance" - improved the acceptance of recommendations. 4) Information: Pu and Chen [13] proposed explanations tailored to the user and the recommendation; i.e., although a recommendation may not be the most popular one, the explanation justifies it by providing the reasons behind it.

Although many approaches have been proposed to enhance recommender explainability, bringing explanation interfaces to an existing recommender system is still a challenging task. More recently, Eiband et al. [1] suggested a different approach to improve the user mental model (UMM) while bringing transparency (explanations) to a recommender system. The model describes the process by which a user builds an internal conceptualization of the system or interface along with user-system interactions, i.e., builds the knowledge of how to interact with the system. If this model is misguided or opaque, users will face difficulties in predicting or interpreting the system [1]. Hence, the researchers suggested improving the mental model so that users can gain awareness while using the system as well as the explanation interfaces.

In this work-in-progress paper, we present a stage-based participatory process [1] for integrating seven explanatory goals into a real-world hybrid social recommender system. First, we introduce the Expert Mental Model to summarize the key components of each recommendation feature. Second, we conducted an online survey to identify the User Mental Model of the seven explanatory goals from the current system users. Third, we conducted a user study with card-sorting and semi-structured interviews to determine the users' Target Mental Model. Fourth, we proposed a total of 25 explanation interfaces for the five recommendation features and compared user perceptions across the designs.

Figure 1: Relevance Tuner+: (A) relevance sliders; (B) stackable score bar; (C) explanation icon; (D) user profiles. The interface supports user-driven exploration of recommended items in Section A and inspection of the fusion in Section B. The user can further inspect the explanation model by clicking Section C, and more profile detail is presented in Section D. Our goal is to provide an explanation interface for each explanation model. (The scholar names have been pixelated for privacy protection.)

2 BACKGROUND
We adopted the stage-based participatory framework from Eiband et al. [1], which intends to answer two key questions while designing an explainable user interface (UI): a) What to explain? and b) How to explain? The process can be summarized in four stages. 1) Expert Mental Model: What can be explained? We defined an expert as the recommender system developer. 2) User Mental Model: What is the user's mental model of the system based on its current UI? The model should be built through the current recommender system users. 3) Target Mental Model: Which key components of the algorithm do users want to be made explainable in the UI? The target users are users who are new to the system. 4) Iterative Prototyping: How can the target mental model be reached through UI design? The key is to measure whether the proposed explanation interfaces achieve the explanatory goals.

In this work, we aimed to enhance explainability in a conference support system - Conference Navigator 3 (CN3). The system has been used to support more than 45 conferences at the time of writing this paper and has data on approximately 7,045 articles presented at these conferences; 13,055 authors; 7,407 attendees; 32,461 bookmarks; and 1,565 social connections. Our work was informed by the results of a controlled user study where we explored an earlier version of the social recommender interface, Relevance Tuner [19] (shown in Figure 1). It was a controllable interface that lets the user fuse the weightings of multiple recommendation models and inspect the explanations.

A total of five recommendation models were introduced in this study: 1) Publication Similarity: the degree of cosine similarity of the users' publication text. 2) Topic Similarity: the overlap of research interests (using topic modeling). 3) Co-Authorship Similarity: the degree of connection, based on a shared network of co-authors. 4) Interest Similarity: the number of papers co-bookmarked, as well as the authors co-followed. 5) Geographic Distance: a measurement of the geographic distance between affiliations. Based on the stage-based participatory framework, we went through the same four stages for each recommendation model to identify the user-preferred user interface design. We aimed to design explanation interfaces for each recommendation model with multiple explanatory goals.

3 FIRST STAGE: EXPERT MENTAL MODEL
Instead of an interactive recommender [7, 23], we attached an explanation icon next to each social recommendation. The users have the choice of requesting the explanations while exploring or browsing the recommendations. We adopted a hybrid explanation approach [8, 12], which mixes multiple visualizations to explain the details of the recommendation model. We would like to let the users understand both a) the mutual relationship (similarity) between themselves and the recommended scholar and b) the key components of each recommendation model. We then discuss the Expert Mental Model through the system development process of the five recommendation models.

1) Publication Similarity: The similarity was determined by the degree of text similarity between two scholars' publications using cosine similarity. We applied tf-idf to create the vectors, with a word frequency upper bound of 0.5 and a lower bound of 0.01 to eliminate both common and rarely used words. In this model, the key components were the terms of the paper titles and abstracts as well as their term frequencies.

2) Topic Similarity: This similarity was determined by matching research interests using topic modeling. We used latent Dirichlet allocation (LDA) to attribute the terms collected from publications to one of the topics. We chose 30 topics to build the topic model for all scholars. Based on the model, we then calculated the topic similarity between any two scholars. The key components were the research topics and the topical words of each research topic [25].
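As an illustration, the publication-similarity computation described above can be sketched as follows. This is a minimal, self-contained sketch rather than the CN3 implementation: the toy corpus, the pure-Python tf-idf, and the interpretation of the 0.5/0.01 bounds as document-frequency cut-offs are all illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs, max_df=0.5, min_df=0.01):
    """Build tf-idf vectors, keeping only terms whose document frequency
    (as a fraction of the corpus) lies within [min_df, max_df]."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = {t for t, c in df.items() if min_df <= c / n <= max_df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(t for t in tokens if t in vocab)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus: one "publication" per scholar.
docs = [
    "social recommender explanation interface design",
    "recommender explanation user study",
    "quantum chemistry simulation methods",
    "medieval poetry digital archives",
]
vectors = tfidf_vectors(docs)
publication_similarity = cosine(vectors[0], vectors[1])
```

The first two scholars share the terms "recommender" and "explanation," so their similarity is positive, while scholars with disjoint vocabularies score zero; a production system would instead vectorize full titles and abstracts.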
3) Co-Authorship Similarity: This similarity approximated the network distance between the source and recommended users. For each pair of scholars, we tried to find six possible paths connecting them, based on their co-authorship relationships. The network distance was determined by the average distance of the six paths. The key components were the coauthors (as nodes), the co-authorship relations (as edges), and the distance of the connection between the two scholars.

4) CN3 Interest Similarity: This similarity was determined by the number of co-bookmarked conference papers and co-connected authors in the experimental social system (CN3). We simply used the number of shared items as the CN3 interest similarity. The key components are the shared conference papers and authors.

5) Geographic Distance: This similarity was a measurement of the geographic distance between attendees. We retrieved longitude and latitude data based on the attendees' affiliation information. We used the Haversine formula to compute the geographic distance between scholars. The key components are the geographic distance and the affiliation information of the scholars.
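The Haversine computation used for the Geographic Distance model can be sketched as follows. This is the standard textbook formulation, not CN3's code, and the example affiliation coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (latitude, longitude) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))

# Hypothetical affiliation coordinates: Pittsburgh and Los Angeles.
distance = haversine_km(40.4406, -79.9959, 34.0522, -118.2437)
```

One degree of longitude along the equator comes out to roughly 111 km, which is a convenient sanity check for any Haversine implementation.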
4 SECOND STAGE: USER MENTAL MODEL
As a first step towards understanding the design factors of explanatory interfaces, we deployed a survey through a social recommender system, Conference Navigator [18], and analyzed the data from the respondents. We targeted users who had created an account and interacted with the system during a previous conference attendance (i.e., had used the system for at least one conference). The survey was initiated by sending an invitation to the qualified users in December 2017. We sent out 89 letters to the conference attendees of UMAP/HT 2016, and a total of 14 participants (7 female) replied to create the pool of participants for the user study. The participants were from 13 different countries; their ages ranged from 20 to 40 (M=31.36, SE=5.04). We ran an online survey to collect the necessary demographic information and self-reflections on how to design an explanation function for the seven explanatory goals [17].

The proposed questions were: How can an explanation function help you to perceive system 1) Transparency - explain how the system works? 2) Scrutability - allow you to tell the system it is wrong? 3) Trust - increase your confidence in the system? 4) Persuasiveness - convince you to explore or to follow new friends? 5) Effectiveness - help you make good decisions? 6) Efficiency - help you to make decisions faster? 7) Satisfaction - make using the system fun and useful? We asked the participants to answer each question in 50-100 words, in particular reflecting on the explanatory goals of the social recommendation. The data was published in [20].

1) Transparency: 71% of respondents pointed out reasons why the generated social recommendations helped them to perceive higher system transparency, i.e., the personalized explanation, the linkage and data sources, the reasoning method, and understandability. We then summarized the feedback into five factors: 1) The visualization presents the similarity between my interest and the recommended person. 2) The visualization presents the relationship between the recommended person and me. 3) The visualization presents where the data were retrieved. 4) The visualization presents more in-depth information on how the score adds up. 5) The visualization allows me to see the connections between people and understand how they are connected.

2) Scrutability: Half of the respondents mentioned they needed "inspectable details" to figure out a wrong recommendation. 35% of respondents suggested a mechanism for accepting user feedback on improving wrong recommendations, such as a space to submit user ratings or yes/no options. 14% of respondents preferred a dynamic exploration process to determine the recommendation quality. We then summarized the feedback into four factors: 6) The visualization allows me to understand whether the recommendation is good or not. 7) The visualization presents the data for making the recommendations. 8) The visualization allows me to compare and decide whether the system is correct or wrong. 9) The visualization allows me to explore and then determine the recommendation quality.

3) Trust: 28% of respondents mentioned that they trusted the system more when they perceived the benefits of using the system. 35% of respondents preferred to trust a system with reliable and informative explanations, with more detailed or understandable information. 35% of respondents mentioned they trust a system that is transparent or that passed their verification. We then summarized the feedback into three factors: 10) The visualization presents a convincing explanation to justify the recommendation. 11) The visualization presents the components (e.g., algorithm) that influenced the recommendation. 5) The visualization allows me to see the connections between people and understand how they are connected.

4) Persuasiveness: Half of the respondents mentioned that an explanation of social familiarity would persuade them to explore novel social connections; namely, when shown social context details or shared interests. 21% of respondents indicated that an informative interface could boost the exploration of new friendships. 28% of respondents preferred a design that inspired curiosity, e.g., implicit relationships. We then summarized the feedback into three factors: 12) The visualization shows me the shared interests, i.e., why my interests are aligned with the recommended person. 13) The visualization has a friendly, easy-to-use interface. 14) The visualization inspired my curiosity (to discover more information).

5) Effectiveness: 64% of respondents mentioned that aspects of social recommendation relevance helped them to make a good decision. These aspects included explaining the recommendation process and being understandable or more informative. 28% of respondents suggested that a reminder of a historical or successful decision could help them to make a good decision, i.e., a previously made user decision and success stories. We then summarized the feedback into three factors: 15) The visualization presents the recommendation process. 5) The visualization allows me to see the connections between people and understand how they are connected. 11) The visualization presents the components (e.g., algorithm) that influenced the recommendation.

6) Efficiency: 28% of respondents mentioned that proper highlighting of the recommendation helped them to make the decision faster, for example, emphasizing the relatedness, identifying the top recommendations, or providing success stories. 28% of respondents preferred a tunable or visualized interface to accelerate the decision process, such as tuning the recommendation features or visualizing the recommendations. However, the explanations may not always be useful: 21% of respondents argued that the explanation would prolong the decision process instead of speeding it up, since the user may need to take extra time to examine the explanations. We then summarized the feedback into two factors: 16) The visualization presents highlighted items/information that is strongly related to me. 17) The visualization presents aggregated, non-obvious relations to me.

7) Satisfaction: The feedback on how an explanation can make the user satisfied with the system varied. Three aspects received an equal 7% of the respondents' preferences: users preferred to view feedback from the community, to be shown their historical interaction record, and to be provided with a personalized explanation. Two aspects received an equal 14% of the respondents' preference, i.e., a focus on a friendly user interface and saved decision time. 21% of respondents reported higher satisfaction from using the explanation as a "small talk topic", i.e., as an initial conversation at a conference. 28% of respondents preferred an interactive interface for perceiving the system to be fun, e.g., a controllable interface. We then summarized the feedback into four factors: 18) The visualization presents the feedback from other users, i.e., I can see how others rated the recommended person. 19) The visualization allows me to tell why this system recommends the person to me. 1) The visualization presents the similarity between my interest and the recommended person. 13) The visualization is a friendly, easy-to-use interface.

Based on the results of the online survey, we concluded a total of 19 factors in the second stage of building the user mental model.
(1) The visualization presents the similarity between my interest and the recommended person.
(2) The visualization presents the relationship between the recommended person and me.
(3) The visualization presents where the data was retrieved.
(4) The visualization presents more in-depth information on how the scores sum up.
(5) The visualization allows me to see the connections between people and understand how they are connected.
(6) The visualization allows me to understand whether the recommendation is good or not.
(7) The visualization presents the data for making the recommendations.
(8) The visualization allows me to compare and decide whether the system is correct or wrong.
(9) The visualization allows me to explore and then determine the recommendation quality.
(10) The visualization presents a convincing explanation to justify the recommendation.
(11) The visualization presents the components (e.g., algorithm) that influenced the recommendation.
(12) The visualization shows me the shared interests, i.e., why my interests are aligned with the recommended person.
(13) The visualization has a friendly, easy-to-use interface.
(14) The visualization inspired my curiosity (to discover more information).
(15) The visualization presents the recommendation process clearly.
(16) The visualization presents highlighted items/information that is strongly related to me.
(17) The visualization presents aggregated, non-obvious relations to me.
(18) The visualization presents feedback from other users, i.e., I can see how others rated a recommended person.
(19) The visualization allows me to tell why this system recommends the person to me.

We also found some factors shared across different explanatory goals. For example, Factor 1 was shared by the explanatory goals of Transparency and Satisfaction. Factor 5 was shared by Transparency, Trust, and Effectiveness. Factor 11 was shared by Trust and Effectiveness. Factor 13 was shared by Persuasiveness and Satisfaction.

5 THIRD STAGE: TARGET MENTAL MODEL
In this stage, we conducted a controlled lab study to create the Target Mental Model. The model is used to identify the key components of the recommendation model that users might want to be made explainable in the UI. Since the goal is to identify the information needs of new users, we specifically selected subjects who had never used the CN3 system. A total of 15 participants (6 female) were recruited for this study. They were first- or second-year graduate students (majoring in information sciences) at the University of Pittsburgh, with ages ranging from 20 to 30 (M=25.73, SE=2.89). All participants had no previous experience of using the CN system. Each participant received USD$20 in compensation and signed an informed consent form.

We asked the subjects to complete a card-sorting task about their preferences for the 19 factors we identified in the second stage. We started by presenting the CN3 system (shown in Figure 1) to the subjects and introducing the five recommendation models through the Expert Mental Model. After the tutorial, the subjects were asked to do a closed card-sort, assigning the cards into four predefined groups: 1) very important; 2) less important; 3) not important; and 4) not relevant.

Table 1: The card-sorting results of the third stage.

            Very Important  Less Important  Not Important  Not Relevant
Factor 1         11               1               3             0
Factor 2          9               5               1             0
Factor 3          0               2              10             3
Factor 4          1               8               3             3
Factor 5          5               4               6             0
Factor 6          7               6               2             0
Factor 7          3               2               9             1
Factor 8          4               3               3             5
Factor 9          7               2               4             2
Factor 10         3               9               2             1
Factor 11         0               6               6             3
Factor 12         4               6               5             0
Factor 13        13               2               0             0
Factor 14         0              13               2             0
Factor 15         4               7               3             1
Factor 16        10               5               0             0
Factor 17         3               6               3             3
Factor 18         1               5               5             4
Factor 19         1              10               3             1

The survey results are reported in Table 1. We found that for the target users, factors 1, 13, and 16 outperformed the other factors: more than ten subjects assigned these three factors to the "very important" group. Factors 2, 6, 10, 12, 14, 15, and 19 formed the secondary preference group, with at least 10 subjects assigning them to the "very important" or "less important" groups. The subjects' least preferred factors were 3, 7, 11, and 18, with at least nine subjects assigning these factors to the "not important" or "not relevant" groups.

Based on the card-sorting results, we found the users preferred an explainable UI that presents the similarity between their interests and the recommended person (F1). The UI should be friendly and easy-to-use (F13) as well as highlight the items or information that are strongly related to the user (F16). Besides, some factors were also liked by the subjects: for instance, a UI presenting the mutual relationship (F2), the shared interests (F12), and the recommendation process (F15). The UI should also allow the user to understand (F6) and justify (F10) the quality of the recommendation, as well as inspire the curiosity of exploration (F14) and convey why the system recommends the person (F19). Interestingly, we also found the users were less interested in a UI presenting the data source (F3) and raw data (F7), as well as the details of the algorithm (F11) and the recommendation feedback from other users in the same community (F18).

Hence, we decided to filter out the factors that were less preferred by the subjects. We chose to keep the factors with more than ten votes in the "Very Important" and "Less Important" groups, which are F1, F2, F6, F10, F12, F13, F14, F15, F16, and F19; the chosen factors were highlighted in red in Table 1. We can project the factors back to the original explanatory goals. The percentage of retained factors for each explanatory goal is as follows: Transparency (40%, 2 out of 5), Scrutability (0%, 0 out of 4), Trust (33%, 1 out of 3), Persuasiveness (67%, 2 out of 3), Effectiveness (33%, 1 out of 3), Efficiency (50%, 1 out of 2), and Satisfaction (75%, 3 out of 4). That is, the Target Mental Model was built through the explanatory goals of (ranked from high to low importance) Satisfaction, Persuasiveness, Efficiency, Transparency, Trust, and Effectiveness.

6 FOURTH STAGE: ITERATIVE PROTOTYPING
The fourth stage, iterative prototyping, was performed within the same user study as the third stage. After the card-sorting task, we asked the subjects to identify the ten chosen factors across a set of UI prototypes. A total of 25 interfaces (five interfaces for each recommendation model) were exposed in this stage. We used a within-subject design, i.e., all participants were required to do the card-sorting task. In each session, the participants were asked to sort the given five interfaces into groups 1 to 5 (1: Strongly Agree, 5: Strongly Disagree) for each explanatory factor. If an interface did not contribute to the factor, the participant could mark it as irrelevant (not applicable). We continued with a semi-structured interview after the subject completed each session to collect qualitative feedback.

There were a total of five card-sorting sessions, one for each of the five recommendation models. At the beginning of each session, we introduced the recommendation model through the Expert Mental Model, i.e., telling the participant how the similarity is calculated and what data were adopted in this process, to make sure the subject understood the details of each recommendation model. After that, we provided five interface printouts, a paper sheet with a table containing the 19 explanatory factors, and a pen; the subjects were expected to write down their rankings on the paper sheet. All subjects took around 80-100 minutes to complete the study.

Table 2: The card-sorting results of the fourth stage.

        R1   R2   R3   R4   R5   Not Applicable   Total Votes
E1-1    19   25   21   19   44         22             150
E1-2    23   37   17   30   26         17             150
E1-3     7   16   42   44   19         22             150
E1-4    76   32   27    2    0         13             150
E1-5    19   31   33   28   20         19             150
E2-1    12    8   14   21   60         35             150
E2-2     6    2    9   73   36         24             150
E2-3    24   78   28    7    2         11             150
E2-4    86   31   13   11    0          9             150
E2-5    13   21   70   14   11         21             150
E3-1    13    5    9   18   69         36             150
E3-2    37   26   17   36   20         14             150
E3-3    32   38   29   28   11         12             150
E3-4    45   41   37   11    0         16             150
E3-5    15   32   41   36   11         15             150
E4-1     8   11    6   31   64         30             150
E4-2    17   61   48   16    2          6             150
E4-3    49   41   41   11    3          5             150
E4-4    64   28   41    7    1          9             150
E4-5     8    5    6   65   46         20             150
E5-1    20    7   13   24   55         31             150
E5-2    16   22    6   45   36         25             150
E5-3    42   16   44   11    6         31             150
E5-4    15   49   36   18    4         28             150
E5-5    40   35   26   20    3         26             150

6.1 Explaining Publication Similarity
The key components of publication similarity are the terms and term frequencies of the publications, as well as the mutual relationship (i.e., the common terms) between two scholars. We presented four visual interface prototypes (shown in Figure 2) for explaining publication similarity and one text-based interface (E1-1), which simply says "You and [the scholar] have common words in [W1], [W2], [W3]."

6.1.1 E1-2: Two-way Bar Chart. The bar chart is a common approach for analyzing text mining outcomes [15], using a histogram of terms and term frequencies. We extended the design to a two-way bar chart to show the mutual relationship of two scholars' publication terms and term frequencies, i.e., one scholar on a positive and the other scholar on a negative scale. The design is shown in Figure 2a.

6.1.2 E1-3: Word Clouds. The word cloud is a common design for explaining text similarity [18]. We adopted the word cloud design from [26], which presents the terms in the cloud and the term frequency by the font size. This interface provided two word clouds (one for each scholar) so the user can perceive the mutual relationship. The design is shown in Figure 2b.

6.1.3 E1-4: Venn Word Cloud. The Venn diagram was recognized as an effective hybrid explanation interface by Kouki et al. [8]. This interface can be considered a combination of a word cloud and a Venn diagram [22], which presents the term frequency using the font size. The unique terms of each scholar are shown in a different color (green and blue), while the common terms are presented in the middle, in red, for determining the mutual relationship. The design is shown in Figure 2c.

6.1.4 E1-5: Interactive Word Cloud. A word cloud can be interactive. We extended the idea from [18] and used Zoomdata Wordcloud [27], which follows the common approach of visualizing term frequency with the font size. The font color was selected to distinguish the scholars' terms, i.e., a different term color for each scholar. A slider was attached to the bottom of the interface to provide real-time interactive functionality for increasing or decreasing the number of terms in the word cloud. The design is shown in Figure 2d.

Figure 2: The interfaces used to explain the Publication Similarity in the fourth stage: (a) E1-2: Two-way Bar Chart; (b) E1-3: Word Clouds; (c) E1-4: Venn Word Cloud; (d) E1-5: Interactive Word Cloud.

6.1.5 Results. The card-sorting results are presented in Table 2. We found that the E1-4 Venn Word Cloud was preferred by the participants, receiving 76 votes in Rank 1, which outperformed the other four interfaces.
According to the post-session interview, 13 subjects as a “sentiment”; then the user can interpret the model by the figure agreed E1-4 is the best interface versus the other four interfaces. (for the beta value of topic) and table (for the topical words). We The supporting reasons can be summarized as 1) the Venn dia- extended the idea as E2-3: FLAME that showed two sets of research gram provided common terms in the middle, which highlighted topics (top 5) and the relevant topic words in two word clouds (one the common terms and shared relationship; 2) it is useful to show for each scholar). The design is shown in Figure 3b. non-overlapping terms on the sides (N=5) and 3) the design is sim- ple, easy to understand and require less time to process (N=3). Two 6.2.3 E2-4: Topical Radar. The E2-4 Topical Radar was used in Tsai subjects mentioned they preferred E1-2 the most due to histograms and Peter [22]. The radar chart was presented in the left. We picked gives them the “concrete numbers” for “calculating” the similarity, the top 5 topics (ranked by beta value from a total of 30 topics) of which was harder when using word clouds. the user and compared them with the examined attendee through the overlay. A table with topical words was presented in the right 6.2 Explaining Topic Similarity so that the user can inspect the context of each research topic. The design is shown in Figure 3c. The key component of topic similarity is research topics and topi- cal words of the scholar as well as its mutual relationship (i.e., the 6.2.4 E2-5: Topical Bars. We adopted several bar charts in this in- common research topics) between two scholars. We presented four terface as E2-5: Topical Bar. The interface showed top three topics visual interfaces prototypes (shown in Figure 3) and one text-based of two scholars (top row and the second row) and the topical infor- prototype for explaining the topic similarity. 
The text-based in- mation (top eight topical words in the y-axis and topic beta value terface (E2-1) simply says “You and [the scholar] have common in x-axis) using a bar chart with histograms. The design was shown research topics on [T1], [T2], [T3].” in Figure 3d. Designing Explanation Interfaces for Transparency and Beyond IUI Workshops’19, March 20, 2019, Los Angeles, USA (a) E2-2: Topical Words (b) E2-3: FLAME (c) E2-4: Topical Radar (d) E2-5: Topical Bar Figure 3: The interfaces used to explain the Topic Similarity in the fourth stage. 6.2.5 Results. The card-sorting result was presented in Table 2. 6.3.2 E3-3: ForceAtlas2. E3-3: ForceAtlas2 was inspired by Garnett We found the E2-4 Topical Radar received 86 votes in Rank 1 outper- et al. [3] that presented Co-authorship graph of NiMCS and re- forming all other interfaces. E2-3 ended up being second with most lated research with both high and low-level network structure votes in the R2 group. According to the post-session interview, 13 and information. Nodes and edges are representing authors and subjects agreed E2-4 is the best interface among all examined inter- co-authorship, respectively. Graph layout uses the ForceAtlas2 al- faces. One subject preferred E2-3, and one subject suggested a mix gorithm [3]. Clusters are calculated via Louvain modularity and of E2-3 and E2-4 as the best design. The supporting reasons for E2-4 delineated by color. The frequency of co-authorship is calculated can be summarized as 1) It is easy to see the relevance through the via Eigenvector centrality and represented by size. The design was overlapping area from the Radar chart and the percentage numbers shown in Figure 4(b). from the table (N=12). 2) It is informative to compare the shared 6.3.3 E3-4: Strength Graph. E3-4 Strength Graph was inspired by research topics and topical words (N=9). Tsai and Brusilovsky [18] that tried to present the co-authorship network using D3plus network style [9]. 
Nodes and edges are repre- senting authors and co-authorship, respectively. The edge thickness 6.3 Explaining Co-Authorship Similarity is the weighting of the coauthorship (number of co-worked papers). The key component of co-authorship similarity is coauthors, coau- The node was assigned different color by their groups, i.e., the orig- thorship and distance of connections of the scholars as well as its inal scholar, target scholar and via scholars. The design was shown mutual relationship (i.e., the connecting path) between two schol- in Figure 4(c). ars. We presents the five prototyping interfaces (shown in Figure 4, 6.3.4 E3-5: Social Viz. The E3-5 Social Viz was used in [22]. There E3-1 presented in text below) for explaining publication similarity. were six possible paths (one shortest and five alternatives). The In addition to four visualized interfaces, we also include one text- user will be presented in the left with a yellow circle. The target based interface (E3-1). That is, “You and [the scholar] have common user will be presented in the right with red color. The circle size co-authors, they are [A1], [A2], [A3].” represented the weighting of the scholar, which was determined by the appearing frequency in the six paths. For example, the scholar Peter is the only node that scholar Chu can reach scholar Nav, so 6.3.1 E3-2: Correlation Matrix. E3-2 Correlation matrix was in- the circle size was the largest one (size = 6). The design was shown spired by Heckel et al. [4] that was used to present overlapping in Figure 4(d). user-item co-clusters in a scalable and interpretable product recom- mendation model. We extended the interface to a user-to-user cor- 6.3.5 Results. The card-sorting result was presented in Table 2. relation matrix that the user can inspect the scholar co-authorship We found the E3-4 Strength Graph was preferred by the participants, network. The design was shown in Figure 4(a). received 45 votes in Rank 1. 
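The path-based weighting described for E3-5 can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the toy graph and the breadth-first enumeration of simple paths are our assumptions, and a scholar's weight is taken to be the number of connecting paths on which that scholar appears (matching the Chu-Peter-Nav example above).

```python
from collections import Counter, deque

def connecting_paths(graph, source, target, k=6):
    """Enumerate up to k simple paths between two scholars, shortest first
    (breadth-first search over simple paths in an unweighted graph)."""
    paths, queue = [], deque([[source]])
    while queue and len(paths) < k:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated scholars)
                queue.append(path + [nxt])
    return paths

def node_weights(paths):
    """E3-5-style weight: how often each scholar appears across the paths."""
    return Counter(node for path in paths for node in path)

# Toy co-authorship graph; the names are only illustrative.
graph = {
    "Chu": ["Peter"],
    "Peter": ["Chu", "Nav", "Ann"],
    "Ann": ["Peter", "Nav"],
    "Nav": ["Peter", "Ann"],
}
paths = connecting_paths(graph, "Chu", "Nav")  # two paths, both through Peter
weights = node_weights(paths)
```

In the deployed interface, this weight drives the circle size; here it is simply a count over the connecting paths, so a cut vertex like Peter receives the maximum weight.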
However, the votes were close to those of E3-2 Correlation Matrix (37 votes) and E3-3 ForceAtlas2 (32 votes). According to the post-session interviews, four subjects agreed that E3-4 was the best of the five interfaces; the supporting reasons were that the interface highlights the mutual relations and lets the user understand the path between two scholars, and that the arrows and edge thickness were also useful. Two subjects supported E3-2: they liked that the correlation matrix provides clear numbers and correlation information that is easier for them to process. Three subjects supported E3-3: they preferred that the interface gives high-level information, a "big picture", and noted that E3-3 would be good for exploring the co-authorship network beyond the connecting path, although the interface was reported to be too complicated as an explanation. Four subjects supported E3-5: they enjoyed the simple, clear, and "straightforward" connecting path as the explanation of the co-authorship network.

Figure 4: The interfaces used to explain the Co-Authorship Similarity in the fourth stage. (a) E3-2: Correlation Matrix; (b) E3-3: ForceAtlas2; (c) E3-4: Strength Graph; (d) E3-5: Social Viz.

6.4 Explaining CN3 Interest Similarity

The key components of CN3 interest similarity are the papers and authors bookmarked in the system, as well as the mutual relationship (i.e., the common bookmarks) between two scholars. We present five prototype interfaces (shown in Figure 5; E4-1 presented in the text below) for explaining CN3 interest similarity. In addition to the four visual interfaces, we also include one text-based interface (E4-1): "You and [the scholar] have common bookmarks; they are [P1], [P2], [P3]."

6.4.1 E4-2: Similar Keywords. E4-2 Similar Keywords was proposed and deployed in Conference Navigator [11]. We extended the interface to explain the shared bookmarks between two scholars. The interface places the two scholars on the two sides and the common co-bookmarked items (e.g., the five common co-bookmarked papers or authors) in the middle. A strong (solid-line) or weak (dashed-line) tie connects each item depending on whether it was bookmarked by one side or by both sides. The design is shown in Figure 5(a).

6.4.2 E4-3: Tagsplanations. E4-3 Tagsplanations was proposed by Vig et al. [24]; the idea is to show tags together with user preference and relevance, as used in recommending movies. We extended the interface to explain the co-bookmarking information: in our design, the co-bookmarked items are listed and ranked by social popularity, i.e., how many users have followed or bookmarked each item. The design is shown in Figure 5(b).

6.4.3 E4-4: Venn Tags. The study of [8] pointed out that users preferred the Venn diagram as an explanation in a recommender system. In E4-4: Venn Tags, we implemented the same idea with the bookmarked items: each bookmarked item is presented as an icon in a Venn diagram. The two sides hold the items bookmarked by only one party, and the co-bookmarked or co-followed items are placed in the middle. Users can hover over an icon for detailed information, i.e., the paper title or author name. The design is shown in Figure 5(c).

6.4.4 E4-5: Itemized List. An itemized list was adopted to explain bookmarks in [21]. We proposed E4-5: Itemized List, which presents the bookmarked or followed items in two lists. The design is shown in Figure 5(d).

Figure 5: The interfaces used to explain the CN3 Interest Similarity in the fourth stage. (a) E4-2: Similar Keywords; (b) E4-3: Tagsplanations; (c) E4-4: Venn Tags; (d) E4-5: Itemized List.

6.4.5 Results. The card-sorting results are presented in Table 2. We found that E4-4 Venn Tags was preferred by the participants: it received 64 Rank-1 votes, outperforming the other four interfaces, and was also favored by the subjects with a further 49 votes. According to the post-session interviews, eight subjects agreed that E4-4 was the best of the five interfaces. The supporting reasons can be summarized as follows: 1) the Venn diagram is more familiar and clearer than the other interfaces (N=4); and 2) the Venn diagram is simple and easy to understand (N=4). Three subjects preferred E4-3 the most because the interface provides extra attribution, does not require hovering for details, and is easy to use.

6.5 Explaining Geographic Similarity

The key components of geographic similarity are the locations and distance of the two scholars, as well as their mutual relationship (i.e., the geographic distance). We present five prototype interfaces (shown in Figure 6; E5-1 presented in the text below) for explaining geographic similarity. In addition to the four visual interfaces, we also include one text-based interface (E5-1): "From [Institution A] to [sample]'s affiliation ([Institution B]) = N miles."

6.5.1 E5-2: Earth Style. Using Google Maps [6] to explain geographic distance in a social recommender system was discussed in Tsai and Brusilovsky [21]. We extended the interface to a different style: in E5-2 Earth Style, we "zoom out" the map to the earth's surface and place the two connected icons (with the geographic distance) on the map. The design is shown in Figure 6(a).

Figure 6: The interfaces used to explain the Geography Similarity in the fourth stage. (a) E5-2: Earth Style; (b) E5-3: Navigation Style; (c) E5-4: Icon Style; (d) E5-5: Label Style.

6.5.2 E5-3: Navigation Style.
E5-3 Navigation Style followed the same Google Maps API (shown in E5-2) but presents navigation between the two locations, either by car or by flight. Note that the transportation time, i.e., the flying or driving time in E5-2 or E5-3, was not considered in the recommendation model. The design is shown in Figure 6(b).

6.5.3 E5-4: Icon Style. E5-4 Icon Style followed the same Google Maps API (shown in E5-2) but presents two icons on the map without any navigation information. Users can hover to see the detailed affiliation, but the geographic distance is not presented. The design is shown in Figure 6(c).

6.5.4 E5-5: Label Style. E5-5 Label Style followed the same Google Maps API (shown in E5-2) but presents two labels on the map without any navigation information. Users can see the detailed affiliation profile through a floating label, without extra clicking or hovering interactions. The design is shown in Figure 6(d).

6.5.5 Results. The card-sorting results are presented in Table 2. We found that E5-3 Navigation Style was preferred by the participants, receiving 42 Rank-1 votes; however, the votes were close to those of E5-5 Label Style (40 votes). According to the post-session interviews, six subjects agreed that E5-3 was the best of the five interfaces, but three subjects specifically mentioned that the navigation function was irrelevant to explaining or exploring the social recommendations. The supporting reasons for E5-3 can be summarized as follows: 1) the map is informative (N=2); and 2) it is useful to see the navigation (N=5). Three subjects preferred E5-5 the most because the label contains the affiliation information, so they can understand the affiliation without extra actions. Although there is no geographic distance information, one subject pointed out that he could estimate the distance after learning the affiliation.

7 DISCUSSION AND CONCLUSIONS

In this work-in-progress paper, we presented a participatory process for bringing explanation interfaces to a social recommender system. We proposed four stages in response to the challenge of identifying the key components of the explanation models and mental models. In the first stage, we derived the Expert Mental Model by discussing the key components (based on the similarity algorithm) of each recommendation model. In the second stage, we reported an online survey of current system users (N=14) and identified 19 explanatory goals as the User Mental Model. In the third stage, we reported the card-sorting results of a controlled user study (N=15), which created the Target Mental Model from the target users' preferences among the explanatory factors.

In the fourth stage, we proposed a total of 25 explanation interfaces for five recommendation models and reported the card-sorting and semi-structured interview results. We found that, in general, the participants preferred the visual interfaces to the text-based ones. Based on the study, E1-4: Venn Word Cloud, E2-4: Topical Radar, E3-4: Strength Graph, E4-4: Venn Tags, and E5-3: Navigation Style were the interfaces preferred by the study participants. We further discussed the top-rated and second-rated explanation interfaces and the user feedback in each session. Based on the experimental results, we outlined design guidelines for bringing explanation interfaces to a real-world social recommender system.

A further controlled study will be required to test whether the proposed explanation interfaces can achieve the target mental model we identified in this paper. In future work, we plan to implement the top-rated explanation interfaces and deploy them in the CN3 system. Moreover, we expect to pair the explanation interfaces with an information-seeking task so that we can analyze how and why users adopt the explanation interfaces when exploring the social recommendations.
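The "N miles" value shown in E5-1 is the geographic distance between two affiliations. The paper does not specify how that distance is computed, so the following is only a hedged, library-free sketch: a great-circle (haversine) distance, with illustrative coordinates for Pittsburgh and Los Angeles that are our assumptions, not values from the deployed system.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Illustrative: Pittsburgh to Los Angeles, roughly 2,130 miles great-circle.
distance = haversine_miles(40.4443, -79.9606, 34.0522, -118.2437)
```

A routing service such as the Google Maps Directions API used by E5-3 would instead return a driving or flight distance, which is generally longer than this straight-line value.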
REFERENCES
[1] Malin Eiband, Hanna Schneider, Mark Bilandzic, Julian Fazekas-Con, Mareike Haug, and Heinrich Hussmann. 2018. Bringing Transparency Design into Practice. In 23rd International Conference on Intelligent User Interfaces. ACM, 211–223.
[2] Gerhard Friedrich and Markus Zanker. 2011. A taxonomy for generating explanations in recommender systems. AI Magazine 32, 3 (2011), 90–98.
[3] Alex Garnett, Grace Lee, and Judy Illes. 2013. Publication trends in neuroimaging of minimally conscious states. PeerJ 1 (2013), e155.
[4] Reinhard Heckel, Michail Vlachos, Thomas Parnell, and Celestine Dünner. 2017. Scalable and interpretable product recommendations via overlapping co-clustering. In Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 1033–1044.
[5] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. ACM, 241–250.
[6] Google Inc. 2018. Google Maps Directions API. https://developers.google.com/maps/documentation/directions/intro
[7] Bart P. Knijnenburg, Svetlin Bostandjiev, John O'Donovan, and Alfred Kobsa. 2012. Inspectability and Control in Social Recommenders. In 6th ACM Conference on Recommender Systems. 43–50.
[8] Pigi Kouki, James Schaffer, Jay Pujara, John O'Donovan, and Lise Getoor. 2017. User preferences for hybrid explanations. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 84–88.
[9] Lawrence. 2018. Customize D3plus network style. https://codepen.io/choznerol/pen/evaYyv
[10] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 165–172.
[11] Conference Navigator. 2018. Paper Tuner. http://halley.exp.sis.pitt.edu/cn3/portalindex.php
[12] Alexis Papadimitriou, Panagiotis Symeonidis, and Yannis Manolopoulos. 2012. A generalized taxonomy of explanations styles for traditional and social recommender systems. Data Mining and Knowledge Discovery 24, 3 (2012), 555–583.
[13] Pearl Pu and Li Chen. 2007. Trust-inspiring explanation interfaces for recommender systems. Knowledge-Based Systems 20, 6 (2007), 542–556.
[14] Amit Sharma and Dan Cosley. 2013. Do social explanations work?: studying and modeling the effects of social explanations in recommender systems. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1133–1144.
[15] Julia Silge and David Robinson. 2016. tidytext: Text mining and analysis using tidy data principles in R. The Journal of Open Source Software 1, 3 (2016), 37.
[16] Nava Tintarev and Judith Masthoff. 2012. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (Oct. 2012), 399–439.
[17] Nava Tintarev and Judith Masthoff. 2015. Explaining recommendations: Design and evaluation. In Recommender Systems Handbook. Springer, 353–382.
[18] Chun-Hua Tsai and Peter Brusilovsky. 2017. Providing Control and Transparency in a Social Recommender System for Academic Conferences. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 313–317.
[19] Chun-Hua Tsai and Peter Brusilovsky. 2018. Beyond the Ranked List: User-Driven Exploration and Diversification of Social Recommendation. In 23rd International Conference on Intelligent User Interfaces. ACM, 239–250.
[20] Chun-Hua Tsai and Peter Brusilovsky. 2018. Explaining Social Recommendations to Casual Users: Design Principles and Opportunities. In Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion. ACM, 59.
[21] Chun-Hua Tsai and Peter Brusilovsky. 2019. Exploring Social Recommendations with Visual Diversity-Promoting Interfaces. TiiS 1, 1 (2019), 1–1.
[22] Chun-Hua Tsai and Peter Brusilovsky. 2019. Explaining Recommendations in an Interactive Hybrid Social Recommender. In Proceedings of the 2019 Conference on Intelligent User Interfaces. ACM, 1–12.
[23] Katrien Verbert, Denis Parra, Peter Brusilovsky, and Erik Duval. 2013. Visualizing recommendations to support exploration, transparency and controllability. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 351–362.
[24] Jesse Vig, Shilad Sen, and John Riedl. 2009. Tagsplanations: explaining recommendations using tags. In Proceedings of the 14th International Conference on Intelligent User Interfaces. ACM, 47–56.
[25] Yao Wu and Martin Ester. 2015. FLAME: A Probabilistic Model Combining Aspect Based Opinion Mining and Collaborative Filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM '15). ACM, New York, NY, USA, 199–208. https://doi.org/10.1145/2684822.2685291
[26] Yao Wu and Martin Ester. 2015. FLAME: A probabilistic model combining aspect based opinion mining and collaborative filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 199–208.
[27] Zoomdata. 2018. Real-time Interactive Zoomdata Wordcloud. https://visual.ly/community/interactive-graphic/social-media/real-time-interactive-zoomdata-wordcloud