Designing Explanation Interfaces for Transparency and Beyond

Chun-Hua Tsai, University of Pittsburgh, Pittsburgh, USA, cht77@pitt.edu
Peter Brusilovsky, University of Pittsburgh, Pittsburgh, USA, peterb@pitt.edu

ABSTRACT
In this work-in-progress paper, we present a participatory process of designing explanation interfaces for a social recommender system with multiple explanatory goals. We went through four stages to identify the key components of the recommendation models: the expert mental model, the user mental model, and the target mental model. We report the results of an online survey of current system users (N=14) and a controlled user study with a group of target users (N=15). Based on the findings, we proposed five sets of explanation interfaces for the five recommendation models (25 interfaces in total) and discuss the users' preferences among the interface prototypes.

CCS CONCEPTS
• Information systems → Recommender systems; • Human-centered computing → HCI design and evaluation methods.

KEYWORDS
Social Recommendation; Explanation; Mental Model; User Interface

ACM Reference Format:
Chun-Hua Tsai and Peter Brusilovsky. 2019. Designing Explanation Interfaces for Transparency and Beyond. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 11 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
Enhancing explainability in recommender systems has drawn increasing attention in the field of Human-Computer Interaction (HCI). Furthermore, the newly initiated European Union General Data Protection Regulation (GDPR) requires the owner of any data-driven application to maintain a "right to explanation" of algorithmic decisions [1], which urges transparency in all existing intelligent systems. Self-explainable recommender systems have been shown to improve user perception of system transparency [17], trust [13], and acceptance of system suggestions [7]. Instead of offline performance improvements, a growing body of research focuses on evaluating the system from the user-experience perspective, i.e., what is the user's perception of the explanation interfaces?

Explaining recommendations (i.e., enhancing system explainability) can serve different explanatory goals, such as helping users make better decisions or persuading them to accept the suggestions from a system [14, 16]. We follow the seven explanatory goals proposed by Tintarev and Masthoff [17]: Transparency, Scrutability, Trust, Persuasiveness, Effectiveness, Efficiency, and Satisfaction. Since it is hard for a single explanation interface to achieve all these goals equally well, the designer needs to make a trade-off while choosing or designing the form of the interface [17]. For instance, an interactive interface can be adapted to increase user trust and satisfaction, but it may prolong the decision and exploration process while using the system (i.e., decrease efficiency) [19].

Over the past few years, several approaches have been proposed to enhance explainability in recommender systems. These approaches can be summarized by their styles, reasoning models, paradigms, and information [2]. 1) Styles: Kouki et al. [8] conducted an online user survey to explore user preferences over nine explanation styles. They found Venn diagrams outperformed all other visual and text-based interfaces. 2) Reasoning Models: Vig et al. [24] used tags to explain the recommended item and the user's profile. The approach emphasized why a specific recommendation is plausible, instead of revealing the recommendation process or data. 3) Paradigms: Herlocker et al. [5] presented a model for explanations based on the user's conceptual model of the collaborative recommendation process. The result of the evaluation indicates that two interfaces - "Histogram with grouping" and "Presenting past performance" - improved the acceptance of recommendations. 4) Information: Pu and Chen [13] proposed explanations tailored to the user and the recommendation; i.e., although a recommendation may not be the most popular one, the explanation justifies it by providing the reasons behind it.

Although many approaches have been proposed to enhance recommender explainability, bringing explanation interfaces to an existing recommender system is still a challenging task. More recently, Eiband et al. [1] suggested a different approach to improve the user mental model (UMM) while bringing transparency (explanations) to a recommender system. The model describes the process by which a user builds an internal conceptualization of the system or interface along with user-system interactions, i.e., builds the knowledge of how to interact with the system. If this model is misguided or opaque, users will face difficulties in predicting or interpreting the system [1]. Hence, the researchers suggested improving the mental model so that users can gain awareness while using the system as well as the explanation interfaces.

In this work-in-progress paper, we present a stage-based participatory process [1] for integrating seven explanatory goals into a real-world hybrid social recommender system. First, we introduce the Expert Mental Model to summarize the key components of each recommendation feature. Second, we conducted an online survey to identify the User Mental Model of the seven explanatory goals from the current system users. Third, we conducted a user study with card-sorting and semi-structured interviews to determine the users' Target Mental Model. Fourth, we proposed a total of 25 explanation interfaces for the five recommendation features and compared user perceptions across the designs.

Figure 1: Relevance Tuner+: (A) relevance sliders; (B) stackable score bar; (C) explanation icon; (D) user profiles. The interface supports user-driven exploration of recommended items in Section A and inspection of the fusion in Section B. The user can further inspect the explanation model by clicking Section C, and more profile detail is presented in Section D. Our goal is to provide an explanation interface for each explanation model. (The scholar names have been pixelated for privacy protection.)

2 BACKGROUND
We adopted the stage-based participatory framework from Eiband et al. [1], which intends to answer two key questions while designing an explainable user interface (UI): a) What to explain? and b) How to explain? The process can be summarized in four stages. 1) Expert Mental Model: What can be explained? We defined an expert as the recommender system developer. 2) User Mental Model: What is the user's mental model of the system based on its current UI? The model should be built through the current recommender system users. 3) Target Mental Model: Which key components of the algorithm do users want to be made explainable in the UI? The target users are users who are new to the system. 4) Iterative Prototyping: How can the target mental model be reached through UI design? The key is to measure whether the proposed explanation interfaces achieve the explanatory goals.

In this work, we aimed to enhance explainability in a conference support system - Conference Navigator 3 (CN3). The system has been used to support more than 45 conferences at the time of writing this paper and has data on approximately 7,045 articles presented at these conferences; 13,055 authors; 7,407 attendees; 32,461 bookmarks; and 1,565 social connections. Our work was informed by the results of a controlled user study where we explored an earlier version of the social recommender interface, Relevance Tuner [19] (shown in Figure 1). It was a controllable interface that lets the user fuse the weightings of multiple recommendation models and inspect the explanations.

A total of five recommendation models were introduced in this study: 1) Publication Similarity: the degree of cosine similarity of the users' publication text. 2) Topic Similarity: the overlap of research interests (using topic modeling). 3) Co-Authorship Similarity: the degree of connection, based on a shared network of co-authors. 4) Interest Similarity: the number of papers co-bookmarked, as well as the authors co-followed. 5) Geographic Distance: a measurement of the geographic distance between affiliations. Based on the stage-based participatory framework, we went through the same four stages for each recommendation model to identify the user-preferred user interface design. We aimed to design explanation interfaces for each recommendation model with multiple explanatory goals.

3 FIRST STAGE: EXPERT MENTAL MODEL
Instead of an interactive recommender [7, 23], we attached an explanation icon next to each social recommendation. The users have the choice of requesting the explanations while exploring or browsing the recommendations. We adopted a hybrid explanation approach [8, 12], which mixes multiple visualizations to explain the details of the recommendation model. We would like to let the users understand both a) the mutual relationship (similarity) between themselves and the recommended scholar and b) the key components of each recommendation model. We then discuss the Expert Mental Model through the system development process of the five recommendation models.

1) Publication Similarity: The similarity was determined by the degree of text similarity between two scholars' publications using cosine similarity. We applied tf-idf to create the vectors, with a word frequency upper bound of 0.5 and a lower bound of 0.01 to eliminate both common and rarely used words. In this model, the key components were the terms of the paper titles and abstracts as well as their term frequencies.

2) Topic Similarity: This similarity was determined by matching research interests using topic modeling. We used latent Dirichlet allocation (LDA) to attribute the terms collected from publications to one of the topics. We chose 30 topics to build the topic model for all scholars. Based on the model, we then calculated the topic similarity between any two scholars. The key components were the research topics and the topical words of each research topic [25].
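As an illustration, the publication-similarity computation described above can be sketched as follows. This is a minimal, self-contained sketch rather than the CN3 implementation: the toy corpus, the pure-Python tf-idf, and the interpretation of the 0.5/0.01 bounds as document-frequency cut-offs are all illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs, max_df=0.5, min_df=0.01):
    """Build tf-idf vectors, keeping only terms whose document frequency
    (as a fraction of the corpus) lies within [min_df, max_df]."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = {t for t, c in df.items() if min_df <= c / n <= max_df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(t for t in tokens if t in vocab)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus: one "publication" per scholar.
docs = [
    "social recommender explanation interface design",
    "recommender explanation user study",
    "quantum chemistry simulation methods",
    "medieval poetry digital archives",
]
vectors = tfidf_vectors(docs)
publication_similarity = cosine(vectors[0], vectors[1])
```

The first two scholars share the terms "recommender" and "explanation," so their similarity is positive, while scholars with disjoint vocabularies score zero; a production system would instead vectorize full titles and abstracts.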
3) Co-Authorship Similarity: This similarity approximated the network distance between the source and recommended users. For each pair of scholars, we tried to find six possible paths connecting them, based on their co-authorship relationships. The network distance was determined by the average distance of the six paths. The key components were the coauthors (as nodes), the co-authorship relations (as edges), and the distance of the connection between the two scholars.

4) CN3 Interest Similarity: This similarity was determined by the number of co-bookmarked conference papers and co-connected authors in the experimental social system (CN3). We simply used the number of shared items as the CN3 interest similarity. The key components are the shared conference papers and authors.

5) Geographic Distance: This similarity was a measurement of the geographic distance between attendees. We retrieved longitude and latitude data based on the attendees' affiliation information. We used the Haversine formula to compute the geographic distance between scholars. The key components are the geographic distance and the affiliation information of the scholars.
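The Haversine computation used for the Geographic Distance model can be sketched as follows. This is the standard textbook formulation, not CN3's code, and the example affiliation coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (latitude, longitude) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))

# Hypothetical affiliation coordinates: Pittsburgh and Los Angeles.
distance = haversine_km(40.4406, -79.9959, 34.0522, -118.2437)
```

One degree of longitude along the equator comes out to roughly 111 km, which is a convenient sanity check for any Haversine implementation.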
4 SECOND STAGE: USER MENTAL MODEL
As a first step towards understanding the design factors of explanatory interfaces, we deployed a survey through a social recommender system, Conference Navigator [18], and analyzed the data from the respondents. We targeted users who had created an account and interacted with the system during a previous conference attendance (i.e., had used the system for at least one conference). The survey was initiated by sending an invitation to the qualified users in December 2017. We sent out 89 letters to the conference attendees of UMAP/HT 2016, and a total of 14 participants (7 female) replied to create the pool of participants for the user study. The participants were from 13 different countries; their ages ranged from 20 to 40 (M=31.36, SE=5.04). We ran an online survey to collect the necessary demographic information and self-reflections on how to design an explanation function for the seven explanatory goals [17].

The proposed questions were: How can an explanation function help you to perceive system 1) Transparency - explain how the system works? 2) Scrutability - allow you to tell the system it is wrong? 3) Trust - increase your confidence in the system? 4) Persuasiveness - convince you to explore or to follow new friends? 5) Effectiveness - help you make good decisions? 6) Efficiency - help you to make decisions faster? 7) Satisfaction - make using the system fun and useful? We asked the participants to answer each question in 50-100 words, in particular reflecting on the explanatory goals of the social recommendation. The data was published in [20].

1) Transparency: 71% of respondents pointed out reasons why the generated social recommendations helped them to perceive higher system transparency, i.e., the personalized explanation, the linkage and data sources, the reasoning method, and understandability. We then summarized the feedback into five factors: 1) The visualization presents the similarity between my interest and the recommended person. 2) The visualization presents the relationship between the recommended person and me. 3) The visualization presents where the data were retrieved. 4) The visualization presents more in-depth information on how the score adds up. 5) The visualization allows me to see the connections between people and understand how they are connected.

2) Scrutability: Half of the respondents mentioned they needed "inspectable details" to figure out a wrong recommendation. 35% of respondents suggested a mechanism for accepting user feedback on improving wrong recommendations, such as a space to submit user ratings or yes/no options. 14% of respondents preferred a dynamic exploration process to determine the recommendation quality. We then summarized the feedback into four factors: 6) The visualization allows me to understand whether the recommendation is good or not. 7) The visualization presents the data for making the recommendations. 8) The visualization allows me to compare and decide whether the system is correct or wrong. 9) The visualization allows me to explore and then determine the recommendation quality.

3) Trust: 28% of respondents mentioned that they trusted the system more when they perceived the benefits of using the system. 35% of respondents preferred to trust a system with reliable and informative explanations, with more detailed or understandable information. 35% of respondents mentioned they trust a system that is transparent or that passed their verification. We then summarized the feedback into three factors: 10) The visualization presents a convincing explanation to justify the recommendation. 11) The visualization presents the components (e.g., algorithm) that influenced the recommendation. 5) The visualization allows me to see the connections between people and understand how they are connected.

4) Persuasiveness: Half of the respondents mentioned that an explanation of social familiarity would persuade them to explore novel social connections; namely, when shown social context details or shared interests. 21% of respondents indicated that an informative interface could boost the exploration of new friendships. 28% of respondents preferred a design that inspired curiosity, e.g., implicit relationships. We then summarized the feedback into three factors: 12) The visualization shows me the shared interests, i.e., why my interests are aligned with the recommended person. 13) The visualization has a friendly, easy-to-use interface. 14) The visualization inspired my curiosity (to discover more information).

5) Effectiveness: 64% of respondents mentioned that aspects of social recommendation relevance helped them to make a good decision. These aspects included explaining the recommendation process and being understandable or more informative. 28% of respondents suggested that a reminder of a historical or successful decision could help them to make a good decision, i.e., a previously made user decision and success stories. We then summarized the feedback into three factors: 15) The visualization presents the recommendation process. 5) The visualization allows me to see the connections between people and understand how they are connected. 11) The visualization presents the components (e.g., algorithm) that influenced the recommendation.

6) Efficiency: 28% of respondents mentioned that proper highlighting of the recommendation helped them to make the decision faster, for example, emphasizing the relatedness, identifying the top recommendations, or providing success stories. 28% of respondents preferred a tunable or visualized interface to accelerate the decision process, such as tuning the recommendation features or visualizing the recommendations. However, the explanations may not always be useful: 21% of respondents argued that the explanation would prolong the decision process instead of speeding it up, since the user may need to take extra time to examine the explanations. We then summarized the feedback into two factors: 16) The visualization presents highlighted items/information that is strongly related to me. 17) The visualization presents aggregated, non-obvious relations to me.

7) Satisfaction: The feedback on how an explanation can make the user satisfied with the system varied. Three aspects received an equal 7% of the respondents' preferences: users preferred to view feedback from the community, to be shown their historical interaction record, and to be provided with a personalized explanation. Two aspects received an equal 14% of the respondents' preference, i.e., a focus on a friendly user interface and saved decision time. 21% of respondents reported higher satisfaction from using the explanation as a "small talk topic", i.e., as an initial conversation at a conference. 28% of respondents preferred an interactive interface for perceiving the system to be fun, e.g., a controllable interface. We then summarized the feedback into four factors: 18) The visualization presents the feedback from other users, i.e., I can see how others rated the recommended person. 19) The visualization allows me to tell why this system recommends the person to me. 1) The visualization presents the similarity between my interest and the recommended person. 13) The visualization is a friendly, easy-to-use interface.

Based on the results of the online survey, we concluded a total of 19 factors in the second stage of building the user mental model.
(1) The visualization presents the similarity between my interest and the recommended person.
(2) The visualization presents the relationship between the recommended person and me.
(3) The visualization presents where the data was retrieved.
(4) The visualization presents more in-depth information on how the scores sum up.
(5) The visualization allows me to see the connections between people and understand how they are connected.
(6) The visualization allows me to understand whether the recommendation is good or not.
(7) The visualization presents the data for making the recommendations.
(8) The visualization allows me to compare and decide whether the system is correct or wrong.
(9) The visualization allows me to explore and then determine the recommendation quality.
(10) The visualization presents a convincing explanation to justify the recommendation.
(11) The visualization presents the components (e.g., algorithm) that influenced the recommendation.
(12) The visualization shows me the shared interests, i.e., why my interests are aligned with the recommended person.
(13) The visualization has a friendly, easy-to-use interface.
(14) The visualization inspired my curiosity (to discover more information).
(15) The visualization presents the recommendation process clearly.
(16) The visualization presents highlighted items/information that is strongly related to me.
(17) The visualization presents aggregated, non-obvious relations to me.
(18) The visualization presents feedback from other users, i.e., I can see how others rated a recommended person.
(19) The visualization allows me to tell why this system recommends the person to me.

We also found some factors shared across different explanatory goals. For example, Factor 1 was shared by the explanatory goals of Transparency and Satisfaction. Factor 5 was shared by Transparency, Trust, and Effectiveness. Factor 11 was shared by Trust and Effectiveness. Factor 13 was shared by Persuasiveness and Satisfaction.

5 THIRD STAGE: TARGET MENTAL MODEL
In this stage, we conducted a controlled lab study to create the Target Mental Model. The model is used to identify the key components of the recommendation model that users might want to be made explainable in the UI. Since the goal is to identify the information needs of new users, we specifically selected subjects who had never used the CN3 system. A total of 15 participants (6 female) were recruited for this study. They were first- or second-year graduate students (majoring in information sciences) at the University of Pittsburgh, with ages ranging from 20 to 30 (M=25.73, SE=2.89). All participants had no previous experience of using the CN system. Each participant received USD$20 in compensation and signed an informed consent form.

We asked the subjects to complete a card-sorting task about their preferences for the 19 factors we identified in the second stage. We started by presenting the CN3 system (shown in Figure 1) to the subjects and introducing the five recommendation models through the Expert Mental Model. After the tutorial, the subjects were asked to do a closed card-sort, assigning the cards into four predefined groups: 1) very important; 2) less important; 3) not important; and 4) not relevant.

Table 1: The card-sorting results of the third stage.

            Very Important  Less Important  Not Important  Not Relevant
Factor 1         11               1               3             0
Factor 2          9               5               1             0
Factor 3          0               2              10             3
Factor 4          1               8               3             3
Factor 5          5               4               6             0
Factor 6          7               6               2             0
Factor 7          3               2               9             1
Factor 8          4               3               3             5
Factor 9          7               2               4             2
Factor 10         3               9               2             1
Factor 11         0               6               6             3
Factor 12         4               6               5             0
Factor 13        13               2               0             0
Factor 14         0              13               2             0
Factor 15         4               7               3             1
Factor 16        10               5               0             0
Factor 17         3               6               3             3
Factor 18         1               5               5             4
Factor 19         1              10               3             1

The survey results are reported in Table 1. We found that for the target users, factors 1, 13, and 16 outperformed the other factors: more than ten subjects assigned these three factors to the "very important" group. Factors 2, 6, 10, 12, 14, 15, and 19 formed the secondary preference group, with at least 10 subjects assigning them to the "very important" or "less important" groups. The subjects' least preferred factors were 3, 7, 11, and 18, with at least nine subjects assigning these factors to the "not important" or "not relevant" groups.

Based on the card-sorting results, we found the users preferred an explainable UI that presents the similarity between their interests and the recommended person (F1). The UI should be friendly and easy-to-use (F13) as well as highlight the items or information that are strongly related to the user (F16). Besides, some factors were also liked by the subjects: for instance, a UI presenting the mutual relationship (F2), the shared interests (F12), and the recommendation process (F15). The UI should also allow the user to understand (F6) and justify (F10) the quality of the recommendation, as well as inspire the curiosity of exploration (F14) and convey why the system recommends the person (F19). Interestingly, we also found the users were less interested in a UI presenting the data source (F3) and raw data (F7), as well as the details of the algorithm (F11) and the recommendation feedback from other users in the same community (F18).

Hence, we decided to filter out the factors that were less preferred by the subjects. We chose to keep the factors with more than ten votes in the "Very Important" and "Less Important" groups, which are F1, F2, F6, F10, F12, F13, F14, F15, F16, and F19; the chosen factors were highlighted in red in Table 1. We can project the factors back to the original explanatory goals. The percentage of retained factors for each explanatory goal is as follows: Transparency (40%, 2 out of 5), Scrutability (0%, 0 out of 4), Trust (33%, 1 out of 3), Persuasiveness (67%, 2 out of 3), Effectiveness (33%, 1 out of 3), Efficiency (50%, 1 out of 2), and Satisfaction (75%, 3 out of 4). That is, the Target Mental Model was built through the explanatory goals of (ranked from high to low importance) Satisfaction, Persuasiveness, Efficiency, Transparency, Trust, and Effectiveness.

6 FOURTH STAGE: ITERATIVE PROTOTYPING
The fourth stage, iterative prototyping, was performed within the same user study as the third stage. After the card-sorting task, we asked the subjects to identify the ten chosen factors across a set of UI prototypes. A total of 25 interfaces (five interfaces for each recommendation model) were exposed in this stage. We used a within-subject design, i.e., all participants were required to do the card-sorting task. In each session, the participants were asked to sort the given five interfaces into groups 1 to 5 (1: Strongly Agree, 5: Strongly Disagree) for each explanatory factor. If an interface did not contribute to the factor, the participant could mark it as irrelevant (not applicable). We continued with a semi-structured interview after the subject completed each session to collect qualitative feedback.

There were a total of five card-sorting sessions, one for each of the five recommendation models. At the beginning of each session, we introduced the recommendation model through the Expert Mental Model, i.e., telling the participant how the similarity is calculated and what data were adopted in this process, to make sure the subject understood the details of each recommendation model. After that, we provided five interface printouts, a paper sheet with a table containing the 19 explanatory factors, and a pen; the subjects were expected to write down their rankings on the paper sheet. All subjects took around 80-100 minutes to complete the study.

Table 2: The card-sorting results of the fourth stage.

        R1   R2   R3   R4   R5   Not Applicable   Total Votes
E1-1    19   25   21   19   44         22             150
E1-2    23   37   17   30   26         17             150
E1-3     7   16   42   44   19         22             150
E1-4    76   32   27    2    0         13             150
E1-5    19   31   33   28   20         19             150
E2-1    12    8   14   21   60         35             150
E2-2     6    2    9   73   36         24             150
E2-3    24   78   28    7    2         11             150
E2-4    86   31   13   11    0          9             150
E2-5    13   21   70   14   11         21             150
E3-1    13    5    9   18   69         36             150
E3-2    37   26   17   36   20         14             150
E3-3    32   38   29   28   11         12             150
E3-4    45   41   37   11    0         16             150
E3-5    15   32   41   36   11         15             150
E4-1     8   11    6   31   64         30             150
E4-2    17   61   48   16    2          6             150
E4-3    49   41   41   11    3          5             150
E4-4    64   28   41    7    1          9             150
E4-5     8    5    6   65   46         20             150
E5-1    20    7   13   24   55         31             150
E5-2    16   22    6   45   36         25             150
E5-3    42   16   44   11    6         31             150
E5-4    15   49   36   18    4         28             150
E5-5    40   35   26   20    3         26             150

6.1 Explaining Publication Similarity
The key components of publication similarity are the terms and term frequencies of the publications, as well as the mutual relationship (i.e., the common terms) between two scholars. We presented four visual interface prototypes (shown in Figure 2) for explaining publication similarity and one text-based interface (E1-1), which simply says "You and [the scholar] have common words in [W1], [W2], [W3]."

6.1.1 E1-2: Two-way Bar Chart. The bar chart is a common approach for analyzing text mining outcomes [15], using a histogram of terms and term frequencies. We extended the design to a two-way bar chart to show the mutual relationship of two scholars' publication terms and term frequencies, i.e., one scholar on a positive and the other scholar on a negative scale. The design is shown in Figure 2a.

6.1.2 E1-3: Word Clouds. The word cloud is a common design for explaining text similarity [18]. We adopted the word cloud design from [26], which presents the terms in the cloud and the term frequency by the font size. This interface provided two word clouds (one for each scholar) so the user can perceive the mutual relationship. The design is shown in Figure 2b.

6.1.3 E1-4: Venn Word Cloud. The Venn diagram was recognized as an effective hybrid explanation interface by Kouki et al. [8]. This interface can be considered a combination of a word cloud and a Venn diagram [22], which presents the term frequency using the font size. The unique terms of each scholar are shown in a different color (green and blue), while the common terms are presented in the middle, in red, for determining the mutual relationship. The design is shown in Figure 2c.

6.1.4 E1-5: Interactive Word Cloud. A word cloud can be interactive. We extended the idea from [18] and used Zoomdata Wordcloud [27], which follows the common approach of visualizing term frequency with the font size. The font color was selected to distinguish the scholars' terms, i.e., a different term color for each scholar. A slider was attached to the bottom of the interface to provide real-time interactive functionality for increasing or decreasing the number of terms in the word cloud. The design is shown in Figure 2d.

Figure 2: The interfaces used to explain the Publication Similarity in the fourth stage: (a) E1-2: Two-way Bar Chart; (b) E1-3: Word Clouds; (c) E1-4: Venn Word Cloud; (d) E1-5: Interactive Word Cloud.

6.1.5 Results. The card-sorting results are presented in Table 2. We found that the E1-4 Venn Word Cloud was preferred by the participants, receiving 76 votes in Rank 1, which outperformed the other four interfaces.
According to the post-session interview, 13 subjects as a “sentiment”; then the user can interpret the model by the figure agreed E1-4 is the best interface versus the other four interfaces. (for the beta value of topic) and table (for the topical words). We The supporting reasons can be summarized as 1) the Venn dia- extended the idea as E2-3: FLAME that showed two sets of research gram provided common terms in the middle, which highlighted topics (top 5) and the relevant topic words in two word clouds (one the common terms and shared relationship; 2) it is useful to show for each scholar). The design is shown in Figure 3b. non-overlapping terms on the sides (N=5) and 3) the design is sim- ple, easy to understand and require less time to process (N=3). Two 6.2.3 E2-4: Topical Radar. The E2-4 Topical Radar was used in Tsai subjects mentioned they preferred E1-2 the most due to histograms and Peter [22]. The radar chart was presented in the left. We picked gives them the “concrete numbers” for “calculating” the similarity, the top 5 topics (ranked by beta value from a total of 30 topics) of which was harder when using word clouds. the user and compared them with the examined attendee through the overlay. A table with topical words was presented in the right 6.2 Explaining Topic Similarity so that the user can inspect the context of each research topic. The design is shown in Figure 3c. The key component of topic similarity is research topics and topi- cal words of the scholar as well as its mutual relationship (i.e., the 6.2.4 E2-5: Topical Bars. We adopted several bar charts in this in- common research topics) between two scholars. We presented four terface as E2-5: Topical Bar. The interface showed top three topics visual interfaces prototypes (shown in Figure 3) and one text-based of two scholars (top row and the second row) and the topical infor- prototype for explaining the topic similarity. 
The text-based in- mation (top eight topical words in the y-axis and topic beta value terface (E2-1) simply says “You and [the scholar] have common in x-axis) using a bar chart with histograms. The design was shown research topics on [T1], [T2], [T3].” in Figure 3d. Designing Explanation Interfaces for Transparency and Beyond IUI Workshops’19, March 20, 2019, Los Angeles, USA (a) E2-2: Topical Words (b) E2-3: FLAME (c) E2-4: Topical Radar (d) E2-5: Topical Bar Figure 3: The interfaces used to explain the Topic Similarity in the fourth stage. 6.2.5 Results. The card-sorting result was presented in Table 2. 6.3.2 E3-3: ForceAtlas2. E3-3: ForceAtlas2 was inspired by Garnett We found the E2-4 Topical Radar received 86 votes in Rank 1 outper- et al. [3] that presented Co-authorship graph of NiMCS and re- forming all other interfaces. E2-3 ended up being second with most lated research with both high and low-level network structure votes in the R2 group. According to the post-session interview, 13 and information. Nodes and edges are representing authors and subjects agreed E2-4 is the best interface among all examined inter- co-authorship, respectively. Graph layout uses the ForceAtlas2 al- faces. One subject preferred E2-3, and one subject suggested a mix gorithm [3]. Clusters are calculated via Louvain modularity and of E2-3 and E2-4 as the best design. The supporting reasons for E2-4 delineated by color. The frequency of co-authorship is calculated can be summarized as 1) It is easy to see the relevance through the via Eigenvector centrality and represented by size. The design was overlapping area from the Radar chart and the percentage numbers shown in Figure 4(b). from the table (N=12). 2) It is informative to compare the shared 6.3.3 E3-4: Strength Graph. E3-4 Strength Graph was inspired by research topics and topical words (N=9). Tsai and Brusilovsky [18] that tried to present the co-authorship network using D3plus network style [9]. 
Nodes and edges are repre- senting authors and co-authorship, respectively. The edge thickness 6.3 Explaining Co-Authorship Similarity is the weighting of the coauthorship (number of co-worked papers). The key component of co-authorship similarity is coauthors, coau- The node was assigned different color by their groups, i.e., the orig- thorship and distance of connections of the scholars as well as its inal scholar, target scholar and via scholars. The design was shown mutual relationship (i.e., the connecting path) between two schol- in Figure 4(c). ars. We presents the five prototyping interfaces (shown in Figure 4, 6.3.4 E3-5: Social Viz. The E3-5 Social Viz was used in [22]. There E3-1 presented in text below) for explaining publication similarity. were six possible paths (one shortest and five alternatives). The In addition to four visualized interfaces, we also include one text- user will be presented in the left with a yellow circle. The target based interface (E3-1). That is, “You and [the scholar] have common user will be presented in the right with red color. The circle size co-authors, they are [A1], [A2], [A3].” represented the weighting of the scholar, which was determined by the appearing frequency in the six paths. For example, the scholar Peter is the only node that scholar Chu can reach scholar Nav, so 6.3.1 E3-2: Correlation Matrix. E3-2 Correlation matrix was in- the circle size was the largest one (size = 6). The design was shown spired by Heckel et al. [4] that was used to present overlapping in Figure 4(d). user-item co-clusters in a scalable and interpretable product recom- mendation model. We extended the interface to a user-to-user cor- 6.3.5 Results. The card-sorting result was presented in Table 2. relation matrix that the user can inspect the scholar co-authorship We found the E3-4 Strength Graph was preferred by the participants, network. The design was shown in Figure 4(a). received 45 votes in Rank 1. 
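The path-based weighting described for E3-5 can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the toy graph and the breadth-first enumeration of simple paths are our assumptions, and a scholar's weight is taken to be the number of connecting paths on which that scholar appears (matching the Chu-Peter-Nav example above).

```python
from collections import Counter, deque

def connecting_paths(graph, source, target, k=6):
    """Enumerate up to k simple paths between two scholars, shortest first
    (breadth-first search over simple paths in an unweighted graph)."""
    paths, queue = [], deque([[source]])
    while queue and len(paths) < k:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated scholars)
                queue.append(path + [nxt])
    return paths

def node_weights(paths):
    """E3-5-style weight: how often each scholar appears across the paths."""
    return Counter(node for path in paths for node in path)

# Toy co-authorship graph; the names are only illustrative.
graph = {
    "Chu": ["Peter"],
    "Peter": ["Chu", "Nav", "Ann"],
    "Ann": ["Peter", "Nav"],
    "Nav": ["Peter", "Ann"],
}
paths = connecting_paths(graph, "Chu", "Nav")  # two paths, both through Peter
weights = node_weights(paths)
```

In the deployed interface, this weight drives the circle size; here it is simply a count over the connecting paths, so a cut vertex like Peter receives the maximum weight.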
However, the votes were close to those of E3-2 Correlation Matrix (37 votes) and E3-3 ForceAtlas2 (32 votes). According to the post-session interviews, four subjects agreed that E3-4 was the best of the five interfaces; the supporting reasons were that the interface highlights the mutual relations and lets the user understand the path between two scholars, and that the arrows and edge thickness were also useful. Two subjects supported E3-2: they liked that the correlation matrix provides clear numbers and correlation information that is easier for them to process. Three subjects supported E3-3: they preferred that the interface gives high-level information, a "big picture", and noted that E3-3 would be good for exploring the co-authorship network beyond the connecting path, although the interface was reported to be too complicated as an explanation. Four subjects supported E3-5: they enjoyed the simple, clear, and "straightforward" connecting path as the explanation of the co-authorship network.

Figure 4: The interfaces used to explain the Co-Authorship Similarity in the fourth stage. (a) E3-2: Correlation Matrix; (b) E3-3: ForceAtlas2; (c) E3-4: Strength Graph; (d) E3-5: Social Viz.

6.4 Explaining CN3 Interest Similarity

The key components of CN3 interest similarity are the papers and authors bookmarked in the system, as well as the mutual relationship (i.e., the common bookmarks) between two scholars. We present five prototype interfaces (shown in Figure 5; E4-1 presented in the text below) for explaining CN3 interest similarity. In addition to the four visual interfaces, we also include one text-based interface (E4-1): "You and [the scholar] have common bookmarks; they are [P1], [P2], [P3]."

6.4.1 E4-2: Similar Keywords. E4-2 Similar Keywords was proposed and deployed in Conference Navigator [11]. We extended the interface to explain the shared bookmarks between two scholars. The interface places the two scholars on the two sides and the common co-bookmarked items (e.g., the five common co-bookmarked papers or authors) in the middle. A strong (solid-line) or weak (dashed-line) tie connects each item depending on whether it was bookmarked by one side or by both sides. The design is shown in Figure 5(a).

6.4.2 E4-3: Tagsplanations. E4-3 Tagsplanations was proposed by Vig et al. [24]; the idea is to show tags together with user preference and relevance, as used in recommending movies. We extended the interface to explain the co-bookmarking information: in our design, the co-bookmarked items are listed and ranked by social popularity, i.e., how many users have followed or bookmarked each item. The design is shown in Figure 5(b).

6.4.3 E4-4: Venn Tags. The study of [8] pointed out that users preferred the Venn diagram as an explanation in a recommender system. In E4-4: Venn Tags, we implemented the same idea with the bookmarked items: each bookmarked item is presented as an icon in a Venn diagram. The two sides hold the items bookmarked by only one party, and the co-bookmarked or co-followed items are placed in the middle. Users can hover over an icon for detailed information, i.e., the paper title or author name. The design is shown in Figure 5(c).

6.4.4 E4-5: Itemized List. An itemized list was adopted to explain bookmarks in [21]. We proposed E4-5: Itemized List, which presents the bookmarked or followed items in two lists. The design is shown in Figure 5(d).

Figure 5: The interfaces used to explain the CN3 Interest Similarity in the fourth stage. (a) E4-2: Similar Keywords; (b) E4-3: Tagsplanations; (c) E4-4: Venn Tags; (d) E4-5: Itemized List.

6.4.5 Results. The card-sorting results are presented in Table 2. We found that E4-4 Venn Tags was preferred by the participants: it received 64 Rank-1 votes, outperforming the other four interfaces, and was also favored by the subjects with a further 49 votes. According to the post-session interviews, eight subjects agreed that E4-4 was the best of the five interfaces. The supporting reasons can be summarized as follows: 1) the Venn diagram is more familiar and clearer than the other interfaces (N=4); and 2) the Venn diagram is simple and easy to understand (N=4). Three subjects preferred E4-3 the most because the interface provides extra attribution, does not require hovering for details, and is easy to use.

6.5 Explaining Geographic Similarity

The key components of geographic similarity are the locations and distance of the two scholars, as well as their mutual relationship (i.e., the geographic distance). We present five prototype interfaces (shown in Figure 6; E5-1 presented in the text below) for explaining geographic similarity. In addition to the four visual interfaces, we also include one text-based interface (E5-1): "From [Institution A] to [sample]'s affiliation ([Institution B]) = N miles."

6.5.1 E5-2: Earth Style. Using Google Maps [6] to explain geographic distance in a social recommender system was discussed in Tsai and Brusilovsky [21]. We extended the interface to a different style: in E5-2 Earth Style, we "zoom out" the map to the earth's surface and place the two connected icons (with the geographic distance) on the map. The design is shown in Figure 6(a).

Figure 6: The interfaces used to explain the Geography Similarity in the fourth stage. (a) E5-2: Earth Style; (b) E5-3: Navigation Style; (c) E5-4: Icon Style; (d) E5-5: Label Style.

6.5.2 E5-3: Navigation Style.
E5-3 Navigation Style followed the same Google Maps API (shown in E5-2) but presents navigation between the two locations, either by car or by flight. Note that the transportation time, i.e., the flying or driving time in E5-2 or E5-3, was not considered in the recommendation model. The design is shown in Figure 6(b).

6.5.3 E5-4: Icon Style. E5-4 Icon Style followed the same Google Maps API (shown in E5-2) but presents two icons on the map without any navigation information. Users can hover to see the detailed affiliation, but the geographic distance is not presented. The design is shown in Figure 6(c).

6.5.4 E5-5: Label Style. E5-5 Label Style followed the same Google Maps API (shown in E5-2) but presents two labels on the map without any navigation information. Users can see the detailed affiliation profile through a floating label, without extra clicking or hovering interactions. The design is shown in Figure 6(d).

6.5.5 Results. The card-sorting results are presented in Table 2. We found that E5-3 Navigation Style was preferred by the participants, receiving 42 Rank-1 votes; however, the votes were close to those of E5-5 Label Style (40 votes). According to the post-session interviews, six subjects agreed that E5-3 was the best of the five interfaces, but three subjects specifically mentioned that the navigation function was irrelevant to explaining or exploring the social recommendations. The supporting reasons for E5-3 can be summarized as follows: 1) the map is informative (N=2); and 2) it is useful to see the navigation (N=5). Three subjects preferred E5-5 the most because the label contains the affiliation information, so they can understand the affiliation without extra actions. Although there is no geographic distance information, one subject pointed out that he could estimate the distance after learning the affiliation.

7 DISCUSSION AND CONCLUSIONS

In this work-in-progress paper, we presented a participatory process for bringing explanation interfaces to a social recommender system. We proposed four stages in response to the challenge of identifying the key components of the explanation models and mental models. In the first stage, we derived the Expert Mental Model by discussing the key components (based on the similarity algorithm) of each recommendation model. In the second stage, we reported an online survey of current system users (N=14) and identified 19 explanatory goals as the User Mental Model. In the third stage, we reported the card-sorting results of a controlled user study (N=15), which created the Target Mental Model from the target users' preferences among the explanatory factors.

In the fourth stage, we proposed a total of 25 explanation interfaces for five recommendation models and reported the card-sorting and semi-structured interview results. We found that, in general, the participants preferred the visual interfaces to the text-based ones. Based on the study, E1-4: Venn Word Cloud, E2-4: Topical Radar, E3-4: Strength Graph, E4-4: Venn Tags, and E5-3: Navigation Style were the interfaces preferred by the study participants. We further discussed the top-rated and second-rated explanation interfaces and the user feedback in each session. Based on the experimental results, we outlined design guidelines for bringing explanation interfaces to a real-world social recommender system.

A further controlled study will be required to test whether the proposed explanation interfaces can achieve the target mental model we identified in this paper. In future work, we plan to implement the top-rated explanation interfaces and deploy them in the CN3 system. Moreover, we expect to pair the explanation interfaces with an information-seeking task so that we can analyze how and why users adopt the explanation interfaces when exploring the social recommendations.
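The "N miles" value shown in E5-1 is the geographic distance between two affiliations. The paper does not specify how that distance is computed, so the following is only a hedged, library-free sketch: a great-circle (haversine) distance, with illustrative coordinates for Pittsburgh and Los Angeles that are our assumptions, not values from the deployed system.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Illustrative: Pittsburgh to Los Angeles, roughly 2,130 miles great-circle.
distance = haversine_miles(40.4443, -79.9606, 34.0522, -118.2437)
```

A routing service such as the Google Maps Directions API used by E5-3 would instead return a driving or flight distance, which is generally longer than this straight-line value.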
REFERENCES
[1] Malin Eiband, Hanna Schneider, Mark Bilandzic, Julian Fazekas-Con, Mareike Haug, and Heinrich Hussmann. 2018. Bringing Transparency Design into Practice. In 23rd International Conference on Intelligent User Interfaces. ACM, 211–223.
[2] Gerhard Friedrich and Markus Zanker. 2011. A taxonomy for generating explanations in recommender systems. AI Magazine 32, 3 (2011), 90–98.
[3] Alex Garnett, Grace Lee, and Judy Illes. 2013. Publication trends in neuroimaging of minimally conscious states. PeerJ 1 (2013), e155.
[4] Reinhard Heckel, Michail Vlachos, Thomas Parnell, and Celestine Dünner. 2017. Scalable and interpretable product recommendations via overlapping co-clustering. In Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 1033–1044.
[5] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. ACM, 241–250.
[6] Google Inc. 2018. Google Maps Directions API. https://developers.google.com/maps/documentation/directions/intro
[7] Bart P. Knijnenburg, Svetlin Bostandjiev, John O'Donovan, and Alfred Kobsa. 2012. Inspectability and Control in Social Recommenders. In 6th ACM Conference on Recommender Systems. 43–50.
[8] Pigi Kouki, James Schaffer, Jay Pujara, John O'Donovan, and Lise Getoor. 2017. User preferences for hybrid explanations. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 84–88.
[9] Lawrence. 2018. Customize D3plus network style. https://codepen.io/choznerol/pen/evaYyv
[10] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 165–172.
[11] Conference Navigator. 2018. Paper Tuner. http://halley.exp.sis.pitt.edu/cn3/portalindex.php
[12] Alexis Papadimitriou, Panagiotis Symeonidis, and Yannis Manolopoulos. 2012. A generalized taxonomy of explanations styles for traditional and social recommender systems. Data Mining and Knowledge Discovery 24, 3 (2012), 555–583.
[13] Pearl Pu and Li Chen. 2007. Trust-inspiring explanation interfaces for recommender systems. Knowledge-Based Systems 20, 6 (2007), 542–556.
[14] Amit Sharma and Dan Cosley. 2013. Do social explanations work?: studying and modeling the effects of social explanations in recommender systems. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1133–1144.
[15] Julia Silge and David Robinson. 2016. tidytext: Text mining and analysis using tidy data principles in R. The Journal of Open Source Software 1, 3 (2016), 37.
[16] Nava Tintarev and Judith Masthoff. 2012. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (Oct. 2012), 399–439.
[17] Nava Tintarev and Judith Masthoff. 2015. Explaining recommendations: Design and evaluation. In Recommender Systems Handbook. Springer, 353–382.
[18] Chun-Hua Tsai and Peter Brusilovsky. 2017. Providing Control and Transparency in a Social Recommender System for Academic Conferences. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 313–317.
[19] Chun-Hua Tsai and Peter Brusilovsky. 2018. Beyond the Ranked List: User-Driven Exploration and Diversification of Social Recommendation. In 23rd International Conference on Intelligent User Interfaces. ACM, 239–250.
[20] Chun-Hua Tsai and Peter Brusilovsky. 2018. Explaining Social Recommendations to Casual Users: Design Principles and Opportunities. In Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion. ACM, 59.
[21] Chun-Hua Tsai and Peter Brusilovsky. 2019. Exploring Social Recommendations with Visual Diversity-Promoting Interfaces. TiiS 1, 1 (2019), 1–1.
[22] Chun-Hua Tsai and Peter Brusilovsky. 2019. Explaining Recommendations in an Interactive Hybrid Social Recommender. In Proceedings of the 2019 Conference on Intelligent User Interfaces. ACM, 1–12.
[23] Katrien Verbert, Denis Parra, Peter Brusilovsky, and Erik Duval. 2013. Visualizing recommendations to support exploration, transparency and controllability. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. ACM, 351–362.
[24] Jesse Vig, Shilad Sen, and John Riedl. 2009. Tagsplanations: explaining recommendations using tags. In Proceedings of the 14th International Conference on Intelligent User Interfaces. ACM, 47–56.
[25] Yao Wu and Martin Ester. 2015. FLAME: A Probabilistic Model Combining Aspect Based Opinion Mining and Collaborative Filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM '15). ACM, New York, NY, USA, 199–208. https://doi.org/10.1145/2684822.2685291
[26] Yao Wu and Martin Ester. 2015. FLAME: A probabilistic model combining aspect based opinion mining and collaborative filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 199–208.
[27] Zoomdata. 2018. Real-time Interactive Zoomdata Wordcloud. https://visual.ly/community/interactive-graphic/social-media/real-time-interactive-zoomdata-wordcloud