=Paper= {{Paper |id=Vol-1679/paper5 |storemode=property |title=Scalable Exploration of Relevance Prospects to Support Decision Making |pdfUrl=https://ceur-ws.org/Vol-1679/paper5.pdf |volume=Vol-1679 |authors=Katrien Verbert,Karsten Seipp,Chen He,Denis Parra,Chirayu Wongchokprasitti,Peter Brusilovsky |dblpUrl=https://dblp.org/rec/conf/recsys/VerbertSHPWB16 }} ==Scalable Exploration of Relevance Prospects to Support Decision Making== https://ceur-ws.org/Vol-1679/paper5.pdf
Scalable Exploration of Relevance Prospects to Support Decision Making

Katrien Verbert (katrien.verbert@cs.kuleuven.be), Department of Computer Science, KU Leuven, Leuven, Belgium
Karsten Seipp (karsten.seipp@cs.kuleuven.be), Department of Computer Science, KU Leuven, Leuven, Belgium
Chen He (chen.he@cs.kuleuven.be), Department of Computer Science, KU Leuven, Leuven, Belgium
Denis Parra (dparras@uc.cl), Department of Computer Science, Pontificia Universidad Católica de Chile, Santiago, Chile
Chirayu Wongchokprasitti (chw20@pitt.edu), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Peter Brusilovsky (peterb@pitt.edu), School of Information Sciences, University of Pittsburgh, Pittsburgh, PA, USA


ABSTRACT
Recent efforts in recommender systems research focus increasingly on human factors that affect acceptance of recommendations, such as user satisfaction, trust, transparency, and user control. In this paper, we present a scalable visualisation to interleave the output of several recommender engines with human-generated data, such as user bookmarks and tags. Such a visualisation enables users to explore which recommendations have been bookmarked by like-minded members of the community or marked with a specific relevant tag. Results of a preliminary user study (N = 20) indicate that effectiveness and probability of item selection increase when users can explore relations between multiple recommendations and human feedback. In addition, perceived and actual effectiveness of the recommendations, as well as user trust in the recommendations, are higher than with a traditional list representation of recommendations.

CCS Concepts
•Human-centered computing → Information visualisation; Empirical studies in visualisation; User interface design;

Keywords
Interactive visualisation; recommender systems; set visualisation; scalability; user study

IntRS 2016, September 16, 2016, Boston, MA, USA.
Copyright remains with the authors and/or original copyright holders, 2016.

1. INTRODUCTION
When recommendations fail, a user's trust in a recommender system often decreases, particularly when the system acts as a "black box" [7]. One approach to deal with this issue is to support exploration of recommendations by exposing recommendation mechanisms and explaining why a certain item was selected [19]. For example, graph-based visualisations can explain collaborative filtering results by representing relationships among items and users [11, 3].

Our work has been motivated by the presence of multiple relevance prospects in modern social tagging systems. Items bookmarked by a specific user offer a social relevance prospect: if this user is known or appears to be like-minded, a collection of her bookmarks is perceived as an interesting set that is worth exploring. Similarly, items marked by a specific tag offer a content relevance prospect. In a social tagging system extended with a personalised recommender engine [12, 15, 4], top items recommended to a user offer a personalised relevance prospect.

Existing personalised social systems do not allow their users to explore and combine these different relevance prospects. Only one prospect can be explored at any given time: a list of items suggested by a recommender engine, a list of items bookmarked by a user, or a list of items marked with a specific tag. In our work, we focus on the use of visualisation techniques to support exploration of multiple relevance prospects, such as relationships between different recommendation methods, socially connected users, and tags, as a basis to increase acceptance of recommendations. In earlier work, we investigated how users explore these recommendations using a cluster map visualisation [20]. Although we were able to show the potential value of combining recommendations with tags and bookmarks of users, the user interface was found to be challenging. Further, the nature of the employed visualisation made our approach difficult to scale: in a field study, users only explored relations between a maximum of three entities. Due to these limitations, the effect of using multiple prospects could not be fully assessed.

In this paper, we present the use of a scalable visualisation that combines personalised recommendations with two additional prospects: (1) bookmarks of other users (a social relevance prospect), and (2) tags (a content relevance prospect). Personalised recommendations are generated with four different recommendation techniques and embodied as agents
to put them on the same ground as users (i.e., recommendations made by agents are treated in the same way as bookmarks left by users). We use the UpSet visualisation [9], which offers a scalable approach to combine multiple sets of relevance prospects, i.e. different recommender agents, bookmarks of users, and tags. We aim to assess whether the combination of multiple relevance prospects shown with this technique can be used to increase the effectiveness of recommendations while also addressing several issues related to the "black box" problem. In particular, we explore the following research questions:

• RQ1 Under which condition may a scalable visualisation increase user acceptance of recommended items?
• RQ2 Does a scalable set visualisation increase perceived effectiveness of recommendations?
• RQ3 Does a scalable set visualisation increase user trust in recommendations?
• RQ4 Does a scalable set visualisation improve user satisfaction with a recommender system?

The contribution of this research is threefold:

1. First, we present a novel interface that integrates a simplified version of the UpSet visualisation, allowing the user to flexibly combine multiple prospects to explore recommended items.
2. Second, we present a preliminary user study that assesses the effect of combining multiple relevance prospects on the decision-making process. We find that users explore combinations of recommendations with users and tags more frequently than recommendations based on agents only. Further, this combination is found to provide more relevant results, leading to an increase in user acceptance.
3. Third, we find indications of an increase in user trust, user satisfaction, and both perceived and actual effectiveness of recommendations compared to a baseline system. This shows the positive effects of combining multiple prospects on user experience.

This paper is organized as follows: first, we present related work in the area of interactive recommender systems. We then introduce the design of IntersectionExplorer, an interactive visualisation that allows users to explore recommendations by combining multiple relevance prospects in a scalable way. We assess its impact on the decision-making process and finish with a discussion of the results.

2. RELATED WORK
In a recent study, we analyzed 24 interactive recommender systems that use a visualisation technique to support user interaction [6]. A large share of these systems focuses on transparency of the recommendation process to address the "black box" issue. Here, the overall objective is to explain the inner logic of a recommender system to the user in order to increase acceptance of recommendations. Good examples of this approach are PeerChooser [11] and SmallWorlds [5]. Both allow exploration of relationships between recommended items and friends with a similar profile using multiple aspects.

In addition, TasteWeights [3] allows users to control the impact of the profiles and behaviours of friends and peers on the recommendation results. Similar to our work, TasteWeights provides an interface for such hybrid recommendations. The system elicits preference data and relevance feedback from users at run-time in order to adapt recommendations. This idea can be traced back to the work of Schafer et al. [17] concerning meta-recommendation systems. These meta-recommenders provide users with personalised control over the generation of recommendations by allowing them to alter the importance of specific factors on a scale from 1 (not important) to 5 (must have). SetFusion [13] is a recent example that allows users to fine-tune the weights of a hybrid recommender system. SetFusion uses a Venn diagram to visualise relationships between recommendations. Our work extends this concept by visualising relationships between different relevance prospects, including human-generated data such as user bookmarks and tags in addition to the outputs of recommenders, in order to incite the exploration of related items and to increase their relevance and importance in the eye of the user. To do so, we employ a set-based visualisation that allows users to quickly discern relations and commonalities between the items of recommenders, users, and tags for a richer and more relevant choice.

Relevance-based or set-based visualisation attempts to spatially organize recommendation results. This type of visualisation has its roots in the field of information retrieval and was used for the display of search results. For example, for a query that uses three terms, this type of visualisation would create seven set areas: three sets show the results separately for each term, another three show the results for any combination of two of these terms, and one set shows the results that are relevant to all three terms together. The classic example of such a set-based relevance visualisation is InfoCrystal [18]. The Aduna clustermap visualisation [1] also belongs to this category, but offers a more complex visualisation paradigm and a higher degree of interactivity. The strongest point of both approaches, however, is the clear representation of the query terms and their relevant items, separately or in combination.

In the context of similar work, the novelty of the approach suggested in this paper is twofold. First, we use a set-based relevance approach that is not limited to keywords or tags, but which combines these with other relevance-bearing entities (users and recommendation agents). The major difference and innovation of our work is that we allow end-users to combine multiple relevance prospects to increase the richness and relevance of recommendations. Second, we present and evaluate the use of a novel scalable visualisation technique (UpSet [9]) to perform this task and thereby demonstrate this approach's ability to increase recommendation effectiveness and user trust.

3. INTERSECTIONEXPLORER
IntersectionExplorer (IE) is an interactive visualisation tool that enables users to combine suggestions of recommender agents with user bookmarks and tags in order to find relevant items. We describe the visualisation and interaction design of the system, followed by its implementation.

3.1 Set Visualisation Design
We have adapted the UpSet [9] technique to visualise relations between users, tags, and recommendations. UpSet
represents set relations in a matrix view: while columns represent sets of different entities (such as recommender agents or other users' bookmarks), rows represent commonalities between these (Figure 1). The column header shows the name of the agent, user, or tag. The vertical bar chart below the column headers depicts the number of items belonging to each related set. Set relations are represented by the rows. In such a row, a filled cell indicates that the corresponding set contributes to the relation; an empty cell indicates that the corresponding set is not part of the relation. The horizontal bar chart next to each row shows the number of items that can be explored for this relation set. For example, the first row in Figure 1 indicates that there are three items that belong to both the set of recommendations suggested by the bookmark-based recommender agent and the set of recommendations suggested by the tag-based agent. The second row shows suggestions of the bookmark-based agent only, whereas the third row shows only suggestions of the tag-based agent. For the convenience of the reader, we also depicted this relation in a traditional Venn diagram to support the understanding of the concept.

Figure 1: Set visualisation of IntersectionExplorer

One of the biggest advantages of a visual matrix is scalability. Whereas a Venn diagram can only display the intersections of a limited number of sets, the UpSet technique can present many sets in parallel, as only a single column has to be added to add another set to the visualisation. This greatly reduces space requirements while increasing the information density. The visual encoding of IE is identical for any number and constellation of sets. In practice, users may wish to first familiarise themselves with the display of a small number of sets, but due to the consistent and space-efficient design, they can seamlessly increase the number of sets without altering the view.

3.2 Interaction Design
An overview of the full IE interface is shown in Figure 2. The interface is separated into three connected parts. In the left part, the user can select different entities: agents, users, and tags. If an agent is selected, the set of items suggested by this agent is added to the matrix visualisation in the canvas area. If a user is added, the set of bookmarks of this user is added. Similarly, if a tag is added, the set of papers marked with this tag is added to the view.

The canvas area represents user-selected sets as columns in a matrix view, allowing the user to explore overlaps between these sets. Each row represents relations between the different columns, as explained in the previous section.

The user can explore the details of the data items related to a certain row by clicking on the row. For example, after clicking the first row in Figure 2, the right part shows the title and authors of two papers that are bookmarked by "P Brusilovsky" and also suggested by three different agents.

The user can explore the items related to a specific set by clicking on the column header: all items contained in this set are then presented in the panel on the right. Meanwhile, the rows related to this set are also gathered at the top to facilitate exploration of relations with other sets.

At the top of the set view, the user can also sort the rows (set intersections) by number of items or number of related sets in ascending or descending order. The example of Figure 2 sorts the rows by the number of related sets in descending order. The first row represents items in the intersection of four sets, the second row represents items in the intersection of three sets, and the next five rows represent items in the intersection of two sets. The other rows represent items related to a single set only.

3.3 Implementation
We have implemented IE on top of data from Conference Navigator 3 (CN3). CN3 is a social personalised system that supports attendees at academic conferences [14]. Its main feature is its conference scheduling system, where users can add talks of the conference to create a personal schedule. Social information collected by CN3 is extensively used to help users find interesting papers. For example, CN3 lists the most popular papers, the most active people, and the most popular tags assigned to the talks. When visiting a talk page, users can also see who scheduled the talk during the conference and which tags were assigned to it.

We use the list of conference talks as data items in IE. CN3 offers four different recommendation services that rely on different recommendation engines. The tag-based recommender engine matches user tags (tags provided by the user) with item tags (tags assigned to different talks by the community of users) using the Okapi BM25 algorithm [10]. The bookmark-based recommendation engine builds the user interest profile as a vector of terms with weights based on TF-IDF [2], using the content of the papers that the user has scheduled. It then recommends papers that match this profile of interests. Another two recommender engines, external bookmark and bibliography, are augmented versions of the bookmark-based engine [21]. The external bookmark recommender engine combines the content of the scheduled papers with the research papers bookmarked by the user in academic social bookmarking systems such as Mendeley, CiteULike, or BibSonomy. Similarly, the bibliography recommender engine uses the content of papers published by the user in the past to augment the bookmarked papers.

The suggestions of these four recommender engines are represented as separate agents in IE. Users can explore which items are suggested by a single agent, for instance the tag-based recommender, but they can also explore which items are recommended by multiple agents to filter out the potentially more relevant recommendations. In addition, users can explore relations between agent suggestions and bookmarks of real users. As shown in Figure 2, the third row represents items suggested by the tag-based agent that have also been bookmarked by "P Brusilovsky", but that are not suggested by the two other agents and that have not been bookmarked by the active user ("K Verbert"). In this paper,
Figure 2: IntersectionExplorer visualises relationships between recommendations generated by multiple rec-
ommendation techniques (agents) and bookmarks of users and tags to increase relevance of recommendations.


we evaluate whether enabling users to explore relations between recommendations of different techniques, real users, and tags increases the acceptance of recommendations.

The set visualisation shows the relations of the selected sets as described in Section 3.1. The column of the current user is displayed in blue, while the other columns are represented in grey. As presented in Figure 2, the bar chart below the column headers of users overlays a blue bar that encodes the number of common bookmarks with the current user. The similarity between users is also represented next to the user name in the panel on the left: "P Brusilovsky (9/31)" indicates that the user "P Brusilovsky" has 31 bookmarks in total, nine of which are also bookmarked by the active user ("K Verbert").

For the user study presented in this paper, we used the data from the EC-TEL conferences of 2014 and 2015. EC-TEL is a large conference on technology enhanced learning. We retrieved user bookmarks and tags of these conferences, and had access to the different recommender services for both the 2014 and 2015 editions of the conference. Attendees of the EC-TEL conference participated in the user study that is presented in the next section.

4. USER STUDY
To investigate to what extent the set visualisation may support users in finding relevant items, we conducted a within-subjects study with 20 users (mean age: 32.9 years; SD: 6.32; female: 3) in two conditions, both of which had to be completed by all participants.

In the first condition (baseline), users were tasked to explore recommendations presented to them using the CN3 "my recommendations" page with four ranked lists. In the second condition, users explored recommendations using IntersectionExplorer (IE). To avoid a learning effect, each condition used a separate data set from which to generate recommendations: the baseline condition (CN3) used the EC-TEL 2014 proceedings (172 items), while the IE condition used the EC-TEL 2015 proceedings (112 items).

To prepare for the study, users bookmarked and tagged five items in each of the proceedings. In addition, users' publication history and academic social bookmarking systems (CiteULike and BibSonomy) were read. From the combined data, recommendations were generated in both conditions using the four different techniques described in Section 3. These were then presented as four individual agents: a tag-based agent, a bookmark-based agent, an external bookmark agent, and a bibliography agent.

To explore the impact of the IE visualisation on the users' acceptance of items, users were tasked to explore the recommendations of the four agents freely and to bookmark five items. During this period we recorded the time and number of steps taken to create a bookmark. In particular, we recorded the following actions: selection/deselection of agents, users, and tags; sorting; hovering over a result row (if the mouse position was held for more than two seconds); clicking a paper's title; and clicking the bookmark button. Further, we collected data using a think-aloud protocol, synchronizing screen recording and microphone input. Finally, users completed a questionnaire using a five-point Likert scale. The questions were based on ResQue [16] and the framework proposed by Knijnenburg et al. [8], both of which have been validated for the measurement of subjective aspects of user experience with recommender systems.

Before exploring the recommendations using IE, users were shown a three-minute video to explain the system's operation. In the IE view, users saw the intersections with the agents' recommendations. In the CN3 (baseline) view, users saw the full results of the bookmark agent and could navigate to the recommendations generated by the three other agents, as presented in Figure 3. The study was counterbalanced by mode of exploration (CN3/IntersectionExplorer). Five users completed the study with a researcher present in the same room, whereas 15 users completed the study via an on-line video call. To establish users' background knowledge, we asked each participant a set of questions using a five-point Likert scale after the study. Mean results were as follows:

• Users were familiar with technology-enhanced learning
(mean: 4; SD: 1.1).

• Users were familiar with recommender systems (mean: 4; SD: 0.95).
• Users were familiar with visualisation techniques (mean: 4.05; SD: 0.86).
• Users occasionally followed the advice of recommender systems (mean: 4.25; SD: 0.77).
• Eight participants had never heard of CN3 before. Twelve had heard of it, but had no particular familiarity with the system (mean: 3.25; SD: 1.13).

One user had no publications, four had two to four publications, and fifteen had five publications or more. Within the last group, 93.3% had published at an EC-TEL conference in the past.

Figure 3: CN3 baseline interface with four ranked lists provided by four recommender engines.

4.1 Results and Evaluation

4.1.1 Quantitative results
The main focus of this study was to investigate under which condition the visualisation may increase user acceptance of recommended items. To answer our question, we need to analyse the in-depth behaviour of users exploring the recommendations using various combinations of recommender agents and the bookmarks and tags of other users.

In order to determine the impact of visualising relations between agents, users, and tags, we defined two measures: effectiveness and yield.

Effectiveness measured how frequently the exploration of a specific set providing a number of intersections (henceforth called 'size') eventually led to the user bookmarking another paper (from the recommended set of papers). By

Yield measured the fraction of items of an explored set that were actually bookmarked by all users in total. For instance, if the results of the intersection with one agent listed a total of 93 items for all users combined, but only five bookmarks were created from this type and size of intersection across the whole study, its overall yield was 5/93 = 0.05 (Figure 5, first row).

Figure 4 and Figure 5 reveal an interesting effect: sets which included the recommendations of agents and other entities, such as other users' bookmarks and tags, appeared to have a higher yield and effectiveness than sets based on agent recommendations alone, even if the number of intersections was the same. To further explore this aspect, we divided the results for effectiveness and yield into two groups: those obtained for interaction with one to four agents, and those obtained from interaction with the recommendations of different numbers of agents and another entity (user or tag). A Friedman test indicated a significant effect of recommendation source on effectiveness, χ²(1) = 4, p = .046, revealing that users who explored the recommendations of agents combined with another entity in the recommendation matrix of IE (median: .43) tended to find more than twice as many relevant items as when only using the agents for the recommendation (median: .21) (Figure 4). These results correspond to our findings that the richer the set (the more "perspectives" contribute to the recommendation), the higher the yield (Figure 6). In general, Figure 7 shows that the larger the number of intersections with a specific type, the higher the yield. Pearson's correlation showed a positive correlation between the number of intersections and yield (r = .839, n = 6, p = .037).

Figure 4: Effectiveness of the combinations of various numbers of agents and the combinations of var-
the exploration of a set we mean clicking on a row of inter-      ious amounts of agents and other entities, such as
sections in the visualisation (Figure 1, Figure 2) to show the    users or tags. Effectiveness was higher when agents
items belonging to the intersection of the selected sets.         were combined with another entity.
   Effectiveness was calculated as the number of cases where
the exploration of an intersection of a specific type and size       Overall, these results suggests that enriching automated
resulted in a new user bookmark, divided by the number of         recommendations based on tags, previous bookmarks, pub-
times this intersection type and size was explored. Intersec-     lication history and academic social bookmarks with socially
tion types could be a single agent, a combination of agents,      collected relevance evidence, such as the bookmarks made by
or a combination of agents with another entity (user or tag).     other users of the same conference or a tag, greatly increases
The size represented the number of sets in the intersection.      the relevance of recommendations, resulting in a higher ac-
For instance, users explored suggestions of a single agent 26     ceptance rate.
times in total (one agent, Figure 4, first row). Exploration of      Regarding the overall operability of IE, an ANOVA of task
these sets resulted in the creation of five bookmarks. Thus,      completion time showed an effect of task number F (4, 44)
the single agent’s effectiveness is 5/26 = 19%.                   = 20.5, p < .001 on interaction time. However, a post-hoc
Figure 5: Yield of the combinations of various numbers of agents and the combinations of various numbers of agents and other entities, such as users or tags. Yield was higher when agents were combined with another entity.

Figure 6: Yield of different numbers of perspectives in an exploration. Pearson's correlation showed a positive correlation between the number of perspectives in an exploration and yield (r = 1.0, n = 3, p = .015).

Figure 7: Yield of different numbers of entities in the intersection. Pearson's correlation showed a positive correlation between the number of entities in an intersection and yield (r = .839, n = 6, p = .037).

   A Greenhouse-Geisser corrected ANOVA of the number of steps needed to complete the bookmarking tasks showed an effect of condition, F(1, 11) = 7.86, p = .017, and an effect of task order, F(2.09, 23) = 168.82, p = .002. A Wilcoxon signed-rank test showed a trend for task one taking more steps when using IE (median: 11) than when using CN3 (median: 4), Z = 2.5, p = .012, but after applying a Bonferroni-Holm correction, differences were not statistically significant. This suggests that while IE may have a higher learning curve than CN3, no statistically significant differences exist in terms of efficiency of operation after acquaintance with the system (Figure 8).

Figure 8: Median time (mm:ss) and steps of each task with IntersectionExplorer (IE) and CN3.

4.1.2    Questionnaire results
   Results are reported in Figures 9, 10 and 11. Running a set of Bonferroni-Holm-corrected Wilcoxon signed-rank tests on the questionnaire results revealed the following:

   • Papers explored with IE were perceived to be of a higher quality than with CN3 (Z = 3.54, p < .001).

   • IE was perceived to be more effective than CN3 (Z = 4.24, p < .001).

   • User satisfaction was higher with IE than with CN3 (Z = 3.22, p = .001).

   • Users would be more willing to use IE frequently than CN3 (Z = 3.42, p = .001).

   • Users perceived the recommendations shown in IE to be more trustworthy than those of CN3 (Z = 2.55, p = .011).

Figure 9: Questionnaire results with statistical significance. Differences between the aspects "Fun" and "Choice satisfaction" were not significant after the Bonferroni-Holm correction.

   In addition, a trend was observed that users experienced IE to be more fun than CN3 (Z = 2.28, p = .023) and to provide a higher choice satisfaction (Z = 2.1, p = .039). However, after applying a Bonferroni-Holm correction, these differences were not statistically significant.
   Similarly, the results for the novelty of items (median: 4), effort to use the systems (median: 2), usefulness (median: 4), and ease of use (median: 4) were the same for both systems. Users tended to perceive the creation of bookmarks as more difficult in IE (median: 3) than in CN3 (median: 2), but tended to read the bookmarked papers afterwards more frequently when using IE (median: 4) than when using CN3 (median: 3).
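The Bonferroni-Holm step-down procedure used for these tests is simple to implement: p-values are sorted ascending and the k-th smallest is compared against α/(m − k); once one comparison fails, all larger p-values are also declared non-significant. A minimal sketch follows; the p-values in the example are illustrative only, not the study's full test family.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Return, per input p-value, whether it remains significant
    after the Bonferroni-Holm step-down correction."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    significant = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            significant[i] = True
        else:
            break  # step-down: every larger p-value fails too
    return significant

# Illustrative family of four tests: only the smallest p-value
# survives, because the second comparison fails (0.02 > 0.05/3).
holm_bonferroni([0.04, 0.01, 0.03, 0.02])  # [False, True, False, False]
```

Note that a p-value which would be significant on its own (e.g. 0.02 < 0.05) can lose significance once the family of tests is taken into account, which is exactly the pattern reported for "Fun" and "Choice satisfaction" above.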
Figure 10: Questionnaire results without statistical significance.

   As for the IE-specific aspects shown in Figure 11, users perceived the visualisation to be adequate (median: 4) and the amount of information provided by the system to be sufficient to make a bookmark decision (median: 4). Users tended to be undecided regarding the interaction adequacy of IE (median: 3.5, see [16] for a definition), but found it easy to modify their preference to find relevant papers (user control, median: 4).

Figure 11: Interaction and visualisation sufficiency.

4.1.3    Observation
   The think-aloud protocol revealed the following:
   Interface: Some users misinterpreted empty circles to be a match of bookmarks or recommendations (three users) or initially failed to understand the meaning of the circles (three users). Others stated that they did not know that a tag-based agent was available and that the list of entities on the left was too long (three users).
   Terminology: Two users had problems understanding the meaning of "sets", "related sets" or the numbers representing the number of papers in a set.
   It was further observed that while some users only explored sets recommended by the agents, the majority explored sets recommended by agents and related to other users or tags.

4.1.4    Answering the research questions
   RQ1 Under which condition may a scalable visualisation increase user acceptance of recommended items?
   Our research showed that user acceptance of recommended items increased with the number of sources used. However, the most important finding is that the addition of human-generated data – such as bookmarks of other users or tags – to the agent-generated recommendations resulted in a significant increase of effectiveness and yield. Our data suggests that providing users with insight into relations of recommendations with bookmarks and tags of community members increases user acceptance. We thus recommend combining automated sources and personal sources whenever possible.
   RQ2 Does a scalable set visualisation increase perceived effectiveness of recommendations?
   Perceived effectiveness (expressed in the questionnaire) and actual effectiveness (how frequently users bookmarked a recommended paper) were increased by this type of visualisation.
   RQ3 Does a scalable set visualisation increase user trust in recommendations?
   The evaluation of the subjective data showed that user trust in the recommended items was increased with set-based visualisation of recommendation sources.
   RQ4 Does a scalable set visualisation improve user satisfaction with a recommender system?
   Overall, user satisfaction was higher when using the visualisation, suggesting this to be a key feature of the approach.

4.2     Discussion
4.2.1    Simplicity vs. effectiveness
   The analysis of task completion time and the number of steps needed to complete the bookmarking tasks has shown that users require more time and interactions to set their first bookmark in IE, but that after this 'training phase', the operational efficiency of IE and CN3 does not differ. This corresponds to the observations made during the analysis of the think-aloud study, where it was found that some users initially struggled to understand the meaning of the different circle types or what a 'set' was.
   However, the analysis of the subjective data has shown that users perceived IE to be more effective and its recommendations more trustworthy than those given by CN3. Especially the last point may be the result of removing the frequently lamented "black box" problem of recommenders by simply visualising how and why certain items are selected. In addition, users perceived items resulting from their use of IE to be of higher quality and found the overall experience more satisfying. This positive user experience may compensate for the initial conceptual problems encountered in the first exploration of the application and suggests that IE may be a helpful addition to the conference explorer service.

4.2.2    Comparison to previous work
   In our previous work, we presented the idea of combining recommendations embodied as agents with bookmarks of users and tags as a basis to increase the effectiveness of recommendations [20]. A cluster map technique was used to enable users to explore these relations. Whereas the approach seemed promising, the cluster map was challenging for users to understand. In a first controlled user study, we asked users explicitly to explore recommendations of agents, bookmarks of users, tags and their combinations to try to find relevant items. Results of this user study indicate that there is an increase in effectiveness. In a follow-up uncontrolled field study, users did not explore many intersections between different relevance prospects. As a result, the effect of combining relevance prospects could not be confirmed when users were not pushed to do so.
   IE employs the novel UpSet visualisation technique that was presented at IEEE VIS in 2014 [9]. We simplified the interface and deployed it on top of data collected by Conference Navigator. The approach addresses the previous limitations regarding ease of use and scalability: in this study, users did explore many intersections, enabling us to investigate the effect of the approach on acceptance of recommendations.
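An UpSet-style row, as used in IE, corresponds to an exclusive intersection: the items shared by exactly the selected sets and by no others. The following sketch illustrates the idea; the set names and paper identifiers are hypothetical, not the Conference Navigator data.

```python
from itertools import combinations

def exclusive_intersections(sets):
    """Map each combination of set names to the items that belong to
    exactly those sets and no others (one UpSet row per combination)."""
    names = list(sets)
    rows = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others) if others else set()
            items = inside - outside
            if items:
                rows[combo] = items
    return rows

# Hypothetical relevance prospects for four papers.
prospects = {
    "agent": {"p1", "p2", "p3"},
    "user_bookmarks": {"p2", "p3", "p4"},
    "tag": {"p3"},
}
```

Here `exclusive_intersections(prospects)` produces one row per combination that actually contains items; for example, paper p3 appears only in the row for all three prospects combined, mirroring how a user clicking that row in IE would see exactly the items endorsed by every selected source.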
4.2.3    Limitations
   One limitation is the low number of participants. Further, the study was conducted with researchers from the field of technology enhanced learning with a high degree of visualisation expertise (mean: 4.05, SD: 0.86). Such users may be biased due to their graph literacy. In addition, our data was limited to that provided by the EC-TEL conferences.

5.   CONCLUSION AND FUTURE WORK
   We presented a study that used the UpSet visualisation technique to combine agent-based recommendations with human-generated recommendations in the form of bookmarks and tags. Despite the initial learning curve (when compared to the baseline system CN3), we found that this combination resulted in a higher degree of item exploration and acceptance of recommendations than when using agent-only results. This way, user trust, usefulness, quality, and effectiveness were increased. We could thereby demonstrate the positive effects of the combination of multiple prospects on user experience and relevance of recommendations.
   Future work will explore the applicability of our findings to a more diverse dataset and audience, as well as different types of visualisations. We have currently deployed IntersectionExplorer for attendees of the ACM IUI 2016 conference and will evaluate whether the visualisation can be used in an open setting, without the presence of a researcher. In addition, we plan to deploy the visualisation on top of data from large conferences, including the Digital Humanities conference series. Follow-up studies will assess the added value of our visualisation on top of larger data collections and with a less technical audience. With these studies, we intend to reach a wider range of users to further evaluate the effect of the approach on the effectiveness of recommendations.

6.   ACKNOWLEDGMENTS
   We thank all participants for their participation and useful comments. Part of this work has been supported by the Research Foundation Flanders (FWO), grant agreement no. G0C9515N, and the KU Leuven Research Council, grant agreement no. STG/14/019. The author Denis Parra was supported by CONICYT, project FONDECYT 11150783.

7.   REFERENCES
 [1] Aduna clustermap. www.aduna-software.com/technology/clustermap. Retrieved online 20 August 2014.
 [2] R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463. ACM Press, New York, 1999.
 [3] S. Bostandjiev, J. O'Donovan, and T. Höllerer. Tasteweights: a visual interactive hybrid recommender system. In Proc. RecSys '12, pages 35–42. ACM, 2012.
 [4] J. Gemmell, T. Schimoler, B. Mobasher, and R. Burke. Recommendation by example in social annotation systems. In E-Commerce and Web Technologies, pages 209–220. Springer, 2011.
 [5] B. Gretarsson, J. O'Donovan, S. Bostandjiev, C. Hall, and T. Höllerer. Smallworlds: visualizing social recommendations. In Computer Graphics Forum, volume 29, pages 833–842. Wiley Online Library, 2010.
 [6] C. He, D. Parra, and K. Verbert. Interactive recommender systems: a survey of the state of the art and future research challenges and opportunities. Expert Systems with Applications, 2016.
 [7] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proc. CSCW '00, pages 241–250. ACM, 2000.
 [8] B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):441–504, 2012.
 [9] A. Lex, N. Gehlenborg, H. Strobelt, R. Vuillemot, and H. Pfister. UpSet: visualization of intersecting sets. IEEE Transactions on Visualization and Computer Graphics, 20(12):1983–1992, 2014.
[10] C. D. Manning, P. Raghavan, H. Schütze, et al. Introduction to information retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[11] J. O'Donovan, B. Smyth, B. Gretarsson, S. Bostandjiev, and T. Höllerer. Peerchooser: visual interactive recommendation. In Proc. CHI '08, pages 1085–1088. ACM, 2008.
[12] D. Parra and P. Brusilovsky. Collaborative filtering for social tagging systems: an experiment with citeulike. In Proc. RecSys '09, pages 237–240. ACM, 2009.
[13] D. Parra and P. Brusilovsky. User-controllable personalization: A case study with setfusion. International Journal of Human-Computer Studies, 78:43–67, 2015.
[14] D. Parra, W. Jeng, P. Brusilovsky, C. López, and S. Sahebi. Conference navigator 3: An online social conference support system. In UMAP Workshops, pages 1–4, 2012.
[15] J. Peng, D. D. Zeng, H. Zhao, and F.-y. Wang. Collaborative filtering in social tagging systems based on joint item-tag recommendations. In Proc. CIKM '10, pages 809–818. ACM, 2010.
[16] P. Pu, L. Chen, and R. Hu. A user-centric evaluation framework for recommender systems. In Proc. RecSys '11, pages 157–164. ACM, 2011.
[17] J. B. Schafer, J. A. Konstan, and J. Riedl. Meta-recommendation systems: User-controlled integration of diverse recommendations. In Proc. CIKM '02, pages 43–51, NY, USA, 2002. ACM.
[18] A. Spoerri. Infocrystal: A visual tool for information retrieval & management. In Proc. CIKM '93, pages 11–20. ACM, 1993.
[19] N. Tintarev and J. Masthoff. Designing and evaluating explanations for recommender systems. In Recommender Systems Handbook, pages 479–510. Springer, 2011.
[20] K. Verbert, D. Parra, P. Brusilovsky, and E. Duval. Visualizing recommendations to support exploration, transparency and controllability. In Proc. IUI '13, pages 351–362. ACM, 2013.
[21] C. Wongchokprasitti. Using external sources to improve research talk recommendation in small communities. PhD thesis, University of Pittsburgh, 2015.