uRank: Exploring Document Recommendations through an Interactive User-Driven Approach

Cecilia di Sciascio, Vedran Sabol, Eduardo Veas
Know-Center GmbH, Graz, Austria
cdisciascio@know-center.at, vsabol@know-center.at, eveas@know-center.at

ABSTRACT
Whenever we gather or organize knowledge, the task of searching inevitably takes precedence. As exploration unfolds, it becomes cumbersome to reorganize resources along new interests, as any new search brings new results. Despite huge advances in retrieval and recommender systems from the algorithmic point of view, many real-world interfaces have remained largely unchanged: results appear in an infinite list ordered by relevance with respect to the current query. We introduce uRank, a user-driven visual tool for exploration and discovery of textual document recommendations. It includes a view summarizing the content of the recommendation set, combined with interactive methods for understanding, refining and reorganizing documents on the fly as information needs evolve. We provide a formal experiment showing that uRank users can browse the document collection and efficiently gather items relevant to particular topics of interest with significantly lower cognitive load compared to traditional list-based representations.

General Terms
Theory

Keywords
recommending interface, exploratory search, visual analytics, sensemaking

1. INTRODUCTION
With the advent of electronic archival, seeking information occupies a large portion of our daily productive time. Thus, the skill to find and organize the right information has become paramount. Exploratory search is part of a discovery process in which the user often becomes familiar with new terminology in order to filter out irrelevant content and spot potentially interesting items. For example, after inspecting a few documents related to robots, sub-topics like human-robot interaction or virtual environments could attract the user's attention. Exploration requires careful inspection of at least a few titles and abstracts, if not full documents, before becoming familiar with the underlying topic. Advanced search engines and recommender systems (RS) have grown into the preferred solution for contextualized search by narrowing down the number of entries that need to be explored at a time.

Traditional information retrieval (IR) systems strongly depend on precise user-generated queries that must be iteratively reformulated to express evolving information needs. However, formulating queries has proven to be harder for humans than simply recognizing information visually [6]. Hence, the combination of IR with machine learning and HCI techniques led to a shift towards – mostly Web-based – browsing search strategies that rely on on-the-fly selections, navigation and trial-and-error [15]. As users manipulate data through visual elements, they are able to drill down and find patterns, relations or levels of detail that would otherwise remain invisible to the naked eye [32]. Moreover, well-designed interactive interfaces can effectively address the information overload that arises from our limited attention span and capacity to absorb information at once.

RS can sometimes be more limited than IR systems if they do not address the trust factors that may hinder user engagement in exploration. As Swearingen et al. [27] pointed out in their seminal work, the RS has to persuade the user to try the recommended items. To meet this challenge, not only must the recommendation algorithm fetch items effectively, but the user interface must also deliver recommendations in a way that allows them to be compared and explained [22]. The willingness to provide feedback is directly related to the user's overall perception of, and satisfaction with, the RS [13]. Explanatory interfaces increase confidence in the system (trust) by explaining how the system works (transparency) [28] and by allowing users to tell the system when it is wrong (scrutability) [11]. Hence, to warrant increased user involvement, the RS has to justify recommendations and let the user customize how they are generated.

In this work we focus mainly on transparency and controllability and, to some extent, on predictability. uRank¹ is an interactive user-driven tool that supports exploration of textual document recommendations through:
i) an automatically generated overview of the document collection depicted as augmented keyword tags,
ii) a drag-and-drop mechanism for refining search interests, and
iii) a transparent stacked-bar representation conveying document ranking and scores, plus query-term contribution.
A user study revealed that uRank incurs lower workload than a traditional list representation.

¹ http://eexcessvideos.know-center.tugraz.at/urank-demo.mp4
2. RELATED WORK

2.1 Search Result Visualization
Modern search interfaces assist user exploration in a variety of ways. For example, query expansion techniques like Insyder's Visual Query [21] address the query formulation problem by leveraging stored related concepts to help the user extend the initial query. Tile-based visualizations like TileBars [7] and HotMap [9] make efficient use of space to convey the relative frequency of query terms through gray- or color-shaded squares and, in the case of the former, also their distribution within documents and the relative document length. This paradigm aims to foster analytical understanding of Boolean-type queries and hence does not yield any rank or relevance score. All these approaches rely on the user being able to express precise information needs and do not support browsing-based discovery within the already available results.

Faceted search interfaces allow organizing or filtering items along orthogonal categories. Despite being particularly useful for inspecting enriched multimedia catalogs [33, 23], they require metadata categories and hardly support topic-wise exploration.

Rankings conveying document relevance have been discouraged as opaque and under-informative [7]. However, the advantage of ranked lists is that users know where to start their search for potentially relevant documents and that they employ a familiar format of presentation. A study [24] suggests that: i) users prefer bars over numbers or the absence of graphical explanations of relevance scores, and ii) relevance scores encourage users to explore beyond the first two results. As a tradeoff, lists imply a sequential search through consecutive items and only a small subset is visible at a given time, so they are mostly suitable for sets no larger than a few tens of documents. Focus+Context and Overview+Detail techniques [20, 9] sometimes help overcome this limitation, while alternative layouts like RankSpiral's [25] rolled list can scale up to hundreds and perhaps thousands of documents. Other approaches such as WebSearchViz [16] and ProjSnippet [3] propose visualizations complementary to ordered lists, yet unintuitive context switching is a potential problem when analyzing different aspects of the same document.

Although ranked lists are not a novelty, our approach attempts to leverage the advantages they provide, i.e. user familiarity, and to augment them with stacked-bar charts that convey document relevance and query-term contribution in a transparent manner. Insyder's bar graph [21] is an example of an augmented ranked list: it displays document and keyword relevance with disjoint horizontal bars aligned to separate baselines. Although layered bar dispositions are appropriate for visualizing the distribution of values of each category across items, comparing overall quantities and the contribution of each category to the totals is better supported by stacked-bar configurations [26]. Additionally, we rely on interaction as the key to providing controllability over the ranking criteria and hence supporting browsing-based exploratory search.

LineUp [4] has proven the simplicity and usefulness of stacked bars for representing multi-attribute rankings. Despite targeting data of a different nature – uRank's domain is rather unstructured, with no measurable attributes – the visual technique itself served as inspiration for our work.

2.2 Recommending Interfaces
In recent years, considerable effort has been invested into leveraging the power of social RS through visual interfaces [17, 12]. As for textual content, TalkExplorer [29] and SetFusion [18] are examples of interfaces for exploring conference talk recommendations. The former mostly focuses on depicting relationships among recommendations, users and tags in a transparent manner, while SetFusion emphasizes controllability over a hybrid RS. The rankings are not transparent though, as there is no explanation of how they were obtained. Kangasrääsiö et al. [10] highlighted that not only is it important to allow the user to influence the RS, but also to add predictability features that produce an effect of causality for user actions.

With uRank we intend to enhance predictability through document hint previews (section 3.1.1), allow the user to control the ranking by choosing keywords as parameters, and support understanding by means of a transparent graphic representation of scores (section 3.2).

3. URANK VISUAL ANALYTICS
uRank is a visual analytics approach that combines lightweight text analytics with an augmented ranked list to assist exploratory search of textual recommendations. The Web-based implementation is fed with textual document surrogates by a federated RS (F-RS) connected to several sources. A keyword extraction module analyzes all titles and abstracts and outputs a set of representative terms for the whole collection and for each document. The UI allows users to explore the collection content and refine information needs in terms of topic keywords. As the user selects terms of interest, the ranking is updated, bringing related documents closer to the top and pushing the less relevant ones down. Figure 1 outlines the workflow between automatic and interactive components.

Figure 1: uRank visual analytics workflow showing automatic (black arrows) and interactive mechanisms (red arrows)

uRank's layout is arranged in a multiview fashion that displays different perspectives of the document recommendations. Following Baldonado's guidelines [30], we decided to limit the number of views to keep display space requirements relatively low. Therefore, instead of multiple overlapping views, we favor a reduced number of perspectives fitting any laptop or desktop screen. The GUI dynamically scales to the window size, remaining undistorted up to a screen width of approximately 770 px.

The GUI presents the data in juxtaposed views that add up to a semantic Overview+Detail scheme [2] with three levels of granularity. Collection overview: the Tag Box (Figure 2.A) summarizes the entire collection by representing keywords as augmented tags. Documents overview: the Document List shows titles augmented with ranking information and the Ranking View displays stacked bar charts depicting document relevance scores (Figure 2.C and D, respectively). Together they represent minimal views of documents, which can be differentiated by title or position in the ranking and compared at a glance based on the presence of certain keywords of interest. Document detailed view: for a document selected in the list, the Document Viewer (Figure 2.E) displays the title and snippet with color-augmented keywords.

Figure 2: uRank User Interface displaying a ranking of documents for the keywords "gender", "wage" and "gap". The user has selected the third item in the list. A. The Tag Box presents a keyword-based summary, B. the Query Box contains the selected keywords that originated the current ranking state, C. the Document List presents a list with augmented document titles, D. the Ranking View renders stacked bars indicating relevance scores, E. the Document Viewer shows the title, year and snippet of the selected document with augmented keywords, and F. the Ranking Controls wrap buttons for ranking settings.

These views can be modified through interaction with the Ranking Controls (Figure 2.F) and the Query Box (Figure 2.B). The former provides controls to reset the ranking or switch ranking modes between overall and maximum score. The latter is the container where the user drops keyword tags to trigger changes in the ranking visualization.
3.1 Collection Overview
uRank automatically extracts keywords from the recommended documents with a twofold purpose: i) give an overview of the collection, and ii) provide manipulable elements that serve as input for an on-the-fly ranking mechanism (see section 3.2).

Summarizing the collection in a few representative terms allows the user to scan the recommendations and grasp the general topic at a glance, before even reading any of them. This is particularly important in the context of collections delivered by a RS, where the user normally does not directly generate the queries that feed the search engine.

3.1.1 Inspecting the Collection
The Tag Box provides a summary of the recommended texts as a whole by presenting extracted keywords as tags. Keyword tags are arranged in a bag-of-words fashion, encoding relative frequencies through position and intensity (Figure 2.A). The descending ordering conveys document frequency (DF), while five levels of blue shading help the user identify groups of keywords in the same frequency range. Redundant coding is intentional and aims at maximizing distinctiveness among items in the keyword set [32].

At first glance, the Tag Box gives an outline of the covered topic in terms of keywords and their relative frequencies. Nevertheless, a bag-of-words representation per se does not supply further details about how a keyword relates to other keywords or documents. We bridge this gap by augmenting tags with two compact visual hints – visible on mouse-over – that reveal additional information: i) co-occurrence with respect to other keywords, and ii) a preview of the effect of selecting the keyword.

The document hint (Figure 3) consists of a pie chart that conveys the proportion of documents in which the keyword appears. A tooltip indicates the exact quantity and percentage. Upon clicking on the document hint, unrelated documents are dimmed so that documents containing the keyword remain in focus. Even unranked documents become discreetly visible at the bottom of the Document List. This hint provides a degree of predictability regarding the effect of selecting a keyword, in terms of which ranked items will change their scores and which documents will be added to the ranking.

Figure 3: Document hints show a preview of documents containing the hovered keyword, even if they are currently unranked

The co-occurrence hint (Figure 2.A) shows the number of frequently co-occurring keywords in a red circle. Moving the mouse pointer over it brings co-occurring terms into focus by dimming the others into the background. Clicking on the visual hint locks the view so that the user can hover over co-occurring keywords, which shows a tooltip stating the number of co-occurrences between the hovered and the selected keyword. This hint supports the user in finding possible key phrases and sub-topics within the collection.

3.1.2 Mining a Collection of Documents
The aforementioned interactive features are supported by a combination of well-known text-mining techniques that extend the recommended documents with document vectors and provide meaningful terms to populate the Tag Box.

Document vectors ideally include only content-bearing terms, such as nouns and frequent adjectives – those appearing in at least 50% of the collection – hence it is not enough to rely solely on a stop-word list to remove meaningless terms. Firstly, we perform a part-of-speech (POS) tagging [1] step to identify words that meet our criteria, i.e. common and proper nouns and adjectives. Filtering out non-frequent adjectives requires an extra step. Then, plural nouns are singularized, proper nouns are kept capitalized and upper-case terms, e.g. "IT", remain unchanged. We apply the Porter stemmer [19] to the resulting terms in order to increase the probability of matching similar words, e.g. "robot", "robots" and "robotics" all match the stem "robot". A document vector thus consists of stemmed versions of content-bearing terms.

Next, we generate a weighting scheme by computing TF-IDF (term frequency-inverse document frequency) for each term in a document vector. The score is a statistical measure of how important the term is to a document in a collection: the more frequent a term is in a document and the fewer times it appears in the corpus, the higher its score. The documents' metadata are extended with these weighted document vectors.
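The following sketch illustrates how such a pipeline could be assembled. It is a minimal approximation of the steps described above (using NLTK's POS tagger and Porter stemmer together with a hand-rolled TF-IDF), not the actual uRank implementation; the singularization and frequent-adjective filtering steps are omitted, and the field names ("title", "abstract") are ours.

```python
# Minimal sketch of the keyword-extraction step described above (assumes NLTK
# with the 'punkt' and 'averaged_perceptron_tagger' data packages installed).
import math
from collections import Counter

import nltk
from nltk.stem import PorterStemmer

CONTENT_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ"}  # common/proper nouns, adjectives
stemmer = PorterStemmer()

def content_terms(text):
    """POS-tag a title+abstract and keep stemmed content-bearing terms."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    terms = []
    for word, tag in tagged:
        if tag in CONTENT_TAGS and word.isalpha():
            # Upper-case acronyms such as "IT" are kept unchanged;
            # everything else is lower-cased and stemmed.
            terms.append(word if word.isupper() else stemmer.stem(word.lower()))
    return terms

def document_vectors(docs):
    """Weight each document's terms with TF-IDF over the recommendation set."""
    term_lists = [content_terms(d["title"] + " " + d["abstract"]) for d in docs]
    n = len(term_lists)
    df = Counter(t for terms in term_lists for t in set(terms))
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors
```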
To fill the Tag Box with representative keywords for the collection, all document keywords are gathered into a global keyword set. Global keywords are sorted by document frequency (DF), i.e. the number of documents in which they appear, regardless of their frequency within documents. To avoid overpopulating the Tag Box, only terms with a DF above a certain threshold (by default 5) are taken into account. Note that the terms used to label keyword tags are actual words and not plain stems, as scanning a summary of stemmed words would be unintuitive for users. Thus, we keep a record of all term variations matching each stem in order to allow reverse stemming and pick one representative word as follows:
1. if there is only one term for a stem, use it to label the tag,
2. if a stem has two variants, one in lower case and the other in upper case or capitalized, use the lower-case one,
3. use a term that ends in 'ion', 'ment', 'ism' or 'ty',
4. use a term matching the stem,
5. use the shortest term.

To feed the document hint (Figure 3), uRank attaches to each global keyword the list of documents bearing it. For the co-occurrence hints (Figure 2.A), keyword co-occurrences with a maximum word distance of 5 and a minimum of 4 repetitions are recorded.
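To make these two pre-processing products concrete, the sketch below shows one plausible way to pick a tag label from a stem's recorded variants (following the rules above) and to count nearby keyword co-occurrences. Function names and the exact tie-breaking are ours, not taken from the uRank code.

```python
from collections import Counter

def label_for_stem(stem, variants):
    """Pick one representative word to label a keyword tag (rules 1-5 above)."""
    variants = sorted(set(variants))
    if len(variants) == 1:
        return variants[0]
    lower = [v for v in variants if v.islower()]
    if len(variants) == 2 and len(lower) == 1:
        return lower[0]                           # prefer the lower-case variant
    for suffix in ("ion", "ment", "ism", "ty"):
        for v in variants:
            if v.endswith(suffix):
                return v
    if stem in variants:
        return stem
    return min(variants, key=len)                 # fall back to the shortest term

def cooccurrences(term_lists, max_distance=5, min_count=4):
    """Count keyword pairs appearing within max_distance words of each other."""
    pairs = Counter()
    for terms in term_lists:
        for i, a in enumerate(terms):
            for b in terms[i + 1:i + 1 + max_distance]:
                if a != b:
                    pairs[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```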
3.2 Ranking Documents On The Fly
In theory, the recommendations returned by a RS are already ranked by relevance. In practice, however, the lack of control over that ranking can hinder user engagement if the GUI does not provide enough rationale for the recommendations and features for shaping the recommendation criteria. Hence, one of uRank's major features is its user-driven mechanism for re-organizing documents as information needs evolve, along with its visually transparent logic.

3.2.1 Ranking Visualization
The ranking-based visualization consists of a list of document titles (Figure 2.C) and stacked bar charts (Figure 2.D) depicting rank and relevance scores for documents and for the keywords within them. Document titles are initially listed in the order in which they were supplied by the F-RS.

Interactions with the view are the means for users to directly or indirectly manipulate the data [31]. In uRank, changes in the ranking visualization originate from keyword tag manipulation inside the Query Box (Figure 2.B). As the user manipulates tags, the selected keywords are immediately forwarded to the Ranking Model as ranking parameters. Selected tags are re-rendered by adding a weight slider, a delete button in the upper-right corner – visible on hover – and a specific background color determined by a qualitative palette (Figure 4). We chose ColorBrewer's [5] 9-class Set 1 palette for the background color encoding, as it allows the user to clearly distinguish tags from one another. When the user adjusts a weight slider, the intensity of the tag's background color changes accordingly (see Figure 4). We provide three possibilities for keyword tag manipulation:

• Addition: keyword tags in the Tag Box can be manually unpinned (Figure 4a), dragged with the mouse pointer and dropped into the Query Box (Figure 4b).

• Weight change: tags in the Query Box contain weight sliders that can be tuned to assign a keyword a higher or lower priority in the ranking.

• Deletion: tags can be removed from the Query Box and returned to their initial position in the Tag Box by clicking on the delete button.

Figure 4: a) Keyword tag before being dropped into the Query Box. b) Keyword tag after being dropped: weight slider and delete button added, background color changed according to a categorical color scale. Weight sliders have been tuned.

As the document ranking is generated, the Document List is re-sorted in descending order by overall score and stacked bars appear in the Ranking View, horizontally aligned to each list item. Items with a null score are hidden, shrinking the list to fit only ranked items. The total width of a stacked bar indicates the overall score of a document, and the bar fragments represent the individual contributions of keywords to that score. Bar colors match the color encoding of the selected keywords in the Query Box, enabling the user to make an immediate association between keyword tags and bars. Missing colored bars in a stack denote the absence of certain words in the document surrogate. Additionally, each item in the Document List contains two numeric indicators: the first one – in a dark circle – shows the position of the document in the ranking, while the adjacent colored number reveals how many positions the document has shifted, encoding upward and downward shifts in green and red, respectively. This graphic representation attempts to help the user concentrate only on useful items and ignore the rest by bringing likely relevant items to the top, pushing less relevant ones to the bottom and hiding those that seem completely irrelevant.

uRank allows choosing between two ranking modes: overall score (selected by default) and maximum score (Figure 5). In maximum score mode, the Ranking View renders a single color-coded bar per document in order to emphasize its most influential keyword. Finally, resetting the visualization clears the Query Box and the Ranking View, relocating all selected keywords in the Tag Box and restoring the Document List to its initial state.

Figure 5: Ranking visualization in maximum score mode: documents are ranked based on the keyword with the highest score
3.2.2 Document Ranking Computation
Quick content exploration in uRank depends on its ability to readily re-sort documents according to changing information needs. As the user manipulates keyword tags and builds queries from a subset of the global keyword collection, uRank computes document scores and arranges the documents accordingly in a ranking. We assume that some keywords are more important to the topic model than others and allow the user to assign weights to them.

Document scores are relevance measures for documents with respect to a query. As titles and snippets are the only content available for the retrieved document surrogates, these scores are computed with a term-frequency scheme; term-distribution schemes are rather suited to long or full texts and are hence out of our scope. Boolean models have the disadvantage that they not only consider every term equally important but also produce absolute values that preclude document ranking.

The Ranking Model implements a vector space model to compute document-query similarity using the document vectors previously generated during keyword extraction (section 3.1.2). Nonetheless, a single relevance measure like cosine similarity alone is not enough to convey query-term contribution, given that the best overall matches are not necessarily the ones in which most query terms are found [7, 14]. The contribution that each query term adds to the document score should be clear in the visual representation, in order to give the user a transparent explanation as to why one document ranks higher than another. Therefore, we break down the cosine similarity computation and obtain individual scores for each query term, which are then added up into an overall relevance score.

Given a document collection D and a set of weighted query terms T, such that ∀t ∈ T : 0 ≤ w_t ≤ 1, the relevance score for term t in document vector d ∈ D with respect to the query terms T is calculated as follows:

    s(t_d) = ( tfidf(t_d) × w_t ) / ( |d| × √|T| ),

where tfidf(t_d) is the tf-idf score of term t in document d, |d| is the norm of vector d and |T| is the number of query terms.

The overall score of a document, S(d), is then computed as the sum of the individual term scores s(t_d). The collection D is next sorted in descending order by overall score with the quicksort algorithm, and ranking positions are assigned in such a way that documents with an equivalent overall score share the same place. Alternatively, users can rank documents by maximum score, in which case S(d) = max(s(t_d)).
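A minimal sketch of this scoring and ranking procedure, operating on the TF-IDF document vectors from section 3.1.2, might look as follows. Variable and function names are ours, and Python's built-in sort stands in for the quicksort mentioned above.

```python
import math

def term_scores(vector, query_weights):
    """Per-term relevance: s(t_d) = tfidf(t_d) * w_t / (|d| * sqrt(|T|))."""
    norm = math.sqrt(sum(v * v for v in vector.values())) or 1.0
    denom = norm * math.sqrt(len(query_weights))
    return {t: vector.get(t, 0.0) * w / denom for t, w in query_weights.items()}

def rank(vectors, query_weights, mode="overall"):
    """Return ranked documents, best first; equal overall scores share a position."""
    scored = []
    for doc_id, vector in enumerate(vectors):
        s = term_scores(vector, query_weights)
        overall = max(s.values()) if mode == "max" else sum(s.values())
        scored.append((doc_id, overall, s))
    scored.sort(key=lambda item: item[1], reverse=True)  # descending by overall score

    ranking, position, previous = [], 0, None
    for doc_id, overall, s in scored:
        if overall != previous:                  # ties keep the same ranking position
            position += 1
            previous = overall
        if overall > 0:                          # null-score items are hidden in the UI
            ranking.append({"doc": doc_id, "position": position,
                            "overall": overall, "terms": s})
    return ranking

# Example: rank(document_vectors(docs), {"robot": 1.0, "interaction": 0.5})
```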
3.3 Details on Demand
Once the user identifies documents that seem worth further inspection, the next logical step is to drill down one by one to determine whether the initial assumption holds. The Document Viewer (Figure 2.E) gives access to the textual content – title and snippet – and the available metadata of a particular document. Query terms are highlighted in the text following the same color coding as the tags in the Query Box and the stacked bars in the Ranking View. These simple visual cues pop out from their surroundings, enabling the user to preattentively recognize keywords in the text and perceive their general context prior to conscious reading.

3.4 Change-Awareness Cues and Attention Guidance
We favor the use of animation to convey ranking-state transitions rather than abrupt static changes. Animated transitions are inherently intuitive and engaging, giving a perception of causality and intentionality [8]. As the user manipulates a keyword tag in the Query Box, uRank raises change awareness in the following ways:

• Keyword tags are re-styled as explained in section 3.2.1. If a tag is removed from the Query Box, an animation shifts it back to its original position in the Tag Box at a perceivable pace.

• Depending on the type of ranking transition, the Document List shows a specific effect:
  – If the ranking is generated for the first time, an accordion-like upward animation shows that its nature has changed from a plain list to a ranked one.
  – If the ranking is updated, list items shift to their new positions at a perceptible pace.
  – If ranking positions remain unchanged, the list stays static while a soft top-down shadow crosses it.

• Green or red shading effects are applied to the left side of list items moving up or down, respectively, disappearing after a few seconds.

• Stacked bars grow from left to right, revealing the new overall and keyword scores.

The user can closely follow how particular documents shift positions by clicking on the watch (eye-shaped) icon. The item is brought into focus by surrounding it with a slightly darker shadow and underlining its title. Watched documents also remain on top of the z-index during list animations, so they are not overlaid by other list items.

The same principle of softening changes is applied to redirect user attention when a document is selected in the Ranking View. The selected row is highlighted and the snippet appears in the Document Viewer with a fade-in effect. Animated transitions for ranking-state changes and document selection help the user intuitively switch contexts, either from the Tag Box to the Document List and Ranking View, or from the latter to the Document Viewer. As Baldonado [30] states in the rule of attention management, perceptual techniques should lead the user's attention to the right view at the right time.

4. EVALUATION
The goal of this study was to find out how people respond when working with our tool. In the current scenario, recommendations were delivered in a sorted list with no relevance information. Since we aim at supporting exploratory search, we hypothesized that participants using uRank would be able to gather items faster and with less difficulty than with a typical list-based UI.

We were also interested in observing the effect of exposing users to recommendation lists of different sizes. We expected that, without relevance information, even a modest growth in the number of displayed items would frustrate the user when deciding which items to inspect in detail first. For example, finding the 5 most relevant items in a list of ten appears to be an easy task, whereas accomplishing the same task in a list of forty or sixty items would be more time-consuming and entail a heavier cognitive load.

4.1 Method
We conducted an offline evaluation where participants performed four iterations of the same task with either uRank (U) or a baseline list-based UI (L) with the usual Web browser tools, e.g. Control+F keyword search. Furthermore, we introduced two variations in the number of items to which participants were exposed, namely 30 or 60 items. The study was therefore structured as a 2 x 2 repeated-measures design with tool and #items as independent variables, each with 2 levels (tool = U/L, #items = 30/60).

The general task goal was to "find 5 relevant items" for the given topic, and all participants had to perform one task for each combination of the independent variables, i.e. U-30, U-60, L-30 and L-60.

To counterbalance learning effects, we chose four different topics covering a spectrum of cultural, technical and scientific content: Women in workforce (WW), Robots (Ro), Augmented Reality (AR) and Circular economy (CE). Topic was thus treated as a random variable within constraints. We corroborated that participants were not knowledgeable in any of the topics. All variable combinations were randomized and assigned with a balanced Latin square.
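For illustration, a balanced Latin square for an even number of conditions can be generated with the standard construction sketched below; this is our own illustrative code, not part of the study materials. Each row is one ordering of the four conditions, assigned to participants in rotation.

```python
def balanced_latin_square(conditions):
    """Standard balanced Latin square construction for an even number of conditions."""
    n = len(conditions)
    first_row, low, high = [0], 1, n - 1
    while len(first_row) < n:
        first_row.append(low); low += 1          # interleave 0, 1, n-1, 2, n-2, ...
        if len(first_row) < n:
            first_row.append(high); high -= 1
    return [[conditions[(idx + shift) % n] for idx in first_row] for shift in range(n)]

# e.g. balanced_latin_square(["U-30", "U-60", "L-30", "L-60"]) yields four orderings
# in which every condition appears once per position and follows every other
# condition exactly once.
```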
Wikipedia provides a well-defined article for each of the topics mentioned above. We considered these articles as fictional initial exploration scenarios, but participants were not exposed to them. Instead, we simulated a situation in which the user has already received a list of recommendations while exploring a certain Wikipedia page. Therefore, we prepared static recommendation lists of 60 and 30 items for each topic and used them as inputs for uRank across the different participants and tasks. To create each list, portions of text from the original Wikipedia articles were fed to the F-RS, which preprocessed the text and created queries that were forwarded to a number of content providers. The result was a sorted, merged list of items from each provider with no scoring information.

Each task comprised three sub-tasks (Q1, Q2 and Q3) that consisted in finding the 5 most relevant items for a given piece of text. In Q1 and Q2 we targeted a specific search and the supplied text was limited to two or three words. Q3 was designed as a broad-search sub-task, where we provided an entire paragraph extracted from the Wikipedia page and users had to decide for themselves which keywords described the topic best. The motivation for asking for the "most relevant" documents was to avoid careless selection.

We recorded completion time for every individual sub-task and for the overall task. To measure workload, we used a 7-point Likert-scale NASA TLX questionnaire covering six workload dimensions.

4.1.1 Participants
Twenty-four (24) participants took part in the study (11 female, 13 male, between 22 and 37 years old). We recruited mainly graduate and post-graduate students from the medical and computer science domains. None of them was majoring in the topic areas selected for the study.

4.1.2 Procedure
A session started with an introductory video explaining the functionality of uRank. Each participant got exactly the same instructions. Then came a short training session with a different topic (Renaissance) to let participants familiarize themselves with uRank and the baseline tool. At the beginning of the first task, the system showed a short text describing the topic and the task to be fulfilled. After reading the text, the participant pressed "Start" to redirect the browser to the corresponding UI. At this point the first sub-task began and the internal timer started counting, without disturbing the user. The goal of the task and the reference text were shown in the upper part of the UI. Participants were able to select items by clicking on a star-shaped icon and to inspect them later in a drop-down list. In a pilot study we realized that asking for the "most" relevant items made the experiment overly long, as participants tried to inspect their selections very carefully (particularly in the L condition). We therefore limited the duration of the three sub-tasks to 3, 3 and 6 minutes, respectively. The time constraint was not a hard deadline: during the study the experimenter reminded participants when the allotted time was almost over, but did not force them to stop. A sub-task concluded when the participant clicked on the "Finished" button. The UI alerted participants when they attempted to finish without having collected 5 items, but allowed them to continue if desired. The second sub-task started immediately afterward, and once the whole task was completed participants filled in the NASA TLX questionnaire. The remaining tasks followed the same steps. Finally, participants were asked for comments and preferences.

4.2 Results
Workload: A two-way repeated-measures ANOVA with tool and #items as independent variables revealed a significant effect of tool on perceived workload, F(1,23)=35.254, p < .01, ε = .18. Bonferroni post-hoc tests showed significantly lower workload when using uRank (p < .001). We also assessed the effect for each workload dimension; again, the ANOVA showed a significant effect of tool in all of them, as shown in Table 1. #items did not have a major effect in any case.

Table 1: Participants found uRank reduces workload in all dimensions
Dimension         F(1, 23)   p         ε
Mental Demand     19.70      p < .05   .10
Physical Demand   14.52      p < .01   .07
Temporal Demand    7.72      p < .05   .05
Performance       11.80      p < .01   .10
Effort            48.60      p < .001  .22
Frustration       15.12      p < .01   .07
Workload          35.25      p < .01   .20
Completion Time: We analyzed the overall task completion time as well as the completion times for each sub-task. A two-way repeated-measures ANOVA revealed a significant effect of tool on overall completion time, F(1,23)=4.94, p < .05, ε = .02. This effect disappeared in a Bonferroni post-hoc comparison. For Q1 and Q2 the ANOVA reported no significant effect, but it showed a significant effect of tool on completion time for Q3, F(1,23)=6.2, p < .05, ε = .05. Surprisingly, the post-hoc comparison showed that using uRank took significantly longer.

Performance: Relevance is a rather subjective measure. Hence, instead of contrasting item selections with some ground truth, we analyzed the "consensus" in item selection. We aggregated the collections gathered under the manipulated conditions and computed cosine similarity across UI (tool), data set size (#items), topic (WW, Ro, AR and CE) and sub-task (Q1, Q2 and Q3).
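The paper does not detail how the aggregation was carried out; one plausible reading, sketched below with our own function names, is to count how often each document was selected under a condition and to compare the resulting count vectors with cosine similarity.

```python
import math
from collections import Counter

def selection_vector(selections):
    """selections: one list of chosen document ids per participant -> count vector."""
    return Counter(doc for chosen in selections for doc in chosen)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(doc, 0) for doc, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# e.g. consensus between uRank and list-based selections for one topic and sub-task:
# cosine(selection_vector(u_selections), selection_vector(l_selections))
```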
Overall, there was a high similarity between the collections produced with uRank and those obtained with the list-based UI across all sub-tasks: choices regarding relevant documents matched three out of four times (M = .73, SD = .1).

Table 2 shows that the collections produced with our tool (U) for the two variations of #items (U-30 vs U-60) turned out highly similar regardless of topic and sub-task (M = .8, SD = .12, with a minimum of .62). Comparisons for the typical list-based UI (L) displaying 30 and 60 items (L-30 vs L-60) denote greater diversity in item selection (M = .67, SD = .16, with a minimum of .33).

Interestingly, similarity values tend to decrease for the broad-search sub-task (Q3) (M = .66, SD = .13) with respect to targeted search (Q1 and Q2) (M = .77, SD = .13).

Table 2: Similarities in collections gathered during evaluation
Sub-task  Comparison     WW    Ro    AR    CE    All topics
Q1        U vs L         .55   .79   .58   .74   .66
Q1        U-30 vs U-60   .71   .83   .94   .67   .79
Q1        L-30 vs L-60   .58   .83   .56   .56   .63
Q2        U vs L         .70   .86   .84   .86   .81
Q2        U-30 vs U-60   .84   .89   .90   .93   .89
Q2        L-30 vs L-60   .82   .74   .81   .87   .81
Q3        U vs L         .75   .72   .75   .63   .72
Q3        U-30 vs U-60   .64   .88   .75   .62   .72
Q3        L-30 vs L-60   .59   .66   .63   .33   .55

Figure 6: Results. (Left) Workload interaction lines show that uRank is significantly less demanding. (Right) Boxplots of completion time for each condition show a regularity towards using all available time.

4.3 Discussion
The study results shed light on how people interact with a tool like uRank. For each hypothesis we contrasted the results with the subjective feedback acquired after the evaluation.

Workload: The results support our hypothesis that uRank incurs lower workload during exploratory search, in both specific and broad search tasks. Participants commented feeling relieved when they could browse the ranking and instantly discard documents that did not contain any word of interest. As a remark, the majority claimed that a few tasks were too hard to solve, especially without uRank, because sometimes the terms of interest barely appeared in the titles or were perceived as too ambiguous, e.g. "participation of women in the workforce". Dealing with technical texts about unfamiliar topics also posed some strain; for example, two participants had to momentarily interrupt their exploration to look up a word they did not understand. In spite of that, workload was significantly lower with uRank across all dimensions.

Completion Time: We expected people to be faster with uRank than with a browser-based keyword filter, but completion times were not significantly different. The closing interview revealed that participants who had collected five items before the due time exploited the remainder to refine their selections. In general, participants understood that they were not expected to perform perfectly but to do their best in the given time. However, we noticed that a small group behaved in the opposite way and reported feeling more pressed by time and not satisfied with their performance. The general tendency is reflected in the significant result on temporal demand: participants felt significantly less pressed to finish while performing with uRank. The lower subjective time pressure suggests that participants indeed had more time to analyze their choices with uRank.

Performance: The results suggest that our tool produces more uniform results as the number of items to which users are exposed grows. Nevertheless, the proportion of matching documents in list-generated collections – two out of three – still conveys a moderate consensus. The decrease in consensus for the broad search task with respect to targeted search could be explained by the inherent variability across participants when choosing the terms of interest for a text longer than a couple of words.

5. CONCLUSION
We introduced a visual tool for exploration, discovery and analysis of recommendations of textual documents. uRank aims to help the user: i) quickly overview the most important topics in a collection of documents, ii) interact with the content to describe a topic in terms of keywords, and iii) reorganize the documents on the fly along the keywords describing a topic.

This paper presented the reasoning behind the visual and interactive design and a comparative user study in which we evaluated the experience of collecting items relevant to topics of interest. Participants found it significantly more relaxing to work with uRank, and most of them wanted to start actively using it in their scientific endeavors (e.g., report or paper writing). Yet, selecting the right keywords to describe a topic is not a trivial task, as shown by the performance results of the evaluation. We will continue to explore different techniques, e.g. topic modeling, in the near future. As for the GUI, we will work further on solving scaling problems, for example when the number of tags in the Tag Box or the length of the result list becomes unmanageable. Moreover, we will leverage the document selections collected during the evaluation as feedback to improve recommendations, closing the interactive loop with the RS as depicted in Figure 1.

6. REFERENCES
[1] E. Brill. A simple rule-based part of speech tagger. In Proceedings of the Workshop on Speech and Natural Language - HLT '91, page 112, Morristown, NJ, USA, 1992. Association for Computational Linguistics.
[2] A. Cockburn, A. Karlson, and B. B. Bederson. A review of overview+detail, zooming, and focus+context interfaces. ACM Computing Surveys, 41(1):1–31, 2008.
[3] E. Gomez-Nieto, F. San Roman, P. Pagliosa, W. Casaca, E. S. Helou, M. C. F. de Oliveira, and L. G. Nonato. Similarity preserving snippet-based visualization of web search results. IEEE Transactions on Visualization and Computer Graphics, 20(3):457–470, Mar. 2014.
[4] S. Gratzl, A. Lex, N. Gehlenborg, H. Pfister, and M. Streit. LineUp: visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics, 19(12):2277–2286, Dec. 2013.
[5] M. Harrower and C. A. Brewer. ColorBrewer.org: An online tool for selecting colour schemes for maps. The Cartographic Journal, 40(1):27–37, June 2003.
[6] M. Hearst. User interfaces for search. Modern Information Retrieval, 2011.
[7] M. A. Hearst. TileBars: Visualization of term distribution information in full text information access. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '95, pages 59–66. ACM Press, 1995.
[8] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE Transactions on Visualization and Computer Graphics, 13(6):1240–1247, 2007.
[9] O. Hoeber and X. D. Yang. The visual exploration of web search results using HotMap. In Proceedings of Information Visualization (IV06), 2006.
[10] A. Kangasrääsiö, D. Głowacka, and S. Kaski. Improving controllability and predictability of interactive recommendation interfaces for exploratory search. In IUI, pages 247–251, 2015.
[11] J. Kay. Scrutable adaptation: Because we can and must. In Lecture Notes in Computer Science, volume 4018 LNCS, pages 11–19, 2006.
[12] B. P. Knijnenburg, S. Bostandjiev, J. O'Donovan, and A. Kobsa. Inspectability and control in social recommenders. In Proceedings of the 6th ACM Conference on Recommender Systems - RecSys '12, page 43, 2012.
[13] B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):441–504, 2012.
[14] C. D. Manning. Introduction to Information Retrieval. Cambridge University Press, 2008.
[15] G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41, 2006.
[16] T. N. Nguyen and J. Zhang. A novel visualization model for web search results. IEEE Transactions on Visualization and Computer Graphics, 12(5):981–988, 2006.
[17] J. O'Donovan, B. Smyth, B. Gretarsson, S. Bostandjiev, and T. Höllerer. PeerChooser: Visual interactive recommendation. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pages 1085–1088, 2008.
[18] D. Parra, P. Brusilovsky, and C. Trattner. See what you want to see: Visual user-driven approach for hybrid recommendation. In Proceedings of the 19th International Conference on Intelligent User Interfaces - IUI '14, pages 235–240, 2014.
[19] M. Porter. An algorithm for suffix stripping. Program: Electronic Library and Information Systems, 40(3):211–218, 1980.
[20] R. Rao and S. K. Card. The table lens. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '94, pages 318–322, New York, NY, USA, 1994. ACM Press.
[21] H. Reiterer, G. Tullius, and T. Mann. Insyder: a content-based visual-information-seeking system for the web. International Journal on Digital Libraries, pages 25–41, 2005.
[22] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 1–35. Springer, 2011.
[23] C. Seifert, J. Jurgovsky, and M. Granitzer. FacetScape: A visualization for exploring the search space. In Proceedings of the 18th International Conference on Information Visualisation, pages 94–101, 2014.
[24] G. Shani and N. Tractinsky. Displaying relevance scores for search results. In Proceedings of the 36th International ACM SIGIR Conference (SIGIR '13), pages 901–904, 2013.
[25] A. Spoerri. Coordinated views and tight coupling to support meta searching. In Proceedings of the Second International Conference on Coordinated and Multiple Views in Exploratory Visualization, pages 39–48, 2004.
[26] M. Streit and N. Gehlenborg. Bar charts and box plots. Nature Methods, 11(2):117, Feb. 2014.
[27] K. Swearingen and R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, pages 1–11, 2001.
[28] N. Tintarev and J. Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):399–439, Oct. 2012.
[29] K. Verbert, D. Parra, P. Brusilovsky, and E. Duval. Visualizing recommendations to support exploration, transparency and controllability. In Proceedings of the 2013 International Conference on Intelligent User Interfaces - IUI '13, page 351, 2013.
[30] M. Q. Wang Baldonado, A. Woodruff, and A. Kuchinsky. Guidelines for using multiple views in information visualization. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 110–119, 2000.
[31] M. O. Ward, G. Grinstein, and D. A. Keim. Interactive Data Visualization: Foundations, Techniques, and Applications. A. K. Peters, Ltd., May 2010.
[32] C. Ware. Information Visualization: Perception for Design. Elsevier, 3rd edition, 2013.
[33] K.-P. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In Proceedings of the Conference on Human Factors in Computing Systems - CHI '03, pages 401–408, 2003.