uRank: Exploring Document Recommendations through an Interactive User-Driven Approach

Cecilia di Sciascio, Vedran Sabol, Eduardo Veas
Know-Center GmbH, Graz, Austria
cdisciascio@know-center.at, vsabol@know-center.at, eveas@know-center.at

ABSTRACT
Whenever we gather or organize knowledge, the task of searching inevitably takes precedence. As exploration unfolds, it becomes cumbersome to reorganize resources along new interests, as any new search brings new results. Despite huge advances in retrieval and recommender systems from the algorithmic point of view, many real-world interfaces have remained largely unchanged: results appear in an infinite list ordered by relevance with respect to the current query. We introduce uRank, a user-driven visual tool for exploration and discovery of textual document recommendations. It includes a view summarizing the content of the recommendation set, combined with interactive methods for understanding, refining and reorganizing documents on the fly as information needs evolve. We provide a formal experiment showing that uRank users can browse the document collection and efficiently gather items relevant to particular topics of interest with significantly lower cognitive load compared to traditional list-based representations.

General Terms
Theory

Keywords
recommending interface, exploratory search, visual analytics, sensemaking

1. INTRODUCTION
With the advent of electronic archival, seeking information occupies a large portion of our daily productive time. Thus, the skill to find and organize the right information has become paramount. Exploratory search is part of a discovery process in which the user often becomes familiar with new terminology in order to filter out irrelevant content and spot potentially interesting items. For example, after inspecting a few documents related to robots, sub-topics like human-robot interaction or virtual environments could attract the user's attention. Exploration requires careful inspection of at least a few titles and abstracts, if not full documents, before becoming familiar with the underlying topic. Advanced search engines and recommender systems (RS) have grown into the preferred solution for contextualized search by narrowing down the number of entries that need to be explored at a time.

Traditional information retrieval (IR) systems strongly depend on precise user-generated queries that must be iteratively reformulated to express evolving information needs. However, formulating queries has proven to be harder for humans than simply recognizing information visually [6]. Hence, the combination of IR with machine learning and HCI techniques led to a shift towards – mostly Web-based – browsing search strategies that rely on on-the-fly selections, navigation and trial-and-error [15]. As users manipulate data through visual elements, they are able to drill down and find patterns, relations or levels of detail that would otherwise remain invisible to the naked eye [32]. Moreover, well-designed interactive interfaces can effectively address the information overload that arises from our limited attention span and capacity to absorb information at once.

RS can sometimes be more limited than IR systems if they do not address the trust factors that may hinder user engagement in exploration. As Swearingen et al. [27] pointed out in their seminal work, the RS has to persuade the user to try the recommended items. To meet this challenge, not only must the recommendation algorithm fetch items effectively, but the user interface must also deliver recommendations in a way that allows them to be compared and explained [22]. The willingness to provide feedback is directly related to the user's overall perception of, and satisfaction with, the RS [13]. Explanatory interfaces increase confidence in the system (trust) by explaining how the system works (transparency) [28] and by allowing users to tell the system when it is wrong (scrutability) [11]. Hence, to warrant increased user involvement, the RS has to justify recommendations and let the user customize how they are generated.

In this work we focus mainly on transparency and controllability and, to some extent, on predictability. uRank¹ is an interactive user-driven tool that supports exploration of textual document recommendations through:
i) an automatically generated overview of the document collection depicted as augmented keyword tags,
ii) a drag-and-drop mechanism for refining search interests, and
iii) a transparent stacked-bar representation conveying document ranking and scores, plus query-term contribution.
A user study revealed that uRank incurs lower workload than a traditional list representation.

¹ http://eexcessvideos.know-center.tugraz.at/urank-demo.mp4
2. RELATED WORK

2.1 Search Result Visualization
Modern search interfaces assist user exploration in a variety of ways. For example, query expansion techniques like Insyder's Visual Query [21] address the query formulation problem by leveraging stored related concepts to help the user extend the initial query. Tile-based visualizations like TileBars [7] and HotMap [9] make efficient use of space to convey the relative frequency of query terms through gray- or color-shaded squares and, in the case of the former, also their distribution within documents and the relative document length. This paradigm aims to foster analytical understanding of Boolean-type queries and hence does not yield any rank or relevance score. All these approaches rely on the user being able to express precise information needs and do not support browsing-based discovery within the already available results.

Faceted search interfaces allow organizing or filtering items along orthogonal categories. Despite being particularly useful for inspecting enriched multimedia catalogs [33, 23], they require metadata categories and hardly support topic-wise exploration.

Rankings conveying document relevance have been discouraged as opaque and under-informative [7]. However, the advantage of ranked lists is that users know where to start their search for potentially relevant documents and that they employ a familiar format of presentation. A study [24] suggests that: i) users prefer bars over numbers or the absence of graphical explanations of relevance scores, and ii) relevance scores encourage users to explore beyond the first two results. As a tradeoff, lists imply a sequential search through consecutive items and only a small subset is visible at a given time, so they are mostly suitable for sets no larger than a few tens of documents. Focus+Context and Overview+Detail techniques [20, 9] sometimes help overcome this limitation, while alternative layouts like RankSpiral's [25] rolled list can scale up to hundreds and perhaps thousands of documents. Other approaches such as WebSearchViz [16] and ProjSnippet [3] propose visualizations complementary to ordered lists, yet unintuitive context switching is a potential problem when analyzing different aspects of the same document.

Although ranked lists are not a novelty, our approach attempts to leverage the advantages they provide, i.e. user familiarity, and to augment them with stacked-bar charts that convey document relevance and query-term contribution in a transparent manner. Insyder's bar graph [21] is an example of an augmented ranked list: it displays document and keyword relevance with disjoint horizontal bars aligned to separate baselines. Although layered bar dispositions are appropriate for visualizing the distribution of values of each category across items, comparing overall quantities and the contribution of each category to the totals is better supported by stacked-bar configurations [26]. Additionally, we rely on interaction as the key to providing controllability over the ranking criteria and hence supporting browsing-based exploratory search.

LineUp [4] has proven the simplicity and usefulness of stacked bars for representing multi-attribute rankings. Despite targeting data of a different nature – uRank's domain is rather unstructured, with no measurable attributes – the visual technique itself served as inspiration for our work.

2.2 Recommending Interfaces
In recent years, considerable effort has been invested into leveraging the power of social RS through visual interfaces [17, 12]. As for textual content, TalkExplorer [29] and SetFusion [18] are examples of interfaces for exploring conference talk recommendations. The former mostly focuses on depicting relationships among recommendations, users and tags in a transparent manner, while SetFusion emphasizes controllability over a hybrid RS. The rankings are not transparent though, as there is no explanation of how they were obtained. Kangasrääsiö et al. [10] highlighted that not only is it important to allow the user to influence the RS, but also to add predictability features that produce an effect of causality for user actions.

With uRank we intend to enhance predictability through document hint previews (section 3.1.1), allow the user to control the ranking by choosing keywords as parameters, and support understanding by means of a transparent graphic representation of scores (section 3.2).

3. URANK VISUAL ANALYTICS
uRank is a visual analytics approach that combines lightweight text analytics with an augmented ranked list to assist exploratory search of textual recommendations. The Web-based implementation is fed with textual document surrogates by a federated RS (F-RS) connected to several sources. A keyword extraction module analyzes all titles and abstracts and outputs a set of representative terms for the whole collection and for each document. The UI allows users to explore the collection content and refine information needs in terms of topic keywords. As the user selects terms of interest, the ranking is updated, bringing related documents closer to the top and pushing the less relevant ones down. Figure 1 outlines the workflow between automatic and interactive components.

Figure 1: uRank visual analytics workflow showing automatic (black arrows) and interactive mechanisms (red arrows)

uRank's layout is arranged in a multiview fashion that displays different perspectives of the document recommendations. Following Baldonado's guidelines [30], we decided to limit the number of views to keep display space requirements relatively low. Therefore, instead of multiple overlapping views, we favor a reduced number of perspectives fitting any laptop or desktop screen. The GUI dynamically scales to the window size, remaining undistorted up to a screen width of approximately 770 px.

The GUI presents the data in juxtaposed views that add up to a semantic Overview+Detail scheme [2] with three levels of granularity. Collection overview: the Tag Box (Figure 2.A) summarizes the entire collection by representing keywords as augmented tags. Documents overview: the Document List shows titles augmented with ranking information and the Ranking View displays stacked bar charts depicting document relevance scores (Figure 2.C and D, respectively). Together they represent minimal views of documents, which can be differentiated by title or position in the ranking and compared at a glance based on the presence of certain keywords of interest. Document detailed view: for a document selected in the list, the Document Viewer (Figure 2.E) displays the title and snippet with color-augmented keywords.

Figure 2: uRank User Interface displaying a ranking of documents for the keywords "gender", "wage" and "gap". The user has selected the third item in the list. A. The Tag Box presents a keyword-based summary, B. the Query Box contains the selected keywords that originated the current ranking state, C. the Document List presents a list with augmented document titles, D. the Ranking View renders stacked bars indicating relevance scores, E. the Document Viewer shows the title, year and snippet of the selected document with augmented keywords, and F. the Ranking Controls wrap buttons for ranking settings.

These views can be modified through interaction with the Ranking Controls (Figure 2.F) and the Query Box (Figure 2.B). The former provides controls to reset the ranking or switch ranking modes between overall and maximum score. The latter is the container where the user drops keyword tags to trigger changes in the ranking visualization.
3.1 Collection Overview
uRank automatically extracts keywords from the recommended documents with a twofold purpose: i) give an overview of the collection, and ii) provide manipulable elements that serve as input for an on-the-fly ranking mechanism (see section 3.2).

Summarizing the collection in a few representative terms allows the user to scan the recommendations and grasp the general topic at a glance, before even reading any of them. This is particularly important in the context of collections delivered by a RS, where the user normally does not directly generate the queries that feed the search engine.

3.1.1 Inspecting the Collection
The Tag Box provides a summary of the recommended texts as a whole by presenting extracted keywords as tags. Keyword tags are arranged in a bag-of-words fashion, encoding relative frequencies through position and intensity (Figure 2.A). The descending ordering conveys document frequency (DF), while five levels of blue shading help the user identify groups of keywords in the same frequency range. Redundant coding is intentional and aims at maximizing distinctiveness among items in the keyword set [32].

At first glance, the Tag Box gives an outline of the covered topic in terms of keywords and their relative frequencies. Nevertheless, a bag-of-words representation per se does not supply further details about how a keyword relates to other keywords or documents. We bridge this gap by augmenting tags with two compact visual hints – visible on mouse-over – that reveal additional information: i) co-occurrence with respect to other keywords, and ii) a preview of the effect of selecting the keyword.

The document hint (Figure 3) consists of a pie chart that conveys the proportion of documents in which the keyword appears. A tooltip indicates the exact quantity and percentage. Upon clicking on the document hint, unrelated documents are dimmed so that documents containing the keyword remain in focus. Even unranked documents become discreetly visible at the bottom of the Document List. This hint provides a degree of predictability regarding the effect of selecting a keyword, in terms of which ranked items will change their scores and which documents will be added to the ranking.

Figure 3: Document hints show a preview of documents containing the hovered keyword, even if they are currently unranked

The co-occurrence hint (Figure 2.A) shows the number of frequently co-occurring keywords in a red circle. Moving the mouse pointer over it brings co-occurring terms into focus by dimming the others into the background. Clicking on the visual hint locks the view so that the user can hover over co-occurring keywords, which shows a tooltip stating the number of co-occurrences between the hovered and the selected keyword. This hint supports the user in finding possible key phrases and sub-topics within the collection.

3.1.2 Mining a Collection of Documents
The aforementioned interactive features are supported by a combination of well-known text-mining techniques that extend the recommended documents with document vectors and provide meaningful terms to populate the Tag Box.

Document vectors ideally include only content-bearing terms, such as nouns and frequent adjectives – those appearing in at least 50% of the collection – hence it is not enough to rely solely on a stop-word list to remove meaningless terms. Firstly, we perform a part-of-speech (POS) tagging [1] step to identify words that meet our criteria, i.e. common and proper nouns and adjectives. Filtering out non-frequent adjectives requires an extra step. Then, plural nouns are singularized, proper nouns are kept capitalized and upper-case terms, e.g. "IT", remain unchanged. We apply the Porter stemmer [19] to the resulting terms in order to increase the probability of matching similar words, e.g. "robot", "robots" and "robotics" all match the stem "robot". A document vector thus consists of stemmed versions of content-bearing terms.

Next, we generate a weighting scheme by computing TF-IDF (term frequency-inverse document frequency) for each term in a document vector. The score is a statistical measure of how important the term is to a document in a collection: the more frequent a term is in a document and the fewer times it appears in the corpus, the higher its score. The documents' metadata are extended with these weighted document vectors.
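The following sketch illustrates how such a pipeline could be assembled. It is a minimal approximation of the steps described above (using NLTK's POS tagger and Porter stemmer together with a hand-rolled TF-IDF), not the actual uRank implementation; the singularization and frequent-adjective filtering steps are omitted, and the field names ("title", "abstract") are ours.

```python
# Minimal sketch of the keyword-extraction step described above (assumes NLTK
# with the 'punkt' and 'averaged_perceptron_tagger' data packages installed).
import math
from collections import Counter

import nltk
from nltk.stem import PorterStemmer

CONTENT_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ"}  # common/proper nouns, adjectives
stemmer = PorterStemmer()

def content_terms(text):
    """POS-tag a title+abstract and keep stemmed content-bearing terms."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    terms = []
    for word, tag in tagged:
        if tag in CONTENT_TAGS and word.isalpha():
            # Upper-case acronyms such as "IT" are kept unchanged;
            # everything else is lower-cased and stemmed.
            terms.append(word if word.isupper() else stemmer.stem(word.lower()))
    return terms

def document_vectors(docs):
    """Weight each document's terms with TF-IDF over the recommendation set."""
    term_lists = [content_terms(d["title"] + " " + d["abstract"]) for d in docs]
    n = len(term_lists)
    df = Counter(t for terms in term_lists for t in set(terms))
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors
```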
To fill the Tag Box with representative keywords for the collection, all document keywords are gathered into a global keyword set. Global keywords are sorted by document frequency (DF), i.e. the number of documents in which they appear, regardless of their frequency within documents. To avoid overpopulating the Tag Box, only terms with a DF above a certain threshold (by default 5) are taken into account. Note that the terms used to label keyword tags are actual words and not plain stems, as scanning a summary of stemmed words would be unintuitive for users. Thus, we keep a record of all term variations matching each stem in order to allow reverse stemming and pick one representative word as follows:
1. if there is only one term for a stem, use it to label the tag,
2. if a stem has two variants, one in lower case and the other in upper case or capitalized, use the lower-case one,
3. use a term that ends in 'ion', 'ment', 'ism' or 'ty',
4. use a term matching the stem,
5. use the shortest term.

To feed the document hint (Figure 3), uRank attaches to each global keyword the list of documents bearing it. For the co-occurrence hints (Figure 2.A), keyword co-occurrences with a maximum word distance of 5 and a minimum of 4 repetitions are recorded.
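To make these two pre-processing products concrete, the sketch below shows one plausible way to pick a tag label from a stem's recorded variants (following the rules above) and to count nearby keyword co-occurrences. Function names and the exact tie-breaking are ours, not taken from the uRank code.

```python
from collections import Counter

def label_for_stem(stem, variants):
    """Pick one representative word to label a keyword tag (rules 1-5 above)."""
    variants = sorted(set(variants))
    if len(variants) == 1:
        return variants[0]
    lower = [v for v in variants if v.islower()]
    if len(variants) == 2 and len(lower) == 1:
        return lower[0]                           # prefer the lower-case variant
    for suffix in ("ion", "ment", "ism", "ty"):
        for v in variants:
            if v.endswith(suffix):
                return v
    if stem in variants:
        return stem
    return min(variants, key=len)                 # fall back to the shortest term

def cooccurrences(term_lists, max_distance=5, min_count=4):
    """Count keyword pairs appearing within max_distance words of each other."""
    pairs = Counter()
    for terms in term_lists:
        for i, a in enumerate(terms):
            for b in terms[i + 1:i + 1 + max_distance]:
                if a != b:
                    pairs[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```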
3.2 Ranking Documents On The Fly
In theory, the recommendations returned by a RS are already ranked by relevance. In practice, however, the lack of control over that ranking can hinder user engagement if the GUI does not provide enough rationale for the recommendations and features for shaping the recommendation criteria. Hence, one of uRank's major features is its user-driven mechanism for re-organizing documents as information needs evolve, along with its visually transparent logic.

3.2.1 Ranking Visualization
The ranking-based visualization consists of a list of document titles (Figure 2.C) and stacked bar charts (Figure 2.D) depicting rank and relevance scores for documents and for the keywords within them. Document titles are initially listed in the order in which they were supplied by the F-RS.

Interactions with the view are the means for users to directly or indirectly manipulate the data [31]. In uRank, changes in the ranking visualization originate from keyword tag manipulation inside the Query Box (Figure 2.B). As the user manipulates tags, the selected keywords are immediately forwarded to the Ranking Model as ranking parameters. Selected tags are re-rendered by adding a weight slider, a delete button in the upper-right corner – visible on hover – and a specific background color determined by a qualitative palette (Figure 4). We chose ColorBrewer's [5] 9-class Set 1 palette for the background color encoding, as it allows the user to clearly distinguish tags from one another. When the user adjusts a weight slider, the intensity of the tag's background color changes accordingly (see Figure 4). We provide three possibilities for keyword tag manipulation:

• Addition: keyword tags in the Tag Box can be manually unpinned (Figure 4a), dragged with the mouse pointer and dropped into the Query Box (Figure 4b).

• Weight change: tags in the Query Box contain weight sliders that can be tuned to assign a keyword a higher or lower priority in the ranking.

• Deletion: tags can be removed from the Query Box and returned to their initial position in the Tag Box by clicking on the delete button.

Figure 4: a) Keyword tag before being dropped into the Query Box. b) Keyword tag after being dropped: weight slider and delete button added, background color changed according to a categorical color scale. Weight sliders have been tuned.

As the document ranking is generated, the Document List is re-sorted in descending order by overall score and stacked bars appear in the Ranking View, horizontally aligned to each list item. Items with a null score are hidden, shrinking the list to fit only ranked items. The total width of a stacked bar indicates the overall score of a document, and the bar fragments represent the individual contributions of keywords to that score. Bar colors match the color encoding of the selected keywords in the Query Box, enabling the user to make an immediate association between keyword tags and bars. Missing colored bars in a stack denote the absence of certain words in the document surrogate. Additionally, each item in the Document List contains two numeric indicators: the first one – in a dark circle – shows the position of the document in the ranking, while the adjacent colored number reveals how many positions the document has shifted, encoding upward and downward shifts in green and red, respectively. This graphic representation attempts to help the user concentrate only on useful items and ignore the rest by bringing likely relevant items to the top, pushing less relevant ones to the bottom and hiding those that seem completely irrelevant.

uRank allows choosing between two ranking modes: overall score (selected by default) and maximum score (Figure 5). In maximum score mode, the Ranking View renders a single color-coded bar per document in order to emphasize its most influential keyword. Finally, resetting the visualization clears the Query Box and the Ranking View, relocating all selected keywords in the Tag Box and restoring the Document List to its initial state.

Figure 5: Ranking visualization in maximum score mode: documents are ranked based on the keyword with the highest score
3.2.2 Document Ranking Computation
Quick content exploration in uRank depends on its ability to readily re-sort documents according to changing information needs. As the user manipulates keyword tags and builds queries from a subset of the global keyword collection, uRank computes document scores and arranges the documents accordingly in a ranking. We assume that some keywords are more important to the topic model than others and allow the user to assign weights to them.

Document scores are relevance measures for documents with respect to a query. As titles and snippets are the only content available for the retrieved document surrogates, these scores are computed with a term-frequency scheme; term-distribution schemes are rather suited to long or full texts and are hence out of our scope. Boolean models have the disadvantage that they not only consider every term equally important but also produce absolute values that preclude document ranking.

The Ranking Model implements a vector space model to compute document-query similarity using the document vectors previously generated during keyword extraction (section 3.1.2). Nonetheless, a single relevance measure like cosine similarity alone is not enough to convey query-term contribution, given that the best overall matches are not necessarily the ones in which most query terms are found [7, 14]. The contribution that each query term adds to the document score should be clear in the visual representation, in order to give the user a transparent explanation as to why one document ranks higher than another. Therefore, we break down the cosine similarity computation and obtain individual scores for each query term, which are then added up into an overall relevance score.

Given a document collection D and a set of weighted query terms T, such that ∀t ∈ T : 0 ≤ w_t ≤ 1, the relevance score for term t in document vector d ∈ D with respect to the query terms T is calculated as follows:

    s(t_d) = ( tfidf(t_d) × w_t ) / ( |d| × √|T| ),

where tfidf(t_d) is the tf-idf score of term t in document d, |d| is the norm of vector d and |T| is the number of query terms.

The overall score of a document, S(d), is then computed as the sum of the individual term scores s(t_d). The collection D is next sorted in descending order by overall score with the quicksort algorithm, and ranking positions are assigned in such a way that documents with an equivalent overall score share the same place. Alternatively, users can rank documents by maximum score, in which case S(d) = max(s(t_d)).
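A minimal sketch of this scoring and ranking procedure, operating on the TF-IDF document vectors from section 3.1.2, might look as follows. Variable and function names are ours, and Python's built-in sort stands in for the quicksort mentioned above.

```python
import math

def term_scores(vector, query_weights):
    """Per-term relevance: s(t_d) = tfidf(t_d) * w_t / (|d| * sqrt(|T|))."""
    norm = math.sqrt(sum(v * v for v in vector.values())) or 1.0
    denom = norm * math.sqrt(len(query_weights))
    return {t: vector.get(t, 0.0) * w / denom for t, w in query_weights.items()}

def rank(vectors, query_weights, mode="overall"):
    """Return ranked documents, best first; equal overall scores share a position."""
    scored = []
    for doc_id, vector in enumerate(vectors):
        s = term_scores(vector, query_weights)
        overall = max(s.values()) if mode == "max" else sum(s.values())
        scored.append((doc_id, overall, s))
    scored.sort(key=lambda item: item[1], reverse=True)  # descending by overall score

    ranking, position, previous = [], 0, None
    for doc_id, overall, s in scored:
        if overall != previous:                  # ties keep the same ranking position
            position += 1
            previous = overall
        if overall > 0:                          # null-score items are hidden in the UI
            ranking.append({"doc": doc_id, "position": position,
                            "overall": overall, "terms": s})
    return ranking

# Example: rank(document_vectors(docs), {"robot": 1.0, "interaction": 0.5})
```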
3.3 Details on Demand
Once the user identifies documents that seem worth further inspection, the next logical step is to drill down one by one to determine whether the initial assumption holds. The Document Viewer (Figure 2.E) gives access to the textual content – title and snippet – and the available metadata of a particular document. Query terms are highlighted in the text following the same color coding as the tags in the Query Box and the stacked bars in the Ranking View. These simple visual cues pop out from their surroundings, enabling the user to preattentively recognize keywords in the text and perceive their general context prior to conscious reading.

3.4 Change-Awareness Cues and Attention Guidance
We favor the use of animation to convey ranking-state transitions rather than abrupt static changes. Animated transitions are inherently intuitive and engaging, giving a perception of causality and intentionality [8]. As the user manipulates a keyword tag in the Query Box, uRank raises change awareness in the following ways:

• Keyword tags are re-styled as explained in section 3.2.1. If a tag is removed from the Query Box, an animation shifts it back to its original position in the Tag Box at a perceivable pace.

• Depending on the type of ranking transition, the Document List shows a specific effect:
  – If the ranking is generated for the first time, an accordion-like upward animation shows that its nature has changed from a plain list to a ranked one.
  – If the ranking is updated, list items shift to their new positions at a perceptible pace.
  – If ranking positions remain unchanged, the list stays static while a soft top-down shadow crosses it.

• Green or red shading effects are applied to the left side of list items moving up or down, respectively, disappearing after a few seconds.

• Stacked bars grow from left to right, revealing the new overall and keyword scores.

The user can closely follow how particular documents shift positions by clicking on the watch (eye-shaped) icon. The item is brought into focus by surrounding it with a slightly darker shadow and underlining its title. Watched documents also remain on top of the z-index during list animations, so they are not overlaid by other list items.

The same principle of softening changes is applied to redirect user attention when a document is selected in the Ranking View. The selected row is highlighted and the snippet appears in the Document Viewer with a fade-in effect. Animated transitions for ranking-state changes and document selection help the user intuitively switch contexts, either from the Tag Box to the Document List and Ranking View, or from the latter to the Document Viewer. As Baldonado [30] states in the rule of attention management, perceptual techniques should lead the user's attention to the right view at the right time.

4. EVALUATION
The goal of this study was to find out how people respond when working with our tool. In the current scenario, recommendations were delivered in a sorted list with no relevance information. Since we aim at supporting exploratory search, we hypothesized that participants using uRank would be able to gather items faster and with less difficulty than with a typical list-based UI.

We were also interested in observing the effect of exposing users to recommendation lists of different sizes. We expected that, without relevance information, even a modest growth in the number of displayed items would frustrate the user when deciding which items to inspect in detail first. For example, finding the 5 most relevant items in a list of ten appears to be an easy task, whereas accomplishing the same task in a list of forty or sixty items would be more time-consuming and entail a heavier cognitive load.

4.1 Method
We conducted an offline evaluation where participants performed four iterations of the same task with either uRank (U) or a baseline list-based UI (L) with the usual Web browser tools, e.g. Control+F keyword search. Furthermore, we introduced two variations in the number of items to which participants were exposed, namely 30 or 60 items. The study was therefore structured as a 2 x 2 repeated-measures design with tool and #items as independent variables, each with 2 levels (tool = U/L, #items = 30/60).

The general task goal was to "find 5 relevant items" for the given topic, and all participants had to perform one task for each combination of the independent variables, i.e. U-30, U-60, L-30 and L-60.

To counterbalance learning effects, we chose four different topics covering a spectrum of cultural, technical and scientific content: Women in workforce (WW), Robots (Ro), Augmented Reality (AR) and Circular economy (CE). Topic was thus treated as a random variable within constraints. We corroborated that participants were not knowledgeable in any of the topics. All variable combinations were randomized and assigned with a balanced Latin square.
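For illustration, a balanced Latin square for an even number of conditions can be generated with the standard construction sketched below; this is our own illustrative code, not part of the study materials. Each row is one ordering of the four conditions, assigned to participants in rotation.

```python
def balanced_latin_square(conditions):
    """Standard balanced Latin square construction for an even number of conditions."""
    n = len(conditions)
    first_row, low, high = [0], 1, n - 1
    while len(first_row) < n:
        first_row.append(low); low += 1          # interleave 0, 1, n-1, 2, n-2, ...
        if len(first_row) < n:
            first_row.append(high); high -= 1
    return [[conditions[(idx + shift) % n] for idx in first_row] for shift in range(n)]

# e.g. balanced_latin_square(["U-30", "U-60", "L-30", "L-60"]) yields four orderings
# in which every condition appears once per position and follows every other
# condition exactly once.
```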
Wikipedia provides a well-defined article for each of the topics mentioned above. We considered these articles as fictional initial exploration scenarios, but participants were not exposed to them. Instead, we simulated a situation in which the user has already received a list of recommendations while exploring a certain Wikipedia page. Therefore, we prepared static recommendation lists of 60 and 30 items for each topic and used them as inputs for uRank across the different participants and tasks. To create each list, portions of text from the original Wikipedia articles were fed to the F-RS, which preprocessed the text and created queries that were forwarded to a number of content providers. The result was a sorted, merged list of items from each provider with no scoring information.

Each task comprised three sub-tasks (Q1, Q2 and Q3) that consisted in finding the 5 most relevant items for a given piece of text. In Q1 and Q2 we targeted a specific search and the supplied text was limited to two or three words. Q3 was designed as a broad-search sub-task, where we provided an entire paragraph extracted from the Wikipedia page and users had to decide for themselves which keywords described the topic best. The motivation for asking for the "most relevant" documents was to avoid careless selection.

We recorded completion time for every individual sub-task and for the overall task. To measure workload, we used a 7-point Likert-scale NASA TLX questionnaire covering six workload dimensions.

4.1.1 Participants
Twenty-four (24) participants took part in the study (11 female, 13 male, between 22 and 37 years old). We recruited mainly graduate and post-graduate students from the medical and computer science domains. None of them was majoring in the topic areas selected for the study.

4.1.2 Procedure
A session started with an introductory video explaining the functionality of uRank. Each participant got exactly the same instructions. Then came a short training session with a different topic (Renaissance) to let participants familiarize themselves with uRank and the baseline tool. At the beginning of the first task, the system showed a short text describing the topic and the task to be fulfilled. After reading the text, the participant pressed "Start" to redirect the browser to the corresponding UI. At this point the first sub-task began and the internal timer started counting, without disturbing the user. The goal of the task and the reference text were shown in the upper part of the UI. Participants were able to select items by clicking on a star-shaped icon and to inspect them later in a drop-down list. In a pilot study we realized that asking for the "most" relevant items made the experiment overly long, as participants tried to inspect their selections very carefully (particularly in the L condition). We therefore limited the duration of the three sub-tasks to 3, 3 and 6 minutes, respectively. The time constraint was not a hard deadline: during the study the experimenter reminded participants when the allotted time was almost over, but did not force them to stop. A sub-task concluded when the participant clicked on the "Finished" button. The UI alerted participants when they attempted to finish without having collected 5 items, but allowed them to continue if desired. The second sub-task started immediately afterward, and once the whole task was completed participants filled in the NASA TLX questionnaire. The remaining tasks followed the same steps. Finally, participants were asked for comments and preferences.

4.2 Results
Workload: A two-way repeated-measures ANOVA with tool and #items as independent variables revealed a significant effect of tool on perceived workload, F(1,23)=35.254, p < .01, ε = .18. Bonferroni post-hoc tests showed significantly lower workload when using uRank (p < .001). We also assessed the effect for each workload dimension; again, the ANOVA showed a significant effect of tool in all of them, as shown in Table 1. #items did not have a major effect in any case.

Table 1: Participants found uRank reduces workload in all dimensions
Dimension         F(1, 23)   p         ε
Mental Demand     19.70      p < .05   .10
Physical Demand   14.52      p < .01   .07
Temporal Demand    7.72      p < .05   .05
Performance       11.80      p < .01   .10
Effort            48.60      p < .001  .22
Frustration       15.12      p < .01   .07
Workload          35.25      p < .01   .20
Completion Time: We analyzed the overall task completion time as well as the completion times for each sub-task. A two-way repeated-measures ANOVA revealed a significant effect of tool on overall completion time, F(1,23)=4.94, p < .05, ε = .02. This effect disappeared in a Bonferroni post-hoc comparison. For Q1 and Q2 the ANOVA reported no significant effect, but it showed a significant effect of tool on completion time for Q3, F(1,23)=6.2, p < .05, ε = .05. Surprisingly, the post-hoc comparison showed that using uRank took significantly longer.

Performance: Relevance is a rather subjective measure. Hence, instead of contrasting item selections with some ground truth, we analyzed the "consensus" in item selection. We aggregated the collections gathered under the manipulated conditions and computed cosine similarity across UI (tool), data set size (#items), topic (WW, Ro, AR and CE) and sub-task (Q1, Q2 and Q3).
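The paper does not detail how the aggregation was carried out; one plausible reading, sketched below with our own function names, is to count how often each document was selected under a condition and to compare the resulting count vectors with cosine similarity.

```python
import math
from collections import Counter

def selection_vector(selections):
    """selections: one list of chosen document ids per participant -> count vector."""
    return Counter(doc for chosen in selections for doc in chosen)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(doc, 0) for doc, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# e.g. consensus between uRank and list-based selections for one topic and sub-task:
# cosine(selection_vector(u_selections), selection_vector(l_selections))
```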
Overall, there was a high similarity between the collections produced with uRank and those obtained with the list-based UI across all sub-tasks: choices regarding relevant documents matched three out of four times (M = .73, SD = .1).

Table 2 shows that the collections produced with our tool (U) for the two variations of #items (U-30 vs U-60) turned out highly similar regardless of topic and sub-task (M = .8, SD = .12, with a minimum of .62). Comparisons for the typical list-based UI (L) displaying 30 and 60 items (L-30 vs L-60) denote greater diversity in item selection (M = .67, SD = .16, with a minimum of .33).

Interestingly, similarity values tend to decrease for the broad-search sub-task (Q3) (M = .66, SD = .13) with respect to targeted search (Q1 and Q2) (M = .77, SD = .13).

Table 2: Similarities in collections gathered during evaluation
Sub-task  Comparison     WW    Ro    AR    CE    All topics
Q1        U vs L         .55   .79   .58   .74   .66
Q1        U-30 vs U-60   .71   .83   .94   .67   .79
Q1        L-30 vs L-60   .58   .83   .56   .56   .63
Q2        U vs L         .70   .86   .84   .86   .81
Q2        U-30 vs U-60   .84   .89   .90   .93   .89
Q2        L-30 vs L-60   .82   .74   .81   .87   .81
Q3        U vs L         .75   .72   .75   .63   .72
Q3        U-30 vs U-60   .64   .88   .75   .62   .72
Q3        L-30 vs L-60   .59   .66   .63   .33   .55

Figure 6: Results. (Left) Workload interaction lines show that uRank is significantly less demanding. (Right) Boxplots of completion time for each condition show a regularity towards using all available time.

4.3 Discussion
The study results shed light on how people interact with a tool like uRank. For each hypothesis we contrasted the results with the subjective feedback acquired after the evaluation.

Workload: The results support our hypothesis that uRank incurs lower workload during exploratory search, in both specific and broad search tasks. Participants commented feeling relieved when they could browse the ranking and instantly discard documents that did not contain any word of interest. As a remark, the majority claimed that a few tasks were too hard to solve, especially without uRank, because sometimes the terms of interest barely appeared in the titles or were perceived as too ambiguous, e.g. "participation of women in the workforce". Dealing with technical texts about unfamiliar topics also posed some strain; for example, two participants had to momentarily interrupt their exploration to look up a word they did not understand. In spite of that, workload was significantly lower with uRank across all dimensions.

Completion Time: We expected people to be faster with uRank than with a browser-based keyword filter, but completion times were not significantly different. The closing interview revealed that participants who had collected five items before the due time exploited the remainder to refine their selections. In general, participants understood that they were not expected to perform perfectly but to do their best in the given time. However, we noticed that a small group behaved in the opposite way and reported feeling more pressed by time and not satisfied with their performance. The general tendency is reflected in the significant result on temporal demand: participants felt significantly less pressed to finish while performing with uRank. The lower subjective time pressure suggests that participants indeed had more time to analyze their choices with uRank.

Performance: The results suggest that our tool produces more uniform results as the number of items to which users are exposed grows. Nevertheless, the proportion of matching documents in list-generated collections – two out of three – still conveys a moderate consensus. The decrease in consensus for the broad search task with respect to targeted search could be explained by the inherent variability across participants when choosing the terms of interest for a text longer than a couple of words.

5. CONCLUSION
We introduced a visual tool for exploration, discovery and analysis of recommendations of textual documents. uRank aims to help the user: i) quickly overview the most important topics in a collection of documents, ii) interact with the content to describe a topic in terms of keywords, and iii) reorganize the documents on the fly along the keywords describing a topic.

This paper presented the reasoning behind the visual and interactive design and a comparative user study in which we evaluated the experience of collecting items relevant to topics of interest. Participants found it significantly more relaxing to work with uRank, and most of them wanted to start actively using it in their scientific endeavors (e.g., report or paper writing). Yet, selecting the right keywords to describe a topic is not a trivial task, as shown by the performance results of the evaluation. We will continue to explore different techniques, e.g. topic modeling, in the near future. As for the GUI, we will work further on solving scaling problems, for example when the number of tags in the Tag Box or the length of the result list becomes unmanageable. Moreover, we will leverage the document selections collected during the evaluation as feedback to improve recommendations, closing the interactive loop with the RS as depicted in Figure 1.

6. REFERENCES
[1] E. Brill. A simple rule-based part of speech tagger. In Proceedings of the Workshop on Speech and Natural Language - HLT '91, page 112, Morristown, NJ, USA, 1992. Association for Computational Linguistics.
[2] A. Cockburn, A. Karlson, and B. B. Bederson. A review of overview+detail, zooming, and focus+context interfaces. ACM Computing Surveys, 41(1):1–31, 2008.
[3] E. Gomez-Nieto, F. San Roman, P. Pagliosa, W. Casaca, E. S. Helou, M. C. F. de Oliveira, and L. G. Nonato. Similarity preserving snippet-based visualization of web search results. IEEE Transactions on Visualization and Computer Graphics, 20(3):457–470, Mar. 2014.
[4] S. Gratzl, A. Lex, N. Gehlenborg, H. Pfister, and M. Streit. LineUp: visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics, 19(12):2277–2286, Dec. 2013.
[5] M. Harrower and C. A. Brewer. ColorBrewer.org: An online tool for selecting colour schemes for maps. The Cartographic Journal, 40(1):27–37, June 2003.
[6] M. Hearst. User interfaces for search. Modern Information Retrieval, 2011.
[7] M. A. Hearst. TileBars: Visualization of term distribution information in full text information access. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '95, pages 59–66. ACM Press, 1995.
[8] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE Transactions on Visualization and Computer Graphics, 13(6):1240–1247, 2007.
[9] O. Hoeber and X. D. Yang. The visual exploration of web search results using HotMap. In Proceedings of Information Visualization (IV06), 2006.
[10] A. Kangasrääsiö, D. Głowacka, and S. Kaski. Improving controllability and predictability of interactive recommendation interfaces for exploratory search. In IUI, pages 247–251, 2015.
[11] J. Kay. Scrutable adaptation: Because we can and must. In Lecture Notes in Computer Science, volume 4018 LNCS, pages 11–19, 2006.
[12] B. P. Knijnenburg, S. Bostandjiev, J. O'Donovan, and A. Kobsa. Inspectability and control in social recommenders. In Proceedings of the 6th ACM Conference on Recommender Systems - RecSys '12, page 43, 2012.
[13] B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):441–504, 2012.
[14] C. D. Manning. Introduction to Information Retrieval. Cambridge University Press, 2008.
[15] G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41, 2006.
[16] T. N. Nguyen and J. Zhang. A novel visualization model for web search results. IEEE Transactions on Visualization and Computer Graphics, 12(5):981–988, 2006.
[17] J. O'Donovan, B. Smyth, B. Gretarsson, S. Bostandjiev, and T. Höllerer. PeerChooser: Visual interactive recommendation. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pages 1085–1088, 2008.
[18] D. Parra, P. Brusilovsky, and C. Trattner. See what you want to see: Visual user-driven approach for hybrid recommendation. In Proceedings of the 19th International Conference on Intelligent User Interfaces - IUI '14, pages 235–240, 2014.
[19] M. Porter. An algorithm for suffix stripping. Program: Electronic Library and Information Systems, 40(3):211–218, 1980.
[20] R. Rao and S. K. Card. The table lens. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '94, pages 318–322, New York, NY, USA, 1994. ACM Press.
[21] H. Reiterer, G. Tullius, and T. Mann. Insyder: a content-based visual-information-seeking system for the web. International Journal on Digital Libraries, pages 25–41, 2005.
[22] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 1–35. Springer, 2011.
[23] C. Seifert, J. Jurgovsky, and M. Granitzer. FacetScape: A visualization for exploring the search space. In Proceedings of the 18th International Conference on Information Visualisation, pages 94–101, 2014.
[24] G. Shani and N. Tractinsky. Displaying relevance scores for search results. In Proceedings of the 36th International ACM SIGIR Conference (SIGIR '13), pages 901–904, 2013.
[25] A. Spoerri. Coordinated views and tight coupling to support meta searching. In Proceedings of the Second International Conference on Coordinated and Multiple Views in Exploratory Visualization, pages 39–48, 2004.
[26] M. Streit and N. Gehlenborg. Bar charts and box plots. Nature Methods, 11(2):117, Feb. 2014.
[27] K. Swearingen and R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, pages 1–11, 2001.
[28] N. Tintarev and J. Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):399–439, Oct. 2012.
[29] K. Verbert, D. Parra, P. Brusilovsky, and E. Duval. Visualizing recommendations to support exploration, transparency and controllability. In Proceedings of the 2013 International Conference on Intelligent User Interfaces - IUI '13, page 351, 2013.
[30] M. Q. Wang Baldonado, A. Woodruff, and A. Kuchinsky. Guidelines for using multiple views in information visualization. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 110–119, 2000.
[31] M. O. Ward, G. Grinstein, and D. A. Keim. Interactive Data Visualization: Foundations, Techniques, and Applications. A. K. Peters, Ltd., May 2010.
[32] C. Ware. Information Visualization: Perception for Design. Elsevier, 3rd edition, 2013.
[33] K.-P. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In Proceedings of the Conference on Human Factors in Computing Systems - CHI '03, pages 401–408, 2003.