
B. Rahdari); peterb@pitt.edu (P. Brusilovsky)

PaperExplorer: Personalized Exploratory Search for Conference Proceedings

Behnam Rahdari

Peter Brusilovsky

School of Computing and Information, University of Pittsburgh, 135 North Bellefield Avenue, Pittsburgh, PA 15260, USA

2021


This paper presents our attempt to create an exploratory search system, PaperExplorer, for a historic archive of conference proceedings. PaperExplorer uses concept extraction, knowledge graphs, and user-controlled recommendation to assist users with various levels of domain expertise in their information needs.

Keywords: Exploratory Search, Knowledge Graph, Information Exploration, Intelligent Interface

1. Introduction and Background

Exploratory search systems form an increasingly popular category of information access and exploration tools. These systems creatively combine search, browsing, and information analysis steps, shifting user efforts from recall (formulating a query) to recognition (i.e., selecting a link) and helping users to gradually learn more about the explored domain [1].

In this paper we present our attempt to augment the set of search systems focused on conference proceedings with a personalized exploratory search system, PaperExplorer. We hope that PaperExplorer’s ability to support information discovery, learning-while-searching, and personalization could help a broader set of users to benefit from the assembled collection of conference proceedings.

1.1. Exploratory Search

A number of real-life search tasks require a considerable amount of learning during the search process to achieve adequate results. These tasks are known as exploratory search tasks [2]. Since simple search systems are usually not efficient in supporting exploratory search tasks, a range of specialized systems has been developed and evaluated.

More recently, a few projects in this area demonstrated that the effectiveness of exploratory search could be improved by using a personalized system, which builds a profile of user interests and adapts to the individual user [3]. The work presented in this paper investigates the ideas of profile-based exploratory search in the context of finding research publications related to a certain conference.

1.2. Controllability

User controllability has been recognized as a valuable component of advanced information access interfaces. The ideas of controllability were made popular by a stream of work on user-controllable recommender systems [4]. However, the value of extended user control has also been demonstrated in the area of exploratory search.

For example, NameSieve [5] presented a summary of search results in the form of entity clouds, which allowed controllable filtering and exploration of results. PeopleExplorer [6] offered users an option to re-sort people search results based on multiple user-related factors. uRank [7] introduced a controllable interface for refining and reorganizing search results, and SciNoon [8] simplifies the exploratory search process for scientific groups.

The idea of applying open user profiles (also known as open user models) to better support personalized information access was among the early ideas explored in this field. Open user profiles allow users to examine and possibly change the content of their interest profiles, which are used to personalize their search or browsing process.

Since open user profiles increase the interactivity, transparency, and controllability of the information exploration process, their application is a good match to the nature of exploratory search. While first attempts to introduce “bag-of-words” open user profiles had mixed success [9], more recent work focused on semantic-level user profiles demonstrated their potential for personalized exploratory search [3, 10].

We start the paper with the presentation of the PaperExplorer interface and follow with the details on concept extraction, knowledge graph organization, and recommendation that enable the work of this interface.

2. The Interface of PaperExplorer

Personalized information exploration in PaperExplorer is centered around the user interest profile [11], a collection of concepts represented by keyphrases that express user interests. Unlike traditional search, which requires users to specify all keyphrases in a query, PaperExplorer supports users in the process of gradual discovery and refinement of their interests. It also allows users to control the importance of each keyphrase in recommending relevant results. The PaperExplorer interface consists of the following main sections.

2.1. Instant Search Box

The search box (Figure 1A) is the gateway to the system. The instant search approach allows users to discover relevant keyphrases representing concepts of interest without a fully formulated query. When a user starts typing a query, a series of matching keyphrases appears, helping the user to discover concepts of interest (e.g., User Interfaces and User Modeling). When an item is selected from the list, it is automatically added to the slider area (Figure 1C). At the same time, an updated list of search results is presented to the user.

2.2. Recommended Keyphrases

When at least one keyphrase is added to the user’s profile, the system recommends five semantically similar concepts (shown as keyphrases) in the Similar keyphrases area of the interface (Figure 1B). Users can add recommended keyphrases to their interest profiles by clicking on the plus button to the right of each keyphrase. As the user’s profile grows and is refined, the set of recommended concepts is updated, since the system recommends instances similar to all concepts in the user’s profile. Each recommended concept also provides users with a short description of the concept. Clicking on the question mark button next to the add button opens a separate window containing the abstract of that concept’s Wikipedia entry.

2.3. Open User Profile

The slider area (Figure 1C) displays the current user profile of interest. PaperExplorer implements a content-based recommendation approach, which generates the list of recommended results (Figure 1D) using the profile. To support transparency and controllability of this process, the interest profile is visible and directly editable by the end users.

To build the profile, the user can add relevant concepts represented by keyphrases, as explained above, and remove less relevant keyphrases (using the red x) as they discover more relevant concepts or explore different interests.

Sliders associated with each keyphrase enable users to control the relative importance of the represented concept compared to others in their profile, ranging from 1 (least important) to 10 (most important). The use of sliders for fine-tuning the user profile was motivated by the keyword tuning approach in uRank [7], which was confirmed as user-friendly and efficient in an exploratory search context. All actions within the profile (adding or removing keyphrases, or adjusting sliders) immediately affect the search results list.

2.4. Search Results

As soon as the user adds the first keyphrase to the interest profile, a table of the 20 most relevant publications is generated (Figure 1D). The first column of the table visualizes the combined relevance between keyphrases in the user interest profile and each result. The colors in the stacked bar (Figure 1D1) match the colors of the sliders in the profile, and the size and opacity of each bar express the relevance of the result to each profile keyphrase.

The second column of the table lists the titles of relevant publications. Clicking on a title expands a window that holds the abstract of the paper. The mentioned keyphrases are highlighted with corresponding colors. The opacity of the colors reflects the relevance of a keyphrase to the paper and the current value of the slider for that keyphrase. To further assist the users, PaperExplorer underlines all available keyphrases in the text (both in title and abstract).

Hovering over the underlined portion of the text opens a popup window (Figure 1D2) that enables the user to (1) see the relevance of the keyphrase to the text in the form of a vertical bar chart, (2) add the keyphrase directly to the interest profile, and (3) report improper keyphrases to the administrator for removal. The latter helps us to improve the quality of extracted keyphrases and eliminate occasional errors in the extraction process.

3. The Knowledge Graph

The knowledge graph consists of three main entities (publications, authors, and keyphrases) and their relationships, extracted from our data set and hosted in the native graph database Neo4j.

Figure 2 presents the schematic representation of the knowledge graph. Authors are interconnected by the relation Co-Author (based on co-authorship) and connected to papers by the relation Published. Papers are connected to keyphrases using the Has-Key relationship. The latter carries a weight that determines the strength of the relationship between each keyphrase and the publication.

3.1. Data Source and Keyphrase Extraction

We used the collection of proceedings from two main conferences (Hypertext and UMAP) as the main source of data to build the knowledge graph and extract the keyphrases. This collection covers all publications of these two conferences from 2008 to 2020. Using this dataset and the concept extraction explained below, we generated a knowledge graph covering 2023 publications; 14404 keyphrases were extracted from the titles and abstracts of these publications.

We used TopicRank [12], a graph-based keyphrase extraction method, to extract the initial set of candidate keyphrases from the title and abstract of each publication. We then used the Wikipedia API to filter all extracted keyphrases; only keyphrases with an entry in Wikipedia were kept in the knowledge graph. We further assigned a weight to each publication-keyphrase pair using cosine similarity between the bags-of-words extracted from the Wikipedia page and the publication.

4. Profile-Based Search

We deployed a two-phase search process to produce the most relevant results based on the user interest profile. In the first phase, a primary list of candidates is selected from the graph; the second phase ensures that the results are presented to the user in the right order based on their relevance to the query. We describe these two phases in more detail below.

Candidate selection: We used the Cypher query language to generate the initial list of candidate publications. At each instance of user interaction with the system (e.g., adding/removing keyphrases or tuning the sliders), the system considers all publications connected to at least one of the concepts of interest in the user profile.

Reordering the results: After generating the list of candidate results, the system rearranges the results so that the most relevant results appear at the top of the list. To do that, a complete list of keyphrases that appear in the text (title and abstract) of each publication, along with their relevance scores (weights), is first generated. Then, for every keyphrase that exists in the user interest profile, we multiply its weight by the value of the corresponding slider. Finally, a relevance score is assigned to each candidate considering the candidate’s similarity to each of the profile concepts and the values of the sliders.

The PaperExplorer system has been deployed online and also demonstrated to several target users. The early results indicate that the success of the system depends to a considerable extent on the quality of keyphrase extraction. We are interested in collaborating with experts on keyphrase extraction to develop approaches optimized for exploratory search.
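The two-phase profile-based search described above (candidate selection followed by slider-weighted reordering) can be illustrated with a minimal in-memory Python sketch. It is not the deployed implementation: the actual system issues Cypher queries against Neo4j, whereas here the Has-Key edges are held in a plain dictionary. The names `has_key`, `select_candidates`, and `rank_results`, the sample data, and the choice to sum the weight-times-slider products into a single score are assumptions made for illustration; the text does not specify the exact aggregation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Has-Key edges of the knowledge graph:
# publication title -> {keyphrase: relevance weight}.
HasKey = Dict[str, Dict[str, float]]


def select_candidates(has_key: HasKey, profile: Dict[str, int]) -> List[str]:
    """Phase 1: keep publications linked to at least one profile keyphrase."""
    return [pub for pub, keys in has_key.items()
            if any(k in keys for k in profile)]


def rank_results(has_key: HasKey, profile: Dict[str, int],
                 top_n: int = 20) -> List[Tuple[str, float]]:
    """Phase 2: score the candidates and return the top_n, best first.

    Assumed scoring: sum over profile keyphrases of (edge weight * slider).
    """
    scored = []
    for pub in select_candidates(has_key, profile):
        keys = has_key[pub]
        score = sum(keys[k] * slider
                    for k, slider in profile.items() if k in keys)
        scored.append((pub, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Made-up example data; slider values range from 1 to 10.
has_key = {
    "Paper A": {"user modeling": 0.8, "recommender system": 0.2},
    "Paper B": {"recommender system": 0.9},
    "Paper C": {"hypertext": 0.7},
}
profile = {"user modeling": 10, "recommender system": 3}
results = rank_results(has_key, profile)
```

With this sample profile, Paper C is never scored because it shares no keyphrase with the profile, and Paper A ranks above Paper B. Lowering the "user modeling" slider to 1 would reverse that order, which mirrors how slider adjustments immediately reshape the result list.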

[1] R. W. White, Kules, S. M. Drucker, et al., Supporting exploratory search, Communications of the ACM 49 (2006) 36-39.

[2] Marchionini, Exploratory search: From finding to understanding, Communications of the ACM 49 (2006) 41-46.

[3] Bakalov, König-Ries, Nauerz, M. Welsch, IntrospectiveViews: An interface for scrutinizing semantic user models, in: 18th International Conference on User Modeling, Adaptation, and Personalization, Springer, 2010, pp. 219-230.

[4] B. P. Knijnenburg, Bostandjiev, J. O'Donovan, A. Kobsa, Inspectability and control in social recommenders, in: 6th ACM Conference on Recommender Systems, 2012, pp. 43-50.

[5] J.-w. Ahn, Brusilovsky, Grady, He, Florian, Semantic annotation based exploratory search for information analysts, Information Processing & Management 46 (2010) 383-402.

[6] Han, He, Jiang, Yue, Supporting exploratory people search: a study of factor transparency and user control, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ACM, 2013, pp. 449-458.

[7] C. di Sciascio, Sabol, E. E. Veas, Rank as you go: User-driven exploration of search results, in: 21st International Conference on Intelligent User Interfaces, 2016, pp. 118-129.

[8] Nedumov, Babichev, I. Mashonsky, Semina, Scinoon: Exploratory search system for scientific groups, in: IUI 2019 Workshop on Exploratory Search and Interactive Data Analytics, 2019. URL: http://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-3.pdf.

[9] J.-w. Ahn, Brusilovsky, Grady, He, S. Y. Syn, Open user profiles for adaptive news systems: help or harm?, in: the 16th International Conference on World Wide Web, WWW '07, ACM, 2007, pp. 11-20.

[10] Ruotsalo, G. Jacucci, Kaski, Interactive faceted query suggestion for exploratory search: Whole-session effectiveness and interaction engagement, Journal of the Association for Information Science and Technology 71 (2020) 742-756.

[11] Rahdari, Brusilovsky, Babichenko, Personalizing information exploration with an open user model, in: 31st ACM Conference on Hypertext and Social Media (HT '20), Association for Computing Machinery, New York, NY, USA, 2020. doi: 10.1145/3372923.3404797.

[12] Bougouin, Boudin, B. Daille, TopicRank: Graph-based topic ranking for keyphrase extraction, in: Proceedings of the Sixth International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, Nagoya, Japan, 2013.