=Paper= {{Paper |id=Vol-2127/paper4-profs |storemode=property |title=Explainable IR for Personalizing Professional Search |pdfUrl=https://ceur-ws.org/Vol-2127/paper4-profs.pdf |volume=Vol-2127 |authors=Suzan Verberne |dblpUrl=https://dblp.org/rec/conf/sigir/Verberne18 }} ==Explainable IR for Personalizing Professional Search== https://ceur-ws.org/Vol-2127/paper4-profs.pdf
       Explainable IR for personalizing professional search

                                               Suzan Verberne
                               Leiden Institute of Advanced Computer Science
                                              Leiden University
                                        s.verberne@liacs.leidenuniv.nl




                                                        Abstract
                       In this position paper we establish the need for transparency in
                       personalized professional search. We provide a brief overview of prior
                       work, identify the gaps, and list four research directions that need to
                       be explored to close these gaps. The central idea of our proposal is
                       the professional knowledge graph. Graphs are a natural and transparent
                       means of representing knowledge. A graph-based search paradigm
                       enables and stimulates the exploratory search behaviour for complex
                       information needs that are inevitable in professional work environments.

1     Introduction
Professional searchers, such as lawyers, policy officers, architects, and scientists, need to process increasing
amounts of documents to find relevant, complete, high quality, work-related information [4, 35]. Not being able
to find the needed information is a costly problem in our information-driven society in which the amount of
available information from diverse sources (internet, digital libraries, internal collections) keeps growing.
   A problem of the general search paradigm when applied to work-related search is that result ranking relies on
popularity of web pages: the more often a result is clicked for a given query, the higher it is ranked in future
searches [19]. However, information search by professionals is essentially different from generic web search in
three important aspects:
    • The search tasks of professionals are complex, i.e. highly-specific and typically recall-oriented: the searchers
      want to be sure that they have found all the relevant information [27, 21];
    • The searching is not limited to sending one query and clicking one result, but is often exploratory by
      nature [15], and includes browsing, analysing [26] and re-finding previously used information [36];
    • Each user has their own individual needs: not only interests, expertise and information needs differ per
      user, but also the perceived relevance of retrieved documents [40]. The search builds on the searcher’s own
      knowledge.
   Because the information needs are highly specific and individual in professional search, the click data available
from other users is limited and irrelevant [17]. Hence, result ranking cannot depend on popularity.
   Thus, for effective professional search, a different approach is necessary. We argue that the search results
should not depend on a single query matched to the collection of documents, but should be centred around the
knowledge of the individual user, so that their highly specific information needs can be served. This idea is based on
the classic model for information seeking by Dervin in which a search is motivated by the gap between what the
user already knows and what he wants to know [11, 25].

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In: Joint Proceedings of the First International Workshop on Professional Search (ProfS2018); the Second Workshop on Knowledge
Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search
(DATA:SEARCH18). Co-located with SIGIR 2018, Ann Arbor, Michigan, USA – 12 July 2018, published at http://ceur-ws.org




   For a search engine to be centred around the knowledge of the user, a user profile must be created and utilized
for personalized ranking. User profiling and personalization have been addressed extensively in IR research, but
barely in the context of professional search. The reason is that transparency is essential in work-related search:
professional users do not want to have the feeling they lose control over the search process because the ranking
of the search results is not stable or not predictable.

Our position
The lack of methods for transparent personalized professional search is a gap that should be addressed in IR
research. We argue that it is time to change the classic query-based paradigm of information
retrieval and move towards environments that allow users to explore their own knowledge,
identify the knowledge gap, explore the surrounding content, and find the hooks where the new
information should be attached.
   For that purpose we propose the concept of the professional knowledge graph, an automatically deduced
knowledge graph of terms and documents that are relevant to the individual user. Graphs are a natural and transparent
means of representing knowledge [6]. A graph-based search paradigm enables and stimulates the exploratory
search behaviour for complex information needs that are inevitable in professional work environments [15].

   In the remainder of this paper we first define the aims and objectives for transparent personalization in
professional search (Section 2). In Section 3 we give an outline of prior and related work. In Section 4 we outline
the research topics that need to be addressed in order to meet the aims and objectives. We conclude our paper
with recommendations in Section 5.

2      Future aims and objectives
Successful personalized professional search relies on transparent and explainable IR: opening up the black box of
the search algorithm and making the user’s knowledge the central component of the search experience. To that
end, three research challenges need to be addressed:

    1. to create a human and machine understandable representation of the knowledge of the user. Methods should
       be developed to deduce the user’s professional knowledge graph from his searching and reading history.

    2. to utilize the user’s knowledge graph for more effective information retrieval. Methods should be developed
       to utilize the information in the graph by an existing retrieval system to better rank the relevant documents
       for the user.

    3. to do this in a transparent way.

Figure 1 illustrates how we envision the professional knowledge graph in a search engine interface.

3      Prior and related work
User profiling in domain-specific search. Approaches to user profiling and personalization typically learn
user preferences by collecting queries and clicked documents [28]. A rich user profile can be learned by extracting
prominent terms from the clicked documents and storing them in a term profile [43, 41]. The term profile can
then be used for re-ranking search results [29], for query disambiguation [42], query expansion [48], or query
suggestion [22, 46]. Often, the extracted information is linked to a reference ontology [39, 10].
   Although all these works report an improvement of personalization over the non-personalized baseline, the
actual implementation of personalization strategies in search environments is limited: on average, only 11.7% of
Google Web Search results show differences due to personalization [13]. This is because users are wary when it
comes to personalization. Privacy-preserving personalization is an important societal topic [32, 20]. As such, a
crucial step in the development of privacy-secure systems is to make the system transparent and explainable [16].
Recently, explainable methods have found their way to the field of recommendation and search [2, 7]. It is
important for users to have insight into the data that is stored by the search engine [47] and to understand the
influence of their personal data on the search results. Transparent/Explainable IR was also addressed as a
discussion topic during the Third Strategic Workshop on Information Retrieval [8], indicating its importance for
the research community.




Figure 1: A personal graph-based search interface with a query field, the professional knowledge graphs, and
relevant documents connected to the knowledge graph
   Knowledge graphs in IR. The use of knowledge graphs for text processing and information retrieval has
gained attention from the research community in the past years [1, 3, 12]. Knowledge graphs have been shown to
be especially helpful in exploratory search [37, 38], and to model the semantic relations between documents [34].
When knowledge graphs are combined with search logs they give insight into the user’s facts and beliefs about the
search topic [14]. Most previous work in graph-based search uses an external knowledge graph covering all
domain knowledge. A graph representing the knowledge and interests of one user is much smaller than a graph
representing the complete index of a search engine [5] and can be stored locally (client-side), if privacy regulations
require it.
   Data for user profiling in domain-specific search. An important gap for learning and evaluating
user profiles for professional search is that there are no data sets available that contain explicit descriptions of
information needs and background knowledge, together with search activity data and relevance judgments.
   An important example was set by the iSearch data set [24]: this collection contains 65 personal information
needs (topics) described by 23 physics scientists. The iSearch dataset is unique in size and richness of the topics;
it provides a valuable test bed for domain specific search. However, for experiments on user profiling the iSearch
data lacks an important component: user interaction data corresponding to the topic, i.e. issued queries and
clicked documents in a search engine.

4      Research proposal
In this section we propose a line of research that addresses the need for transparent personalized search in
professional contexts. The research line consists of four steps, which will each be addressed in the following
subsections:

    1. Data collection;

    2. Methods for constructing professional knowledge graphs;

    3. Methods for transparent personalization;

    4. Evaluation protocol for transparent personalized search.




Figure 2: Small part of a graph, with terms (t1,t2) and documents (d1,d2). This example was extracted from
the iSearch data with information needs of physics scientists.
4.1   Data collection
The first necessary step will be to collect personal information needs (topics) of professionals and link them to
individual user interaction data. As with the iSearch data, the topics can be collected by means of semi-structured
interviews and questionnaires with users of a professional search engine who agree to provide information needs
and give consent to storage of their interactions with the search engine. These users would be asked to describe
an information need, their background knowledge on the topic and the context of the work task (cf. [24]).
   The search logs of the participants should then be collected from the back-end of the search engine. The
search logs consist of all issued queries and accessed documents, together with timestamps. The data could be
analysed semi-automatically, to split the search behaviour into search stages [15] and to map the information needs
to the actual search behaviour.
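
   As an illustration of the log data described above, the following minimal sketch represents each logged
interaction as a record and groups one participant's records into sessions. The record fields, the names
LogEvent and split_sessions, and the 30-minute inactivity threshold are assumptions made for this example;
grouping by inactivity is only a simple first step and does not implement the search-stage analysis of [15].

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEvent:
    """One logged interaction (hypothetical schema)."""
    user_id: str
    timestamp: datetime
    kind: str   # "query" or "access"
    value: str  # query text or identifier of the accessed document


def split_sessions(events, gap=timedelta(minutes=30)):
    """Group events into sessions separated by more than `gap` of inactivity.

    The 30-minute default is an assumption, not a value from the paper.
    """
    sessions = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if sessions and event.timestamp - sessions[-1][-1].timestamp <= gap:
            sessions[-1].append(event)   # continue the current session
        else:
            sessions.append([event])     # start a new session
    return sessions


if __name__ == "__main__":
    log = [
        LogEvent("u1", datetime(2018, 7, 12, 9, 0), "query", "plasma confinement"),
        LogEvent("u1", datetime(2018, 7, 12, 9, 10), "access", "doc-17"),
        LogEvent("u1", datetime(2018, 7, 12, 11, 0), "query", "tokamak stability"),
    ]
    for number, session in enumerate(split_sessions(log), start=1):
        print(number, [event.value for event in session])
```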

4.2   Methods for constructing the professional knowledge graphs
For graph construction, the main challenge is selecting those terms that constitute a good (informative) user
profile. Terms could be selected in three ways: keyword extraction from clicked documents, named entity
extraction from clicked documents and term extraction from queries. Documents, terms and user interactions
(clicks, reads) can then be stored as nodes in a heterogeneous weighted graph. The edges between nodes
might represent the similarity between two documents [34], the similarity between terms [31, 30], and the
representativeness of a term for a document (tf-idf weight). Figure 2 illustrates a professional knowledge graph
with an excerpt of a graph with two terms and two documents as nodes, and weights on the relations between
the nodes.
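
   The graph structure described above can be sketched as follows. This is a minimal illustration under
stated assumptions, not the construction method itself: documents are assumed to be pre-tokenized, only
term-document edges are created (document-document and term-term similarity edges are omitted), and the
smoothed idf formula and the function name build_graph are choices made for this example.

```python
import math
from collections import Counter, defaultdict


def build_graph(clicked_docs):
    """Build an undirected weighted graph with term and document nodes.

    clicked_docs: dict mapping a document id to its list of tokens
    (hypothetical input format). Each term-document edge carries a
    tf-idf weight.
    """
    n_docs = len(clicked_docs)
    doc_freq = Counter()
    for tokens in clicked_docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    graph = defaultdict(dict)  # node -> {neighbour: weight}
    for doc_id, tokens in clicked_docs.items():
        term_freq = Counter(tokens)
        for term, count in term_freq.items():
            # Smoothed idf (an assumption; any tf-idf variant could be used).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            weight = (count / len(tokens)) * idf
            doc_node, term_node = ("doc", doc_id), ("term", term)
            graph[doc_node][term_node] = weight
            graph[term_node][doc_node] = weight
    return dict(graph)


if __name__ == "__main__":
    g = build_graph({
        "d1": ["laser", "plasma", "laser"],
        "d2": ["plasma", "optics"],
    })
    for node, edges in g.items():
        print(node, edges)
```

   Typed node keys such as ("doc", "d1") and ("term", "laser") keep the two node types apart in a single
adjacency structure, which is one simple way to represent a heterogeneous graph.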
   One additional challenge in storing the user profile is that information needs change over time
(gradually or suddenly, because of diverging professional interests). The risk is that users will end up
searching their own filter bubble. Therefore, it is important to balance the exploitation of the user profile and
the exploration of new directions.

4.3   Methods for transparent personalization
Utilizing knowledge graphs built from sparse user data for effective information finding is probably the
most challenging research direction of the four.
   In pilot experiments we explored how the professional knowledge graph can be used for better ranking the
retrieved documents given a user query, using a two-stage retrieval method [45] (thus, implementing the
professional knowledge graph in the current classic query-based IR model). Given a user query, the first step was
retrieval of the 1000 most relevant documents according to the default ranking algorithm in the search engine.
The professional knowledge graph was then utilized to re-rank (2nd stage) the 1000 documents, resulting in a
personalized ranking. The goal in the re-ranking (personalization) step was to estimate the personal relevance
of the retrieved documents, based on the knowledge in the graph. We did this by temporarily adding each
candidate document to the user’s graph and computing its centrality. This is challenging in a heterogeneous
weighted graph with multiple types of nodes, edges and weights. We tackled this challenge by building on
methods for combining multiple node characteristics in one metric [33] and implementing them in a
learning-to-rank framework [23, 9]. We obtained a small but significant improvement over the non-personalized baseline.
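
   The re-ranking step can be illustrated with a minimal sketch. It is not the pilot implementation: instead
of a learning-to-rank model over several node characteristics, it scores each candidate with a single
measure, the weighted degree centrality of [33], k^(1-alpha) * s^alpha, where k is the number of neighbours
and s the sum of the edge weights. The function names, the input format and alpha = 0.5 are assumptions
made for this example.

```python
def weighted_degree(graph, node, alpha=0.5):
    """Weighted degree centrality: k^(1-alpha) * s^alpha, with k the number
    of neighbours and s the sum of edge weights of `node`."""
    neighbours = graph.get(node, {})
    k = len(neighbours)
    if k == 0:
        return 0.0
    s = sum(neighbours.values())
    return (k ** (1 - alpha)) * (s ** alpha)


def rerank(graph, candidates, alpha=0.5):
    """Re-rank first-stage results by their centrality in the user's graph.

    graph: dict node -> {neighbour: weight}, with term nodes keyed as
    ("term", text). candidates: list of (doc_id, {term: weight}) in
    first-stage order (hypothetical format). Ties keep the first-stage order.
    """
    scored = []
    for base_rank, (doc_id, term_weights) in enumerate(candidates):
        node = ("cand", doc_id)
        # Connect the candidate only to terms the user's graph already holds.
        edges = {("term", t): w for t, w in term_weights.items()
                 if ("term", t) in graph}
        graph[node] = edges          # temporarily add the candidate
        try:
            score = weighted_degree(graph, node, alpha)
        finally:
            del graph[node]          # restore the user's graph
        scored.append((-score, base_rank, doc_id))
    scored.sort()
    return [doc_id for _, _, doc_id in scored]


if __name__ == "__main__":
    user_graph = {
        ("term", "laser"): {("doc", "d1"): 0.9},
        ("term", "plasma"): {("doc", "d1"): 0.3},
        ("doc", "d1"): {("term", "laser"): 0.9, ("term", "plasma"): 0.3},
    }
    first_stage = [
        ("a", {"optics": 0.8}),
        ("b", {"laser": 0.5, "plasma": 0.2}),
        ("c", {"laser": 0.4}),
    ]
    print(rerank(user_graph, first_stage))
```

   Candidates that share no terms with the user's graph receive a score of zero and fall back to their
first-stage order, so the sketch never discards a retrieved document.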
   Future research with professional knowledge graphs should diverge from the classic IR model. Completely
new methods need to be developed for (1) browsing the professional knowledge graph, (2) assisting the user in
identifying their knowledge gap, (3) assessing the relevance of documents in the heterogeneous graph.

4.4     Evaluation protocol for transparent personalized search
The efficacy of the professional knowledge graph for personalized ranking could be evaluated in two ways: 1)
with a simulation using log data, and 2) with users.
   For the data-centric evaluation, historical user queries and relevance assessments can be deduced from click
data [19] to set up interaction simulations [46] in order to measure the effect of personalized ranking compared
to the original, non-personalized ranking of documents.
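
   A minimal sketch of such a log-based comparison is given below. It treats clicked documents as relevant
and compares two rankers by mean reciprocal rank; the choice of metric, the log format and the function
names are assumptions made for this example, and a full interaction simulation as in [46] would involve
more than this.

```python
def reciprocal_rank(ranking, clicked):
    """Return 1/position of the first clicked document, or 0.0 if none is ranked."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in clicked:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(log, rank_fn):
    """Average reciprocal rank over a click log.

    log: list of (query, clicked_doc_ids) pairs (hypothetical format).
    rank_fn: function mapping a query to a ranked list of document ids.
    """
    if not log:
        return 0.0
    total = sum(reciprocal_rank(rank_fn(query), set(clicked))
                for query, clicked in log)
    return total / len(log)


if __name__ == "__main__":
    click_log = [("q1", ["d2"]), ("q2", ["d5"])]
    # Toy rankers backed by fixed result lists (illustrative data only).
    baseline = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
    personalized = {"q1": ["d2", "d1", "d3"], "q2": ["d4", "d5"]}
    print("baseline    ", mean_reciprocal_rank(click_log, baseline.get))
    print("personalized", mean_reciprocal_rank(click_log, personalized.get))
```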
   For the user-centric evaluation, a demo interface needs to be developed in which the user can view his
professional knowledge graph and see the effect of the graph content on the document ranking. In a within-
subject setting, the classic view of the search engine (control setting) can then be compared with the personalized
search engine (experimental setting). Outcome measures should be: (1) how long the users take to fulfil the
information need [18], to be measured using server-side logging; and (2) user satisfaction, to be measured
using a post-task questionnaire [44]. In the questionnaire, it should be evaluated (a) how satisfied the users are
with the answer; (b) how satisfied the users are with the usability of the interactive viewer for the task and (c)
how satisfied the users are with the transparency of the tool.
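
   For the task-time measure, a within-subject comparison could be analysed along the following lines. The
sketch computes the mean paired difference and the paired t statistic for completion times in the two
settings. The choice of test, the function name and the example times are assumptions made for this
illustration; they are not part of the proposed protocol and the numbers are not results.

```python
import math
from statistics import mean, stdev


def paired_t(control, experimental):
    """Return (mean difference, t statistic) for paired measurements.

    control[i] and experimental[i] must belong to the same participant.
    The difference is control minus experimental, so a positive value
    means the experimental setting was faster.
    """
    if len(control) != len(experimental) or len(control) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [c - e for c, e in zip(control, experimental)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have no variation")
    mean_diff = mean(diffs)
    return mean_diff, mean_diff / (spread / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Illustrative task times in seconds for four participants.
    classic_view = [120, 150, 90, 200]
    graph_view = [100, 140, 95, 170]
    difference, t_value = paired_t(classic_view, graph_view)
    print(f"mean difference: {difference:.2f} s, t = {t_value:.2f}")
```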

5      Conclusions and recommendations
In this position paper we have established the need for transparency in personalized professional search. We
have provided a brief overview of prior work, identified the gaps, and listed four research directions that need to
be explored to close the gaps.
   In summary, we argue that:

    1. Data collection is instrumental for research in professional search. Data sets with user-generated input are
       scarce in the field, because user-centric research is time-consuming and target group users are not always
       available to provide input. Work is needed to collect a truly user-centric dataset that includes both
       information needs and search engine logs. The data should be made available to other researchers in the
       field.

    2. Knowledge graphs offer great potential for transparency and personalization in information search.
       Research is needed to develop methods for constructing individual professional knowledge graphs and evaluating
       those with expert users.

    3. There is a large body of academic work on personalization, but personalization in professional search engines
       is still limited, because transparency is essential for professional users. Research on professional search should
       include transparency by design. The IR community should bring together research on professional search,
       knowledge graphs, and explainable IR.

    4. For the effective exploitation of the professional knowledge graphs, new methods need to be developed for
       retrieval environments that are centred around the knowledge graphs.

References
 [1] Aggarwal, C.C., Zhao, P.: Towards graphical models for text processing. Knowledge and Information
     Systems 36(1) (2013) 1–21

 [2] Ai, Q., Zhang, Y., Bi, K., Chen, X., Croft, W.B.: Learning a hierarchical embedding model for personal-
     ized product search. In: Proceedings of the 40th International ACM SIGIR Conference on Research and
     Development in Information Retrieval, ACM (2017) 645–654




 [3] Alonso, O., Hearst, M.A., Kamps, J.: Report on the First SIGIR Workshop on Graph Search and Beyond
     (GSB’15). ACM SIGIR Forum 49(2) (2016) 89–97
 [4] Bawden, D., Robinson, L.: The dark side of information: Overload, anxiety and other paradoxes and
     pathologies. Journal of Information Science 35(2) (apr 2009) 180–191
 [5] Blanco, R., Lioma, C.: Graph-based term weighting for information retrieval. Information retrieval 15(1)
     (2012) 54–92
 [6] Chein, M., Mugnier, M.L.: Graph-based knowledge representation: computational foundations of conceptual
     graphs. Springer Science & Business Media (2008)
 [7] Chen, X., Zhang, Y., Ai, Q., Xu, H., Yan, J., Qin, Z.: Personalized key frame recommendation. In:
     Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information
     Retrieval, ACM (2017) 315–324
 [8] Culpepper, J.S., Diaz, F., Smucker, M.D. (eds.): Research frontiers in information retrieval: Report from the
     Third Strategic Workshop on Information Retrieval in Lorne (SWIRL 2018). Technical report (2018)
 [9] Dalton, J., Dietz, L., Allan, J.: Entity query feature expansion using knowledge base links. In: Proceedings
     of the 37th international ACM SIGIR conference on Research & development in information retrieval, ACM
     (2014) 365–374
[10] Daoud, M., Lechani, L.T., Boughanem, M.: Towards a graph-based user profile modeling for a session-based
     personalized search. Knowledge and Information Systems 21(3) (2009) 365–398
[11] Dervin, B.: From the mind’s eye of the user: The sense-making qualitative-quantitative methodology.
     Qualitative research in information management 9 (1992) 61–84
[12] Dietz, L., Meij, E., Bloomberg, L.P.: Overview of The First Workshop on Knowledge Graphs and Semantics
     for Text Retrieval and Analysis. CEUR Workshop Proceedings 51(3) (2017) 139–144
[13] Hannak, A., Sapiezynski, P., Molavi Kakhki, A., Krishnamurthy, B., Lazer, D., Mislove, A., Wilson, C.:
     Measuring personalization of web search. In: Proceedings of the 22nd International Conference on World
     Wide Web. WWW ’13, New York, NY, USA, ACM (2013) 527–538
[14] He, J., Bron, M.: Measuring Demonstrated Potential Domain Knowledge with Knowledge Graphs. In:
     Proceedings of The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis
     (KG4IR). (2017)
[15] He, J., Bron, M., de Vries, A.P.: Characterizing stages of a multi-session complex search task through
     direct and indirect query modifications. In: Proceedings of the 36th international ACM SIGIR conference
     on Research and development in information retrieval, ACM (2013) 897–900
[16] Holzinger, A., Biemann, C., Pattichis, C.S., Kell, D.B.: What do we need to build explainable AI systems
     for the medical domain? arXiv preprint arXiv:1712.09923 (2017)
[17] Huang, Z., Cautis, B., Cheng, R., Zheng, Y.: KB-Enabled Query Recommendation for Long-Tail Queries.
     In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management,
     ACM (2016) 2107–2112
[18] Ingwersen, P., Järvelin, K.: The Turn: Integration of Information Seeking and Retrieval in Context (The
     Information Retrieval Series). Volume 18. Springer (2005)
[19] Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as
     implicit feedback. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research
     and Development in Information Retrieval, New York, NY, USA, ACM (2005) 154–161
[20] Karwatzki, S., Dytynko, O., Trenz, M., Veit, D.: Beyond the Personalization-Privacy Paradox: Privacy
     Valuation, Transparency Features, and Service Personalization. Journal of Management Information Systems
     34(2) (apr 2017) 369–400




[21] Kim, Y., Seo, J., Croft, W.B.: Automatic boolean query suggestion for professional search. In: Proceedings
     of the 34th international ACM SIGIR conference on Research and development in Information Retrieval.
     SIGIR ’11, New York, NY, USA, ACM (2011) 825–834

[22] Leung, K.W.T., Ng, W., Lee, D.L.: Personalized concept-based clustering of search engine queries. IEEE
     transactions on knowledge and data engineering 20(11) (2008) 1505–1518

[23] Liu, T.Y.: Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval
     3(3) (2009) 225–331

[24] Lykke, M., Larsen, B., Lund, H., Ingwersen, P.: Developing a Test Collection for the Evaluation of Integrated
     Search. In Gurrin, C., He, Y., Kazai, G., Kruschwitz, U., Little, S., Roelleke, T., Rüger, S., van Rijsbergen,
     K., eds.: Advances in Information Retrieval. Volume 5993 of Lecture Notes in Computer Science. Springer
     Berlin / Heidelberg (2010) 627–630

[25] Mai, J.E.: Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior.
     Emerald Group Publishing (2016)

[26] Makri, S., Blandford, A., Cox, A.L.: Investigating the information-seeking behaviour of academic lawyers:
     From Ellis’s model to design. Information Processing & Management 44(2) (2008) 613–634

[27] Mason, D.: Legal Information Retrieval Study - Lexis Professional and Westlaw UK. Legal Information
     Management (2006)

[28] Micarelli, A., Gasparetti, F., Sciarrone, F., Gauch, S.: Personalized search on the world wide web. The
     Adaptive Web 4321 (2007) 195–230

[29] Micarelli, A., Sciarrone, F.: Anatomy and empirical evaluation of an adaptive web-based information
     filtering system. User Modeling and User-Adapted Interaction 14(2) (2004) 159–200

[30] Mihalcea, R., Radev, D.: Graph-based natural language processing and information retrieval. Cambridge
     University Press (2011)

[31] Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Text. In: EMNLP. Volume 4. (2004) 404–411

[32] Mittelstadt, B.: Auditing for Transparency in Content Personalization Systems. International Journal of
     Communication 10(June) (2016) 4991–5002

[33] Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: Generalizing degree and
     shortest paths. Social networks 32(3) (2010) 245–251

[34] Paul, C., Rettinger, A., Mogadala, A., Knoblock, C.A., Szekely, P.: Efficient graph-based document simi-
     larity. In: International Semantic Web Conference, Springer (2016) 334–349

[35] Sappelli, M.: Knowledge Work in Context. User Centered Knowledge Worker Support. PhD thesis, Radboud
     University Nijmegen (2016)

[36] Sappelli, M., Verberne, S., Kraaij, W.: Evaluation of context-aware recommendation systems for information
     re-finding. Journal of the Association for Information Science and Technology 68(4) (2017) 895–910

[37] Sarrafzadeh, B., Vechtomova, O., Jokic, V.: Exploring knowledge graphs for exploratory search. In:
     Proceedings of the 5th Information Interaction in Context Symposium on - IIiX ’14. (2014) 135–144

[38] Sarrafzadeh, B., Vtyurina, A., Lank, E., Vechtomova, O.: Knowledge Graphs versus Hierarchies. In:
     Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval - CHIIR ’16.
     (2016) 91–100

[39] Speretta, M., Gauch, S.: Personalized search based on user search histories. In: Web Intelligence, 2005.
     Proceedings. The 2005 IEEE/WIC/ACM International Conference on, IEEE (2005) 622–628




[40] Sun, Y., Li, H., Councill, I.G., Huang, J., Lee, W.C., Giles, C.L.: Personalized ranking for digital li-
     braries based on log analysis. In: Proceedings of the 10th ACM workshop on Web information and data
     management, ACM (2008) 133–140

[41] Tang, J., Yao, L., Zhang, D., Zhang, J.: A combination approach to web user profiling. ACM Transactions
     on Knowledge Discovery from Data (TKDD) 5(1) (2010) 2
[42] Tanudjaja, F., Mui, L.: Persona: A contextualized and personalized web search. In: System Sciences, 2002.
     HICSS. Proceedings of the 35th Annual Hawaii International Conference on, IEEE (2002) 1232–1240

[43] Teevan, J., Dumais, S.T., Horvitz, E.: Personalizing search via automated analysis of interests and activities.
     In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in
     information retrieval, ACM (2005) 449–456
[44] Verberne, S., Boves, L., van den Bosch, A.: Information access in the art history domain: Evaluating a
     federated search engine for Rembrandt research. Digital Humanities Quarterly 10(4) (2016)

[45] Verberne, S., Kraaij, W., de Vries, A.P.: Author-topic profiles for academic search. arXiv preprint
     arXiv:1804.11131 (2018)
[46] Verberne, S., Sappelli, M., Järvelin, K., Kraaij, W.: User simulations for interactive search:
     Evaluating personalized query suggestion. In: Advances in Information Retrieval. Volume 9022., Springer
     International Publishing (2015) 678–690
[47] Xu, Y., Wang, K., Zhang, B., Chen, Z.: Privacy-enhancing personalized web search. In: Proceedings of the
     16th international conference on World Wide Web, ACM (2007) 591–600
[48] Zhou, D., Lawless, S., Wade, V.: Improving search via personalized query expansion using social media.
     Information retrieval 15(3-4) (2012) 218–242



