Validating the importance of work tasks as context for professional search

Egon L. van den Broek, Thomas Schoegje

vandenbroek@acm.org

Freudenthal Institute, Utrecht University
Institute of Information and Computing Sciences, Utrecht University

In professional search many work tasks share some structure and are likely to recur. We argue that retrieval in the context of such tasks can exploit prior knowledge about these tasks to serve more relevant results. Understanding why someone searches can help to distill and express the user's information desires. In order to validate the value of this approach, we asked users to judge search results with and without a new search filter that narrowed down the results to documents relevant to the work tasks. Initial results suggest the importance of exploiting work tasks as context for professional search, although future work should consider the extent of this effect.

1 Introduction

Copyright © by the paper's authors. Copying permitted for private and academic purposes.

[...] often shape typical information desires, and that enabling users to express these desires more accurately will better support them in their tasks.

Section 2 opens by reviewing (authentic) work tasks as concepts. Afterwards, we introduce the work tasks investigated in their context at the municipality of Utrecht in Section 3. In Section 4 we describe an experiment to quantify the benefits of filtering documents based on their originating work tasks. Finally, we present our conclusions in Section 5 and expand on our plans for future work in Section 6.

2 Authentic work tasks

Work tasks are defined as concrete sections of time that include actions towards a goal, the task outcome (e.g. handling email) [6]. A work task may include multiple search tasks, which are sub-tasks that include one or more queries towards an information need. Although work tasks will rarely be identical, many of them share characteristic aspects. This is especially the case for professional search, as its work tasks are more recurring and more structurally defined than the information desires in typical web search.

Some work tasks are shared between users within a team or between users with similar roles in different teams. Others, such as reviewing recent information, are very common. The importance of one such ubiquitous work task, reviewing recent documents, has been quantified [2]. That study showed that the number of times any document was accessed in a professional setting followed a logarithmic function.

The actual search process itself also affects the user's information desires [1][7]. The context of one's stage in the information seeking process could also be used to refine results.

3 Investigating work tasks

The work tasks investigated are performed within the municipality of Utrecht in the Netherlands. This is an organization where diverse teams look up information with diverse goals during their work tasks. This diversity is ideal for investigating the types and nature of work tasks, as well as their effects on user interactions with the information systems.

Our experiment aims to show that work tasks can often shape typical information desires, and that embedding searches with this context information will better support users in their tasks. First, we identify typical information desires through exploratory interviews with a focus on the users' work tasks and the difficulties they have in completing these. We then provide them with a filter that lets them express their work task context in a search interface. This filter is based on meta-data that can be algorithmically annotated (i.e. the documents can be classified and filtered by class). In order to validate the perceived impact of the new interface, we perform an experiment including a dummy filter that does not function as intended but instead randomly filters out documents.

3.1 Analysis of work tasks

In our experiment we aim to support policy makers from diverse domains. Although their specific approach may vary per domain and individual, they experienced similar challenges in retrieving information. The first key challenge was retrieving specific documents that were stored without descriptive titles in a semi-structured archive. In this case users tended to resort to systems other than the intended interface to find the documents (in particular by checking email attachments and by asking colleagues). The second key challenge was sorting through a large volume of retrieved documents unrelated to the intended domain. The first challenge was addressed by allowing users to easily access recently viewed documents (as suggested in the literature [2]). The current focus is on the second challenge: allowing users to more accurately express their information desires in a way that reduces the number of unrelated results.

Based on work tasks, two promising document categorizations were identified through explorative interviews. The first was to consider the information needs at the various (global) stages of policy making. Here, different teams continue building on the same information as the documents produced get less explorative and increasingly specific. The second was to focus on the types of communication between these teams, as a policy maker is required to produce or search for specific types of documents. Whereas the first case groups the work tasks within a team, the second case groups on the communication tasks that policy makers from multiple teams might encounter. These two categorizations will be discussed in order.

The first categorization considers the global policy making process. There are three main steps involving different teams of policy makers: gathering information, forming policy proposals and adapting them. The forming of policy proposals is further separated by domain, with two significant ones shown in Table 1.

Table 1

Step
Information gathering
Debating (domain: city and space)
Debating (domain: man and society)
Adapting

The second categorization is based on the policy documents communicated between teams, which can generally be divided into five categories based on their purpose. They are the products of mutually exclusive work tasks, and their purpose is summarized in Table 2.

Our initial solution to reduce the number of irrelevant search results is to allow users to filter on such a categorization based on meta-data. For new documents this annotation was done manually: a new interface was introduced to aid employees in selecting one of five appropriate templates (corresponding to these classes). Although this ensures accurate annotation of new documents, users also want to search older documents. This can be approached by exploiting the file location and document title where possible, and using classification for the remaining cases. The algorithmic classification is outside the scope of this paper. In the experiment that follows, which tests the value of this work-task based approach, the classification was verified manually, so that classification errors do not introduce inaccuracies in the remainder of the paper.

4 Empirical study

An empirical study was set up to verify whether filters in the search interface help the user to better express the information desire by expressing the context of the work tasks. Users were asked to judge how well a set of results fulfilled a specified information desire, both with and without the filter. In order to test for a placebo effect of the new filter (which was created as they hypothesized it would improve performance), we also introduce a second, placebo filter which filters on a random subtype.

4.1 Materials

Based on the categorization of the policy making process (see Table 1), there are 4 document types. For each of these, 2 search queries were formulated by a user familiar with the system, resulting in 8 queries. They were chosen to represent authentic search tasks. The actual query used is hidden, and the user is instead presented with a search question that represents the underlying information desire. Although such verbose questions yield poor results when used as the query [5], these authentic search tasks reflect examples where users were interested in filtering on the categories. Using each of the 8 queries, results were retrieved under the following three conditions:

1. TextSearch: a full-text search using the queries.

2. FilterSearch: the same search, but filtering out all results of the incorrect document types.

3. PlaceboSearch: appears identical to FilterSearch, but instead of filtering on the desired (and indicated) document type it filters on another document type.
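The three conditions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system used in the study: the `Document` class, the naive term-matching search and the function names are assumptions introduced here, and the document types are taken from Table 1.

```python
import random
from dataclasses import dataclass

# Document types from Table 1 (labels shortened for illustration).
DOC_TYPES = [
    "information gathering",
    "debating (city and space)",
    "debating (man and society)",
    "adapting",
]


@dataclass(frozen=True)
class Document:
    """Hypothetical document record with a work-task category as meta-data."""
    title: str
    text: str
    doc_type: str


def text_search(docs, query):
    """TextSearch: keep documents whose text contains every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.text.lower() for t in terms)]


def filter_search(docs, query, doc_type):
    """FilterSearch: the same search, restricted to the desired document type."""
    return [d for d in text_search(docs, query) if d.doc_type == doc_type]


def placebo_search(docs, query, doc_type, rng):
    """PlaceboSearch: filter on a randomly chosen *other* document type."""
    other_types = [t for t in DOC_TYPES if t != doc_type]
    return filter_search(docs, query, rng.choice(other_types))
```

Passing a seeded `random.Random` instance as `rng` keeps the placebo condition reproducible across runs.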

This document categorization was chosen over the alternative introduced earlier because, with that alternative, the placebo filter would be more obvious (the user would recognize that a memo is not actually a letter). The current PlaceboSearch displays documents that were written for a different purpose or domain, but on the same topic.

Five female and four male participants were asked to judge the results for the queries. They were presented with a static search interface such as the one shown in Figure 1. Document type is indicated using the color and title. Users were asked to highlight any relevant results by clicking on them (similar to previous work in gathering subjective opinions [8]). Then they were asked to indicate an overall rating of the results on a (Likert) scale of 1 to 5. They could do so at their own tempo, before proceeding to the next set of results. In order to avoid the possibility of users forgetting to answer one of these questions, users were asked to confirm when they found no useful results. The time it took to answer each question (in seconds) was also stored, and the experiment took around 30 minutes.

After two practice screens, a total of 24 sets of results were presented. The order was randomized by shuffling the order in which the filter types were presented, and then presenting the 8 questions per filter type in a random order per user. This was done so the user could get accustomed to the different filters.
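The randomization described above (filter types in a shuffled order, then the 8 questions in a random order within each filter type) could be generated per participant roughly as follows. This is a minimal sketch under the stated reading of the procedure; the function name and the use of integer question identifiers are assumptions made for illustration.

```python
import random

FILTER_TYPES = ["TextSearch", "FilterSearch", "PlaceboSearch"]
QUESTIONS = list(range(1, 9))  # the 8 search questions, numbered 1..8


def presentation_order(rng):
    """Return the 24 (filter type, question) pairs in presentation order.

    Filter types are shuffled first; within each filter type the
    8 questions are then presented in an independently shuffled order.
    """
    filters = FILTER_TYPES[:]
    rng.shuffle(filters)
    order = []
    for filter_type in filters:
        questions = QUESTIONS[:]
        rng.shuffle(questions)
        order.extend((filter_type, q) for q in questions)
    return order
```

Calling `presentation_order(random.Random(participant_id))` with a hypothetical participant identifier as seed would give each user their own reproducible order.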

4.3 Results

Table 3 shows basic descriptors of how users evaluated the various search engines. That the proper filter outperforms the others in every aspect encourages further study, although further statistical analysis is required. An ANOVA was performed using the mean Likert scores for each set of results shown (8 questions × 3 filters), with the search engine as the independent variable. The result narrowly missed statistical significance at the .05 level (F(2, 27) = 3.453, p = .0505), although the effect size was fairly large (η² = 0.247). In combination with the confidence intervals, this suggests that the p value would likely decrease below .05 if more participants were added to the study.
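For a one-way ANOVA, the effect size η² can be recovered from the F statistic and its degrees of freedom as η² = (F · df_between) / (F · df_between + df_within). The sketch below applies this standard identity to the reported F value. The within-groups degrees of freedom of 21 is an assumption made here (24 mean scores in 3 groups gives 24 − 3 = 21); it reproduces the reported η² of 0.247, whereas the second degrees-of-freedom value of 27 as printed gives about 0.204. Which value was intended cannot be determined from the text.

```python
def eta_squared(f_value, df_between, df_within):
    """Effect size for a one-way ANOVA, computed from F and its df."""
    explained = f_value * df_between
    return explained / (explained + df_within)


# Assumption: 8 questions x 3 filters = 24 means, so df_within = 24 - 3 = 21.
print(round(eta_squared(3.453, 2, 21), 3))
# Using the degrees of freedom as printed in the text.
print(round(eta_squared(3.453, 2, 27), 3))
```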

The discrepancy where the TextSearch condition has a large number of results selected but a relatively low Likert rating is likely because this version tended to show the same file preview multiple times (from different sources). The reduced decision time might be related to presenting results that are obviously relevant, but is also influenced by the TextSearch condition including more documents with lengthy previews.

5 Conclusion

We noted the important role that the user's (work) tasks could play in helping users. Users perform queries within the context of a work task, and this context can be used to better understand their information needs. This is especially the case for professional search, as many work tasks have a more structured and recurring nature than is often the case in web search. Results suggest the importance of work tasks as the context for professional search, although future work should consider the extent of this effect.

Acknowledgements

The authors gratefully acknowledge the participants for their time and Arjen de Vries for his comments on the present work.

[1] Clarkson, K.L.: Supporting the complex dynamics of the information seeking process. PhD thesis, Radboud University, Nijmegen, Netherlands (2018)

[2] Dumais, S., Cutrell, E., Cadiz, J.J., Jancke, G., Sarin, R., Robbins, D.C.: Stuff I've seen: a system for personal information retrieval and re-use. In: ACM SIGIR Forum. Volume 49, ACM (2016) 28–35

[3] Hoenkamp, E.C.: About the 'compromised information need' and optimal interaction as quality measure for search interfaces. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM (2015) 835–838

[4] Kim, Y., Seo, J., Croft, W.B.: Automatic boolean query suggestion for professional search. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM (2011) 825–834

[5] Koster, C.H., Seibert, O., Seutter, M.: The phasar search engine. In: International Conference on Application of Natural Language to Information Systems, Springer (2006) 141–152

[6] Saastamoinen, M., Järvelin, K.: Queries in authentic work tasks: the effects of task type and complexity. Journal of Documentation 72(6) (2016) 1114–1133

[7] Spink, A.: Multiple search sessions model of end-user behavior: An exploratory study. Journal of the American Society for Information Science 47(8) (1996) 603–609