
S.de (M. Rokicki); ralph.ewerth@tib.eu (R. Ewerth); stefan.dietze@gesis.org (S. Dietze)

Domain-Specific Modeling of User Knowledge in Informational Search Sessions

Rui Tang

Ran Yu

Markus Rokicki

Ralph Ewerth

Stefan Dietze

0 Data Science & Intelligent Systems Group, University of Bonn, Germany
1 GESIS - Leibniz Institute for the Social Sciences, Germany
2 Heinrich Heine University Dusseldorf, Germany
3 L3S Research Center, Leibniz University Hannover, Germany
4 Ping An Technology, China

2021


Users frequently search on the Web to fulfill information needs with learning intent. In this context, the usefulness of the search results depends strongly on the knowledge state of the user. In order to satisfy learning needs effectively, it is necessary to take users' knowledge gain and knowledge state within learning-oriented Web search sessions into account. Previous works studied the use of supervised models to predict a user's knowledge gain and knowledge state. However, the impact of the knowledge domains of the search topics on a user's learning process has not been adequately explored. In this paper, we suggest domain detection techniques for search sessions and build domain-specific knowledge prediction models accordingly. Experimental evaluation results demonstrate that our approach outperforms the state-of-the-art baseline.

Keywords: search as learning, knowledge gain, informational search

1. Introduction

Users frequently surf the Web to search for a variety of information and to satisfy a wide range of information needs. Web search sessions are commonly categorized into three classes: navigational, informational and transactional [1]. Informational search sessions involve an inherent learning intent, i.e. the desire of a user to acquire knowledge or information with respect to a particular topic, assumed to be present on one or more Web pages. In this context, the individual relevance of search results is strongly dependent on the current knowledge state of the corresponding user.

The importance of learning scopes has been recognized by recent work at the intersection of information retrieval and learning theory. Eickhoff et al. [2] investigated the relationship between query and Web search session-related metrics and learning progress. Collins-Thompson et al. [3] studied the effectiveness of user interaction with respect to certain learning outcomes. The correlation between Web search behaviors and a user's learning gain has been explored by prior work [4], while the importance of learning as an implicit element of Web search has been established. Using various features computed based on user interactions and Web resource content, Yu et al. [5, 6] proposed approaches and built models for the prediction of a user's knowledge gain (KG) and knowledge state (KS). Their work demonstrates that knowledge gain and state of users can be predicted from their behaviors in Web search sessions.

Through more in-depth analysis of the relation between user knowledge state and various features based on user study data published by [5], we observed that correlations between features and knowledge gain/state differ across the knowledge domains of Web search sessions. For example, the correlation between the ratio of words related to the concept of health in user browsed webpages and knowledge gain/state is stronger for search sessions on topics in the health domain than for sessions on topics in the history domain. Similar observations have been reported by Yu et al. in [6], where they proposed a new feature selection method to remove domain dependent features and thereby improve the topic generalizability of the knowledge prediction models. However, we argue that, instead of eliminating such features, we could use them to build fine-grained domain-specific models.

In this paper, we detect the most relevant domain of a search session based on textual information extracted from queries and webpages accessed by the user. We then carry out feature selection and build prediction models for each domain. Experimental results demonstrate that our proposed model outperforms the state-of-the-art baseline.

2. Related Works

Many studies have been carried out for understanding the relationship between learning progress and observable features in a search session. By matching the learning tasks into different learning stages of Anderson and Krathwohl's taxonomy [7], Jansen et al. studied the correlation between search behaviors of 72 participants and their learning stage [8]. They showed that information searching is a learning process with unique searching characteristics corresponding to particular learning levels. Cole et al. [9] observed that behavioral patterns provide reliable indicators about the domain knowledge of a user, even if the actual content or topics of queries and documents are disregarded entirely. Collins-Thompson et al. [3] studied the influence of distinct query types on knowledge gain, finding that intrinsically diverse queries lead to increased knowledge gain. Moraes et al.'s [10] work compared the learning outcome of instructor designed learning videos against three instances of search ("single-user", "search as support tool", "collaborative search") in order to find the most efficient approach for their learning scenario. Vakkari [11] provided a structured survey of features indicating learning needs as well as user knowledge and knowledge gain throughout the search process. Gadiraju et al. [4] described the use of knowledge tests to calibrate the knowledge of users before and after their search sessions, quantifying their knowledge gain, and investigated the impact of search intent and search behavior on knowledge gain of users. Bhattacharya et al. [12] investigated the relationship between users' search and eye gaze behaviors and their learning performance. In a recent work, Roy et al. [13] investigated at which time during a search session learning occurred, and found that the learning curve is largely influenced by a user's prior knowledge on the searched topic. Kalyani et al. [14] explored this direction further by designing search tasks that fit into the different learning stages of the revised Bloom's taxonomy. Through knowledge tests before and after each search session, they found significant impact of the learning stage on a user's search behavior and knowledge gain.

For predicting a user's knowledge state or change in a search session, Zhang et al. [15] explored using search behavior as an indicator for the domain knowledge level of a user. Through a small study ($n = 35$), they identified features such as the average query length or the rank of documents consumed from the search results as being predictive. Syed and Collins-Thompson [16] explored the possibility of using regression models and features extracted from user accessed document content to predict user knowledge change on vocabulary learning tasks [17]. Gwizdka et al. [18] proposed to assess learning outcomes in search environments by correlating individual search behaviors with corresponding eye-tracking measures. Yu et al. [5, 6] proposed to use features based on user interactions and Web resource content to build classification models to predict user knowledge state and knowledge gain in search sessions. Liu et al. [19] adopted mind maps to capture users' knowledge change process and thereby identified four types of knowledge change styles.

Although previous works have studied the relation between various features and user knowledge state, and knowledge prediction models have been proposed, the impact of the knowledge domain on the effectiveness of features has not been explored. In this paper, we propose a novel approach for predicting user knowledge state and knowledge gain in informational search sessions by taking the knowledge domain into consideration.

3. Task Description & Approach Overview

As defined in [5], an intentional learning-related search session comprises the sequence of a user's actions with respect to satisfying her learning intent in a Web search environment through informational queries. A user's sequence of actions begins with an initial Web query and includes browsing through the search results, click and scroll activity, navigation via hyperlinks, query reformulations, and so forth. We refer to such an intentional learning-related search session as "session" in the remainder of this paper for simplicity.

Let $s$ be a search session starting at time $t_i$ and ending at time $t_j$, aimed at satisfying a particular information need, that is, a learning intent of user $u$. In this work, we study the knowledge indicators (KIs): pre-knowledge state (pre-KS) $k(t_i)$, post-knowledge state (post-KS) $k(t_j)$ and knowledge gain (KG) $\Delta k(t_i, t_j)$ during time period $[t_i, t_j]$. This work aims at building domain-specific models (with respect to users' learning intents) to predict the KIs.

Figure 1 gives an overview of the approach we propose for building domain-specific KI prediction models. Given a session, we first extract textual information from different fields (e.g. query terms, webpage contents, etc.) and use it to detect the relevant domain of the session. After domain detection, the sessions are assigned to their most relevant domains. In the next step, we conduct the feature selection and knowledge modeling process using sessions assigned to each domain. More specifically, we compute Web resource features and user behavior features of each session, and then select a subset of these features based on two feature selection strategies. Using the selected features, we build prediction models for each domain. The process labeled in blue in Figure 1 shows an example of the data flow when predicting KI for a new session using the trained models.

We summarize the three main tasks of building domain-specific KI prediction models as follows:

1. Domain detection of informational search sessions. Each session can be associated with one or more domains to a different extent. For modeling purposes, we assign each session to the single domain with which it has the strongest association, based on the textual information involved in the session. As each session contains textual information in multiple fields, it is also our task to find the most suitable fields to be used for the domain detection.
2. Feature extraction and domain-specific feature selection. In this step, we first extract a set of features for each session from the user behaviors and the related Web resource contents. For the sessions assigned to a specific domain, we select features reflecting the users' knowledge gain and state.
3. Domain-specific knowledge modeling. We formulate the prediction of knowledge state/gain as classification tasks, i.e. we aim to classify a specific KI (e.g. knowledge gain) of the user corresponding to a search session into low, moderate and high classes, with respect to a particular information need. That is, for each domain, we conduct feature selection and train classifiers to build prediction models.

4. Dataset

To address the aforementioned tasks, we adopt an existing dataset which has been used by previous works on understanding and predicting user knowledge state and gain [4, 5]. This dataset includes search sessions conducted by crowd workers spanning across 11 information needs for different topics randomly selected from the TREC 2014 Web Track¹ dataset. It includes knowledge assessment data collected before and after each of the search sessions per information need; the authors also crawled the webpages that were accessed by the users. The experimental setup for obtaining the data and KIs was described by the authors in [6].

Data Cleaning. We filtered out untrustworthy workers who meet any of the following conditions: 1) did not complete the post-session test, 2) did not issue at least 1 search query, 3) selected the same option (either 'YES', 'NO' or [...]) for all items in the calibration test or the post-session test. In the next step, we filter out sessions that are insufficient for computing the features we need for building knowledge prediction models, that is: 1) sessions with no click on any results on the SERPs, and 2) sessions that contain at least 1 non-English resource browsed by the user. After applying all the aforementioned filters, we retain 233 search sessions, with 1.361 queries and 2.622 clicks per session on average.

Knowledge Measures. Knowledge tests are scientifically formulated tests that measure the knowledge of a participant on a given topic. The authors of [4] created knowledge tests pertaining to each of the information needs. The pre (post)-knowledge score of a user in search sessions corresponding to a topic is measured as the percentage of the correct answers on the knowledge test that a given user has completed. Correspondingly, the knowledge gain is measured as the difference between the user's pre- and post-search session knowledge score.

For the classification tasks described in Section 3, we follow the same approach as used in [5], i.e. a Standard Deviation Classification approach to obtain three classes of learners with regard to their level of pre-KS. Assuming approximately normal distributions of the respective test scores (X) for the different topics, we transformed the test scores into Z-scores with a mean of 0 and a Standard Deviation (SD) of 1 (standardization). We then used statistically defined intervals (low: X < -0.5 SD; moderate: -0.5 SD < X < 0.5 SD; high: 0.5 SD < X) for the classification of the learners into roughly equal groups with low, moderate, or high pre-KS. The same procedure was repeated for post-KS and KG. Table 1 shows the resulting numbers of learners for the respective classes and underlying statistics.

5. Domain Detection

The goal of this step is to assign each informational search session to its most relevant domain. More specifically, we extract textual information from queries and consumed Web resources of a session and apply two text classifiers on them to detect its domain.

5.1. Methods for Domain Detection

Domain detection in this paper is formulated as a text classification problem ("to which predefined class or category is this text most likely to belong?"²). This work aims at exploring the possibility of improving prediction performance by building more focused models, rather than developing novel domain detection techniques. We therefore utilize two existing domain detection tools, namely TagTheWeb and uClassify.

TagTheWeb [20] can automatically categorize a given text into Wikipedia categories with a probability. The category with the highest probability is considered to be the most relevant domain. The 19 top-level Wikipedia categories adopted by TagTheWeb are: arts, culture, games, geography, health, history, humanities, industry, law, life, mathematics, matter, nature, people, philosophy, reference works, religion, science and technology and society. TagTheWeb could also classify text into Wikipedia sub-categories. However, in this work, we focus only on the 19 top-level categories, as this granularity fits better into the task scenario and the size of the experimental dataset.

uClassify³ is a free machine learning Web service that provides classifiers for different applications. A classifier called Topics from uClassify can classify a given textual document into 10 different top-level domains. Each domain has a score of probability, and the domain with the highest probability is considered as the most relevant domain. The 10 top-level domains we adopted in this work are arts, business, computers, games, health, home, recreation, science, society and sports. The classes are adopted from the Open Directory Project⁴.

5.2. Textual Information Extraction

During a session, a user enters query terms to communicate her information need on a topic related to certain domains. We extract and combine all query terms in a session and use them for domain detection. Titles of visited Web pages can be an indicator of the domain that a user chooses to learn about in a session. Therefore, we combined the titles of all the visited Web pages as the second source of textual information. Besides titles, we also analyze their content by combining all textual content of visited Web pages in a session. This results in three types of textual information: query words (QW), titles of the visited Web pages (WPT) and textual contents of the visited Web pages (WPC). Moreover, we consider all four combinations of these sources (as listed in Table 2) and a majority vote strategy based on the results of using the three textual sources respectively (all MV). For the all MV strategy, when all three votes are different from each other, we assign the session to the domain other.

Table 2
Domain detection configurations and abbreviations based on extracted textual information.

Abbreviation      Description
QW                Query words
WPT               Web page titles
WPC               Web page contents
QW & WPT          Query words and Web page titles
QW & WPC          Query words and Web page contents
WPT & WPC         Web page titles and Web page contents
QW & WPT & WPC    Query words, Web page titles, and Web page contents
all MV            Majority vote based on QW, WPT and WPC result

5.3. Evaluation

We apply both text classification tools to all 8 configurations (Table 2). In this section, we present the evaluation results of domain detection and, based on them, choose the configuration on which the next step relies.
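Among these configurations, the all MV strategy is the only one that combines several classifier outputs. The following is a minimal sketch of that voting rule, assuming each of the three textual sources (QW, WPT, WPC) has already been mapped to a single domain label. The function name `majority_vote` and the fallback label string `"other"` are illustrative choices, not names used by TagTheWeb or uClassify.

```python
from collections import Counter


def majority_vote(qw_label, wpt_label, wpc_label, fallback="other"):
    """Combine three per-source domain predictions into one session label.

    Hypothetical helper: the label arguments are the domains predicted from
    query words, Web page titles and Web page contents respectively.
    """
    counts = Counter([qw_label, wpt_label, wpc_label])
    label, votes = counts.most_common(1)[0]
    # With three voters a majority needs at least two identical votes;
    # three distinct votes fall back to the catch-all domain.
    return label if votes >= 2 else fallback


print(majority_vote("health", "health", "nature"))
print(majority_vote("history", "people", "nature"))
```

With three voters there is never a two-way tie, so the only ambiguous case is three distinct labels, which is the case the fallback handles.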

Ground Truth. Two authors of this paper manually assigned labels to the sessions according to the corresponding topics that were presented to the crowd workers when creating the dataset. As sessions corresponding to the same topic could have different domain focus, we decided to allow multiple correct domain labels when building the ground truth. Consequently, in the following evaluation, a domain classification outcome was treated as correct if the predicted domain was among the assigned labels. The description of the pre-defined search topics and the domain labels assigned to them are shown in Table 3. The annotators agreed on all labels.

² https://www.uclassify.com/docs/intro
³ https://www.uclassify.com/browse/uclassify/topics
⁴ http://www.dmoz.org

Evaluation Results. For each of the 16 configurations (2 classifiers × 8 textual information combinations), we compute the overall accuracy of the classification result. Based on the results shown in Table 4, we found that all accuracy scores are above 0.550 for TagTheWeb.

The best performance of TagTheWeb is achieved when combining query words and Web resource titles (QW & WPT), as well as when combining all three fields (QW & WPT & WPC). In both cases, 174 of 233 sessions are detected correctly (accuracy = 0.747). We choose the configuration QW & WPT for later steps, as it is more efficient than QW & WPT & WPC. Meanwhile, all accuracy scores of uClassify are below 0.25. Therefore, we decided not to pass the results of uClassify to later steps.
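Because the ground truth allows several correct domains per session, the accuracy reported here is not a plain exact-match score. A minimal sketch of how such a score could be computed is given below. The function name `detection_accuracy`, the dictionary layout and the toy sessions are assumptions made for illustration and are not taken from the dataset.

```python
def detection_accuracy(predicted, allowed):
    """Share of sessions whose predicted domain is among the accepted labels.

    predicted: dict mapping session id -> predicted domain (one label)
    allowed:   dict mapping session id -> set of ground-truth domains
    """
    if not predicted:
        raise ValueError("no sessions to evaluate")
    hits = sum(
        1 for sid, domain in predicted.items()
        if domain in allowed.get(sid, set())
    )
    return hits / len(predicted)


# Toy example with made-up sessions: two of three predictions are accepted.
predicted = {"s1": "history", "s2": "health", "s3": "society"}
allowed = {"s1": {"history", "people"}, "s2": {"health"}, "s3": {"nature"}}
print(round(detection_accuracy(predicted, allowed), 3))

# The figure reported in the text: 174 correct out of 233 sessions.
print(round(174 / 233, 3))
```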

To better illustrate the domain detection result, we present a heatmap in Figure 2 showing the assignment of sessions corresponding to each topic to the target domains by TagTheWeb using QW & WPT. We found that 81.5% of sessions in our ground truth (GT) are assigned to 5 domains, namely, history (56 sessions), health (49 sessions), nature (32 sessions), geography (29 sessions) and people (24 sessions). As the next modeling steps require a sufficient amount of training data in order to build reliable models, we continue the experiment with the 190 sessions categorized into these 5 most frequent domains, and discard the remaining 43 sessions, which are categorized into society (15 sessions), humanities (10 sessions), philosophy (10 sessions), culture (5 sessions), life (2 sessions) or science and technology (1 session).

6. Modeling User Knowledge

6.1. Approach

6.1.1. Model

As described in Section 3, we follow the same approach as in [5, 6] and cast the problem of predicting user KIs as classification tasks. More specifically, each session is represented as a feature vector $\vec{f} = (f_1, f_2, \ldots, f_n)$, where the features considered are introduced later in this section. We apply a range of standard classification models, namely, Naive Bayes (nb), Logistic Regression (lr), Support Vector Machine (svm) and Random Forest (rf). For our experiments, we used the scikit-learn library for Python⁵. We tune hyperparameters of the algorithms using grid search.
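The setup described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the configuration used in the experiments: the parameter grids, the choice of `GaussianNB` as the Naive Bayes variant, macro F1 as the tuning criterion, and the synthetic data are all placeholders, since none of these details are given here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Hypothetical search spaces; the actual grids are not reported in the text.
CANDIDATES = {
    "nb": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "lr": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "svm": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    "rf": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}


def tune_models(X, y, n_splits=10):
    """Grid-search each classifier with stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, scoring="f1_macro", cv=cv)
        search.fit(X, y)
        results[name] = (search.best_score_, search.best_params_)
    return results


# Synthetic stand-in for the sessions of one domain (not the real data):
# 90 sessions, 5 features, 3 balanced classes (low / moderate / high).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[:, 0] += y  # make one feature informative about the class

results = tune_models(X, y)
for name, (score, params) in results.items():
    print(name, round(score, 3), params)
```

In a domain-specific setting, `tune_models` would be called once per domain on the sessions assigned to it, after feature selection.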

6.1.2. Feature Extraction

As the focus of this work is to explore the performance of domain-specific knowledge prediction models, we make use of the same set of features as described in [6]. The features consist of two categories according to the data source: Web resource features and user behavior features. The 109 Web resource features are extracted based on the content of the webpages which users visited during a session, including features computed based on document complexity (e.g. average number of words per sentence, Gunning Fog Grade⁶), HTML structure (e.g. number of <script> elements) and linguistic characteristics (based on the 2015 LIWC dictionaries⁷) of the Web resource content. The 66 user behavior features are extracted from the user interaction with the search engine during a session, namely features related to the session (e.g. session duration), queries (e.g. average query length), SERP (e.g. the lowest rank of click), browsing behavior (e.g. ratio of revisited pages) and mouse movements (e.g. total scroll distance). As the features have been introduced and investigated in detail by previous work [6], we do not describe them further in this paper.

6.1.3. Metrics for Feature Selection

Due to the difficulty in obtaining ground truth data with user knowledge assessment, the scale of training and testing data is limited. Hence, feature selection is important for building reliable models, and in particular, to avoid overfitting. For sessions assigned to each domain, our goal is to select a set of features $F' \subseteq F$ that produces the most reliable model for the prediction tasks. We introduce 2 metrics that are adapted from previous work [5].

Ensure feature effectiveness. We compute the Pearson correlation coefficient between each feature $f$ and the KI, i.e. $r(f, KI)$, across all sessions in a specific domain. To ensure effectiveness of features, we select features fulfilling the condition $|r(f, KI)| \geq \alpha$ for building the classification models.

Reduce feature redundancy. We also compute the Pearson correlation coefficient $r(f_i, f_j)$ between each pair of features across all sessions in a specific domain. If $|r(f_i, f_j)| \geq \beta$, i.e. the features are highly similar to each other, we remove the one which has a lower $r(f, KI)$ from the pair.

6.2. Evaluation

The generation of class labels of the sessions in our experimental dataset is described in Section 4. We evaluate model performances by means of 10-fold cross-validation. Further, classification performance is measured in terms of the following metrics:

• Accuracy (Accu): percentage of search sessions that were classified with the correct class label.
• Precision (P), Recall (R), F1 (F1) score of class i: the standard precision, recall and F1 score on the prediction result of each class i.
• Macro average of P, R and F1: the average of the corresponding score across 3 classes.

Baselines. We compare our approach against [5], who proposed to build classifiers to predict KG and post-KS using user interaction and session features only. Their approach considered feature selection based on the feature-KI-correlation ($\alpha$) and the between-feature-correlation ($\beta$). Using their approach, we make use of all the 190 sessions which are relevant to the aforementioned 5 domains (history, health, nature, geography and people) to build classifiers for the knowledge prediction tasks. We also compare our approach against an improved baseline (denoted as baseline') for which we use these 190 sessions to build non-domain-specific classifiers using both user interaction features and Web resource features. In the experiment, we tuned the hyper-parameters of these models again using grid search to ensure a fair comparison.

6.2.1. Overall Performance

Using our approach, the overall accuracy scores are above 0.610 for all 3 prediction tasks and the overall average F1 scores are above 0.609 (see Table 5). Compared to the state-of-the-art baseline (baseline), we observed improvements for all 3 prediction tasks, namely 18.1%, 13.6% and 17.1% (average F1 score) as well as 16.3%, 12.2% and 15.8% (accuracy score) for the pre-KS, post-KS and KG prediction tasks respectively.

Our approach and baseline' make use of the same feature set, which includes user behavior features and Web resource features. Our models outperform baseline' by 14.5%, 10.4% and 12.7% (average F1 score) as well as 12.1%, 9.5% and 12.1% (accuracy score) in the tasks of pre-KS, post-KS and KG prediction respectively. This demonstrates that our domain-specific knowledge modeling approach [...]

⁵ http://scikit-learn.org
⁶ http://gunning-fog-index.com/
⁷ http://liwc.wpengine.com/
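The two feature selection criteria of Section 6.1.3 (effectiveness and redundancy) can be sketched with NumPy as below. Several details are assumptions made for this illustration: the function name `select_features`, the threshold parameter names, the greedy order in which redundant pairs are resolved, and the use of the absolute correlation with the KI to decide which feature of a redundant pair is dropped.

```python
import numpy as np


def select_features(X, y, min_ki_corr, max_pair_corr):
    """Return sorted column indices kept by the two correlation filters.

    X: (n_sessions, n_features) feature matrix for one domain
    y: (n_sessions,) numeric KI values for the same sessions
    min_ki_corr:   threshold on |r(feature, KI)| (effectiveness)
    max_pair_corr: threshold on |r(feature_i, feature_j)| (redundancy)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]

    # Pearson correlation of every feature column with the KI.
    r_ki = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    r_ki = np.nan_to_num(r_ki)  # constant columns yield nan; treat as 0

    # Effectiveness filter: keep features sufficiently correlated with the KI.
    candidates = [j for j in range(n_features) if abs(r_ki[j]) >= min_ki_corr]

    # Redundancy filter: visit the strongest features first, so that in a
    # highly correlated pair the feature with the weaker KI correlation
    # is the one that gets dropped.
    candidates.sort(key=lambda j: -abs(r_ki[j]))
    kept = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= max_pair_corr
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return sorted(kept)


# Synthetic illustration (not the study data).
rng = np.random.default_rng(0)
y = rng.normal(size=200)
f0 = y + 0.1 * rng.normal(size=200)    # strongly related to the KI
f1 = f0 + 0.01 * rng.normal(size=200)  # near-duplicate of f0
f2 = rng.normal(size=200)              # unrelated noise
X = np.column_stack([f0, f1, f2])
print(select_features(X, y, min_ki_corr=0.3, max_pair_corr=0.9))
```

In this toy example the noise column fails the effectiveness filter, and only one of the two near-duplicate columns survives the redundancy filter.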

[...] effectiveness and redundancy results in markedly improved performance. On the other hand, the overall restrictive settings, which result in only two features being used for KG prediction, highlight further room for improvement. While these general models worked best in our experiments, refining the domain detection step (e.g. using a more fine-grained taxonomy) could result in more coherent sets of training data, allowing for the use of more (specific) features.

7. Conclusions

In this paper, we investigated the influence of the domain on learning-oriented informational Web search sessions, and proposed to improve the performance of knowledge prediction models by extending them to several domain-specific models. We evaluated two text classifiers, i.e. TagTheWeb and uClassify, using 8 types of textual information respectively to categorize a session into its most relevant domain. We observed the best domain detection accuracy when using TagTheWeb based on query words and web page titles. Based on this, we built domain-specific models for knowledge prediction tasks. In our experiments, the approach outperformed the state-of-the-art baseline by at least 12.2% in terms of accuracy and at least 13.6% in terms of F1-Score. Thus, our work contributes to the understanding and prediction of user knowledge in learning-oriented informational Web search sessions.

Due to the limited availability of Web search session data as well as the corresponding user knowledge assessment data, there are limitations in our current experimental dataset. Therefore, observations made herein should be validated on a large scale dataset in future work. Further, in the domain detection step, only top level categories (domains) of the taxonomies were used when applying TagTheWeb. Given sufficient data, accuracy could be improved by adopting more subcategories, i.e. more specific domains. Moreover, other than the two exemplary solutions investigated in this work, other domain detection techniques could be applied as well.

Acknowledgments

Part of this work is supported by the Leibniz Association, Germany (Leibniz Competition 2018, funding line "Collaborative Excellence", project SALIENT [K68/2017]).

References

[1] A. Z. Broder, A taxonomy of web search, SIGIR Forum 36 (2002) 3–10. URL: https://doi.org/10.1145/792550.792552. doi:10.1145/792550.792552.

[2] Eickhoff, Teevan, White, Dumais, Lessons from the journey: a query log analysis of within-session learning, in: Proceedings of the 7th ACM international conference on Web search and data mining, ACM, 2014, pp. 223–232.

[3] Collins-Thompson, S. Y. Rieh, C. C. Haynes, Syed, Assessing learning outcomes in web search: A comparison of tasks and query strategies, in: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, ACM, 2016, pp. 163–172.

[4] Gadiraju, Yu, Dietze, Holtz, Analyzing knowledge gain of users in informational search sessions on the web, in: 2018 ACM on Conference on Human Information Interaction and Retrieval (CHIIR), ACM, 2018.

[5] R. Yu, U. Gadiraju, P. Holtz, M. Rokicki, P. Kemkes, S. Dietze, Predicting user knowledge gain in informational search sessions, in: Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2018.

[6] R. Yu, R. Tang, M. Rokicki, U. Gadiraju, S. Dietze, Topic-independent modeling of user knowledge in informational search sessions, Information Retrieval Journal 24 (2021) 240–268.

[7] L. W. Anderson, D. R. Krathwohl, P. Airasian, K. Cruikshank, R. Mayer, P. Pintrich, J. Raths, M. Wittrock, A taxonomy for learning, teaching and assessing: A revision of Bloom's taxonomy, New York. Longman Publishing. Artz, AF, & Armour-Thomas, E.(1992). Development of a cognitive-metacognitive framework for protocol analysis of mathematical problem solving in small groups. Cognition and Instruction 9 (2001) 137–175.

[8] B. J. Jansen, D. Booth, B. Smith, Using the taxonomy of cognitive learning to model online searching, Information Processing & Management 45 (2009) 643–663.

[9] M. J. Cole, J. Gwizdka, C. Liu, N. J. Belkin, X. Zhang, Inferring user knowledge level from eye movement patterns, Information Processing & Management 49 (2013) 1075–1091.

[10] F. Moraes, S. R. Putra, C. Hauff, Contrasting search as a learning activity with instructor-designed learning, in: A. Cuzzocrea, J. Allan, N. W. Paton, D. Srivastava, R. Agrawal, A. Z. Broder, M. J. Zaki, K. S. Candan, A. Labrinidis, A. Schuster, H. Wang (Eds.), Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, ACM, 2018, pp. 167–176. URL: https://doi.org/10.1145/3269206.3271676. doi:10.1145/3269206.3271676.

[11] P. Vakkari, Searching as learning: A systematization based on literature, Journal of Information Science 42 (2016) 7–18.

[12] N. Bhattacharya, J. Gwizdka, Measuring learning during search: Differences in interactions, eye-gaze, and semantic similarity to expert knowledge, in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, ACM, 2019, pp. 63–71.

[13] N. Roy, F. Moraes, C. Hauff, Exploring users' learning gains within search sessions, in: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 2020, pp. 432–436.

[14] R. Kalyani, U. Gadiraju, Understanding user search behavior across varying cognitive levels, in: Proceedings of the 30th ACM Conference on Hypertext and Social Media, 2019, pp. 123–132.

[15] X. Zhang, M. Cole, N. Belkin, Predicting users' domain knowledge from search behaviors, in: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, ACM, 2011, pp. 1225–1226.

[16] R. Syed, K. Collins-Thompson, Retrieval algorithms optimized for human learning, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2017, pp. 555–564.

[17] R. Syed, K. Collins-Thompson, Exploring document retrieval features associated with improved short- and long-term vocabulary learning outcomes, in: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, ACM, 2018, pp. 191–200.

[18] J. Gwizdka, X. Chen, Towards observable indicators of learning on search, in: SAL@SIGIR, 2016.

[19] H. Liu, C. Liu, N. J. Belkin, Investigation of users' knowledge change process in learning-related search tasks, Proceedings of the Association for Information Science and Technology 56 (2019) 166–175.

[20] J. F. Medeiros, B. P. Nunes, S. W. M. Siqueira, L. A. P. P. Leme, Tagtheweb: Using wikipedia categories to automatically categorize resources on the web, in: European Semantic Web Conference, Springer, 2018, pp. 153–157.