<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>S.de (M. Rokicki); ralph.ewerth@tib.eu
(R. Ewerth); stefan.dietze@gesis.org (S. Dietze)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Domain-Specific Modeling of User Knowledge in Informational Search Sessions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rui Tang</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ran Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Rokicki</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Ewerth</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Dietze</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science &amp; Intelligent Systems Group, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Heinrich Heine University Dusseldorf</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>L3S Research Center</institution>
          ,
          <addr-line>Leibniz University Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Ping An Technology</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Users frequently search on the Web to fulfill information needs with learning intent. In this context, usefulness of the search results depends strongly on the knowledge state of the user. In order to satisfy learning needs effectively, it is necessary to take users' knowledge gain and knowledge state within learning-oriented Web search sessions into account. Previous works studied the use of supervised models to predict a user's knowledge gain and knowledge state. However, the impact of knowledge domains of the search topics on a user's learning process has not been adequately explored. In this paper, we suggest domain detection techniques for search sessions and build domain-specific knowledge prediction models accordingly. Experimental evaluation results demonstrate that our approach outperforms the state-of-the-art baseline.</p>
      </abstract>
      <kwd-group>
        <kwd>search as learning</kwd>
        <kwd>knowledge gain</kwd>
        <kwd>informational search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Users frequently surf the Web to search for a variety of information and to satisfy a wide range of information needs. Web search sessions are commonly categorized into three classes: navigational, informational and transactional [<xref ref-type="bibr" rid="ref1">1</xref>]. Informational search sessions involve an inherent learning intent, i.e. the desire of a user to acquire knowledge or information with respect to a particular topic, assumed to be present on one or more Web pages. In this context, the individual relevance of search results is strongly dependent on the current knowledge state of the corresponding user.</p>
        <p>The importance of learning scopes has been recognized by recent work at the intersection of information retrieval and learning theory. Eickhoff et al. [<xref ref-type="bibr" rid="ref2">2</xref>] investigated the relationship between query and Web search session-related metrics and learning progress. Collins-Thompson et al. [<xref ref-type="bibr" rid="ref3">3</xref>] studied the effectiveness of user interaction with respect to certain learning outcomes. The correlation between Web search behaviors and a user’s learning gain has been explored by prior work [<xref ref-type="bibr" rid="ref4">4</xref>], while the importance of learning as an implicit element of Web search has been established. Using various features computed based on user interactions and Web resource content, Yu et al. [5, 6] proposed approaches and built models for the prediction of a user’s knowledge gain (KG) and knowledge state (KS). Their work demonstrates that knowledge gain and state of users can be predicted from their behaviors in Web search sessions.</p>
        <p>Through more in-depth analysis of the relation between user knowledge state and various features based on user study data published by [5], we observed that correlations between features and knowledge gain/state in different knowledge domains of Web search sessions are different. For example, the correlation between the ratio of words related to the concept of health in user browsed webpages and knowledge gain/state for search sessions on topics in the health domain, is stronger than the correlation between them in sessions on topics in the history domain. Similar observations have been reported by Yu et al. in [6], where they proposed a new feature selection method to remove domain dependent features and thereby improve the topic generalizability of the knowledge prediction models. However, we argue that, instead of eliminating such features, we could use them to build fine-grained domain-specific models.</p>
        <p>In this paper, we detect the most relevant domain of
a search session based on textual information extracted
from queries and webpages accessed by the user. We
then carry out feature selection and build prediction
models for each domain. Experimental results demonstrate
that our proposed model outperforms the state-of-the-art
baseline.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Many studies have been carried out for understanding the relationship between learning progress and observable features in a search session. By matching the learning tasks into different learning stages of Anderson and Krathwohl’s taxonomy [7], Jansen et al. studied the correlation between search behaviors of 72 participants and their learning stage [8]. They showed that information searching is a learning process with unique searching characteristics corresponding to particular learning levels. Cole et al. [9] observed that behavioral patterns provide reliable indicators about the domain knowledge of a user, even if the actual content or topics of queries and documents are disregarded entirely. Collins-Thompson et al. [<xref ref-type="bibr" rid="ref3">3</xref>] studied the influence of distinct query types on knowledge gain, finding that intrinsically diverse queries lead to increased knowledge gain. Moraes et al.’s [10] work compared the learning outcome of instructor designed learning videos against three instances of search ("single-user", "search as support tool", "collaborative search") in order to find the most efficient approach for their learning scenario. Vakkari [11] provided a structured survey of features indicating learning needs as well as user knowledge and knowledge gain throughout the search process. Gadiraju et al. [<xref ref-type="bibr" rid="ref4">4</xref>] described the use of knowledge tests to calibrate the knowledge of users before and after their search sessions, quantifying their knowledge gain, and investigated the impact of search intent and search behavior on knowledge gain of users. Bhattacharya et al. [12] investigated the relationship between users’ search and eye gaze behaviors and their learning performance. In a recent work, Roy et al. [13] investigated at which time during a search session learning occurred, and found that the learning curve is largely influenced by a user’s prior knowledge on the searched topic. Kalyani et al. [14] explored this direction further by designing search tasks that fit into the different learning stages of the revised Bloom’s taxonomy. Through knowledge tests before and after each search session, they found significant impact of the learning stage on a user’s search behavior and knowledge gain.</p>
      <p>For predicting user’s knowledge state or change in a search session, Zhang et al. [15] explored using search behavior as an indicator for the domain knowledge level of a user. Through a small study (n = 35), they identified features such as the average query length or the rank of documents consumed from the search results as being predictive. Syed and Collins-Thompson [16] explored the possibility of using regression models and features extracted from user accessed document content to predict user knowledge change on vocabulary learning tasks [17]. Gwizdka et al. [18] proposed to assess learning outcomes to search environments by correlating individual search behaviors with corresponding eye-tracking measures. Yu et al. [5, 6] proposed to use features based on user interactions and Web resource content to build classification models to predict user knowledge state and knowledge gain in search sessions. Liu et al. [19] adopted mind maps to capture user’s knowledge change process and hence identified four types of knowledge change styles.</p>
      <p>Although previous works have studied the relation between various features and user knowledge state, and knowledge prediction models have been proposed, the impact of the knowledge domain on the effectiveness of features hasn’t been explored. In this paper, we propose a novel approach for predicting user knowledge state and knowledge gain in informational search sessions by taking the knowledge domain into consideration.</p>
    </sec>
    <sec>
      <title>3. Task Description &amp; Approach Overview</title>
      <p>As defined in [5]: an intentional learning-related search session comprises the sequence of a user’s actions with respect to satisfying her learning intent in a Web search environment through informational queries. A user’s sequence of actions begins with an initial Web query and includes browsing through the search results, click and scroll activity, navigation via hyperlinks, query reformulations, and so forth. We refer to such an intentional learning-related search session as “session” in the remainder of this paper for simplicity.</p>
      <p>Let <italic>s</italic> be a search session starting at time <italic>t</italic><sub>i</sub> and ending at time <italic>t</italic><sub>j</sub> aimed at satisfying a particular information need, that is, a learning intent <italic>l</italic> of user <italic>u</italic>. In this work, we study the knowledge indicators (KI): pre-knowledge state (pre-KS) <italic>k</italic>(<italic>t</italic><sub>i</sub>), post-knowledge state (post-KS) <italic>k</italic>(<italic>t</italic><sub>j</sub>) and knowledge gain (KG) Δ<italic>k</italic>(<italic>t</italic><sub>i</sub>, <italic>t</italic><sub>j</sub>) during time period [<italic>t</italic><sub>i</sub>, <italic>t</italic><sub>j</sub>]. This work aims at building domain-specific models (with respect to users’ learning intents), to predict the KIs.</p>
      <p>Figure 1 gives an overview of the approach we propose for building domain-specific KI prediction models. Given a session, we first extract textual information from different fields (e.g. query terms, webpage contents, etc.) and use it to detect the relevant domain of the session. After domain detection, the sessions are assigned to their most relevant domains. In the next step, we conduct the feature selection and knowledge modeling process using sessions assigned to each domain. More specifically, we compute Web resource features and user behavior features of each session, and then select a subset of these features based on two feature selection strategies. Using the selected features, we build KI prediction models for each domain. The process labeled in blue in Figure 1 shows an example of the data flow when predicting KI for a new session using the trained models.</p>
      <p>We summarize the three main tasks of building domain-specific KI prediction models as follows:</p>
      <list list-type="order">
        <list-item>
          <p>Domain detection of informational search sessions. Each session <italic>s</italic> can be associated with one or more domains to a different extent. For the modeling purpose, we assign each session to a single domain that it has the strongest association with based on textual information involved in the session. As each session contains textual information in multiple fields, it is also our task to find the most suitable fields to be used for the domain detection.</p>
        </list-item>
        <list-item>
          <p>Feature extraction and domain-specific feature selection. In this step, we first extract a set of features for each session <italic>s</italic> from the user behaviors and the related Web resource contents. For the sessions assigned to a specific domain, we select features reflecting the users’ knowledge gain and state.</p>
        </list-item>
        <list-item>
          <p>Domain-specific knowledge modeling. We formulate the prediction of knowledge state/gain as classification tasks, i.e. we aim to classify a specific KI (e.g. knowledge gain) of the user corresponding to a search session into low, moderate, high classes, with respect to a particular information need. That is, for each domain, we conduct feature selection and train classifiers to build the prediction models.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2-1">
      <title>4. Dataset</title>
      <p>To address the aforementioned tasks, we adopt an existing dataset which has been used by previous works on understanding and predicting user knowledge state and gain [<xref ref-type="bibr" rid="ref4">4, 5</xref>]. This dataset includes search sessions conducted by crowd workers spanning across 11 information needs for different topics randomly selected from the TREC 2014 Web Track<sup>1</sup> dataset. This includes knowledge assessment data before and after each of the search sessions per information need. The authors also crawled the webpages that were assessed by the users. The experimental setup for obtaining the data and KIs was described by the authors in [6].</p>
      <p>Data Cleaning. We filtered out untrustworthy workers who meet any of the following conditions: 1) did not complete the post-session test, 2) did not issue at least 1 search query, 3) selected the same option; either ‘YES’, ‘NO’ or for all items in the calibration test or the post-session test. In the next step, we filter out sessions that are insufficient for computing the features we need for building knowledge prediction models, which includes: 1) sessions with no click on any results on the SERPs, and 2) sessions that contain at least 1 non-English resource browsed by the user. After applying all the aforementioned filters, we retain 233 search sessions, with 1.361 queries and 2.622 clicks per session on average.</p>
      <p>Knowledge Measures. Knowledge tests are scientifically formulated tests that measure the knowledge of a participant on a given topic. The authors of [<xref ref-type="bibr" rid="ref4">4</xref>] created knowledge tests pertaining to each of the information needs. The pre (post)-knowledge score of a user in search sessions corresponding to a topic is measured as the percentage of the correct answers on the knowledge test that a given user has completed. Correspondingly, the knowledge gain is measured as the difference between a user’s pre- and post-search session knowledge score.</p>
      <p>For the classification tasks described in Section 3, we follow the same approach as used in [5], i.e. a Standard Deviation Classification approach to obtain three classes of learners with regard to their level of pre-KS. Assuming approximately normal distributions of the respective test scores (X) for the different topics, we transformed the test scores into Z-scores with a mean of 0 and a Standard Deviation (SD) of 1 (standardization). We then used statistically defined intervals (low: X &lt; -0.5 SD; moderate: -0.5 SD &lt; X &lt; 0.5 SD; high: 0.5 SD &lt; X) for the classification of the learners into roughly equal groups with low, moderate, or high pre-KS. The same procedure was repeated for post-KS and KG. Table 1 shows the resulting numbers of learners for the respective classes and underlying statistics.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Domain Detection</title>
      <p>The goal of this step is to assign each informational search session to a most relevant domain. More specifically, we extract textual information from queries and consumed Web resources of a session and apply two text classifiers on them to detect its domain.</p>
      <sec id="sec-3-1">
        <title>5.1. Methods for Domain Detection</title>
        <p>Domain detection in this paper is formulated as a text classification problem (“to which predefined class or category is this text most likely to belong?”<sup>2</sup>). This work aims at exploring the possibility of improving KI prediction performance by building more focused models, rather than developing novel domain detection techniques. We therefore utilize two existing domain detection tools, namely TagTheWeb and uClassify.</p>
        <p>TagTheWeb [20] can automatically categorize a given text into Wikipedia categories with a probability. The category with the highest probability is considered to be the most relevant domain. The 19 top-level Wikipedia categories adopted by TagTheWeb are: arts, culture, games, geography, health, history, humanities, industry, law, life, mathematics, matter, nature, people, philosophy, reference works, religion, science and technology and society. TagTheWeb could also classify text into Wikipedia sub-categories. However, in this work, we focus only on the 19 top-level categories as the granularity fits better into the task scenario and the size of the experimental dataset.</p>
        <p>uClassify<sup>3</sup> is a free machine learning Web service that provides classifiers for different applications. A classifier called Topics from uClassify can classify a given textual document into 10 different top-level domains. Each domain has a score of probability, and the domain with the highest probability is considered as the most relevant domain. The 10 top-level domains we adopted in this work are arts, business, computers, games, health, home, recreation, science, society and sports. The classes are adopted from the Open Directory Project<sup>4</sup>.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.2. Textual Information Extraction</title>
        <p>During a session, a user enters query terms to communicate her information need on a topic related to certain domains. We extract and combine all query terms in a session and use them for domain detection. Titles of visited Web pages can be an indicator of the domain that a user chooses to learn in a session. Therefore, we combined the titles of all the visited Web pages as the second source of textual information. Besides titles, we also analyze their content by combining all textual content of visited Web pages in a session. This results in three types of textual information: query words (QW), titles of the visited Web pages (WPT) and textual contents of the visited Web pages (WPC). Moreover, we consider all four combinations of these sources (as listed in Table 2) and a majority vote strategy based on results of using the three textual sources respectively (all MV). For the all MV strategy, when all three votes are different from each other, we assign the session to the other domain.</p>
        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Domain detection configurations and abbreviations based on extracted textual information.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Abbreviation</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>QW</td><td>Query words</td></tr>
              <tr><td>WPT</td><td>Web page titles</td></tr>
              <tr><td>WPC</td><td>Web page contents</td></tr>
              <tr><td>QW &amp; WPT</td><td>Query words and Web page titles</td></tr>
              <tr><td>QW &amp; WPC</td><td>Query words and Web page contents</td></tr>
              <tr><td>WPT &amp; WPC</td><td>Web page titles and Web page contents</td></tr>
              <tr><td>QW &amp; WPT &amp; WPC</td><td>Query words, Web page titles, and Web page contents</td></tr>
              <tr><td>all MV</td><td>Majority vote based on QW, WPT and WPC result</td></tr>
            </tbody>
          </table>
        </table-wrap>
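        <p>The all MV strategy from Section 5.2 can be illustrated with a minimal sketch. The function name <monospace>majority_vote_domain</monospace> and the fallback label "other" are assumptions made for illustration; the three arguments stand for the domains detected from QW, WPT and WPC individually.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def majority_vote_domain(qw, wpt, wpc, fallback="other"):
    """Return the domain chosen by at least two of the three text sources.

    If all three sources disagree, the session falls back to `fallback`.
    """
    domain, votes = Counter([qw, wpt, wpc]).most_common(1)[0]
    return domain if votes >= 2 else fallback


# Hypothetical per-source predictions for two sessions.
print(majority_vote_domain("health", "health", "nature"))      # health
print(majority_vote_domain("history", "people", "geography"))  # other
```
]]></preformat>
        <p>With three voters a tie can only occur when all votes differ, so a single threshold of two votes is enough to separate the majority case from the fallback case.</p>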
      </sec>
      <sec id="sec-3-3">
        <title>5.3. Evaluation</title>
        <p>We apply both text classification tools for all 8 configurations (Table 2) respectively. In this section, we present the evaluation results of domain detection, and choose the configuration that the next step relies on accordingly.</p>
        <p>Ground Truth. Two authors of this paper manually assigned labels to the sessions according to the corresponding topics that were presented to the crowd workers when creating the dataset. As sessions corresponding to the same topic could have different domain focus, we decided to allow multiple correct domain labels when building the ground truth. Consequently, in the following evaluation, a domain classification outcome was treated as correct, if the predicted domain was among the assigned labels. The description of the pre-defined search topics and the domain labels assigned to them are shown in Table 3. The annotators agreed on all labels.</p>
        <p><sup>2</sup> https://www.uclassify.com/docs/intro</p>
        <p><sup>3</sup> https://www.uclassify.com/browse/uclassify/topics</p>
        <p><sup>4</sup> http://www.dmoz.org</p>
      </sec>
      <sec id="sec-3-4">
        <p>Evaluation Results. For each of the 16 configurations (2 classifiers × 8 textual information combinations), we compute the overall accuracy of the classification result. Based on the results shown in Table 4, we found that all accuracy scores are above 0.550 for TagTheWeb.</p>
        <p>The best performance of TagTheWeb is achieved when combining query words and Web resource titles (QW &amp; WPT), as well as when combining all three fields (QW &amp; WPT &amp; WPC): in both cases, 174 of 233 sessions are detected correctly (accuracy = 0.747). We choose the configuration QW &amp; WPT for later steps, as it is more efficient than QW &amp; WPT &amp; WPC. Meanwhile, all accuracy scores of uClassify are below 0.25. Therefore, we decide not to pass the result of uClassify to later steps.</p>
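        <p>Because the ground truth allows several correct domain labels per session, accuracy counts a prediction as correct when it is among the accepted labels. A minimal sketch of that computation follows; the function name and the toy sessions are illustrative assumptions, and only the 174 of 233 figure is taken from the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def detection_accuracy(predicted, accepted_labels):
    """Fraction of sessions whose predicted domain is among its accepted labels."""
    hits = sum(1 for p, labels in zip(predicted, accepted_labels) if p in labels)
    return hits / len(predicted)


# Hypothetical toy sessions (not taken from the dataset).
predicted = ["health", "history", "nature", "society"]
accepted = [{"health"}, {"history", "people"}, {"geography"}, {"society", "culture"}]
print(detection_accuracy(predicted, accepted))  # 0.75

# The reported best configuration: 174 of 233 sessions detected correctly.
print(round(174 / 233, 3))  # 0.747
```
]]></preformat>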
        <p>To better illustrate the domain detection result, we present a heatmap in Figure 2 showing the assignment of sessions corresponding to each topic to the target domains by TagTheWeb using QW &amp; WPT. We found that 81.5% of sessions in our ground truth (GT) are assigned to 5 domains, namely, history (56 sessions), health (49 sessions), nature (32 sessions), geography (29 sessions) and people (24 sessions). As the next modeling steps require a sufficient amount of training data in order to build reliable models, we continue the experiment with the 190 sessions categorized into these 5 most frequent domains, and discard the remaining 43 sessions, which are categorized into society (15 sessions), humanities (10 sessions), philosophy (10 sessions), culture (5 sessions), life (2 sessions) or science and technology (1 session).</p>
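        <p>The session counts reported above can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the text, while the variable names and the use of the five most frequent domains as the selection rule are illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Sessions per detected domain, as reported in the text.
sessions_per_domain = Counter({
    "history": 56, "health": 49, "nature": 32, "geography": 29, "people": 24,
    "society": 15, "humanities": 10, "philosophy": 10, "culture": 5,
    "life": 2, "science and technology": 1,
})

total = sum(sessions_per_domain.values())
kept = dict(sessions_per_domain.most_common(5))  # five most frequent domains
kept_total = sum(kept.values())

print(total, kept_total, total - kept_total)  # 233 190 43
print(f"{kept_total / total:.1%}")            # 81.5%
```
]]></preformat>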
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Modeling User Knowledge</title>
      <sec id="sec-4-1">
        <title>6.1. Approach</title>
        <sec>
          <title>6.1.1. Model</title>
          <p>As described in Section 3, we follow the same approach as in [5, 6] and cast the problem of predicting user KIs as classification tasks. More specifically, each session <italic>s</italic> is represented as a feature vector, <italic>v</italic> = (<italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, ..., <italic>f</italic><sub>n</sub>), where the features considered are introduced later in this section. We apply a range of standard classification models, namely, Naive Bayes (nb), Logistic Regression (lr), Support Vector Machine (svm) and Random Forest (rf). For our experiments, we used the scikit-learn library for Python<sup>5</sup>. We tune hyperparameters of the algorithms using grid search.</p>
        </sec>
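        <p>As a rough illustration of this setup, the sketch below tunes the four classifier families with scikit-learn's <monospace>GridSearchCV</monospace>. Several parts are assumptions and not taken from the paper: the synthetic data standing in for one domain's sessions, the choice of <monospace>GaussianNB</monospace> as the Naive Bayes variant, the parameter grids, and macro F1 as the tuning criterion.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for one domain: 90 sessions, 8 features, 3 classes
# (low / moderate / high). Real feature vectors would replace X and y.
X, y = make_classification(n_samples=90, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Hypothetical search spaces; the grids actually used are not given in the text.
candidates = {
    "nb": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "lr": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "svm": (SVC(), {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}),
    "rf": (RandomForestClassifier(random_state=0), {"n_estimators": [20, 50]}),
}

# 10-fold cross-validation, stratified so every fold contains all classes.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
best_scores = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="f1_macro")
    search.fit(X, y)
    best_scores[name] = search.best_score_
    print(name, search.best_params_, round(search.best_score_, 3))
```
]]></preformat>
        <p>In a domain-specific setting, the same loop would be run once per domain on the sessions assigned to that domain.</p>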
        <sec>
          <title>6.1.2. Feature Extraction</title>
          <p>As the focus of this work is to explore the performance of domain-specific knowledge prediction models, we make use of the same set of features as described in [6]. The features consist of two categories according to the data source: Web resource features and user behavior features. The 109 Web resource features are extracted based on the content of the webpages which users visited during a session, including features computed based on document complexity (e.g. average number of words per sentence, Gunning Fog Grade<sup>6</sup>), HTML structure (e.g. number of &lt;script&gt; elements) and linguistic characteristics (based on the 2015 LIWC dictionaries<sup>7</sup>) of the Web resource content. The 66 user behavior features are extracted from the user interaction with the search engine during a session, namely features related to the session (e.g. session duration), queries (e.g. average query length), SERP (e.g. the lowest rank of click), browsing behavior (e.g. ratio of revisited pages) and mouse movements (e.g. total scroll distance). As the features have been introduced and investigated in detail by previous works [6], we will not go into details in this paper.</p>
        </sec>
        <sec>
          <title>6.1.3. Metrics for Feature Selection</title>
          <p>Due to the difficulty in obtaining ground truth data with user knowledge assessment, the scale of training and testing data is limited. Hence, feature selection is important for building reliable models, and in particular, to avoid overfitting. For sessions assigned to each domain, our goal is to select a set of features <italic>F</italic>′ ⊆ <italic>F</italic> that produce the most reliable model for the KI prediction tasks. We introduce 2 metrics that are adapted from previous work [5].</p>
          <p>Ensure feature effectiveness. We compute the Pearson correlation coefficient between each feature <italic>f</italic><sub>i</sub> and KI, i.e. <italic>r</italic>(<italic>f</italic><sub>i</sub>, KI), across all sessions in a specific domain. To ensure effectiveness of features, we select features fulfilling the condition |<italic>r</italic>(<italic>f</italic><sub>i</sub>, KI)| ≥ <italic>α</italic> for building the classification models.</p>
          <p>Reduce feature redundancy. We also compute the Pearson correlation coefficient <italic>r</italic>(<italic>f</italic><sub>i</sub>, <italic>f</italic><sub>j</sub>) between each pair of features across all sessions in a specific domain. If |<italic>r</italic>(<italic>f</italic><sub>i</sub>, <italic>f</italic><sub>j</sub>)| ≥ <italic>β</italic>, i.e. features are highly similar to each other, we remove the one which has a lower <italic>r</italic>(<italic>f</italic>, KI) from the pair.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>6.2. Evaluation</title>
        <p>The generation of class labels of the sessions in our experimental dataset is described in Section 4. We evaluate model performances by means of 10-fold cross-validation. Further, classification performance is measured in terms of the following metrics:</p>
        <list list-type="bullet">
          <list-item>
            <p>Accuracy (Accu): percentage of search sessions that were classified with the correct class label.</p>
          </list-item>
          <list-item>
            <p>Precision (P), Recall (R), F1 (F1) score of class i: the standard precision, recall and F1 score on the prediction result of each class i.</p>
          </list-item>
          <list-item>
            <p>Macro average of P, R and F1: the average of the corresponding score across 3 classes.</p>
          </list-item>
        </list>
        <p>Baselines. We compare our approach against [5], who proposed to build classifiers to predict KG and post-KS using user interaction and session features only. Their approach considered feature selection based on the feature-KI-correlation (<italic>α</italic>) and the between-feature-correlation (<italic>β</italic>). Using their approach, we make use of all the 190 sessions which are relevant to the aforementioned 5 domains (history, health, nature, geography and people) to build classifiers for the knowledge prediction tasks. We also compare our approach against an improved baseline (denoted as baseline’) for which we apply these 190 sessions to build non-domain-specific classifiers using both user interaction features and Web resource features. In the experiment, we tuned the hyper-parameters of these models again using grid search to ensure a fair comparison.</p>
        <sec id="sec-4-2-1">
          <title>6.2.1. Overall Performance</title>
          <p>Using our approach, the overall accuracy scores are above 0.610 for all 3 prediction tasks and the overall average F1 scores are above 0.609 (see Table 5). Compared to the state-of-the-art baseline (baseline), we observed improvements for all 3 prediction tasks, with improvements of 18.1%, 13.6% and 17.1% (average F1 score) as well as 16.3%, 12.2% and 15.8% (accuracy score) for the pre-KS, post-KS and KG prediction tasks respectively.</p>
          <p>Our approach and baseline’ make use of the same feature set which includes user behavior features and Web resource features. Our models outperform baseline’ by 14.5%, 10.4% and 12.7% (average F1 score) as well as 12.1%, 9.5% and 12.1% (accuracy score) in the tasks of pre-KS, post-KS and KG prediction respectively. This demonstrates that our domain-specific knowledge modeling ap</p>
          <p><sup>5</sup> http://scikit-learn.org</p>
          <p><sup>6</sup> http://gunning-fog-index.com/</p>
          <p><sup>7</sup> http://liwc.wpengine.com/</p>
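          <p>Two preparation steps that the evaluation relies on, the standard deviation classification of Section 4 and the two feature selection criteria of Section 6.1.3, can be sketched in plain Python. All function names, the toy data and the threshold values are assumptions. The redundancy step is one possible greedy reading of the pairwise rule: features are visited in order of decreasing absolute correlation with the KI. Using the population standard deviation and assigning boundary Z-scores to the moderate class are further assumptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def sd_classes(scores, cut=0.5):
    """Label scores low / moderate / high via Z-scores and a +/- cut SD band."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return ["moderate"] * len(scores)
    z = [(s - mu) / sd for s in scores]
    return ["low" if v < -cut else "high" if v > cut else "moderate" for v in z]


def select_features(features, ki, min_ki_corr, max_pair_corr):
    """features: dict name -> per-session values; ki: per-session KI values."""
    # Effectiveness: keep features sufficiently correlated with the KI.
    ki_corr = {name: abs(pearson(vals, ki)) for name, vals in features.items()}
    kept = [name for name in features if ki_corr[name] >= min_ki_corr]
    # Redundancy: visit stronger features first and skip any feature that is
    # highly correlated with one that has already been selected.
    kept.sort(key=lambda name: ki_corr[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(features[name], features[other])) < max_pair_corr
               for other in selected):
            selected.append(name)
    return selected


# Hypothetical toy data for six sessions.
ki = [10, 20, 30, 40, 50, 60]
features = {
    "f_a": [1, 2, 3, 4, 5, 6],     # perfectly correlated with the KI
    "f_b": [2, 4, 6, 8, 10, 13],   # near-duplicate of f_a
    "f_c": [3, 1, 3, 1, 3, 1],     # only weakly correlated with the KI
}
print(sd_classes([0, 50, 100]))                 # ['low', 'moderate', 'high']
print(select_features(features, ki, 0.5, 0.9))  # ['f_a']
```
]]></preformat>
          <p>In the toy run, f_c fails the effectiveness threshold and f_b is dropped as redundant with the stronger f_a, so a single feature remains.</p>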
          <p>effectiveness and redundancy results in markedly improved performance. On the other hand, the overall restrictive settings, which result in only two features being used for KG prediction, highlight further room for improvement. While these general models worked best in our experiments, refining the domain detection step (e.g. using a more fine-grained taxonomy) could result in more coherent sets of training data, allowing for the use of more (specific) features.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>7. Conclusions</title>
      <sec id="sec-5-1">
        <p>In this paper, we investigated the influence of the domain on learning-oriented informational Web search sessions, and proposed to improve the performance of knowledge prediction models by extending them to several domain-specific models. We evaluated two text classifiers, i.e. TagTheWeb and uClassify, using 8 types of textual information respectively to categorize a session into a most relevant domain. We observed the best domain detection accuracy when using TagTheWeb based on query words and web page titles. Based on this, we built domain-specific models for knowledge prediction tasks. In our experiments, the approach outperformed the state-of-the-art baseline by at least 12.2% in terms of accuracy and at least 13.6% in terms of F1-Score. Thus, our work contributes to the understanding and prediction of user knowledge in learning-oriented informational Web search sessions.</p>
        <p>Due to the limited availability of Web search session data as well as the corresponding user knowledge assessment data, there are limitations in our current experimental dataset. Therefore, observations made herein should be validated on a large scale dataset in future work. Further, in the domain detection step, only top level categories (domains) of the taxonomies were used when applying TagTheWeb. Given sufficient data, accuracy could be improved by adopting more subcategories, i.e. more specific domains. Moreover, other than the two exemplary solutions investigated in this work, other domain detection techniques could be applied as well.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <p>Part of this work is supported by the Leibniz Association, Germany (Leibniz Competition 2018, funding line "Collaborative Excellence", project SALIENT [K68/2017]).</p>
        <p>sessions on the web, in: 2018 ACM on Conference on Human Information Interaction and Retrieval (CHIIR), ACM, 2018.</p>
        <p>[5] R. Yu, U. Gadiraju, P. Holtz, M. Rokicki, P. Kemkes, S. Dietze, Predicting user knowledge gain in informational search sessions, in: Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2018.</p>
        <p>[6] R. Yu, R. Tang, M. Rokicki, U. Gadiraju, S. Dietze, Topic-independent modeling of user knowledge in informational search sessions, Information Retrieval Journal 24 (2021) 240–268.</p>
        <p>[7] L. W. Anderson, D. R. Krathwohl, P. Airasian, K. Cruikshank, R. Mayer, P. Pintrich, J. Raths, M. Wittrock, A taxonomy for learning, teaching and assessing: A revision of Bloom’s taxonomy, New York. Longman Publishing. Artz, AF, &amp; Armour-Thomas, E.(1992). Development of a cognitive-metacognitive framework for protocol analysis of mathematical problem solving in small groups. Cognition and Instruction 9 (2001) 137–175.</p>
        <p>[8] B. J. Jansen, D. Booth, B. Smith, Using the taxonomy of cognitive learning to model online searching, Information Processing &amp; Management 45 (2009) 643–663.</p>
        <p>[9] M. J. Cole, J. Gwizdka, C. Liu, N. J. Belkin, X. Zhang, Inferring user knowledge level from eye movement patterns, Information Processing &amp; Management 49 (2013) 1075–1091.</p>
        <p>[10] F. Moraes, S. R. Putra, C. Hauff, Contrasting search as a learning activity with instructor-designed learning, in: A. Cuzzocrea, J. Allan, N. W. Paton, D. Srivastava, R. Agrawal, A. Z. Broder, M. J. Zaki, K. S. Candan, A. Labrinidis, A. Schuster, H. Wang (Eds.), Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, ACM, 2018, pp. 167–176. URL: https://doi.org/10.1145/3269206.3271676. doi:10.1145/3269206.3271676.</p>
        <p>[11] P. Vakkari, Searching as learning: A systematization based on literature, Journal of Information Science 42 (2016) 7–18.</p>
        <p>[12] N. Bhattacharya, J. Gwizdka, Measuring learning during search: Differences in interactions, eye-gaze, and semantic similarity to expert knowledge, in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, ACM, 2019, pp. 63–71.</p>
        <p>[13] N. Roy, F. Moraes, C. Hauff, Exploring users’ learning gains within search sessions, in: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 2020, pp. 432–436.</p>
        <p>[14] R. Kalyani, U. Gadiraju, Understanding user search behavior across varying cognitive levels, in: Proceedings of the 30th ACM Conference on Hypertext and Social Media, 2019, pp. 123–132.</p>
        <p>[15] X. Zhang, M. Cole, N. Belkin, Predicting users’ domain knowledge from search behaviors, in: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, ACM, 2011, pp. 1225–1226.</p>
        <p>[16] R. Syed, K. Collins-Thompson, Retrieval algorithms optimized for human learning, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2017, pp. 555–564.</p>
        <p>[17] R. Syed, K. Collins-Thompson, Exploring document retrieval features associated with improved short- and long-term vocabulary learning outcomes, in: Proceedings of the 2018 Conference on Human Information Interaction &amp; Retrieval, ACM, 2018, pp. 191–200.</p>
        <p>[18] J. Gwizdka, X. Chen, Towards observable indicators of learning on search, in: SAL@SIGIR, 2016.</p>
        <p>[19] H. Liu, C. Liu, N. J. Belkin, Investigation of users’ knowledge change process in learning-related search tasks, Proceedings of the Association for Information Science and Technology 56 (2019) 166–175.</p>
        <p>[20] J. F. Medeiros, B. P. Nunes, S. W. M. Siqueira, L. A. P. P. Leme, Tagtheweb: Using wikipedia categories to automatically categorize resources on the web, in: European Semantic Web Conference, Springer, 2018, pp. 153–157.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Broder</surname>
          </string-name>
          ,
          <article-title>A taxonomy of web search</article-title>
          ,
          <source>SIGIR Forum</source>
          <volume>36</volume>
          (
          <year>2002</year>
          )
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . URL: https://doi.org/10.1145/792550.792552. doi:10.1145/792550.792552.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>Lessons from the journey: a query log analysis of within-session learning</article-title>
          ,
          <source>in: Proceedings of the 7th ACM international conference on Web search and data mining</source>
          , ACM,
          <year>2014</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Collins-Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Rieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Haynes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Syed</surname>
          </string-name>
          ,
          <article-title>Assessing learning outcomes in web search: A comparison of tasks and query strategies</article-title>
          ,
          <source>in: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval</source>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holtz</surname>
          </string-name>
          ,
          <article-title>Analyzing knowledge gain of users in informational search</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>