<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning to Rank for Knowledge Gain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Rokicki</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ran Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Hienert</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science &amp; Intelligent Systems Group, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>L3S Research Center, Leibniz University Hannover</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Web search is often used as a starting point for learning. Search as Learning (SAL) research aims to support learning activities through techniques such as user interface optimization, retrieval, and ranking. In this work, we investigate re-ranking search engine results to improve the overall knowledge gain of the learner. We make two contributions: (1) we propose a framework for re-ranking search results by attributing the overall knowledge gain of a session to the documents viewed in it; (2) we apply this framework to a SAL evaluation dataset. We show that the ranking can be significantly improved with respect to knowledge gain by using ranking and content features.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We make the following contributions: (1) We propose a general framework for re-ranking search
results with respect to the knowledge gain attributed to individual documents in the search session,
in order to optimize rankings for the learning outcome. (2) We apply the framework to an existing
search as learning dataset. The results show that the ranking can be improved for a specific learning
task. We therefore encourage other researchers to apply our framework to different learning tasks
and topics to identify factors that can be used to further improve rankings toward knowledge gain.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Many previous works investigated the relation between the user knowledge state change, their
search behavior, and the Web resources consumed. For instance, Gadiraju et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] studied
the impact of information needs on the search behavior and knowledge gain of search engine
users. Collins-Thompson et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] studied the influence of query types on knowledge gain,
finding that intrinsically diverse queries lead to increased knowledge gain. Bhattacharya et al.
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] studied the relation between eye-tracking measures and users’ knowledge change. Liu et al.
[7] investigated the influence of three different types of learning resources on users' learning
outcomes in search sessions.
      </p>
      <p>
        Effort has also been made in assessing user knowledge state/gain with automated approaches.
User interaction features [
        <xref ref-type="bibr" rid="ref2">2, 8</xref>
        ], web resource content features [9, 10], and multimedia features
[11] have been considered by previous works to build classification models to predict user
knowledge state and knowledge gain in search sessions. Gwizdka et al. [12] proposed to assess
learning outcomes in search environments by correlating individual search behaviors with
corresponding eye-tracking measures.
      </p>
      <p>With this extended understanding on human learning in web search, the next goal is to
optimize search systems to better support user learning. Syed et al. [13] proposed a retrieval
algorithm that focuses on diversification to help with the exploration of topics. In later works
[14, 10], Syed et al. proposed to optimize the learning outcome of a vocabulary learning task
by selecting a set of documents based on keyword density and the learner's domain knowledge,
and proposed a corresponding theoretical framework. However, the question of how
to improve users' learning gains through re-ranking in a general search engine context has not
been sufficiently explored. In this work, we explore the use of learning-to-rank techniques to
improve users' knowledge gain.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SAL Ranking Framework</title>
      <p>As a base for our experiments, we use the openly available SAL-Lightning Dataset [15]. It was
created from a lab study with 114 participants searching freely on the Web to learn about the
formation of lightning and thunder. This is a complex topic that requires the understanding of
several interwoven concepts. Participants could use any Web resource they liked within
30 minutes. Their knowledge states were measured before and after the search session with
multiple-choice questionnaires and self-written essays. User behavior and resource features
were recorded via screen recordings, visited Web pages, browsing timelines, gaze data, browser
interaction data, knowledge data, and questionnaires.</p>
      <p>A first analysis [16] shows that participants use search engines in their learning tasks to
identify useful resources by scanning through search engine result pages (SERPs). Then they
browse resources such as textual Web pages or video pages, checking for topic-related content
to read or watch. Afterward, they return to the SERP, inspect resources further down in the
result list, or refine their queries and start the cycle again.</p>
      <p>Experimental Dataset. For this paper, we use a selection of the logged resources listed above.
Namely, we use the search interaction data for 74 of the participants, excluding participants
with partially missing data (among others, 20 participants were excluded due to a bug in the
tracking scripts that resulted in erroneous SERP interaction data at the beginning of the study).</p>
      <p>Based on the materials described in the resource paper, we built two additional resources: (1)
From the SERPs, we extracted the search number and URL, the search type (web, images, videos,
news, books), and the query terms. For every linked resource on the SERP, such as text links, images,
videos, and knowledge graph entries, we extracted the position, URL, title, and snippet. (2) For web
pages clicked by a participant on a SERP, the HTML with the actual text content was obtained
during the lab experiments. For links that were not clicked, however, we only have the URLs. These
resources were subsequently crawled from the Internet Archive (http://archive.org), with a snapshot
date as close as possible to the time of the original experiment. The final data for our analysis
consists of 706 rankings, 465 of which contain clicks on search results. These rankings comprise
25,829 links, and for 99.34% of them we have the HTML content.</p>
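      <p>For illustration, snapshots close to a given date can be retrieved via the Wayback Machine availability API. The sketch below is our own illustration of this step (the timestamp shown is a placeholder, not the actual experiment date), not the crawling code used to build the dataset:</p>
      <preformat>
# Sketch: retrieve the snapshot closest to a target date via the Wayback
# Machine availability API (https://archive.org/wayback/available).
import requests

def fetch_archived_html(url, timestamp="20190601"):
    """Return the HTML of the snapshot closest to `timestamp` (YYYYMMDD), or None."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=30,
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no archived snapshot for this URL
    return requests.get(closest["url"], timeout=30).text
      </preformat>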
      <p>
        In line with [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], participants were divided into three groups based on their initial knowledge
state (KS), measured by the multiple-choice questionnaires before the experiment (pre-KS).
The number of users per group and some general statistics on the users’ interaction with the
rankings are given in Table 1.
      </p>
      <p>Clickthrough data and ranking labels. To optimize rankings for learning outcomes,
the dataset offers two signals of result usefulness as possible bases for ranking optimization
goals. Firstly, the clickthrough data obtained in the learning setting offers a first relevance
signal for learning and evaluating a re-ranking model. To this end, the relevance label is taken to be
1 if the result was clicked by the user and 0 otherwise. Secondly, the knowledge state
measurements based on questionnaires before and after the search sessions allow for estimating
the usefulness of search results for learning outcomes.</p>
      <p>As indicated in Figure 1, we assume that the contribution of a given document to the users’
learning outcome is proportional to the dwell time. Hence, we devise a relevance label that
attributes the knowledge gain achieved in the session to individual documents based on the
dwell times. We compute the relevance label KG(d) for a document d as follows:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>KG(d) = \frac{(\text{post-KS} - \text{pre-KS}) + 1}{(10 - \text{pre-KS}) + 1} \cdot \frac{t_d}{\sum_{d'} t_{d'}}</tex-math>
      </disp-formula>
      <p>where the user's knowledge gain over the session is the difference between the
knowledge state before (pre-KS) and after (post-KS) the session, and t_d is the dwell time on
document d. The first term normalizes KG(d) by the maximal achievable gain and adds smoothing
terms; the second term weights by dwell time, as explained above. Note that the sum of the KG
relevance labels for each user is less than 1; the labels are therefore much lower than the
click-based labels, with an average value of 0.058 for relevant documents.</p>
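      <p>To make the label computation concrete, a minimal sketch follows (variable and function names are ours; the maximum multiple-choice score of 10 is taken from Equation (1)):</p>
      <preformat>
# Sketch: knowledge-gain relevance labels per Equation (1).
# pre_ks / post_ks: multiple-choice scores before/after the session (0..10);
# dwell: mapping from viewed document id to dwell time in seconds.
def kg_labels(pre_ks, post_ks, dwell):
    norm_gain = ((post_ks - pre_ks) + 1.0) / ((10.0 - pre_ks) + 1.0)  # smoothed, normalized gain
    total_dwell = sum(dwell.values())
    return {doc: norm_gain * t / total_dwell for doc, t in dwell.items()}

# Documents that were never viewed in the session keep a relevance label of 0.
      </preformat>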
      <p>Ranking Features. We use the following 17 basic (re-)ranking features, geared towards
pointwise learning to rank. In addition to the position in the original ranking, the length of
the query, and the length of each field, we compute the following four features based on the title,
the snippet, and the content of each search result, respectively (see the sketch after this list):
• “sum_qterm”: sum of the number of query term occurrences within the search result field.
• “jaccard_sim”: the Jaccard similarity between the search result field and the query.
• “bm25”: BM25 measure based on the search result field and the query. Parameters were chosen
according to [17].
• “bm25_ft”: BM25 measure computed with alternative document frequencies to account for
the topically narrow dataset. Word counts from the fastText word vector model for the
German language (https://fasttext.cc/) were used as a substitute.</p>
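      <p>A minimal sketch of the query-dependent overlap features per field, assuming simple whitespace tokenization (the BM25 variants additionally require collection statistics and are omitted here):</p>
      <preformat>
# Sketch: query-dependent features for one field (title, snippet, or content).
def sum_qterm(query, field):
    """Total number of query-term occurrences in the field ("sum_qterm")."""
    field_tokens = field.lower().split()
    return sum(field_tokens.count(term) for term in set(query.lower().split()))

def jaccard_sim(query, field):
    """Jaccard similarity between query and field token sets ("jaccard_sim")."""
    q, f = set(query.lower().split()), set(field.lower().split())
    union = q.union(f)
    return len(q.intersection(f)) / len(union) if union else 0.0
      </preformat>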
      <p>We extracted the textual content from the crawled search results with the help of the Inscriptis
library [18].</p>
      <p>Content Features. We use the same 114 features as in [11] to represent the textual
information in the web documents. These are computed from three different perspectives: 1)
complexity of the textual content in the document, including both descriptive metrics (e.g.,
number of words, length of sentences) and scientifically defined complexity or readability
measures (e.g., Gunning Fog Grade (http://gunning-fog-index.com/), Flesch-Kincaid Grade [19]);
2) HTML structure of the web document, which indicates the type of content (e.g., existence of
item lists) and how a document is organized (e.g., length of paragraphs); 3) linguistic features
that reflect the psychological processes, sentiment, and writing style of the content, which are
computed based on the 2015 Linguistic Inquiry and Word Count (LIWC) dictionaries
(http://liwc.wpengine.com/compare-dictionaries/).</p>
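      <p>As an illustration of the readability measures, a rough sketch of the Flesch-Kincaid grade level with a naive vowel-group syllable heuristic (the actual features follow [11] and the cited tooling; this is only an approximation):</p>
      <preformat>
# Sketch: Flesch-Kincaid grade level with a crude vowel-group syllable count.
import re

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
      </preformat>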
      <p>Experimental Setup. Based on the two kinds of relevance labels defined above, we compare
basic pointwise learning-to-rank schemes against two baselines: a simple ranking based on the
BM25 measure using the substitute document frequencies, and the original ranking given by
Google search. The learning models used in this work are Lasso, Ridge, and Random Forest
regression (the experiments were carried out using scikit-learn version 1.0.1).</p>
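      <p>A minimal sketch of this pointwise setup with scikit-learn (the model choices are those listed above, with hyperparameters left at their defaults; the rerank helper is our own illustration):</p>
      <preformat>
# Sketch: pointwise learning to rank as regression over per-result feature vectors.
from sklearn.linear_model import Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor

MODELS = {
    "lasso": Lasso(),
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=0),
}

def rerank(fitted_model, X_ranking):
    """Score the results of one ranking and return their indices, best first."""
    scores = fitted_model.predict(X_ranking)
    return scores.argsort()[::-1]
      </preformat>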
      <p>The models were trained and evaluated in a user-wise leave-one-out cross-validation scheme,
whereby the models were iteratively evaluated on the rankings of one user and trained on
the rest. Hyperparameters were not optimized. The performance is measured in terms of
Normalized Discounted Cumulated Gain (NDCG) and Precision@10 (P@10) for the binary
‘clicked’ ranking goal. For the knowledge gain oriented rankings, we replace the precision
measurement with a corresponding KG@10 measure, which is the average KG relevance among
the 10 highest ranked results.</p>
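      <p>The evaluation loop can be sketched as follows (the data structures and helper names are our assumptions; NDCG is computed with scikit-learn's ndcg_score, and KG@10 as described above):</p>
      <preformat>
# Sketch: user-wise leave-one-out cross-validation with NDCG and KG@10.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import ndcg_score

def kg_at_10(labels, scores):
    """Average relevance label among the 10 highest-ranked results."""
    top = np.argsort(scores)[::-1][:10]
    return float(np.mean(np.asarray(labels)[top]))

def leave_one_user_out(rankings):
    """rankings: list of dicts with keys 'user', 'X' (results x features), 'y' (labels)."""
    ndcgs, kgs = [], []
    for held_out in {r["user"] for r in rankings}:
        train = [r for r in rankings if r["user"] != held_out]
        test = [r for r in rankings if r["user"] == held_out]
        X = np.vstack([r["X"] for r in train])
        y = np.concatenate([r["y"] for r in train])
        model = RandomForestRegressor(random_state=0).fit(X, y)
        for r in test:
            scores = model.predict(r["X"])
            ndcgs.append(ndcg_score([r["y"]], [scores]))
            kgs.append(kg_at_10(r["y"], scores))
    return float(np.mean(ndcgs)), float(np.mean(kgs))
      </preformat>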
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We compare ranking performance to the original ranking in Tables 2 and 3 for each of the two
target variables. We report performance separately for the pre-KS user groups; however, the
models are always trained on the complete dataset.</p>
      <p>Ranking performance. Overall, the re-ranking approaches outperform the original ranking
baseline for both ranking tasks. Among the tested approaches, Random Forest performs best in
most situations, with P@10 and KG@10 significantly improved when compared to the original
ranking.</p>
      <p>There are differences in performance between the two ranking tasks, which are in line with
our expectations based on the different value ranges of the underlying relevance labels described
in Section 3. However, there are also differences in performance between the pre-KS groups. The
ranking appears to be more challenging to optimize for high pre-KS users. In particular, for
the “KG” model results shown in Table 3, both the original ranking and the re-ranked
result lists perform worse in terms of NDCG when compared to low or medium pre-KS user
rankings. This suggests that for lower pre-KS users there is more potential to optimize for
improved learning outcomes.</p>
      <p>Feature importances. Figure 2 shows feature importances in terms of mean decrease in
impurity, computed with the random forest model. Overall, the ranking-related features are the
most useful features for both ranking tasks – except for the content length, all of them appear
among the 25 most useful features. The most important feature overall is the query length.
As expected for a re-ranking task, the original ranking position is the second most useful feature on
average and even, barely, the most important feature for the ‘clicked’ relevance task. Content
features, on the other hand, are ranked lower overall, although they still appear useful. One could
assume that information needs evolve throughout a learning session and that the query length might
be an indicator of that, with longer, more specific queries in the later stages of learning sessions.
However, upon correlating query length with session progress, we find that only the high pre-KS
users exhibit a relationship, with significantly shorter queries (p &lt; 0.01) in the second halves
of the sessions – queries in the first half were dominated by general queries on the formation of
thunderstorms, while in the second half of the sessions, short and specific queries on particular
aspects and technical terms could be observed. This difference might be one of the ways in which
the query length helps the models to optimize rankings for improved learning outcomes.</p>
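      <p>The importances shown in Figure 2 correspond to the impurity-based feature_importances_ attribute of scikit-learn's random forest; a minimal sketch of extracting the top-25 list (feature_names is assumed to match the column order of the feature matrix):</p>
      <preformat>
# Sketch: mean-decrease-in-impurity importances from a fitted random forest.
import numpy as np

def top_features(fitted_forest, feature_names, k=25):
    """Return the k most important (name, importance) pairs."""
    importances = fitted_forest.feature_importances_
    order = np.argsort(importances)[::-1][:k]
    return [(feature_names[i], float(importances[i])) for i in order]
      </preformat>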
      <p>In terms of differences between the ranking tasks, among the most useful features we
observe higher importance values for the ‘ranking_query_length’ and ‘ranking_snippet_bm25_ft’
features for the knowledge gain based relevance prediction, when compared to ranking based
on relevance derived from clicks. This indicates that there may be a difference when optimizing
for knowledge gain directly, compared to optimizing the rankings for click-based relevance,
which, in contrast, is an indicator of the usefulness of results as perceived by the users.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this work we proposed a general framework for re-ranking search results with respect to
knowledge gain, in order to optimize rankings for the learning outcome. We applied the framework
to an existing search as learning dataset, showing that the ranking can be improved towards higher
knowledge gain. In addition, our results indicate that for users with less prior knowledge going into
the session, there appears to be more potential to optimize for improved learning outcomes. In
terms of features, the query length was particularly helpful in optimizing rankings for improved
learning outcomes, with some users issuing shorter queries for more specific technical terms in
the later part of the sessions. Our results also indicate that there might be differences when
optimizing for KG directly, compared to click-based relevance indicators.</p>
      <p>This work also has some limitations. We applied our SAL re-ranking framework to only one
learning task and topic and to a limited number of participants and rankings. This could have
influenced the results. Additionally, more specialized content extraction and representation
approaches geared towards images and videos might be more appropriate for these types of
search results. Finally, attributing knowledge gain to documents by dwell time is only a rough
hypothesis. Higher dwell times could also indicate difficulties with the legibility of the document,
depending on the task and topic. However, we think that attributing the overall knowledge gain
to the consumed resources within the session is a natural way to find the most helpful resources
for learning – and in a second instance, these resources should be ranked higher. Therefore,
we encourage other researchers to use our framework for other learning tasks and topics to
understand the effects better.</p>
      <p>Figure 2: Feature importances based on mean decrease in impurity for the “clicked” and “KG” labels.
The 25 most important features are shown, ordered according to the sum of importance values.</p>
      <p>Previous works indicated the possibility of identifying search sessions with a learning intent
using in-session data. The preliminary findings in this work show that search result ranking
can be optimized towards knowledge gain. Combining these insights, this work serves as a
starting point for search engine optimization for human learning from the retrieval and ranking
perspective. In future work, we will investigate the impact of knowledge gain oriented
reranking strategies in real-world search sessions through field studies and continue improving
the re-ranking algorithms.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partially funded by the Leibniz Association, Germany (Leibniz Competition 2018,
funding line "Collaborative Excellence", project SALIENT [K68/2017]).</p>
    </sec>
    <sec id="sec-7">
      <title>References (continued)</title>
      <p>[7] C. Liu, X. Song, How do information source selection strategies influence users' learning
outcomes?, in: Proceedings of the 2018 Conference on Human Information Interaction &amp;
Retrieval, 2018, pp. 257–260.</p>
      <p>[8] X. Zhang, M. Cole, N. Belkin, Predicting users' domain knowledge from search behaviors,
in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development
in Information Retrieval, ACM, 2011, pp. 1225–1226.</p>
      <p>[9] R. Yu, R. Tang, M. Rokicki, U. Gadiraju, S. Dietze, Topic-independent modeling of user
knowledge in informational search sessions, Information Retrieval Journal 24 (2021) 240–268.</p>
      <p>[10] R. Syed, K. Collins-Thompson, Exploring document retrieval features associated with
improved short- and long-term vocabulary learning outcomes, in: Proceedings of the 2018
Conference on Human Information Interaction &amp; Retrieval, 2018, pp. 191–200.</p>
      <p>[11] C. Otto, R. Yu, G. Pardi, J. von Hoyer, M. Rokicki, A. Hoppe, P. Holtz, Y. Kammerer, S. Dietze,
R. Ewerth, Predicting knowledge gain during web search based on multimedia resource
consumption, in: International Conference on Artificial Intelligence in Education, Springer,
2021, pp. 318–330.</p>
      <p>[12] J. Gwizdka, X. Chen, Towards observable indicators of learning on search, in: SAL@SIGIR, 2016.</p>
      <p>[13] R. Syed, K. Collins-Thompson, Optimizing search results for human learning goals,
Information Retrieval Journal 20 (2017) 506–523.</p>
      <p>[14] R. Syed, K. Collins-Thompson, Retrieval algorithms optimized for human learning, in:
Proceedings of the 40th International ACM SIGIR Conference on Research and Development
in Information Retrieval, ACM, 2017, pp. 555–564.</p>
      <p>[15] C. Otto, M. Rokicki, G. Pardi, W. Gritz, D. Hienert, R. Yu, J. von Hoyer, A. Hoppe, S. Dietze,
P. Holtz, Y. Kammerer, R. Ewerth, SAL-Lightning dataset: Search and eye gaze behavior,
resource interactions and knowledge gain during web search, in: ACM SIGIR Conference
on Human Information Interaction and Retrieval, CHIIR '22, Association for Computing
Machinery, New York, NY, USA, 2022, pp. 347–352.</p>
      <p>[16] G. Pardi, J. von Hoyer, P. Holtz, Y. Kammerer, The role of cognitive abilities and time spent
on texts and videos in a multimodal searching as learning task, in: Proceedings of the
2020 Conference on Human Information Interaction and Retrieval, 2020, pp. 378–382.</p>
      <p>[17] T. Qin, T.-Y. Liu, J. Xu, H. Li, LETOR: A benchmark collection for research on learning to
rank for information retrieval, Information Retrieval 13 (2010) 346–374.</p>
      <p>[18] A. Weichselbraun, Inscriptis – A Python-based HTML to text conversion library optimized
for knowledge extraction from the web, arXiv preprint arXiv:2108.01454 (2021).</p>
      <p>[19] J. P. Kincaid, R. P. Fishburne Jr., R. L. Rogers, B. S. Chissom, Derivation of new readability
formulas (automated readability index, fog count and Flesch reading ease formula) for Navy
enlisted personnel, Research Branch Report 8-75, Naval Technical Training Command,
Millington, TN, 1975.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vakkari</surname>
          </string-name>
          ,
          <article-title>Searching as learning: A systematization based on literature</article-title>
          ,
          <source>Journal of Information Science</source>
          <volume>42</volume>
          (
          <year>2016</year>
          )
          <fpage>7</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rokicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kemkes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <article-title>Predicting user knowledge gain in informational search sessions</article-title>
          , in: K.
          <string-name>
            <surname>Collins-Thompson</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Mei</surname>
            ,
            <given-names>B. D.</given-names>
          </string-name>
          <string-name>
            <surname>Davison</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , E. Yilmaz (Eds.),
          <source>The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2018</year>
          , Ann Arbor, MI, USA, July
          <volume>08</volume>
          -
          <issue>12</issue>
          ,
          <year>2018</year>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hoppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kammerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ewerth</surname>
          </string-name>
          ,
          <article-title>Current challenges for studying search as learning processes</article-title>
          , in: Linked Learning Workshop - Learning and
          <article-title>Education with Web Data (LILE)</article-title>
          ,
          <source>in conjunction with ACM Conference on Web Science</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holtz</surname>
          </string-name>
          ,
          <article-title>Analyzing knowledge gain of users in informational search sessions on the web</article-title>
          , in: C.
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N. J.</given-names>
          </string-name>
          <string-name>
            <surname>Belkin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Byström</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Scholer</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2018 Conference on Human Information Interaction and Retrieval</source>
          ,
          <string-name>
            <surname>CHIIR</surname>
          </string-name>
          <year>2018</year>
          , New Brunswick, NJ, USA, March
          <volume>11</volume>
          -15,
          <year>2018</year>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Collins-Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Rieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Haynes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Syed</surname>
          </string-name>
          ,
          <article-title>Assessing learning outcomes in web search: A comparison of tasks and query strategies</article-title>
          ,
          <source>in: Proceedings of the 2016 ACM on conference on human information interaction and retrieval</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gwizdka</surname>
          </string-name>
          ,
          <article-title>Relating eye-tracking measures with changes in knowledge on search tasks</article-title>
          ,
          <source>in: Proceedings of the 2018 ACM Symposium on Eye Tracking Research &amp; Applications</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>