<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>User knowledge and Search Goals in Information Retrieval: A benchmark and study on the evolution of users' knowledge gain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dima El-Zein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université Côte d'Azur</institution>
          ,
          <addr-line>CNRS, Laboratoire I3S, UMR 7271, Sophia Antipolis</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>0000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>This abstract presents an Information Retrieval framework that personalises results based on the user's knowledge and search goals. The framework utilises the content of the pages visited by the user to represent his/her knowledge, and a set of questions/statements the user wishes to answer to represent his/her search goals. In the absence of related datasets and benchmarks, we propose a methodology to evaluate the framework.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>User Knowledge</kwd>
        <kwd>User Search Goals</kwd>
        <kwd>Search Personalisation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Framework and Evaluation</title>
      <p>
        The consideration of the user’s cognitive components in the domain of Information Retrieval (IR) was set as one of the “major challenges” by the IR community in 2018 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To our knowledge, no research treats the content of the documents read by the user as his/her acquired knowledge. In general, such content has been used to construct a user profile from which, for example, the user’s preferences can be obtained. Those profiles are usually either static or infrequently updated, and therefore cannot represent the user’s knowledge, which is constantly evolving. This constant evolution is an important aspect to consider when proposing documents that are supposed to have novel content and/or help him/her achieve a goal not yet achieved.
      </p>
      <p>
        The IR Framework: We propose a cognitive agent that is “aware” of its user’s knowledge and goals; this information is set as the agent’s beliefs. The user’s knowledge is represented by the content of the documents he/she reads; the agent updates its beliefs about the user’s knowledge after every document read. The goals are represented by the set of questions the user wishes to answer at the end of a search session. The proposed agent employs its beliefs to provide the user with documents that contain novel information with respect to what he/she already knows and that also help to reach his/her search goals. Therefore, in response to a query, the user is expected to receive a ranked list of documents that are the least similar to the user’s knowledge and the most similar to his/her goals. The decision to return a document or not is based on three elements: the knowledge, the goal, and the document to be proposed. All three elements are assumed to have a textual format; we propose three methods to represent them: (1) keyword representation using RAKE (Rapid Automatic Keyword Extraction) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], (2) vector representation using GloVe (Global Vectors for Word Representation) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and (3) vector representation using BERT (Bidirectional Encoder Representations from Transformers) embeddings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Finally, the similarity between the representations of those elements is calculated and documents are returned accordingly.
      </p>
      <p>
        Evaluation Challenges: The challenge in evaluating the framework is the lack of adequate datasets or related benchmarks. Numerous existing datasets log the activities of search sessions; however, to the best of our knowledge, none tracks the user’s knowledge and its change after reading a document. Our idea is to obtain such information by adapting a public dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that measured the user’s knowledge gain during a search session. That will allow us to evaluate the framework.
      </p>
      <p>
        Dataset’s Experiment: The dataset’s experiment quantified the user’s knowledge gain about a topic after a search session. The participants were given an information need sentence for a specific topic, then were invited to search the web about it while their behaviour was logged. They also had to respond to pre- and post-session tests that consisted of statements related to the topic. The tests assessed the participants’ knowledge regarding the topics and were scored based on the correctness of the answers. A user’s knowledge gain was measured as the difference between the post- and pre-session test scores.
      </p>
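      <p>The ranking criterion described for the IR framework above (least similar to the user’s knowledge, most similar to his/her goals) can be sketched in a few lines. This is an illustration under stated assumptions, not the framework’s implementation: a plain bag-of-words cosine similarity stands in for the RAKE, GloVe, or BERT representations, and the function names (bag_of_words, cosine, rank), the equal weighting of the two similarity terms, and the toy texts are all hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased term counts; a stand-in for RAKE, GloVe or BERT representations."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(candidates, knowledge_text, goal_text):
    """Order candidates by similarity to the goals minus similarity to the knowledge.

    Assumption of this sketch: both terms carry equal weight.
    """
    knowledge = bag_of_words(knowledge_text)
    goals = bag_of_words(goal_text)

    def score(doc):
        vec = bag_of_words(doc)
        return cosine(vec, goals) - cosine(vec, knowledge)

    return sorted(candidates, key=score, reverse=True)


# Toy inputs, invented for illustration only.
knowledge = "vitamin c is found in citrus fruit"
goals = "does vitamin c prevent the common cold"
docs = [
    "citrus fruit is a source of vitamin c",
    "trials on whether vitamin c can prevent the common cold",
]
ranked = rank(docs, knowledge, goals)
```
      </preformat>
      <p>With these toy inputs the second document is ranked first, since it overlaps more with the goal statement and less with what the user has already read.</p>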
      <p>Benchmark Creation: To estimate the knowledge gain brought by each page (the page knowledge gain), we perform a linear regression of the users’ knowledge gain against the visited pages, each encoded as a binary variable (visited or not visited). The regression coefficient of a page is then taken as its page knowledge gain. We can hence understand and predict a user’s knowledge gain after visiting a set of pages P. As the user visits one page after another, we track the cumulative evolution of the knowledge gain. We construct a benchmark containing, for each user, the set of submitted queries, the related visited pages, and the associated evolution of knowledge gain.</p>
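      <p>A minimal sketch of this regression step follows, assuming an ordinary least-squares fit without an intercept (the text does not say whether one is used) over a binary user-by-page visit matrix. The helper names (solve, page_gains, cumulative_gain) and the toy data are hypothetical; a real analysis would more likely rely on a statistics library.</p>
      <preformat preformat-type="code">
```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if math.isclose(m[pivot][col], 0.0, abs_tol=1e-12):
            raise ValueError("singular system: page effects cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def page_gains(visits, gains):
    """Least-squares coefficients of user gains regressed on visit indicators.

    visits[u][p] is 1 if user u visited page p, else 0; gains[u] is the
    post-minus-pre test score of user u. No intercept is fitted (assumption).
    """
    n_pages = len(visits[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in visits) for j in range(n_pages)]
           for i in range(n_pages)]
    xty = [sum(row[i] * g for row, g in zip(visits, gains))
           for i in range(n_pages)]
    return solve(xtx, xty)


def cumulative_gain(page_order, beta):
    """Running total of estimated gain as pages are visited one after another."""
    total, curve = 0.0, []
    for p in page_order:
        total += beta[p]
        curve.append(total)
    return curve


# Toy data, invented for illustration: 4 users, 3 pages.
visits = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]
gains = [10.0, 20.0, 30.0, 25.0]
beta = page_gains(visits, gains)
curve = cumulative_gain([1, 0, 2], beta)
```
      </preformat>
      <p>On this toy data the fit recovers page gains of 10, 20 and 5, and the cumulative curve for the visiting order page 1, page 0, page 2 is 20, 30, 35.</p>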
      <p>Framework Evaluation: The idea of the evaluation is to submit to the framework the set of queries submitted by every user, and to suppose that the user reads the document returned by the agent. We take as the study population the set of users who scored zero in the pre-session test, representing those with no previous knowledge of the searched topic; the agent’s beliefs about the user’s knowledge are therefore initially empty. They get updated as the user starts visiting pages. The user’s goals consist of the information need and the test statements.</p>
      <p>For the first query submitted by a user, since the agent has no information yet about the user’s knowledge, we return the same page visited in the benchmark. The agent then builds its initial beliefs about its user’s knowledge and starts its personalising task. For the following queries, the agent compares the content of the candidate pages to its beliefs (both the user’s knowledge and goals) and decides which document to return. We track the evolution of the user’s knowledge gain and compare it to the benchmark.</p>
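      <p>The replay procedure can be sketched as a simple loop. The sketch is hypothetical in several respects: the function and parameter names are invented, the agent’s belief-based decision is abstracted as a callable, a page read twice is assumed to add no further gain, and a page without an estimated gain is assumed to contribute zero. The toy gains, candidate lists, and benchmark curve are invented for illustration.</p>
      <preformat preformat-type="code">
```python
def replay_session(queries, candidates, first_page, choose, page_gain):
    """Replay one user's queries and return the cumulative gain after each one.

    queries    -- query strings in the order the user submitted them
    candidates -- dict mapping each query to its candidate page ids
    first_page -- page the user visited for the first query in the benchmark
    choose     -- callable(candidate_ids, read_pages) returning one page id;
                  stands in for the agent's belief-based decision
    page_gain  -- dict mapping page id to its estimated knowledge gain
    """
    read_pages = []  # the agent's beliefs about what the user has read
    total, curve = 0.0, []
    for i, query in enumerate(queries):
        if i == 0:
            page = first_page  # beliefs are empty: reuse the benchmark page
        else:
            page = choose(candidates[query], read_pages)
        if page not in read_pages:  # assumption: a re-read page adds no gain
            total += page_gain.get(page, 0.0)  # assumption: unknown page, zero gain
            read_pages.append(page)
        curve.append(total)
    return curve


def pick_unread_first(candidate_ids, read_pages):
    """Trivial stand-in policy: prefer the first candidate not yet read."""
    unread = [p for p in candidate_ids if p not in read_pages]
    return (unread or candidate_ids)[0]


# Toy data, invented for illustration only.
gains = {"p1": 10.0, "p2": 20.0, "p3": 5.0}
cands = {"q1": ["p1"], "q2": ["p1", "p3"], "q3": ["p2", "p3"]}
agent_curve = replay_session(["q1", "q2", "q3"], cands, "p1",
                             pick_unread_first, gains)
benchmark_curve = [10.0, 10.0, 30.0]
difference = [a - b for a, b in zip(agent_curve, benchmark_curve)]
```
      </preformat>
      <p>Here the agent’s curve is 10, 15, 35 against a benchmark of 10, 10, 30, so the per-query difference is 0, 5, 5. Passing the decision as a callable keeps the replay independent of whichever representation (keywords or embeddings) the agent uses.</p>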
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Culpepper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Smucker</surname>
          </string-name>
          ,
          <article-title>Research frontiers in information retrieval: Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018)</article-title>
          ,
          <source>SIGIR Forum</source>
          <volume>52</volume>
          (
          <year>2018</year>
          )
          <fpage>34</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cowley</surname>
          </string-name>
          ,
          <article-title>Automatic keyword extraction from individual documents</article-title>
          ,
          <source>Text Mining: Applications and Theory</source>
          <volume>1</volume>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          , in:
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holtz</surname>
          </string-name>
          ,
          <article-title>Analyzing knowledge gain of users in informational search sessions on the web</article-title>
          ,
          in:
          <source>Proceedings of the 2018 Conference on Human Information Interaction &amp; Retrieval</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>