User knowledge and Search Goals in Information Retrieval: A benchmark and study on the evolution of users' knowledge gain

Dima El-Zein
Université Côte d'Azur, CNRS, Laboratoire I3S, UMR 7271, Sophia Antipolis, France
Email: elzein@i3s.unice.fr (D. El-Zein), ORCID: 0000-0003-4156-1237

DESIRES 2021 – 2nd International Conference on Design of Experimental Search & Information REtrieval Systems, September 15–18, 2021, Padua, Italy

Abstract
This abstract presents an Information Retrieval framework that personalises results based on the user's knowledge and search goals. The framework uses the content of the pages visited by the user to represent his/her knowledge, and a set of questions/statements the user wishes to answer to represent his/her search goals. In the absence of related datasets and benchmarks, we propose a methodology to evaluate the framework.

Keywords
Information Retrieval, User Knowledge, User Search Goals, Search Personalisation

1. FRAMEWORK AND EVALUATION

The consideration of the user's cognitive components in the domain of Information Retrieval (IR) was set as one of the "major challenges" by the IR community in 2018 [1]. To our knowledge, there is no research treating the content of the documents read by the user as his/her acquired knowledge. In general, such content has been used to construct a user profile from which, for example, the user's preferences can be obtained. Those profiles are usually either static or infrequently updated and therefore cannot represent the user's knowledge, which is constantly evolving. This constant evolution is an important aspect to consider when proposing documents that are supposed to contain novel content and/or help the user achieve a goal not yet achieved.

The IR Framework: We propose a cognitive agent that is "aware" of its user's knowledge and goals; this information is stored as the agent's beliefs. The user's knowledge is represented by the content of the documents he/she reads; the agent updates its beliefs about the user's knowledge after every document read. The goals are represented by the set of questions the user wishes to answer by the end of a search session. The agent employs its beliefs to provide the user with documents that contain novel information with respect to what he/she already knows and that also help to reach his/her search goals. Therefore, in response to a user query, it is expected to return a ranked list of documents that are the least similar to the user's knowledge and the most similar to his/her goals. The decision of whether or not to return a document is based on three elements: the knowledge, the goal, and the document to be proposed. All three elements are assumed to have a textual format; we propose three methods to represent them: (1) keyword representation using RAKE (Rapid Automatic Keyword Extraction) [2]; (2) vector representation using GloVe (Global Vectors for Word Representation) [3]; (3) vector representation using BERT (Bidirectional Transformers for Language Understanding) embeddings [4]. Finally, the similarity between these representations is calculated and documents are returned accordingly.
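To make the selection rule concrete, the sketch below scores candidate documents by how similar they are to the goal and how dissimilar they are to the current knowledge beliefs. It is only an illustration: TF-IDF vectors and cosine similarity stand in for the RAKE, GloVe and BERT representations listed above, and the names used (rank_candidates, knowledge_texts, goal_texts, candidate_docs) are illustrative rather than part of the framework.

    # Illustrative sketch of the selection rule: rank candidate documents so that they are
    # the most similar to the goal and the least similar to the knowledge already acquired.
    # TF-IDF vectors stand in here for the RAKE / GloVe / BERT representations of the framework.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_candidates(knowledge_texts, goal_texts, candidate_docs):
        """Order candidate documents by (similarity to goal - similarity to knowledge)."""
        knowledge = " ".join(knowledge_texts)   # contents of the pages already read (agent beliefs)
        goal = " ".join(goal_texts)             # information need + statements to answer
        corpus = [knowledge, goal] + list(candidate_docs)
        vectors = TfidfVectorizer(stop_words="english").fit_transform(corpus)
        sim_knowledge = cosine_similarity(vectors[2:], vectors[0]).ravel()
        sim_goal = cosine_similarity(vectors[2:], vectors[1]).ravel()
        scores = sim_goal - sim_knowledge       # high: novel w.r.t. knowledge and relevant to the goal
        return [(candidate_docs[i], float(scores[i])) for i in scores.argsort()[::-1]]

    if __name__ == "__main__":
        ranked = rank_candidates(
            knowledge_texts=["Altitude sickness is caused by reduced oxygen pressure at high elevation."],
            goal_texts=["How can altitude sickness be prevented and treated?"],
            candidate_docs=[
                "Altitude sickness results from low oxygen pressure when climbing to high elevation.",
                "Gradual ascent, rest days and acetazolamide help prevent and treat altitude sickness.",
            ],
        )
        for doc, score in ranked:
            print(f"{score:+.3f}  {doc}")

Any of the three representations mentioned above could be plugged in by replacing the vectoriser with keyword sets or dense embeddings; the ranking rule itself stays the same.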
Evaluation Challenges: The main challenge in evaluating the framework is the lack of adequate datasets or related benchmarks. Numerous existing datasets have logged search-session activities; however, to the best of our knowledge, none has tracked the user's knowledge and its change after reading a document. Our idea is to obtain such information by adapting a public dataset [5] that measured the user's knowledge gain during a search session. That will allow us to evaluate the framework.

Dataset's Experiment: The dataset's experiment quantified the user's knowledge gain about a topic after a search session. The participants were given an information-need sentence for a specific topic and were then invited to search the web about it; their behaviour was logged in the meantime. They also had to respond to pre- and post-session tests consisting of statements related to the topic. The tests assessed the participants' knowledge of the topics and were scored based on the correctness of the answers. A user's knowledge gain was measured as the difference between the post- and pre-test scores.

Benchmark Creation: To estimate the page knowledge gain g_i brought by each page p_i, we perform a linear regression analysis of the user knowledge gain G against the visited pages, encoded as binary values (visited or not visited); g_i is then the corresponding regression coefficient. We can hence understand and predict a user's knowledge gain after visiting a set of pages P. As the user visits one page after the other, we track the cumulative evolution of the knowledge gain. We construct a benchmark containing, for each user, the set of submitted queries, the related visited pages, and the associated evolution of knowledge gain.
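As a rough sketch of this benchmark construction, and assuming the adapted dataset provides, for each user, a binary visited/not-visited vector over pages together with the measured session gain G, the per-page gains g_i can be obtained as ordinary least-squares coefficients and then accumulated page by page. Fitting details (intercept, regularisation, treatment of rarely visited pages) are open choices, and the function and variable names below are illustrative.

    # Illustrative sketch of the benchmark construction: estimate per-page gains g_i by
    # regressing the measured session knowledge gain G on binary page-visit indicators,
    # then accumulate the gains in the order in which a user visits the pages.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def estimate_page_gains(visit_matrix, session_gains):
        """visit_matrix: (n_users, n_pages) 0/1 array; session_gains: (n_users,) measured gains G.
        Returns the vector of regression coefficients, read as per-page gains g_i."""
        # Assumption: no intercept, i.e. all gain is attributed to the visited pages.
        model = LinearRegression(fit_intercept=False)
        model.fit(visit_matrix, session_gains)
        return model.coef_

    def cumulative_gain(page_sequence, page_gains):
        """Cumulative predicted knowledge gain after each page visit (indices into page_gains)."""
        return np.cumsum([page_gains[p] for p in page_sequence])

    if __name__ == "__main__":
        # Toy data: 4 users, 3 pages; rows indicate which pages each user visited, G their gains.
        visits = np.array([[1, 0, 1],
                           [1, 1, 0],
                           [0, 1, 1],
                           [1, 1, 1]])
        G = np.array([0.5, 0.7, 0.4, 0.9])
        g = estimate_page_gains(visits, G)
        print("estimated page gains g_i:", np.round(g, 3))
        print("cumulative gain after visiting pages 0, 2, 1:", np.round(cumulative_gain([0, 2, 1], g), 3))

The cumulative curve computed this way is the benchmark trajectory against which the agent's evolution of knowledge gain is compared in the evaluation below.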
Framework Evaluation: The idea of the evaluation is to submit to the framework the set of queries issued by every user and to suppose that the user reads the document returned by the agent. We take as study population the set of users who scored zero in the pre-session test, i.e. those with no previous knowledge about the searched topic; the agent's beliefs about the user's knowledge are therefore initially empty and get updated as the user starts visiting pages. The user goals consist of the information need and the test statements. For the first query submitted by a user, since the agent has no information yet about the user's knowledge, we return the same page visited in the benchmark. The agent then builds its initial beliefs about its user's knowledge and starts its personalising task. For the following queries, the agent compares the content of the candidate pages to its beliefs (both the user knowledge and the goal) and decides which document to return. We track the evolution of the user knowledge gain and compare it to the benchmark.

References

[1] J. S. Culpepper, F. Diaz, M. D. Smucker, Research frontiers in information retrieval: Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018), SIGIR Forum 52 (2018) 34–90.
[2] S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic keyword extraction from individual documents, Text Mining: Applications and Theory 1 (2010) 1–20.
[3] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[4] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[5] U. Gadiraju, R. Yu, S. Dietze, P. Holtz, Analyzing knowledge gain of users in informational search sessions on the web, in: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, 2018, pp. 2–11.