<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Result Delta Prediction Based on Knowledge Deltas for Continuous IR Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriela Gonzalez-Saez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alaa El-Ebshihy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Fink</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petra Galuščáková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florina Piroi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Iommi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorraine Goeuriot</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Mulhem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIG</institution>
          ,
          <addr-line>Grenoble</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research Studios Austria, Data Science Studio</institution>
          ,
          <addr-line>Vienna, AT</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The continuous evaluation of Information Retrieval (IR) systems requires comparing IR systems not only with one another but also across collections, in other words across different evaluation environments (EEs, i.e. test collection and evaluation metrics). These evaluation environments may also be evolutionary versions of a given evaluation environment. In this work, we propose a methodology to measure and understand the impact that the differences between test collection representations (i.e. knowledge delta, 𝒦Δ) have on system performance, and we look at the differences in system outputs (i.e. result delta, ℛΔ). We present initial experiments with various text representations on the TREC 2004 Robust Collection, and look at the relation between the 𝒦Δ and the ℛΔ.</p>
      </abstract>
      <kwd-group>
        <kwd>Continuous Evaluation</kwd>
        <kwd>Evolving Test Collections</kwd>
        <kwd>Knowledge Delta</kwd>
        <kwd>Result Delta</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>means of defining Knowledge Delta (𝒦Δ) and observing its impact on the ℛΔ. In our view, 𝒦Δ
for IR is a combination of a document representation delta and a query representation
delta, both defined as difference functions between pairs of text sequence representations.</p>
      <p>
        This paper proposes a study of how various simple text representations can be used to quantify
𝒦Δ, and of their impact on ℛΔ. Initial experiments are performed on the TREC 2004 Robust
Collection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As this collection stores the publishing time for each document, we consider it
to be an evolving collection. That is, we can simulate the conditions of an IR system that has to
provide answers to queries, answers extracted from a set of documents that changes over time.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        Mothe [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analysed different approaches to understanding the effectiveness of IR systems, focusing
on the effectiveness with respect to the query and IR system parameters. In our work,
we are interested in understanding how the performance of an IR system changes with respect to
changes in the document collection, in addition to the query, in order to predict the change
in performance of the IR system on an evolving test collection. Inspired by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we aim to use
document representations as features of the document collection and to find the correlation
between these features and the change in IR system performance.
      </p>
      <p>
        Test collection difference, 𝒦Δ: We define 𝒦Δ as a quantifiable value of the
differences between document representations, which may be more or less complex: bag of
words, TF-IDF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], topic detection methods (e.g. Latent Dirichlet Allocation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and conceptual
embeddings [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) and neural network language models (e.g. Word2Vec [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
        Any of these representations, or a combination of them, may contribute to generating the
document collection representation, which can then be used to quantify 𝒦Δ and predict the ℛΔ.
      </p>
      <p>
        Performance impact, ℛΔ: We define ℛΔ as the absolute difference in IR system
performance between two EEs: with M(S, EE) denoting the performance of a system S evaluated in
evaluation environment EE with metric M, we compute ℛΔ as M(S<sub>1</sub>, EE<sub>1</sub>) − M(S<sub>1</sub>, EE<sub>2</sub>).
      </p>
      <p>
        Prediction model, (𝒦Δ ∼ ℛΔ): We propose to understand the impact of 𝒦Δ on ℛΔ by
building a model that predicts ℛΔ from 𝒦Δ. We will first observe the correlation between
𝒦Δ and ℛΔ using different text representation methods as 𝒦Δ. Then, we will build a
prediction model based on these observations. Finally, we will analyse the impact of the 𝒦Δ
elements on the prediction of the ℛΔ using feature selection techniques [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Dataset: We measure 𝒦Δ and ℛΔ on an evolving test collection as an example of
documents changing in a real corpus. The evolving test collection is built by creating shards of a
classical test collection [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that contains timestamped documents. We use these timestamps to
assign documents, according to their temporal order, to shards and to define fixed percentages
of corpus overlap to control the evolution.
      </p>
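      <p>The shard construction described above can be sketched as a sliding window over the documents sorted by timestamp. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name <monospace>make_shards</monospace>, the <monospace>(doc_id, timestamp)</monospace> input format and the toy shard size are hypothetical, and overlap is taken to be the fraction of a shard that it shares with its successor.</p>
      <preformat>
```python
from typing import List, Sequence, Tuple


def make_shards(docs: Sequence[Tuple[str, float]],
                shard_size: int,
                overlap: float) -> List[List[str]]:
    """Split timestamped documents into temporally ordered, overlapping shards.

    docs: (doc_id, timestamp) pairs (hypothetical input format).
    shard_size: number of documents in every shard.
    overlap: fraction of a shard shared with the next shard, e.g. 0.9.
    """
    # Number of documents by which the window advances between shards.
    step = round(shard_size * (1.0 - overlap))
    if step not in range(1, shard_size + 1):
        raise ValueError("overlap must yield a step between 1 and shard_size")

    # Order document identifiers by publication time.
    ordered = [doc_id for doc_id, _ in sorted(docs, key=lambda d: d[1])]

    # Slide a fixed-size window over the ordered documents.
    last_start = len(ordered) - shard_size
    return [ordered[start:start + shard_size]
            for start in range(0, last_start + 1, step)]


# Toy data: 50 documents whose timestamps are simply their index.
toy_docs = [("d%d" % i, float(i)) for i in range(50)]
toy_shards = make_shards(toy_docs, shard_size=10, overlap=0.9)
```
      </preformat>
      <p>In this toy setting the window advances by one document at a time, giving 41 shards; shards that are five steps apart share half of their documents.</p>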
      <p>
        Initial Experiment: We evaluate the PyTerrier BM25 system [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] in an evolving test collection
created from Robust [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using the MAP metric. We create 41 EEs using 90% document overlap
between successive shards, with the full set of topics. As text representations, we test two features
used in query performance prediction [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]: Averaged Term Weight Variability (avVAR) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
and Averaged Collection Query Similarity (avSCQ) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We compare EEs with 50% overlap
(e.g. EE<sub>1</sub> vs. EE<sub>6</sub>, EE<sub>2</sub> vs. EE<sub>7</sub>, etc.). Figure 1 presents changes in the MAP score (ℛΔ)
compared with the 𝒦Δ calculated as the changes in the selected feature values: avVAR in (a)
and avSCQ in (b). The Pearson correlation between the ΔMAP and the features is 0.5 and 0.12
for avVAR and avSCQ, respectively. These results indicate that changes in 𝒦Δ have a
considerable effect on ℛΔ values. Moreover, they show that the effect might differ substantially
across features and over time.
      </p>
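      <p>The comparison above can be sketched as follows, assuming one MAP score and one feature value (e.g. avVAR) per EE, listed in temporal order. The helper names <monospace>paired_deltas</monospace> and <monospace>pearson</monospace> are hypothetical, the lag of five EEs corresponds to 50% overlap when successive shards overlap by 90%, and the numbers are made up for illustration; they are not the authors' code or data.</p>
      <preformat>
```python
import math
from typing import List, Sequence


def paired_deltas(values: Sequence[float], lag: int) -> List[float]:
    """Return values[i] - values[i + lag] for every valid index i."""
    return [values[i] - values[i + lag] for i in range(len(values) - lag)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) in (0, 1):
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-EE values, for illustration only.
map_per_ee = [0.20, 0.22, 0.21, 0.25, 0.24, 0.26, 0.23, 0.27]
feature_per_ee = [1.10, 1.15, 1.12, 1.30, 1.21, 1.28, 1.14, 1.33]

# Result delta and knowledge delta between EE_i and EE_(i+5).
result_delta = paired_deltas(map_per_ee, lag=5)
knowledge_delta = paired_deltas(feature_per_ee, lag=5)
correlation = pearson(knowledge_delta, result_delta)
```
      </preformat>
      <p>The sign convention follows the definition of ℛΔ given earlier: the score of the later EE is subtracted from the score of the earlier one.</p>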
    </sec>
    <sec id="sec-3">
      <title>3. Discussion and Future Work</title>
      <p>We propose the definition of Knowledge Delta (𝒦Δ) for the elements of the EEs. As a first
attempt to quantify the 𝒦Δ and its impact on the Result Delta (ℛΔ), we use two simple text
representation metrics, avVAR and avSCQ. We experiment on an evolving test collection
built using the timestamps from the Robust test collection. The initial results show a
correlation between 𝒦Δ and the ℛΔ and thus provide justification for our approach. These
results motivate us to build a prediction model (𝒦Δ ∼ ℛΔ) that can predict the change
in the performance of an IR system using the 𝒦Δ, and also to quantify 𝒦Δ using different
text representations (see Section 2). We plan either to construct a machine learning model
that takes 𝒦Δ as an input feature to predict ℛΔ, or to use time series techniques [15] to
predict significant changes in 𝒦Δ, which lead to changes in the performance of the IR system.
Moreover, we plan to define other types of 𝒦Δ and ℛΔ, such as quantifying the differences
in query representations, and to apply them to the LongEval collection [16]. This will
contribute to understanding the impact of the 𝒦Δ on other types of ℛΔ.</p>
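      <p>As one simple instance of the planned prediction model, a least-squares line could map a single 𝒦Δ feature to ℛΔ. The sketch below is an assumption made for illustration only: the text does not fix a model family, and the names <monospace>fit_line</monospace> and <monospace>predict</monospace> as well as the toy numbers are hypothetical.</p>
      <preformat>
```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n in (0, 1):
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the input feature must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x_new: float) -> float:
    """Predict the result delta for a new knowledge delta value."""
    slope, intercept = model
    return slope * x_new + intercept


# Made-up training pairs (knowledge delta, result delta), illustration only.
observed_knowledge_delta = [-0.18, -0.10, 0.02, 0.07, 0.15]
observed_result_delta = [-0.06, -0.03, 0.00, 0.02, 0.05]

model = fit_line(observed_knowledge_delta, observed_result_delta)
estimated_result_delta = predict(model, 0.10)
```
      </preformat>
      <p>With several 𝒦Δ features, the same idea extends to multiple regression, where feature selection can indicate which elements of 𝒦Δ matter most.</p>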
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is supported by the ANR Kodicare bilateral project, grant ANR-19-CE23-0029 of the
French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF), grant I4471-N.</p>
      <p>similarity and variability evidence, in: Advances in Information Retrieval: 30th European
Conference on IR Research, ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceedings
30, Springer, 2008, pp. 52–64.</p>
      <p>[15] C. Chatfield, The analysis of time series: an introduction, Chapman and Hall/CRC, 2003.</p>
      <p>[16] P. Galuščáková, R. Deveaud, G. Gonzalez-Saez, P. Mulhem, L. Goeuriot, F. Piroi, M. Popel,
LongEval-Retrieval: French-English dynamic test collection for continuous web search
evaluation, arXiv preprint arXiv:2303.03229 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <article-title>Test collection based evaluation of information retrieval systems</article-title>
          , Now Publishers Inc,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>González-Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <article-title>Towards the evaluation of information retrieval systems on evolving datasets with pivot systems</article-title>
          , in: K. S. Candan,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>The TREC 2005 robust track</article-title>
          , in:
          <source>ACM SIGIR Forum</source>
          , volume
          <volume>40</volume>
          , ACM New York, NY, USA,
          <year>2006</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <article-title>Analytics methods to understand information retrieval effectiveness - a survey</article-title>
          ,
          <source>Mathematics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A vector space model for automatic indexing</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>18</volume>
          (
          <year>1975</year>
          )
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jelodar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey</article-title>
          ,
          <source>Multim. Tools Appl.</source>
          <volume>78</volume>
          (
          <year>2019</year>
          )
          <fpage>15169</fpage>
          -
          <lpage>15211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Abdulahhad</surname>
          </string-name>
          ,
          <article-title>Concept embedding for information retrieval</article-title>
          , in: G. Pasi,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
          , Proceedings, volume
          <volume>10772</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>563</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          , in:
          <string-name>
            <given-names>C. J. C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013</source>
          . Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States,
          <year>2013</year>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers)</source>
          , Association for Computational Linguistics
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Déjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Ullah</surname>
          </string-name>
          ,
          <article-title>Forward and backward feature selection for query performance prediction</article-title>
          , in:
          <source>Proceedings of the 35th Annual ACM Symposium on Applied Computing</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>690</fpage>
          -
          <lpage>697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <article-title>Using collection shards to study retrieval performance effect sizes</article-title>
          ,
          <source>ACM Transactions on Information Systems (TOIS)</source>
          <volume>37</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Declarative experimentation in information retrieval using PyTerrier</article-title>
          , in:
          <source>Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <article-title>Predicting the effectiveness of queries and retrieval systems</article-title>
          , in:
          <source>SIGIR Forum</source>
          , volume
          <volume>44</volume>
          ,
          <year>2010</year>
          , p.
          <fpage>88</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scholer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsegay</surname>
          </string-name>
          ,
          <article-title>Effective pre-retrieval query performance prediction using</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>