<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>for Temporal Generalization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arthur Muanza Ndiema</string-name>
          <email>arthur-muanza.ndiema@smail.th-koeln.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jüri Keller</string-name>
          <email>jueri.keller@th-koeln.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Schaer</string-name>
          <email>philipp.schaer@th-koeln.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Query Clustering, Relevance Feedback, Longitudinal Evaluation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TH Köln (University of Applied Sciences)</institution>
          ,
          <addr-line>Claudiusstr. 1, Cologne, 50678</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper details the participation of the CIR_cluster team in the CLEF 2025 LongEval WebSearch task, for which we submitted four distinct runs. In longitudinal settings, approaches that leverage historical information, such as past relevance judgments, have demonstrated strong effectiveness. However, these methods are limited when no such information is available. For instance, relying on previous clicks is infeasible for queries that have never been issued before. We hypothesize that documents relevant to a given query are also relevant to its semantic variants. Based on this assumption, we cluster queries to identify query variants. This enables us to link previously unseen queries to the histories of their query variants. In this way, the extended approaches can generalize not only to new and updated documents but also to new and updated queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In this work, we describe the participation of the CIR_cluster team in the CLEF 2025 LongEval WebSearch
task. This task aims to evaluate retrieval systems over time, focusing on how well they adapt to changes
in the web and user behavior [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. The lab provides two dynamic test collections, consisting of
multiple snapshots of the same search setting. Each snapshot describes an evolved state of the document
corpus, query set, and relevance judgments (qrels). These settings provide a unique opportunity to
evaluate the effectiveness of retrieval systems over time and, at the same time, to derive
relevance signals from past information.
      </p>
      <p>
        In previous works, we have shown that approaches that leverage historical information, such as past
relevance judgments, can achieve strong effectiveness in longitudinal settings [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. For example, the Qrel
Boost (QB) approach boosts relevant query-document pairs from previous snapshots, assuming that if
a document was relevant for a query in the past, it is likely to be relevant for the same query in the
future. Similarly, the Relevance Feedback (RF) approach expands the query with terms from
previously relevant documents, assuming that these terms remain relevant for the current query.
      </p>
      <p>Both methods are limited when no prior information is available. For example, relying on previous
qrels is infeasible for queries that have never been issued before. In such cases, the
system falls back to the base ranker. Furthermore, we observed that the test collections contain many
similar queries. Often, they differ only at the lexical level, such as slight spelling variations.</p>
      <p>CEUR</p>
      <p>ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>2. Approaches</title>
      <p>To draw more relevance information from prior snapshots and to overcome the inability of these
approaches to generalize to new queries, we propose to identify query variants by clustering
queries. These clusters are then used in two retrieval systems to produce the submitted rankings.</p>
      <sec id="sec-2-1">
        <title>2.1. Query Clustering</title>
        <p>
          All queries from all snapshots are grouped into clusters of similar queries. The clustering was performed
on an early version of the LongEval WebRetrieval dataset, before the queries were separated into individual
snapshots. The queries were transformed into 1024-dimensional vectors using sentence transformers and the
Lajavaness/sentence-camembert-large model [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ]. This model was specifically trained for French texts (see
https://huggingface.co/Lajavaness/bilingual-embedding-large).
        </p>
        <p>Subsequently, the embedded queries were clustered using k-means and DBSCAN [13, 15]. K-means
clustering requires the number of clusters k to be defined in advance [13]. Since the distribution of queries
is unknown, we estimated k by comparing the results for different values of k using the elbow method [14]. The
DBSCAN algorithm does not require a target number of clusters and additionally identifies outliers [15]. This
makes it theoretically well suited, as many queries without any variants can be expected, and these should be excluded
from the clustering. Instead, the minimum number of points per cluster (MinP) and the maximum distance ε between
two neighboring points need to be defined. Both parameters were estimated based on a grid search, the adjusted
Rand score, and the elbow method.</p>
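        <p>To make the DBSCAN step concrete, the following is a minimal, unoptimized pure-Python sketch, not the implementation used for the submitted runs. It assumes that query embeddings are compared by cosine distance, which the description above does not state, and the function names cosine_distance and dbscan are hypothetical. As in the original algorithm, a point counts towards its own neighborhood, and points that are not density-reachable from any core point receive the outlier label -1.</p>
        <preformat preformat-type="code">
```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity of two dense vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dbscan(vectors: list[list[float]], eps: float, min_pts: int) -> list[int]:
    """Minimal DBSCAN: one cluster label per vector, -1 marks outliers."""
    n = len(vectors)
    labels = [None] * n
    # Precompute eps-neighbourhoods; every point is its own neighbour.
    neighbours = [
        [j for j in range(n) if eps >= cosine_distance(vectors[i], vectors[j])]
        for i in range(n)
    ]
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if min_pts > len(neighbours[i]):
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```
        </preformat>
        <p>Because this sketch compares all pairs of queries, a library implementation such as scikit-learn's DBSCAN with metric="cosine" would be the practical choice for a collection of roughly 55,000 queries.</p>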
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Retrieval Systems</title>
        <p>
          The query clustering is used in the retrieval systems by mapping each query to its cluster. The clusters can
be understood as an abstraction of a retrieval topic that contains different query variants. The approach is
applied to two different retrieval systems that were initially proposed for the LongEval lab 2024 and
later refined [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ].
        </p>
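        <p>A minimal sketch of this mapping, assuming that the cluster labels are available as a dictionary from query ID to cluster label, with -1 marking DBSCAN outliers; the helpers build_topics and variants_of are hypothetical. Queries that are outliers or that were not part of the clustering map only to themselves, so the systems fall back to the query's own history.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

OUTLIER = -1  # DBSCAN-style label for queries without any cluster


def build_topics(labels: dict[str, int]) -> dict[int, set[str]]:
    """Group query IDs by cluster label; outliers are not grouped."""
    topics: dict[int, set[str]] = defaultdict(set)
    for qid, label in labels.items():
        if label != OUTLIER:
            topics[label].add(qid)
    return dict(topics)


def variants_of(qid: str, labels: dict[str, int], topics: dict[int, set[str]]) -> set[str]:
    """All query IDs sharing qid's cluster, including qid itself.

    Outliers and queries missing from the clustering fall back to {qid},
    i.e. only the query's own history is used.
    """
    label = labels.get(qid, OUTLIER)
    if label == OUTLIER:
        return {qid}
    return topics[label] | {qid}
```
        </preformat>
        <p>For example, with the two spelling variants of chateau de villiers-le-mahieu (75386 and 74083) in cluster 5327, each query resolves to both IDs.</p>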
        <p>Qrel Boost. The first system, Qrel Boost (QB), directly boosts query-document pairs that were
previously relevant. The intuition is that if a document was relevant for a query in the past, it is likely
to be relevant for the same query in the future. First, an initial BM25 ranking is created, which is then reranked.
Each query-document pair of the initial ranking is looked up in the qrels of the previous snapshot. If the
pair is found, its ranking score is multiplied by a boost that depends on the previous
relevance label. This can be repeated for multiple previous snapshots. In the submitted runs, we used
the default parameters: a boost strength of 0.7 and an additional factor of 2
for highly relevant documents. We used all available previous snapshots as history. This results in a
history of eight snapshots for all submitted test runs (2022-06 to 2023-09), and correspondingly fewer for
the training runs. At the first point in time (2022-06), no prior snapshots are available, and the approach
falls back to the BM25 ranking.</p>
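        <p>The reranking step can be sketched as follows. The exact QB boost formula is not given here, so the sketch assumes a simple multiplicative form: per previous snapshot, the score of a relevant pair is multiplied by (1 + 0.7) and that of a highly relevant pair by (1 + 0.7 × 2), using the default values stated above. The function qrel_boost and the layout of the qrels dictionaries are hypothetical.</p>
        <preformat preformat-type="code">
```python
def qrel_boost(
    ranking: dict[str, float],
    qid: str,
    history: list[dict[tuple[str, str], int]],
    strength: float = 0.7,
    high_factor: float = 2.0,
) -> list[tuple[str, float]]:
    """Rerank a BM25 run for one query using qrels of previous snapshots.

    ranking: doc ID -> BM25 score for query qid.
    history: one qrels dict per previous snapshot, (qid, doc ID) -> label.
    Assumed boost per snapshot: x(1 + strength) for label 1 and
    x(1 + strength * high_factor) for labels of 2 or higher.
    """
    boosted = {}
    for doc_id, score in ranking.items():
        for qrels in history:
            label = qrels.get((qid, doc_id), 0)
            if label == 1:
                score *= 1 + strength
            elif label >= 2:
                score *= 1 + strength * high_factor
        boosted[doc_id] = score
    # Highest score first; an empty history keeps the BM25 order.
    return sorted(boosted.items(), key=lambda item: item[1], reverse=True)
```
        </preformat>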
        <p>Instead of boosting only query-document pairs that were previously relevant, we boost all documents
that were previously relevant for any query of the same abstract topic. This means that for the query q1 of topic t,
we now also boost the document d, although d only appeared in the qrels for the query q2 of the same topic t.</p>
        <p>Relevance Feedback. The second system, Relevance Feedback (RF), similarly to the QB approach,
uses previously relevant documents, but to expand the query. To this end, the terms with the highest tf-idf
scores are extracted from the documents that were previously relevant to the query. The expanded query
is issued to a BM25 retriever. If no previously relevant documents are available, the system uses only
the original query. We used the default parameters of 10 feedback terms from the top 3 documents. For
this approach, we also used all available history.</p>
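        <p>A hypothetical sketch of the expansion step with the default parameters (10 terms from the top 3 previously relevant documents). It assumes pre-tokenized documents, a plain tf × log(N/df) weighting, and that the previously relevant documents are already sorted by priority; the actual tokenization and tf-idf variant of the RF system may differ, and the function names are illustrative.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter


def expansion_terms(
    relevant_docs: list[list[str]],
    collection: list[list[str]],
    num_docs: int = 3,
    num_terms: int = 10,
) -> list[str]:
    """Top tf-idf terms from the first num_docs previously relevant documents.

    Documents are pre-tokenized term lists; collection supplies the document
    frequencies. The weighting (raw tf times log(N / df)) is an assumption.
    """
    n = len(collection)
    df = Counter(term for doc in collection for term in set(doc))
    scores: Counter = Counter()
    for doc in relevant_docs[:num_docs]:
        for term, tf in Counter(doc).items():
            # Terms unseen in the collection get an idf of 0.
            idf = math.log(n / df[term]) if df[term] else 0.0
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(num_terms)]


def expand_query(query: str, feedback: list[str]) -> str:
    """Append feedback terms; without history the original query is kept."""
    return " ".join([query, *feedback]) if feedback else query
```
        </preformat>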
        <p>Instead of using only the previously relevant documents of the query, we use all previously relevant
documents of the topic. This means that for the query q1 of some topic t, we now also use the document d,
although d only appeared in the qrels for the query q2 of topic t.</p>
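        <p>Both topic-level extensions only change which qrels are consulted. A minimal sketch, assuming that the variants of a query are given as a set of query IDs (for example, from the cluster mapping): the qrels of one previous snapshot are re-keyed so that every judgment of a variant counts for the current query. When several variants judged the same document, the sketch keeps the highest label, which is our assumption. The result can be consumed unchanged by QB, and its documents with a label of at least 1 form the feedback documents for RF. The function name topic_qrels is hypothetical.</p>
        <preformat preformat-type="code">
```python
def topic_qrels(
    qid: str,
    variants: set[str],
    qrels: dict[tuple[str, str], int],
) -> dict[tuple[str, str], int]:
    """Re-key one snapshot's qrels so that judgments of any variant count for qid.

    Conflicting labels for the same document are resolved by taking the
    maximum (an assumption of this sketch).
    """
    pooled: dict[tuple[str, str], int] = {}
    for (other_qid, doc_id), label in qrels.items():
        if other_qid in variants:
            key = (qid, doc_id)
            pooled[key] = max(label, pooled.get(key, 0))
    return pooled
```
        </preformat>
        <p>Note that qid itself does not need to appear in the qrels: an unseen query inherits the judgments of its variants, which is exactly the generalization to new queries that the clustering is meant to enable.</p>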
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <p>First, the clustering methods are evaluated in different settings; then, the retrieval systems employing
the clusters are validated on the eight training snapshots of the LongEval WebSearch dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. Query Clustering</title>
        <p>The query clusters are created based on the initial version of the LongEval WebSearch dataset from
2025. At this stage, only a single query set covering all snapshots had been published. Although this was later revised
and many queries were removed, the query IDs remained the same, so the clusters are still valid.</p>
        <p>Based on the results, k = 5000 was chosen as the optimal number of clusters for k-means.
For DBSCAN, a maximum distance of ε = 0.2 and MinP = 2 were chosen. This results in 5334 clusters,
a number similar to that of k-means. Of the 54,864 queries, 27.9% are classified as outliers. This means that no
variants could be identified for those queries, and the approaches can only rely on the original query.
Most clusters consist of only two queries that often differ only in spelling or other minor variations. For
example, the queries chateau de villiers-le-mahieu (75386) and château de villiers-le-mahieu (74083) both
belong to cluster 5327. Other clusters differ more strongly, for example, cluster 5245 with the
queries loi militaire and projet de loi militaire. Larger clusters capture whole categories or sub-domains.
For example, the largest cluster, cluster 10, consists of 3067 food-related queries such as pomme de terre, gateau, or
recette. Figure 2 visualizes the DBSCAN clusters. In comparison, the k-means clusters are much
smaller, with at most 139 queries per cluster, and only 212 clusters contain just two queries. This yields
clusters of more similar sizes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Retrieval Experiments</title>
        <p>For an initial validation, we tested both approaches with both clustering methods on all
training snapshots. Additionally, we compared the results to the official QB and RF baselines and to
BM25. The nDCG@10 results are reported in Table 1 and Figure 3. Only the k-means clustering
improves the effectiveness of the QB system, at the snapshots 2022-08 and 2022-08, and only slightly.
The official baselines outperform all other combinations. Figure 3 indicates a drop in effectiveness at
2022-09 for all systems. After that snapshot, the effectiveness increases again, only slightly for BM25
and considerably more for the QB approaches. While the official RF baseline is clearly more
effective, its clustering extensions are mostly comparable to BM25. Notably, the RF approaches
maintain a roughly constant gap to BM25 across the snapshots, while the gap widens for the QB approaches.</p>
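        <p>For reference, per-query nDCG@10 can be computed as sketched below, using the common formulation with linear gains and a log2(rank + 1) discount, where the ideal ranking is derived from all judged labels of the query. This is not necessarily the exact tool or configuration behind the reported numbers, and the function name ndcg_at_k is hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; returns 0.0 if the query has no relevant documents."""

    def dcg(labels: list[int]) -> float:
        # enumerate starts at 0, so rank r (1-based) is discounted by log2(r + 1)
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    best = dcg(sorted(qrels.values(), reverse=True))
    return dcg(gains) / best if best > 0 else 0.0
```
        </preformat>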
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>The query clustering is foundational to the proposed approaches. The results showed that the different
clustering methods could identify some query variants. However, these good clusters are overshadowed
by overly large and diverse clusters. In the context of the whole query distribution, queries from those
clusters are related, for example, the queries pomme de terre and gateau, but they are clearly too different
for the same documents to be assumed relevant to both. Better parameters could improve the clustering,
but finding them is challenging, especially in the context of a continuous query stream.
Additionally, other signals, such as the documents clicked or judged relevant for a query, could be used as
additional clustering features.</p>
      <p>Regarding the implementation of the retrieval approaches, the clustering could be replaced with
a similarity function that retrieves all queries above a certain similarity threshold. The approaches use all
previous snapshots. This means that for later snapshots, many more prior qrels are available, which
could explain the observation that the QB system diverges from BM25 over time. For the RF
systems, the effect remains unclear, and more tests with different history lengths are needed. Since the tf-idf
scores of expansion terms are compared across all snapshots, outlier terms that appear only once could
strongly influence the results.</p>
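        <p>The threshold-based alternative could look like the following sketch. It assumes cosine similarity on the same query embeddings; the threshold of 0.8 and the function names are purely illustrative. Unlike a fixed clustering, the neighborhood is computed per query, so new queries can be matched against the history as they arrive.</p>
        <preformat preformat-type="code">
```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similar_queries(
    qid: str,
    embeddings: dict[str, list[float]],
    threshold: float = 0.8,
) -> list[str]:
    """IDs of all other queries whose cosine similarity to qid is at least threshold."""
    target = embeddings[qid]
    return sorted(
        other
        for other, vec in embeddings.items()
        if other != qid and cosine(target, vec) >= threshold
    )
```
        </preformat>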
      <p>The proposed approaches have several limitations. Neither clustering approach differentiates
between the original query and the other query variants in its cluster. This means that a query variant from
the same topic that focuses on a specific aspect can introduce highly specific expansion terms,
although these may only be relevant to some aspects of the topic. More sophisticated methods are
needed that differentiate between core queries and more distantly related query variants. For example,
RM3 weights the expansion terms individually. A similar approach could weight each variant by its
similarity to the original query.</p>
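        <p>A minimal sketch of such a similarity-weighted combination, inspired by, but not equivalent to, RM3: the expansion term weights of each variant are scaled by the variant's similarity to the original query before being summed, so that terms contributed only by distant variants are down-weighted. The input format and the function name are hypothetical; the term weights could, for instance, be the tf-idf scores used by the RF system.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def weighted_feedback_terms(
    variant_terms: dict[str, dict[str, float]],
    similarity: dict[str, float],
    num_terms: int = 10,
) -> list[tuple[str, float]]:
    """Combine per-variant term weights, scaled by each variant's similarity.

    variant_terms: variant query ID -> {term: weight}.
    similarity: variant ID -> similarity to the original query
    (1.0 for the original query itself; missing variants get 0.0).
    """
    combined: dict[str, float] = defaultdict(float)
    for vid, terms in variant_terms.items():
        weight = similarity.get(vid, 0.0)
        for term, score in terms.items():
            combined[term] += weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:num_terms]
```
        </preformat>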
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented our participation in the CLEF 2025 LongEval WebSearch task. We proposed
a query clustering approach to identify query variants and applied it to two retrieval systems:
Qrel Boost (QB) and Relevance Feedback (RF). The approaches were initially evaluated on the training
collection of the task. Unfortunately, the results did not show improvements in retrieval effectiveness
compared to the baselines. However, our observations suggest that performance can improve when
meaningful query variants are identified.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We gratefully acknowledge the support of the German Research Foundation (DFG) through project
grant No. 407518790.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar
and spelling checks and for paraphrasing and rewording. After using these tools, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
      <p>[12] N. Thakur, N. Reimers, J. Daxenberger, I. Gurevych, Augmented SBERT: Data augmentation method
for improving bi-encoders for pairwise sentence scoring tasks, in: NAACL-HLT, Association for
Computational Linguistics, 2021, pp. 296–310.</p>
      <p>[13] J. B. MacQueen, Some methods for classification and analysis of multivariate observations,
University of California Press, 1967, pp. 281–297.</p>
      <p>[14] R. L. Thorndike, Who belongs in the family?, Psychometrika 18 (1953) 267–276. doi:10.1007/BF02289263.</p>
      <p>[15] M. Ester, H. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large
spatial databases with noise, in: KDD, AAAI Press, 1996, pp. 226–231.</p>
      <p>[16] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research
9 (2008) 2579–2605. URL: http://jmlr.org/papers/v9/vandermaaten08a.html.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cancellieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          , J. Keller, P. Knoth,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pride</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 LongEval Lab on Longitudinal Evaluation of Model Performance</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Bilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , E. Kochkina,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Servan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 longeval lab on longitudinal evaluation of model performance</article-title>
          ,
          <source>in: CLEF</source>
          , volume
          <volume>14163</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          , pp.
          <fpage>440</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Medina-Alias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 longeval lab on longitudinal evaluation of model performance</article-title>
          ,
          <source>in: CLEF (2)</source>
          , volume
          <volume>14959</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          , pp.
          <fpage>208</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cancellieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          , J. Keller, P. Knoth,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pride</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>LongEval at CLEF 2025: Longitudinal evaluation of IR model performance</article-title>
          ,
          <source>in: ECIR (5)</source>
          , volume
          <volume>15576</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          , pp.
          <fpage>382</fpage>
          -
          <lpage>388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          , T. Breuer,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Leveraging prior relevance signals in web search</article-title>
          ,
          <source>in: CLEF (Working Notes)</source>
          , volume
          <volume>3740</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org
          ,
          <year>2024</year>
          , pp.
          <fpage>2396</fpage>
          -
          <lpage>2406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          , M. Fröbe,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hendriksen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Counterfactual query rewriting to use historical relevance feedback</article-title>
          ,
          <source>in: ECIR (3)</source>
          , volume
          <volume>15574</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scholer</surname>
          </string-name>
          , P. Thomas,
          <article-title>User variability and IR system evaluation</article-title>
          , in: SIGIR, ACM,
          <year>2015</year>
          , pp.
          <fpage>625</fpage>
          -
          <lpage>634</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scholer</surname>
          </string-name>
          , P. Thomas,
          <article-title>UQV100: A test collection with query variability</article-title>
          , in: R. Perego, F. Sebastiani, J. A. Aslam, I. Ruthven, J. Zobel (Eds.),
          <source>Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016</source>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>725</fpage>
          -
          <lpage>728</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/2911451.2914671</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          , T. Breuer,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Evaluation of temporal change in IR test collections</article-title>
          , in: ICTIR, ACM,
          <year>2024</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , in:
          <source>EMNLP/IJCNLP (1)</source>
          , Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>3980</fpage>
          -
          <lpage>3990</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          , in: ACL, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Daxenberger</surname>
          </string-name>
          ,
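          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks</article-title>
          , in:
          <source>NAACL-HLT</source>
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>310</lpage>
          .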
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>