<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team OpenWebSearch at LongEval: Using Historical Data for Scientific Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daria Alexander</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maik Fröbe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gijs Hendriksen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Hagen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djoerd Hiemstra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Potthast</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arjen P. de Vries</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Friedrich-Schiller-Universität Jena</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud Universiteit Nijmegen</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Kassel</institution>
          ,
          <addr-line>hessian.AI, ScaDS.AI</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the submissions of the OpenWebSearch team for the CLEF 2025 LongEval Sci-Retrieval track. Our approaches explore how historical data can be re-used to build effective rankings. The Sci-Retrieval track uses click data and documents from the CORE search engine. We start all our submissions from rankings of the CORE search engine that we crawled for all queries of the track. This has two motivations: first, we hypothesize that a good practical search engine should only make minor improvements in the ranking at a time (i.e., we would like to only make small adjustments to the production ranking), and, second, we hypothesize that only documents that are in the top ranks of the CORE ranking can be relevant in the setup of LongEval where relevance is derived from clicks (i.e., we try to incorporate the position bias of the clicks into our rankings). Based on this crawled CORE ranking, we try to make improvements via qrel-boosting, RM3 keyqueries, clustering, monoT5 re-ranking, and user intent prediction. Our evaluation shows that qrel-boosting, RM3 keyqueries, clustering, and intent prediction improve the CORE ranking that we re-rank.</p>
      </abstract>
      <kwd-group>
        <kwd>Keyquery</kwd>
        <kwd>User Intent Prediction</kwd>
        <kwd>Counterfactual Query Rewriting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Historical query logs can help to improve future retrieval models as the relevance labels might be
transferrable across time. The CLEF LongEval retrieval task [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1, 2, 3, 4, 5, 6, 7, 8</xref>
        ] studies this scenario and
provides relevance labels mined from click logs of past user interactions, allowing retrieval systems to
optimize rankings for future queries. Especially for recurring queries, which a retrieval system observed
in the past and future test periods, there is substantial opportunity to exploit past relevance judgments.
      </p>
      <p>This year, LongEval introduced a scientific search retrieval task [8, 9]. Scientific search focuses
on retrieving and profiling information objects related to scholarly research [10]. These systems are
designed to locate scientific papers, theses, technical reports, and other academic materials from curated
and often domain-specific repositories. Unlike general web search, scientific search emphasizes high
precision and relevance, often including bibliometric data such as citations, authorship, publication
venues, and institutional affiliations. Scientific search engines play an important role in supporting the
discovery, navigation, and evaluation of scientific literature across different fields.</p>
      <p>In this paper, we describe the submissions of the OpenWebSearch team for the CLEF 2025 LongEval
Sci-Retrieval track. We build on top of our submissions to LongEval 2023 and 2024 [11, 12, 13],
focussing again on leveraging historical interaction data to improve document rankings in a realistic
and constrained retrieval setting. The Sci-Retrieval track provides query logs and click data derived
from the CORE search engine, along with its associated document corpus. We build all our submissions
starting from the top-ranked results of the CORE search engine, which we crawled for every query in
the track across different document fields (title, abstract, and full text). This design choice is grounded
in two main hypotheses. First, we assume that a practical search engine should make only incremental
improvements to existing rankings – reflecting realistic production constraints and user expectations.
Second, we believe that most relevant documents are already among the top results shown by the CORE
engine, especially since LongEval defines relevance based on user clicks (i.e., we try to incorporate the
position bias into our rankings).</p>
      <p>Building on these initial rankings, we explore multiple techniques to improve retrieval effectiveness:
qrel-boosting based on past relevance (as proposed by Keller et al. [12, 14]), RM3 keyquery expansion,
cluster-based boosting, monoT5 re-ranking of top results, and user intent prediction (as suggested
by [15, 16]). Each of these methods is designed to re-rank or selectively boost documents within the
CORE results to increase the likelihood of retrieving relevant content. Our evaluation shows that
qrel-boosting, RM3 keyqueries, clustering, and intent prediction improve the re-ranked
CORE results, supporting our hypothesis that lightweight, targeted adjustments can lead to meaningful
gains in effectiveness. Our code is available online.<sup>1</sup></p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>We review related work on redundancy in information retrieval setups, keyqueries, and user intent
classification that form the basis of our submissions.</p>
      <p>
        Redundancy in Information Retrieval Setups. Normally, it is good practice to avoid redundancy
between training, validation, and test splits in experiments, as otherwise the effectiveness could be
overestimated due to train-test leakage [17, 18]. Especially for IR experiments, redundant documents
might cause effectiveness scores to be overestimated because retrieval models get a reward for showing
the same document multiple times [19, 20]. Similar problems can occur for learned models that might
overfit to redundancy in the training data [21]. However, in the LongEval scenario, redundancy emerges
naturally, as queries and documents might overlap over time, which is not a form of train-test leakage
as the datasets are partitioned over time [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. In this setting, redundant data might be especially
helpful, e.g., as previously showcased when relevance judgments were transferred from the ClueWeb09
corpus to ClueWeb12 via near-duplicate detection [22]. We followed this approach and transferred the
relevance judgments to the newer dataset splits in the LongEval scenario.
      </p>
      <p>Keyqueries. The concept of keyqueries [23] aims to formulate a query that retrieves a set of target
documents at the top positions and has been applied to scholarly search [24], medical search [25],
privacy scenarios [26], etc. For a set D of documents, a query q is a keyquery against some retrieval
system S, if q fulfills the following three conditions [23]: (1) every d ∈ D is in the top-k results returned
by S for q, (2) q has at least l results, and (3) no q′ ⊂ q fulfills the first two conditions. The first
two conditions (i.e., the parameters k and l) determine the desired specificity and the generality of a
keyquery, while the third condition is a minimality constraint to avoid adding further terms to a query
that already retrieves the target records at high ranks. Previous work applied this concept only to static
corpora, but we extended it to evolving corpora in the LongEval scenario.</p>
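      <p>The three keyquery conditions can be illustrated with a minimal sketch. The toy corpus, the conjunctive <monospace>search</monospace> function, and the names <monospace>satisfies</monospace> and <monospace>is_keyquery</monospace> are hypothetical and only serve the illustration; a real retrieval system S would replace the toy ranking.</p>
      <preformat><![CDATA[
```python
from itertools import combinations

# Toy corpus (hypothetical): document id -> set of terms.
CORPUS = {
    "d1": {"neural", "ranking", "models"},
    "d2": {"neural", "ranking", "clicks"},
    "d3": {"query", "logs", "clicks"},
    "d4": {"neural", "networks"},
}

def search(query):
    """Toy retrieval system S: documents containing all query terms, sorted by id."""
    terms = set(query)
    return sorted(d for d, words in CORPUS.items() if terms.issubset(words))

def satisfies(query, targets, k, l):
    """Conditions (1) and (2): all targets in the top-k and at least l results."""
    results = search(query)
    return set(targets).issubset(results[:k]) and len(results) >= l

def is_keyquery(query, targets, k, l):
    """Condition (3): no proper non-empty subset of the query satisfies (1) and (2)."""
    if not satisfies(query, targets, k, l):
        return False
    terms = sorted(set(query))
    for size in range(1, len(terms)):
        for subset in combinations(terms, size):
            if satisfies(subset, targets, k, l):
                return False
    return True

targets = {"d1", "d2"}
print(is_keyquery(("ranking",), targets, k=2, l=2))
print(is_keyquery(("neural", "ranking"), targets, k=2, l=2))
```
]]></preformat>
      <p>In this toy setting, the two-term query is rejected by the minimality constraint because one of its single terms already retrieves both target documents in the top-k.</p>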
      <p>User Intent Classification. When users type queries in search engines, they often have a specific
intent in mind. Broder [27] divided queries into three categories according to their intent: navigational,
transactional, and informational. An informational intent refers to acquiring some information from
a website, a navigational intent consists of searching for a particular website, and a transactional intent
refers to obtaining some services from a website (e.g., downloading a game). Follow-up studies that
utilised Broder’s taxonomy to classify user intent usually chose two of Broder’s three categories:
either informational and navigational [28, 29], or informational and transactional [30]. They adopted
different techniques such as computing the scores of the distribution of query terms [29], classification of
queries into topics [30], as well as tracking past user-click behavior and anchor-link distribution [28].</p>
      <p>The categories of Broder’s taxonomy were also used in scientific search. Khabsa et al. [31] classified the
queries in CiteSeerX logs into two categories: navigational and informational. According to them,
navigational queries in scientific search contain full titles of the papers, or some keywords from the
title and authors’ names, while informational queries usually contain concepts. There are also
domain-specific user intent taxonomies in scientific search. Rohatgi et al. [32] suggested title, concept, and author
intent categories; Xiong et al. [33] proposed concept, author, exploration, title, and venue categories.</p>
      <fn-group>
        <fn id="fn1">
          <label>1</label>
          <p>https://github.com/OpenWebSearch/LONGEVAL-25</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        During last year’s participation in LongEval, we incorporated relevance information from past click
logs into the query reformulation process using keyqueries [11]. We also utilised this information for
the indexing process via a reverted index that contains the top ranked documents per query [
        <xref ref-type="bibr" rid="ref7">34</xref>
        ]. Finally,
we incorporated both approaches into learning-to-rank pipelines, ensuring that retrieval is also possible
for novel queries that were not seen before. This year, we decided to participate in the LongEval-Sci
task, as improving retrieval effectiveness in scientific search is less explored compared to web search
(for instance, there are only a few corpora for this, and we have some prior experience from building
and evaluating the IR Anthology [
        <xref ref-type="bibr" rid="ref8 ref9">35, 36</xref>
        ]). For this task, we continued with keyqueries and
added qrel boosting, cluster boosting, and an intent-aware layer. All our implementations used the
ir_datasets [
        <xref ref-type="bibr" rid="ref10">37</xref>
        ] extensions<sup>2</sup> for LongEval [
        <xref ref-type="bibr" rid="ref11">38</xref>
        ]. We submitted all our approaches as run submissions
and tracked the execution in most cases with the TIREx Tracker [
        <xref ref-type="bibr" rid="ref12">39</xref>
        ] in the ir_metadata format [
        <xref ref-type="bibr" rid="ref13">40</xref>
        ],
but we intend to submit our best approaches as software submissions to
TIRA/TIREx [
        <xref ref-type="bibr" rid="ref14 ref15">41, 42</xref>
        ] after the deadline, to improve reproducibility and to re-run them without modification next year.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Fusion with CORE</title>
        <p>We began by crawling the top-25 results returned by the CORE search engine<sup>3</sup> for each query in the
2025 Sci track of the LongEval benchmark. To capture diverse relevance signals, we performed this
retrieval separately for different document fields: title, abstract, and full text. Each field-specific retrieval
represented a distinct ranking produced by the CORE search engine. We verified that the document IDs of
the CORE search engine and of the LongEval corpus are the same by comparing the titles of the
documents. We further removed all documents from the CORE ranking that were not present in
the LongEval corpus (we make the original rankings available online for better reproducibility<sup>4</sup>). To
construct a more robust and comprehensive final ranking, we applied a fusion strategy that combines
the rankings from the individual fields (i.e., title, abstract, and full text). After combining the results,
we further enhanced the final ranking by filling in any missing positions with additional documents
retrieved using a BM25 model over the full text (i.e., we appended the BM25 results to the CORE
ranking). This process ensures broad coverage and incorporates complementary evidence from multiple
content fields to improve the overall ranking quality.</p>
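        <p>The fusion and backfilling steps might look roughly like the following sketch. The text does not name the fusion method, so reciprocal rank fusion with the common constant k = 60 is an assumption here, as are the function names and the toy document ids.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids (assumed fusion method: RRF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

def fuse_with_backfill(field_rankings, bm25_ranking, corpus_ids, depth=10):
    """Fuse the field rankings, drop ids missing from the corpus, backfill with BM25."""
    filtered = [[d for d in r if d in corpus_ids] for r in field_rankings]
    fused = reciprocal_rank_fusion(filtered)
    seen = set(fused)
    for doc_id in bm25_ranking:
        if len(fused) >= depth:
            break
        if doc_id not in seen:
            fused.append(doc_id)
            seen.add(doc_id)
    return fused[:depth]

# Hypothetical rankings for the title, abstract, and full-text fields.
title, abstract, fulltext = ["a", "b", "x"], ["b", "a"], ["b", "c"]
bm25 = ["c", "d", "e"]
print(fuse_with_backfill([title, abstract, fulltext], bm25, {"a", "b", "c", "d", "e"}, depth=5))
```
]]></preformat>
        <p>Document "x" is dropped because it is not part of the corpus, and the BM25 results only fill the positions that the fused CORE ranking leaves open.</p>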
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Reranking with MonoT5</title>
        <p>
          To further refine the quality of our fused rankings, we applied a neural reranking step using the
castorini/monot5-base-msmarco model [
          <xref ref-type="bibr" rid="ref16">43</xref>
          ]. Specifically, we reranked the top-10 documents
(abstracts) from the fused results generated by the CORE search engine. The monoT5 model is a
sequence-to-sequence transformer trained on the MS MARCO dataset for passage reranking tasks. It
takes a query and a candidate document (the default text as implemented in the ir_datasets LongEval
package) as input and outputs a relevance score by predicting whether the document is relevant to the
query. By applying monoT5 to the highest-ranked candidates, we aimed to better capture semantic
relevance beyond lexical overlap, leveraging the model’s deep language understanding to improve
precision at the top of the ranking.
        </p>
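        <fn-group>
          <fn id="fn2"><label>2</label><p>https://github.com/clef-longeval/ir-datasets-longeval</p></fn>
          <fn id="fn3"><label>3</label><p>https://core.ac.uk/</p></fn>
          <fn id="fn4"><label>4</label><p>https://huggingface.co/datasets/gijshendriksen/LongEval</p></fn>
        </fn-group>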
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Qrel Boosting</title>
        <p>We applied the qrel boosting approach [14] provided as the official baseline<sup>5</sup> in the default configuration
without modification to the fused CORE rankings. Conceptually, this means that for the queries that
have relevant documents in the past that still exist in the corpus, we move those known relevant
documents to the top. Independent of any prior ranking, all documents d ranked for a query q at
timeslot t that were judged at a previous timeslot, e.g., t − 1, were boosted by:
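<disp-formula id="eq-2"><label>(2)</label><tex-math><![CDATA[
\mathrm{boost}_{q,d}(t) =
\begin{cases}
\lambda^{2} & \text{if } \mathrm{qrel}_{q,d} = 1 \\
\lambda^{2} \mu & \text{if } \mathrm{qrel}_{q,d} > 1 \\
(1 - \lambda)^{2} & \text{if } \mathrm{qrel}_{q,d} = 0
\end{cases}
]]></tex-math></disp-formula></p>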
        <p>The additional free parameter μ can be used to assign a different boost to documents with higher
relevance labels. For queries without known previous rankings, the CORE-fusion ranking remained
unchanged.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Keyqueries</title>
        <p>As in the previous editions, some queries overlapped across different time slots, and in case their intent
stayed the same, we aimed to transfer their relevance to the new time slots. Consequently, for those
queries we knew which documents had been clicked a few months ago. This run applied RM3 in its
default configuration, inspired by the counterfactual query rewriting with keyqueries by Keller et al. [12], on top of
the qrel-boost-core run (i.e., we use the documents that were previously clicked as feedback documents
so that RM3 reformulates the query to move the previously clicked documents higher in the ranking).
In previous years, documents could be deleted in later timestamps; therefore, in our previous years’
submissions we had to insert the clicked documents into the current corpus. However, in this year’s
Sci-Retrieval task, documents are not removed and do not change their content; therefore, we
did not modify the corpus and applied RM3 without changes.</p>
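        <p>The idea of using previously clicked documents as feedback can be sketched as follows. This is a simplified RM3-style expansion, not the RM3 implementation used for the run: every feedback document receives the same weight (RM3 proper weights documents by their query likelihood), and the function name, parameters, and toy documents are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter

def rm3_expand(query_terms, feedback_docs, fb_terms=3, orig_weight=0.5):
    """Interpolate the original query model with a feedback term model.

    Assumes non-empty queries and documents; feedback documents are
    weighted uniformly, which simplifies RM3.
    """
    # Original query model: maximum-likelihood estimate over the query terms.
    q_counts = Counter(query_terms)
    q_model = {t: c / len(query_terms) for t, c in q_counts.items()}

    # Feedback model: average of the per-document term distributions.
    fb_model = Counter()
    for doc in feedback_docs:
        for term, c in Counter(doc).items():
            fb_model[term] += c / len(doc) / len(feedback_docs)

    # Keep the fb_terms most probable feedback terms and renormalise them.
    top = sorted(fb_model.items(), key=lambda kv: (-kv[1], kv[0]))[:fb_terms]
    total = sum(p for _, p in top)
    top_model = {t: p / total for t, p in top}

    # Interpolate the original query model with the truncated feedback model.
    vocab = set(q_model) | set(top_model)
    return {
        t: orig_weight * q_model.get(t, 0.0) + (1 - orig_weight) * top_model.get(t, 0.0)
        for t in vocab
    }

# Hypothetical previously clicked documents, already tokenised.
clicked = [["bert", "ranking", "passage"], ["bert", "ranking", "reranking", "passage"]]
print(rm3_expand(["bert"], clicked))
```
]]></preformat>
        <p>The reformulated query keeps most of its weight on the original term and adds terms that are frequent in the clicked documents, which tends to move those documents higher in the ranking.</p>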
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Cluster Boosting</title>
        <p>
          We implemented a cluster boosting strategy using the PyTerrier framework. This approach involved
partitioning the document collection into clusters (or shards) based on content similarity or other
clustering methods. During training, for each query, we tracked which clusters contained relevant
documents across previous timestamps or query interactions. At retrieval time, documents that belonged
to these historically relevant clusters were assigned a higher score or were explicitly boosted. The
reasoning is that if a cluster has previously provided relevant results for a given query, other documents
within the same cluster are more likely to be relevant as well (following the Cluster Hypothesis [
          <xref ref-type="bibr" rid="ref17">44</xref>
          ]).
This method aims to improve retrieval effectiveness by using implicit relevance signals at the cluster
level, rather than treating documents independently.
        </p>
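        <p>A minimal sketch of the boosting step, independent of PyTerrier, is given below. The multiplicative boost factor of 1.5, the function names, and the toy cluster assignment are assumptions for illustration; the text does not specify how the clusters were built or how strongly documents were boosted.</p>
        <preformat><![CDATA[
```python
def relevant_clusters_for(query, past_qrels, doc_cluster):
    """Clusters that contained a relevant document for this query in the past."""
    return {
        doc_cluster[d]
        for d, rel in past_qrels.get(query, {}).items()
        if rel > 0 and d in doc_cluster
    }

def cluster_boost(ranking, doc_cluster, relevant_clusters, boost=1.5):
    """Multiply the score of documents that belong to historically relevant clusters."""
    boosted = []
    for doc_id, score in ranking:
        if doc_cluster.get(doc_id) in relevant_clusters:
            score *= boost
        boosted.append((doc_id, score))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

# Hypothetical cluster assignment and past relevance labels.
doc_cluster = {"d1": 0, "d2": 1, "d3": 0, "d4": 2}
past_qrels = {"bert": {"d1": 1, "d2": 0}}
clusters = relevant_clusters_for("bert", past_qrels, doc_cluster)
print(cluster_boost([("d2", 3.0), ("d3", 2.5), ("d4", 2.0)], doc_cluster, clusters))
```
]]></preformat>
        <p>Here, "d3" was never judged for the query, but it shares a cluster with the previously relevant "d1" and therefore moves to the top.</p>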
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Adding User Intent</title>
        <p>User Intent Classification. For user intent classification, we used Broder’s [27] taxonomy, which has
informational, navigational, and transactional categories. We manually annotated 50 queries from the
training set and classified their intent. In this manually annotated sample, we found informational and
navigational queries but no transactional ones, so we decided to classify the queries into informational and
navigational categories. Overall, transactional queries are rarely present in scientific search [31], and
typically involve downloading data, such as a dataset or metadata.</p>
        <p>As the queries were very short (mean length 1.57 words in the training set), we used additional
information from relevant documents, such as the titles, the names of the authors, and the text of the abstract.
We noticed that some queries had only a few clicked documents, contained the names of the authors,
and were searching for one specific paper by those authors. Since in those cases the users were likely
searching for specific papers, we classified such queries as navigational. Queries that received a high
number of clicks and also those that targeted general concepts were classified as informational.</p>
        <sec id="sec-3-6-1">
          <title>5https://github.com/clef-longeval/longeval-code/tree/main/clef25/qrel-boost</title>
          <p>
            Since informational queries are often categorized by their degree of specificity, such as finding specific
facts versus exploring the subject to gather information and learn [
            <xref ref-type="bibr" rid="ref18">45, 46</xref>
            ], and the informational queries
in the LongEval-Sci collection are not aimed at finding specific facts, we refined the informational category
and defined those queries as exploratory. As a result, we have two categories for our classification:
exploratory and navigational.
          </p>
          <p>To classify the queries of the training and test sets, we used automatic classification with
weak supervision. Weak supervision is an approach in machine learning where noisy, limited, or
imprecise sources are used instead of (or along with) gold labelled data. We used Snorkel [47], which is
an end-to-end system for creating labelling functions and training and evaluating the labelling model.
In Snorkel, the final intent of a query is determined by combining the outputs of the labelling functions
with an aggregation strategy, such as majority voting.</p>
          <p>For the classification, we adopted the same approach as Rohatgi et al. [32] and filtered out the queries
of one intent category, assuming that the remaining queries belong to the other intent category. We filtered out
navigational queries by establishing the following characteristics for them:
<list list-type="bullet">
<list-item><p>Only a few document clicks (up to 3 clicks).</p></list-item>
<list-item><p>A query of more than two words that is fully contained within the title of a clicked paper as an
exact sequence.</p></list-item>
<list-item><p>A query is an author name (or author names), and only one paper by this author (or these authors)
is clicked.</p></list-item>
</list></p>
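          <p>The three characteristics can be written as labelling functions, sketched here in plain Python rather than with the Snorkel API. How the votes are combined is not specified in the text; requiring at least two navigational votes (<monospace>min_votes=2</monospace>) is an assumption, as are the function names, the document fields <monospace>title</monospace> and <monospace>authors</monospace>, and the simplification that the author function only matches a single author name.</p>
          <preformat><![CDATA[
```python
NAVIGATIONAL, EXPLORATORY, ABSTAIN = 1, 0, -1

def lf_few_clicks(query, clicked):
    """Vote navigational when between one and three documents were clicked."""
    return NAVIGATIONAL if 1 <= len(clicked) <= 3 else ABSTAIN

def lf_title_match(query, clicked):
    """Vote navigational when a query of 3+ words is an exact word sequence in a clicked title."""
    words = query.lower().split()
    if len(words) <= 2:
        return ABSTAIN
    for doc in clicked:
        title = doc["title"].lower().split()
        for i in range(len(title) - len(words) + 1):
            if title[i:i + len(words)] == words:
                return NAVIGATIONAL
    return ABSTAIN

def lf_author_query(query, clicked):
    """Vote navigational when the query is an author of the single clicked paper."""
    if len(clicked) != 1:
        return ABSTAIN
    authors = [a.lower() for a in clicked[0]["authors"]]
    return NAVIGATIONAL if query.lower() in authors else ABSTAIN

LABELING_FUNCTIONS = [lf_few_clicks, lf_title_match, lf_author_query]

def classify(query, clicked, min_votes=2):
    """Label a query navigational if enough functions vote for it, else exploratory."""
    votes = [lf(query, clicked) for lf in LABELING_FUNCTIONS]
    return NAVIGATIONAL if votes.count(NAVIGATIONAL) >= min_votes else EXPLORATORY

paper = {"title": "Attention Is All You Need", "authors": ["Ashish Vaswani"]}
print(classify("attention is all you need", [paper]))
print(classify("transformers", [paper]))
```
]]></preformat>
          <p>Queries that no combination of functions identifies as navigational fall into the exploratory category, mirroring the filtering approach described above.</p>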
          <p>After the classification, we found that 12 % of the queries in the training set were navigational.
This is in line with the manually annotated subset, in which 10 % of the queries were navigational.
Since there were overlapping queries in the 2025-01 test set timeslot, we were able to use information
from the previously relevant documents to classify their intent. In this test set timeslot, 5 % of the queries
were navigational, which is understandable, as for 337 out of 492 queries we did not have information
about previously relevant documents. Since there was no overlap between the training set and the
2024-11 test set timeslot, we could not rely on previously relevant documents and could not filter out
any navigational queries.</p>
          <p>An Intent-Aware System. We chose two different strategies for the intent-aware system. For the
navigational queries, as the aim was to find a specific paper, we decided to move known relevant
documents to the top. For the exploratory queries, we used RM3 on top of qrel boosting, as expanding
such queries with terms from relevant documents can broaden their scope and improve retrieval
effectiveness. In cases where there was no information about previously relevant documents (such as in
the 2024-11 test set), it was not possible to separate navigational from exploratory queries. As a result,
those queries were classified as exploratory by default.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We evaluate our retrieval systems in terms of nDCG@10 and a condensed version of nDCG@10 where
we remove all unjudged documents from the ranking. The condensed list evaluation was proposed
by Sakai [48], and we apply it here because documents that are not retrieved by the original search
engine cannot be considered relevant in the evaluation setting of LongEval (still, it is known that this
condensed evaluation overestimates effectiveness [49], so a realistic evaluation score is likely between
the nDCG@10 and its condensed counterpart).</p>
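      <p>The difference between the two measures can be made concrete with a small sketch. It assumes linear gains and a log2 rank discount; the function names and the toy judgments are hypothetical, and evaluation tools may differ in details such as the gain function.</p>
      <preformat><![CDATA[
```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranking, qrels, k=10, condensed=False):
    """nDCG@k; with condensed=True, unjudged documents are removed before the cutoff."""
    if condensed:
        ranking = [d for d in ranking if d in qrels]
    gains = [qrels.get(d, 0) for d in ranking[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments: "a" was clicked, "b" was shown but not clicked,
# "x" and "y" were never shown by the original search engine.
qrels = {"a": 1, "b": 0}
ranking = ["x", "a", "y", "b"]
print(ndcg_at_k(ranking, qrels))
print(ndcg_at_k(ranking, qrels, condensed=True))
```
]]></preformat>
      <p>In the condensed list, the relevant document moves from rank 2 to rank 1 once the unjudged document "x" is removed, so the condensed score is the higher of the two.</p>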
      <p>Table 1 provides an overview of our results. The results indicate that qrel-boost-core is the most
effective approach when considering documents that have already been judged for relevance. This is
expected, as the method directly promotes documents known to be relevant in the past. For judged
documents, fusion-with-core also outperforms the baseline, highlighting the benefit of combining
documents from the collection with results crawled from the CORE search engine.</p>
      <p>Table 1: Runs qrel-boost-core, fusion-with-core, monot5-in-core, query-intent-fusion, rm3-on-qrel-boost, ows-cluster-boosting, and BM25, each evaluated on the 2024-11 and 2025-01 timeslots.</p>
      <p>Interestingly, reranking the fusion-with-core results using monoT5 did not lead to further
improvements. We suspect this is due to the short length of the queries, which may limit the effectiveness of
deep semantic models like monoT5. Similarly, rm3-on-qrel-boost performed worse than qrel-boost-core,
likely for the same reason.</p>
      <p>When previously relevant documents made it possible to distinguish between navigational and exploratory
queries, query-intent-fusion outperformed rm3-on-qrel-boost, but still did not reach the effectiveness
of qrel-boost-core. This aligns with expectations, as most queries in the dataset are exploratory, and
rm3-on-qrel-boost is applied to these. Still, applying qrel-boost-core to navigational queries leads to
improvements for judged documents, emphasizing the value of tailoring retrieval strategies to different
types of user intent.</p>
      <p>For nDCG@10 across all documents, intent-aware retrieval achieves the best performance when
previously relevant documents are known, highlighting the importance of aligning retrieval strategies
with user intent. Please note that the intent-aware retrieval only has overlapping queries available
for the 2025-01 timeslot, but in this scenario, it performs well. When such relevance information is
not available (i.e., in the 2024-11 setting), ows-cluster-boosting proves to be the most effective method,
demonstrating its usefulness in initial retrieval scenarios.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Our experiments for the CLEF 2025 LongEval Sci-Retrieval track demonstrate that incorporating
historical relevance information can considerably increase retrieval effectiveness. To improve retrieval
effectiveness in scientific search, we applied qrel boosting, RM3 keyquery expansion, cluster-based boosting,
monoT5 re-ranking of top results, and user intent prediction. Overall, our findings suggest that
modest, targeted interventions, especially those guided by relevance history and user intent, can lead to
substantial improvements over production rankings in real-world search settings. In our experiments,
we explicitly boost the existing position bias of systems. Therefore, interesting directions for future
work might be to verify how alternative relevance judgments (i.e., not derived from the production
search engine) can be applied to the evaluation, for instance, via simulations or large language model
relevance assessors.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT to perform a grammar and spelling
check and find synonyms to make the vocabulary more diverse. After using this tool the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has received funding from the European Union’s Horizon Europe research and innovation
program under grant agreement No 101070014 (OpenWebSearch.EU, https://doi.org/10.3030/101070014).</p>
      <p>[7] R. Alkhalifa, H. Borkakoty, R. Deveaud, A. El-Ebshihy, L. E. Anke, T. Fink, P. Galuscáková, G. G.
Sáez, L. Goeuriot, D. Iommi, M. Liakata, H. T. Madabushi, P. Medina-Alias, P. Mulhem, F. Piroi,
M. Popel, A. Zubiaga, Extended overview of the CLEF 2024 longeval lab on longitudinal evaluation
of model performance, in: G. Faggioli, N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.), Working
Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12
September, 2024, volume 3740 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 2267–2289.
URL: https://ceur-ws.org/Vol-3740/paper-213.pdf.
[8] M. Cancellieri, A. El-Ebshihy, T. Fink, P. Galuscáková, G. G. Sáez, L. Goeuriot, D. Iommi, J. Keller,
P. Knoth, P. Mulhem, F. Piroi, D. Pride, P. Schaer, Longeval at CLEF 2025: Longitudinal evaluation
of IR model performance, in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini,
F. Pinelli, F. Silvestri, N. Tonellotto (Eds.), Advances in Information Retrieval - 47th European
Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings,
Part V, volume 15576 of Lecture Notes in Computer Science, Springer, 2025, pp. 382–388. URL:
https://doi.org/10.1007/978-3-031-88720-8_58. doi:10.1007/978-3-031-88720-8_58.
[9] M. Cancellieri, A. El-Ebshihy, T. Fink, P. Galuščáková, G. Gonzalez-Saez, L. Goeuriot, D. Iommi,
J. Keller, P. Knoth, P. Mulhem, F. Piroi, D. Pride, P. Schaer, Overview of the CLEF 2025 LongEval
Lab on Longitudinal Evaluation of Model Performance, in: J. Carrillo-de Albornoz, J. Gonzalo,
L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro
(Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the
Sixteenth International Conference of the CLEF Association (CLEF 2025), 2025.
[10] X. Li, B. J. Schijvenaars, M. de Rijke, Investigating queries and search failures in academic search,
Information processing &amp; management 53 (2017) 666–683.
[11] D. Alexander, M. Fröbe, F. Schlatt, M. Hagen, D. Hiemstra, M. Potthast, A. P. de Vries, Team
OpenWebSearch at CLEF 2024: LongEval, in: Working Notes Papers of the CLEF 2024 Evaluation
Labs, CEUR Workshop Proceedings, 2024.
[12] J. Keller, M. Fröbe, G. Hendriksen, D. Alexander, M. Potthast, M. Hagen, P. Schaer, Counterfactual
query rewriting to use historical relevance feedback, in: European Conference on Information
Retrieval, Springer, 2025, pp. 138–147.
[13] M. Fröbe, G. Hendriksen, A. P. de Vries, M. Potthast, Open web search at longeval 2023: Reciprocal
rank fusion on automatically generated query variants, in: M. Aliannejadi, G. Faggioli, N. Ferro,
M. Vlachos (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023),
Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497 of CEUR Workshop Proceedings,
CEUR-WS.org, 2023, pp. 2432–2440. URL: https://ceur-ws.org/Vol-3497/paper-195.pdf.
[14] J. Keller, T. Breuer, P. Schaer, Leveraging prior relevance signals in web search, in: G. Faggioli,
N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.), Working Notes of the Conference and Labs of
the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, volume 3740 of CEUR
Workshop Proceedings, CEUR-WS.org, 2024, pp. 2396–2406. URL: https://ceur-ws.org/Vol-3740/
paper-220.pdf.
[15] D. Alexander, W. Kusa, A. P. de Vries, Orcas-i: queries annotated with intent using weak
supervision, in: Proceedings of the 45th International ACM SIGIR Conference on Research and
Development in Information Retrieval, 2022, pp. 3057–3066.
[16] D. Alexander, W. Kusa, A. P. de Vries, Orcas-i query intent predictor as component of tira, in:
S. M. Farzana, M. Fröbe, M. Granitzer, G. Hendriksen, D. Hiemstra, M. Potthast, S. Zerhoudi (Eds.),
1st International Workshop on Open Web Search, number 3689 in CEUR Workshop Proceedings,
2024, pp. 23–29. URL: https://ceur-ws.org/Vol-3689/.
[17] K. Krishna, A. Roy, M. Iyyer, Hurdles to progress in long-form question answering, in: K. Toutanova,
A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tür, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty,
Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021,
Online, June 6-11, 2021, Association for Computational Linguistics, 2021, pp. 4940–4957. URL:
https://doi.org/10.18653/v1/2021.naacl-main.393. doi:10.18653/V1/2021.NAACL-MAIN.393.
[18] M. Fröbe, C. Akiki, M. Potthast, M. Hagen, How Train-Test Leakage Affects Zero-shot
Retrieval, in: D. Arroyuelo, B. Poblete (Eds.), 29th International Symposium on String Processing
and Information Retrieval (SPIRE 2022), volume 13617, Concepción, Chile, 2022. doi:10.1007/
978-3-031-20643-6_11.
[19] Y. Bernstein, J. Zobel, Redundant documents and search effectiveness, in: O. Herzog,
H. Schek, N. Fuhr, A. Chowdhury, W. Teiken (Eds.), Proceedings of the 2005 ACM CIKM
International Conference on Information and Knowledge Management, Bremen, Germany, October
31 - November 5, 2005, ACM, 2005, pp. 736–743. URL: https://doi.org/10.1145/1099554.1099733.
doi:10.1145/1099554.1099733.
[20] M. Fröbe, J. Bittner, M. Potthast, M. Hagen, The Effect of Content-Equivalent Near-Duplicates on
the Evaluation of Search Engines, in: J. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro, M. Silva,
F. Martins (Eds.), Advances in Information Retrieval. 42nd European Conference on IR Research
(ECIR 2020), volume 12036 of Lecture Notes in Computer Science, Springer, Berlin Heidelberg New
York, 2020, pp. 12–19. doi:10.1007/978-3-030-45442-5_2.
[21] M. Fröbe, J. Bevendorf, J. Reimer, M. Potthast, M. Hagen, Sampling Bias Due to Near-Duplicates
in Learning to Rank, in: 43rd International ACM Conference on Research and Development in
Information Retrieval (SIGIR 2020), ACM, 2020, pp. 1997–2000. doi:10.1145/3397271.3401212.
[22] M. Fröbe, J. Bevendorf, L. Gienapp, M. Völske, B. Stein, M. Potthast, M. Hagen, CopyCat:
Near-Duplicates within and between the ClueWeb and the Common Crawl, in: F. Diaz, C. Shah,
T. Suel, P. Castells, R. Jones, T. Sakai (Eds.), 44th International ACM Conference on Research and
Development in Information Retrieval (SIGIR 2021), ACM, 2021, pp. 2398–2404.
doi:10.1145/3404835.3463246.
[23] M. Hagen, A. Beyer, T. Gollub, K. Komlossy, B. Stein, Supporting Scholarly Search with Keyqueries,
in: N. Ferro, F. Crestani, M.-F. Moens, J. Mothe, F. Silvestri, G. Di Nunzio, C. Hauff, G. Silvello
(Eds.), Advances in Information Retrieval. 38th European Conference on IR Research (ECIR 2016),
volume 9626 of Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2016, pp.
507–520. doi:10.1007/978-3-319-30671-1_37.
[24] M. Völske, T. Gollub, M. Hagen, B. Stein, A keyquery-based classification system for
CORE, D-Lib Mag. 20 (2014). URL: https://doi.org/10.1045/november14-voelske.
doi:10.1045/NOVEMBER14-VOELSKE.
[25] M. Fröbe, S. Günther, A. Bondarenko, J. Huck, M. Hagen, Using keyqueries to reduce
misinformation in health-related search results, in: ROMCIR 2022: The 2nd Workshop on Reducing
Online Misinformation through Credible Information Retrieval, held as part of ECIR 2022: the
44th European Conference on Information Retrieval, 2022.
[26] M. Fröbe, E. O. Schmidt, M. Hagen, Efficient Query Obfuscation with Keyqueries, in: 20th
International IEEE/WIC/ACM Conference on Web Intelligence (WI-IAT 2021), ACM, 2021.
doi:10.1145/3486622.3493950.
[27] A. Broder, A taxonomy of web search, SIGIR Forum 36 (2002).
[28] U. Lee, Z. Liu, J. Cho, Automatic identification of user goals in web search, in: WWW ’05:
Proceedings of the 14th international conference on World Wide Web, 2005, pp. 391–400.
doi:10.1145/1060745.1060804.
[29] I.-h. Kang, G. Kim, Query type classification for web document retrieval, in: SIGIR ’03: Proceedings
of the 26th annual international ACM SIGIR conference on Research and development in information
retrieval, 2004. doi:10.1145/860435.860449.
[30] R. Baeza-Yates, L. Calderon-Benavides, C. González-Caro, The intention behind web queries,
volume 4209, 2006, pp. 98–109. doi:10.1007/11880561_9.
[31] M. Khabsa, Z. Wu, C. L. Giles, Towards better understanding of academic search, in: Proceedings
of the 16th ACM/IEEE-CS on joint conference on digital libraries, 2016, pp. 111–114.
[32] S. Rohatgi, C. L. Giles, J. Wu, What were people searching for? a query log analysis of an academic
search engine, in: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), IEEE, 2021, pp.
342–343.
[33] C. Xiong, R. Power, J. Callan, Explicit semantic ranking for academic search via knowledge graph
embedding, in: Proceedings of the 26th international conference on world wide web, 2017, pp.
[46] D. M. Russell, D. Tang, M. Kellar, R. Jeffries, Task behaviors during web search: The difficulty of
assigning labels, in: 2009 42nd Hawaii International Conference on System Sciences, IEEE, 2009,
pp. 1–5.
[47] A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, C. Ré, Snorkel: Rapid training data creation
with weak supervision, in: Proceedings of the VLDB endowment. International conference on
very large data bases, volume 11, 2017, p. 269.
[48] T. Sakai, Alternatives to bpref, in: W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, N. Kando (Eds.),
SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, Amsterdam, The Netherlands, July 23-27, 2007, ACM, 2007,
pp. 71–78. URL: https://doi.org/10.1145/1277741.1277756. doi:10.1145/1277741.1277756.
[49] M. Fröbe, L. Gienapp, M. Potthast, M. Hagen, Bootstrapped nDCG Estimation in the Presence
of Unjudged Documents, in: Advances in Information Retrieval. 45th European Conference on
IR Research (ECIR 2023), volume 13980 of Lecture Notes in Computer Science, Springer, Berlin
Heidelberg New York, 2023, pp. 313–329. doi:10.1007/978-3-031-28244-7_20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Bilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kochkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Servan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>LongEval: Longitudinal evaluation of model performance at CLEF 2023</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caputo</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval - 45th European Conference on Information Retrieval</source>
          , ECIR
          <year>2023</year>
          , Dublin, Ireland, April 2-6,
          <year>2023</year>
          , Proceedings, Part III
          , volume
          <volume>13982</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>505</lpage>
          . URL: https://doi.org/10.1007/978-3-031-28241-6_58. doi:10.1007/978-3-031-28241-6_58.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Bilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kochkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Servan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Extended overview of the CLEF-2023 longeval lab on longitudinal evaluation of model performance</article-title>
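          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          (Eds.),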
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</source>
          , Thessaloniki, Greece, September 18th to 21st,
          <year>2023</year>
          , volume
          <volume>3497</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>2181</fpage>
          -
          <lpage>2203</lpage>
          . URL: https://ceur-ws.org/Vol-3497/paper-184.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <article-title>LongEval-Retrieval: French-English dynamic test collection for continuous web search evaluation</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Poblete</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR
          <year>2023</year>
          , Taipei, Taiwan, July 23-27,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>3086</fpage>
          -
          <lpage>3094</lpage>
          . URL: https://doi.org/10.1145/3539618.3591921. doi:10.1145/3539618.3591921.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Medina-Alias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Servan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>LongEval: Longitudinal evaluation of model performance at CLEF 2024</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval - 46th European Conference on Information Retrieval</source>
          , ECIR
          <year>2024</year>
          , Glasgow, UK, March 24-28,
          <year>2024</year>
          , Proceedings, Part VI
          , volume
          <volume>14613</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>66</lpage>
          . URL: https://doi.org/10.1007/978-3-031-56072-9_8. doi:10.1007/978-3-031-56072-9_8.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Medina-Alias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Servan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance</article-title>
          , in: G.
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          , CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Borkakoty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espinosa-Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iommi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Madabushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Medina-Alias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Quénot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M. D.</given-names>
            <surname>Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science (LNCS)</source>
          , Springer, Heidelberg, Germany,
          <year>2024</year>
          .
          <fpage>1271</fpage>
          -
          <lpage>1279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pickens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Golovchinsky</surname>
          </string-name>
          ,
          <article-title>Reverted indexing for feedback and expansion</article-title>
          , in:
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Collins-Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>An</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 19th ACM Conference on Information and Knowledge Management</source>
          , CIKM
          <year>2010</year>
          , Toronto, Ontario, Canada, October 26-30,
          <year>2010</year>
          , ACM,
          <year>2010</year>
          , pp.
          <fpage>1049</fpage>
          -
          <lpage>1058</lpage>
          . URL: https://doi.org/10.1145/1871437.1871571. doi:10.1145/1871437.1871571.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Scells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Akiki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gienapp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Reimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Resources for Combining Teaching and Research in Information Retrieval Coursework</article-title>
          , in: G. Yang,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , S. Han,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          , G. Zuccon, Y. Zhang (Eds.),
          <source>47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2024</year>
          ), ACM,
          <year>2024</year>
          , pp.
          <fpage>1115</fpage>
          -
          <lpage>1125</lpage>
          . doi:10.1145/3626772.3657886.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Günther</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Bittner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondarenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kahmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niekler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Völske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <article-title>The Information Retrieval Anthology</article-title>
          , in: F. Diaz,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Sakai (Eds.),
          <source>44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2021</year>
          ), ACM,
          <year>2021</year>
          , pp.
          <fpage>2550</fpage>
          -
          <lpage>2555</lpage>
          . doi:10.1145/3404835.3462798.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          ,
          <article-title>Simplified data wrangling with ir_datasets</article-title>
          , in: F. Diaz,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Sakai (Eds.),
          <source>SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Virtual Event, Canada, July 11-15,
          <year>2021</year>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>2429</fpage>
          -
          <lpage>2436</lpage>
          . URL: https://doi.org/10.1145/3404835.3463254. doi:10.1145/3404835.3463254.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          , M. Fröbe,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hendriksen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Simplified longitudinal retrieval experiments: A case study on query expansion and document boosting</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th International Conference of the CLEF Association, CLEF</source>
          <year>2024</year>
          , Madrid, Spain, September 9-12,
          <year>2025</year>
          , Proceedings, Part I, Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Merker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Scells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>TIREx Tracker: The Information Retrieval Experiment Tracker</article-title>
          , in:
          <source>48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2025</year>
          ), ACM,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          , J. Keller, P. Schaer,
          <article-title>ir_metadata: An extensible metadata schema for IR experiments</article-title>
          , in: E. Amigó,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Culpepper</surname>
          </string-name>
          , G. Kazai (Eds.),
          <source>SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Madrid, Spain, July 11-15,
          <year>2022</year>
          , ACM,
          <year>2022</year>
          , pp.
          <fpage>3078</fpage>
          -
          <lpage>3089</lpage>
          . URL: https://doi.org/10.1145/3477495.3531738. doi:10.1145/3477495.3531738.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous Integration for Reproducible Shared Tasks with TIRA.io</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caputo</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval. 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Reimer</surname>
          </string-name>
          , S. MacAvaney,
          <string-name>
            <given-names>N.</given-names>
            <surname>Deckers</surname>
          </string-name>
          , S. Reich,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>The Information Retrieval Experiment Platform</article-title>
          , in:
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Poblete</surname>
          </string-name>
          (Eds.),
          <source>46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2023</year>
          ), ACM,
          <year>2023</year>
          , pp.
          <fpage>2826</fpage>
          -
          <lpage>2836</lpage>
          . doi:10.1145/3539618.3591888.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Document ranking with a pretrained sequence-to-sequence model</article-title>
          , in: T. Cohn,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          , Y. Liu (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2020</year>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>708</fpage>
          -
          <lpage>718</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.63. doi:10.18653/v1/2020.findings-emnlp.63.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Van Rijsbergen</surname>
          </string-name>
          ,
          <source>Information Retrieval</source>
          , Butterworths,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kellar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Watters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shepherd</surname>
          </string-name>
          ,
          <article-title>A field study characterizing web-based information seeking tasks</article-title>
          ,
          <source>JASIST</source>
          <volume>58</volume>
          (
          <year>2007</year>
          )
          <fpage>999</fpage>
          -
          <lpage>1018</lpage>
          . doi:10.1002/asi.20590.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>