<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>CIR at LongEval 2025: Exploring Temporal Sensitivity in Web Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Braun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Timo Busch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammed Sirac Coban</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryam El Ghadioui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davit Hovhannisyan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristine Jonina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Large</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Zhi Yong Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Loewenstein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Maaßen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nadine Maron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Henri Mörsheim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joshua Azimoh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nduka Ofunim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadims Romanovskis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Simon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Witalla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Max Wollenberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jüri Keller</string-name>
          <email>jueri.keller@th-koeln.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Schaer</string-name>
          <email>philipp.schaer@th-koeln.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TH Köln (University of Applied Sciences)</institution>
          ,
          <addr-line>Claudiusstr. 1, Cologne, 50678</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Temporal dynamics in retrieval settings have been shown to carry helpful information for retrieval processes. In this submission to the CLEF LongEval lab, we propose five different approaches: (1) finding time-dependent queries with the help of LLMs and treating these queries differently by boosting their retrieval scores based on the categorization; (2) finding time-dependent queries, scoring them on a scale from 0 to 1, and using that score to influence the final ranking; (3) using relevance information from older sub-collections for relevance feedback on the current sub-collection through tf-idf-based query expansion; (4) boosting known relevant document-query pairs from older sub-collections while comparing the similarity of old and recent documents; and finally, (5) a neural relevance re-ranking based on a topical semantic clustering. In total, we submitted seven runs to the WebRetrieval task of the lab. The results indicate that only four of them could outperform BM25.</p>
      </abstract>
      <kwd-group>
        <kwd>time-dependent queries</kwd>
        <kwd>clustering</kwd>
        <kwd>relevance feedback</kwd>
        <kwd>similarity</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The LongEval lab at CLEF focuses on the evaluation of retrieval systems on test collections that change
over time. In this lab notebook, we summarize our submissions to the CLEF LongEval lab in 2025, which
extend our previous work on the lab from 2023 [1] and 2024 [2]. The submissions are the result of a
students’ project course with the Cologne Information Retrieval group (CIR) at TH Köln - University
of Applied Sciences in Cologne, Germany. Five groups participated in the course (see Table 1).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Approaches and Implementations</title>
      <p>
        The approaches tested in this submission are: (1) finding time-dependent queries with the help of
Large Language Models (LLMs) and treating these queries differently by boosting their retrieval scores
based on the categorization (Section 2.1); (2) finding time-dependent queries, scoring them on a
scale from 0 to 1, and using that score to influence the final ranking (Section 2.2); (3) using relevance
information from older sub-collections for relevance feedback on the current sub-collection through
tf-idf-based query expansion (Section 2.3); (4) boosting known relevant document-query pairs
from older sub-collections while comparing the similarity of old and recent documents (Section 2.4); and
finally (5) a neural relevance re-ranking based on a topical semantic clustering (Section 2.5).
      </p>
      <sec id="sec-2-1">
        <title>2.1. Categories of Time-dependent Queries</title>
        <p>Queries are not all the same. We know that there are different query types in web search, like
transactional, navigational, or informational queries [3]. Additionally, we know from studies on temporal
retrieval [4] that users distinguish between recent and old information and most often prefer recent information, and that
there are different temporal entities, like events, that should be treated differently in the retrieval process.
We picked up these ideas and first tried to distinguish different types of temporal queries. We then used this
classification to apply a boost to recent documents that were ranked for these queries.</p>
        <p>We define the following four query types: time-independent, explicit-time, event, and timeliness. We
used GPT-4o mini to categorize each LongEval query into one of these categories. We instructed the
LLM to categorize the queries using the following definitions:</p>
        <list list-type="bullet">
          <list-item><p>time-independent (timeless information not tied to a specific time or event, e.g., definitions, recipes, general rules),</p></list-item>
          <list-item><p>explicit-time (requests with explicit time references, e.g., years, dates, specific periods),</p></list-item>
          <list-item><p>event (requests about specific events, e.g., named public events, scheduled institutional events, historical events), or</p></list-item>
          <list-item><p>timeliness (time-sensitive or current information where up-to-date information or availability matters, e.g., weather, stock prices, live updates, buying intent, tax rates).</p></list-item>
        </list>
        <p>
          We instructed the LLM to use the following categorization process: (1) Look for explicit time references
(e.g., years, dates). Assign to explicit-time if present. (2) Check for event-related terms. Assign to
event if applicable. (3) If the request requires real-time or current information, assign to timeliness. (4)
If the request is timeless and not tied to time or events, assign it to time-independent. The LLM was
instructed to only respond with the category name.
        </p>
        <p>Some examples of labels that this categorization process would assign are listed in Table 2. We see
that some of the labels are debatable, like the assignment of World War II to the explicit-time category
instead of the event category. In a manual test with 100 random queries, we achieved a Cohen’s Kappa
between 0.33 and 0.42 (two separate annotation runs with GPT-4o mini). The LLM struggled the most
with the event and explicit-time categories, where it was hard to find any matching queries. Instead of
the default English prompt, we also tried a French prompt, which lowered the agreement to 0.24.</p>
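        <p>To make the agreement figures easier to interpret, the following minimal sketch computes Cohen’s Kappa for two label sequences. It assumes that agreement was measured between a manual labeling and an LLM labeling of the same queries; the function name and the toy labels are illustrative and not taken from our pipeline.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (hypothetical, for illustration only).
human = ["timeliness", "event", "time-independent", "time-independent"]
llm = ["timeliness", "time-independent", "time-independent", "time-independent"]
print(round(cohens_kappa(human, llm), 3))
```
]]></preformat>
        <p>Because Kappa corrects for chance agreement, a dominant category such as time-independent raises the expected agreement and lowers Kappa even when most labels match.</p>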
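        <p>Based on the categories assigned by the LLM, we applied a boost to the original BM25 scores. As
documents in LongEval don’t have a specific timestamp, we applied a simple heuristic: If a document
was detected in more than three LongEval sub-collections, it was considered old, and all other documents
were considered recent. We then applied the boost only to the recent documents, with the following
boosting factors:</p>
        <list list-type="bullet">
          <list-item><p>time-independent: 1.20</p></list-item>
          <list-item><p>explicit-time: 1.15</p></list-item>
          <list-item><p>event: 1.15</p></list-item>
          <list-item><p>timeliness: 1.20</p></list-item>
        </list>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Category</th><th>Example queries</th><th>Count</th></tr>
            </thead>
            <tbody>
              <tr><td>time-independent</td><td>“definition of gravity”, “chess rules”</td><td>23849</td></tr>
              <tr><td>explicit-time</td><td>“World War II”, “US president 1990”</td><td>544</td></tr>
              <tr><td>event</td><td>“Cannes festival 2025”, “French Revolution”</td><td>1146</td></tr>
              <tr><td>timeliness</td><td>“Apple stock price”, “weather Lyon”</td><td>4601</td></tr>
            </tbody>
          </table>
        </table-wrap>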
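        <p>The boosting step can be sketched as follows. This is a minimal illustration under two assumptions: the factors are applied multiplicatively to the BM25 score, and the number of sub-collections a document appears in is available as a simple lookup. The names <monospace>boost_ranking</monospace> and <monospace>snapshot_counts</monospace> are hypothetical.</p>
        <preformat><![CDATA[
```python
# Boosting factors per query category, as listed in the text above.
BOOST_FACTORS = {
    "time-independent": 1.20,
    "explicit-time": 1.15,
    "event": 1.15,
    "timeliness": 1.20,
}
# A document seen in more than this many sub-collections is considered old.
OLD_IF_SEEN_IN_MORE_THAN = 3


def is_recent(doc_id, snapshot_counts):
    """Return True if the document appears in at most three sub-collections."""
    return snapshot_counts.get(doc_id, 0) <= OLD_IF_SEEN_IN_MORE_THAN


def boost_ranking(ranking, category, snapshot_counts):
    """Boost recent documents of one query and re-sort the ranking.

    ranking: list of (doc_id, bm25_score) pairs.
    Multiplying by the factor is an assumption of this sketch.
    """
    factor = BOOST_FACTORS[category]
    rescored = [
        (doc_id, score * factor if is_recent(doc_id, snapshot_counts) else score)
        for doc_id, score in ranking
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores and snapshot counts for one query.
ranking = [("d1", 10.0), ("d2", 9.0)]
snapshot_counts = {"d1": 5, "d2": 1}
print(boost_ranking(ranking, "timeliness", snapshot_counts))
```
]]></preformat>
        <p>In this toy example, the recent document d2 overtakes the old document d1 after boosting.</p>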
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Scoring Time-dependent Queries</title>
        <p>In contrast to the previous approach, we are only interested in how time-dependent a query is. We
prompt an LLM to assign a value between 0 and 1 to every query, encoding the time-dependency of a
query.</p>
        <p>We prompted GPT-4o with the following instruction: “You have this Query. Give a score on how
time-dependent this Query: query_text. The Score is between 0 and 1. Don’t answer with anything
more than the Score.” and received a score for 47,053 queries. Figure 1 shows the distribution
of temporal dependency scores. The average score was 0.325, indicating that most queries are only weakly
time-dependent and only a few are strongly time-dependent. Using a threshold of 0.6, only
22% of the queries were marked as time-dependent.</p>
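        <p>For our re-ranking, we take the BM25 scores based on our PyTerrier implementation with the default
parameters for each pair of query <italic>q</italic> and document <italic>d</italic> and an additional boost:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{score}_{\mathrm{combined}}(q, d) = \mathrm{score}_{\mathrm{bm25}}(q, d) + \lambda \cdot \left( \mathrm{score}_{\mathrm{time}}(q) \times \mathrm{score}_{\mathrm{recency}}(d) \right)]]></tex-math>
        </disp-formula>
        <p>where score_bm25 is the original BM25 score between <italic>q</italic> and <italic>d</italic>, and λ is a weight factor for the boost
based on the scores score_time and score_recency. score_time is the temporal dependency factor of the
query <italic>q</italic> as estimated by the LLM. score_recency is the recency of document <italic>d</italic>, defined by the frequency
of <italic>d</italic> in the previous snapshots <italic>S</italic> of the dataset <italic>D</italic>. It is calculated as follows:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[\mathrm{score}_{\mathrm{recency}}(d) = \frac{1}{1 + \log\left(1 + \left|\{\, \mathrm{snapshot} \mid d \in \mathrm{snapshot} \text{ and } \mathrm{snapshot} \in S(D) \,\}\right|\right)}]]></tex-math>
        </disp-formula>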
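        <p>The two formulas of this section can be sketched in a few lines. The sketch assumes the natural logarithm and represents each previous snapshot as a set of document IDs; the default weight of 1.0 is a placeholder and not a value used in our runs.</p>
        <preformat><![CDATA[
```python
import math


def recency_score(doc_id, previous_snapshots):
    """Recency of a document: 1 / (1 + log(1 + number of snapshots containing it)).

    previous_snapshots: iterable of sets of document IDs (assumed representation).
    The natural logarithm is an assumption of this sketch.
    """
    occurrences = sum(1 for snapshot in previous_snapshots if doc_id in snapshot)
    return 1.0 / (1.0 + math.log(1.0 + occurrences))


def combined_score(bm25, time_score, recency, weight=1.0):
    """BM25 score plus the weighted product of time dependency and recency."""
    return bm25 + weight * (time_score * recency)


# Hypothetical snapshots: "d1" is new, "d2" occurs in two earlier snapshots.
snapshots = [{"d2", "d3"}, {"d2"}, {"d3"}]
for doc_id in ("d1", "d2"):
    rec = recency_score(doc_id, snapshots)
    print(doc_id, round(combined_score(10.0, 0.8, rec), 3))
```
]]></preformat>
        <p>A document that appears in no previous snapshot receives the maximum recency of 1, and each additional occurrence reduces the boost logarithmically.</p>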
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Time-dependent Relevance Feedback</title>
        <p>This submission builds upon our relevance_feedback approach as submitted in 2024 [2], where we used
a query expansion method, making use of the relevance feedback provided by prior documents, i.e.,
those documents with a relevant label at earlier timestamps.</p>
        <p>
          We reimplemented the original pipeline with the following modifications: (1) we removed French and
English stopwords, (2) we removed terms with a length of less than 5 characters, (3) we only considered
highly relevant documents (relevance score of 2), and (4) we calculated the tf-idf weights using the whole
PyTerrier index data, and not just the candidate documents. We calculated the tf-idf values for each
term in the highly relevant documents for each query for all previous sub-collections, and we extracted
the term with the highest tf-idf value per document. Up to 8 terms with the highest tf-idf values for
each query were used to expand the original query. In most cases, only 2 to 4 terms were used, as for
many queries, only a few highly relevant documents from previous sub-collections are available. On
the training data, we see an improvement over a simple BM25 baseline (see Table 3).
        </p>
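        <p>The term selection described above can be sketched as follows. This is a simplified illustration, not our PyTerrier implementation: it assumes whitespace tokenization, a tf-idf variant of the form tf × log(N / (1 + df)), and collection statistics passed in as plain dictionaries. All names and the toy statistics are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def expansion_terms(relevant_docs, doc_freq, n_docs, stopwords,
                    max_terms=8, min_len=5):
    """Select the top tf-idf term of each highly relevant document.

    relevant_docs: texts of documents with relevance score 2 for one query.
    doc_freq: term -> document frequency in the whole index (assumed lookup).
    """
    best = {}
    for text in relevant_docs:
        tokens = [t for t in text.lower().split()
                  if len(t) >= min_len and t not in stopwords]
        if not tokens:
            continue
        tf = Counter(tokens)
        scores = {t: c * math.log(n_docs / (1 + doc_freq.get(t, 0)))
                  for t, c in tf.items()}
        term = max(scores, key=scores.get)  # one term per document
        best[term] = max(best.get(term, float("-inf")), scores[term])
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:max_terms]


# Toy data (hypothetical).
docs = ["voiture occasion garage voiture", "assurance voiture assurance"]
doc_freq = {"voiture": 50, "occasion": 10, "garage": 5, "assurance": 2}
terms = expansion_terms(docs, doc_freq, n_docs=100, stopwords={"pour"})
print("voiture " + " ".join(terms))
```
]]></preformat>
        <p>Since each document contributes at most one term, the number of expansion terms is bounded by the number of highly relevant documents, which is consistent with most queries receiving only 2 to 4 terms.</p>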
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Filtering and Boosting of Document Pairs Across Sub-collections</title>
        <p>The overall idea of this approach is to use relevance information from previous sub-collections by
boosting known relevant documents. This approach was already proposed as qrel_boost in our 2024
submission. In 2025, we refined it by including a filter step and a more fine-grained
boosting mechanism. In the original approach, we boosted all known relevant documents independent
of any potential updates. This time, we compare old and new documents not only based on the URL
but also on the document content itself. This builds on observations from previous studies on pseudo-relevance
feedback [5]. We applied a boost to the original BM25 scores only if the old and new document content
is the same or similar.</p>
        <p>We developed and implemented three different methods to identify, filter, and boost document pairs
that appear in two temporally distinct sub-collections of the LongEval dataset: a length matching
approach, a document similarity comparison based on Sentence-BERT, and a comparison by Jaccard
index. The overarching goal was to recognize relevant document versions that remained stable over
time, either structurally or semantically, and to integrate this information into a retrieval pipeline. All
methods were grounded in a query-document relevance mapping (query_doc_map) derived from the
official qrels file, which associates each query with its set of relevant documents. The result was a
dictionary assigning each query ID to a standardized list of relevant document IDs, which served as the
basis for our comparison strategies across time periods. For the submitted approaches, only the qrels from the
2023-03 snapshot were used.</p>
        <p>Length Matching The first method aimed to identify document pairs whose text content had the
same length in both snapshots. We extracted and compared the text lengths for each document and
retained only those pairs where the lengths matched perfectly. This strict filtering approach provided a
fast and reliable pipeline for unchanged content, ensuring that only structurally identical document
versions were boosted by taking the original BM25 score and multiplying it by two.</p>
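        <p>A minimal sketch of the length matching filter is given below. It assumes that the old and new versions of a document can be paired by a shared document ID and that the texts are available as dictionaries; the function and parameter names are illustrative.</p>
        <preformat><![CDATA[
```python
def length_match_boost(ranking, relevant_ids, old_texts, new_texts, factor=2.0):
    """Double the BM25 score of known relevant documents with unchanged length.

    ranking: list of (doc_id, bm25_score) pairs for one query.
    relevant_ids: documents judged relevant for this query in the old qrels.
    old_texts / new_texts: doc_id -> text in the old and new snapshot
    (pairing versions by doc_id is an assumption of this sketch).
    """
    boosted = []
    for doc_id, score in ranking:
        same_length = (
            doc_id in relevant_ids
            and doc_id in old_texts
            and doc_id in new_texts
            and len(old_texts[doc_id]) == len(new_texts[doc_id])
        )
        boosted.append((doc_id, score * factor if same_length else score))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): "b" is unchanged, "c" was rewritten.
ranking = [("a", 5.0), ("b", 4.0), ("c", 3.0)]
old_texts = {"b": "same text", "c": "short"}
new_texts = {"a": "x", "b": "same text", "c": "much longer now"}
print(length_match_boost(ranking, {"b", "c"}, old_texts, new_texts))
```
]]></preformat>
        <p>Equal length is only a proxy for unchanged content: two different texts of the same length would also pass this filter.</p>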
        <p>Length-Based Similarity with Tolerance Buckets Recognizing that minor formatting changes or
metadata updates might alter document length slightly without affecting the core content, we introduced
a more flexible filtering scheme based on length ratios. For each document pair, we computed the ratio
between the length of the shorter and the longer version, classified the pair into one of several predefined buckets, and assigned a
boosting factor (see Table 4). Separate filtered mappings were created for each category, allowing for
graded analysis or boosting strategies depending on the degree of length similarity.</p>
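        <p>The bucket assignment can be sketched as follows. The bucket boundaries and factors in <monospace>EXAMPLE_BUCKETS</monospace> are hypothetical placeholders chosen for illustration; the values actually used are those in Table 4.</p>
        <preformat><![CDATA[
```python
# Hypothetical buckets as (minimum length ratio, boosting factor).
# These values are placeholders; the real ones are given in Table 4.
EXAMPLE_BUCKETS = [(1.0, 2.0), (0.9, 1.5), (0.8, 1.2)]


def length_ratio(old_text, new_text):
    """Ratio between the shorter and the longer text length, in [0, 1]."""
    shorter, longer = sorted((len(old_text), len(new_text)))
    return 1.0 if longer == 0 else shorter / longer


def bucket_factor(ratio, buckets=EXAMPLE_BUCKETS):
    """Return the factor of the highest bucket whose minimum the ratio reaches."""
    for minimum, factor in sorted(buckets, reverse=True):
        if ratio >= minimum:
            return factor
    return 1.0  # below all buckets: no boost


print(bucket_factor(length_ratio("abcd", "abcde")))
```
]]></preformat>
        <p>The same lookup can be reused for the Sentence-BERT buckets by passing a cosine similarity instead of a length ratio.</p>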
        <p>Sentence-BERT Similarity (Hard Threshold) To move beyond structural comparison and capture
semantic stability, we employed Sentence-BERT embeddings using the all-MiniLM-L6-v2 model.
Each document version was encoded into a dense vector representation, and cosine similarity was
calculated between the two versions. Only those document pairs with a similarity score above a strict
threshold of 0.9 were retained, and their original BM25 score was boosted by a factor of two. This
method enabled us to preserve documents that may differ lexically but convey the same meaning,
offering a more context-aware filtering strategy.</p>
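        <p>The thresholding step can be illustrated without the embedding model. The sketch below assumes that the two document versions have already been encoded into vectors (in our setting by all-MiniLM-L6-v2); the toy vectors are two-dimensional stand-ins for the real embeddings.</p>
        <preformat><![CDATA[
```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equally long vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_boost(bm25, old_vec, new_vec, threshold=0.9, factor=2.0):
    """Boost the BM25 score if old and new version are semantically stable."""
    if cosine_similarity(old_vec, new_vec) > threshold:
        return bm25 * factor
    return bm25


# Toy embeddings (hypothetical stand-ins for Sentence-BERT vectors).
print(similarity_boost(3.0, [1.0, 0.0], [1.0, 0.1]))  # nearly identical
print(similarity_boost(3.0, [1.0, 0.0], [0.0, 1.0]))  # unrelated
```
]]></preformat>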
        <p>Sentence-BERT Similarity (with Buckets) Expanding on the previous method, we categorized
document pairs into semantic similarity buckets rather than applying a hard cutoff. This allowed us to
group documents based on graded similarity levels, analogous to the length-based approach (see Table 4).</p>
        <p>Jaccard Similarity As an alternative to embedding-based methods, we also evaluated lexical overlap
through Jaccard similarity. By tokenizing and lowercasing the document texts, we computed the ratio of
intersecting to union word sets for each document pair. Document pairs exceeding a specified similarity
threshold of 0.9 were retained and boosted by a factor of two. This approach was particularly useful for
identifying minimal editorial changes.</p>
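        <p>The Jaccard comparison can be sketched as follows, assuming simple whitespace tokenization after lowercasing; the tokenizer used in our pipeline may differ.</p>
        <preformat><![CDATA[
```python
def jaccard_similarity(text_a, text_b):
    """Size of the intersection divided by the size of the union of word sets."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty documents are treated as identical
    return len(words_a.intersection(words_b)) / len(words_a.union(words_b))


def jaccard_boost(bm25, old_text, new_text, threshold=0.9, factor=2.0):
    """Boost the BM25 score if the lexical overlap exceeds the threshold."""
    if jaccard_similarity(old_text, new_text) > threshold:
        return bm25 * factor
    return bm25


print(jaccard_similarity("The cat sat", "the cat ran"))
```
]]></preformat>
        <p>Because the comparison uses word sets, reordering or repeating words does not change the score; only added or removed vocabulary does.</p>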
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Neural Re-ranking Supported by Semantic Clustering</title>
        <p>This approach aims to develop a search engine that goes beyond keyword matching and also evaluates
search results based on their content. Essentially, this means assigning new documents to thematic
clusters. We used machine learning to identify hidden content-related connections between content
clusters and their relevance.</p>
        <p>First, we carried out a systematic clustering of all queries from the LongEval dataset. The queries
were encoded with the help of OpenAI embeddings (text-embedding-3-large) into a 3072-dimensional
semantic space. We used Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) to
reduce the vectors to 50 dimensions, and then applied k-means clustering [6, 7]. The optimal number
of clusters, 56, was determined by silhouette score analysis, resulting in thematically coherent groups
(e.g., cluster 4: “Job/Employment” or cluster 32: “Food/Ingredients”) [8]. These clusters
served as the basis for the subsequent modeling. Figure 2 shows a visualization of ten high-level
clusters from the original 56 topics discovered in the queries.</p>
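        <p>To illustrate how a silhouette analysis ranks candidate clusterings, the following sketch computes the mean silhouette coefficient in plain Python with Euclidean distance. It is a didactic stand-in, not our implementation: the embedding, UMAP, and k-means steps are assumed to have produced the points and labels, and the toy data is hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated groups; the first labeling matches them, the second does not.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```
]]></preformat>
        <p>In a model selection loop, the clustering is repeated for several candidate numbers of clusters, and the number with the highest score is kept.</p>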
        <p>
          All relevant documents were assigned to one or more clusters. After pre-processing
(lowercasing, normalization, stopword removal, SnowballStemmer for French, and punctuation removal),
we extracted term frequencies for the top 10,000 terms and used a multi-hot encoding. For the model
itself, we developed a dense neural network with TensorFlow/Keras. Our aim was to use the text
content to predict which topic clusters a document fits into and how relevant it is. The model was built
as a dual-output network with two separate prediction branches: (1) cluster prediction, a 56-neuron
softmax layer for topic classification, and (2) relevance estimation, a sigmoid activation for continuous
relevance assessment (0-1). The architecture consisted of three hidden layers (512 → 256 → 128
neurons) with LeakyReLU activation and dropout regularization (rate 0.3). The input features were the
10,000-dimensional multi-hot-encoded term vectors. The training was performed on documents up to
2023-02 with class weighting to compensate for the relevance imbalance.
        </p>
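        <p>The input encoding can be sketched as follows. The sketch assumes that documents are already pre-processed into token lists and that the vocabulary consists of the most frequent terms; the helper names and the toy documents are hypothetical, and the network itself is not shown.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def build_vocabulary(token_lists, size=10_000):
    """Map the most frequent terms of the collection to vector positions."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return {term: index for index, (term, _) in enumerate(counts.most_common(size))}


def multi_hot(tokens, vocabulary):
    """Binary vector with a 1 for every vocabulary term present in the document."""
    vector = [0] * len(vocabulary)
    for token in set(tokens):
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = 1
    return vector


# Toy pre-processed documents (hypothetical), with a vocabulary of size 2.
docs = [["emploi", "cdi", "emploi"], ["recette", "emploi", "cdi"]]
vocabulary = build_vocabulary(docs, size=2)
print(vocabulary)
print(multi_hot(["recette", "emploi"], vocabulary))
```
]]></preformat>
        <p>A multi-hot vector records only the presence of a term, so repeated occurrences within a document do not change the input to the network.</p>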
        <p>The actual retrieval was a two-step approach: (1) 1000 candidate documents were retrieved using
PyTerrier’s BM25, and (2) we then applied a re-ranking based on the overlap of query and document clusters:</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math><![CDATA[\mathrm{score}_{\mathrm{combined}}(q, d) = \begin{cases} \mathrm{score}_{\mathrm{bm25}}(q, d) \times 2 \times \mathrm{sigmoid}\left(\mathrm{score}_{\mathrm{cluster}}(d)\right), & \text{if cluster overlap} \\ \mathrm{score}_{\mathrm{bm25}}(q, d), & \text{otherwise} \end{cases}]]></tex-math>
        </disp-formula>
        <p>where score_cluster is the prediction of the cluster relevance predictor. So, for documents that don’t
have a topical cluster overlap with the query, we keep the original BM25 score, but for overlapping
documents we alter the score to enforce a re-ranking. The model can be trained either on only a few
sub-collections, such as the immediately preceding one, or on several sub-collections covering a longer time
span to enhance the training process.</p>
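        <p>The re-ranking rule can be sketched as a small function. The sketch assumes that cluster assignments are available as sets of cluster IDs for the query and the document; the function name and the example values are illustrative.</p>
        <preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Logistic function mapping a real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rerank_score(bm25, cluster_score, query_clusters, doc_clusters):
    """Rescale the BM25 score only if query and document share a cluster."""
    if set(query_clusters).intersection(doc_clusters):
        return bm25 * 2.0 * sigmoid(cluster_score)
    return bm25


# Hypothetical example: the query belongs to cluster 4.
print(rerank_score(10.0, 2.0, {4}, {4, 32}))   # overlap, confident prediction
print(rerank_score(10.0, -2.0, {4}, {4, 32}))  # overlap, low prediction
print(rerank_score(10.0, 2.0, {4}, {32}))      # no overlap
```
]]></preformat>
        <p>Since the factor 2 × sigmoid(·) lies between 0 and 2, overlapping documents are promoted when the prediction is positive and demoted when it is negative; a prediction of 0 leaves the BM25 score unchanged.</p>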
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The retrieval effectiveness of the presented approaches was evaluated using the nDCG@10 metric [9],
which aligns with the web-search context of this task. We also report the effectiveness of the BM25 [10]
baseline, as most systems, excluding the Sauerkraut approach, function as re-rankers applied to this
initial retrieval stage. The comprehensive results are shown in Table 5 and Figure 3.</p>
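      <p>For reference, nDCG@10 can be computed as in the following sketch. It assumes linear gains and a log2 rank discount; evaluation tools differ in these details, so the sketch illustrates the metric rather than reproducing the official evaluation. The relevance labels are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def dcg(gains, k=10):
    """Discounted cumulative gain over the first k gains (linear gain, log2 discount)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k of one ranking, given a doc_id -> relevance grade mapping."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = dcg(sorted(qrels.values(), reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for one query.
qrels = {"a": 2, "b": 0, "c": 1}
print(ndcg_at_k(["a", "c", "b"], qrels))
print(round(ndcg_at_k(["b", "a"], qrels), 3))
```
]]></preformat>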
      <p>Among the evaluated systems, the SchaeredRetrieval approach, which boosted known documents
based on the temporal type of the query, demonstrated the weakest performance across nearly all
snapshots, with nDCG@10 scores ranging from 0.20 to 0.25. The timeliness-focused approach
by SuperTeam123 did not improve on the baseline either: it performed on par with BM25 for most snapshots. Although the nDCG@10
scores typically differed by only a few thousandths, a substantial drop occurred at the 2023-04 snapshot,
where it recorded the lowest score of 0.192 among all systems and snapshots.</p>
      <p>In contrast, the relevance feedback approach was the first to outperform the BM25 baseline. It shows
performance trends similar to the baseline but consistently achieved better results. It is the second-best
unique approach. The JMFT team submitted three variations of their approach (Jaccard, BERT,
and length), all of which outperform the baseline and the other approaches. Notably, for the first two
snapshots, all three JMFT approaches yielded identical nDCG@10 scores, with only minor variances
observed thereafter.</p>
      <fig>
        <label>Figure 3</label>
        <caption>
          <p>Effectiveness across snapshots by approach.</p>
        </caption>
      </fig>
      <p>The final team, fair_schaer, proposed a neural relevance re-ranking model. This approach achieved
an effectiveness that positioned it between the BM25 baseline it was designed to re-rank and the
ScharedRetrieval system, so it did not outperform BM25 either. Over time, the performance
gap between this system and the BM25 baseline narrowed slightly.</p>
      <p>Overall, all submitted approaches exhibited broadly similar trends in retrieval effectiveness. A greater
variance in performance among the systems was observed in the initial two snapshots, when the training
data was most recent. This variance diminished in the later snapshots, with the final snapshot showing
the least variance between the systems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We proposed five distinct approaches for leveraging temporal information within test collections.
While some of these methods are further developments of recent submissions, others are novel and
previously untested. Ultimately, only two approaches, relevance feedback and qrel_boosting, managed
to outperform the BM25 baseline on the test data. These results confirm, once again, that both are
effective strategies for improving retrieval effectiveness at a low computational cost. In contrast, our
findings indicate that the timeliness of a query could not yet be successfully utilized as an effective
relevance signal.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We gratefully acknowledge the support of the German Research Foundation (DFG) through project
grant No. 407518790.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly in order to: Grammar
and spelling check, Paraphrase and reword. After using these tools, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Evaluating temporal persistence using replicability measures</article-title>
          , in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023</source>
          , volume
          <volume>3497</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>2441</fpage>
          -
          <lpage>2457</lpage>
          . URL: https://ceur-ws.org/Vol-3497/paper-196.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Leveraging prior relevance signals in web search</article-title>
          , in: G. Faggioli, N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024</source>
          , volume
          <volume>3740</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>2396</fpage>
          -
          <lpage>2406</lpage>
          . URL: https://ceur-ws.org/Vol-3740/paper-220.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Broder</surname>
          </string-name>
          ,
          <article-title>A taxonomy of web search</article-title>
          ,
          <source>SIGIR Forum</source>
          <volume>36</volume>
          (
          <year>2002</year>
          )
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . URL: https://doi.org/10.1145/792550.792552. doi:10.1145/792550.792552.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Blanco</surname>
          </string-name>
          ,
          <article-title>A survey of temporal web search experience</article-title>
          , in: L. Carr, A. H. F. Laender, B. F. Lóscio, I. King, M. Fontoura, D. Vrandecic, L. Aroyo, J. P. M. de Oliveira, F. Lima, E. Wilde (Eds.),
          <source>22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013, Companion Volume</source>
          , International World Wide Web Conferences Steering Committee / ACM,
          <year>2013</year>
          , pp.
          <fpage>1101</fpage>
          -
          <lpage>1108</lpage>
          . URL: https://doi.org/10.1145/2487788.2488126. doi:10.1145/2487788.2488126.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Evaluating elements of web-based data enrichment for pseudorelevance feedback retrieval</article-title>
          , in:
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Candan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21-24, 2021, Proceedings</source>
          , volume
          <volume>12880</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>64</lpage>
          . URL: https://doi.org/10.1007/978-3-030-85251-1_5. doi:10.1007/978-3-030-85251-1_5.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>McInnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Healy</surname>
          </string-name>
          ,
          <article-title>UMAP: uniform manifold approximation and projection for dimension reduction</article-title>
          , CoRR abs/1802.03426 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1802.03426. arXiv:1802.03426.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>MacQueen</surname>
          </string-name>
          ,
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          , University of California Press,
          <year>1967</year>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          ,
          <article-title>Silhouettes: a graphical aid to the interpretation and validation of cluster analysis</article-title>
          ,
          <source>J. Comput. Appl. Math.</source>
          <volume>20</volume>
          (
          <year>1987</year>
          )
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          . URL: https://doi.org/10.1016/0377-0427(87)90125-7. doi:10.1016/0377-0427(87)90125-7.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          ,
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          ,
          <source>ACM Trans. Inf. Syst.</source>
          <volume>20</volume>
          (
          <year>2002</year>
          )
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          . URL: http://doi.acm.org/10.1145/582415.582418. doi:10.1145/582415.582418.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
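          ,
          <article-title>Okapi at TREC-3</article-title>
          , in:
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Harman</surname>
          </string-name>
          (Ed.),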
          <source>Proceedings of The Third Text REtrieval Conference</source>
          , TREC 1994, Gaithersburg, Maryland, USA, November 2-4,
          <year>1994</year>
          , volume
          <volume>500-225</volume>
          of NIST Special Publication, National Institute
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>