<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SEUPD@CLEF: RAFJAM on Longitudinal Evaluation of Model Performance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alvise Bolzonella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Broetto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Gasparini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Farhad Sadat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper reports the work we have done for the CLEF 2023 LongEval Lab, whose main goal is to evaluate and improve the performance of IR models over time. We implemented a basic retrieval system and then modified and extended it, focusing on different query expansion techniques involving the use of synonyms and pseudo-relevance feedback. We provide a description of our ideas, code and other development details, along with a statistical analysis of the runs of our systems on different test collections.</p>
      </abstract>
      <kwd-group>
<kwd>Information retrieval</kwd>
        <kwd>Lucene</kwd>
        <kwd>query expansion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Project setup and basic retrieval system</title>
<p>We decided to implement our retrieval system starting from the basic structure of the
projects contained in the Search Engines course repository¹, used during our studies to learn
the fundamentals of Information Retrieval.</p>
        <p>The basic code for the Indexer, Searcher and Analyzer of the project is taken from the
hello-tipster example. We then improved our Analyzer starting from the hello-analyzer example,
from which in particular we adopted the StopFilter class. We also looked into the
FrenchAnalyzer class of Lucene², which suggested adding an ElisionFilter with its default list
of French articles and stemming words with FrenchLightStemFilter³. Finally, as similarity score,
after trying different alternatives, we decided to use Best Matching 25 (BM25) because it
provides the best results.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Pseudo-relevance feedback</title>
        <p>
The general idea for this technique comes from the lectures of the Search Engines course and
from some further readings [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The different term weighting schemes we tried (see
Section 3.2.1) and Zipf’s law were also presented during the lectures.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Synonym-substitution query expansion</title>
        <p>
The main idea for this method comes from several papers discussing lexical query
expansion [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In order to find the most relevant words to substitute with their synonyms, we
asked an artificial intelligence, ChatGPT⁴, to find the most frequent words in the query list
and to provide a list of three synonyms for each of these words.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
<p>The procedure to obtain the final runs basically consists of two main steps: the
indexing of the documents in the collection, involving parsing operations and a
customised analyzer, and the proper searching phase, preceded by parsing and analysis of the
queries. The following sections describe our proposed solutions and implementations; they are
based on Apache Lucene⁵, a very popular API for IR, which consists of many packages for
various tasks, including the analysis of texts in different languages such as French⁶.
¹ UniPD Search Engines repository: https://bitbucket.org/frrncl/se-unipd/src/master/
² Documentation: org.apache.lucene.analysis.fr.FrenchAnalyzer
³ A class of the org.apache.lucene.analysis.fr package
⁴ OpenAI, "ChatGPT", knowledge cutoff: September 2021, available online at https://openai.com
⁵ The Apache Software Foundation, Apache Lucene, 2022 (org.apache.lucene)
⁶ French package: org.apache.lucene.analysis.fr</p>
      <sec id="sec-3-1">
        <title>3.1. Processing and indexing the documents</title>
<sec id="sec-3-1-0">
          <title>3.1.1. Parsing</title>
          <p>Before being stored in the index, documents have to be parsed to correctly obtain their fields.
We inspected the structure of the .txt files containing the documents: they contain IDs and
bodies, marked by different tags, which suggested parsing them with Java regular
expressions, which easily allow detecting these tags. In our project, the class LEParser takes care
of processing the original documents in this way, storing their content in a fresh instance of the
ParsedDocument class. LEParser extends DocumentParser, an abstract class providing
a general structure for parsers and able to iterate over ParsedDocument instances.</p>
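          <p>As an illustration, a minimal sketch of such regex-based extraction follows. The tag names (DOCNO, TEXT) and the ParsedDocument constructor are assumptions made for this example; the actual markers in the collection files and the real LEParser code may differ.</p>
          <preformat>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexDocParserSketch {

    // Assumed tag names; the real collection files may use different markers.
    private static final Pattern ID_PATTERN =
            Pattern.compile("<DOCNO>(.*?)</DOCNO>", Pattern.DOTALL);
    private static final Pattern BODY_PATTERN =
            Pattern.compile("<TEXT>(.*?)</TEXT>", Pattern.DOTALL);

    /** Extracts ID and body from a single raw document. */
    public static ParsedDocument parse(String rawDocument) {
        Matcher id = ID_PATTERN.matcher(rawDocument);
        Matcher body = BODY_PATTERN.matcher(rawDocument);
        if (id.find() && body.find()) {
            // ParsedDocument(id, body): assumed two-argument constructor
            return new ParsedDocument(id.group(1).trim(), body.group(1).trim());
        }
        throw new IllegalArgumentException("Document without the expected tags");
    }
}
          </preformat>
        </sec>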
        <sec id="sec-3-1-1">
          <title>3.1.2. Analyzing</title>
<p>This step consists of tokenizing the content of the documents and modifying and filtering the
obtained tokens. This procedure is handled in the class LEAnalyzer, which allowed us to analyse
the content of the documents in a customized way. It extends the abstract class Analyzer by
overriding the initReader and createComponents methods. By trying many different combinations
and computing the related evaluation measures, we decided on the sequence of operations it
performs:
• tokenization with StandardTokenizer, provided by Lucene;
• lowercasing all tokens;
• deletion of all tokens representing stopwords and French articles (we tried
different lists and chose one of roughly 500 terms, mostly French but also
English);
• deletion of single-letter words, using a length filter;
• stemming, using the Lucene FrenchLightStemFilter; we observed that more aggressive
stemming algorithms, even if simpler, provide worse performance;
• deletion of numeric tokens, except for integers with more than 3 digits, which often
represent years; this filtering step is managed by our NumberFilter.</p>
          <p>Small variations on this scheme ("standard analyzing") used by the different systems will be
briefly described below.</p>
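          <p>A minimal sketch of how such a pipeline can be assembled in Lucene is shown below. The stopword set (here Lucene's default French list, standing in for our roughly 500-term custom list) and the exact placement of our NumberFilter are assumptions.</p>
          <preformat>
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.fr.FrenchLightStemFilter;
import org.apache.lucene.analysis.miscellaneous.LengthFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.ElisionFilter;

public class LEAnalyzerSketch extends Analyzer {

    // Stand-in list: the actual system uses a custom set of ~500 French and English stopwords.
    private static final CharArraySet STOPWORDS = FrenchAnalyzer.getDefaultStopSet();

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        // remove elided French articles (l', d', qu', ...) using Lucene's default list
        TokenStream stream = new ElisionFilter(source, FrenchAnalyzer.DEFAULT_ARTICLES);
        stream = new LowerCaseFilter(stream);
        stream = new StopFilter(stream, STOPWORDS);
        stream = new LengthFilter(stream, 2, Integer.MAX_VALUE); // drop single-letter tokens
        stream = new FrenchLightStemFilter(stream);              // light French stemming
        // our custom NumberFilter (numeric tokens dropped, year-like integers kept) would be chained here
        return new TokenStreamComponents(source, stream);
    }
}
          </preformat>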
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.3. Indexing</title>
<p>This is the phase where the documents are actually stored by the Lucene IndexWriter. In the class
DirectoryIndexer, the tree of directories containing the files with the documents is scanned and
each document is parsed as described before. Then, its fields are wrapped in a ParsedDocument
instance and passed to the writer, which is configured to perform our customized
analysis. DirectoryIndexer contains a main method, making it runnable, in order to create the
index.</p>
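          <p>The core of such an indexing loop might look as follows; the field names, the index path and the ParsedDocument accessors (getId, getBody) are assumptions for illustration, not the actual DirectoryIndexer code.</p>
          <preformat>
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.FSDirectory;

public static void indexCollection(Iterable<ParsedDocument> parsedDocs) throws IOException {
    IndexWriterConfig cfg = new IndexWriterConfig(new LEAnalyzerSketch());
    cfg.setSimilarity(new BM25Similarity());            // same similarity used at search time
    cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // build the index from scratch

    try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("experiment/index")), cfg)) {
        for (ParsedDocument pd : parsedDocs) {          // DocumentParser iterates over ParsedDocument
            Document doc = new Document();
            doc.add(new StringField("id", pd.getId(), Field.Store.YES));  // ID stored, not tokenized
            doc.add(new TextField("body", pd.getBody(), Field.Store.NO)); // body analyzed by the analyzer
            writer.addDocument(doc);
        }
        writer.commit();
    }
}
          </preformat>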
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Searching</title>
        <p>
This is the part where we spent most of our time, implementing different strategies: the main
idea was to improve the quality of the retrieval system by modifying the queries.
We started by inspecting the files with the queries, then decided to create a simple parser
for the .tsv file to get the ID and content (namely the "title" field) of each query, storing them in a
Lucene QualityQuery instance. When a new Searcher instance is created, this file reader
is invoked and, in order to ensure that query contents are by default analyzed like the
body of the documents, our customized LEAnalyzer is passed as a parameter to the constructor.
At searching time, a boolean query is created for each QualityQuery instance. Then, each
boolean query is searched using BM25 similarity [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The resulting scores for the top documents are
stored, and a textual run file is produced.
        </p>
<p>This is the standard procedure to search a set of queries in a document index; our
modifications are essentially Query Expansion (QE) techniques which act on the text of the queries before
building the BooleanQuery.</p>
<p>Our three main ideas for query expansion are detailed below, along with a description of their
implementation; a sketch of the basic search loop is shown first.</p>
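        <p>A minimal sketch of this standard loop follows; the "body" field name, the index path and the cut-off of 1000 documents per query are assumptions for illustration. QualityQuery comes from the Lucene benchmark module.</p>
        <preformat>
import java.nio.file.Paths;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.classic.QueryParserBase;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.FSDirectory;

public static void searchAll(QualityQuery[] topics) throws Exception {
    try (IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("experiment/index")))) {
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.setSimilarity(new BM25Similarity()); // same ranking model used for our runs
        // query text is analyzed with the same LEAnalyzer used at indexing time
        QueryParser parser = new QueryParser("body", new LEAnalyzerSketch());

        for (QualityQuery qq : topics) {
            // the parsed multi-term query is a BooleanQuery of term clauses
            TopDocs top = searcher.search(parser.parse(QueryParserBase.escape(qq.getValue("title"))), 1000);
            int rank = 1;
            for (ScoreDoc sd : top.scoreDocs) {
                String docId = searcher.doc(sd.doc).get("id");
                // one line per retrieved document, in the standard TREC run format
                System.out.printf("%s Q0 %s %d %.6f rafjam%n", qq.getQueryID(), docId, rank++, sd.score);
            }
        }
    }
}
        </preformat>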
        <sec id="sec-3-2-1">
          <title>3.2.1. Pseudo-relevance feedback</title>
          <p>
            Pseudo-relevance feedback technique [
            <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a heuristic way to expand a query with significant
words in order to improve search results. The main assumption behind it is that the top results
returned by the original query can be assumed to be relevant, and that the most frequent words in
them are therefore significant for the topic.
          </p>
          <p>
We tried different implementations of this technique. As a common programming strategy,
they use Lucene term vectors to iterate through the terms of each document, in order to weight
them according to their frequency. We observed that taking into account only the
occurrences of words in the top documents, rather than in all the retrieved ones, leads to better performance.
The main variations concerned the weighting scheme: we tried raw total term
frequencies, average relative frequencies in the top-ranked documents, and tf-idf weighting.
We then looked into evaluation measures for the results, in particular MAP, and saw that
more complicated weighting schemes resulted in similar or slightly lower scores. Therefore,
Occam’s razor suggested keeping the simplest scheme, which considers the absolute
frequencies of terms across the best-matching documents retrieved. Fixing the number of words to
add to the original query seemed too arbitrary, so we decided to take into account frequency
drops of words. By drop we mean a considerable delta from the predicted frequency of a word
according to Zipf’s law [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], which is an empirical law stating that the number of occurrences
of terms and their rank when sorting them by decreasing frequency are roughly inversely
proportional.
          </p>
<p>According to this idea, in our implementation words are added to the original query starting
from the most frequent one, until a fixed maximum number is reached or a significant frequency
drop is met. Denoting by f_k the frequency of the k-th ranked word, the drop happens at the k-th word if</p>
          <p>k · f_k &lt; k_Zipf · f_1</p>
          <p>that is, when the observed frequency falls below a fraction k_Zipf of the frequency f_1/k predicted by
Zipf’s law. We also tried some tuning of these two parameters (k_Zipf and the maximum number of words
to be added), but unfortunately we observed that not expanding the query at all provided the
best scores. Some additional considerations are left to Sections 5 and 6.</p>
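          <p>A sketch of this expansion step follows, assuming term vectors were stored for the "body" field at indexing time; the values of the two parameters (MAX_WORDS and K_ZIPF) are hypothetical placeholders, not the ones we experimented with.</p>
          <preformat>
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.util.BytesRef;

final class PseudoRelevanceQESketch {

    static final int MAX_WORDS = 10;  // hypothetical maximum number of expansion terms
    static final double K_ZIPF = 0.5; // hypothetical drop threshold

    static String expand(String originalQuery, ScoreDoc[] topDocs, IndexReader reader) throws Exception {
        // count absolute term frequencies across the top-ranked documents only
        Map<String, Long> freq = new HashMap<>();
        for (ScoreDoc sd : topDocs) {
            Terms tv = reader.getTermVector(sd.doc, "body");
            if (tv == null) continue;
            TermsEnum terms = tv.iterator();
            for (BytesRef term = terms.next(); term != null; term = terms.next()) {
                freq.merge(term.utf8ToString(), terms.totalTermFreq(), Long::sum);
            }
        }
        // rank terms by decreasing frequency
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(freq.entrySet());
        ranked.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        if (ranked.isEmpty()) return originalQuery;

        long f1 = ranked.get(0).getValue();
        StringBuilder expanded = new StringBuilder(originalQuery);
        for (int k = 1; k <= Math.min(MAX_WORDS, ranked.size()); k++) {
            long fk = ranked.get(k - 1).getValue();
            if (k * fk < K_ZIPF * f1) break; // significant drop w.r.t. the Zipf prediction f1 / k
            expanded.append(' ').append(ranked.get(k - 1).getKey());
        }
        return expanded.toString();
    }
}
          </preformat>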
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Query expansion with synonym substitution</title>
          <p>
            Synonym substitution query expansion [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] is a technique used to enhance the accuracy and
relevance of search results. It involves expanding the user’s original search query by replacing
some of the query terms with their synonyms or related terms. This helps to capture a wider
range of results that may not have been included in the original query and can improve the
precision of the search results.
          </p>
<p>In our implementation of this technique we used, as explained in Section 2.3, an
artificial intelligence to create a list of three synonyms for the most frequent words in the query
list. Starting from each original query, we then create three expanded queries obtained by
replacing the words found in the synonym list with their first, second and third synonym
respectively. Afterwards we compute the average score of each of the ScoreDoc arrays generated by the four
queries, and we return to the search method the ScoreDoc array of the query that maximizes
this value.</p>
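          <p>A sketch of this selection step follows, assuming the synonym list has been loaded offline into a map from each word to its three synonyms; the method and variable names are ours, for illustration.</p>
          <preformat>
import java.util.Arrays;
import java.util.Map;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.classic.QueryParserBase;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;

final class SynonymQESketch {

    static ScoreDoc[] bestRun(String title, Map<String, String[]> synonyms,
                              IndexSearcher searcher, QueryParser parser) throws Exception {
        // variant 0 is the original query; variants 1..3 substitute the i-th synonym of each mapped word
        String[] variants = new String[4];
        variants[0] = title;
        for (int i = 0; i < 3; i++) {
            StringBuilder sb = new StringBuilder();
            for (String word : title.split("\\s+")) {
                String[] syns = synonyms.get(word); // map built offline from the ChatGPT synonym list
                sb.append(syns != null ? syns[i] : word).append(' ');
            }
            variants[i + 1] = sb.toString().trim();
        }
        // keep the ScoreDoc array of the variant with the highest average score
        ScoreDoc[] best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (String variant : variants) {
            ScoreDoc[] hits = searcher.search(parser.parse(QueryParserBase.escape(variant)), 1000).scoreDocs;
            double avg = Arrays.stream(hits).mapToDouble(h -> h.score).average().orElse(0.0);
            if (avg > bestAvg) { bestAvg = avg; best = hits; }
        }
        return best;
    }
}
          </preformat>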
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Pseudo-relevance feedback on synonym-expanded queries</title>
          <p>We have also tried to merge the two previously described strategies.</p>
<p>Pseudo-relevance query expansion is applied to the original query and to the three queries
obtained by synonym replacement, and the results which are kept are the ones coming from
the best-performing query, computing scores as described for the standard query expansion with
synonyms.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Summary of the runs produced by the systems</title>
          <p>• Basic: standard analyzing, without NumberFilter;
• SynQE: standard analyzing and query expansion with synonyms;
• PseudoRelQE: standard analyzing and query expansion with pseudo-relevance feedback;
• AllQE: standard analyzing without NumberFilter and query expansion with synonyms
combined with pseudo-relevance feedback.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Documents</title>
<p>We used three French collections provided by the Conference and Labs of the Evaluation Forum (CLEF)
for the LongEval challenge:</p>
        <p>Train collection - almost 9 GB of files, for a total of 1,570,734 documents.
Short term collection - almost 8 GB of files, for a total of 1,593,376 documents.
Long term collection - almost 6 GB of files, for a total of 1,081,334 documents.</p>
        <p>The train collection contains the documents, the queries and the qrels we used to train the system,
and also some fresh queries to perform in-time testing on the same documents. The other two
are test collections, containing only documents and queries.</p>
        <p>
          The collections can be found here, along with a brief description:
https://lindat.mf.cuni.cz/repository/xmlui/longeval-train-v2 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
https://lindat.mf.cuni.cz/repository/xmlui/longeval-test-collection [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Topics</title>
        <p>The topics were also provided by CLEF for the LongEval challenge and were contained in a
.tsv file that is stored in the /queries folder of the collections.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Relevance Judgements and Measures</title>
<p>In order to test the performance of the different analyzers, we used trec_eval to compare
the results obtained on the train documents with the qrels given by CLEF. We focused our
attention mainly on the Mean Average Precision (MAP), since it is described as the most relevant
measure to evaluate a run using a single number.</p>
      </sec>
      <sec id="sec-4-3-1">
        <title>4.4. Tools</title>
        <p>We developed and tested our systems with the following experimental setup:
• Java version: OpenJDK 19.0.2;
• IDE &amp; tools: IntelliJ IDEA, Apache Lucene, Apache Maven;
• Computer used:
– CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
– RAM: 16.0 GB LPDDR4x
– SSD: 512 GB SSD NVMe PCIe</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.5. Bitbucket Repository</title>
        <p>Link: https://bitbucket.org/upd-dei-stud-prj/seupd2223-rafjam/src/master/.</p>
        <p>We provided four solutions in the "runs" folder, which are the ones described in detail in Section 3
and in particular in 3.2.4.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Results on Training Data</title>
<p>This section provides some numerical results and discussion about the
experiments we conducted on the training data while developing our retrieval systems. One
possible comparison offered by Table 1 is the one between synonym and pseudo-relevance
query expansion.</p>
<p>While the SynQE system provides a slightly better result in terms of average precision,
PseudoRelQE results in higher recall. This seems reasonable, as pseudo-relevance feedback starts from
a set of documents which were well ranked by a previous run, which makes them probably
relevant.</p>
<p>As Figure 1 displays, the Basic system offers overall better performance compared to the
other systems.</p>
        <p>The PseudoRelQE and SynQE systems we developed offer similar precision, especially at low recall
values.</p>
        <p>We also tried to combine the two in the AllQE system, but that resulted in a slight drop in
performance.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results on Test Data</title>
<p>In this section we provide the results obtained by running our algorithms on each of the three
available test collections (heldout, short term, long term). The first one consists of fresh queries
to be paired with the same documents as the training collection, while the other two are made
of different topics and documents.</p>
<p>The boxplots in Figure 2 confirm that the Basic system offers superior performance overall
in terms of nDCG, while we need some more refined tools to make statistically significant
observations about the other three systems. They also show that the heldout collection presents
a bigger variation in results and that these are unbalanced, with a bigger concentration at low
performance values. Comparing these with the results obtained on the other two collections,
we can assume that the quality of the queries and relevance judgements in the heldout collection is
worse.</p>
<p>Our following statistical analysis is based on nDCG scores, computed on our runs using the
relevance judgements released by CLEF for each test collection.</p>
<p>In order to better study the difference between runs with different systems and on different
queries, we conducted a two-way ANalysis Of VAriance (ANOVA) test on the nDCG
performance of our runs, considering a significance level of 0.05.</p>
        <p>We therefore want to reject two null hypotheses, for each collection:
• the one stating that each system is expected to provide the same mean nDCG:
H₀: μ_Basic = μ_SynQE = μ_PseudoRelQE = μ_AllQE
• the one stating that each topic is expected to get the same mean nDCG across the systems:
H₀: μ_t = μ, for every topic t in the topic set.
The three tables in Figure 3 represent the results of the analysis.</p>
<p>For each factor, the F statistic is computed as the ratio between the mean square of that factor
and the mean square of the error, F = MS_factor / MS_error, where each mean square is the
corresponding sum of squares divided by its degrees of freedom.</p>
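        <p>For reference, the underlying model is the standard two-way decomposition, sketched below in LaTeX; the indexing follows our setup, with systems as one factor and topics as the other.</p>
        <preformat>
% nDCG of system i on topic j: grand mean + system effect + topic effect + error
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}
% F statistic for the system factor, with p systems and q topics
F = \frac{SS_{\mathrm{system}} / (p - 1)}{SS_{\mathrm{error}} / ((p - 1)(q - 1))}
        </preformat>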
        <p>Figure 4: Multiple comparisons of the nDCG scores of the 4 systems with Tukey HSD.</p>
<p>Finally, we have plotted the graphs representing the results of Tukey’s Honestly Significant
Difference (HSD) multiple comparison tests on each collection. They allow us to visualize pairwise
comparisons between systems, and therefore to make more precise observations.
One important point is that we can confirm that the Basic system performs better than
the others under nDCG: the test shows that it is statistically comparable only with PseudoRelQE
on the heldout collection, but the fact that this collection has few queries makes its confidence intervals
considerably larger than the ones in the other test collections, and so more easily overlapping.
Another possible observation is that PseudoRelQE seems to always be the second best option,
although the test shows statistical evidence of this only in the long term collection.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
<p>The first consideration concerns both query expansion methods we implemented:
we unfortunately observed that these heuristic operations do not lead to an improvement in
terms of precision. Therefore, we have not performed a systematic refinement of the parameters (in
particular the maximum number of words and k_Zipf), because the best option was not to add words at all.
A more satisfying observation concerns pseudo-relevance query expansion: counting word
occurrences only in a few top-ranked documents produces better results. That is what we
expected, because documents ranked after the first ones in the original search results, which
are probably the most relevant, are less likely to contain words which are significant for the
topic. In our implementation, using only the top-ranked document leads to the best performance
in terms of mean average precision.</p>
<p>One potential improvement could be the parameter tuning mentioned before, which
could be performed after finding a more effective term weighting model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          , M. Popel,
<article-title>LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation</article-title>
          , arXiv.org, Information Retrieval (cs.IR), arXiv:2303.03229 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
[2]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          , Introduction to Information Retrieval, 1st ed., Cambridge University Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deepak</surname>
          </string-name>
          ,
<article-title>Query Expansion Techniques for Information Retrieval: a Survey</article-title>
          , arXiv.org, Information Retrieval (cs.IR), arXiv:1708.00247 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The Probabilistic Relevance Framework: BM25 and Beyond</article-title>
          , Foundations and Trends in Information Retrieval (FnTIR) 3 (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Query Expansion for Sentence Retrieval Using Pseudo Relevance Feedback and Word Embedding</article-title>
, in:
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lawless</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          , N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Eighth International Conference of the CLEF Association (CLEF 2017), Lecture Notes in Computer Science (LNCS) 10456</source>
          , Springer, Heidelberg, Germany,
          <year>2017</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Zipf</surname>
          </string-name>
          ,
<article-title>Human Behavior and the Principle of Least Effort</article-title>
          , Addison-Wesley, Cambridge (MA), USA,
          <year>1949</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Imran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharan</surname>
          </string-name>
          , Thesaurus and query expansion, Department of Computer Science, Jamia Hamdard, New Delhi, India,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Devaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          , M. Popel, LongEval train collection,
          <year>2023</year>
          . URL: http://hdl.handle.net/11234/1-5010,
          <article-title>LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL)</article-title>
          ,
          <source>Faculty of Mathematics and Physics</source>
          , Charles University.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Devaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          , M. Popel, LongEval test collection,
          <year>2023</year>
          . URL: http://hdl.handle.net/11234/1-5139,
<article-title>LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL)</article-title>
          ,
          <source>Faculty of Mathematics and Physics</source>
          , Charles University.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>