<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CIRAL Track at FIRE 2023: Cross-lingual Information Retrieval for African Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mofetoluwa Adeyemi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akintunde Oladipo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xinyu Crystina Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Alfonso-Hermelo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehdi Rezagholizadeh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boxing Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jimmy Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Waterloo</institution>
        </aff>
        <aff id="aff1">
          <institution>Huawei Noah's Ark Lab</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper provides an overview of the first CIRAL track at the Forum for Information Retrieval Evaluation (FIRE) 2023. The goal of CIRAL is to promote the research and evaluation of cross-lingual information retrieval (CLIR) for African languages. With the intent of curating a human-annotated test collection through community evaluations, our track entails retrieval between English and four African languages: Hausa, Somali, Swahili, and Yoruba. We discuss the cross-lingual information retrieval task, the curation of the test collection, participation, and evaluation results. An analysis of the curated pools is provided, and we also compare the effectiveness of the submitted retrieval methods. The CIRAL track demonstrated and encouraged the research prospects that exist for CLIR in African languages, and we are hopeful about the direction this work takes.</p>
      </abstract>
      <kwd-group>
        <kwd>Cross-lingual Information Retrieval</kwd>
        <kwd>African Languages</kwd>
        <kwd>Ad-hoc Retrieval</kwd>
        <kwd>Passage Ranking</kwd>
        <kwd>Community Evaluations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        constructs the first CLIR dataset in African languages, built synthetically from Wikipedia's structure and covering 15 African languages. Other collections such as Large-Scale CLIR [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], CLIRMatrix [12], and the IARPA MATERIAL test collection [13], which was curated solely for low-resource languages, cover only a minimal number of African languages.
      </p>
      <p>The sparsity of resources and the bid to promote participation in CLIR research for African languages motivated the construction of CIRAL, which stands for Cross-lingual Information Retrieval for African Languages. The CIRAL track, hosted at the Forum for Information Retrieval Evaluation (FIRE), focused on cross-lingual passage retrieval covering four African languages: Hausa, Somali, Swahili, and Yoruba, which are among the most widely spoken languages in Africa. Given the low-resource nature of African languages, even in widely used sources like Wikipedia, CIRAL's collection is built with articles from the indigenous news domain of the respective languages. Similar to the passage ranking task in TREC's Deep Learning track [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], relatively few queries (80 to 100) are developed for the task. The queries and judgments are produced by native speakers who took the roles of query developers and relevance assessors. As is often the culture in community evaluations, CIRAL also set out to curate a test collection by pooling submissions from track participants.</p>
      <p>In hosting CIRAL, we look out for: 1) the effectiveness of indigenous textual data in CLIR for African languages, 2) a comparison of how well different retrieval methods perform in CLIR for African languages, and 3) the importance of retrieval and participation diversity. An overview of the curated collection, query development, and relevance assessment process is provided, and results from relevance assessment demonstrate the effectiveness of retrieving relevant passages from indigenous sources. Participation in the track and submissions for the respective languages are also discussed, comparing the different retrieval methods employed in the task. Taking into consideration participation, submissions, and other factors, we examine the test collection curated from the task, which informs future decisions in community evaluations for African languages.</p>
      <p>We hope CIRAL fosters CLIR evaluation and research in African languages and in low-resource settings, and hence the development of retrieval systems that are well suited for such tasks. Details of the track are also available on the provided website.1</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The focus task at CIRAL was cross-lingual passage ranking between English and African languages. As a kick-off, only four African languages were included this year: Hausa, Somali, Swahili, and Yoruba, selected according to the number of native speakers of the languages in East and West Africa. All four languages are written in Latin script, with two belonging to the Afro-Asiatic language family and the other two to the Niger-Congo family. See details of the languages in Table 1. We chose English as the pivot language as it is an official language in countries where these African languages are spoken, with the exception of Somali, whose speakers lean more towards Arabic than English.</p>
      <p>Given English queries, participants are tasked with developing retrieval systems that return ranked passages in the African languages according to the estimated likelihood of relevance. Queries are formulated as natural language questions, and passages are judged using binary relevance: 0 for non-relevant and 1 for relevant. A relevant passage is defined as one that answers the question, whereas a non-relevant passage does not. To facilitate the development and evaluation of their retrieval systems, participants were provided with a training set comprising a sample of 10 queries for each language, their relevance judgments, and the passage collection for the languages. Considering the nature of the task, we evaluate for early precision and recall using metrics such as nDCG@20 and Recall@100, and participants were made aware of these in developing their systems. For evaluation, the test set of queries was provided, for which submitted runs were manually judged to form query pools.</p>
      <p>Participants were also encouraged to rank their submitted runs in the order in which they preferred to contribute to the pools. Details on the provided passage collection, development of queries, and pooling process are discussed in the following sections.</p>
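      <p>The pooling step described above can be sketched in a few lines. This is our own minimal illustration, not code from the track: for each query, the top-ranked passages from every contributing run are unioned into a pool, which assessors then judge.</p>
      <p>
```python
def build_pool(runs, depth=20):
    """Union of the top-`depth` docids from each contributing run for one
    query; assessors then judge every passage in this pool."""
    pool = set()
    for ranked_docids in runs:
        pool.update(ranked_docids[:depth])
    return pool
```
      </p>
      <p>Passages outside the pool remain unjudged, which is why pool depth and run diversity matter for the completeness of the resulting qrels.</p>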
    </sec>
    <sec id="sec-3">
      <title>3. Passage Collection</title>
      <p>CIRAL's passage collection is curated from indigenous news websites and blogs for each of the four languages. These sites serve as a source of local and international information and, as shown in Table 2, are a large source of text for their languages. The articles are collected using a web scraping framework called Otelemuye2 and combined into monolingual document sets. The collected articles date from as early as was available on each website (the early 2000s for some languages) up until March 2023. Passages are generated from the set by chunking each news article at the sentence level using sliding-window segmentation [15]. To ensure natural discourse segments when chunking the articles, a stride of 3 sentences is used with a maximum of 6 sentences per window. The resulting passages are further filtered to remove those with fewer than 7 or more than 200 words. Table 2 shows the median and average number of tokens per passage in each language, providing more insight into their passage distribution. To ensure each passage is in its required language, we filter using the language's list of stopwords, hence removing passages in a different language; a minimum of 3 to 5 stopwords was used to ascertain whether a passage was in its African language. The resulting number of passages is shown in Table 2.
2https://github.com/theyorubayesian/otelemuye</p>
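      <p>The segmentation and filtering steps above can be sketched as follows. This is an illustrative outline under the stated parameters (windows of up to 6 sentences, stride 3, passages of 7 to 200 words, and a stopword-count threshold), not the track's actual implementation:</p>
      <p>
```python
def chunk_article(sentences, window=6, stride=3, min_words=7, max_words=200):
    """Slide a window of up to `window` sentences over an article with the
    given stride, keeping only passages within the word-count bounds."""
    passages = []
    for start in range(0, len(sentences), stride):
        text = " ".join(sentences[start:start + window])
        if min_words <= len(text.split()) <= max_words:
            passages.append(text)
        if start + window >= len(sentences):  # final window reached
            break
    return passages

def in_target_language(text, stopwords, min_hits=3):
    """Heuristic language check: keep a passage only if it contains at
    least `min_hits` of the target language's stopwords."""
    hits = set(text.lower().split()).intersection(stopwords)
    return len(hits) >= min_hits
```
      </p>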
      <sec id="sec-3-1">
        <p>The curated passages are provided in JSONL files, each line representing a JSON object with details about a passage. Passages have the following fields:
• docid: Unique identifier.
• title: The headline of the news article from which the passage was obtained.
• text: The passage body.
• url: The link to the news article from which the passage was obtained.</p>
        <p>The unique identifier (docid) for each passage is constructed programmatically in the format source#article_id#passage_id, providing information on the news website and the specific article in the monolingual set from which the passage was extracted. This is also helpful as a few news articles have no titles, which leaves the respective passages without text in the title field. The passage collection files were made publicly available to participants in a Hugging Face dataset repository.3</p>
      </sec>
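      <p>For illustration, a docid in the format above can be split back into its parts; the identifier used here is hypothetical:</p>
      <p>
```python
def parse_docid(docid):
    """Split a passage identifier of the form
    source#article_id#passage_id into its three components."""
    source, article_id, passage_id = docid.split("#")
    return {"source": source, "article_id": article_id,
            "passage_id": passage_id}
```
      </p>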
    </sec>
    <sec id="sec-4">
      <title>4. Query Development</title>
      <p>
        Queries for the task are formulated as natural language questions, modelling that of collections
such as MS MARCO [16] which is used in TREC’s Deep learning track [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the MIRACL
dataset [17], among others. Considering the passage collection for a language was curated from
its indigenous websites, language queries had to have topics either of interest to the speakers of
the language or with information that can be easily found in the language’s news. We term
these queries language/cultural-specific queries,4 a combination of queries with both
generic and indigenous topics depending on the language. The language-specific queries are
developed as factoids to ensure answers are direct and unambiguous.
      </p>
      <p>The process of query development involved native speakers generating questions with answers in the language's news. For this task, articles from the MasakhaNews dataset [18] are used as a source of inspiration for query formation. The MasakhaNews dataset is a news topic classification dataset covering 16 African languages. It serves as a good starting point given that the documents in the dataset have been classified into the categories business, entertainment, health, politics, religion, sport, and technology, providing a more direct approach for generating diverse queries. Using the same passage preprocessing implemented in generating the passage collection, articles in MasakhaNews are chunked into passages, but with an additional category field to jointly inspire queries. Query developers (interchangeably called annotators) are native speakers of the languages with reading and writing proficiency in both English and their respective African languages. To generate the queries, annotators are given the MasakhaNews passage snippets and tasked with generating questions inspired, but not answerable, by the snippet, to ensure good-quality queries. The questions are generated in the African language and then translated into English by the annotator.</p>
      <p>To ensure that generated queries had relevant passages, the annotators checked whether an inspired question had passages answering it in the CIRAL collection. This was done via a search interface developed as a Hugging Face space for each language using Spacerini.5 As shown in Figure 1, the annotators provide the query in its African language, its English translation, and the docid of the passage that inspired it; search was monolingual, i.e., using the query in its African language. Using a hybrid of BM25 [19] and AfriBERTa DPR indexes,6 the top 20 retrieved documents were annotated for relevance with selections of either true or false. Relevance annotation was done as follows:
• Relevant (True): The annotator selected true if the passage answered the question or implied the answer without doubt.
• Non-relevant (False): The annotator selected false if the passage did not answer the question.</p>
      <p>Instances where passages gave partial or incomplete answers to the question also occurred, and depending on the level of incompleteness, the annotators judged such passages as non-relevant. Passages annotated as true in the interface were assigned a relevance of 1 and those annotated as false a relevance of 0. Queries retained and distributed in the task had at least 1 and no more than 15 relevant passages, to avoid queries that are too simple for the systems. Ambiguous or incomprehensible queries were also filtered out of the collection. A set of 10 queries for each language was first developed and released along with the corresponding judgments as train samples. Subsequently, the test queries on which the pooling process was to be carried out were released: 85 for Hausa, 100 for Somali, 85 for Swahili, and 100 for Yoruba, as presented in Table 3.</p>
      <p>Judgments obtained during the query development process are referred to as shallow, considering they are few. The number of shallow judgments obtained through the query development process for the test queries is also shown in Table 3, and these judgments are retained in the pool formation during relevance assessment. The different timelines at which each set was released, along with the run submission and result distribution dates, are provided in Table 4.</p>
      <sec id="sec-4-1">
        <title>3https://huggingface.co/datasets/CIRAL/ciral-corpus</title>
      </sec>
      <sec id="sec-4-2">
        <title>4The generated queries also include some which are generic, but we term the queries language-specific due to the news data also capturing events which are mostly of interest in the language.</title>
      </sec>
      <sec id="sec-4-3">
        <title>5https://github.com/castorini/hf-spacerini</title>
      </sec>
      <sec id="sec-4-4">
        <title>6https://huggingface.co/castorini/afriberta-dpr-ptf-msmarco-ft-latin-mrtydi</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Relevance Assessment</title>
      <p>As often practised in community evaluations, runs submitted for the test set are manually judged to form the test collection's qrels via pooling. A total of 84 submissions were made by the participating teams, 21 for each language. Using the ranked lists of runs provided by the teams, query pools were formed for each language, and we provide details on the relevance assessment process and an analysis of the pools in this section.</p>
      <sec id="sec-5-1">
        <title>5.1. Pooling Process</title>
        <p>The top 3 ranked submissions from participating teams contributed to the pooling process, with subsequent additions depending on available time and assessment resources. A total of 40 runs contributed to the pools across the four languages, with varying numbers per model type; dense models made up most of the contributing runs. The pool depth for submissions was kept at a constant of k = 20; however, there was no restriction on overall pool size. Judgments were carried out by two assessors for each language, where an assessor judged the full pool of a given query; the test set was split into distinct halves, each assigned to an assessor. Assessors provided judgments on a binary scale using the following description:
• Relevant: The passage answers the query, or the answer can be very easily implied from the passage.
• Non-relevant: The passage does not answer the question at all, or is related to the question but does not answer it.</p>
        <p>Relevant passages are given a judgment of 1 and non-relevant passages a judgment of 0.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Pool Analysis</title>
        <p>The total pool size obtained for each language from the relevance assessment is presented in Table 5. This includes the shallow judgments obtained during query development, which were also re-assessed during the relevance assessment phase. Across the languages, the minimum number of judgments per query ranges from 40 to 60, while some queries have over 120 judgments. 3 queries in Hausa, 4 in Somali, 2 in Swahili, and 12 in Yoruba have pool sizes of fewer than 60 passages, indicating that contributing runs retrieved similar sets of passages for these queries in their top 20 ranks. Runs that contributed to the pooling process also retrieved more relevant passages across the four languages, as seen in Table 6. However, certain queries were found to have no relevant passages and were discarded. This was a result of wrongly annotated passages from the query development phase, or grammatical errors which affected retrieval results. This left Hausa with 80 test queries as opposed to the initial 85, and Somali with 99 as opposed to 100. There were also a few queries with just 1 relevant passage across the languages, with Yoruba having the most at 5 queries. The increased number of relevant passages obtained from the pooling process is a good indication that African indigenous websites are a great source for retrieval, especially coupled with queries of interest to the language speakers, which can also include generic topics.</p>
        <p>Table 6 also indicates that a large number of relevant passages were obtained for certain queries. Considering the minimal number of runs that contributed to the pools, this raises the concern that more relevant passages might remain unjudged, especially for runs that did not contribute to the pooling process or are evaluated after the track. We analyse the queries with the highest tendency of having unjudged relevant passages using relevance densities. The relevance density of a query is its number of relevant passages relative to its pool size, and we adopt a rule of thumb that queries with relevance densities of 0.6 and higher very likely still have unjudged relevant passages. Figure 2 gives the distribution of relevance densities for each language, and we find that the number of queries with densities higher than 0.6 is less than 5 across the languages. There is also a higher number of queries with densities between 0.4 and 0.6, with Swahili and Hausa having up to 22 to 25% of queries in this range. Although this approach to analysing the completeness of judgments is not exhaustive, it provides some insight into the number of queries in each language that would most likely have unjudged relevant passages from new systems.</p>
      </sec>
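      <p>The relevance-density rule of thumb above can be made concrete with a short sketch (our own illustration, not the track's code): density is the number of relevant passages divided by the pool size, and queries at or above the 0.6 threshold are flagged as likely to have unjudged relevant passages.</p>
      <p>
```python
def relevance_density(judgments):
    """`judgments` maps docid -> binary relevance (0 or 1) for one query;
    density is the fraction of the pool judged relevant."""
    if not judgments:
        return 0.0
    return sum(judgments.values()) / len(judgments)

def likely_incomplete(qrels, threshold=0.6):
    """Flag queries whose density meets the rule-of-thumb threshold."""
    return [q for q, j in qrels.items()
            if relevance_density(j) >= threshold]
```
      </p>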
    </sec>
    <sec id="sec-6">
      <title>6. Results and Analysis</title>
      <p>An overview of participants' submissions and the results obtained from evaluating submitted runs on the pooled qrels is provided in this section. Results are also analysed at the query level to identify query difficulty, as well as the effectiveness of the submitted runs and model types.</p>
      <p>A total of 3 teams participated in the CIRAL track, with 84 runs submitted. Considering that cross-lingual passage ranking was the major task, participants were not given any specifications on the retrieval type to employ, and submissions comprised dense (52), reranking (20), hybrid (8), and sparse (4) methods. All submissions covered the four languages, hence there is an equal number of runs among the languages. The retrieval methods employed by participating teams are discussed in detail in their working notes.</p>
      <sec id="sec-6-1">
        <title>6.1. Overall Results</title>
        <p>We present the result statistics for all languages in Table 7, and the detailed results of all submitted runs in Tables 8, 9, 10 and 11. The nDCG@20, MRR@10, Recall@100, and MAP@100 scores for each submission are reported, and the average and maximum scores can be found in Table 7. The main metric in the task is nDCG@20; a cut-off of k = 20 is used considering that a decent number of queries had above 10 relevant passages during query development. Dense models make up 62% of submissions for each language and have the highest average scores across the metrics. Most submissions employ end-to-end cross-lingual retrieval, with a few document translation methods, represented as DT in the table. However, the top 2 performing submissions across the languages employ document translation at one stage or another in their systems and have the highest scores for all metrics.</p>
        <p>The effectiveness of model types is better visualized in Figure 3. Runs are ordered by their nDCG@20 scores, and though dense runs make up most of the top runs, there is variation in effectiveness across the dense models. The effectiveness of reranking methods also varies widely across the languages, with the exception of Yoruba, where reranking models have the top nDCG@20 scores, as seen in Figure 3a. Given there was not a specific task on reranking, submitted runs employ different first- and second-stage methods, which has an impact on the varying degree of output quality. However, the best reranking run outperformed the best dense run across the languages, with the exception of Somali. The submission pool has a very minimal number of hybrid and sparse runs, giving insufficient room for comparison of these model types on the task. The sparse run, however, outperforms some of the dense and reranking runs and achieves competitive nDCG scores, especially in Somali and Yoruba.</p>
        <p>Dense models achieve higher Recall@100 across all languages, as seen in Figure 3b. Maintaining the same order by nDCG@20, runs without a high nDCG@20 retrieved more relevant passages in their top 100 candidates. With the exception of Yoruba and the best reranking model, reranking generally achieved lower Recall@100, with even the sparse run achieving a better score across the languages. These results indicate that many of the submitted systems have relevant passages at deeper depths; however, due to the nature of the task, we optimize for early rankings using nDCG@20.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Query-level Results</title>
        <p>Figures 4, 5, 6 and 7 provide query-level effectiveness using nDCG@20, with queries ordered by the median scores across evaluated runs. The median nDCG score for a good percentage of queries is greater than 0, indicating that most submissions do not perform too badly on individual queries across the languages. Certain queries, such as 41 in Hausa, also have quite a gap between the maximum score obtained and the scores of the rest of the runs, indicating that specific runs perform better on these queries compared to other runs. The same can be said for queries like 81 in Swahili, where only a few runs identify the relevant passages of the query. This implies that these runs understand the semantics of the query, and such queries could boost the scores of systems that are able to retrieve their relevant documents.</p>
        <p>We also analyse query difficulty across the languages, as queries that are too easy or too difficult are not ideal for distinguishing systems' effectiveness. Examples are queries 72 in Hausa and 433 in Yoruba, where the median nDCG@20 score is 1.0 across submitted systems, making them very easy queries and problematic for evaluation. There are also quite a number of difficult queries across the languages, with Somali having the most, where only a few outliers score higher than 0 nDCG@20. However, a good number of queries, such as 21 in Swahili and 161 in Somali, have a decent spread of scores and are ideal for evaluation.</p>
      </sec>
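      <p>For reference, the main metric can be sketched for the binary judgments used in the track. This is a standard nDCG@k computation (our own illustration, not the track's evaluation code), where the ideal ranking places all relevant passages first:</p>
      <p>
```python
import math

def ndcg_at_k(ranked_docids, judgments, k=20):
    """nDCG@k with binary relevance: DCG of the top-k ranking divided by
    the DCG of an ideal ranking with all relevant passages on top."""
    gains = [judgments.get(d, 0) for d in ranked_docids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    n_rel = sum(1 for v in judgments.values() if v > 0)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(n_rel, k)))
    return dcg / idcg if idcg > 0 else 0.0
```
      </p>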
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The CIRAL track was held for the first time at the Forum for Information Retrieval Evaluation (FIRE) 2023, with the goal of promoting the research and evaluation of cross-lingual information retrieval for African languages. The task covered passage retrieval between English and four African languages, and test collections were curated for these languages via community evaluations. Submissions from participating teams comprise mostly dense single-stage retrieval systems, and these make up most of the best-performing systems on the task. Limitations faced this year include a minimal number of participants and limited diversity in submitted retrieval systems. Despite these limitations, we hope the CIRAL track evolves and the curated collection matures into its most reliable and reusable version.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada. We would like to thank the Masakhane community7 for their contributions in the query development phase of the project. We also appreciate Johns Hopkins University HLTCOE, organizers of the NeuCLIR track at TREC,8 for contributing the English translations of the passage collections to the track.</p>
      <sec id="sec-8-2">
        <title>7https://www.masakhane.io/</title>
      </sec>
      <sec id="sec-8-3">
        <title>8https://neuclir.github.io/</title>
      </sec>
      <sec id="sec-8-4">
        <p>[14] N. Craswell, B. Mitra, E. Yilmaz, D. Campos, E. M. Voorhees, Overview of the TREC 2019 Deep Learning track, arXiv preprint arXiv:2003.07820 (2020).
[15] M. S. Tamber, R. Pradeep, J. Lin, Pre-processing matters! Improved Wikipedia corpora for open-domain question answering, in: Proceedings of the 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III, Springer, 2023, pp. 163–176.
[16] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, MS MARCO: A human-generated MAchine Reading COmprehension dataset (2016).
[17] X. Zhang, N. Thakur, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, M. Rezagholizadeh, J. Lin, Making a MIRACL: Multilingual information retrieval across a continuum of languages, arXiv preprint arXiv:2210.09984 (2022).
[18] D. I. Adelani, M. Masiak, I. A. Azime, J. O. Alabi, A. L. Tonja, C. Mwase, O. Ogundepo, B. F. Dossou, A. Oladipo, D. Nixdorf, et al., MasakhaNews: News topic classification for African languages, arXiv preprint arXiv:2304.09972 (2023).
[19] S. Robertson, H. Zaragoza, et al., The probabilistic relevance framework: BM25 and beyond, Foundations and Trends® in Information Retrieval 3 (2009) 333–389.</p>
        <p>[Tables 8–11: detailed per-run results (run name, team, end-to-end vs. DT, model type, nDCG@20, MRR@10, Recall@100, MAP@100); only the run and team labels survived extraction.]</p>
        <p>Figure 4: Boxplots showing nDCG@20 for Hausa queries. Figure 5: Boxplots showing nDCG@20 for Somali queries. Figure 6: Boxplots showing nDCG@20 for Swahili queries. Figure 7: Boxplots showing nDCG@20 for Yoruba queries.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schäuble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          ,
          <article-title>Cross-language information retrieval (CLIR) track overview</article-title>
          ,
          <source>NIST Special Publication SP</source>
          (
          <year>1998</year>
          )
          <fpage>31</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <article-title>Information retrieval evaluation in a changing world: Lessons learned from 20 years of CLEF</article-title>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maiti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Modak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <article-title>The FIRE 2008 evaluation exercise</article-title>
          ,
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          <volume>9</volume>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuriyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nozue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hidaka</surname>
          </string-name>
          ,
          <article-title>Overview of IR tasks at the first NTCIR workshop</article-title>
          ,
          <source>in: Proceedings of the first NTCIR workshop on research in Japanese text retrieval and term recognition</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McNamee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2022 NeuCLIR track</article-title>
          ,
          <source>arXiv preprint arXiv:2304.12367</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ogueji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Toward best practices for training multilingual dense retrieval models</article-title>
          ,
          <source>ACM Transactions on Information Systems</source>
          <volume>42</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yarmohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          , S. Hisamoto,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Povey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <article-title>Robust document representations for cross-lingual information retrieval in low-resource settings</article-title>
          ,
          <source>in: Proceedings of Machine Translation Summit XVII: Research Track</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <article-title>Combining contextualized and non-contextualized query translations to improve CLIR</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1581</fpage>
          -
          <lpage>1584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Weakly supervised attentional model for low resource ad-hoc cross-lingual information retrieval</article-title>
          ,
          <source>in: Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ogundepo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>AfriCLIRMatrix: Enabling cross-lingual information retrieval for African languages</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>8721</fpage>
          -
          <lpage>8728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schamoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <article-title>Cross-lingual learning-to-rank with shared representations</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (Short Papers),
          <year>2018</year>
          , pp.
          <fpage>458</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <article-title>CLIRMatrix: A massively large collection of bilingual and multilingual datasets for cross-lingual information retrieval</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4160</fpage>
          -
          <lpage>4170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rubino</surname>
          </string-name>
          ,
          <article-title>Machine translation for English retrieval of information in any language (machine translation for English-based domain-appropriate triage of information in any language)</article-title>
          ,
          <source>in: Conferences of the Association for Machine Translation in the Americas: MT Users' Track, The Association for Machine Translation in the Americas</source>
          , Austin, TX, USA,
          <year>2016</year>
          , pp.
          <fpage>322</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2019 deep learning track</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>