<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CIRAL Track at FIRE 2023: Cross-lingual Information Retrieval for African Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mofetoluwa Adeyemi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akintunde Oladipo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xinyu Crystina Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Alfonso-Hermelo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehdi Rezagholizadeh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boxing Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jimmy Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Waterloo</institution>
        </aff>
        <aff id="aff1">
          <institution>Huawei Noah's Ark Lab</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper provides an overview of the first CIRAL track at the Forum for Information Retrieval Evaluation (FIRE) 2023. The goal of CIRAL is to promote the research and evaluation of cross-lingual information retrieval (CLIR) for African languages. With the intent of curating a human-annotated test collection through community evaluations, our track entails retrieval between English and four African languages: Hausa, Somali, Swahili, and Yoruba. We discuss the cross-lingual information retrieval task, the curation of the test collection, participation, and evaluation results. An analysis of the curated pools is provided, and we also compare the effectiveness of the submitted retrieval methods. The CIRAL track demonstrated and encouraged the research prospects that exist for CLIR in African languages, and we are hopeful about the direction this work takes.</p>
      </abstract>
      <kwd-group>
        <kwd>Cross-lingual Information Retrieval</kwd>
        <kwd>African Languages</kwd>
        <kwd>Ad-hoc Retrieval</kwd>
        <kwd>Passage Ranking</kwd>
        <kwd>Community Evaluations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        constructs the first CLIR dataset in African languages, built synthetically from Wikipedia's structure and covering 15 African languages. Other collections such as Large-Scale CLIR [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], CLIRMatrix [12], and the IARPA MATERIAL test collection [13], which was curated solely for low-resource languages, cover only a minimal number of African languages.
      </p>
      <p>The sparsity of resources and the bid to promote participation in CLIR research for African languages motivated the construction of CIRAL, which stands for Cross-lingual Information Retrieval for African Languages. The CIRAL track, hosted at the Forum for Information Retrieval Evaluation (FIRE), focused on cross-lingual passage retrieval covering four African languages: Hausa, Somali, Swahili, and Yoruba, which are among the most widely spoken languages in Africa. Given the low-resource nature of African languages, even in widely used sources like Wikipedia, CIRAL's collection is built with articles from the indigenous news domain of the respective languages. Similar to the passage ranking task in TREC's Deep Learning track [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], relatively few queries (80 to 100) are developed for the task. The queries and judgments are produced by native speakers who took the roles of query developers and relevance assessors. As is often the culture in community evaluations, CIRAL also set out to curate a test collection by pooling submissions from track participants.</p>
      <p>In hosting CIRAL, we look out for: 1) the effectiveness of indigenous textual data in CLIR for African languages, 2) a comparison of how well different retrieval methods perform in CLIR for African languages, and 3) the importance of retrieval and participation diversity. An overview of the curated collection, query development, and relevance assessment process is provided, and results from relevance assessment demonstrate the effectiveness of retrieving relevant passages from indigenous sources. Participation in the track and submissions for the respective languages are also discussed, comparing the different retrieval methods employed in the task. Taking into consideration participation, submissions, and other factors, we examine the test collection curated from the task, which informs future decisions in community evaluations for African languages.</p>
      <p>We hope CIRAL fosters CLIR evaluation and research in African languages and in low-resource settings, and hence the development of retrieval systems that are well suited for such tasks. Details of the track are also available on the provided website.1</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The focus task at CIRAL was cross-lingual passage ranking between English and African languages. As a kick-off, only four African languages were included this year: Hausa, Somali, Swahili, and Yoruba, selected according to the number of native speakers of the languages in East and West Africa. All four languages are written in Latin script, with two belonging to the Afro-Asiatic language family and the other two to the Niger-Congo family. See details of the languages in Table 1. We chose English as the pivot language as it is an official language in countries where these African languages are spoken, with the exception of Somali, whose speakers lean more towards Arabic than English.</p>
      <p>Given English queries, participants are tasked with developing retrieval systems that return ranked passages in the African languages according to the estimated likelihood of relevance. Queries are formulated as natural language questions, and passages are judged using binary relevance: 0 for non-relevant and 1 for relevant. A relevant passage is defined as one that answers the question, whereas a non-relevant passage does not. To facilitate the development and evaluation of their retrieval systems, participants were provided with a training set comprising a sample of 10 queries for each language, their relevance judgments, and the passage collection for the languages. Considering the nature of the task, we evaluate for early precision and recall using metrics such as nDCG@20 and Recall@100, and participants were made aware of these in developing their systems. For evaluation, the test set of queries was provided, for which submitted runs were manually judged to form query pools.</p>
      <p>Participants were also encouraged to rank their submitted runs in the order in which they preferred to contribute to the pools. Details on the provided passage collection, development of queries, and pooling process are discussed in the following sections.</p>
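      <p>The pooling step described above can be sketched in a few lines. This is our own minimal illustration, not code from the track: for each query, the top-ranked passages from every contributing run are unioned into a pool, which assessors then judge.</p>
      <p>
```python
def build_pool(runs, depth=20):
    """Union of the top-`depth` docids from each contributing run for one
    query; assessors then judge every passage in this pool."""
    pool = set()
    for ranked_docids in runs:
        pool.update(ranked_docids[:depth])
    return pool
```
      </p>
      <p>Passages outside the pool remain unjudged, which is why pool depth and run diversity matter for the completeness of the resulting qrels.</p>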
    </sec>
    <sec id="sec-3">
      <title>3. Passage Collection</title>
      <p>CIRAL's passage collection is curated from indigenous news websites and blogs for each of the four languages. These sites serve as a source of local and international information and, as shown in Table 2, are a large source of text for their languages. The articles are collected using a web scraping framework called Otelemuye2 and combined into monolingual document sets. The collected articles date from as early as was available on each website (the early 2000s for some languages) up until March 2023. Passages are generated from the set by chunking each news article at the sentence level using sliding-window segmentation [15]. To ensure natural discourse segments when chunking the articles, a stride of 3 sentences is used with a maximum of 6 sentences per window. The resulting passages are further filtered to remove those with fewer than 7 or more than 200 words. Table 2 shows the median and average number of tokens per passage in each language, providing more insight into their passage distribution. To ensure each passage is in its required language, we filter using the language's list of stopwords, hence removing passages in a different language; a minimum of 3 to 5 stopwords was used to ascertain whether a passage was in its African language. The resulting number of passages is shown in Table 2.
2https://github.com/theyorubayesian/otelemuye</p>
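      <p>The segmentation and filtering steps above can be sketched as follows. This is an illustrative outline under the stated parameters (windows of up to 6 sentences, stride 3, passages of 7 to 200 words, and a stopword-count threshold), not the track's actual implementation:</p>
      <p>
```python
def chunk_article(sentences, window=6, stride=3, min_words=7, max_words=200):
    """Slide a window of up to `window` sentences over an article with the
    given stride, keeping only passages within the word-count bounds."""
    passages = []
    for start in range(0, len(sentences), stride):
        text = " ".join(sentences[start:start + window])
        if min_words <= len(text.split()) <= max_words:
            passages.append(text)
        if start + window >= len(sentences):  # final window reached
            break
    return passages

def in_target_language(text, stopwords, min_hits=3):
    """Heuristic language check: keep a passage only if it contains at
    least `min_hits` of the target language's stopwords."""
    hits = set(text.lower().split()).intersection(stopwords)
    return len(hits) >= min_hits
```
      </p>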
      <sec id="sec-3-1">
        <p>The curated passages are provided in JSONL files, each line representing a JSON object with details about a passage. Passages have the following fields:
• docid: Unique identifier.
• title: The headline of the news article from which the passage was obtained.
• text: The passage body.
• url: The link to the news article from which the passage was obtained.</p>
        <p>The unique identifier (docid) for each passage is constructed programmatically in the format source#article_id#passage_id, providing information on the news website and the specific article in the monolingual set from which the passage was extracted. This is also helpful as a few news articles have no titles, which leaves the respective passages without text in the title field. The passage collection files were made publicly available to participants in a Hugging Face dataset repository.3</p>
      </sec>
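      <p>For illustration, a docid in the format above can be split back into its parts; the identifier used here is hypothetical:</p>
      <p>
```python
def parse_docid(docid):
    """Split a passage identifier of the form
    source#article_id#passage_id into its three components."""
    source, article_id, passage_id = docid.split("#")
    return {"source": source, "article_id": article_id,
            "passage_id": passage_id}
```
      </p>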
    </sec>
    <sec id="sec-4">
      <title>4. Query Development</title>
      <p>
        Queries for the task are formulated as natural language questions, modelling that of collections
such as MS MARCO [16] which is used in TREC’s Deep learning track [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the MIRACL
dataset [17], among others. Considering the passage collection for a language was curated from
its indigenous websites, language queries had to have topics either of interest to the speakers of
the language or with information that can be easily found in the language’s news. We term
these queries language/cultural-specific queries,4 a combination of queries with both
generic and indigenous topics depending on the language. The language-specific queries are
developed as factoids to ensure answers are direct and unambiguous.
      </p>
      <p>The process of query development involved native speakers generating questions with answers in the language's news. For this task, articles from the MasakhaNews dataset [18] are used as a source of inspiration for query formation. The MasakhaNews dataset is a news topic classification dataset covering 16 African languages. It serves as a good starting point given that the documents in the dataset have been classified into the categories business, entertainment, health, politics, religion, sport, and technology, providing a more direct approach for generating diverse queries. Using the same passage preprocessing implemented in generating the passage collection, articles in MasakhaNews are chunked into passages, but with an additional category field to jointly inspire queries. Query developers (interchangeably called annotators) are native speakers of the languages with reading and writing proficiency in both English and their respective African languages. To generate the queries, annotators are given the MasakhaNews passage snippets and tasked with generating questions inspired, but not answerable, by the snippet, to ensure good-quality queries. The questions are generated in the African language and then translated into English by the annotator.</p>
      <p>To ensure that generated queries had relevant passages, the annotators checked whether an inspired question had passages answering it in the CIRAL collection. This was done via a search interface developed as a Hugging Face space for each language using Spacerini.5 As shown in Figure 1, the annotators provide the query in its African language, its English translation, and the docid of the passage that inspired it; search was monolingual, i.e., using the query in its African language. Using a hybrid of BM25 [19] and AfriBERTa DPR indexes,6 the top 20 retrieved documents were annotated for relevance with selections of either true or false. Relevance annotation was done as follows:
• Relevant (True): The annotator selected true if the passage answered the question or implied the answer without doubt.
• Non-relevant (False): The annotator selected false if the passage did not answer the question.</p>
      <p>Instances where passages gave partial or incomplete answers to the question also occurred, and depending on the level of incompleteness, the annotators judged such passages as non-relevant. Passages annotated as true in the interface were assigned a relevance of 1 and those annotated as false a relevance of 0. Queries retained and distributed in the task had at least 1 and no more than 15 relevant passages, to avoid queries that are too simple for the systems. Ambiguous or incomprehensible queries were also filtered out of the collection. A set of 10 queries for each language was first developed and released along with the corresponding judgments as train samples. Subsequently, the test queries on which the pooling process was to be carried out were released: 85 for Hausa, 100 for Somali, 85 for Swahili, and 100 for Yoruba, as presented in Table 3.</p>
      <p>Judgments obtained during the query development process are referred to as shallow, considering they are few. The number of shallow judgments obtained through the query development process for the test queries is also shown in Table 3, and these judgments are retained in the pool formation during relevance assessment. The different timelines at which each set was released, along with the run submission and result distribution dates, are provided in Table 4.</p>
      <sec id="sec-4-1">
        <title>3https://huggingface.co/datasets/CIRAL/ciral-corpus</title>
      </sec>
      <sec id="sec-4-2">
        <title>4The generated queries also include some which are generic, but we term the queries language-specific due to the news data also capturing events which are mostly of interest in the language.</title>
      </sec>
      <sec id="sec-4-3">
        <title>5https://github.com/castorini/hf-spacerini</title>
      </sec>
      <sec id="sec-4-4">
        <title>6https://huggingface.co/castorini/afriberta-dpr-ptf-msmarco-ft-latin-mrtydi</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Relevance Assessment</title>
      <p>As often practised in community evaluations, runs submitted for the test set are manually judged to form the test collection's qrels via pooling. A total of 84 submissions were made by the participating teams, 21 for each language. Using the ranked lists of runs provided by the teams, query pools were formed for each language, and we provide details on the relevance assessment process and an analysis of the pools in this section.</p>
      <sec id="sec-5-1">
        <title>5.1. Pooling Process</title>
        <p>The top 3 ranked submissions from participating teams contributed to the pooling process, with subsequent additions depending on available time and assessment resources. A total of 40 runs contributed to the pools across the four languages, with varying numbers per model type; dense models made up most of the contributing runs. The pool depth for submissions was kept at a constant of k = 20; however, there was no restriction on overall pool size. Judgments were carried out by two assessors for each language, where an assessor judged the full pool of a given query; the test set was split into distinct halves, each assigned to an assessor. Assessors provided judgments on a binary scale using the following description:
• Relevant: The passage answers the query, or the answer can be very easily implied from the passage.
• Non-relevant: The passage does not answer the question at all, or is related to the question but does not answer it.</p>
        <p>Relevant passages are given a judgment of 1 and non-relevant passages a judgment of 0.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Pool Analysis</title>
        <p>The total pool size obtained for each language from the relevance assessment is presented in Table 5. This includes the shallow judgments obtained during query development, which were also re-assessed during the relevance assessment phase. Across the languages, the minimum number of judgments per query ranges from 40 to 60, while some queries have over 120 judgments. 3 queries in Hausa, 4 in Somali, 2 in Swahili, and 12 in Yoruba have pool sizes of fewer than 60 passages, indicating that contributing runs retrieved similar sets of passages for these queries in their top 20 ranks. Runs that contributed to the pooling process also retrieved more relevant passages across the four languages, as seen in Table 6. However, certain queries were found to have no relevant passages and were discarded. This was a result of wrongly annotated passages from the query development phase, or grammatical errors which affected retrieval results. This left Hausa with 80 test queries as opposed to the initial 85, and Somali with 99 as opposed to 100. There were also a few queries with just 1 relevant passage across the languages, with Yoruba having the most at 5 queries. The increased number of relevant passages obtained from the pooling process is a good indication that African indigenous websites are a great source for retrieval, especially coupled with queries of interest to the language speakers, which can also include generic topics.</p>
        <p>Table 6 also indicates that a large number of relevant passages were obtained for certain queries. Considering the minimal number of runs that contributed to the pools, this raises the concern that more relevant passages might remain unjudged, especially for runs that did not contribute to the pooling process or are evaluated after the track. We analyse the queries with the highest tendency of having unjudged relevant passages using relevance densities. The relevance density of a query is its number of relevant passages relative to its pool size, and we adopt a rule of thumb that queries with relevance densities of 0.6 and higher very likely still have unjudged relevant passages. Figure 2 gives the distribution of relevance densities for each language, and we find that the number of queries with densities higher than 0.6 is less than 5 across the languages. There is also a higher number of queries with densities between 0.4 and 0.6, with Swahili and Hausa having up to 22 to 25% of queries in this range. Although this approach to analysing the completeness of judgments is not exhaustive, it provides some insight into the number of queries in each language that would most likely have unjudged relevant passages from new systems.</p>
      </sec>
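      <p>The relevance-density rule of thumb above can be made concrete with a short sketch (our own illustration, not the track's code): density is the number of relevant passages divided by the pool size, and queries at or above the 0.6 threshold are flagged as likely to have unjudged relevant passages.</p>
      <p>
```python
def relevance_density(judgments):
    """`judgments` maps docid -> binary relevance (0 or 1) for one query;
    density is the fraction of the pool judged relevant."""
    if not judgments:
        return 0.0
    return sum(judgments.values()) / len(judgments)

def likely_incomplete(qrels, threshold=0.6):
    """Flag queries whose density meets the rule-of-thumb threshold."""
    return [q for q, j in qrels.items()
            if relevance_density(j) >= threshold]
```
      </p>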
    </sec>
    <sec id="sec-6">
      <title>6. Results and Analysis</title>
      <p>An overview of participants' submissions and the results obtained from evaluating submitted runs on the pooled qrels is provided in this section. Results are also analysed at the query level to identify query difficulty, as well as the effectiveness of the submitted runs and model types.</p>
      <p>A total of 3 teams participated in the CIRAL track, with 84 runs submitted. Considering that cross-lingual passage ranking was the major task, participants were not given any specifications on the retrieval type to employ, and submissions comprised dense (52), reranking (20), hybrid (8), and sparse (4) methods. All submissions covered the four languages, hence there is an equal number of runs among the languages. The retrieval methods employed by participating teams are discussed in detail in their working notes.</p>
      <sec id="sec-6-1">
        <title>6.1. Overall Results</title>
        <p>We present the result statistics for all languages in Table 7, and the detailed results of all submitted runs in Tables 8, 9, 10 and 11. The nDCG@20, MRR@10, Recall@100, and MAP@100 scores for each submission are reported, and the average and maximum scores can be found in Table 7. The main metric in the task is nDCG@20; a cut-off of k = 20 is used considering that a decent number of queries had above 10 relevant passages during query development. Dense models make up 62% of submissions for each language and have the highest average scores across the metrics. Most submissions employ end-to-end cross-lingual retrieval, with a few document translation methods, represented as DT in the table. However, the top 2 performing submissions across the languages employ document translation at one stage or another in their systems and have the highest scores for all metrics.</p>
        <p>The effectiveness of model types is better visualized in Figure 3. Runs are ordered by their nDCG@20 scores, and though dense runs make up most of the top runs, there is variation in effectiveness across the dense models. The effectiveness of reranking methods also varies widely across the languages, with the exception of Yoruba, where reranking models have the top nDCG@20 scores, as seen in Figure 3a. Given there was not a specific task on reranking, submitted runs employ different first- and second-stage methods, which has an impact on the varying degree of output quality. However, the best reranking run outperformed the best dense run across the languages, with the exception of Somali. The submission pool has a very minimal number of hybrid and sparse runs, giving insufficient room for comparison of these model types on the task. The sparse run, however, outperforms some of the dense and reranking runs and achieves competitive nDCG scores, especially in Somali and Yoruba.</p>
        <p>Dense models achieve higher Recall@100 across all languages, as seen in Figure 3b. Maintaining the same order by nDCG@20, runs without a high nDCG@20 retrieved more relevant passages in their top 100 candidates. With the exception of Yoruba and the best reranking model, reranking generally achieved lower Recall@100, with even the sparse run achieving a better score across the languages. These results indicate that many of the submitted systems have relevant passages at deeper depths; however, due to the nature of the task, we optimize for early rankings using nDCG@20.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Query-level Results</title>
        <p>Figures 4, 5, 6 and 7 provide query-level effectiveness using nDCG@20, with queries ordered by the median scores across evaluated runs. The median nDCG score for a good percentage of queries is greater than 0, indicating that most submissions do not perform too badly on individual queries across the languages. Certain queries, such as 41 in Hausa, also have quite a gap between the maximum score obtained and the scores of the rest of the runs, indicating that specific runs perform better on these queries compared to other runs. The same can be said for queries like 81 in Swahili, where only a few runs identify the relevant passages of the query. This implies that these runs understand the semantics of the query, and such queries could boost the scores of systems that are able to retrieve their relevant documents.</p>
        <p>We also analyse query difficulty across the languages, as queries that are too easy or too difficult are not ideal for distinguishing systems' effectiveness. Examples are queries 72 in Hausa and 433 in Yoruba, where the median nDCG@20 score is 1.0 across submitted systems, making them very easy queries and problematic for evaluation. There are also quite a number of difficult queries across the languages, with Somali having the most, where only a few outliers score higher than 0 nDCG@20. However, a good number of queries, such as 21 in Swahili and 161 in Somali, have a decent spread of scores and are ideal for evaluation.</p>
      </sec>
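      <p>For reference, the main metric can be sketched for the binary judgments used in the track. This is a standard nDCG@k computation (our own illustration, not the track's evaluation code), where the ideal ranking places all relevant passages first:</p>
      <p>
```python
import math

def ndcg_at_k(ranked_docids, judgments, k=20):
    """nDCG@k with binary relevance: DCG of the top-k ranking divided by
    the DCG of an ideal ranking with all relevant passages on top."""
    gains = [judgments.get(d, 0) for d in ranked_docids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    n_rel = sum(1 for v in judgments.values() if v > 0)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(n_rel, k)))
    return dcg / idcg if idcg > 0 else 0.0
```
      </p>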
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The CIRAL track was held for the first time at the Forum for Information Retrieval Evaluation (FIRE) 2023, with the goal of promoting the research and evaluation of cross-lingual information retrieval for African languages. The task covered passage retrieval between English and four African languages, and test collections were curated for these languages via community evaluations. Submissions from participating teams comprise mostly dense single-stage retrieval systems, and these make up most of the best-performing systems on the task. Limitations faced this year include a minimal number of participants and limited diversity in submitted retrieval systems. Despite these limitations, we hope the CIRAL track evolves and the curated collection matures into its most reliable and reusable version.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada. We would like to thank the Masakhane community7 for their contributions in the query development phase of the project. We also appreciate Johns Hopkins University HLTCOE, organizers of the NeuCLIR track at TREC,8 for contributing the English translations of the passage collections to the track.</p>
      <sec id="sec-8-2">
        <title>7https://www.masakhane.io/</title>
      </sec>
      <sec id="sec-8-3">
        <title>8https://neuclir.github.io/</title>
      </sec>
      <sec id="sec-8-4">
        <p>[14] N. Craswell, B. Mitra, E. Yilmaz, D. Campos, E. M. Voorhees, Overview of the TREC 2019 Deep Learning track, arXiv preprint arXiv:2003.07820 (2020).
[15] M. S. Tamber, R. Pradeep, J. Lin, Pre-processing matters! Improved Wikipedia corpora for open-domain question answering, in: Proceedings of the 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III, Springer, 2023, pp. 163–176.
[16] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, MS MARCO: A human-generated MAchine Reading COmprehension dataset (2016).
[17] X. Zhang, N. Thakur, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, M. Rezagholizadeh, J. Lin, Making a MIRACL: Multilingual information retrieval across a continuum of languages, arXiv preprint arXiv:2210.09984 (2022).
[18] D. I. Adelani, M. Masiak, I. A. Azime, J. O. Alabi, A. L. Tonja, C. Mwase, O. Ogundepo, B. F. Dossou, A. Oladipo, D. Nixdorf, et al., MasakhaNews: News topic classification for African languages, arXiv preprint arXiv:2304.09972 (2023).
[19] S. Robertson, H. Zaragoza, et al., The probabilistic relevance framework: BM25 and beyond, Foundations and Trends® in Information Retrieval 3 (2009) 333–389.</p>
        <p>[Tables 8–11: detailed per-run results (run name, team, end-to-end vs. DT, model type, nDCG@20, MRR@10, Recall@100, MAP@100); only the run and team labels survived extraction.]</p>
        <p>Figure 4: Boxplots showing nDCG@20 for Hausa queries. Figure 5: Boxplots showing nDCG@20 for Somali queries. Figure 6: Boxplots showing nDCG@20 for Swahili queries. Figure 7: Boxplots showing nDCG@20 for Yoruba queries.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schäuble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          ,
          <article-title>Cross-language information retrieval (CLIR) track overview</article-title>
          ,
          <source>NIST Special Publication SP</source>
          (
          <year>1998</year>
          )
          <fpage>31</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <article-title>Information retrieval evaluation in a changing world: Lessons learned from 20 years of CLEF</article-title>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maiti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Modak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <article-title>The FIRE 2008 evaluation exercise</article-title>
          ,
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          <volume>9</volume>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuriyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nozue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hidaka</surname>
          </string-name>
          ,
          <article-title>Overview of IR tasks at the first NTCIR workshop</article-title>
          ,
          <source>in: Proceedings of the first NTCIR workshop on research in Japanese text retrieval and term recognition</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McNamee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2022 NeuCLIR track</article-title>
          ,
          <source>arXiv preprint arXiv:2304.12367</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ogueji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Toward best practices for training multilingual dense retrieval models</article-title>
          ,
          <source>ACM Transactions on Information Systems</source>
          <volume>42</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yarmohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          , S. Hisamoto,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Povey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <article-title>Robust document representations for cross-lingual information retrieval in low-resource settings</article-title>
          ,
          <source>in: Proceedings of Machine Translation Summit XVII: Research Track</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <article-title>Combining contextualized and non-contextualized query translations to improve CLIR</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1581</fpage>
          -
          <lpage>1584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zbib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Weakly supervised attentional model for low resource ad-hoc cross-lingual information retrieval</article-title>
          ,
          <source>in: Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ogundepo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>AfriCLIRMatrix: Enabling cross-lingual information retrieval for African languages</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>8721</fpage>
          -
          <lpage>8728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schamoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <article-title>Cross-lingual learning-to-rank with shared representations</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (Short Papers),
          <year>2018</year>
          , pp.
          <fpage>458</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <article-title>CLIRMatrix: A massively large collection of bilingual and multilingual datasets for cross-lingual information retrieval</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4160</fpage>
          -
          <lpage>4170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rubino</surname>
          </string-name>
          ,
          <article-title>Machine translation for English retrieval of information in any language (machine translation for English-based domain-appropriate triage of information in any language)</article-title>
          ,
          <source>in: Conferences of the Association for Machine Translation in the Americas: MT Users' Track, The Association for Machine Translation in the Americas</source>
          , Austin, TX, USA,
          <year>2016</year>
          , pp.
          <fpage>322</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2019 deep learning track</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>