
A Post-Analysis of Query Reformulation Methods for Clinical Trials Retrieval

DISCUSSION PAPER

Maristella Agosti

Giorgio Maria Di Nunzio


Stefano Marchesin

stefano.marchesing@unipd.it

Department of Information Engineering, University of Padua, Italy
Department of Mathematics, University of Padua, Italy

The Precision Medicine (PM) track of the Text REtrieval Conference (TREC) focuses on providing useful precision medicine information to clinicians treating cancer patients. The PM track offers a unique opportunity to evaluate medical IR systems on two different collections: scientific literature and clinical trials. In this paper, we evaluate several state-of-the-art query expansion and reduction methods to see whether a particular approach can be helpful in clinical trials retrieval. We present the approaches that are consistently effective in all three TREC PM editions, and we compare them to the results obtained by the research groups who participated in all three editions.

Keywords: query reformulation, knowledge base, precision medicine

Medical Information Retrieval (IR) helps a wide variety of users to access and search medical information archives and data [6]. In [9], a classification of textual medical information is proposed: 1) patient-specific information, which applies to individual patients; this type of information can be structured, as in the case of an Electronic Health Record (EHR), or can be free narrative text; 2) knowledge-based information, which has been derived and organized from observational or experimental research; in the case of clinical research, this information is most commonly provided by books and journals, but it can take a wide variety of other forms. Therefore, the design of effective tools to access and search textual medical information requires, among other things, enhancing the query through expansion and/or rewriting methods that leverage the information contained in knowledge resources. Sondhi et al. [15] identified some challenges arising from the differences between general and medical case-based retrieval. In particular, state-of-the-art retrieval methods, combined with selective query term weighting based on medical thesauri and physician feedback, improve performance significantly [16, 5]. In 2017, 2018, and 2019, the PM track¹ [11] at TREC² focused on an important use case in clinical decision support: providing useful precision medicine information to clinicians treating cancer patients.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.
This track gives a unique opportunity to evaluate medical IR systems, since the experimental collection is composed of a set of topics (synthetic cases created by precision oncologists) for two different collections that target two different tasks: 1) retrieving biomedical articles addressing relevant treatments for a given patient, and 2) retrieving clinical trials for which a patient (described in the information need) is eligible.

This work combines and discusses the methodology and the results originally presented at SIGIR 2019 [2] and TREC 2019 [4]. The objective is to evaluate several state-of-the-art query expansion and reduction methods to examine whether a particular approach can be helpful in clinical trials retrieval. Specifically, we compare the results obtained with our approach to the best experiments submitted to the 2017 and 2018 PM tracks [2]. Then, we select the top three query reformulations found in the 2017 and 2018 PM tracks and evaluate whether their effectiveness also holds in the 2019 PM track [4]. We conduct a systematic comparison between our approach and those proposed by the research groups that participated in all three years of TREC PM. The analysis shows the effectiveness of the proposed query reformulations in the 2017 and 2018 PM tracks and confirms the trend in the 2019 PM track. The obtained runs achieve top performing results in all PM tracks [11-13]. In particular, a specific query reformulation allows the retrieval system to achieve top results in all three PM tracks.

The rest of the paper is organized as follows: Section 2 describes the approach used to evaluate different query reformulation methods. Section 3 presents the experimental setup, and Section 4 compares the results obtained using our approach with those obtained by the other research groups who participated in TREC PM 2017, 2018 and 2019. Finally, Section 5 reports some final remarks.

2 Approach

The approach we propose consists of four steps: (i) indexing, (ii) query reformulation, (iii) retrieval and (iv) filtering.

Indexing. We create the following fields to index the clinical trials collections: <docid>, <text>, <max_age>, <min_age> and <gender>. Fields <max_age>, <min_age> and <gender> contain information extracted from the eligibility section of clinical trials and are used in the filtering step. The <text> field contains the entire content of each clinical trial.
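The field extraction behind this index can be sketched in plain Python. This is a minimal illustration, not the Whoosh-based code used in the paper: it assumes each trial is stored in the ClinicalTrials.gov XML layout, with the identifier under `id_info/nct_id` and the demographic criteria under `eligibility/minimum_age`, `eligibility/maximum_age` and `eligibility/gender`, and that ages are written as a number plus a unit (e.g. "18 Years") or as "N/A" when unspecified. The helpers `parse_age` and `to_index_record` and the sample trial are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed age units and their conversion factor to years.
_UNIT_IN_YEARS = {"year": 1.0, "month": 1 / 12, "week": 7 / 365.25, "day": 1 / 365.25}


def parse_age(value):
    """Convert an age string such as '18 Years' to years; None if unspecified."""
    if value is None:
        return None
    parts = value.strip().split()
    if len(parts) != 2:
        return None  # e.g. 'N/A' or an empty element
    number, unit = parts
    unit = unit.lower().rstrip("s")  # 'Years' -> 'year'
    if unit not in _UNIT_IN_YEARS:
        return None
    return float(number) * _UNIT_IN_YEARS[unit]


def to_index_record(xml_string):
    """Build the <docid>, <text>, <max_age>, <min_age> and <gender> fields of one trial."""
    root = ET.fromstring(xml_string)
    return {
        "docid": root.findtext("id_info/nct_id"),
        # <text>: the whole textual content of the trial, whitespace-normalized.
        "text": " ".join(" ".join(root.itertext()).split()),
        "min_age": parse_age(root.findtext("eligibility/minimum_age")),
        "max_age": parse_age(root.findtext("eligibility/maximum_age")),
        "gender": (root.findtext("eligibility/gender") or "All").strip().lower(),
    }


# Hypothetical trial used only for illustration.
sample = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Hypothetical BRAF melanoma trial</brief_title>
  <eligibility>
    <gender>All</gender>
    <minimum_age>18 Years</minimum_age>
    <maximum_age>N/A</maximum_age>
  </eligibility>
</clinical_study>"""
record = to_index_record(sample)
```

Storing unspecified ages as `None` keeps the "missing criterion" information explicit, which the filtering step needs.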

¹ http://www.trec-cds.org/ ² https://trec.nist.gov/

Query Reformulation. The approach relies on two types of query reformulation methods: query expansion and query reduction.

Query expansion: We perform a knowledge-based a priori query expansion. First, we rely on MetaMap [3], a state-of-the-art medical concept extractor, to extract from each query field all the Unified Medical Language System (UMLS)³ concepts belonging to the following semantic types:⁴ Neoplastic Process (neop), Gene or Genome (gngm) and Cell or Molecular Dysfunction (comd). The gngm and comd semantic types are related to the query <gene> field, while neop is related to the <disease> field. For those collections that include an additional <other> field, which covers other potential factors that may be relevant, MetaMap is applied to <other> with no restriction on the semantic types, as its content does not refer to any particular semantic type. Second, for each extracted concept, we consider all its name variants contained in the following knowledge sources: National Cancer Institute (NCI), Medical Subject Headings (MeSH), SNOMED CT (SNOMEDCT) and UMLS Metathesaurus (MTH). All knowledge sources are manually curated and up to date. The expanded queries consist of the union of the original terms with the set of name variants. For example, consider a query containing only the word "melanoma", which is mapped to the UMLS concept C0025202. The set of name variants for the concept "melanoma" includes, among many others: cutaneous melanoma, malignant melanoma, malignant melanoma (disorder). Therefore, the final expanded query is the union of the original term "melanoma" with all its name variants. Additionally, we expand queries that do not mention any kind of blood cancer (e.g. "lymphoma" or "leukemia") with the term solid. This expansion proved to be effective in [7], where the authors found that a large part of relevant clinical trials do not mention the exact disease; a more general term like solid tumor is therefore preferable and more effective.
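The expansion logic can be sketched as follows, under stated assumptions: the MetaMap step is not reproduced and is represented by a hypothetical list of (field, CUI, semantic type) triples, and the name-variant lookup over NCI, MeSH, SNOMEDCT and MTH is represented by a hypothetical dictionary from CUI to variants. Only the C0025202 variants come from the example in the text; `expand_query` and its inputs are illustrative names.

```python
# Semantic-type restriction per topic field; fields not listed here
# (e.g. <other>) are expanded with no restriction, as described above.
ALLOWED_SEMTYPES = {"disease": {"neop"}, "gene": {"gngm", "comd"}}
BLOOD_CANCER_TERMS = ("lymphoma", "leukemia")
TOPIC_FIELDS = ("disease", "gene", "other")


def expand_query(topic, concepts, name_variants, add_solid=True):
    """Return (original_terms, expansion_terms) for one topic.

    topic:         dict field name -> text
    concepts:      (field, CUI, semantic type) triples, standing in for MetaMap output
    name_variants: dict CUI -> list of name variants (hypothetical lookup table)
    """
    original = [topic[f] for f in TOPIC_FIELDS if topic.get(f)]
    expansions = []
    for field, cui, semtype in concepts:
        allowed = ALLOWED_SEMTYPES.get(field)
        if allowed is not None and semtype not in allowed:
            continue  # concept outside the semantic types used for this field
        for variant in name_variants.get(cui, []):
            if variant not in original and variant not in expansions:
                expansions.append(variant)
    # Add the general term "solid" unless the topic mentions a blood cancer.
    topic_text = " ".join(original).lower()
    if add_solid and not any(t in topic_text for t in BLOOD_CANCER_TERMS):
        expansions.append("solid")
    return original, expansions


# Example from the text: "melanoma" mapped to the UMLS concept C0025202.
topic = {"disease": "melanoma"}
concepts = [("disease", "C0025202", "neop")]
variants = {"C0025202": ["cutaneous melanoma", "malignant melanoma",
                         "malignant melanoma (disorder)"]}
original, expansions = expand_query(topic, concepts, variants)
```

Original and expansion terms are returned separately so that the expansion terms can later receive a lower weight; their union is the expanded query.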

Query reduction: We reduce original queries by removing, whenever present, gene mutations from the <gene> field. For example, consider a topic whose <gene> field mentions "BRAF (V600E)": after reduction, the <gene> field becomes "BRAF". The reduction aims at mitigating the over-specificity of topics, since the information contained in a topic is too specific compared to that contained in the target documents [10]. Additionally, we remove the <other> field from those collections that include it, since it contains additional factors that are not necessarily relevant and thus represents a potential source of noise in retrieving precise information for patients.
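A minimal sketch of the reduction, assuming (as in the "BRAF (V600E)" example) that mutations follow the gene symbol in parentheses; mutations written in other forms would need additional patterns. The helper `reduce_topic` is hypothetical.

```python
import re

# Assumption: a mutation is a parenthesized suffix of the gene symbol, e.g. "BRAF (V600E)".
_PARENTHESIZED_MUTATION = re.compile(r"\s*\([^)]*\)")


def reduce_topic(topic, drop_other=True):
    """Return a copy of the topic with gene mutations (and optionally <other>) removed."""
    reduced = dict(topic)
    if reduced.get("gene"):
        reduced["gene"] = _PARENTHESIZED_MUTATION.sub("", reduced["gene"]).strip()
    if drop_other:
        reduced.pop("other", None)  # <other> exists only in some collections
    return reduced


print(reduce_topic({"disease": "melanoma", "gene": "BRAF (V600E)"}))
# {'disease': 'melanoma', 'gene': 'BRAF'}
```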

Retrieval. We use BM25 [14] as the retrieval model. Query terms obtained through query expansion are weighted lower than 1.0 to avoid introducing too much noise in the retrieval process [8].
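The effect of down-weighting expansion terms can be illustrated with a self-contained weighted BM25 scorer. This is a sketch, not Whoosh's BM25F implementation (whose idf formula and field handling may differ): it uses the common idf variant log(1 + (N - df + 0.5) / (df + 0.5)), treats every query term as a single token, and gives expansion terms the weight k = 0.1 reported in the experimental procedure. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def build_weighted_query(original_terms, expansion_terms, k=0.1):
    """Original terms get weight 1.0; expansion terms get the lower weight k."""
    query = {term: k for term in expansion_terms}
    query.update({term: 1.0 for term in original_terms})  # originals win on overlap
    return query


def bm25_rank(weighted_query, docs, k1=1.2, b=0.75):
    """Rank docs (docid -> token list) by weighted BM25; returns [(docid, score)]."""
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for docid, tokens in docs.items():
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term, weight in weighted_query.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Each term's BM25 contribution is scaled by its query weight.
            score += weight * idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[docid] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "trial-1": "braf melanoma trial".split(),
    "trial-2": "solid tumor trial".split(),
    "trial-3": "lung cancer trial".split(),
}
query = build_weighted_query(["braf", "melanoma"], ["solid"])
ranking = bm25_rank(query, docs)
```

Here "trial-2" matches only the expansion term solid, so it is retrieved but ranked well below "trial-1", which matches the original terms: expansion widens recall while contributing only a tenth of the weight.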

³ https://www.nlm.nih.gov/research/umls/ ⁴ https://metamap.nlm.nih.gov/SemanticTypesAndGroups.shtml

Filtering. The eligibility section of clinical trials comprises three important demographic aspects that a patient needs to satisfy to be considered eligible for the trial, namely: minimum age, maximum age and gender, where minimum age and maximum age are, respectively, the minimum and the maximum age required for a patient to be considered eligible for the trial, while gender is the required gender. After the retrieval step, we filter out from the list of candidate trials those for which a patient is not eligible, i.e. those whose eligibility criteria are not satisfied by the patient's demographic data (age and gender). In those cases where part of the demographic data is not specified, a clinical trial is kept or discarded on the basis of the remaining demographic information. For instance, if a clinical trial does not specify a required minimum age, it is kept or discarded based on its required maximum age and gender.
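The filtering rule, including the handling of unspecified criteria, can be sketched as below. The sketch assumes that trial ages are stored in years with `None` for "not specified", and that the required gender uses the values "all", "male" and "female"; `is_eligible`, `filter_trials` and the sample trials are hypothetical.

```python
def is_eligible(patient_age, patient_gender, trial):
    """Check the three demographic criteria; unspecified criteria (None) are skipped."""
    min_age, max_age = trial.get("min_age"), trial.get("max_age")
    if min_age is not None and patient_age < min_age:
        return False
    if max_age is not None and patient_age > max_age:
        return False
    required = (trial.get("gender") or "all").lower()
    return required == "all" or required == patient_gender.lower()


def filter_trials(ranking, trials, patient_age, patient_gender):
    """Drop ranked (docid, score) pairs whose trial the patient is not eligible for."""
    return [(docid, score) for docid, score in ranking
            if is_eligible(patient_age, patient_gender, trials[docid])]


trials = {
    "trial-1": {"min_age": 18.0, "max_age": None, "gender": "all"},
    "trial-2": {"min_age": None, "max_age": 65.0, "gender": "female"},
    "trial-3": {"min_age": 18.0, "max_age": 75.0, "gender": "male"},
}
ranking = [("trial-1", 2.1), ("trial-2", 1.4), ("trial-3", 0.9)]
print(filter_trials(ranking, trials, patient_age=45, patient_gender="male"))
# [('trial-1', 2.1), ('trial-3', 0.9)]
```

Since filtering only removes entries, the relative order of the surviving trials is the one produced by retrieval.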

3 Experimental Setup

This section describes the experimental collections and the setup used to apply and evaluate our approach.

Experimental Collections. We report the main information related to topics and document collections below.

Topics consist of 30, 50, and 40 synthetic cases created by precision oncologists in 2017, 2018, and 2019, respectively. In 2017, each topic contained four key elements in a semi-structured format: (1) disease (e.g. a type of cancer), (2) genetic variants (primarily present in tumors), (3) demographic information (e.g. age, gender), and (4) other factors (which could impact certain treatment options). In 2018 and 2019, each topic had three of the four key elements used in 2017: (1) disease, (2) genetic variants, and (3) demographic information. Furthermore, ten of the 2019 topics are not related to cancer.

Clinical Trials consist of a set of 241,006 clinical trial descriptions for both 2017 and 2018, and of an updated set of 306,238 descriptions for 2019. The collections are derived from ClinicalTrials.gov⁵, a database of privately and publicly funded clinical studies conducted around the world. When none of the available treatments is effective for an oncology patient, the common recourse is to determine whether any potential treatments are undergoing evaluation in a clinical trial. Therefore, it would be helpful to automatically identify the most relevant clinical trials for an individual patient. Precision oncology trials typically use a certain treatment for a certain disease with a specific genetic variant. Such trials can have complex inclusion and/or exclusion criteria that are challenging to match with automated systems.

Experimental Procedure. We use Whoosh⁶, a Python search engine library, for indexing, retrieval, and filtering. For BM25, we keep the default values provided by Whoosh, k1 = 1.2 and b = 0.75, as we found them to be a good combination [1]. For query expansion, we rely on MetaMap to extract and disambiguate concepts from UMLS. Below we report the procedure used for each experiment.

⁵ https://clinicaltrials.gov/ ⁶ https://whoosh.readthedocs.io/en/latest/intro.html

- Indexing
  - Index clinical trials using the following created fields: <docid>, <text>, <max_age>, <min_age> and <gender>.
- Query reformulation
  - Use MetaMap to extract from each query field the UMLS concepts restricted to the following semantic types: neop for <disease>, gngm/comd for <gene>, and all for <other>;
  - Extract from the concepts all name variants belonging to the NCI, MeSH, SNOMED CT and MTH knowledge sources;
  - Expand (or not) topics that do not mention "lymphoma" or "leukemia" with the term solid;
  - Reduce (or not) queries by removing, whenever present, gene mutations from the <gene> field;
  - Remove (or not) the <other> field.
- Retrieval
  - 2017/2018 PM tracks: adopt any combination of the reformulation strategies;
  - 2019 PM track: adopt the three best reformulation strategies from the 2017/2018 PM tracks;
  - Weight expanded terms with a value k = 0.1;
  - Perform a search using the expanded queries with BM25.
- Filtering
  - Filter out clinical trials for which the patient is not eligible.
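The "any combination" step for 2017/2018 amounts to a grid over yes/no switches. The sketch below assumes every switch is independent, which gives 2^7 = 128 combinations when the <other> field is available (2017) and 2^5 = 32 otherwise (2018); the paper does not state the exact grid, and some combinations (e.g. expanding <other> while also removing it) may not have been run. The switch names mirror the Table 1 labels; `reformulation_grid` is a hypothetical helper.

```python
from itertools import product

# Yes/no switches as labelled in Table 1; oth and oth_exp apply only to 2017,
# since the <other> field is absent from the 2018 and 2019 topics.
FLAGS_2017 = ("neop", "comd", "gngm", "oth", "oth_exp", "orig", "solid")
FLAGS_2018 = ("neop", "comd", "gngm", "orig", "solid")


def reformulation_grid(flags):
    """Yield every yes/no assignment of the given reformulation switches."""
    for values in product(("y", "n"), repeat=len(flags)):
        yield dict(zip(flags, values))


print(sum(1 for _ in reformulation_grid(FLAGS_2017)))  # 128
print(sum(1 for _ in reformulation_grid(FLAGS_2018)))  # 32
```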

Evaluation Measures. We use the measures adopted in the TREC PM tracks, namely inferred nDCG (infNDCG), R-precision (Rprec) and precision at 10 (P@10).
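For reference, the two precision-based measures can be sketched as below, assuming binary relevance (a document is either in the relevant set or not). infNDCG is omitted because it relies on an estimator over sampled relevance judgments; official scores are computed with the TREC evaluation scripts, not with this sketch. The document identifiers are hypothetical.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved documents that are relevant (always divides by k)."""
    return sum(1 for docid in ranking[:k] if docid in relevant) / k


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for docid in ranking[:r] if docid in relevant) / r


ranking = ["t1", "t2", "t3", "t4"]
relevant = {"t1", "t3", "t9"}
print(precision_at_k(ranking, relevant))  # 0.2
print(r_precision(ranking, relevant))     # 0.6666666666666666
```

Dividing by k even when fewer than k documents are retrieved penalizes short result lists, which matters here because filtering can remove many candidate trials.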

4 Experimental Results and Discussion

Table 1. The results achieved using the five most effective query reformulations for each year. (‡) indicates a particular query reformulation effective in all three years. Part B (bottom) reports the results obtained by participant runs, along with the lowest score required to enter the top 10 TREC results list and the score obtained by the best combination of our approach. Further details are reported in Section 4.

In Table 1, we report the results of our experiments on query reformulation (Part A) and compare them with the results obtained by the research groups that participated in TREC PM 2017, 2018 and 2019 (Part B). Given the large number of experiments we performed, we only present the 5 runs with the highest P@10 for each year. Each line shows a particular combination (yes or no values) of semantic types (neop, comd, gngm), usage and expansion of the <other> field (oth, oth exp), query reduction (orig), and expansion using the weighted solid (tumor) keyword. We use the symbol ` ' to indicate that the features oth and oth exp are not applicable for 2018 and 2019, since the <other> field is absent from the 2018 and 2019 topics. We highlight in bold the top 3 scores for each measure, and we use the symbol ‡ to indicate the combination that performs well in all three years. For the TREC PM participants, we select those participants who submitted runs in all three years and reached the top 10 performing runs in at least one edition for each measure [11-13]. The scores reported in Part B of Table 1 indicate the best score obtained by a particular run for a specific measure; the best results of a participant often come from different runs. The symbol ` ' means that the measure is not available, while `<' indicates that none of the runs submitted by the participant reached the top 10 performing runs.
For the sake of comparison, we add for each measure the lowest score required to enter the top 10 TREC results list, and the score obtained by the best combination of our approach (indicated by its line number), as if we were participants of these tracks.

Analysis of Query Reformulations. The results in Table 1 (Part A) highlight that the use of solid expansions with weight 0.1, as well as query gene reductions (orig = n), seems to improve performance consistently in 2017: two of the three best runs in terms of P@10 (lines 1 and 2) apply both techniques. Regarding knowledge-based expansions, the semantic type gngm (lines 1 and 5) seems more effective than neop (line 3), while comd does not seem to have any positive effect at all. None of the five runs considers the <other> field (oth = n) or its expansion (oth exp = n), confirming our intuition that it might represent a potential source of noise in retrieving precise information for patients. Similarly to 2017, two of the best three runs of 2018 use no knowledge-based expansions and rely on the solid (tumor) expansion with weight 0.1 (lines 7 and 9). In particular, the runs combining query gene reductions and solid expansions (marked with ‡) provide the best performance for all the measures considered, both in 2017 and 2018. This suggests that removing highly specialized information (i.e. the gene mutation) or adding general terms (e.g. solid) benefits retrieval. A possible reason lies in the nature of the document collections: clinical trials often state general requirements to allow patients to enroll. The results obtained in 2019 with the top three query reformulations from 2017 and 2018 confirm this trend. The run combining query gene reductions and solid expansions (line 13‡) is one of the top three runs of 2019; however, two query reformulations from 2017 (line 14) and 2018 (line 11) provide better performance. This result shows how difficult the task is.
In fact, even though we found a particular query reformulation approach (marked with ‡) to be highly effective in all three years, especially in 2017 and 2018, it was not the best approach for 2019. Nevertheless, this analysis helps researchers select an effective subset of query reformulations to build strong baselines for clinical trials retrieval.

Comparison with TREC PM Participants. The results in Table 1 (Part B) mark a clear division between the 2017 and 2018 tracks and the 2019 track. In 2017 and 2018, most of the participant runs did not reach the top 10 threshold in any of the considered measures; the only exception is the research group from Poznan University of Technology, whose best runs always belong to the top 10 performing runs of the track. Conversely, in 2019 all the participant groups submitted runs that achieved results higher than the top 10 threshold.

Compared with the results obtained using the query reformulations from Table 1 (Part A), we can see that all runs employing the best query reformulations obtain results higher than the top 10 threshold for all the considered measures in all three years. Furthermore, the runs using the top five query reformulations achieved consistently better results than participant runs for each measure in all three years. This indicates the robustness of our approach across the different collections, as well as the effectiveness of the proposed query reformulations for clinical trials retrieval. In particular, it is worth mentioning that the runs using the (‡) query reformulation rank among the top three best performing systems of each year's PM track [11-13]. Therefore, the analysis of query reformulations performed on the 2017 and 2018 PM tracks confirmed its trend in the 2019 PM track and allowed us to identify a specific set of query reformulations beneficial for the retrieval of clinical trials.

5 Conclusions and Final Remarks

In this paper, we further elaborate on the results originally presented at SIGIR 2019 [2] and TREC 2019 [4] to evaluate several query expansion and reduction methods and to see whether a particular approach can be helpful in clinical trials retrieval. The experimental analysis showed the effectiveness of the proposed query reformulations in the 2017 and 2018 PM tracks, and we confirmed this positive trend in the 2019 PM track. The obtained runs achieve top performing results in all PM tracks [11-13]. In particular, the run marked with ‡ in Table 1 can be considered a valid baseline to build stronger multi-stage systems in the future.

Acknowledgements. This work is partially supported by the ExaMode project, as part of the European Union H2020 research and innovation program under grant agreement no. 825292.

1. Agosti, M., Di Nunzio, G.M., Marchesin, S.: The University of Padua IMS Research Group at TREC 2018 PM Track. In: Proc. TREC (2018)

2. Agosti, M., Di Nunzio, G.M., Marchesin, S.: An Analysis of Query Reformulation Techniques for Precision Medicine. In: Proc. ACM SIGIR Conf. pp. 973-976 (2019)

3. Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proc. AMIA Symposium. pp. 17-21 (2001)

4. Di Nunzio, G.M., Marchesin, S., Agosti, M.: Exploring how to combine query reformulations for precision medicine. In: Proc. TREC (2019)

5. Diao, L., et alii: The Research of Query Expansion Based on Medical Terms Reweighting in Medical IR. EURASIP J. Wireless Comm. & Networ. (1), 105 (2018)

6. Goeuriot, L., Jones, G., Kelly, L., Müller, H., Zobel, J.: Medical Information Retrieval: Introduction to the Special Issue. Inform Retrieval J. 19(1), 1-5 (2016)

7. Goodwin, T.R., Skinner, M.A., Harabagiu, S.M.: UTD HLTRI at TREC 2017: Precision medicine track. In: Proc. TREC (2017)

8. Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., Megaro, G.: Semi-supervised information retrieval system for clinical decision support. In: Proc. TREC (2016)

9. Hersh, W.: Information Retrieval: A Health and Biomedical Perspective. Health and Informatics Series, Springer-Verlag, New York, NY, USA (2009)

10. Oleynik, M., et alii: HPI-DHC at TREC 2018: PM Track. In: Proc. TREC (2018)

11. Roberts, K., et alii: Overview of PM Track. In: Proc. TREC (2017)

12. Roberts, K., et alii: Overview of PM Track. In: Proc. TREC (2018)

13. Roberts, K., et alii: Overview of PM Track. In: Proc. TREC (2019)

14. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3(4), 333-389 (2009)

15. Sondhi, P., et alii: Leveraging Medical Thesauri and Physician FB for Improving Medical Literature Retrieval for Case Queries. JAMIA 19(5), 851-858 (2012)

16. Zhu, D., Wu, S., Carterette, B., Liu, H.: Using Large Clinical Corpora for QE in Text-Based Cohort Identification. J. of Biomedical Informatics 49, 275-281 (2014)