=Paper=
{{Paper
|id=Vol-2518/paper-ODLS13
|storemode=property
|title=Automatic Translation of Clinical Trial Eligibility Criteria into Formal Queries
|pdfUrl=https://ceur-ws.org/Vol-2518/paper-ODLS13.pdf
|volume=Vol-2518
|authors=Chao Xu,Walter Forkel,Stefan Borgwardt,Franz Baader,Beihai Zhou
|dblpUrl=https://dblp.org/rec/conf/jowo/XuFBBZ19
}}
==Automatic Translation of Clinical Trial Eligibility Criteria into Formal Queries==
Chao XU a,1, Walter FORKEL b, Stefan BORGWARDT b, Franz BAADER b and Beihai ZHOU a

a Department of Philosophy, Peking University
b Institute for Theoretical Computer Science, TU Dresden

Abstract. Selecting patients for clinical trials is very labor-intensive. Our goal is to develop an automated system that can support doctors in this task. This paper describes a major step towards such a system: the automatic translation of clinical trial eligibility criteria from natural language into formal, logic-based queries. First, we develop a semantic annotation process that can capture many types of clinical trial criteria. Then, we map the annotated criteria to the formal query language. We have built a prototype system based on state-of-the-art NLP tools such as Word2Vec, Stanford NLP tools, and the MetaMap Tagger, and have evaluated the quality of the produced queries on a number of criteria from clinicaltrials.gov. Finally, we discuss some criteria that were hard to translate, and give suggestions for how to formulate eligibility criteria to make them easier to translate automatically.

Keywords. automatic translation, natural language translation, eligibility criteria, clinical trials, patient cohort recruitment, query answering

1. Introduction

Automating the screening process for clinical trials is a major research topic [1,2,3]. As the demand for (semi-)automated patient recruitment based on electronic health records (EHRs) becomes more and more urgent, the representation and formalization of eligibility criteria (ECs) of clinical trials has also attracted considerable attention. To the best of our knowledge, however, there are no methods which can translate arbitrary ECs into logical expressions automatically (see Section 2 for related work).

Baader et al. [4] have proposed a framework for (semi-)automatically selecting patients for clinical trials, based on ontology-based query answering techniques from the area of Description Logic. Our goal is to build a prototype system that can be evaluated in practice. The users of such a system would be medical researchers rather than logicians, hence the tool must be able to formalize eligibility criteria (ECs) of clinical trials automatically. Since the available information is limited to EHRs, not all ECs can be evaluated by such a system, but it can support doctors in pre-selecting patients for a later, more thorough screening procedure.

1 Corresponding Author: Chao Xu, Department of Philosophy, Peking University, Peking, China; E-mail: c.xu@pku.edu.cn.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

We present a prototype implementation that can automatically translate ECs into formal queries based on description logics. This can be seen as an instance of the larger field of translating natural language (NL) into a formal language with a precisely defined semantics. Rather than dealing with arbitrary NL expressions, we concentrate here on the restricted setting of ECs of clinical trials. These descriptions are specific to the medical domain, and there are many formal medical ontologies that can help us to recognize medical concepts. Additionally, by choosing a specific formal target language, we restrict the problem to recognizing the supported syntactical structures in NL.

Our formal query language, metric temporal conjunctive queries with negation (MTNCQs), is based on several recent research results [5,6,7]. Our translation is based on annotating ECs formulated in NL by certain semantic roles and additional information.
The semantic annotations we use focus on the kind of information that can be represented by our target query language, and hence can be seen as a filtering mechanism before the final translation to MTNCQs. Our prototype system uses existing NLP techniques such as Word2Vec, Stanford NLP tools,2 and MetaMap3 [8,9,10]. We evaluate our implementation on a random selection of criteria from clinicaltrials.gov,4 which contains more than 3,000,000 criteria from over 250,000 clinical studies. We identify which kinds of criteria are easy or hard to translate. From this, we develop some suggestions on how to formulate ECs so that processing them automatically becomes easier and more accurate. Our prototype implementation with instructions on how to reproduce our results can be found at https://github.com/wko/criteria-translation. An extended version of this paper can be found at https://tu-dresden.de/inf/lat/papers.

2. Related Work

Our work combines two strands of research, namely the representation and formalization of ECs and the automatic translation from NL to formal languages.

Weng et al. [1] surveyed various representation methods for ECs and proposed a framework of five dimensions to compare them. Different representation methods for ECs are adopted according to the application scenario. Bache et al. [2] proposed a general language for clinical trial investigation and construction (ECLECTIC) by analysing 123 criteria from 8 clinical trials. Based on our own investigation of ECs, we propose MTNCQs as the formal representation language, since it covers a wide range of criteria, profits from existing medical ontologies, and is based on a large body of research on (temporal) ontology-based query answering [5,6,7].

Previous work has already considered the translation of ECs. Tu et al. [3] proposed a practical translation method based on the ERGO annotation, which is an intermediate representation for ECs.
However, ERGO annotation can only be done manually or semi-automatically. Milian et al. [11,12] focused on breast-cancer trials, summarized 165 patterns, and used these patterns and concept recognition tools to structure criteria. After that, they generated a formal representation by projecting the concepts in the criteria onto predefined query templates. There is also some work on the extraction and representation of partial knowledge in ECs. Zhou et al. [13], Luo et al. [14] and Boland et al. [15] focused on the recognition and representation of temporal knowledge. Huang et al. [16] and Enger et al. [17] proposed several methods for detecting negated expressions.

2 https://nlp.stanford.edu/
3 https://metamap.nlm.nih.gov/
4 https://clinicaltrials.gov

Weng et al. [18], Luo et al. [14], Bhattacharya et al. [19], and Chondrogiannis et al. [20] classified clinical trials into a limited number of semantic classes using semantic pattern recognition or machine learning methods, which is helpful for identifying the most prominent kinds of information expressed in clinical trials.

In the field of NL processing, automatic translation from NL into a formal language, e.g., first-order logic formulas, is also known as automatic semantic parsing. Dong et al. [21] proposed an automatic semantic parsing method based on machine learning, different from traditional rule-based or template-based methods.

3. Preliminaries

Our approach is based on the paradigm of ontology-mediated query answering [22] in description logics, where an ontology (formalizing medical background knowledge) is used to answer a query (expressing clinical trial criteria) over a dataset (containing patient data from EHRs). We now describe the formal languages used for these ingredients.

3.1. Medical Information

We employ the existing large medical ontology SNOMED CT, which is expressed in the tractable description logic EL.
It consists of a large number of concept definitions of the form A ≡ C or A ⊑ C, where A is a concept name, e.g., the name of a disease or a surgical procedure, and its definition C is a complex concept, which can be built from concept names using the constructors C1 ⊓ C2 (conjunction of concepts) and ∃r.C (existential restriction over a role name r). The semantics of complex concepts and concept definitions can be given by a translation into first-order logic (for details, see [23]). For example, SNOMED CT contains the definition

Asthma ⊑ DisorderOfRespiratorySystem ⊓ ∃findingSite.AirwayStructure,

saying that asthma is a disorder of the respiratory system that occurs in airway structures.

To use the ontological knowledge, the patient data need to be formulated in terms of the concept and role names occurring in SNOMED CT. We will in the following assume that all patient data are given in the form of an ABox,5 which contains concept assertions A(a), where A is a concept name and a is an individual name, denoting a specific patient or the disease of a patient, and role assertions r(a, b), where r is a role name and a, b are individual names. For example, we can represent the simple fact that a patient (represented by some identifier p) has Asthma by the assertions diagnosedWith(p, d), Asthma(d). Note that diagnosedWith is not a role name from SNOMED CT, because this ontology was not intended to explicitly model patients; we introduce this new role name here to associate patients with their diagnoses. Similarly, we introduce takes to describe patients’ medication, and undergoes to describe medical procedures like surgeries that were performed on the patients. Some formats for EHRs already contain diagnoses and procedures in a structured way, e.g., in the form of SNOMED CT concepts or other formats that can be translated to a SNOMED CT representation. Apart from that, large parts of patient records are still made up of textual reports. To recognize SNOMED CT concepts in texts, one can use existing solutions such as the MetaMap tagger [10].

5 For now, we abstract from the also non-trivial task of translating patient data into this form (see, e.g., [24]).

3.2. A Formal Language for Eligibility Criteria

The ECs of clinical trials are separated into inclusion criteria, which must be satisfied by an eligible patient, and exclusion criteria, which must not be satisfied by the patient. We focus here on translating single criteria such as ‘History of lung disease other than asthma’6 and do not distinguish between inclusion and exclusion criteria. After translating a criterion, one can negate the output in case it was an exclusion criterion.

Our goal is to translate ECs into logical queries that can then be evaluated over the ontology (SNOMED CT) and the data (EHRs). Our precise query language, proposed in [4], is based on conjunctive queries, but incorporates negation [7], (metric) temporal operators in the spirit of [6,5], as well as concrete domains [25].

A conjunctive query with negation (NCQ) is a first-order formula φ(x) = ∃y.ψ(x, y), where ψ is a conjunction of (negated) concept atoms (¬)A(x) and (negated) role atoms (¬)r(x, y) over the variables x ∪ y. The variables x are called the answer variables. For example, we can use φ(x) = ∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y) to find all patients (x) that have any lung disease (y) except asthma.

A metric temporal conjunctive query with negation (MTNCQ) is a formula in which NCQs can be combined via the constructors ¬φ, φ1 ∧ φ2, φ1 ∨ φ2, ♦I φ, □I φ, and φ1 UI φ2, where I is an interval over the integers. In this setting, we assume that each assertion in our ABox also contains a time stamp i ∈ Z, which represents the time at which this fact was recorded. For example, diagnosedWith(p, d, i) says that the diagnosis took place at time i.
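To make the temporal semantics concrete, the following toy sketch evaluates a diamond query ♦I φ over time-stamped assertions in Python. This is our own closed-world illustration, not the paper's system: the fact layout, function names, and the omission of SNOMED CT ontology reasoning are all simplifying assumptions.

```python
# Toy closed-world evaluation of a diamond query over time-stamped assertions.
# Timestamps are months relative to "now" (0), as in the paper's setting.
from typing import Callable, Set, Tuple

# ABox: facts of the form (predicate, arguments, timestamp).
Fact = Tuple[str, Tuple[str, ...], int]

abox: Set[Fact] = {
    ("diagnosedWith", ("p1", "d1"), -4),
    ("Diabetes", ("d1",), -4),
    ("diagnosedWith", ("p2", "d2"), -20),
    ("Diabetes", ("d2",), -20),
}

def has_diabetes_at(abox: Set[Fact], t: int, x: str) -> bool:
    """NCQ ∃y.diagnosedWith(x, y) ∧ Diabetes(y), checked at time t (closed world)."""
    ys = {a[1] for p, a, i in abox if p == "diagnosedWith" and a[0] == x and i == t}
    return any(("Diabetes", (y,), t) in abox for y in ys)

def diamond(abox: Set[Fact], interval: Tuple[int, int], x: str,
            phi: Callable[[Set[Fact], int, str], bool]) -> bool:
    """♦_I φ: φ holds at some time point within the interval I."""
    lo, hi = interval
    return any(phi(abox, t, x) for t in range(lo, hi + 1))

# ♦[-6,0](...): p1 was diagnosed within the last six months, p2 was not.
print(diamond(abox, (-6, 0), "p1", has_diabetes_at))  # True
print(diamond(abox, (-6, 0), "p2", has_diabetes_at))  # False
```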
In our system, we assume that each time stamp represents a single month. The temporal formulas ♦I φ, □I φ, and φ UI ψ express that φ holds at some point during the time interval I, at all points in I, and at all points until ψ holds (which happens within I), respectively. For example, □[−6,0] ∃y.Patient(x) ∧ diagnosedWith(x, y) ∧ Diabetes(y) asks for patients x that have had diabetes for at least the past six months.

Finally, concrete domains allow MTNCQs to refer to measurements. For this, we include in the patient data assertions like hemoglobinOf(p, 15 g/dl, i) to record a specific value of hemoglobin measured for patient p at time i. In the query, we extend NCQs by atoms such as hemoglobinOf(x) < 14 g/dl, e.g., to describe patients with abnormal measurements. We have developed an appropriate semantics and algorithms to efficiently answer MTNCQs,7 and will extend this to deal with concrete domain atoms.

6 https://clinicaltrials.gov/ct2/show/NCT02548598
7 A paper on this is submitted to a conference.

4. Methodology

The main idea is to use semantic annotations to bridge the gap between eligibility criteria and formal queries. The working of our system can be broadly divided into two stages: annotating the eligibility criterion, and then constructing a formal query from the semantic annotations. The outline of the system is shown in Figure 1.

Figure 1. Outline of the translation system: the eligibility criterion is preprocessed; age and time expressions are recognized and removed; number, conjunction, and negation are recognized, and the MetaMap tagger is run to choose the best matched concepts; the resulting annotated criterion (age, person, pattern, and medical concept expressions) is transformed, and the formal expressions are combined into the formal query.

4.1. Semantic Annotations

Our annotations identify pieces of information that can be translated to MTNCQ constructors, such as temporal operators, negation, and medical concepts. The design of the annotations also incorporates knowledge about frequently occurring types of ECs, and takes into account whether it can be reasonably expected that the queried information can be found in EHRs. We use the MetaMap tagger to recognize medical concepts, and we use keyword matching to recognize other concepts. As a preprocessing step, we homogenize the NL criteria, e.g., replace ‘two’ with ‘2’ and replace ‘greater than’ with ‘>’.

4.1.1. The Selection of Semantic Roles

After looking at a number of ECs, we identified the following frequently requested types of information: age, gender, diagnoses, medications, procedures, measurements, and temporal context (e.g., ‘history of . . . ’). This analysis is consistent with the results of Weng et al. [18], Luo et al. [14], Bhattacharya et al. [19], and Chondrogiannis et al. [20], which all rank this kind of information high in their lists of prominent semantic classes.

Our formalization is based on SNOMED CT, which contains 19 top-level and more than 350 second-level categories. Out of these, we identified 8 categories that correspond to the above-listed information: clinical finding, observable entity, product, substance, procedure, unit, family medical history, person. For now, we discard other semantic classes from SNOMED CT, such as qualifier values (‘severe’, ‘known’, ‘isolated’) or devices.

Table 1. List of semantic roles and representations in the semantic annotation

Semantic role | Examples | Representation
Age | age 18–70 | [lower, upper]
Time | within 5 years | [start, end]
Comparison sign | greater than | >|≥|≤|<
Partial negation | other than | ∧¬
Main negation | no history of | ¬
Number | one, two, three, ... | Arabic numerals
Conjunction | and, or, defined by | ∧, ∨
From SNOMED CT (e.g., clinical finding) | lung disease | Concept name
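The annotations of Table 1 can be pictured as simple records. The following Python data model is our own illustration (the paper does not prescribe a concrete data structure); the example instances correspond to the annotations of the running EC 'history of lung disease other than asthma' shown in Figure 2.

```python
# Minimal data model for semantic annotations (illustrative field names).
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Annotation:
    span: Tuple[int, int]             # start/end character positions in the EC
    role: str                         # e.g., 'time', 'clinical finding', 'negation'
    representation: str               # e.g., '(-inf, 0]', 'Disorder of lung', '∧¬'
    concept_id: Optional[str] = None  # SNOMED CT concept ID, if applicable

# Annotations for 'history of lung disease other than asthma' (cf. Figure 2):
annotations = [
    Annotation((0, 11), "time", "(-inf, 0]"),
    Annotation((12, 24), "clinical finding", "Disorder of lung", "19829001"),
    Annotation((25, 35), "negation", "∧¬"),
    Annotation((36, 42), "clinical finding", "Asthma", "195967001"),
]

print([a.role for a in annotations])
# ['time', 'clinical finding', 'negation', 'clinical finding']
```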
This restriction helps to resolve some of the ambiguity of words or phrases. For example, in SNOMED CT ‘female’ can be mapped to ‘Female structure (body structure)’ or ‘Female (finding)’; and ‘scar’ can be identified as ‘Scar (disorder)’ or ‘Scar (morphologic abnormality)’. By excluding the types body structure and morphologic abnormality, we obtain a more uniform representation.

However, SNOMED CT only contains medical concepts, and we additionally consider the semantic roles age, time, number, comparison sign, negation, and conjunction. Table 1 contains an overview of all semantic roles with examples. In addition to the semantic role, we record additional information in the annotations, e.g., the precise concept from SNOMED CT or a time interval.

Our choice of semantic roles determines the vocabulary that we will use to formulate MTNCQs. More precisely, the concept names are restricted to the subconcepts of the 8 categories in SNOMED CT identified above. We use the role names diagnosedWith, takes, and undergoes to connect patients to SNOMED CT concepts, but none of the role names from SNOMED CT itself. Additionally, we allow concrete domain predicates like hemoglobinOf that correspond to SNOMED CT substances (e.g., Hemoglobin) and observable entities, as well as ageOf. Finally, temporal information, negation, and conjunction are expressed by the logical connectives of our query language.

4.1.2. Concept Recognition and Semantic Role Annotation

To illustrate the annotation process, we consider the EC ‘history of lung disease other than asthma’;8 see Table 2 and the end result in Figure 2. The first steps are to recognize and annotate age and temporal expressions using regular expressions. In our example, ‘history of’ is recognized by the regular expression ‘(a|any|prior|previous) (.*?)history of’, and then annotated by the semantic role time and the temporal interval (−∞, 0].
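This recognition-and-annotation step can be sketched as follows. The regular expression is adapted from the paper's pattern (the leading article is made optional here so that a bare 'history of ...' also matches); the helper name and the interval encoding are our own illustration.

```python
import re

# Adapted from the paper's '(a|any|prior|previous) (.*?)history of':
# the leading article group is made optional for this sketch.
HISTORY_OF = re.compile(r"((a|any|prior|previous) )?(.*?)history of\s*", re.IGNORECASE)

def annotate_time(criterion: str):
    """Annotate a temporal expression and remove it from the criterion."""
    m = HISTORY_OF.search(criterion)
    if m is None:
        return criterion, None
    remaining = (criterion[: m.start()] + criterion[m.end():]).strip()
    # 'history of' denotes the unbounded past interval (-inf, 0]
    return remaining, ("time", "(-inf, 0]")

text, ann = annotate_time("history of lung disease other than asthma")
print(text)  # lung disease other than asthma
print(ann)   # ('time', '(-inf, 0]')
```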
We then remove the identified age expressions and temporal expressions from the EC. They form complete semantic units, and thus removing them does not affect the meaning of the remaining part of the EC, while it allows us to avoid accidental translation of these expressions into SNOMED CT concepts.

On the remaining criterion, we then run the MetaMap tagger [10], a tool for recognizing concepts from the UMLS Metathesaurus, which subsumes SNOMED CT. Given a phrase or sentence, it returns the most likely phrase-concept matches. In our example, MetaMap does not identify any sub-phrases, and outputs the following concepts for the whole phrase ‘lung disease other than asthma’: ‘Disorder of lung (disorder)’, ‘Lung structure (body structure)’, ‘Asthma (disorder)’.

8 https://clinicaltrials.gov/ct2/show/NCT02548598

Table 2. Example of the semantic annotation of an EC.

Stage | Output
Original EC | history of lung disease other than asthma
Age recognition | —
Time recognition | history of → (time)
Remove age/time | lung disease other than asthma
MetaMap | lung disease other than asthma → Disorder of lung (disorder), Lung structure (body structure), Asthma (disorder)
Restrict semantic roles | lung disease other than asthma → Disorder of lung, Asthma
Detect sub-phrases | lung disease, lung, disease, other, than, asthma
Compute most similar concept for each sub-phrase | (lung disease, Disorder of lung): 0.91, (lung, Disorder of lung): 0.81, (disease, Disorder of lung): 0.89, (asthma, Asthma): 1.0
Find best matches | lung disease → Disorder of lung, asthma → Asthma
Negation recognition | other than → (negation)
Other semantic roles | —

Figure 2. The semantic annotation for our example.

Sub-phrase | history of | lung disease | other than | asthma
Start and end position | (0,11) | (12,24) | (25,35) | (36,42)
Semantic role | time | clinical finding | negation | clinical finding
Representation | (−∞, 0] | disorder of lung | ∧¬ | asthma
Concept ID | | 19829001 | | 195967001
By restricting the types as described in Section 4.1.1, we immediately rule out ‘Lung structure’. A larger challenge, however, is to obtain more exact phrase-concept matches. For this, we split all sub-phrases returned by MetaMap into more sub-phrases using the Stanford NLP tools [9]. Then we try to find the best phrase-concept matches by calculating a similarity value (in [0, 1]) of each sub-phrase to all candidate concepts using Word2Vec [8] and the Levenshtein distance; we also use the synonymous expressions provided by SNOMED CT to potentially obtain a higher similarity. To avoid spurious matches, we use a minimum threshold of 0.66 for the similarity. In our example, this excludes the words ‘other’ and ‘than’, because there is no candidate concept that is similar enough. The best matches for the phrases ‘lung disease’, ‘lung’, and ‘disease’ all refer to the same concept Disorder of lung, and we use the similarity values to choose the best of them, where we give preference to longer phrases.

Table 3. Translation of basic query parts.

Semantic role | Formalization | Example
Age | ageOf(x) ≥ lower ∧ ageOf(x) ≤ upper | ageOf(x) ≥ 18
Time | ♦[start,end] | ♦[−12,0]
Person | conceptName(x) | Woman(x)
Clinical finding | ∃y.diagnosedWith(x, y) ∧ conceptName(y) | ∃y.diagnosedWith(x, y) ∧ HIV(y)
Product | ∃y.takes(x, y) ∧ conceptName(y) | ∃y.takes(x, y) ∧ Aspirin(y)
Procedure | ∃y.undergoes(x, y) ∧ conceptName(y) | ∃y.undergoes(x, y) ∧ Appendectomy(y)
Measurement | pattern: substance/observable entity, comparison sign, number, unit; formula: conceptNameOf(x) (>|≥|≤|<) number | hemoglobinOf(x) < 14 g/dl
Group | pattern: clinical finding, clinical finding, ...; formula: conceptName1(y) ∨ conceptName2(y) ∨ ... | HIV(y) ∨ HepatitisC(y)
Negation | pattern: clinical finding, partial negation, clinical finding; formula: conceptName1(y) ∧ ¬conceptName2(y) | Diabetes(y) ∧ ¬DiabetesType1(y)

It remains to recognize other semantic roles in the EC, i.e., number, negation, comparison sign, and conjunction.
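The sub-phrase matching just described can be sketched in a few lines. The paper scores candidates with Word2Vec plus the Levenshtein distance and SNOMED CT synonyms; as a self-contained stand-in, this sketch uses only difflib's string-similarity ratio, and the synonym lists are illustrative. The 0.66 threshold and the preference for longer phrases are from the paper.

```python
# Best phrase-concept matching with a string-similarity stand-in.
from difflib import SequenceMatcher

def similarity(phrase: str, name: str) -> float:
    return SequenceMatcher(None, phrase.lower(), name.lower()).ratio()

def best_matches(sub_phrases, concepts, threshold=0.66):
    """Map each concept to its best-scoring sub-phrase above the threshold."""
    scored = []
    for p in sub_phrases:
        for concept, names in concepts.items():
            s = max(similarity(p, n) for n in names)  # best over synonyms
            if s >= threshold:
                scored.append((s, len(p), p, concept))
    result = {}
    # sort by score, then phrase length, so longer phrases win near-ties
    for s, length, p, concept in sorted(scored, reverse=True):
        result.setdefault(concept, p)
    return result

phrases = ["lung disease", "lung", "disease", "other", "than", "asthma"]
concepts = {
    "Disorder of lung": ["disorder of lung", "lung disease"],  # illustrative synonyms
    "Asthma": ["asthma"],
}
print(best_matches(phrases, concepts))
# {'Disorder of lung': 'lung disease', 'Asthma': 'asthma'}
```

Note that, as in the paper's example, 'other' and 'than' fall below the threshold and are discarded.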
We mainly do this by keyword or pattern matching. The negation case is the most complex due to its various forms:

• explicit negation, e.g., ‘not’, ‘except’, ‘other than’, ‘with the exception of’;
• morphological negation, e.g., ‘non-pregnant’, ‘non-healed’, ‘non-smoker’;
• implicit negation, e.g., ‘lack of’, ‘rule out’, ‘free from’.

In our prototype system, we focus on explicit negation, and consider two cases: either the whole sentence is negated (‘patient does not have . . . ’) or only part of it (‘. . . other than . . . ’).

For conjunctions between parts of sentences, we use ‘∨’ as the default annotation, because there is no good way to map ‘and’ and ‘or’ in ECs to conjunction or disjunction exactly; e.g., in the EC ‘. . . including cyclosporine, systemic itraconazole or ketoconazole, erythromycin or clarithromycin, nefazodone, verapamil and human immunodeficiency virus protease inhibitors’9 both ‘and’ and ‘or’ have the same meaning. The final semantic annotation for our example can be seen in Figure 2.

4.2. The Formal Queries

To obtain the final MTNCQ, we combine the different annotated phrases according to the composability of semantic roles and structural information. There are four kinds of basic subformulas: age formulas, person formulas, medical formulas, and pattern formulas; their translation is described in Table 3. For measurements, we detect patterns in the semantic annotation that correspond to a comparison of a substance or observable entity with a specific numerical value (including unit). Additionally, we group adjacent SNOMED CT findings together, to translate them into a set of atoms joined by ∨ inside the same ∃y.diagnosedWith(x, y) ∧ . . . formula. We also translate negation between clinical findings into appropriate formulas, and do the same for products and procedures. In our running example, ‘lung disease other than asthma’ is formalized as (∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y)).

Finally, we combine these subformulas using the remaining connectives and negations and consider any time expressions. In our prototype system, we only express a single temporal operator of the form ♦[−n,0], which we found to be the most common in clinical trials. Such an operator is always applied to the whole formula, e.g., we obtain ♦(−∞,0] (∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y)). If there is more than one temporal annotation, we choose the more specific one. For example, in ‘history of myocardial infarction, unstable angina pectoris, percutaneous coronary intervention, congestive heart failure, hypertensive encephalopathy, stroke or TIA within the last 6 months’10 there are ‘history of’ and ‘within the last 6 months’, and we choose the latter. If there are no explicit connectives, we combine medical and measurement formulas by disjunction, and then combine them with age and person formulas by conjunction.

9 https://clinicaltrials.gov/ct2/show/NCT02452502

Table 4. Experimental results. The second table shows the annotation of the translation quality for the 93 criteria that were marked as ‘answerable’ by all evaluators.

Evaluator | Unanswerable | Answerable
evaluator 1 | 282 | 119
evaluator 2 | 254 | 147
evaluator 3 | 237 | 164

Evaluator | Good | Partial | Wrong
evaluator 1 | 54 | 29 | 10
evaluator 2 | 56 | 27 | 10
evaluator 3 | 65 | 18 | 10

5. Experiments

To the best of our knowledge, there are no gold standard datasets for the translation of criteria into formal language. Therefore, we evaluated our approach on real-world studies taken from clinicaltrials.gov.11 During the design phase we used 24 randomly selected studies, which contained approximately 300 criteria. Our prototype system was optimized to cover as many of these criteria as possible. For testing, we randomly selected criteria across all studies on clinicaltrials.gov and manually evaluated them. Due to time constraints, we managed to process 401 criteria.
We defined the following metrics: A criterion is answerable if a) it is possible for a human to translate it into an MTNCQ using only the vocabulary chosen in Section 4.1.1; and b) it can in principle be answered by only looking at the EHR of a patient. Hence, criteria that refer to the future (‘during study phase’), or ask for subjective information (‘in the opinion of the investigator’, ‘willingness to’), are not considered answerable for the purposes of our system. For each answerable criterion, we then evaluated the quality of the translation. The resulting MTNCQ is labeled as good if it contains all (necessary) information; partial if it represents at least parts of the criterion; and wrong otherwise.

These metrics are clearly subjective to some extent. To get a more reliable evaluation and to quantify the amount of subjectivity, we let three evaluators (three of the authors) vote independently on the test data. The results can be seen in Table 4.

The results indicate that the judgment on whether a criterion is answerable or not differs between the evaluators. We found that the difference is mainly caused by two things: Firstly, it is sometimes difficult to judge whether a concept can be represented in SNOMED CT, because the concept name can differ significantly from the description in the text. Secondly, many criteria contain very specific phrases, for example ‘Active bowels inflammatory disease ([Crohn], chronic, diarrhea...)’.12 The word ‘active’ cannot be translated into SNOMED CT, and we could translate it into a temporal constraint only under some assumptions on the semantics of ‘active’. Some might consider this to be not so important, while for others this renders the criterion unanswerable.

Despite the differences, at least 60% of the criteria cannot be answered, even in the opinion of evaluator 3, who was the most optimistic.

10 https://clinicaltrials.gov/ct2/show/NCT00220220
11 https://clinicaltrials.gov/
This is partially because of condition b) above. The second reason is that quite a number of criteria cannot be represented in our formal language, either because of a lack of vocabulary in SNOMED CT, or because of missing semantic roles (see Section 4.1.1). While the former cannot be improved on, the latter offers room for future optimizations.

To compare the quality of the translations, we consider only criteria that have been marked as answerable by all evaluators. This leaves 93 criteria, which are analyzed in the second part of Table 4. The difference in the translation quality is again due to the varying opinions of the evaluators regarding how detailed a translation needs to be in order to be considered good. Our system is able to translate more than 50% of the (confidently) answerable criteria, which is a promising first result.

In the following, we give examples of a good, a partial, and a wrong translation by our system:

‘Has a history of diabetic ketoacidosis in the last 6 months.’13
♦[−6,0] ∃y.diagnosedWith(x, y) ∧ KetoacidosisInDiabetesMellitus(y)

‘History of, diagnosed or suspected genital or other malignancy (excluding treated squamous cell carcinoma of the skin), and untreated cervical dysplasia.’14
♦(−∞,0] ∃y.diagnosedWith(x, y) ∧ (MalignantNeoplasticDisease(y) ∨ DysplasiaOfCervix(y))

‘Primary tumors developed 5 years previous to the inclusion, except in situ cervix carcinoma or skin basocellular cancer properly treated’15
♦(−∞,0] ∃y.diagnosedWith(x, y) ∧ (CarcinomaInSituOfUterineCervix(y) ∨ SkinCancer(y))

The second translation is partially correct, because the temporal data and the main concepts have been recognized correctly, but ‘excluding . . . ’ was not translated. The last translation is wrong since neither the temporal information, the negation, nor the main concept ‘primary tumors’ have been recognized correctly. For more examples, we refer the reader to the appendix in the extended version.
6. Discussion and Ongoing Work

Formalizing ECs is a challenging task due to the gap between natural and formal language. In this paper, we have presented an automatic translation method from ECs into formal queries, and developed a prototype system based on existing NLP tools. We have evaluated our prototype on 401 eligibility criteria. More than 50% of the answerable criteria have been translated correctly, which is an encouraging result that can be improved on by optimizing the translation process as we describe below. However, there remain certain criteria that are hard to translate (even for humans) due to their complex structure.

12 https://clinicaltrials.gov/ct2/show/NCT02363725
13 https://clinicaltrials.gov/ct2/show/NCT02269735
14 https://clinicaltrials.gov/ct2/show/NCT01397097
15 https://clinicaltrials.gov/ct2/show/NCT01303029

While it is unreasonable to expect medical doctors to formulate clinical trial criteria directly as MTNCQs, we nevertheless identify a few key points that can be observed during the formulation of ECs to make the automatic translation easier:

1. Split criteria whenever possible, e.g., divide ‘diagnosed with diabetes and hypertension’ into ‘diagnosed with diabetes’ and ‘diagnosed with hypertension.’
2. Formulate every EC as an independent description that does not depend on other criteria or on background knowledge about the clinical trial, unlike ‘Known hypersensitivity to any of the study drugs or excipients.’16
3. Avoid using nonadjacent words to express a concept, e.g., ‘. . . dermatologic, neurologic, or psychiatric disease’17 should rather be formulated as ‘dermatologic disease, neurologic disease, or psychiatric disease.’

We can improve the quality of our translation by collecting more regular expressions and custom mappings, or by employing specialized techniques from the literature for the recognition of semantic roles like comparison sign or negation.
Other obvious steps are the inclusion of more concept categories from SNOMED CT, such as devices, qualifiers, and events. For example, the criterion ‘severe aortic stenosis’18 could be translated as ∃y, z.hasDiagnosis(x, y) ∧ AorticStenosis(y) ∧ severity(y, z) ∧ Severe(z) if we annotate ‘severe’ with the SNOMED CT concept severe (qualifier value) and detect the pattern qualifier value—finding. It is also straightforward to modify our system to output a ranked list of multiple candidate translations that the doctor may choose from.

Another interesting direction for future work is to develop a controlled natural language [26] based on our semantic annotations. Criteria formulated in this way can then easily be transformed into MTNCQs as we have described. With appropriate editing support, creating new ECs that conform with this controlled NL would be not much more difficult than writing them as free-form text. Of course, one should retain the possibility to add free-form criteria, which then have to be evaluated manually.

Acknowledgements

This work was supported by the DFG project BA 1122/19-1 (GOASQ), by DFG grant 389792660 as part of TRR 248, and by the China Scholarship Council.

References

[1] Chunhua Weng, Samson W Tu, Ida Sim, and Rachel Richesson. Formal representation of eligibility criteria: A literature review. J. Biomed. Inform., 43(3):451–467, 2010.

16 https://clinicaltrials.gov/ct2/show/NCT01935492
17 https://clinicaltrials.gov/ct2/show/NCT00960570
18 https://clinicaltrials.gov/ct2/show/NCT01951950

[2] Richard Bache, Adel Taweel, Simon Miles, and Brendan C Delaney. An eligibility criteria query language for heterogeneous data warehouses. Method. Inform. Med., 54(01):41–44, 2015.
[3] Samson W Tu, Mor Peleg, Simona Carini, Michael Bobak, Jessica Ross, Daniel Rubin, and Ida Sim. A practical method for transforming free-text eligibility criteria into computable criteria. J. Biomed. Inform., 44(2):239–250, 2011.
[4] Franz Baader, Stefan Borgwardt, and Walter Forkel. Patient selection for clinical trials using temporalized ontology-mediated query answering. In Proc. HQA Workshop, pages 1069–1074. ACM, 2018.
[5] Franz Baader, Stefan Borgwardt, and Marcel Lippmann. Temporal query entailment in the description logic SHQ. J. Web Sem., 33:71–93, 2015.
[6] Franz Baader, Stefan Borgwardt, Patrick Koopmann, Ana Ozaki, and Veronika Thost. Metric temporal description logics with interval-rigid names. In Proc. FroCoS Symposium, pages 60–76. Springer, 2017.
[7] Stefan Borgwardt and Walter Forkel. Closed-world semantics for conjunctive queries with negation over ELH⊥ ontologies. In Proc. JELIA Conference, pages 371–386. Springer, 2019.
[8] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[9] Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Proc. ACL Meeting, pages 55–60, 2014.
[10] Alan R Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. In Proc. AMIA Symposium, page 17, 2001.
[11] Krystyna Milian, Anca Bucur, and Annette ten Teije. Formalization of clinical trial eligibility criteria: Evaluation of a pattern-based approach. In Proc. BIBM Conference, pages 1–4. IEEE, 2012.
[12] Krystyna Milian and Annette ten Teije. Towards automatic patient eligibility assessment: From free-text criteria to queries. In Proc. AIME Conference, pages 78–83. Springer, 2013.
[13] Li Zhou, Genevieve B Melton, Simon Parsons, and George Hripcsak. A temporal constraint structure for extracting temporal information from clinical narrative. J. Biomed. Inform., 39(4):424–439, 2006.
[14] Zhihui Luo, Meliha Yetisgen-Yildiz, and Chunhua Weng. Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J. Biomed. Inform., 44(6):927–935, 2011.
[15] Mary Regina Boland, Samson W Tu, Simona Carini, Ida Sim, and Chunhua Weng. EliXR-TIME: A temporal knowledge representation for clinical research eligibility criteria. AMIA Transl. Sci. Proc., 2012:71, 2012.
[16] Yang Huang and Henry J Lowe. A novel hybrid approach to automated negation detection in clinical radiology reports. J. Am. Med. Inform. Assn., 14(3):304–311, 2007.
[17] Martine Enger, Erik Velldal, and Lilja Øvrelid. An open-source tool for negation detection: A maximum-margin approach. In Proc. SemBEaR Workshop, pages 64–69, 2017.
[18] Chunhua Weng, Xiaoying Wu, Zhihui Luo, Mary Regina Boland, Dimitri Theodoratos, and Stephen B Johnson. EliXR: An approach to eligibility criteria extraction and representation. J. Am. Med. Inform. Assn., 18(Supplement 1):i116–i124, 2011.
[19] Sanmitra Bhattacharya and Michael N Cantor. Analysis of eligibility criteria representation in industry-standard clinical trial protocols. J. Biomed. Inform., 46(5):805–813, 2013.
[20] Efthymios Chondrogiannis, Vassiliki Andronikou, Anastasios Tagaris, Efstathios Karanastasis, Theodora Varvarigou, and Masatsugu Tsuji. A novel semantic representation for eligibility criteria in clinical trials. J. Biomed. Inform., 69:10–23, 2017.
[21] Li Dong and Mirella Lapata. Language to logical form with neural attention. In Proc. Annual Meeting of the ACL, 2016.
[22] Meghyn Bienvenu. Ontology-mediated query answering: Harnessing knowledge to get more from data. In Proc. IJCAI Conference, pages 4058–4061. AAAI Press, 2016.
[23] Franz Baader, Ian Horrocks, Carsten Lutz, and Uli Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.
[24] Jon Patrick, Yefeng Wang, and Peter Budd. An automated system for conversion of clinical notes into SNOMED Clinical Terminology. In Proc. ACSW Symposium, pages 219–226, 2007.
[25] Franz Baader and Philipp Hanschke. A scheme for integrating concrete domains into concept languages. In John Mylopoulos and Raymond Reiter, editors, Proc. IJCAI Conference, pages 452–457, 1991.
[26] Tobias Kuhn. A survey and classification of controlled natural languages. Comput. Linguist., 40(1):121–170, 2014.