   Automatic Translation of Clinical Trial
   Eligibility Criteria into Formal Queries
    Chao XU a,1 , Walter FORKEL b , Stefan BORGWARDT b , Franz BAADER b and
                                     Beihai ZHOU a
                     a Department of Philosophy, Peking University
              b Institute for Theoretical Computer Science, TU Dresden



            Abstract. Selecting patients for clinical trials is very labor-intensive. Our goal is
            to develop an automated system that can support doctors in this task. This paper de-
            scribes a major step towards such a system: the automatic translation of clinical trial
            eligibility criteria from natural language into formal, logic-based queries. First, we
            develop a semantic annotation process that can capture many types of clinical trial
            criteria. Then, we map the annotated criteria to the formal query language. We have
            built a prototype system based on state-of-the-art NLP tools such as Word2Vec,
            Stanford NLP tools, and the MetaMap Tagger, and have evaluated the quality of the
            produced queries on a number of criteria from clinicaltrials.gov. Finally, we discuss
            some criteria that were hard to translate, and give suggestions for how to formulate
            eligibility criteria to make them easier to translate automatically.
            Keywords. automatic translation, natural language translation, eligibility criteria,
            clinical trials, patient cohort recruitment, query answering




1. Introduction

Automating the screening process for clinical trials is a major research topic [1,2,3]. As
the demand for (semi-)automated patient recruitment based on electronic health records
(EHRs) becomes more and more urgent, the representation and formalization of eligibility
criteria (ECs) of clinical trials have also attracted considerable attention. To the best
of our knowledge, however, there are no methods which can translate arbitrary ECs into
logical expressions automatically (see Section 2 for related work).
     Baader et al. [4] have proposed a framework for (semi-)automatically selecting pa-
tients for clinical trials, based on ontology-based query answering techniques from the
area of Description Logic. Our goal is to build a prototype system that can be evaluated
in practice. The users of such a system would be medical researchers rather than logi-
cians, hence the tool must be able to formalize eligibility criteria (ECs) of clinical trials
automatically. Since the available information is limited to EHRs, not all ECs can be
evaluated by such a system, but it can support doctors in pre-selecting patients for a later,
more thorough screening procedure.
   1 Corresponding Author: Chao Xu, Department of Philosophy, Peking University, Peking, China; E-mail:
c.xu@pku.edu.cn. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
     We present a prototype implementation that can automatically translate ECs into
formal queries based on description logics. This can be seen as an instance of the larger
field of translating natural language (NL) into a formal language with a precisely defined
semantics. Rather than dealing with arbitrary NL expressions, we concentrate here on the
restricted setting of ECs of clinical trials. These descriptions are specific to the medical
domain, and there are many formal medical ontologies that can help us to recognize
medical concepts. Additionally, by choosing a specific formal target language, we restrict
the problem to recognizing the supported syntactical structures in NL.
     Our formal query language, metric temporal conjunctive queries with negation (MT-
NCQs), is based on several recent research results [5,6,7]. Our translation is based on an-
notating ECs formulated in NL by certain semantic roles and additional information. The
semantic annotations we use focus on the kind of information that can be represented
by our target query language, and hence can be seen as a filtering mechanism before the
final translation to MTNCQs. Our prototype system uses existing NL techniques such as
Word2Vec, Stanford NLP tools,2 and MetaMap3 [8,9,10]. We evaluate our implementa-
tion on a random selection of criteria from clinicaltrials.gov,4 which contains more than
3,000,000 criteria from over 250,000 clinical studies. We identify which kinds of cri-
teria are easy or hard to translate. From this, we develop some suggestions on how to
formulate ECs so that processing them automatically becomes easier and more accurate.
     Our prototype implementation with instructions on how to reproduce our results
can be found at https://github.com/wko/criteria-translation. An extended
version of this paper can be found at https://tu-dresden.de/inf/lat/papers.


2. Related Work

Our work combines two strands of research, namely representation and formalization of
ECs and automatic translation from NL to formal languages.
     Weng et al. [1] surveyed various representation methods of ECs and proposed a
framework of five dimensions to compare them. According to different application sce-
narios, different representation methods for ECs are adopted. Bache et al. [2] proposed
a general language for clinical trial investigation and construction (ECLECTIC) by
analysing 123 criteria from 8 clinical trials. Based on our own investigation of ECs, we
propose MTNCQs as formal representation language since it covers a wide range of cri-
teria, profits from existing medical ontologies and is based on a large body of research
on (temporal) ontology-based query answering [5,6,7].
     Previous work has already considered translation of ECs. Tu et al. [3] proposed a
practical translation method based on the ERGO annotation, which is an intermediate
representation for ECs. However, ERGO annotation can only be done manually or semi-
automatically. Milian et al. [11,12] focused on breast-cancer trials and summarized 165
patterns, and used these patterns and concept recognition tools to structure criteria. After
that, they generated a formal representation by projecting the concepts in criteria to the
predefined query template. There is also some work about extraction and representation
of partial knowledge in ECs. Zhou et al. [13], Luo et al. [14] and Boland et al. [15]
  2 https://nlp.stanford.edu/
  3 https://metamap.nlm.nih.gov/
  4 https://clinicaltrials.gov
focused on the recognition and representation of temporal knowledge. Huang et al. [16]
and Enger et al. [17] proposed several methods for detecting negated expressions.
     Weng et al. [18], Luo et al. [14], Bhattacharya et al. [19], and Chondrogiannis et
al. [20] classified the clinical trials into limited semantic classes by using semantic pat-
tern recognition or machine learning methods, which is helpful for figuring out the most
prominent kinds of information expressed in clinical trials.
     In the field of NL processing, automatic translation from NL into formal language,
e.g., first-order logic formulas, is also known as automatic semantic parsing. Dong et
al. [21] proposed an automatic semantic parsing method based on machine learning,
different from traditional rule-based or template-based methods.


3. Preliminaries

Our approach is based on the paradigm of ontology-mediated query answering [22] in
description logics, where an ontology (formalizing medical background knowledge) is
used to answer a query (expressing clinical trial criteria) over a dataset (containing pa-
tient data from EHRs). We now describe the formal languages used for these ingredients.

3.1. Medical Information

We employ the existing large medical ontology SNOMED CT, which is expressed in the
tractable description logic EL. It consists of a large number of concept definitions of
the form A ≡ C or A ⊑ C, where A is a concept name, e.g., the name of a disease or a
surgical procedure, and its definition C is a complex concept, which can be built from
concept names using the constructors C1 ⊓ C2 (conjunction of concepts) and ∃r.C (exis-
tential restriction over a role name r). The semantics of complex concepts and concept
definitions can be given by a translation into first-order logic (for details, see [23]). For
example, SNOMED CT contains the definition

        Asthma ⊑ DisorderOfRespiratorySystem ⊓ ∃findingSite.AirwayStructure,

saying that asthma is a disorder of the respiratory system that occurs in airway structures.
     To use the ontological knowledge, the patient data need to be formulated in terms of
the concept and role names occurring in SNOMED CT. We will in the following assume
that all patient data are given in the form of an ABox,5 which contains concept asser-
tions A(a), where A is a concept name and a is an individual name, denoting a specific
patient or the disease of a patient, and role assertions r(a, b), where r is a role name and
a, b are individual names. For example, we can represent the simple fact that a patient
(represented by some identifier p) has Asthma by the assertions diagnosedWith(p, d),
Asthma(d). Note that diagnosedWith is not a role name from SNOMED CT, because
this ontology was not intended to explicitly model patients; we introduce this new role
name here to associate patients with their diagnoses. Similarly, we introduce takes to de-
scribe patients’ medication, and undergoes to describe medical procedures like surgeries
that were performed on the patients. Some formats for EHRs already contain diagnoses
and procedures in a structured way, e.g., in the form of SNOMED CT concepts or other
  5 For now, we abstract from the also non-trivial task of translating patient data into this form (see, e.g., [24]).
formats that can be translated to a SNOMED CT representation. Apart from that, large
parts of patient records are still made up of textual reports. To recognize SNOMED CT
concepts in texts, one can use existing solutions such as the MetaMap tagger [10].
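     To make this representation concrete, the following small Python sketch shows one possible
in-memory encoding of such ABox assertions; the class names are ours and merely mirror the
assertions used above, not the data model of any particular EHR system or of our prototype.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ConceptAssertion:   # A(a), e.g., Asthma(d)
        concept: str
        individual: str

    @dataclass(frozen=True)
    class RoleAssertion:      # r(a, b), e.g., diagnosedWith(p, d)
        role: str
        subject: str
        object_: str

    # "Patient p is diagnosed with asthma":
    abox = [
        RoleAssertion("diagnosedWith", "p", "d"),
        ConceptAssertion("Asthma", "d"),
    ]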

3.2. A Formal Language for Eligibility Criteria

The ECs of clinical trials are separated into inclusion criteria, which must be satisfied
by an eligible patient, and exclusion criteria, which must not be satisfied by the patient.
We focus here on translating single criteria such as ‘History of lung disease other than
asthma’6 and do not distinguish between inclusion and exclusion criteria. After translat-
ing a criterion, one can negate the output in case it was an exclusion criterion.
     Our goal is to translate ECs into logical queries that can then be evaluated over the
ontology (SNOMED CT) and the data (EHRs). Our precise query language, proposed
in [4], is based on conjunctive queries, but incorporates negation [7], (metric) temporal
operators in the spirit of [6,5] as well as concrete domains [25].
     A conjunctive query with negation (NCQ) is a first-order formula φ(x) = ∃y.ψ(x, y),
where ψ is a conjunction of (negated) concept atoms (¬)A(x) and (negated) role atoms
(¬)r(x, y) over the variables x ∪ y. The variables x are called the answer variables. For ex-
ample, we can use φ (x) = ∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y)
to find all patients (x) that have any lung disease (y) except asthma.
     A metric temporal conjunctive query with negation (MTNCQ) is a formula in which
NCQs can be combined via the constructors ¬φ, φ1 ∧ φ2, φ1 ∨ φ2, ♦I φ, □I φ, and φ1 UI φ2,
where I is an interval over the integers. In this setting, we assume that each assertion in
our ABox also contains a time stamp i ∈ Z, which represents the time at which this fact
was recorded. For example, diagnosedWith(p, d, i) says that the diagnosis took place at
time i. In our system, we assume that each time stamp represents a single month.
     The temporal formulas ♦I φ, □I φ, and φ UI ψ express that φ holds at some point
during the time interval I, at all points in I, and at all points until ψ holds (which hap-
pens within I), respectively. For example, □[−6,0] ∃y.Patient(x) ∧ diagnosedWith(x, y) ∧
Diabetes(y) asks for patients x that have had diabetes for at least the past six months.
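     For readers unfamiliar with these operators, the following is a minimal sketch of a pointwise
semantics for ♦ and □ in the spirit of [5,6], assuming that intervals are read relative to the
evaluation time point i; it is only an illustration, not necessarily the exact semantics used by
our system (which also covers UI and concrete domains):

     𝔍, i ⊨ ♦I φ   iff   there is some j ∈ ℤ with j − i ∈ I and 𝔍, j ⊨ φ,
     𝔍, i ⊨ □I φ   iff   𝔍, j ⊨ φ for all j ∈ ℤ with j − i ∈ I.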
     Finally, concrete domains allow MTNCQs to refer to measurements. For this, we
include in the patient data assertions like hemoglobinOf(p, 15 g/dl, i) to record a specific
value of hemoglobin measured for patient p at time i. In the query, we extend NCQs
by atoms such as hemoglobinOf(x) < 14 g/dl, e.g., to describe patients with abnormal
measurements. We have developed an appropriate semantics and algorithms to efficiently
answer MTNCQs,7 and will extend this to deal with concrete domain atoms.


4. Methodology

The main idea is to use semantic annotations to bridge the gap between eligibility cri-
teria and formal queries. Our system operates in two broad stages: it first annotates the
eligibility criterion, and then constructs a formal query from the
semantic annotations. The outline of the system is shown in Figure 1.

  6 https://clinicaltrials.gov/ct2/show/NCT02548598
  7 A paper on this is submitted to a conference.
      [Figure 1 flowchart: Eligibility Criterion → Preprocessing → Age and Time Recognition →
      expression after removing age and time expressions → MetaMap Tagger / Choose the Best Matched
      Concepts, in parallel with Number and Conjunction Recognition / Negation Recognition →
      Annotated Criterion → Age / Person / Pattern / Medical Concept Expression Transformation →
      Combination of Formal Expressions → Formal Query]
                                Figure 1. Outline of the translation system

4.1. Semantic Annotations

Our annotations identify pieces of information that can be translated to MTNCQ con-
structors, such as temporal operators, negation, and medical concepts. The design of the
annotations also incorporates knowledge about frequently occurring types of ECs, and
takes into account whether it can be reasonably expected that the queried information
can be found in EHRs. We use the MetaMap tagger to recognize medical concepts, and
we use keyword matching to recognize other concepts. As a preprocessing step, we ho-
mogenize the NL criteria, e.g., replace ‘two’ with ‘2’ and replace ‘greater than’ with ‘>’.
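     As an illustration, the following Python sketch shows the kind of keyword-based homogenization
performed during preprocessing; the rules shown are examples only, not our complete rule set.

    import re

    # Illustrative normalization rules (the actual rule set is larger).
    NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}
    COMPARISON_PHRASES = {"greater than": ">", "less than": "<",
                          "at least": ">=", "at most": "<="}

    def preprocess(criterion: str) -> str:
        text = criterion.lower()
        for word, digit in NUMBER_WORDS.items():
            text = re.sub(rf"\b{word}\b", digit, text)
        for phrase, sign in COMPARISON_PHRASES.items():
            text = text.replace(phrase, sign)
        return text

    # preprocess("Hemoglobin greater than two g/dl")  ->  "hemoglobin > 2 g/dl"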

4.1.1. The Selection of Semantic Roles
After looking at a number of ECs, we identified the following frequently requested
types of information: age, gender, diagnoses, medications, procedures, measurements,
and temporal context (e.g., ‘history of . . . ’). This analysis is consistent with the results of
Weng et al. [18], Luo et al. [14], Bhattacharya et al. [19], and Chondrogiannis et al. [20],
which all rank this kind of information high in their lists of prominent semantic classes.
    Our formalization is based on SNOMED CT, which contains 19 top-level and more
than 350 second-level categories. Out of these, we identified 8 categories that corre-
spond to the above-listed information: clinical finding, observable entity, product, sub-
stance, procedure, unit, family medical history, person. For now, we discard other seman-
             Table 1. List of semantic roles and representations in the semantic annotation

                        Semantic role                      Examples           Representation

          Age                                          age 18–70              [lower, upper]
          Time                                         within 5 years         [start, end]
          Comparison sign                              greater than           >|≥|≤|<
          Partial negation                             other than             ∧¬
          Main negation                                no history of          ¬
          Number                                       one, two, three, ...   Arabic numerals
          Conjunction                                  and, or, defined by    ∧, ∨
          From SNOMED CT (e.g., clinical finding)      lung disease           Concept name


tic classes from SNOMED CT, such as qualifier values (‘severe’, ‘known’, ‘isolated’)
or devices. This restriction helps to resolve some of the ambiguity of words or phrases.
For example, in SNOMED CT ‘female’ can be mapped to ‘Female structure (body struc-
ture)’ or ‘Female (finding)’; and ‘scar’ can be identified as ‘Scar (disorder)’ or ‘Scar
(morphologic abnormality)’. By excluding the types body structure and morphologic
abnormality, we obtain a more uniform representation.
     However, SNOMED CT only contains medical concepts, and we additionally con-
sider the semantic roles age, time, number, comparison sign, negation, and conjunction.
Table 1 contains an overview of all semantic roles with examples. In addition to the se-
mantic role, we record additional information in the annotations, e.g., the precise concept
from SNOMED CT or a time interval.
     Our choice of semantic roles determines the vocabulary that we will use to formu-
late MTNCQs. More precisely, the concept names are restricted to the subconcepts of
the 8 categories in SNOMED CT identified above. We use the role names diagnosed-
With, takes, and undergoes to connect patients to SNOMED CT concepts, but none of
the role names from SNOMED CT itself. Additionally, we allow concrete domain predi-
cates like hemoglobinOf that correspond to SNOMED CT substances (e.g., Hemoglobin)
and observable entities, as well as ageOf. Finally, temporal information, negation, and
conjunction are expressed by the logical connectives of our query language.

4.1.2. Concept Recognition and Semantic Role Annotation
To illustrate the annotation process, we consider the EC ‘history of lung disease other
than asthma’;8 the annotation steps are shown in Table 2 and the end result in Figure 2.
     The first steps are to recognize and annotate age and temporal expressions using
regular expressions. In our example, ‘history of’ is recognized by the regular expression
‘(a|any|prior|previous) (.*?)history of’, and then annotated by the semantic role time and
the temporal interval (−∞, 0]. We then remove the identified age expressions and tem-
poral expressions from the EC. They form complete semantic units, and thus removing
them does not affect the meaning of the remaining part of the EC, while it allows us to
avoid accidental translation of these expressions into SNOMED CT concepts.
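     The following Python fragment sketches this step for the time role. The regular expression is
slightly adapted from the one above (the leading word is made optional so that a bare ‘history of’,
as in the running example, also matches), and the function names are illustrative only.

    import re

    TIME_RE = re.compile(r"((a|any|prior|previous)\s+)?(.*?)history of", re.IGNORECASE)

    def annotate_time(criterion: str):
        """Return a (span, semantic role, interval) annotation, or None."""
        m = TIME_RE.search(criterion)
        if m is None:
            return None
        # 'history of' is interpreted as the unbounded past interval (-inf, 0].
        return (m.span(), "time", (float("-inf"), 0))

    def remove_span(criterion: str, span) -> str:
        start, end = span
        return (criterion[:start] + criterion[end:]).strip()

    # annotate_time("history of lung disease other than asthma")
    #   -> ((0, 10), "time", (-inf, 0))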
     On the remaining criterion, we then run the MetaMap tagger [10], a tool for recog-
nizing concepts from the UMLS Metathesaurus, which subsumes SNOMED CT. Given
  8 https://clinicaltrials.gov/ct2/show/NCT02548598
                           Table 2. Example of the semantic annotation of an EC.

            Stage                                                    Output

 Original EC                       history of lung disease other than asthma
 Age recognition                   —
 Time recognition                  history of → (time)
 Remove age/time                   lung disease other than asthma
 MetaMap                           lung disease other than asthma → Disorder of lung (disorder), Lung
                                   structure (body structure), Asthma (disorder)
 Restrict semantic roles           lung disease other than asthma → Disorder of lung, Asthma
 Detect sub-phrases                lung disease, lung, disease, other, than, asthma
 Compute most similar              (lung disease, Disorder of lung) : 0.91, (lung, Disorder of lung) : 0.81,
 concept for each sub-phrase       (disease, Disorder of lung) : 0.89, (asthma, Asthma) : 1.0
 Find best matches                 lung disease → Disorder of lung, asthma → Asthma
 Negation recognition              other than → (negation)
 Other semantic roles              —




         Sub-phrase           history of   lung disease       other than   asthma
         Start/end position   (0, 11)      (12, 24)           (25, 35)     (36, 42)
         Semantic role        time         clinical finding   negation     clinical finding
         Representation       (−∞, 0]      disorder of lung   ∧¬           asthma
         Concept ID           —            19829001           —            195967001
                            Figure 2. The semantic annotation for our example.


a phrase or sentence, it returns the most likely phrase-concept matches. In our exam-
ple, MetaMap does not identify any sub-phrases, and outputs the following concepts for
the whole phrase ‘lung disease other than asthma’: ‘Disorder of lung (disorder)’, ‘Lung
structure (body structure)’, ‘Asthma (disorder)’. By restricting the types as described in
Section 4.1.1, we immediately rule out ‘Lung structure’.
     A larger challenge, however, is to obtain more exact phrase-concept matches. For
this, we split all sub-phrases returned by MetaMap into more sub-phrases using the
Stanford NLP tools [9]. Then we try to find the best phrase-concept matches, by cal-
culating a similarity value (in [0, 1]) of each sub-phrase to all candidate concepts using
Word2Vec [8] and the Levenshtein distance; we also use the synonymous expressions
provided by SNOMED CT to potentially obtain a higher similarity. To avoid spurious
matches, we use a minimum threshold of 0.66 for the similarity. In our example, this ex-
cludes the words ‘other’ and ‘than’, because there is no candidate concept that is similar
enough. The best matches for the phrases ‘lung disease’, ‘lung’, and ‘disease’ all refer to
the same concept Disorder of lung, and we use the similarity values to choose the best of
them, where we give preference to longer phrases.
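     The following Python sketch illustrates this matching step; the string similarity shown (difflib)
is only a stand-in for the combination of Word2Vec similarity and Levenshtein distance used by the
prototype, and the function and variable names are ours.

    from difflib import SequenceMatcher

    SIMILARITY_THRESHOLD = 0.66  # minimum similarity, as used in the prototype

    def string_similarity(a: str, b: str) -> float:
        # Stand-in for the Word2Vec/Levenshtein similarity of the prototype.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def best_concept_match(sub_phrase, candidate_concepts, synonyms):
        """Return (concept, similarity) for the best-matching candidate, or None.

        candidate_concepts: SNOMED CT concept names suggested by MetaMap;
        synonyms: dict mapping a concept name to its SNOMED CT synonyms.
        """
        best = None
        for concept in candidate_concepts:
            names = [concept] + synonyms.get(concept, [])
            score = max(string_similarity(sub_phrase, n) for n in names)
            if score >= SIMILARITY_THRESHOLD and (best is None or score > best[1]):
                best = (concept, score)
        return best

    # best_concept_match("lung disease", ["Disorder of lung", "Asthma"],
    #                    {"Disorder of lung": ["Lung disease"]})
    #   -> ("Disorder of lung", 1.0)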
                                 Table 3. Translation of basic query parts.

  Semantic role                   Formalization                                   Example

 Age              ageOf(x) ≥ lower ∧ ageOf(x) ≤ upper                                     ageOf(x) ≥ 18
 Time             ♦[start,end]                                                                    ♦[−12,0]
 Person           conceptName(x)                                                              Woman(x)
 Clinical finding ∃y.diagnosedWith(x, y) ∧ conceptName(y)                ∃y.diagnosedWith(x, y) ∧ HIV(y)
 Product          ∃y.takes(x, y) ∧ conceptName(y)                              ∃y.takes(x, y) ∧ Aspirin(y)
 Procedure        ∃y.undergoes(x, y) ∧ conceptName(y)            ∃y.undergoes(x, y) ∧ Appendectomy(y)

 Measurement pattern: substance/observable entity—comparison sign—number—unit
 Formula: conceptNameOf(x) (> | ≥ | ≤ | <) number conceptName                 hemoglobinOf(x) < 14 g/dl

 Group pattern: clinical finding—clinical finding—. . .
 Formula: conceptName1(y) ∨ conceptName2(y) ∨ . . .                               HIV(y) ∨ HepatitisC(y)

 Negation pattern: clinical finding—partial negation—clinical finding
 Formula: conceptName1(y) ∧ ¬conceptName2(y)                            Diabetes(y) ∧ ¬DiabetesType1(y)


    It remains to recognize other semantic roles in the EC, i.e., number, negation, com-
parison sign, and conjunction. We mainly do this by keyword or pattern matching. The
negation case is the most complex due to its various forms:
    • explicit negation, e.g., ‘not’, ‘except’, ‘other than’, ‘with the exception of’;
    • morphological negation, e.g., ‘non-pregnant’, ‘non-healed’, ‘non-smoker’;
    • implicit negation, e.g., ‘lack of’, ‘rule out’, ‘free from’.
In our prototype system, we focus on explicit negation, and consider two cases: either the
whole sentence is negated (‘patient does not have . . . ’) or only part of it (‘. . . other than
. . . ’). For conjunctions between parts of sentences, we use ‘∨’ as default annotation, be-
cause there is no good way to map ‘and’ and ‘or’ in EC to conjunction or disjunction ex-
actly, e.g., in the EC ‘. . . including cyclosporine, systemic itraconazole or ketoconazole,
erythromycin or clarithromycin, nefazodone, verapamil and human immunodeficiency
virus protease inhibitors’9 both ‘and’ and ‘or’ have the same meaning.
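     As an illustration, a keyword-based detector along the following lines suffices for the
explicit-negation cases we handle; the keyword lists are illustrative, not the exact lists used by
our prototype.

    # Illustrative keyword lists (the lists in the prototype are longer).
    MAIN_NEGATION = ["no history of", "does not have", "without"]
    PARTIAL_NEGATION = ["other than", "except", "excluding", "with the exception of"]

    def annotate_negation(criterion: str):
        """Return a list of (span, semantic role) annotations for negation keywords."""
        annotations = []
        lowered = criterion.lower()
        for keyword in PARTIAL_NEGATION:
            idx = lowered.find(keyword)
            if idx != -1:
                annotations.append(((idx, idx + len(keyword)), "partial negation"))
        for keyword in MAIN_NEGATION:
            if lowered.startswith(keyword):
                annotations.append(((0, len(keyword)), "main negation"))
        return annotations

    # annotate_negation("lung disease other than asthma")
    #   -> [((13, 23), "partial negation")]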
        The final semantic annotation for our example can be seen in Figure 2.

4.2. The Formal Queries

To obtain the final MTNCQ, we combine the different annotated phrases according to
the composability of semantic roles and structural information. There are four kinds of
basic subformulas: age formulas, person formulas, medical formulas and pattern for-
mulas, and their translation is described in Table 3. For measurements, we detect pat-
terns in the semantic annotation that correspond to a comparison of a substance or ob-
servable entity with a specific numerical value (including unit). Additionally, we group
adjacent SNOMED CT findings together, to translate them into a set of atoms joined
by ∨ inside the same ∃y.diagnosedWith(x, y) ∧ . . . formula. We also translate negation
between clinical findings into appropriate formulas, and do the same for products and
  9 https://clinicaltrials.gov/ct2/show/NCT02452502
Table 4. Experimental results. The right table shows the annotation of the translation quality for the 93 criteria
that were marked as ‘answerable’ by all evaluators.

                    Unanswerable       Answerable                               Good     Partial     Wrong

    evaluator 1           282               119                 evaluator 1      54         29          10
    evaluator 2           254               147                 evaluator 2      56         27          10
    evaluator 3           237               164                 evaluator 3      65         18          10


procedures. In our running example, ‘lung disease other than asthma’ is formalized as
(∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y)).
     Finally, we combine these subformulas using the remaining connectives and nega-
tions and consider any time expressions. In our prototype system, we only express a sin-
gle temporal operator of the form ♦[−n,0] , which we found to be the most common in
clinical trials. Such an operator is always applied to the whole formula, e.g., we obtain
♦(−∞,0] (∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y)). If there is more
than one temporal annotation, we choose the more specific one. For example, in ‘history
of myocardial infarction, unstable angina pectoris, percutaneous coronary intervention,
congestive heart failure, hypertensive encephalopathy, stroke or TIA within the last 6
months’10 there are ‘history of’ and ‘within the last 6 months’, and we choose the latter.
     If there are no explicit connectives, we combine medical and measurement formulas
by disjunction, and then combine them with age and person formulas by conjunction.
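     The following schematic Python sketch assembles the annotations of the running example into an
MTNCQ string in this way; the string-based formula representation and the helper names are
simplifications for illustration, not our actual implementation.

    def clinical_finding_formula(concepts, partial_negation=None):
        """Build ∃y.diagnosedWith(x, y) ∧ ... for a group of adjacent findings."""
        body = " ∨ ".join(f"{c}(y)" for c in concepts)
        if len(concepts) > 1:
            body = f"({body})"
        if partial_negation is not None:
            body = f"{body} ∧ ¬{partial_negation}(y)"
        return f"∃y.diagnosedWith(x, y) ∧ {body}"

    def apply_time(formula, interval):
        lower, upper = interval
        lo = "−∞" if lower == float("-inf") else str(lower)
        bracket = "(" if lower == float("-inf") else "["
        return f"♦{bracket}{lo},{upper}] ({formula})"

    # Running example: 'history of lung disease other than asthma'
    inner = clinical_finding_formula(["DisorderOfLung"], partial_negation="Asthma")
    query = apply_time(inner, (float("-inf"), 0))
    # query == "♦(−∞,0] (∃y.diagnosedWith(x, y) ∧ DisorderOfLung(y) ∧ ¬Asthma(y))"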


5. Experiments

To the best of our knowledge, there are no gold standard datasets for the translation
of criteria into formal language. Therefore, we evaluated our approach on real-world
studies taken from clinicaltrials.gov.11 During the design phase we used 24 randomly
selected studies, which contained approximately 300 criteria. Our prototype system was
optimized to cover as many of these criteria as possible.
     For testing, we randomly selected criteria across all studies on clinicaltrials.gov and
manually evaluated them. Due to time constraints, we managed to process 401 criteria.
We defined the following metrics: A criterion is answerable, if a) it is possible for a
human to translate it into an MTNCQ using only the vocabulary chosen in Section 4.1.1;
and b) it can in principle be answered by only looking at the EHR of a patient. Hence,
criteria that refer to the future (‘during study phase’), or ask for subjective information
(‘in the opinion of the investigator’, ‘willingness to’), are not considered answerable for
the purposes of our system. For each answerable criterion, we then evaluated the quality
of the translation. The resulting MTNCQ is labeled as good if it contains all (necessary)
information; partial if it represents at least parts of the criterion; and wrong otherwise.
These metrics are clearly subjective to some extent. To get a more reliable evaluation and
to quantify the amount of subjectivity, we let three evaluators (three of the authors) vote
independently on the test data. The results can be seen in Table 4.
     The results indicate that the judgment on whether a criterion is answerable or not
differs between the evaluators. We found that the difference is mainly caused by two
  10 https://clinicaltrials.gov/ct2/show/NCT00220220
  11 https://clinicaltrials.gov/
things: Firstly, it is sometimes difficult to judge whether a concept can be represented
in SNOMED CT, because the concept name can differ significantly from the description
in the text. Secondly, many criteria contain very specific phrases, for example ‘Active
bowels inflammatory disease ([Crohn], chronic, diarrhea...)’.12 The word ‘active’ cannot
be translated into SNOMED CT, and we could translate it into a temporal constraint
only under some assumptions on the semantics of ‘active’. Some might consider this to
be not so important, while for others this renders the criterion unanswerable. Despite
the differences, nearly 60% of the criteria cannot be answered, even in the opinion of
evaluator 3, who was the most optimistic. This is partially because of condition b) above.
The second reason is that quite a number of criteria cannot be represented in our formal
language, either because of a lack of vocabulary in SNOMED CT, or because of missing
semantic roles (see Section 4.1.1). While the former cannot be improved on, the latter
offers room for future optimizations.
     To compare the quality of the translations, we consider only criteria that have been
marked as answerable by all evaluators. This leaves 93 criteria that are analyzed on
the right-hand side of Table 4. The difference in the translation quality is again due to
the varying opinions of the evaluators regarding how detailed a translation needs to be
in order to be considered good. Our system is able to translate more than 50% of the
(confidently) answerable criteria, which is a promising first result. In the following, we
give examples of a good, a partial, and a wrong translation produced by our system:

‘Has a history of diabetic ketoacidosis in the last 6 months.’13

             ♦[−6,0] (∃y.diagnosedWith(x, y) ∧ KetoacidosisInDiabetesMellitus(y))

‘History of, diagnosed or suspected genital or other malignancy (excluding treated squamous cell
carcinoma of the skin), and untreated cervical dysplasia.’14

  ♦(−∞,0] (∃y.diagnosedWith(x, y) ∧ (MalignantNeoplasticDisease(y) ∨ DysplasiaOfCervix(y)))

‘Primary tumors developed 5 years previous to the inclusion, except in situ cervix carcinoma or
skin basocellular cancer properly treated’15

  ♦(−∞,0] (∃y.diagnosedWith(x, y) ∧ (CarcinomaInSituOfUterineCervix(y) ∨ SkinCancer(y)))


     The second translation is partially correct, because the temporal data and the main
concepts have been recognized correctly, but ‘excluding . . . ’ was not translated. The last
translation is wrong since neither the temporal information, the negation, nor the main
concept ‘primary tumors’ have been recognized correctly. For more examples, we refer
the reader to the appendix in the extended version.


6. Discussion and Ongoing Work

Formalizing ECs is a challenging task due to the gap between natural and formal lan-
guage. In this paper, we have presented an automatic translation method from ECs into
  12 https://clinicaltrials.gov/ct2/show/NCT02363725
  13 https://clinicaltrials.gov/ct2/show/NCT02269735
  14 https://clinicaltrials.gov/ct2/show/NCT01397097
  15 https://clinicaltrials.gov/ct2/show/NCT01303029
formal queries, and developed a prototype system based on existing NLP tools. We have
evaluated our prototype on 401 eligibility criteria. More than 50% of the answerable cri-
teria have been translated correctly, which is an encouraging result that can be improved
on by optimizing the translation process as we describe below. However, there remain
certain criteria that are hard to translate (even for humans) due to their complex structure.
     While it is unreasonable to expect medical doctors to formulate clinical trial criteria
directly as MTNCQs, we nevertheless identify a few key points that can be observed
during the formulation of ECs to make the automatic translation easier:
       1. Split criteria whenever possible, e.g., divide ‘diagnosed with diabetes and hyper-
          tension’ into ‘diagnosed with diabetes’ and ‘diagnosed with hypertension.’
       2. Formulate every EC as an independent description that does not depend on other
          criteria or the background knowledge of clinical trials, unlike, e.g., ‘Known hyper-
          sensitivity to any of the study drugs or excipients.’16
       3. Avoid using nonadjacent words to express a concept, e.g., ‘. . . dermatologic, neu-
          rologic, or psychiatric disease’17 should rather be formulated as ‘dermatologic
          disease, neurologic disease, or psychiatric disease.’
      We can improve the quality of our translation by collecting more regular expres-
sions and custom mappings, or employing specialized techniques from the literature for
the recognition of semantic roles like comparison sign or negation. Other obvious steps
are the inclusion of more concept categories from SNOMED CT such as devices, quali-
fiers, and events. For example, the criterion ‘severe aortic stenosis’18 could be translated
as ∃y, z.diagnosedWith(x, y) ∧ AorticStenosis(y) ∧ severity(y, z) ∧ Severe(z) if we annotate
‘severe’ with the SNOMED CT concept severe (qualifier value) and detect the pattern
qualifier value—finding. It is also straightforward to modify our system to output a ranked
list of multiple candidate translations that the doctor may choose from.
      Another interesting direction for future work is to develop a controlled natural lan-
guage [26] based on our semantic annotations. Criteria formulated in this way can then
easily be transformed into MTNCQs as we have described. With appropriate editing sup-
port, creating new ECs that conform to this controlled NL would not be much more
difficult than writing them as free-form text. Of course, one should retain the possibility
to add free-form criteria, which then have to be evaluated manually.


Acknowledgements

This work was supported by the DFG project BA 1122/19-1 (GOASQ), by DFG grant
389792660 as part of TRR 248, and by the China Scholarship Council.


References

 [1]   Chunhua Weng, Samson W Tu, Ida Sim, and Rachel Richesson. Formal representation of eligibility
       criteria: a literature review. J. Biomed. Inform., 43(3):451–467, 2010.

  16 https://clinicaltrials.gov/ct2/show/NCT01935492
  17 https://clinicaltrials.gov/ct2/show/NCT00960570
  18 https://clinicaltrials.gov/ct2/show/NCT01951950
 [2]   Richard Bache, Adel Taweel, Simon Miles, and Brendan C Delaney. An eligibility criteria query lan-
       guage for heterogeneous data warehouses. Method. Inform. Med, 54(01):41–44, 2015.
 [3]   Samson W Tu, Mor Peleg, Simona Carini, Michael Bobak, Jessica Ross, Daniel Rubin, and Ida Sim.
       A practical method for transforming free-text eligibility criteria into computable criteria. J. Biomed.
       Inform., 44(2):239–250, 2011.
 [4]   Franz Baader, Stefan Borgwardt, and Walter Forkel. Patient selection for clinical trials using temporal-
       ized ontology-mediated query answering. In Proc. HQA Workshop, pages 1069–1074. ACM, 2018.
 [5]   Franz Baader, Stefan Borgwardt, and Marcel Lippmann. Temporal query entailment in the description
       logic SHQ. J. Web Sem., 33:71–93, 2015.
 [6]   Franz Baader, Stefan Borgwardt, Patrick Koopmann, Ana Ozaki, and Veronika Thost. Metric temporal
       description logics with interval-rigid names. In Proc. FroCoS Symposium, pages 60–76. Springer, 2017.
 [7]   Stefan Borgwardt and Walter Forkel. Closed-world semantics for conjunctive queries with negation over
       ELH⊥ ontologies. In Proc. JELIA Conference, pages 371–386. Springer, 2019.
 [8]   Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations
       in vector space. arXiv preprint arXiv:1301.3781, 2013.
 [9]   Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky.
       The Stanford CoreNLP natural language processing toolkit. In Proc. ACL Meeting, pages 55–60, 2014.
[10]   Alan R Aronson. Effective mapping of biomedical text to the UMLS metathesaurus: The MetaMap
       program. In Proc. AMIA Symposium, page 17, 2001.
[11]   Krystyna Milian, Anca Bucur, and Annette Ten Teije. Formalization of clinical trial eligibility criteria:
       Evaluation of a pattern-based approach. In Proc. BIBM Conference, pages 1–4. IEEE, 2012.
[12]   Krystyna Milian and Annette ten Teije. Towards automatic patient eligibility assessment: From free-text
       criteria to queries. In Proc. AIME Conference, pages 78–83. Springer, 2013.
[13]   Li Zhou, Genevieve B Melton, Simon Parsons, and George Hripcsak. A temporal constraint structure
       for extracting temporal information from clinical narrative. J. Biomed. Inform., 39(4):424–439, 2006.
[14]   Zhihui Luo, Meliha Yetisgen-Yildiz, and Chunhua Weng. Dynamic categorization of clinical research
       eligibility criteria by hierarchical clustering. J. Biomed. Inform., 44(6):927–935, 2011.
[15]   Mary Regina Boland, Samson W Tu, Simona Carini, Ida Sim, and Chunhua Weng. EliXR-TIME: A
       temporal knowledge representation for clinical research eligibility criteria. AMIA Transl. Sci. Proc.,
       2012:71, 2012.
[16]   Yang Huang and Henry J Lowe. A novel hybrid approach to automated negation detection in clinical
       radiology reports. J. Am. Med. Inform. Assn., 14(3):304–311, 2007.
[17]   Martine Enger, Erik Velldal, and Lilja Øvrelid. An open-source tool for negation detection: A maximum-
       margin approach. In Proc. SemBEaR Workshop, pages 64–69, 2017.
[18]   Chunhua Weng, Xiaoying Wu, Zhihui Luo, Mary Regina Boland, Dimitri Theodoratos, and Stephen B
       Johnson. EliXR: An approach to eligibility criteria extraction and representation. J. Am. Med. Inform.
       Assn., 18(Supplement 1):i116–i124, 2011.
[19]   Sanmitra Bhattacharya and Michael N Cantor. Analysis of eligibility criteria representation in industry-
       standard clinical trial protocols. J. Biomed. Inform., 46(5):805–813, 2013.
[20]   Efthymios Chondrogiannis, Vassiliki Andronikou, Anastasios Tagaris, Efstathios Karanastasis,
       Theodora Varvarigou, and Masatsugu Tsuji. A novel semantic representation for eligibility criteria in
       clinical trials. J. Biomed. Inform., 69:10–23, 2017.
[21]   Li Dong and Mirella Lapata. Language to logical form with neural attention. In Proc. Annual Meeting
       of the ACL, 2016.
[22]   Meghyn Bienvenu. Ontology-mediated query answering: Harnessing knowledge to get more from data.
       In Proc. IJCAI Conference, pages 4058–4061. AAAI Press, 2016.
[23]   Franz Baader, Ian Horrocks, Carsten Lutz, and Uli Sattler. An Introduction to Description Logic. Cam-
       bridge University Press, 2017.
[24]   Jon Patrick, Yefeng Wang, and Peter Budd. An automated system for conversion of clinical notes into
       SNOMED Clinical Terminology. In Proc. ACSW Symposium, pages 219–226, 2007.
[25]   Franz Baader and Philipp Hanschke. A scheme for integrating concrete domains into concept languages.
       In John Mylopoulos and Raymond Reiter, editors, Proc. IJCAI Conference, pages 452–457, 1991.
[26]   Tobias Kuhn. A survey and classification of controlled natural languages. Comput. Linguist., 40(1):121–
       170, 2014.