=Paper=
{{Paper
|id=Vol-2429/paper2
|storemode=property
|title=Analysing the Heterogeneity of Rule-Based EHR Phenotyping Algorithms in CALIBER and the UK Biobank
|pdfUrl=https://ceur-ws.org/Vol-2429/paper2.pdf
|volume=Vol-2429
|authors=Spiros Denaxas,Helen Parkinson,Natalie Fitzpatrick,Cathie Sudlow,Harry Hemingway
|dblpUrl=https://dblp.org/rec/conf/ijcai/DenaxasPFSH19
}}
==Analysing the Heterogeneity of Rule-Based EHR Phenotyping Algorithms in CALIBER and the UK Biobank==
Analyzing the heterogeneity of rule-based EHR phenotyping algorithms in CALIBER and the UK Biobank

Spiros Denaxas (1,2,3), Helen Parkinson (2,4), Natalie Fitzpatrick (1,2,3), Cathie Sudlow (2,5), Harry Hemingway (1,2,3)

(1) Institute of Health Informatics, University College London, UK
(2) Health Data Research UK London/Cambridge/Scotland, UK
(3) UCL Hospitals Biomedical Research Center, London, UK
(4) European Bioinformatics Institute, Cambridge, UK
(5) Centre for Medical Informatics, Usher Institute of Population Health Science and Informatics, University of Edinburgh, Edinburgh, UK

s.denaxas@ucl.ac.uk, parkinso@ebi.ac.uk, n.fitzpatrick@ucl.ac.uk, Cathie.Sudlow@ed.ac.uk, h.hemingway@ucl.ac.uk
Abstract

Electronic Health Records (EHR) are data generated during routine interactions across healthcare settings and contain rich, longitudinal information on diagnoses, symptoms, medications, investigations and tests. A primary use-case for EHR is the creation of phenotyping algorithms used to identify disease status, onset and progression or extraction of information on risk factors or biomarkers. Phenotyping however is challenging since EHR are collected for different purposes, have variable data quality and often require significant harmonization. While considerable effort goes into the phenotyping process, no consistent methodology for representing algorithms exists in the UK. Creating a national repository of curated algorithms can potentially enable algorithm dissemination and reuse by the wider community. A critical first step is the creation of a robust minimum information standard for phenotyping algorithm components (metadata, implementation logic, validation evidence) which involves identifying and reviewing the complexity and heterogeneity of current UK EHR algorithms. In this study, we analyzed all available EHR phenotyping algorithms (n=70) from two large-scale contemporary EHR resources in the UK (CALIBER and UK Biobank). We documented EHR sources, controlled clinical terminologies, evidence of algorithm validation, representation and implementation logic patterns. Understanding the heterogeneity of UK EHR algorithms and identifying common implementation patterns will facilitate the design of a minimum information standard for representing and curating algorithms nationally and internationally.

1 Introduction

In the United Kingdom (UK), structured electronic health records (EHR) spanning primary care, hospital care, disease/procedure registries and death registries are used to create longitudinal disease phenotypes for observational research studies [Hemingway et al., 2018]. Through a process called phenotyping, researchers create algorithms which utilize multiple EHR sources to accurately extract information on diseases (e.g. status, onset and progression), lifestyle risk factors and biomarkers [Banda et al., 2018]. Phenotyping however is challenging due to the fact that EHR are fragmented, curated using different controlled clinical terminologies and collected for purposes other than research (e.g. reimbursement, audit) [Morley et al., 2014].

Phenotyping requires a significant amount of resources and mix of expertise, yet no common standard approach for defining, validating and ultimately sharing EHR phenotyping algorithms currently exists. In the UK, structured primary care EHR have been used in >1,800 peer-reviewed studies to date but only 5% of studies published sufficiently reproducible phenotypes [Springate et al., 2014]. Defining a standardized format to represent EHR phenotypes will enable portability across data sources (and healthcare systems) and facilitate the systematic sharing of algorithms across the community [Mo et al. 2015].
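As a rough illustration of what a standardized, machine-readable representation could look like, the sketch below groups the three components named in the abstract (metadata, implementation logic, validation evidence) into a single Python record. The class name `PhenotypeRecord`, its field names and the `missing_components` helper are hypothetical and chosen for this example only; they are not part of CALIBER, UK Biobank or any existing standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhenotypeRecord:
    """Hypothetical container for one rule-based EHR phenotyping algorithm."""

    # Metadata: what the algorithm is called and which data it draws on.
    name: str
    ehr_sources: List[str]
    terminologies: Dict[str, List[str]]  # terminology name -> list of codes
    # Implementation logic, kept as free-text pseudocode in this sketch.
    implementation: str = ""
    # Validation evidence: approach name -> short summary or reference.
    validation: Dict[str, str] = field(default_factory=dict)

    def missing_components(self) -> List[str]:
        """Return the names of the components that are still empty."""
        missing = []
        if not self.ehr_sources or not self.terminologies:
            missing.append("metadata")
        if not self.implementation.strip():
            missing.append("implementation")
        if not self.validation:
            missing.append("validation")
        return missing


# Example record using the Read code quoted in the Results section.
record = PhenotypeRecord(
    name="example phenotype",
    ehr_sources=["primary care"],
    terminologies={"Read": ["42K..00"]},
)
print(record.missing_components())
```

A record that lists its sources and codes but has no logic or evidence attached reports `implementation` and `validation` as missing, which is the kind of completeness check a curated repository might run when an algorithm is deposited.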
Copyright © 2019 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
Compared to the United States (US), the UK EHR research landscape differs in two important ways: 1) researchers can utilize multiple national EHR sources to create longitudinal ‘cradle to grave’ phenotypes [Kuan et al., 2019], and 2) UK primary care EHR contain both healthy and unhealthy individuals which allow researchers to capture information on disease severity and progression over time. A recent systematic review identified 66 different definitions used to capture asthma status and exacerbations in research using UK EHR [Al Sallakh et al., 2017] demonstrating significant existing heterogeneity. While analyses have been undertaken in the US to characterize the heterogeneity of phenotyping algorithms [Conway et al., 2011], no such analysis has been carried out in the UK.

One of the aims of the newly-established national institute for health data science, Health Data Research UK (HDR UK, www.hdruk.ac.uk), is the creation of a national Phenomics Resource: an open-access online resource where EHR phenotypes can be deposited and curated. A critical first step in this process is to establish a minimum information standard for representing EHR phenotyping algorithms. This involves exploring and documenting the complexity, heterogeneity, design and implementation patterns of contemporary phenotyping algorithms in the UK. The concept of a minimum information standard has been used successfully in other biomedical disciplines, e.g. Minimum Information About a Microarray Experiment (MIAME) defines standards for reporting microarray experiments [Brazma et al., 2001]. Establishing a standardized method for representing phenotypes in the UK can potentially address these challenges and ensure compatibility with other international initiatives such as eMERGE and PCORNet [Fleurence et al. 2014; Gottesman et al. 2013].

2. Aims

Despite the widespread use of UK EHR data sources for research, contemporary research resources utilize different approaches for algorithm creation, curation and validation. The aims of this study were to: a) identify and characterize the structural components, implementation logic and heterogeneity of rule-based algorithms defining diseases, lifestyle risk factors and biomarkers in structured national EHR in the UK utilized by contemporary research resources, and b) propose a minimum information standard to represent UK EHR phenotyping algorithms.

3. Methods

We identified, downloaded and reviewed published phenotyping algorithms for diseases, biomarkers and lifestyle risk factors from two large-scale contemporary UK research resources: UK Biobank¹ and CALIBER².

The UK Biobank [Sudlow et al., 2015] is a prospective cohort study of 500,000 adults (aged 40-69 at recruitment) recruited in England, Scotland and Wales from 2006-2010. For each participant, deep phenotypic and genotypic information is available including biomarkers in blood and urine, imaging (brain, heart, abdomen, bone, carotid artery), lifestyle indicators, pathophysiological measurements and genome-wide genotype data. Follow-up for health outcomes is enabled by hospital EHR (Hospital Episode Statistics (HES) in England, Patient Episode Data Warehouse in Wales and Scottish Morbidity Registry in Scotland) and linkages to primary care EHR are underway. CALIBER [Denaxas et al., 2012; Denaxas et al., 2019] is a research resource consisting of algorithms, tools and methods for structured EHR linked across primary care (Clinical Practice Research Datalink, CPRD), hospital care (HES) and mortality data (Office for National Statistics, ONS) in the UK.

In the UK, national EHR are recorded using controlled clinical terminologies where terms are assigned at variable timepoints i.e. in UK primary care the physician records terms in real time during the consultation with the patient whereas in hospital care terms are retrospectively entered into databases by trained coders and data selected for billing purposes. We identified and counted the number of ontology terms each algorithm utilizes from five controlled clinical terminologies which are widely used in the UK: a) Read (primary care, subset of SNOMED-CT), b) International Classification of Diseases 9th and 10th Revision (ICD-9, ICD-10, secondary care diagnoses and cause of mortality), c) OPCS Classification of Interventions and Procedures (OPCS-4, hospital surgical procedures, analogous to the Current Procedural Terminology ontology used in the United States), and d) the Dictionary of Medicines and Devices (DM+D) which is used to record primary care prescriptions. Terms were automatically extracted from documents and counted using regular expressions in Python 3.6³. We manually extracted and counted terms across five randomly chosen algorithms to verify the automatically-generated counts.

¹ http://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42
² https://www.caliberresearch.org/portal/phenotypes
³ https://www.python.org/

EHR phenotype validation is a critical process guiding the subsequent use of algorithms and we were interested in what types, if any, of evidence were available to external researchers. We classified the available material into six non-overlapping categories which encapsulate all potential
approaches for obtaining validity evidence (adapted from [Denaxas et al, 2019] and recorded as used/not used):

• Aetiological: Are the prospective associations with risk factors consistent with previous published evidence from both EHR and non-EHR studies?
• Prognostic: Are the risks of subsequent events plausible and consistent with existing domain knowledge?
• Case-note review: What is the positive predictive value (PPV) and the negative predictive value (NPV) when comparing the algorithm with clinician-led review of case notes, self-reported information or a suitable “gold standard” source?
• Cross-EHR-source concordance: To what extent is the phenotype concordant across EHR sources?
• Genetic: Are the observed genetic associations plausible and consistent in terms of magnitude and direction of association with associations reported from non-EHR studies?
• External populations: Has the algorithm been evaluated in different countries or external sources?

For each algorithm, we documented the EHR sources the phenotype is derived from (i.e. primary care, hospital care, mortality register). We extracted information on the representation components of phenotypes e.g. the presence of tabular data and the use of a flowchart (or other graphical presentation). We extracted and categorized information on the different types of implementation logic, temporality and algorithm implementation patterns (Table 1), partially based on previous research in the US [Conway et al., 2011].

{| class="wikitable"
! Concept !! Definition !! Example
|-
| Simple Boolean || Simple Boolean statements e.g. “AND”, “OR” || PVD diagnosis during a primary care consultation OR diagnosis of leg or aortic embolism or thrombosis during a hospitalization
|-
| Complex Boolean || Nested statements with multiple layers? || IF patient = diabetic: HT threshold: SBP ≥140 mmHg OR DBP ≥90 mmHg ELSE: threshold SBP ≥150 mmHg OR DBP ≥90 mmHg
|-
| Negation || Are negation statements used? || No AF diagnosis term is present, but the patient record includes a warfarin prescription in the absence of prior DVT or PE, or a digoxin prescription but no HF
|-
| Temporal (simple) || Temporal proximity future or past || Iron deficiency anaemia record in primary care OR hospital AND endoscopy in 30 days
|-
| Temporal (complex) || Complex temporal rules, multiple logic layers || ≥3 high SBP/DBP readings within 1-year OR ≥2 high SBP/DBP readings in a 6-month period
|-
| Biomarker || Evidence from continuous measurement || Presence of a positive rheumatoid factor test or anti-cyclic citrullinated peptide antibody test after a rheumatoid arthritis diagnosis
|-
| Complex calculation || Calculation e.g. unit conversion || Calculate average BMI in consultation, exclude measurements <10 kg/m² or >80 kg/m²
|}

Table 1: Characteristics of implementation logic, temporality and algorithmic implementation features extracted and analyzed from phenotyping algorithms in the UK Biobank and the CALIBER resources. AF Atrial Fibrillation; BMI Body Mass Index; BP Blood Pressure; DBP Diastolic Blood Pressure; DVT Deep Vein Thrombosis; PVD Peripheral Vascular Disease; PE Pulmonary Embolism; HT Hypertension; HF Heart Failure; mmHg millimeter of mercury; SBP Systolic Blood Pressure.

4. Results

We identified and reviewed 70 EHR phenotyping algorithms (Table 2) available from the UK Biobank (n=19) and the CALIBER resource (n=51). The majority of phenotyping algorithms were created to ascertain disease status (n=54) (e.g. heart failure [Gho et al. 2018; Uijl et al. 2019], depression [Daskalopoulou et al. 2016]), ten algorithms were created to extract information on biomarkers (e.g. heart rate [Archangelidi et al. 2018], blood pressure [Rapsomaniki et al. 2014]) and six algorithms were used to identify lifestyle risk factors (e.g. alcohol [Bell et al. 2017], smoking [Pujades-Rodriguez et al. 2015]).

All but one CALIBER phenotyping algorithm (n=50) used information from primary care EHR with the exception of socioeconomic status which was defined using the Index of Multiple Deprivation (IMD) provided by the ONS. Algorithms defining biomarker measurements (e.g. white blood cells, heart rate) were based on primary care EHR entirely while approximately half of the algorithms ascertaining disease status (n=19 of 35) combined information across all three EHR sources. All currently available UK Biobank algorithms (n=19) combined information recorded during the baseline assessment (data not shown), diagnoses and/or surgical procedures recorded during hospitalization and information based on the underlying (or secondary) cause of death which is recorded in the national mortality register. Primary care linkages in UK Biobank are still underway and as a result none of the
currently available algorithms utilized information from primary care EHR. However, primary care information for just under half of the cohort (n=230,000) will be made available for UK Biobank researchers in June 2019. Algorithms incorporating primary care data for the conditions already covered have been or are being developed [Wilkinson et al 2019]. Along with a range of additional algorithms expanding the range of health outcomes available, they will be available from UK Biobank later in 2019. Overall, based on current publicly available information from CALIBER and UK Biobank, 75% (n=66) of algorithms used data from secondary care EHR and 45% (n=49) used information available in the death registry.

The most widely-used clinical terminology was Read with 4,729 (non-unique) terms used across all algorithms while the second highest number of terms was derived from the DM+D with 2,273 (non-unique) terms used to record prescriptions in primary care EHR. Four algorithms (body mass index, socioeconomic deprivation, sex, heart rate) did not use any terms across any terminology systems and were based on information which is derived from a structured field of the EHR or externally linked such as in the case of IMD. The atrial fibrillation algorithm used the highest number of clinical terms (n=987) while across all algorithms the pregnancy phenotype used the highest number of Read codes (n=1,948). ICD-9 was the terminology least used: in the UK Biobank it is used for recording diagnoses in older Scottish hospital records and in CALIBER it is used to record the cause of death prior to 1997. Algorithms defining biomarkers contained the lowest number of terminology terms as they relied on structured data fields combined with a small number of diagnosis terms to denote the type of test (e.g. Read code “42K..00 Eosinophil count”).

With regards to algorithm implementation logic, 66 (93%) of algorithms used Boolean statements, usually to identify the presence of one or more diagnosis codes in a patient’s EHR. Where Boolean statements were deployed, in nearly half of the cases these were complex and involved either a series of nested statements or joined information across multiple sources, for example in the UK Biobank where information is derived from self-reported, hospital and mortality sources and events are further stratified as ‘prevalent’ (first reported prior to recruitment) or ‘incident’ (first reported after recruitment). A similar pattern of logic was observed with regards to temporality where 66 algorithms utilized temporal rules and almost always this included more complex statements and restrictions. Finally, approximately half (n=43) of the algorithms used negation. Only ten algorithms (16%) included more complex calculations, usually to calculate the mean of multiple measurements on the same day or to harmonize units for laboratory measurements to a common format.

Prognostic 86% (n=66) and cross-source concordance 54% (n=43) validation approaches were the most widely-used algorithm evaluation approaches. The least widely used validation approach was expert case note review, although this type of validation has been completed for a few UK Biobank algorithms, including dementia and its subtypes [Wilkinson et al, 2019], and is underway for several others. Most (93% [n=66]) of the algorithms used data stored in tabular format since tables are predominantly used to store lists of controlled clinical terminology terms. Only 25% (n=15) of algorithms included a graphical representation of the algorithm using a flowchart and all algorithms included a textual description of the algorithm components.

5. Discussion

In this study we downloaded and reviewed 70 EHR phenotyping algorithms from two large-scale, national research resources in the UK. We reviewed algorithms in terms of EHR data sources, controlled clinical terminologies used, available evidence of algorithm validation, algorithm representation formats and implementation logic patterns.

Similar to findings from US studies, we discovered that UK EHR algorithms make extensive use of Boolean statements and temporal logic. When these are used, they are often complex i.e. combining multiple nested Boolean layers of logic and defining temporal proximity rules within them. This is expected given that algorithms utilize multiple sources of information and include evidence from primary care and hospital care (or self-reported information in the case of the UK Biobank). Algorithms defining disease status were the most frequent and complex algorithms reviewed and utilized the greatest number of terms from controlled clinical terminologies. Negation was another major component of algorithms and is often used to exclude concomitant diagnoses or procedures when trying to ascertain diseases based on secondary information (e.g. ascertaining AF cases based on a prescription of digoxin but excluding patients who are diagnosed with HF).

The Read clinical terminology was the most popular terminology used with the highest number of terms per phenotype. These findings are expected as Read contains a significant amount of duplication internally due to synonym terms which can be potentially utilized. Additionally, the clinical concepts contained within Read subsume the concepts across all other terminologies i.e. Read contains terms for diagnoses, symptoms, laboratory tests, prescriptions and procedures. UK primary care clinical coding is currently transitioning to SNOMED-CT which should provide a more streamlined set of terms to be used.

In terms of validation, we observed a significant level of heterogeneity with approaches seeking to evaluate and replicate previously reported aetiological and prognostic
estimates from non-EHR studies being the most popular. The presentation of the evidence however does not follow a common standard and sometimes only included references to published research rather than a more structured abstract of the main findings of the analyses. In contrast with the US, expert review of case records was the least frequently used approach for evaluation due to the fact that large scale corpuses of medical text do not exist in the UK owing to information governance restrictions and the technical challenges of integrating such data since they are held in a wide range of formats by multiple different NHS organisations. For similar reasons, none of the algorithms reviewed utilize medical text and natural language processing approaches to extract information from medical notes which is prevalent in some clinical specialties such as mental health [Wu et al. 2018].

Significant heterogeneity was also observed in terms of representation. UK Biobank algorithms were curated in individual PDF files⁴ and included extended information on the goal of the algorithm and useful background knowledge and references. In contrast, CALIBER phenotypes were stored in an online, openly-available Portal⁵, spanned multiple pages and did not include much background information. Flowcharts or similar graphical representations were not widely used and while they are not machine-readable, they can potentially minimize errors during translation of the algorithm to machine code.

Our study has potential limitations. We reviewed algorithms from only two UK sources. While other UK initiatives exist, they tend to focus on curating lists of controlled clinical terminology terms (referred to as codelists) rather than self-contained phenotypes i.e. terms, implementation, validation evidence. We only focused on rule-based approaches and did not cover machine learning approaches. While rule-based methods are the most widely used in the UK, data-driven high-throughput approaches including natural language processing methods are emerging [Zhou et al., 2016, Pikoula et al., 2019]. These approaches pose different challenges and their requirements would need to be documented and analysed in order to ensure their integration [Hripcsak & Albers 2013]. Finally, reproducible research approaches [Denaxas et al., 2017, Goodman et al, 2016] which are covered elsewhere would also need to be carefully taken into consideration in order to ensure algorithm portability.

⁴ http://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42
⁵ https://www.caliberresearch.org/portal

6. Steps towards a minimum information standard

Based on our findings, we propose that an EHR phenotyping algorithm representation combines metadata, implementation logic, validation evidence and use-cases. We suggest the following components towards establishing a minimum information standard with regards to rule-based phenotyping algorithms for UK EHR:

Part 1 – Algorithm metadata: Succinct information about the goal of the algorithm, the intended use-case, the data sources and controlled clinical terminologies used, applicable age groups and genders, list of authors and their contact details and a set of SNOMED-CT terms to classify the algorithm. A unique identifier, such as a Digital Object Identifier (DOI), should be minted to enable usage tracking in subsequent research.

Part 2 – Implementation: Details on the implementation logic of the algorithm with pseudocode to facilitate the translation to machine code and documentation on decisions made and reasoning. Where possible analytical scripts should be attached using markdown or a similar approach. The standard should support defining complex Boolean and temporal logic across multiple EHR sources and clinical terminologies. In the future, a computable phenotype format should encapsulate this information as a stand-alone file.

Part 3 – Validation evidence: Description of the steps taken to support phenotype validity across six categories (aetiological, prognostic, genetic, expert review, cross-source and external population). For each implementation, the number of cases, controls, NPV and PPV values should be reported and the format should support the embedding of graphical files (e.g. forest plots).

Part 4 – Use-cases: Links to published research utilizing the phenotype algorithms, cross-referenced with DOIs.

7. Conclusion

Our analyses identified a certain level of underlying homogeneity in terms of how phenotyping algorithms are defined and evaluated. We suggest four components towards a minimum information standard that should be used to represent phenotyping algorithms. These findings provide a crucial first step towards curating and disseminating phenotyping algorithms utilizing UK EHR. Further work is required towards establishing a computable format for phenotyping algorithms and ensuring interoperability with other resources (e.g. PheKB).

Acknowledgments

This work was supported by Health Data Research UK, which receives its funding from HDR UK Ltd (LOND1) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health
and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust. The BigData@Heart Consortium is funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No. 116074. This study was supported by the Farr Institute of Health Informatics Research at UCL Partners (MR/K006584/1). This paper represents independent research part funded by the National Institute for Health Research Biomedical Research Centre at UCLH. HH is a NIHR Senior Investigator. SD is an Alan Turing Fellow.

References

[Al Sallakh et al., 2017] Al Sallakh, M. A., et al. Defining asthma and assessing asthma outcomes using electronic health record data: a systematic scoping review. Eur. Respiratory J., 49(6), 2017.

[Archangelidi et al., 2018] Archangelidi, O., et al. Clinically Recorded Heart Rate and Incidence of 12 Coronary, Cardiac, Cerebrovascular and Peripheral Arterial Diseases in 233,970 Men and Women: A Linked Electronic Health Record Study. Eur. J. of Preventive Cardiology 25 (14): 1485–95, 2018.

[Banda et al., 2018] Banda, J. M., et al. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annual Review of Biomedical Data Science 2018.

[Bell et al., 2017] Bell, S., et al. Association between Clinically Recorded Alcohol Consumption and Initial Presentation of 12 Cardiovascular Diseases: Population Based Cohort Study Using Linked Health Records. BMJ 356: j909, 2017.

[Brazma et al., 2001] Brazma, A., et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics, 29(4), 365–371, 2001.

[Conway et al., 2011] Conway, M., et al. Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms. Proc. Am Med Infor Assoc., 274–283, 2011.

[Daskalopoulou et al., 2016] Daskalopoulou, M., et al. Depression as a Risk Factor for the Initial Presentation of Twelve Cardiac, Cerebrovascular, and Peripheral Arterial Diseases: Data Linkage Study of 1.9 Million Women and Men. PLOS ONE 11 (4): e0153838, 2016.

[Denaxas et al., 2012] Denaxas, S., et al. Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). Int. J. Epidemiology, 41(6), 1625–1638, 2012.

[Denaxas et al., 2019] Denaxas, S., et al. UK phenomics platform for developing and validating EHR phenotypes: CALIBER. J Am Med Inf 10.1093/jamia/ocz105, 2019.

[Denaxas et al., 2017] Denaxas, S., et al. Methods for enhancing the reproducibility of biomedical research findings using electronic health records. BioData Mining, 10 (31), 2017.

[Fleurence et al., 2014] Fleurence, R., et al. Launching PCORnet, a National Patient-Centered Clinical Research Network. JAMIA 21 (4): 578–82, 2014.

[Gho et al., 2018] Gho, J., et al. An Electronic Health Records Cohort Study on Heart Failure Following Myocardial Infarction in England: Incidence and Predictors. BMJ Open 8 (3): e018331, 2018.

[Goodman et al., 2016] Goodman, S.N., et al. What does research reproducibility mean? Science Translational Medicine, 8(341), p.341ps12, 2016.

[Gottesman et al., 2013] Gottesman, O., et al. The Electronic Medical Records and Genomics (eMERGE) Network: Past, Present, and Future. Genetics in Medicine 15 (10): 761–71, 2013.

[Hemingway et al., 2018] Hemingway, H., et al. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. European Heart J., 39(16), 1481–1495, 2018.

[Hripcsak & Albers, 2013] Hripcsak, G. & Albers, D.J. Next-generation phenotyping of electronic health records. JAMIA, 20(1), 117–121, 2013.

[Kuan et al., 2019] Kuan, V., et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. The Lancet Digital Health 1(2), e63-e67, 2019.

[Mo et al., 2015] Mo, H., et al. Desiderata for Computable Representations of Electronic Health Records-Driven Phenotype Algorithms. JAMIA 22 (6): 1220–30, 2015.

[Morley et al., 2014] Morley, K., et al. Defining disease phenotypes using national linked electronic health records: a case study of atrial fibrillation. PLOS ONE, 9(11), e110900, 2014.

[Pikoula et al., 2019] Pikoula, M., et al. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Medical Informatics and Decision Making, 19(1), p.86, 2019.

[Pujades-Rodriguez et al., 2015] Pujades-Rodriguez, M., et al. Heterogeneous Associations between Smoking and a Wide Range of Initial Presentations of Cardiovascular Disease in 1937360 People in England: Lifetime Risks and Implications for Risk Prediction. Int. J. of Epidemiology 44 (1): 129–41, 2015.
[Rapsomaniki et al., 2014] Rapsomaniki, E., et al. Blood Pressure and Incidence of Twelve Cardiovascular Diseases: Lifetime Risks, Healthy Life-Years Lost, and Age-Specific Associations in 1·25 Million People. The Lancet 383 (9932): 1899–1911, 2014.

[Springate et al., 2014] Springate, D., et al. ClinicalCodes: an online clinical codes repository to improve the validity and reproducibility of research using electronic medical records. PLOS ONE, 9(6), e99825, 2014.

[Sudlow et al., 2015] Sudlow, C., et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine, 12(3), e1001779, 2015.

[Uijl et al., 2019] Uijl, A., et al. Risk Factors for Incident Heart Failure in Age- and Sex-Specific Strata: A Population-Based Cohort Using Linked Electronic Health Records. Eur. J. Heart Failure, 10.1002/ejhf.1350, 2019.

[Wilkinson et al., 2019] Wilkinson, T., et al. Identifying dementia outcomes in UK Biobank: a validation study of primary care, hospital admissions and mortality data. Eur Jour Epidemiology, 10.1007/s10654-019-00499-1, 2019.

[Wu et al., 2018] Wu, H., et al. SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. JAMIA, 25(5), 530–537, 2018.

[Zhou et al., 2016] Zhou, S.-M., et al. Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis. PLOS ONE, 11(5), p.e0154515, 2016.
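The Methods state that terminology terms were extracted from the algorithm documents and counted with regular expressions in Python; the per-terminology counts in Table 2 are the result of that step. A minimal sketch of such counting follows. It assumes a plain-text codelist with one code at the start of each line, and the two patterns are simplified illustrations (a 5-character Read code padded with dots plus a 2-digit suffix, and a letter-plus-two-digits ICD-10 code), not the expressions the authors used. The function name `count_terms` is likewise hypothetical.

```python
import re
from collections import Counter

# Simplified patterns for a code at the start of a line. These are assumptions
# for illustration only, not the expressions used in the study.
PATTERNS = {
    "Read": re.compile(r"^[0-9A-Za-z][0-9A-Za-z.]{4}\d{2}(?=\s)", re.MULTILINE),
    "ICD-10": re.compile(r"^[A-Z]\d{2}(?:\.\d)?(?=\s)", re.MULTILINE),
}


def count_terms(text: str) -> Counter:
    """Count (non-unique) code occurrences per terminology in a codelist text."""
    counts = Counter()
    for terminology, pattern in PATTERNS.items():
        counts[terminology] = len(pattern.findall(text))
    return counts


# The Read line repeats on purpose: the paper reports non-unique term counts.
sample = """42K..00 Eosinophil count
42K..00 Eosinophil count
I48 Atrial fibrillation and flutter
"""
print(count_terms(sample))
```

On this sample the function counts two Read terms and one ICD-10 term. Real codelists would need stricter patterns per terminology and, as the authors did, a manual check of a few algorithms against the automated counts.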
Table 2 column groups: Source (PC, SC, MR); Terminology (Read, ICD10, ICD9, OPCS, DM+D); Validation (Aetiology, Prognosis, Case note, Cross-source, Genetic, External); Format (Tabular, Flowchart); Implementation (Boolean, Complex Boolean, Negation, Temporal, Complex temporal, Biomarker, Calculation).
CALIBER
AAA + + + 32 6 6 46 0 + + + + + + +
AD + + + 36 17 7 0 0 + + + + + + +
AF + + + 523 5 0 396 63 + + + + + + + + + + +
Alcohol + 141 0 0 0 0 + + + + + + + +
AMI + + + 43 18 14 2 0 + + + + + + + + + + + +
AU + + 38 6 0 0 15 + + + + + + + + +
Bleeding + + + 131 14 0 17 0 + + + + + + + + + +
BMI + 0 0 0 0 0 + + + +
BP + 67 0 0 0 0 + + + + +
CHD + + + 30 8 9 0 0 + + + + + + +
Dementia NS + + + 36 17 7 0 0 + + + + + + +
Depression + + 152 15 0 0 0 + + + + + +
Deprivation 0 0 0 0 0 +
Diabetes + + 141 4 0 0 0 + + + + + + + + +
Eosinophils + 4 0 0 0 0 + + + + + + + +
Ethnicity + + 104 0 0 0 0 + + + + + +
GCA + + 7 1 0 0 18 + + + + + + + +
Gender + 0 0 0 0 0 +
HCM + + 81 2 0 41 557 + + + + + + + + +
HDL + 4 0 0 0 0 + + + + + + + +
HF + + + 93 6 9 0 0 + + + + + + + + + +
HIV + + + 35 25 0 0 + + + + + +
HR + 0 0 0 0 0 + + + + + +
HT + + 84 5 0 2 0 + + + + + + + + +
ICH + + + 17 1 1 0 0 + + + + + + +
Influenza + 62 0 0 0 0 + + +
Isch. Stroke + + + 10 1 2 0 0 + + + + + + +
LDL + 5 0 0 0 0 + + + + + + +
Lymphocytes + 10 0 0 0 0 + + + + + + +
MS + + 10 1 0 0 15 + + + + + + + + + +
Neutrophils + 6 0 0 0 0 + + + + + + +
Obesity + + + 105 1 0 50 0 + + + + +
PAD + + + 201 6 5 71 0 + + + + + + + +
BuP + + 25 6 0 0 286 + + + + + + + + +
PBC + + 4 1 0 0 21 + + + + + + + + +
PMR + + 3 2 0 0 90 + + + + + + + +
Pregnancy + + 1948 0 0 0 0 + + +
Psoriasis + + 82 0 0 0 453 + + + + + + + + +
RA + + 75 18 0 0 72 + + + + + + + + + +
SA + + 181 3 0 67 674 + + + + + + + + +
SAH + + + 11 1 1 0 0 + + + + + + +
SCD + + + 32 6 2 16 0 + + + + + + + + +
Scleroderma + + 5 5 0 0 9 + + + + + + + + + +
Smoking + 21 3 0 0 0 + + + + + +
Stroke NS + + + 17 6 3 1 0 + + + + + + +
TIA + + + 15 2 0 0 + + + + + + +
Triglycerides + 6 0 0 0 0 + + + + + + +
UA + + 12 4 0 0 0 + + + + + + + +
UCD + + + 32 6 2 16 0 + + + + + +
VD + + + 36 17 7 0 0 + + + + + + +
WBC + 16 0 0 0 0 + + + + + + +
UK Biobank6
AD n/a + + n/a 32 9 0 n/a + + + + + + + +
AMI n/a + + n/a 23 17 0 n/a + + + + + +
Asthma n/a + + n/a 6 6 0 n/a + + + + + +
COPD n/a + + n/a 11 4 0 n/a + + + + + +
Dementia NS n/a + + n/a 32 9 0 n/a + + + + + + + +
ESRD n/a + + n/a 18 0 37 n/a + + + + + + + +
FTD n/a + + n/a 32 9 0 n/a + + + + + + + +
ICH n/a + + n/a 32 7 0 n/a + + + + + +
Isch. Stroke n/a + + n/a 32 7 0 n/a + + + + + +
MND n/a + + n/a 1 1 0 n/a + + + + + +
MSA n/a + + n/a 19 3 0 n/a + + + + + + + +
NSTEMI n/a + + n/a 23 17 0 n/a + + + + + +
Parkinsonism n/a + + n/a 19 3 0 n/a + + + + + + + +
PD n/a + + n/a 19 3 0 n/a + + + + + + + +
PSP n/a + + n/a 19 3 0 n/a + + + + + + + +
SAH n/a + + n/a 32 7 0 n/a + + + + + +
STEMI n/a + + n/a 23 17 0 n/a + + + + + +
Stroke NS n/a + + n/a 32 7 0 n/a + + + + + +
VD n/a + + n/a 32 9 0 n/a + + + + + + + +
6 Primary care EHR available for participants in 2019; case-note review validation underway for multiple phenotypes.
Table 2. EHR data sources, controlled clinical terminologies, available evidence of algorithm validation, algorithm representation format and implementation logic patterns for the CALIBER and UK Biobank EHR phenotyping algorithms.
AAA Abdominal Aortic Aneurysm; AD Alzheimer's Disease; AF Atrial Fibrillation; AMI Acute Myocardial Infarction; AU Autoimmune Uveitis; BMI
Body Mass Index; BP Blood Pressure; BuP Bullous Pemphigoid; CHD Coronary Heart Disease; FTD Frontotemporal dementia; GCA Giant Cell Arteritis;
HCM Hypertrophic Cardiomyopathy; HDL High Density Lipoprotein cholesterol; HF Heart Failure; HIV Human Immunodeficiency Virus; HR Heart Rate;
HT Hypertension; ICH Intracerebral Haemorrhage; LDL Low Density Lipoprotein cholesterol; MS Multiple Sclerosis; NS Not Specified; PAD Peripheral
Arterial Disease; PBC Primary Biliary Cirrhosis; PMR Polymyalgia Rheumatica; RA Rheumatoid Arthritis; SA Stable Angina; SAH Subarachnoid
Haemorrhage; SCD Sudden Cardiac Death; TIA Transient Ischaemic Attack; UA Unstable Angina; UCD Unheralded Coronary Death; VD Vascular
Dementia; WBC White Blood Cell Count; COPD Chronic Obstructive Pulmonary Disease; ESRD End Stage Renal Disease; MND Motor Neuron Disease;
PD Parkinson's Disease and Parkinsonism; MSA Multiple System Atrophy; PSP Progressive Supranuclear Palsy; STEMI ST-Elevation AMI; NSTEMI Non-ST Elevation AMI
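For readers who want to work with the rows of Table 2 programmatically, the following is a minimal Python sketch that splits one row into its phenotype name and its cells. It assumes that the integers in a row are the per-terminology code counts and that "+" marks a populated cell. Because the extracted text does not preserve which column each "+" belongs to, the sketch only tallies marks and does not assign them to columns. The names `Row` and `parse_row` are hypothetical and not part of CALIBER or UK Biobank tooling.

```python
from dataclasses import dataclass
from typing import List

# Cell values that are not numbers (assumption: "+" = present, "n/a" = not applicable).
MARKS = {"+", "n/a"}


@dataclass
class Row:
    phenotype: str
    cells: List[str]  # every cell after the name, in reading order

    @property
    def code_counts(self) -> List[int]:
        # Integers in the row, assumed to be per-terminology code counts.
        return [int(c) for c in self.cells if c.isdigit()]

    @property
    def marks(self) -> int:
        # Number of "+" cells; column positions are not recoverable.
        return sum(1 for c in self.cells if c == "+")


def parse_row(line: str) -> Row:
    tokens = line.split()
    # The phenotype name is every token before the first cell value.
    first_cell = next(
        i for i, t in enumerate(tokens) if t in MARKS or t.isdigit()
    )
    return Row(" ".join(tokens[:first_cell]), tokens[first_cell:])


if __name__ == "__main__":
    row = parse_row("AAA + + + 32 6 6 46 0 + + + + + + +")
    print(row.phenotype, sum(row.code_counts), row.marks)
```

Summing `code_counts` gives a rough measure of how many clinical codes an algorithm draws on across terminologies, which is one simple way to compare the heterogeneity of the phenotypes listed in the table.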