Analyzing the heterogeneity of rule-based EHR phenotyping algorithms in CALIBER and the UK Biobank

Spiros Denaxas¹,²,³, Helen Parkinson²,⁴, Natalie Fitzpatrick¹,²,³, Cathie Sudlow²,⁵, Harry Hemingway¹,²,³

¹ Institute of Health Informatics, University College London, UK
² Health Data Research UK London/Cambridge/Scotland, UK
³ UCL Hospitals Biomedical Research Center, London, UK
⁴ European Bioinformatics Institute, Cambridge, UK
⁵ Centre for Medical Informatics, Usher Institute of Population Health Science and Informatics, University of Edinburgh, Edinburgh, UK

s.denaxas@ucl.ac.uk, parkinso@ebi.ac.uk, n.fitzpatrick@ucl.ac.uk, Cathie.Sudlow@ed.ac.uk, h.hemingway@ucl.ac.uk

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Electronic Health Records (EHR) are data generated during routine interactions across healthcare settings and contain rich, longitudinal information on diagnoses, symptoms, medications, investigations and tests. A primary use-case for EHR is the creation of phenotyping algorithms used to identify disease status, onset and progression or extraction of information on risk factors or biomarkers. Phenotyping however is challenging since EHR are collected for different purposes, have variable data quality and often require significant harmonization. While considerable effort goes into the phenotyping process, no consistent methodology for representing algorithms exists in the UK. Creating a national repository of curated algorithms can potentially enable algorithm dissemination and reuse by the wider community. A critical first step is the creation of a robust minimum information standard for phenotyping algorithm components (metadata, implementation logic, validation evidence) which involves identifying and reviewing the complexity and heterogeneity of current UK EHR algorithms. In this study, we analyzed all available EHR phenotyping algorithms (n=70) from two large-scale contemporary EHR resources in the UK (CALIBER and UK Biobank). We documented EHR sources, controlled clinical terminologies, evidence of algorithm validation, representation and implementation logic patterns. Understanding the heterogeneity of UK EHR algorithms and identifying common implementation patterns will facilitate the design of a minimum information standard for representing and curating algorithms nationally and internationally.

1. Introduction

In the United Kingdom (UK), structured electronic health records (EHR) spanning primary care, hospital care, disease/procedure registries and death registries are used to create longitudinal disease phenotypes for observational research studies [Hemingway et al., 2018]. Through a process called phenotyping, researchers create algorithms which utilize multiple EHR sources to accurately extract information on diseases (e.g. status, onset and progression), lifestyle risk factors and biomarkers [Banda et al., 2018]. Phenotyping however is challenging due to the fact that EHR are fragmented, curated using different controlled clinical terminologies and collected for purposes other than research (e.g. reimbursement, audit) [Morley et al., 2014].

Phenotyping requires a significant amount of resources and mix of expertise, yet no common standard approach for defining, validating and ultimately sharing EHR phenotyping algorithms currently exists. In the UK, structured primary care EHR have been used in >1,800 peer-reviewed studies to date but only 5% of studies published sufficiently reproducible phenotypes [Springate et al., 2014]. Defining a standardized format to represent EHR phenotypes will enable portability across data sources (and healthcare systems) and facilitate the systematic sharing of algorithms across the community [Mo et al., 2015].
Compared to the United States (US), the UK EHR research landscape differs in two important ways: 1) researchers can utilize multiple national EHR sources to create longitudinal 'cradle to grave' phenotypes [Kuan et al., 2019], and 2) UK primary care EHR contain both healthy and unhealthy individuals which allow researchers to capture information on disease severity and progression over time. A recent systematic review identified 66 different definitions used to capture asthma status and exacerbations in research using UK EHR [Al Sallakh et al., 2017], demonstrating significant existing heterogeneity. While analyses have been undertaken in the US to characterize the heterogeneity of phenotyping algorithms [Conway et al., 2011], no such analysis has been carried out in the UK.

One of the aims of the newly-established national institute for health data science, Health Data Research UK (HDR UK, www.hdruk.ac.uk), is the creation of a national Phenomics Resource: an open-access online resource where EHR phenotypes can be deposited and curated. A critical first step in this process is to establish a minimum information standard for representing EHR phenotyping algorithms. This involves exploring and documenting the complexity, heterogeneity, design and implementation patterns of contemporary phenotyping algorithms in the UK. The concept of a minimum information standard has been used successfully in other biomedical disciplines, e.g. Minimum Information About a Microarray Experiment (MIAME) defines standards for reporting microarray experiments [Brazma et al., 2001]. Establishing a standardized method for representing phenotypes in the UK can potentially address these challenges and ensure compatibility with other international initiatives such as eMERGE and PCORNet [Fleurence et al., 2014; Gottesman et al., 2013].

2. Aims

Despite the widespread use of UK EHR data sources for research, contemporary research resources utilize different approaches for algorithm creation, curation and validation. The aims of this study were to: a) identify and characterize the structural components, implementation logic and heterogeneity of rule-based algorithms defining diseases, lifestyle risk factors and biomarkers in structured national EHR in the UK utilized by contemporary research resources, and b) propose a minimum information standard to represent UK EHR phenotyping algorithms.

3. Methods

We identified, downloaded and reviewed published phenotyping algorithms for diseases, biomarkers and lifestyle risk factors from two large-scale contemporary UK research resources: UK Biobank¹ and CALIBER².

The UK Biobank [Sudlow et al., 2015] is a prospective cohort study of 500,000 adults (aged 40-69 at recruitment) recruited in England, Scotland and Wales from 2006-2010. For each participant, deep phenotypic and genotypic information is available including biomarkers in blood and urine, imaging (brain, heart, abdomen, bone, carotid artery), lifestyle indicators, pathophysiological measurements and genome-wide genotype data. Follow-up for health outcomes is enabled by hospital EHR (Hospital Episode Statistics (HES) in England, Patient Episode Data Warehouse in Wales and Scottish Morbidity Registry in Scotland) and linkages to primary care EHR are underway. CALIBER [Denaxas et al., 2012; Denaxas et al., 2019] is a research resource consisting of algorithms, tools and methods for structured EHR linked across primary care (Clinical Practice Research Datalink, CPRD), hospital care (HES) and mortality data (Office for National Statistics, ONS) in the UK.

In the UK, national EHR are recorded using controlled clinical terminologies where terms are assigned at variable timepoints i.e. in UK primary care the physician records terms in real time during the consultation with the patient whereas in hospital care terms are retrospectively entered into databases by trained coders and data are selected for billing purposes. We identified and counted the number of ontology terms each algorithm utilizes from five controlled clinical terminologies which are widely used in the UK: a) Read (primary care, subset of SNOMED-CT), b) International Classification of Diseases 9th and 10th Revision (ICD-9, ICD-10, secondary care diagnoses and cause of mortality), c) OPCS Classification of Interventions and Procedures (OPCS-4, hospital surgical procedures, analogous to the Current Procedural Terminology ontology used in the United States), and d) the Dictionary of Medicines and Devices (DM+D) which is used to record primary care prescriptions. Terms were automatically extracted from documents and counted using regular expressions in Python 3.6³. We manually extracted and counted terms across five randomly chosen algorithms to verify the automatically-generated counts.

¹ http://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42
² https://www.caliberresearch.org/portal/phenotypes
³ https://www.python.org/

EHR phenotype validation is a critical process guiding the subsequent use of algorithms and we were interested in what types, if any, of evidence were available to external researchers. We classified the available material into six non-overlapping categories which encapsulate all potential approaches for obtaining validity evidence (adapted from [Denaxas et al., 2019] and recorded as used/not used):
• Aetiological: Are the prospective associations with risk factors consistent with previous published evidence from both EHR and non-EHR studies?
• Prognostic: Are the risks of subsequent events plausible and consistent with existing domain knowledge?
• Case-note review: What is the positive predictive value (PPV) and the negative predictive value (NPV) when comparing the algorithm with clinician-led review of case notes, self-reported information or a suitable "gold standard" source?
• Cross-EHR-source concordance: To what extent is the phenotype concordant across EHR sources?
• Genetic: Are the observed genetic associations plausible and consistent in terms of magnitude and direction of association with associations reported from non-EHR studies?
• External populations: Has the algorithm been evaluated in different countries or external sources?

For each algorithm, we documented the EHR sources the phenotype is derived from (i.e. primary care, hospital care, mortality register). We extracted information on the representation components of phenotypes e.g. the presence of tabular data and the use of a flowchart (or other graphical presentation). We extracted and categorized information on the different types of implementation logic, temporality and algorithm implementation patterns (Table 1), partially based on previous research in the US [Conway et al., 2011].

Concept | Definition | Example
Simple Boolean | Simple Boolean statements e.g. "AND", "OR" | PVD diagnosis during a primary care consultation OR diagnosis of leg or aortic embolism or thrombosis during a hospitalization
Complex Boolean | Nested statements with multiple layers? | IF patient = diabetic: HT threshold: SBP ≥140 mmHg OR DBP ≥90 mmHg ELSE: threshold SBP ≥150 mmHg OR DBP ≥90 mmHg
Negation | Are negation statements used? | No AF diagnosis term is present, but the patient record includes a warfarin prescription in the absence of prior DVT or PE, or a digoxin prescription but no HF
Temporal (simple) | Temporal proximity in future or past | Iron deficiency anaemia record in primary care OR hospital AND endoscopy in 30 days
Temporal (complex) | Complex temporal rules, multiple logic layers | ≥3 high SBP/DBP readings within 1-year OR ≥2 high SBP/DBP readings in a 6-month period
Biomarker | Evidence from continuous measurement | Presence of a positive rheumatoid factor test or anti-cyclic citrullinated peptide antibody test after a rheumatoid arthritis diagnosis
Complex calculation | Calculation e.g. unit conversion | Calculate average BMI in consultation, exclude measurements <10 kg/m2 or >80 kg/m2

Table 1: Characteristics of implementation logic, temporality and algorithmic implementation features extracted and analyzed from phenotyping algorithms in the UK Biobank and the CALIBER resources. AF Atrial Fibrillation; BMI Body Mass Index; BP Blood Pressure; DBP Diastolic Blood Pressure; DVT Deep Vein Thrombosis; PVD Peripheral Vascular Disease; PE Pulmonary Embolism; HT Hypertension; HF Heart Failure; mmHg millimeter of mercury; SBP Systolic Blood Pressure.

4. Results

We identified and reviewed 70 EHR phenotyping algorithms (Table 2) available from the UK Biobank (n=19) and the CALIBER resource (n=51). The majority of phenotyping algorithms were created to ascertain disease status (n=54) (e.g. heart failure [Gho et al., 2018; Uijl et al., 2019], depression [Daskalopoulou et al., 2016]), ten algorithms were created to extract information on biomarkers (e.g. heart rate [Archangelidi et al., 2018], blood pressure [Rapsomaniki et al., 2014]) and six algorithms were used to identify lifestyle risk factors (e.g. alcohol [Bell et al., 2017], smoking [Pujades-Rodriguez et al., 2015]).

All but one CALIBER phenotyping algorithm (n=50) used information from primary care EHR; the exception was socioeconomic status, which was defined using the Index of Multiple Deprivation (IMD) provided by the ONS. Algorithms defining biomarker measurements (e.g. white blood cells, heart rate) were based entirely on primary care EHR while approximately half of the algorithms ascertaining disease status (n=19 of 35) combined information across all three EHR sources. All currently available UK Biobank algorithms (n=19) combined information recorded during the baseline assessment (data not shown), diagnoses and/or surgical procedures recorded during hospitalization and information based on the underlying (or secondary) cause of death which is recorded in the national mortality register. Primary care linkages in UK Biobank are still underway and as a result none of the currently available algorithms utilized information from primary care EHR. However, primary care information for just under half of the cohort (n=230,000) will be made available for UK Biobank researchers in June 2019. Algorithms incorporating primary care data for the conditions already covered have been or are being developed [Wilkinson et al., 2019]. Along with a range of additional algorithms expanding the range of health outcomes available, they will be available from UK Biobank later in 2019. Overall, based on current publicly available information from CALIBER and UK Biobank, 75% (n=66) of algorithms used data from secondary care EHR and 45% (n=49) used information available in the death registry.

The most widely-used clinical terminology was Read with 4,729 (non-unique) terms used across all algorithms while the second highest number of terms was derived from the DM+D with 2,273 (non-unique) terms used to record prescriptions in primary care EHR. Four algorithms (body mass index, socioeconomic deprivation, sex, heart rate) did not use any terms across any terminology systems and were based on information which is derived from a structured field of the EHR or externally linked such as in the case of IMD. The atrial fibrillation algorithm used the highest number of clinical terms (n=987) while across all algorithms the pregnancy phenotype used the highest number of Read codes (n=1,948). ICD-9 was the terminology least used: in the UK Biobank it is used for recording diagnoses in older Scottish hospital records and in CALIBER it is used to record the cause of death prior to 1997. Algorithms defining biomarkers contained the lowest number of terminology terms as they relied on structured data fields combined with a small number of diagnosis terms to denote the type of test (e.g. Read code "42K..00 Eosinophil count").

With regards to algorithm implementation logic, 66 (93%) of algorithms used Boolean statements, usually to identify the presence of one or more diagnosis codes in a patient's EHR. Where Boolean statements were deployed, in nearly half of the cases these were complex and involved either a series of nested statements or joined information across multiple sources, for example in the UK Biobank where information is derived from self-reported, hospital and mortality sources and events are further stratified as 'prevalent' (first reported prior to recruitment) or 'incident' (first reported after recruitment). A similar pattern of logic was observed with regards to temporality where 66 algorithms utilized temporal rules and almost always this included more complex statements and restrictions. Finally, approximately half (n=43) of the algorithms used negation. Only ten algorithms (16%) included more complex calculations, usually to calculate the mean of multiple measurements on the same day or to harmonize units for laboratory measurements to a common format.

Prognostic 86% (n=66) and cross-source concordance 54% (n=43) validation approaches were the most widely-used algorithm evaluation approaches. The least widely-used validation approach was expert case note review, although this type of validation has been completed for a few UK Biobank algorithms, including dementia and its subtypes [Wilkinson et al., 2019], and is underway for several others. Most (93% [n=66]) of the algorithms used data stored in tabular format since tables are predominantly used to store lists of controlled clinical terminology terms. Only 25% (n=15) of algorithms included a graphical representation of the algorithm using a flowchart and all algorithms included a textual description of the algorithm components.

5. Discussion

In this study we downloaded and reviewed 70 EHR phenotyping algorithms from two large-scale, national research resources in the UK. We reviewed algorithms in terms of EHR data sources, controlled clinical terminologies used, available evidence of algorithm validation, algorithm representation formats and implementation logic patterns.

Similar to findings from US studies, we discovered that UK EHR algorithms make extensive use of Boolean statements and temporal logic. When these are used, they are often complex i.e. combining multiple nested Boolean layers of logic and defining temporal proximity rules within them. This is expected given that algorithms utilize multiple sources of information and include evidence from primary care and hospital care (or self-reported information in the case of the UK Biobank). Algorithms defining disease status were the most frequent and complex algorithms reviewed and utilized the greatest number of terms from controlled clinical terminologies. Negation was another major component of algorithms and is often used to exclude concomitant diagnoses or procedures when trying to ascertain diseases based on secondary information (e.g. ascertaining AF cases based on a prescription of digoxin but excluding patients who are diagnosed with HF).

The Read clinical terminology was the most popular terminology used with the highest number of terms per phenotype. These findings are expected as Read contains a significant amount of duplication internally due to synonym terms which can be potentially utilized. Additionally, the clinical concepts contained within Read subsume the concepts across all other terminologies i.e. Read contains terms for diagnoses, symptoms, laboratory tests, prescriptions and procedures. UK primary care clinical coding is currently transitioning to SNOMED-CT which should provide a more streamlined set of terms to be used.

In terms of validation, we observed a significant level of heterogeneity with approaches seeking to evaluate and replicate previously reported aetiological and prognostic estimates from non-EHR studies being the most popular. The presentation of the evidence however does not follow a common standard and sometimes only included references to published research rather than a more structured abstract of the main findings of the analyses. In contrast with the US, expert review of case records was the least frequently used approach for evaluation due to the fact that large scale corpora of medical text do not exist in the UK owing to information governance restrictions and the technical challenges of integrating such data since they are held in a wide range of formats by multiple different NHS organisations. For similar reasons, none of the algorithms reviewed utilize medical text and natural language processing approaches to extract information from medical notes, an approach which is prevalent in some clinical specialties such as mental health [Wu et al., 2018].

Significant heterogeneity was also observed in terms of representation. UK Biobank algorithms were curated in individual PDF files⁴ and included extended information on the goal of the algorithm and useful background knowledge and references. In contrast, CALIBER phenotypes were stored in an online, openly-available Portal⁵, spanned multiple pages and did not include much background information. Flowcharts or similar graphical representations were not widely used and while they are not machine-readable, they can potentially minimize errors during translation of the algorithm to machine code.

⁴ http://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42
⁵ https://www.caliberresearch.org/portal

Our study has potential limitations. We reviewed algorithms from only two UK sources. While other UK initiatives exist, they tend to focus on curating lists of controlled clinical terminology terms (referred to as codelists) rather than self-contained phenotypes i.e. terms, implementation, validation evidence. We only focused on rule-based approaches and did not cover machine learning approaches. While rule-based methods are the most widely used in the UK, data-driven high-throughput approaches including natural language processing methods are emerging [Zhou et al., 2016; Pikoula et al., 2019]. These approaches pose different challenges and their requirements would need to be documented and analysed in order to ensure their integration [Hripcsak & Albers, 2013]. Finally, reproducible research approaches [Denaxas et al., 2017; Goodman et al., 2016] which are covered elsewhere would also need to be carefully taken into consideration in order to ensure algorithm portability.

6. Steps towards a minimum information standard

Based on our findings, we propose that an EHR phenotyping algorithm representation combines metadata, implementation logic, validation evidence and use-cases. We suggest the following components towards establishing a minimum information standard with regards to rule-based phenotyping algorithms for UK EHR:

Part 1 – Algorithm metadata: Succinct information about the goal of the algorithm, the intended use-case, the data sources and controlled clinical terminologies used, applicable age groups and genders, list of authors and their contact details and a set of SNOMED-CT terms to classify the algorithm. A unique identifier, such as a Digital Object Identifier (DOI), should be minted to enable usage tracking in subsequent research.

Part 2 – Implementation: Details on the implementation logic of the algorithm with pseudocode to facilitate the translation to machine code and documentation on decisions made and reasoning. Where possible analytical scripts should be attached using markdown or a similar approach. The standard should support defining complex Boolean and temporal logic across multiple EHR sources and clinical terminologies. In the future, a computable phenotype format should encapsulate this information as a stand-alone file.

Part 3 – Validation evidence: Description of the steps taken to support phenotype validity across six categories (aetiological, prognostic, genetic, expert review, cross-source and external population). For each implementation, the number of cases, controls, NPV and PPV values should be reported and the format should support the embedding of graphical files (e.g. forest plots).

Part 4 – Use-cases: Links to published research utilizing the phenotype algorithms, cross-referenced with DOIs.

7. Conclusion

Our analyses identified a certain level of underlying homogeneity in terms of how phenotyping algorithms are defined and evaluated. We suggest four components towards a minimum information standard that should be used to represent phenotyping algorithms. These findings provide a crucial first step towards curating and disseminating phenotyping algorithms utilizing UK EHR. Further work is required towards establishing a computable format for phenotyping algorithms and ensuring interoperability with other resources (e.g. PheKB).

Acknowledgments

This work was supported by Health Data Research UK, which receives its funding from HDR UK Ltd (LOND1) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust. The BigData@Heart Consortium is funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No. 116074. This study was supported by the Farr Institute of Health Informatics Research at UCL Partners (MR/K006584/1). This paper represents independent research part funded by the National Institute for Health Research Biomedical Research Centre at UCLH. HH is a NIHR Senior Investigator. SD is an Alan Turing Fellow.

References

[Al Sallakh et al., 2017] Al Sallakh, M. A., et al. Defining asthma and assessing asthma outcomes using electronic health record data: a systematic scoping review. Eur. Respiratory J., 49(6), 2017.
[Archangelidi et al., 2018] Archangelidi, O., et al. Clinically Recorded Heart Rate and Incidence of 12 Coronary, Cardiac, Cerebrovascular and Peripheral Arterial Diseases in 233,970 Men and Women: A Linked Electronic Health Record Study. Eur. J. of Preventive Cardiology 25 (14): 1485–95, 2018.
[Banda et al., 2018] Banda, J. M., et al. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annual Review of Biomedical Data Science, 2018.
[Bell et al., 2017] Bell, S., et al. Association between Clinically Recorded Alcohol Consumption and Initial Presentation of 12 Cardiovascular Diseases: Population Based Cohort Study Using Linked Health Records. BMJ 356: j909, 2017.
[Brazma et al., 2001] Brazma, A., et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics, 29(4), 365–371, 2001.
[Conway et al., 2011] Conway, M., et al. Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms. Proc. Am Med Infor Assoc., 274–283, 2011.
[Daskalopoulou et al., 2016] Daskalopoulou, M., et al. Depression as a Risk Factor for the Initial Presentation of Twelve Cardiac, Cerebrovascular, and Peripheral Arterial Diseases: Data Linkage Study of 1.9 Million Women and Men. PLOS ONE 11 (4): e0153838, 2016.
[Denaxas et al., 2012] Denaxas, S., et al. Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). Int. J. Epidemiology, 41(6), 1625–1638, 2012.
[Denaxas et al., 2017] Denaxas, S., et al. Methods for enhancing the reproducibility of biomedical research findings using electronic health records. BioData Mining, 10 (31), 2017.
[Denaxas et al., 2019] Denaxas, S., et al. UK phenomics platform for developing and validating EHR phenotypes: CALIBER. J Am Med Inf 10.1093/jamia/ocz105, 2019.
[Fleurence et al., 2014] Fleurence, R., et al. Launching PCORnet, a National Patient-Centered Clinical Research Network. JAMIA 21 (4): 578–82, 2014.
[Gho et al., 2018] Gho, J., et al. An Electronic Health Records Cohort Study on Heart Failure Following Myocardial Infarction in England: Incidence and Predictors. BMJ Open 8 (3): e018331, 2018.
[Goodman et al., 2016] Goodman, S. N., et al. What does research reproducibility mean? Science Translational Medicine, 8(341), p.341ps12, 2016.
[Gottesman et al., 2013] Gottesman, O., et al. The Electronic Medical Records and Genomics (eMERGE) Network: Past, Present, and Future. Genetics in Medicine 15 (10): 761–71, 2013.
[Hemingway et al., 2018] Hemingway, H., et al. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. European Heart J., 39(16), 1481–1495, 2018.
[Hripcsak & Albers, 2013] Hripcsak, G. & Albers, D. J. Next-generation phenotyping of electronic health records. JAMIA, 20(1), 117–121, 2013.
[Kuan et al., 2019] Kuan, V., et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. The Lancet Digital Health 1(2), e63-e67, 2019.
[Mo et al., 2015] Mo, H., et al. Desiderata for Computable Representations of Electronic Health Records-Driven Phenotype Algorithms. JAMIA 22 (6): 1220–30, 2015.
[Morley et al., 2014] Morley, K., et al. Defining disease phenotypes using national linked electronic health records: a case study of atrial fibrillation. PLOS ONE, 9(11), e110900, 2014.
[Pikoula et al., 2019] Pikoula, M., et al. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Medical Informatics and Decision Making, 19(1), p.86, 2019.
[Pujades-Rodriguez et al., 2015] Pujades-Rodriguez, M., et al. Heterogeneous Associations between Smoking and a Wide Range of Initial Presentations of Cardiovascular Disease in 1937360 People in England: Lifetime Risks and Implications for Risk Prediction. Int. J. of Epidemiology 44 (1): 129–41, 2015.
[Rapsomaniki et al., 2014] Rapsomaniki, E., et al. Blood Pressure and Incidence of Twelve Cardiovascular Diseases: Lifetime Risks, Healthy Life-Years Lost, and Age-Specific Associations in 1·25 Million People. The Lancet 383 (9932): 1899–1911, 2014.
[Springate et al., 2014] Springate, D., et al. ClinicalCodes: an online clinical codes repository to improve the validity and reproducibility of research using electronic medical records. PLOS ONE, 9(6), e99825, 2014.
[Sudlow et al., 2015] Sudlow, C., et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine, 12(3), e1001779, 2015.
[Uijl et al., 2019] Uijl, A., et al. Risk Factors for Incident Heart Failure in Age- and Sex-Specific Strata: A Population-Based Cohort Using Linked Electronic Health Records. Eur. J. Heart Failure, 10.1002/ejhf.1350, 2019.
[Wilkinson et al., 2019] Wilkinson, T., et al. Identifying dementia outcomes in UK Biobank: a validation study of primary care, hospital admissions and mortality data. Eur Jour Epidemiology, 10.1007/s10654-019-00499-1, 2019.
[Wu et al., 2018] Wu, H., et al. SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. JAMIA, 25(5), 530–537, 2018.
[Zhou et al., 2016] Zhou, S.-M., et al. Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis. PLOS ONE, 11(5), p.e0154515, 2016.
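The Methods state that terminology terms were extracted and counted with regular expressions in Python, and that counts are non-unique. The expressions themselves are not given in the paper, so the following is only a minimal sketch of how such a count might be produced. The two patterns, the `count_terms` helper and the sample string are assumptions made for illustration; only the Read code "42K..00" is taken from the text.

```python
import re
from typing import Dict

# Illustrative patterns only (assumptions, not the authors' expressions).
# Read: seven characters, the last two being a numeric term suffix,
#       with dots allowed as padding (e.g. "42K..00").
# ICD-10: a capital letter, two digits and an optional decimal part.
PATTERNS = {
    "Read": re.compile(r"\b[0-9A-Za-z][0-9A-Za-z.]{4}[0-9]{2}\b"),
    "ICD-10": re.compile(r"\b[A-Z][0-9]{2}(?:\.[0-9]{1,2})?\b"),
}


def count_terms(text: str) -> Dict[str, int]:
    """Count non-unique term occurrences per terminology in a document."""
    return {name: len(p.findall(text)) for name, p in PATTERNS.items()}


# Made-up document fragment; codes other than 42K..00 are placeholders.
sample = (
    "Atrial fibrillation: Read G573000, G573.00; ICD-10 I48. "
    "Eosinophil count: Read 42K..00."
)
counts = count_terms(sample)
print(counts)
```

Patterns this loose would over-count in free text, which is one reason a manual check on a sample of algorithms, as the authors describe, is a sensible safeguard.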
Table 2 column groups: Source (PC, SC, MR); Terminology (Read, ICD10, ICD9, OPCS, DM+D); Validation (Aetiology, Prognosis, Case note, Genetic, External, Source); Format (Tabular, Flowchart); Implementation (Boolean, Complex Boolean, Negation, Temporal, Complex temporal, Biomarker, Complex calculation).

Phenotype | Source marks (PC, SC, MR) | Term counts (Read, ICD10, ICD9, OPCS, DM+D) | Validation, format and implementation marks

CALIBER
AAA | + + + | 32 6 6 46 0 | + + + + + + +
AD | + + + | 36 17 7 0 0 | + + + + + + +
AF | + + + | 523 5 0 396 63 | + + + + + + + + + + +
Alcohol | + | 141 0 0 0 0 | + + + + + + + +
AMI | + + + | 43 18 14 2 0 | + + + + + + + + + + + +
AU | + + | 38 6 0 0 15 | + + + + + + + + +
Bleeding | + + + | 131 14 0 17 0 | + + + + + + + + + +
BMI | + | 0 0 0 0 0 | + + + +
BP | + | 67 0 0 0 0 | + + + + +
CHD | + + + | 30 8 9 0 0 | + + + + + + +
Dementia NS | + + + | 36 17 7 0 0 | + + + + + + +
Depression | + + | 152 15 0 0 0 | + + + + + +
Deprivation | | 0 0 0 0 0 | +
Diabetes | + + | 141 4 0 0 0 | + + + + + + + + +
Eosinophils | + | 4 0 0 0 0 | + + + + + + + +
Ethnicity | + + | 104 0 0 0 0 | + + + + + +
GCA | + + | 7 1 0 0 18 | + + + + + + + +
Gender | + | 0 0 0 0 0 | +
HCM | + + | 81 2 0 41 557 | + + + + + + + + +
HDL | + | 4 0 0 0 0 | + + + + + + + +
HF | + + + | 93 6 9 0 0 | + + + + + + + + + +
HIV | + + + | 35 25 0 0 | + + + + + +
HR | + | 0 0 0 0 0 | + + + + + +
HT | + + | 84 5 0 2 0 | + + + + + + + + +
ICH | + + + | 17 1 1 0 0 | + + + + + + +
Influenza | + | 62 0 0 0 0 | + + +
Isch. Stroke | + + + | 10 1 2 0 0 | + + + + + + +
LDL | + | 5 0 0 0 0 | + + + + + + +
Lymphocytes | + | 10 0 0 0 0 | + + + + + + +
MS | + + | 10 1 0 0 15 | + + + + + + + + + +
Neutrophils | + | 6 0 0 0 0 | + + + + + + +
Obesity | + + + | 105 1 0 50 0 | + + + + +
PAD | + + + | 201 6 5 71 0 | + + + + + + + +
BuP | + + | 25 6 0 0 286 | + + + + + + + + +
PBC | + + | 4 1 0 0 21 | + + + + + + + + +
PMR | + + | 3 2 0 0 90 | + + + + + + + +
Pregnancy | + + | 1948 0 0 0 0 | + + +
Psoriasis | + + | 82 0 0 0 453 | + + + + + + + + +
RA | + + | 75 18 0 0 72 | + + + + + + + + + +
SA | + + | 181 3 0 67 674 | + + + + + + + + +
SAH | + + + | 11 1 1 0 0 | + + + + + + +
SCD | + + + | 32 6 2 16 0 | + + + + + + + + +
Scleroderma | + + | 5 5 0 0 9 | + + + + + + + + + +
Smoking | + | 21 3 0 0 0 | + + + + + +
Stroke NS | + + + | 17 6 3 1 0 | + + + + + + +
TIA | + + + | 15 2 0 0 | + + + + + + +
Triglycerides | + | 6 0 0 0 0 | + + + + + + +
UA | + + | 12 4 0 0 0 | + + + + + + + +
UCD | + + + | 32 6 2 16 0 | + + + + + +
VD | + + + | 36 17 7 0 0 | + + + + + + +
WBC | + | 16 0 0 0 0 | + + + + + + +

UK Biobank⁶
AD | n/a + + | n/a 32 9 0 n/a | + + + + + + + +
AMI | n/a + + | n/a 23 17 0 n/a | + + + + + +
Asthma | n/a + + | n/a 6 6 0 n/a | + + + + + +
COPD | n/a + + | n/a 11 4 0 n/a | + + + + + +
Dementia NS | n/a + + | n/a 32 9 0 n/a | + + + + + + + +
ESRD | n/a + + | n/a 18 0 37 n/a | + + + + + + + +
FTD | n/a + + | n/a 32 9 0 n/a | + + + + + + + +
ICH | n/a + + | n/a 32 7 0 n/a | + + + + + +
Isch. Stroke | n/a + + | n/a 32 7 0 n/a | + + + + + +
MND | n/a + + | n/a 1 1 0 n/a | + + + + + +
MSA | n/a + + | n/a 19 3 0 n/a | + + + + + + + +
NSTEMI | n/a + + | n/a 23 17 0 n/a | + + + + + +
Parkinsonism | n/a + + | n/a 19 3 0 n/a | + + + + + + + +
PD | n/a + + | n/a 19 3 0 n/a | + + + + + + + +
PSP | n/a + + | n/a 19 3 0 n/a | + + + + + + + +
SAH | n/a + + | n/a 32 7 0 n/a | + + + + + +
STEMI | n/a + + | n/a 23 17 0 n/a | + + + + + +
Stroke NS | n/a + + | n/a 32 7 0 n/a | + + + + + +
VD | n/a + + | n/a 32 9 0 n/a | + + + + + + + +

⁶ Primary care EHR available for participants in 2019; case-note review validation underway for multiple phenotypes.

Table 2. Information on EHR data sources, controlled clinical terminologies, available evidence of algorithm validation, algorithm representation format and implementation logic patterns from UK Biobank and CALIBER EHR phenotype algorithms. PC Primary Care; SC Secondary Care; MR Mortality Register; AAA Abdominal Aortic Aneurysm; AD Alzheimer's Disease; AF Atrial Fibrillation; AMI Acute Myocardial Infarction; AU Autoimmune Uveitis; BMI Body Mass Index; BP Blood Pressure; BuP Bullous Pemphigoid; CHD Coronary Heart Disease; FTD Frontotemporal dementia; GCA Giant Cell Arteritis; HCM Hypertrophic Cardiomyopathy; HDL High Density Lipoprotein cholesterol; HF Heart Failure; HIV Human Immunodeficiency Virus; HR Heart Rate; HT Hypertension; ICH Intracerebral Haemorrhage; LDL Low Density Lipoprotein cholesterol; MS Multiple Sclerosis; NS Not Specified; PAD Peripheral Arterial Disease; PBC Primary Biliary Cirrhosis; PMR Polymyalgia Rheumatica; RA Rheumatoid Arthritis; SA Stable Angina; SAH Subarachnoid Haemorrhage; SCD Sudden Cardiac Death; TIA Transient Ischaemic Attack; UA Unstable Angina; UCD Unheralded Coronary Death; VD Vascular Dementia; WBC White Blood Cell Count; COPD Chronic Obstructive Pulmonary Disease; ESRD End Stage Renal Disease; MND Motor Neuron Disease; PD Parkinson's Disease and Parkinsonism; MSA Multiple System Atrophy; PSP Progressive Supranuclear Palsy; STEMI ST-Elevation AMI; NSTEMI Non-ST Elevation AMI.
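To make the four proposed components of the minimum information standard (Section 6) more concrete, the sketch below shows one way they could be held together in a single record. It is an illustration under stated assumptions rather than a proposed schema: the class name `PhenotypeRecord`, its field names and all example values are hypothetical. The implementation function transcribes the "Complex Boolean" hypertension example from Table 1 as printed there.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The six validation categories named in Part 3.
VALIDATION_CATEGORIES = (
    "aetiological", "prognostic", "genetic",
    "expert review", "cross-source", "external population",
)


def hypertension_rule(sbp: float, dbp: float, diabetic: bool) -> bool:
    """Complex Boolean example from Table 1 (thresholds in mmHg)."""
    if diabetic:
        return sbp >= 140 or dbp >= 90
    return sbp >= 150 or dbp >= 90


@dataclass
class PhenotypeRecord:  # hypothetical name, not a published schema
    # Part 1: algorithm metadata
    name: str
    goal: str
    sources: List[str]
    terminologies: List[str]
    identifier: str
    # Part 2: implementation logic
    implementation: Callable[[float, float, bool], bool]
    # Part 3: validation evidence, recorded as used / not used
    validation: Dict[str, bool] = field(default_factory=dict)
    # Part 4: use-cases, e.g. DOIs of studies using the phenotype
    use_cases: List[str] = field(default_factory=list)

    def missing_validation(self) -> List[str]:
        """Categories with no evidence recorded as used."""
        return [c for c in VALIDATION_CATEGORIES
                if not self.validation.get(c, False)]


# All values below are placeholders for illustration, not data from Table 2.
record = PhenotypeRecord(
    name="Hypertension (HT)",
    goal="Identify patients meeting the blood pressure thresholds",
    sources=["primary care"],
    terminologies=["Read"],
    identifier="doi:placeholder",
    implementation=hypertension_rule,
    validation={"prognostic": True},
)
print(record.missing_validation())
```

Keeping the implementation as a callable alongside the metadata is only one option; a computable stand-alone file format, as the authors suggest for future work, would instead need a serializable representation of the same logic.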