Enhancing the Human Phenotype Ontology for Use by the Layperson

Nicole A. Vasilevsky, Mark E. Engelstad, Erin D. Foster, Melissa A. Haendel
Ontology Development Group, Library, Oregon Health & Science University
Portland, OR, USA
vasilevs@ohsu.edu

Christopher J Mungall
Environmental Genomes and Systems Biology, Lawrence Berkeley National Laboratory
Berkeley, CA, USA

Peter Robinson, Sebastian Köhler
Charité - Universitätsmedizin Berlin
Berlin, Germany
sebastian.koehler@charite.de

Abstract: In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.

Keywords: Human Phenotype Ontology, Synonyms, Rare Disease, Patient phenotypes

I. INTRODUCTION

Every person has a unique collection of phenotypes, or physical and physiological characteristics or traits. Diseases can be characterized by symptoms and abnormal phenotypes, and many diseases are caused by underlying genetic variations. Use of genetic analyses like whole genome sequencing can help inform disease diagnosis, as can analysis of the corresponding patient phenotypes. However, although the cost and ease of collecting and analyzing genomic data have improved rapidly [1], collecting the phenotypic data has not become more standardized, convenient, or less expensive [2], limiting algorithmic approaches. Thus a major challenge in clinical care and research aimed at understanding genetic diseases is phenotyping patients accurately, yet efficiently.

This is a particular challenge for patients with rare or undiagnosed diseases. In these cases, the patients themselves are a valuable resource and may be the best source of phenotyping information on their condition. Not only do patients live with their condition, but they often have a wealth of knowledge about their condition, especially those who have been evaluated by multiple clinicians. In fact, the only person who may have all of the information about a patient’s phenotype is the patient him/herself. A few remarkable stories exist highlighting cases where patients’ phenotyping and investigations have led to a diagnosis, such as for NGLY1 [3], or Jill Viles [4], who, despite skepticism from her doctors, managed not only to diagnose herself but also to reveal fundamental biology of the Lamin protein. While these particular cases are exceptional, many patients could further their own diagnoses with improved phenotyping.

In order to maximize the usefulness of accurate phenotyping for clinical diagnosis, and to build cohorts of patients for gene discovery, a standard vocabulary is essential. The use of a standardized vocabulary can ensure proper understanding of terminology across different users, such as patients and healthcare professionals. Therefore, using a controlled vocabulary that provides synonyms and definitions for the medical terminology is valuable. To this end, the Human Phenotype Ontology (HPO) (http://www.human-phenotype-ontology.org/) was developed for describing phenotypic abnormalities encountered in human disease to facilitate “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured using a logically constructed hierarchy of phenotypic terms [5].

In a clinical setting, these phenotypes are defined using medical terminology, which can be difficult for patients to understand. The terminology gap between medical professionals and non-medical experts has long been recognized in many areas of medical practice. The degree to which patients understand the terminology used in medical encounters has been evaluated through various methods and across different disciplines [6-9]. This research has consistently acknowledged and expressed the importance of making the terminology used in medical encounters more accessible to patients. Numerous organizations have developed term lists that align medical terms with lay language as well as provide guidance on communicating with the public about health issues [10-13]. Additionally, there are information resources, such as MedlinePlus, that provide access to curated, quality health information on a variety of topics in patient-friendly language [14].

While these resources increase accessibility and comprehension of medical terminology for health consumers, other structured vocabularies have been developed to enable cross communication, and comprehension, between non-specialists and medical professionals. These “consumer health vocabularies”, or CHVs, provide patient-friendly terms that are often mapped (or aligned) to established medical terminologies [15-17]. For example, the Unified Medical Language System (UMLS) aims to include lay terms as synonyms or quasi-synonyms in its Metathesaurus, through various efforts (quasi-synonyms are terms that are not precisely the same) [15]. To this end, the UMLS Metathesaurus was enhanced with the Dictionary of American Regional English extension to map consumer terms for diabetes to medical terms [14]. These vocabularies are generally broad, containing layperson equivalents for clinical findings as well as medical procedures and equipment. Mapping to standardized terminologies promotes interoperability between disparate sources of health information and enables development of informatics tools that assist patients with aspects of their medical care, such as filling out family histories [18].

The terms for CHVs are frequently sourced from online forums and patient-friendly websites focused on health information and medical conditions. One example of this type of online forum is the patient registry. A patient registry is a researcher-generated platform that is “an organized system that uses observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure, and that serves a predetermined scientific, clinical, or policy purpose(s)” [19]. Patient registries are valuable resources for patients who share or are affected by a disease to learn more about their disease and connect with community members. Inspire (http://corp.inspire.com/) is a platform for patients to engage and share amongst disease-specific communities. With patient permission, Inspire promotes primary and secondary research and analyses based on community contributions. PatientsLikeMe (https://www.patientslikeme.com/) is a health data sharing platform where patients can share information and connect. In addition, these platforms capture how patients refer to their diseases and symptoms, which is how these forums most directly contribute to the development of consumer health vocabularies [20]. These online platforms can also reveal the developing health literacy of patients, particularly with regard to their specific conditions [21]. Depending on the condition and timeframe, health consumers can become quite proficient in understanding and using medical terminology as it pertains to their particular condition or disease. In many ways, patients can become adept in recognizing and applying medical terminology to symptoms or other aspects of their condition over time.

Recognizing that patients are experts in their medical history and at keeping track of their genetic information, GenomeConnect (https://connect.patientcrossroads.org/?org=GenomeConnect) was conceived by ClinGen (Clinical Genome Resource, http://clinicalgenome.org/), an NIH-funded resource of clinical and laboratory geneticists and genetic counselors at over 24 institutions, as a registry to empower patients to help researchers and clinicians understand the genetic contributions to health and disease. GenomeConnect was built on the premise that: “As the utility of genetic and genomic testing in healthcare grows, there is need for a high-quality genomic knowledge base to improve the clinical interpretation of genomic variants. Active patient engagement can enhance communication between clinicians, patients, and researchers, contributing to knowledge building. It also encourages data sharing by patients and increases the data available for clinicians to incorporate into individualized patient care, clinical laboratories to utilize in test interpretation, and investigators to use for research” [22]. To this end, GenomeConnect developed a self-phenotyping survey that generates HPO phenotype profiles. Patients use GenomeConnect to enter their information for researchers and clinicians to use, facilitating the diagnostic evaluation as well as research. Not only may “self-phenotyping” be an accurate and comprehensive source of data on patients, it also empowers patients, which may be particularly beneficial to the undiagnosed disease population.

To make the HPO more accessible to patients in GenomeConnect and other patient registries, we aimed to add non-expert terminology to the HPO in the form of synonyms for phenotype classes, as patients are often unfamiliar with technical terminology or may misinterpret meanings without a proper definition or explanation. Similarly, health care providers may be unfamiliar with the colloquial expressions. The goal of this project was to systematically review the current terminology in the Human Phenotype Ontology and to 1) apply lay synonyms to current classes and 2) tag existing classes as layperson where applicable. This resulted in the addition of 6,240 synonyms or primary labels marked as layperson. The layperson classes are available in the current release of HPO, and 44% of synonyms are classified as layperson. Addition of the layperson synonyms to the HPO will increase accessibility for patients to use the HPO, enhance interoperability for clinicians, and enable crowdsourcing by citizen scientists.

II. METHODS

While an initial review of the HPO OWL file was done in Protégé, to expedite the process and make it easier to evaluate patterns in the labels, the entirety of the HPO was downloaded to a collaborative spreadsheet and manually evaluated by members of the HPO development team. The work was divided amongst curators with clinical and biomedical expertise who cross-reviewed each other’s work.

Synonyms in the HPO are classified as exact, broad, narrow or related. Exact synonyms are precise alternatives to the HPO term, broad synonyms are more general than the HPO term, narrow synonyms are more specific than the HPO term, and related synonyms are associated with the HPO term.

In order to find appropriate synonyms, several methods were used. First we checked online knowledge bases such as Wikipedia, MedlinePlus, Mayo Clinic (http://www.mayoclinic.org/), Online Mendelian Inheritance in Man (OMIM, http://www.omim.org/) and the Elements of Morphology (https://elementsofmorphology.nih.gov/). Next we referred to other ontologies, terminologies, and texts such as Uberon (for anatomical site synonyms), SNOMED CT browsers (e.g., IHTSDO), and specialty medical texts like Gorlin’s Syndromes of the Head and Neck, or other similar sources [23]. We attempted to reuse synonym sub-strings across similar terms: for example, the layperson term ‘absent’ was used for classes using the quality aplasia (PATO_0001483), and for anatomical classes the synonym ‘tailbone’ was added to all classes using ‘coccyx’ (UBERON_0001350).

The terms were scripted into the HPO OWL file. Automated quality checks on the ontology were performed, such as checking for classes with the same label or exact synonym; character encoding; and formatting in title-case. We integrated these into our workflow using Travis CI (https://travis-ci.org/). The curation team also performed an exhaustive manual review for consistency across the hierarchy, and checked for errors or inconsistencies. The file is available at: http://www.human-phenotype-ontology.org (under Downloads).

III. OUTCOMES

The inclusion of these plain language synonyms will support patient-driven applications for deep phenotyping that can be utilized clinically and computationally, as depicted in Figure 1.

Figure 1: Patients and medical providers can search HPO for phenotypes of medical conditions such as Apert’s syndrome using layperson or medical terms. Apert’s syndrome image is provided for illustration only and credits are available from monarchinitiative.org.

As a result of this effort, the HPO now contains a total of 14,253 synonyms for all of the existing classes. Of these synonyms, 6,240 are marked as lay synonyms (Table I). Synonyms were either added to existing classes, or existing classes were tagged as layperson. New synonyms were typed as exact, broad, related or narrow. The final numbers of each type are reported in Table I.

Table I: Layperson synonyms in HPO

                  All synonyms in HPO    Layperson synonyms (marked as lay)
All synonyms      14253                  6240
Exact             12167                  5357
Broad             441                    298
Related           1236                   419
Narrow            409                    166

Information content of HPO classes

We aimed to understand the impact of adding lay synonyms to the HPO if they were to be used for disease diagnostics or patient-led cohort discovery. To this end, we evaluated the information content (IC) of HPO classes that were tagged as layperson or that contain layperson synonyms. Mathematically, the IC is expressed as the negative logarithm of the frequency with which the class is used to describe a disease, i.e. more general classes (such as ‘Abnormality of the nervous system’) have a low IC and very specific classes (such as ‘Spinal cord posterior columns myelin loss’) have a high IC. Figure 2 shows the distribution of the IC for the HPO classes with a label or synonym marked as layperson. The analysis shows that a major fraction of layperson synonyms were added to very specific HPO classes. This could substantially help in the differential diagnostic process for HPO users, because searching for and identifying diseases with specific HPO classes is now easier when users do not know the specific medical terms.

[Figure 2 plot, titled ‘HPO classes with layperson synonym(s)’. x-axis: Information content of HPO class (0.0 to 10.0); y-axis: Count (square root scale).]

Figure 2: The distribution of the IC of the HPO classes with layperson synonyms. The average IC of these classes is 7.4.

IV. CHALLENGES

The process of adding layperson synonyms gave rise to several challenges.

Layperson terms were not added to all the HPO classes. As exemplified in Figure 2, some HPO classes already used layperson terminology, so they were tagged as layperson, and an additional layperson synonym was not added. In some cases, a layperson term simply does not exist; for example, it is difficult to describe a joint contracture using non-medical terminology. In some instances, the layperson version of an HPO class might be the literal definition in the HPO, which we tried to avoid. For example, the term ‘Vasculitis’ (HP_0002633) is defined as ‘Inflammation of blood vessel’, which would be a likely addition as a layperson synonym. In adding synonyms, questions emerged as to whether or not certain synonyms were useful to add. An example is the bones in the body - many of these have assigned names (e.g., radius, coccyx). In some instances, as with coccyx, ‘tailbone’ has emerged as a widely used synonym; however, in other cases, a potential synonym not only strongly resembles the definition of the term (like using ‘short bone in forearm’ as a synonym for ‘radius’), it also may not be a term widely used amongst laypeople or clinicians. In the case of radius and ulna, these are both forearm bones, but there is not a way to differentiate them in layperson terminology.

Another challenge was ensuring that the application of a layperson synonym aligned with the definition of the assigned HPO class. For example, colorblindness could be broadly used to describe many classes such as HP_0007641 ‘Dyschromatopsia’ or HP_0007803 ‘Monochromacy’, but there are specific differences between these two classes, with dyschromatopsia being defined as ‘A form of colorblindness in which only two of the three fundamental colors can be distinguished due to a lack of one of the retinal cone pigments’ and monochromacy defined as ‘Complete color blindness, a complete inability to distinguish colors. Affected persons cannot perceive colors, but only shades of gray’. These two classes were therefore given more specific layperson synonyms, ‘colorblindness’ and ‘total colorblindness’, respectively. The subclasses of dyschromatopsia were assigned more specific layperson synonyms as well, such as HP_0011521 Deuteranopia, layperson synonym: Green-blind, and HP_0011522 Protanopia, layperson synonym: Red-blind, even though these classes may be more broadly referred to as colorblindness.

It was also necessary to recognize the relationships within the ontology and to apply layperson synonyms consistently across classes and sub-classes. For example, the layperson synonym, ‘Yellowing of the skin’, was added to the HPO class, ‘Jaundice’. In order to maintain consistency in the application of layperson synonyms, ‘Yellowing of the skin’ also needed to be added to the sub-classes ‘Intermittent jaundice’ and ‘Prolonged neonatal jaundice’.

V. NEXT STEPS

A next step is to develop a method of validating the added layperson synonyms in order to determine whether or not they are reflective of terms actually used and recognized by patients and clinicians alike. This will be done by the HPO development team and via a crowdsourcing approach. We will encourage crowdsourcing for requests for additional layperson synonyms, as well as for validating the existing layperson terms. Validation would also assist with determining which layperson synonym is marked as ‘primary’ within the HPO, so that a lay version of the HPO can be used in software applications and surveys geared towards patients.

VI. CONCLUSIONS

The addition of layperson synonyms increases the usability of the HPO, making it useful for data interoperability across clinicians and patients. Additionally, this work will enable crowdsourcing by citizen scientists. The layperson synonyms are included in the current release of the HPO, available at www.purl.obolibrary.org/obo/hp.owl. Community contributions are welcome and can be submitted to our issues tracker: https://github.com/obophenotype/human-phenotype-ontology.

ACKNOWLEDGMENT

This work is supported by NIH Office of Director grant: 1R24OD011883. Thank you to Tudor Groza and Julie McMurry for their help. Apert’s syndrome image credits are available from monarchinitiative.org.

REFERENCES

[1] Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB. The real cost of sequencing: higher than you think! Genome Biol. 2011;12(8):125. doi: 10.1186/gb-2011-12-8-125.
[2] Hunter L. Computational challenges of mass phenotyping. Pac Symp Biocomput. 2013:454-455.
[3] Might M, Wilsey M. The shifting model in clinical diagnostics: how next-generation sequencing and families are altering the way rare diseases are discovered, studied, and treated. Genet Med. 2014;16(10):736-737. doi: 10.1038/gim.2014.23. Epub 2014 Mar 20.
[4] Epstein D. The DIY Scientist, the Olympian, and the Mutated Gene. 2016; https://www.propublica.org/article/muscular-dystrophy-patient-olympic-medalist-same-genetic-mutation. Accessed January 17, 2016.
[5] Kohler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42(Database issue):D966-974. doi: 10.1093/nar/gkt1026. Epub 2013 Nov 11.
[6] Spiro, D. and F. Heidrich (1983). "Lay understanding of medical terminology." Journal of Family Practice 17(2): 277-279.
[7] Pieterse, A. H., et al. (2013). "Lay understanding of common medical terminology in oncology." Psycho-Oncology 22(5): 1186-1191.
[8] Chapman, K., et al. (2003). "Lay understanding of terms used in cancer consultations." Psycho-Oncology 12(6): 557-566.
[9] Barker, K. L., et al. (2009). "Divided by a lack of common language? A qualitative study exploring the use of language by health professionals treating back pain." BMC Musculoskeletal Disorders 10: 123.
[10] http://www.cdc.gov/other/pdf/everydaywordsforpublichealthcommunication_final_11-5-15.pdf
[11] http://www.portland.va.gov/research/documents/hrpp/glossary-of-lay-terms.pdf
[12] http://hso.research.uiowa.edu/medical-terms-lay-language
[13] https://humansubjects.stanford.edu/new/docs/glossary_definitions/lay_language.pdf
[14] Miller, N., et al. (2000). "MEDLINEplus: building and maintaining the National Library of Medicine's consumer health Web service." Bulletin of the Medical Library Association 88(1): 11-17.
[15] Tse T, Soergel D. Exploring Medical Expressions Used by Consumers and the Media: An Emerging View of Consumer Health Vocabularies. AMIA Annual Symposium Proceedings. 2003;2003:674-678.
[16] Patrick TB, Monga HK, Sievert MC, Hall JH, Longo DR. Evaluation of Controlled Vocabulary Resources for Development of a Consumer Entry Vocabulary for Diabetes. Journal of Medical Internet Research. 2001;3(3):e24. doi:10.2196/jmir.3.3.e24.
[17] Seedorff, M., et al. (2013). Incorporating expert terminology and disease risk factors into consumer health vocabularies. Pacific Symposium on Biocomputing: 421-432.
[18] Hulse, N. C., et al. (2010). Deriving consumer-facing disease concepts for family health histories using multi-source sampling. J Biomed Inform 43(5): 716-724.
[19] Gliklich R, Dreyer N. Registries for Evaluating Patient Outcomes: A User’s Guide. Rockville, MD: Agency for Healthcare Research and Quality; 2010. AHRQ Publication No. 10-EHC049.
[20] Smith, C. A. and P. J. Wicks (2008). PatientsLikeMe: Consumer health vocabulary as a folksonomy. AMIA Annu Symp Proc: 682-686.
[21] Fage-Butler, A. M. and M. Nisbeth Jensen (2015). Medical terminology in online patient-patient communication: evidence of high health literacy? Health Expectations.
[22] Kirkpatrick BE, Riggs ER, Azzariti DR, et al. GenomeConnect: matchmaking between patients, clinical laboratories, and researchers to improve genomic knowledge. Hum Mutat. 2015;36(10):974-978. doi: 10.1002/humu.22838. Epub 2015 Aug.
[23] Raoul Hennekam, Judith Allanson, Ian Krantz. Gorlin's Syndromes of the Head and Neck. Oxford University Press; 5 edition (February 5, 2010).