=Paper=
{{Paper
|id=Vol-2137/ws_ONCONTO_paper_5.pdf
|storemode=property
|title=Towards an Ontology for Representing Malignant Neoplasms
|pdfUrl=https://ceur-ws.org/Vol-2137/ws_ONCONTO_paper_5.pdf
|volume=Vol-2137
|authors=William D. Duncan,Carmelo Gaudioso,Alexander D. Diehl
|dblpUrl=https://dblp.org/rec/conf/icbo/DuncanGD17
}}
==Towards an Ontology for Representing Malignant Neoplasms==
William D. Duncan<sup>1,*</sup>, Carmelo Gaudioso<sup>2</sup> and Alexander D. Diehl<sup>3</sup>

<sup>1</sup> Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14203, USA

<sup>2</sup> Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14203, USA

<sup>3</sup> Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, 14203, USA

<sup>*</sup> To whom correspondence should be addressed: william.duncan@roswellpark.org

===ABSTRACT===
Oncology research produces data about a wide variety of entities such as tumor types, locations, pathology, and staging, patient treatments and outcomes, and experimental systems such as mouse models and cell lines. In order to conduct effective cancer research, terminologies, classification systems, and ontologies are needed that can integrate these various datasets and provide standards for consistently representing entities.

In this paper, we discuss our ongoing efforts to address these difficulties by developing a realism-based ontology for representing instances of malignant neoplasms, disease progression, treatments, and outcomes. This ontology is being built using the principles of the OBO Foundry, and makes use of other OBO Foundry ontologies, such as the Ontology for General Medical Sciences, Uberon, and the Cell Ontology. As a result of our efforts, we have made worthwhile progress towards developing a robust ontological framework for representing malignant neoplasms.

===1 INTRODUCTION===
Oncology research produces data about a wide variety of entities such as tumor types, locations, pathology, and staging, patient treatments and outcomes, and experimental systems such as mouse models and cell lines. In order to conduct effective cancer research, terminologies, classification systems, and ontologies are needed that can integrate these various datasets and provide standards for consistently representing entities. These standards facilitate the meaningful linking, sharing, and analysis of disparate datasets between researchers and across institutions. However, the incomplete and inconsistent representation of cancer-related data makes it difficult to perform these activities.

In this paper, we discuss our ongoing efforts to address these difficulties by developing a realism-based ontology for representing instances of malignant neoplasms, disease progression, treatments, and outcomes. This ontology is being built using the principles of the OBO Foundry (Smith et al. 2007), and makes use of other OBO Foundry ontologies, such as the Ontology for General Medical Sciences and the Cell Ontology. We chose to focus on these entities because they are key elements driving accurate cohort selection (based on diagnosis, stage, and treatment) and clinical decision support. As a result of our efforts, we have made worthwhile progress towards developing a robust ontological framework for representing malignant neoplasms.

===2 PROJECT MOTIVATION===
This research developed out of a number of interests. The first is that we recognized a need to connect cancer data from multiple sources with differing levels of granularity. Some important levels include: (1) diagnosis and treatment information about the patient and how the patient responds to treatment; (2) anatomical information about the organs in which the cancer originates; (3) pathology information about the tissues removed during procedures, such as tumor tissues and lymph nodes; (4) cellular information, such as data obtained from flow cytometry and immunohistochemistry; and (5) molecular information, such as genomic sequencing. Providing a framework for tying these kinds of data together is essential for cancer research because it provides the basis for the use of advanced ontology-based querying and analytical methods that allow for data integration across multiple sources and scales.

===3 CURRENT CLASSIFICATION SYSTEMS, TERMINOLOGIES, AND ONTOLOGIES===
A number of existing classification systems, ontologies, and terminologies have terms for representing malignant neoplasms. Prominent examples include the International Statistical Classification of Diseases 10th Revision (ICD-10), the International Classification of Diseases for Oncology (ICD-O), the National Cancer Institute Thesaurus (NCIT), and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT).

However, since many of these information organization systems do not share a common upper-level framework, it is not easy to leverage information contained in other terminologies and ontologies. For instance, SNOMED CT does not have terms for checkpoint inhibitors, whereas the NCI Thesaurus does. Ideally, we would like to use terms from each system (i.e., SNOMED CT and NCIT), but due to differences in their relations and hierarchical structures, it is difficult to do so. For example, the term ‘metastasis’ denotes a disorder in SNOMED CT, but denotes the spread of cancer (i.e., a process) in the NCIT. OBO Foundry ontologies, in contrast, are generally designed using the Basic Formal Ontology (BFO) as their upper-level framework, and this enables the creation of domain-specific ontologies whose terms can be reused by other OBO Foundry ontologies. For instance, the Drug Ontology (Hanna et al. 2013) uses terms from the Chemical Entities of Biological Interest (ChEBI) (Hastings et al. 2013) ontology to represent drug ingredients.

====3.1 International Statistical Classification of Diseases====
Due to its long history and widespread adoption, the International Statistical Classification of Diseases (ICD)<ref>For brevity, we use the general term ‘ICD’ to refer to the number of different versions of ICD, such as ICD-10 and ICD-O.</ref> is perhaps the most relevant system for classifying diseases. Maintained by the World Health Organization (WHO), ICD is a globally recognized healthcare classification system consisting of hierarchically structured codes that represent diseases, disorders, and other health related issues.<ref>http://www.who.int/classifications/icd/en, accessed 2017-06-20.</ref> In relation to the current topic, the International Classification of Diseases for Oncology (ICD-O)<ref>https://training.seer.cancer.gov/icdo3, accessed 2017-06-20.</ref> has codes for representing a number of pertinent characteristics of a malignant neoplasm, such as the anatomical site of the neoplasm, the neoplasm’s histology (e.g., small cell, clear cell), and behavior (e.g., if it has metastasized). For example, an ovarian adenocarcinoma is represented using the following combination of codes:

* C56: the site code for an ovary
* 8140/3: 8140 is the code for a neoplasm arising from glandular epithelial tissue, and ‘/3’ represents that the neoplasm is malignant

The advantage of ICD’s coding system is that it allows diseases to be easily grouped and counted for statistical and reporting purposes. For instance, to find all patients who have an adenocarcinoma, you only have to look for patients whose histological code begins with ‘814’ and whose behavior code is 3 or greater. However, there are two related noteworthy drawbacks to implementing ICD as an ontology. First, ICD does not contain codes for many of the important cancer related entities that need to be represented, such as treatments and molecular disorders. This shortcoming is compounded by ICD’s lack of formal relations that would allow codes to be linked to other information. Thus, even if we created code lists for the missing entities, we would still be faced with the task of creating well-defined relations that would allow this information to be linked to ICD codes.

====3.2 National Cancer Institute Thesaurus====
The National Cancer Institute Thesaurus (NCIT) is a reference terminology developed by the National Cancer Institute (Sioutos et al. 2007). It contains over 100,000 concepts with textual definitions and 400,000 cross links between its concepts.<ref>https://ncit.nci.nih.gov/ncitbrowser, accessed 2017-06-21.</ref>

In our examination of the NCIT, we found that many of the definitions in the malignant neoplasms branch were adequate and that the hierarchy was rich enough to suit our purposes. However, when we examined other branches of the NCIT, certain problems became apparent. In particular, we found the definitions for cell types related to cancer to be inadequate. Consider the following NCIT concepts and definitions:

* Abnormal Cell (C12913): An abnormal human cell type which can occur in either disease states or disease models.
* Neoplastic Cell (C12922): Cells of, or derived from, a tumor.
* Malignant Cell (C12917): Cells of, or derived from, a malignant tumor.

The definition for Abnormal Cell is circular (i.e., an abnormal cell is defined as being an abnormal cell type), and thus does not provide any new information. Furthermore, the definition specifically states that an abnormal cell is a human cell. This prevents the NCIT from consistently modeling data about abnormal cells from non-human species despite the fact that the NCIT does contain concepts for mouse diseases, such as Mouse Carcinoma (C24010). Given the importance of mouse models in cancer research, not being able to represent data from mouse studies correctly is a severe limitation.

The definitions for Neoplastic Cell and Malignant Cell do not provide much clarity about how these cells relate to neoplasms. Since a neoplasm may also contain normal cell types, more details are needed about what it means to be a neoplastic cell other than being derived from a tumor. Furthermore, while a metastasis may be said in some sense to derive from a tumor, this cannot be said of the originating neoplastic cells that first started proliferating during the tumor formation process. Lastly, it needs to be pointed out that these cell types form a hierarchy. A Malignant Cell is a type of Neoplastic Cell, and Neoplastic Cell is a type of Abnormal Cell. This information is not contained in the textual definitions in an Aristotelian fashion, although it is represented in NCIT’s taxonomic relations.

====3.3 Systematized Nomenclature of Medicine Clinical Terms====
Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a comprehensive health terminology that provides a standardized way to represent clinical information in an electronic health record.<ref>http://www.snomed.org/snomed-ct/what-is-snomed-ct, accessed 2017-06-21.</ref> Although SNOMED CT has a large number of terms for clinical findings and disorders, it does not have well-developed terms for other entities related to neoplasms. For example, the concept Tumor cell (SCTID 252987004) is defined as a subtype of the concept Abnormal cell (SCTID 39266006), but this does not specify if the concept Tumor cell represents malignant cells.<ref>http://browser.ihtsdotools.org, accessed 2017-06-22.</ref> Furthermore, the concept Malignant tumor cells (SCTID: 88400008) is defined as being a subtype of the concept Malignant neoplasm, primary (SCTID: 86049000).<ref>Ibid.</ref> This classification is incorrect for at least two reasons. First, although malignant tumor cells are often part of a malignant neoplasm, they are not a kind of malignant neoplasm. A malignant neoplasm (as stated above) will also include a number of non-cancer cells as part of its makeup. Second, even if we accept that a malignant tumor cell is a kind of malignant neoplasm, this definition is incorrect because a malignant tumor cell is also found in a metastasis (metastatic offshoot of a primary tumor). Finally, SNOMED CT classifies Malignant tumor cells as a kind of Morphologic abnormality (SCTID 4147007), and not a Disorder (SCTID 64572001). In SNOMED CT, the distinction between a morphologic abnormality and a disorder is that some underlying pathological process supports a disorder.<ref>https://confluence.ihtsdotools.org/display/DOCEG/6.1.1+Clinical+ +-+definition, accessed 2017-06-22.</ref> However, the reason that a cell becomes malignant is that underlying pathological processes (resulting from dysregulation) are occurring within it.

====3.4 Disease Ontology====
The Disease Ontology (DO) is an OBO Foundry ontology built for the purposes of providing the biomedical community with consistent, standardized, and reusable definitions to represent the range of human diseases (Schriml et al. 2015). Although we found the DO to have decent coverage for cancer types, there are two difficulties with it that made the DO not suitable for our purposes.

First, the DO is not consistent in its use of the terms ‘cancer’ and ‘neoplasm’. In DO, cancer is defined as a kind of disposition. This means that cancer is not a material thing (i.e., does not have mass), but rather is a kind of latent potential that is actualized when cells start proliferating out of control. Malignant neoplasms, by contrast, are material objects that come into being due to uncontrolled cell proliferation. In the DO, however, there are a number of terms in the cancer branch that reference neoplasms as material things and not the disposition of cancer. For example, ovary neuroendocrine neoplasm is defined as a subtype of ovarian cancer. Because of DO’s inconsistent use of the terms ‘cancer’ and ‘neoplasm’ and our remaining true to the OBO Foundry principles, we decided it would be beneficial to the development of our ontology to use the term ‘malignant neoplasm’ and avoid using the term ‘cancer’ when possible.

Second, the DO is missing needed formal axioms that relate entities having the disposition of cancer to the anatomical structures in which these entities are located. For instance, the DO term ovary epithelial cancer does not have axioms that formally relate the disposition to the epithelial cells that are part of the ovary. The lack of these axioms can make it difficult to query data modeled using the DO. For example, it is not possible to query for the most common anatomical structures in which malignant neoplasms are found.

====3.5 On carcinomas and other pathological entities====
In Smith et al. (2005b), the Ontology for Biomedical Reality (Rosse et al. 2005) is modified to account for material anatomical entities, material pathological entities, and pathological formations. Material anatomical entities are anatomical structures (e.g., organs, cells) or bodily substances (e.g., blood) that are found in a healthy organism. Anatomical structures are defined as being material anatomical entities that have an inherent 3D structure generated by the coordinated expression of the organism’s own structural genes (Smith et al. 2005b). They include both canonical and variant anatomical structures. Canonical anatomical structures belong to ‘idealized’ healthy human beings. Variant anatomical structures are entities that deviate from the norm (e.g., having extra fingers), but are not pathological in the sense discussed below.

An anatomical entity is defined as being a material pathological entity when (Smith et al. 2005b):

* It has come into being as a result of changes in some pre-existing canonical anatomical structure through processes other than the expression of the normal complement of genes of an organism of the given type.
* It is predisposed to have health-related consequences for the organism in question manifested by symptoms and signs.

Material pathological entities include pathological structures and pathological bodily substances. These are anatomical structures and body substances, respectively, that host some kind of pathological formation, a formation being pathological when it affects an organism’s physiological processes to the degree that they give rise to signs and symptoms. For instance, a carcinoma is a pathological formation that arises within an anatomical structure, such as an ovary.

A high-level summary of the hierarchy for material anatomical and material pathological entities is depicted below:

* material anatomical entity
** anatomical structure
*** canonical anatomical structure
*** variant anatomical structure
** portion of canonical body substance (e.g., portion of blood)
* material pathological entity
** pathological structure (e.g., neoplasm)
** portion of pathological substance (e.g., portion of pus)

Pathological formations are then related to their hosts and the entities out of which they originate using the following relations from the Open Biomedical Ontology (Smith et al. 2005a):<ref>Hereafter, relations are represented in bold.</ref>

* '''instance of''': A primitive relation that holds between a particular individual and the universal (type or kind) that the particular individual instantiates at a particular time. For example, a particular patient is an '''instance of''' a human being at a particular time.
* '''part of''': A primitive relation between instances of parts and wholes at a particular time. For example, a particular mass of malignant epithelial tissue is '''part of''' a particular ovary at a particular time.
* '''is a''': A '''is a''' B means that A and B are universals and for all times t every particular individual i, if i '''instance of''' A at t, then i '''instance of''' B at t. For example, a human being '''is a''' mammal.
* '''derived from''':<ref>In the referenced Open Biomedical Ontology relations, the relation is named ‘derives from’. However, to avoid confusion, we use the term as presented in the paper.</ref> A primitive relation between two distinct instances i, j and times t, t’ such that changes in i at t result in a new second entity j at t’. For example, a particular blastocyst '''derived from''' a particular zygote.
* '''transformation of''': A '''transformation of''' B means that A and B are universals and for all times t if i '''instance of''' A at t, then there is an earlier time t’ at which i was an instance of B.

As an example, suppose a patient (patient1) has a carcinoma (carcinoma1) that originated within her ovary (ovary1). We represent this using the axioms:

: ovary1 at t '''part of''' patient1 at t
: carcinoma1 at t '''part of''' ovary1 at t
: carcinoma1 at t '''instance of''' pathological structure at t

Because carcinomas arise from the epithelial tissue lining of organs, we can assert the following about the patient’s tumor:

: carcinoma1 at t '''derived from''' some epithelial cell at some t’ prior to t

And, since the '''part of''' relation is transitive, we infer that:

: carcinoma1 at t '''part of''' patient1 at t

Although this inference is trivial, the advantage of representing the patient’s tumor in this manner is that we are not required to explicitly state this within an information system using the ontology. Rather, we let the computer system handle this through automated inferencing.

The benefit of doing this becomes apparent when we consider the multiple ways we classify malignant neoplasms. A malignant neoplasm may be classified according to:

* The cell type from which the neoplasm originates, e.g., carcinomas arise from epithelial cells, and sarcomas arise from non-epithelial cells.
* The organ in which the neoplasm develops, e.g., an ovarian carcinoma originates in the ovary.
* The organ system to which the organ of origin belongs, e.g., an ovarian carcinoma is a kind of reproductive system cancer.
* The anatomical site or region in which the organ of origin is found, e.g., a tongue carcinoma is a kind of head and neck cancer.

When such classification information is axiomatized, we can then query the information system along these multiple axes without having to maintain complex data structures that explicitly assert this information. For instance, we can now query an information system for all carcinomas (i.e., malignant neoplasms that are derived from epithelial cells) that belong to patients’ reproductive systems without having to explicitly link each kind of carcinoma (e.g., ovarian, uterine, testicular) to the organ and associated organ system.

===4 OUR PROPOSED ONTOLOGY===
While we consider the work of Smith et al. to be a significant improvement over the aforementioned classification systems and terminologies, a number of ontologies have been developed after this work was published. We take advantage of these more recent ontologies as follows.

First, we make use of the terms and relations from the Cell Ontology (CL) (Diehl et al. 2016) to represent the types of cells from which a malignant neoplasm arises. Moreover, as a result of our work, the CL added the terms abnormal cell, neoplastic cell, and malignant cell in order to better represent cell types that play integral roles in tumor formation:

* abnormal cell: A cell found in an organism or derived from an organism exhibiting a phenotype that deviates from the expected phenotype of any native cell type of that organism. Abnormal cells are typically found in disease states or disease models.
* neoplastic cell: An abnormal cell exhibiting dysregulation of cell proliferation or programmed cell death and capable of forming a neoplasm, an aggregate of cells in the form of a tumor mass or an excess number of abnormal cells (liquid tumor) within an organism.
* malignant cell: A neoplastic cell that is capable of entering a surrounding tissue.

Second, an important criterion in Smith et al.’s definition of an entity being pathological is that it is predisposed to have health related consequences (Smith et al. 2005b). To more precisely account for predispositions of this sort, we adopt the Ontology for General Medical Sciences’ (OGMS) model of disease (Scheuermann et al. 2009). In OGMS, a disease is a type of disposition that is manifested (or realized) during those processes that compromise an organism’s physiological health. This permits us to represent that an organism may have a disease even though the disease is not currently being realized. A malignant neoplasm, for instance, may shed malignant cells that remain dormant in the patient until at some later time they begin to proliferate. During this dormant period, these malignant cells possess the disposition for undergoing uncontrolled cell proliferation, although the disposition is not being realized. Similarly, the genome within a native cell may have mutations in its BRCA1 or BRCA2 genes, but the cell may behave normally until certain cellular processes uncover the pathological effects of these mutations. Using the dispositional account of disease, we then incorporate the Disease Ontology’s (DO) representation of cancer as follows:

* disease: A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.<ref>DO uses the OGMS term disease.</ref>
* disease of cellular proliferation: A disease that is characterized by abnormally rapid cell division.
* cancer: A disease of cellular proliferation that is malignant and primary, characterized by uncontrolled cellular proliferation, local cell invasion and metastasis.

Recall that above we criticized the DO for its inconsistent usage of the term ‘neoplasm’. However, given the need to represent the dispositional aspect of cancer, we find DO’s hierarchy appropriate for characterizing cancer as long as we are clear and consistent about which sense of ‘cancer’ we are using.

Third, in order to account for Smith et al.’s distinction between material pathological entities and material anatomical entities, we adopt OGMS’ account of a disease (as a disposition) being based on a disorder:

: disorder: A material entity which is clinically abnormal and part of an extended organism. Disorders are the physical basis of disease.

Since OGMS uses the Basic Formal Ontology (BFO) as its upper-level framework and a disease in OGMS is a type of BFO disposition, it cannot (like all dispositions) exist on its own. Rather, a disease must be borne by a disorder whose structural abnormalities serve as the disease’s basis. For example, a sprained ankle is a disorder in the sense that the physical structures are clinically abnormal, and these physiological abnormalities are the reason that a sprained ankle is disposed to swell.

Fourth, to relate a disease to the disorder upon which it is based, we define the '''has material basis in''' relation as follows:

: '''has material basis in''': A primitive relation between an instance of a disease i and an instance of a disorder j at a particular time t in which i exists because of the physical makeup of some part of j at time t.

In addition to relating a disease to its basis, we must also account for the processes that realize (or make manifest) an instance of a disease. For this we use OGMS’ term pathological bodily process:

: pathological bodily process: A bodily process that is clinically abnormal.

As observed in the definition, a pathological bodily process is a type of bodily process. However, the term bodily process is not defined in OGMS.

Fifth, in order to account for the temporal development of malignant neoplasms, we make use of the Relations Ontology’s '''derives from''' and '''develops from''' relations (Smith et al. 2005a). The '''derives from''' relation is similar to the aforementioned '''derived from''' relation, but adds the criteria that the originating entity ceases to exist when the new entity is created and that the newly created entity inherits a significant portion of its matter from the originating entity. For example, the assertion:

: abnormal cell '''derives from''' native cell

entails that a particular native cell no longer exists once the abnormal cell derived from it comes into existence.

The '''develops from''' relation also represents new entities that arise from previously existing entities, but does not require that the originating entity cease to exist. This allows us to represent that an instance of a secondary neoplasm
'''develops from''' an instance of a primary neoplasm without having to commit to the primary neoplasm’s ceasing to exist.

Sixth, given the importance of representing the anatomical structures in which malignant neoplasms form, we incorporate Uberon’s anatomical structure and OGMS’ pathological anatomical structure terms, and define them as follows (Mungall et al. 2012):

* material anatomical entity: An anatomical entity that has mass.
* anatomical structure: A material anatomical entity that is a single connected structure with inherent 3D shape generated by coordinated expression of the organism's own genome.
* pathological anatomical structure: A material entity that comes into being as a result of changes in some pre-existing anatomical structure through processes other than the expression of the normal complement of genes of an organism of the given type, and is predisposed to have health-related consequences for the organism in question manifested by symptoms and signs.

We note here that although intuitively a pathological anatomical structure is a type of anatomical structure, for reasons that will be discussed below, we classify them in separate hierarchies. Moreover, we assert that a particular pathological anatomical structure (1) '''develops from''' an instance of a previously existing anatomical structure, and (2) '''has part''' an instance of a disorder. These two assertions define both necessary and sufficient conditions for an entity to be a pathological anatomical structure.

Lastly, with the above modifications in place, we define the following terms necessary for an ontology of malignant neoplasms:

* dysregulation of cell proliferation: A pathological bodily process during which cell proliferation occurs at a level not normal for that cell type in its native context.
* neoplasm: A disorder that results from dysregulation of cell proliferation (uncontrolled cell proliferation).
* malignant neoplasm: A neoplasm that has acquired the disposition to invade surrounding tissues and spread to remote anatomical sites.
* primary neoplasm: A malignant neoplasm that is found in the site where the malignant cells first began proliferating.
* secondary neoplasm: A malignant neoplasm that develops from a primary neoplasm.

A summary of our proposed ontology of malignant neoplasms is depicted in Figure 1.

===5 DISCUSSION===
We began our work in order to build an application ontology to assist us in analyzing data in an ovarian cancer patient registry (work in progress). Because of our commitment to OBO Foundry principles and ontological realism, we began our ontology development by considering existing ontologies, including OGMS and DO, and related ontologies such as Uberon and CL. Our aim has been to reuse ontology classes where possible and create new classes and hierarchies where existing ontologies either are missing classes or provide faulty modeling of the domain.

We have found the NCIT to be a very useful source of information about cancer related entities, their definitions, and their relationships to each other. Although the NCIT is very large and has been developed over many years, it remains a terminology rather than an ontology. For example, the NCIT includes the term Disease or Disorder defined as:

: Any abnormal condition of the body or mind that causes discomfort, dysfunction, or distress to the person affected or those in contact with the person. The term is often used broadly to include injuries, disabilities, syndromes, symptoms, deviant behaviors, and atypical variations of structure and function.

This definition does not adequately distinguish between the processes and material entities that result in abnormal conditions. This distinction is important for precisely representing the nature of a malady. If a cancer patient has difficulty breathing due to metastatic tumors spreading throughout the lungs, both the difficulty in breathing and the tumors are abnormal conditions, and hence would be classified using the term Disease or Disorder (C2991). But, in reality, the process of breathing is a different kind of entity from a tumor, which is a material entity. There are past and current efforts to redevelop NCIT or at least sections of it into a proper ontology. Our hope is that these efforts will make the NCIT more aligned with OBO Foundry principles. One important result of our work was the addition of the abnormal cell, neoplastic cell, and malignant cell types to CL. These CL classes parallel the naming and relationships of the NCIT concepts, but as discussed above, we chose to write new definitions that better define these cell types and do not limit their applicability unnecessarily.

In considering the Disease Ontology, we found it to be a useful catalog of cancer types, but as discussed above, we find that there is confusion as to whether neoplasms are dispositions or disorders. Because of our need to represent pathological findings, we need to reflect that these findings are about disorders (which are material entities) that are observed by pathologists, and not about dispositions, which are not directly observable.

An important finding of our work is that OBO Foundry ontologies have difficulty representing abnormal or pathological entities. Two prominent examples are pathological anatomical structures and pathological processes. Intuitively, a pathological anatomical structure is a kind of anatomical structure. For instance, an ovary containing a carcinoma is still an instance of an ovary. However, the standard definition (with some variations) for anatomical structure found in Uberon, the Common Anatomy Reference Ontology, the Foundational Model of Anatomy, and the Anatomical Entity Ontology does not allow for this:<ref>Definitions retrieved from www.ontobee.org, accessed 2017-07-06.</ref>

: Material anatomical entity that has inherent 3D shape and is generated by coordinated expression of the organism's own genome.

The issue is that disorders (such as neoplasms and fractures) that arise within anatomical structures are not necessarily generated by the organism’s genome. Thus, the definition is too strong. Smith et al. are aware of this and propose an anatomical hierarchy consisting of a top-level anatomical structure term with subtypes of canonical anatomical structure, variant anatomical structure, and pathological anatomical structure (Smith et al. 2005a):

* Anatomical structure
** Canonical anatomical structure
** Variant anatomical structure
** Pathological anatomical structure

While we think this is a reasonable proposal, the lack of a definition for anatomical structure makes it unclear what canonical, variant, and pathological structures have in common.

A similar problem exists for abnormal processes (such as dysregulation of cell proliferation). The OGMS, for its part, does provide the term pathological bodily process. But this term is orphaned from other biological processes found in other OBO ontologies. For example, the Gene Ontology (GO) includes the term biological process:<ref>Definition retrieved from www.ontobee.org, accessed 2017-07-06.</ref>

: Any process specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. A process is a collection of molecular events with a defined beginning and end.

Again, intuitively it makes sense that a pathological bodily process should be a subtype of biological process. However, the definition of biological process does not permit this.

Although we do not have any concrete solutions, at this point, for how to align pathological structures and processes with these other OBO terms, we look forward to collaborating with the OBO Foundry community on creating a coherent structure for these upper level classes that is shared among all OBO Foundry ontologies. Thus, we simply leave pathological anatomical structure as a subtype of material anatomical entity, and pathological bodily process in its current OGMS hierarchy.

Our goal is to contribute to the oncology domain by creating a strong and consistent ontological foundation for providing metadata and data analysis of patient cancer data for both research and clinical applications including clinical decision support. The ontological framework described herein attempts to solve some continuing issues in the representation of cancer as a disease and the disorders (neoplasms) in which it presents. Our framework is intended to be useful for the description and classification of data used in cancer diagnosis and treatment. In future work, we will be adding classes to represent additional entities associated with cancer such as laboratory methods and results, treatments, and outcomes. We hope our ontology will support other oncology researchers in exploiting the full potential of patient data registries and other cancer-related datasets.

===ACKNOWLEDGEMENTS===
We gratefully acknowledge support as follows. William Duncan and Carmelo Gaudioso received support from the Clinical Data Network, a Roswell Park Cancer Institute Cancer Center Support Grant shared resource funded by NCI P30CA16056. Alexander Diehl received support from NCATS 5UL1TR001412. All three authors received support from NCI P50CA159981.

===REFERENCES===
* Arp, R., Smith, B., & Spear, A.D. (2015). Building Ontologies With Basic Formal Ontology. The MIT Press. doi: 10.7551/mitpress/9780262527811.001.0001.
* Diehl, A. D., Meehan, T. F., Bradford, Y. M., Brush, M. H., Dahdul, W. M., Dougall, D. S., … Mungall, C. J. (2016). The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. Journal of Biomedical Semantics, 7(44). doi: 10.1186/s13326-016-0088-7. PMCID: PMC4932724. https://github.com/obophenotype/cell-ontology.
* Hanna, J., Joseph, E., Brochhausen, M., & Hogan, W. R. (2013). Building a drug ontology based on RxNorm and other sources. Journal of Biomedical Semantics, 4(44). doi: 10.1186/2041-1480-4-44. PMCID: PMC3931349.
* Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V., Owen, G., Turner, S., Williams, M., & Steinbeck, C. (2013). The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Research, 41(Database issue), D456–D463. doi: 10.1093/nar/gks1146.
* Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi: 10.1186/gb-2012-13-1-r5. PMCID: PMC3334586. http://uberon.github.io.
* Rosse, C., Kumar, A., Mejino, J. L., Cook, D. L., Detwiler, L. T., & Smith, B. (2005). A Strategy for Improving and Integrating Biomedical Ontologies. AMIA Annual Symposium Proceedings, 2005, 639–643. PMCID: PMC1560467.
* Scheuermann, R.H., Ceusters, W., & Smith, B. Toward an ontological treatment of disease and diagnosis.
* cal Informatics, 40(1): 30-43. doi: 10.1016/j.jbi.2006.02.013. PMID: 16697710.
* Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A.L., & Rosse, C. (2005a). Relations in biomedical ontologies. Genome Biology, 6(5): R46. doi: 10.1186/gb-2005-6-5-r46. PMCID: PMC1175958.
* Smith B., Kumar A., Ceusters W., and Rosse C. (2005b). On Carcinomas
San Francisco: Proceedings of the and Other Pathological Entities. Comparative and Functional Ge- 2009 AMIA Summit on Translational Bioinformatics, 2009, 116–120. nomics, 6(7-8): 379-387. doi:10.1002/cfg.497. PMCID: PMC2447494. PMCID: PMC3041577. https://github.com/OGMS/ogms. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Schriml, L. M., & Mitraka, E. (2015). The Disease Ontology: fostering Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., The OBI Con- interoperability between biological and clinical human disease-related sortium, Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.-A., data. Mammalian Genome, 26(9-10): 584–589. doi: 10.1007/s00335- Scheuermann, R. H., Shah, N., Whetzel, P. L., & Lewis, S. (2007). The 015-9576-9. PMCID: PMC4602048. http://disease-ontology.org. OBO Foundry: coordinated evolution of ontologies to support bi- Sioutos, N., de Coronado S., Haber M.W., Hartel F.W., Shaiu WL, & omedical data integration. Nat Biotechnology, 25(11): 1251–1255. Wright LW. (2007). NCI Thesaurus: A semantic model integrating PMCID: PMC2814. cancer-related clinical and molecular information. Journal of Biomedi- Figure 1. This figure illustrates our proposed ontology for representing malignant neoplasms. Our upper-level framework consists of classes imported from the Ontology for General Medical Sciences (OGMS), Cell Ontology (CL), Disease Ontolo- gy (DO), and Uberon. We extend the upper-level framework by adding the classes dysregulation of cell proliferation, neo- plasm, malignant neoplasm, primary neoplasm, and secondary neoplasm. 8