=Paper= {{Paper |id=Vol-2137/ws_ONCONTO_paper_5.pdf |storemode=property |title=Towards an Ontology for Representing Malignant Neoplasms |pdfUrl=https://ceur-ws.org/Vol-2137/ws_ONCONTO_paper_5.pdf |volume=Vol-2137 |authors=William D. Duncan,Carmelo Gaudioso,Alexander D. Diehl |dblpUrl=https://dblp.org/rec/conf/icbo/DuncanGD17 }} ==Towards an Ontology for Representing Malignant Neoplasms== https://ceur-ws.org/Vol-2137/ws_ONCONTO_paper_5.pdf
Towards an Ontology for Representing Malignant Neoplasms
William D. Duncan1,*, Carmelo Gaudioso2 and Alexander D. Diehl3
1 Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14203, USA
2 Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14203, USA
3 Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, 14203, USA




* To whom correspondence should be addressed: william.duncan@roswellpark.org

ABSTRACT
   Oncology research produces data about a wide variety of entities such as tumor types, locations, pathology, and staging, patient treatments and outcomes, and experimental systems such as mouse models and cell lines. In order to conduct effective cancer research, terminologies, classification systems, and ontologies are needed that can integrate these various datasets and provide standards for consistently representing entities.
   In this paper, we discuss our ongoing efforts to address these difficulties by developing a realism-based ontology for representing instances of malignant neoplasms, disease progression, treatments, and outcomes. This ontology is being built using the principles of the OBO Foundry, and makes use of other OBO Foundry ontologies, such as the Ontology for General Medical Sciences, Uberon, and the Cell Ontology. As a result of our efforts, we have made worthwhile progress towards developing a robust ontological framework for representing malignant neoplasms.

1   INTRODUCTION
    Oncology research produces data about a wide variety of entities such as tumor types, locations, pathology, and staging, patient treatments and outcomes, and experimental systems such as mouse models and cell lines. In order to conduct effective cancer research, terminologies, classification systems, and ontologies are needed that can integrate these various datasets and provide standards for consistently representing entities. These standards facilitate the meaningful linking, sharing, and analysis of disparate datasets between researchers and across institutions. However, the incomplete and inconsistent representation of cancer-related data makes it difficult to perform these activities.
    In this paper, we discuss our ongoing efforts to address these difficulties by developing a realism-based ontology for representing instances of malignant neoplasms, disease progression, treatments, and outcomes. This ontology is being built using the principles of the OBO Foundry (Smith et al. 2007), and makes use of other OBO Foundry ontologies, such as the Ontology for General Medical Sciences and the Cell Ontology. We chose to focus on these entities because they are key elements driving accurate cohort selection based on diagnosis, stage, and treatment; and clinical decision support. As a result of our efforts, we have made worthwhile progress towards developing a robust ontological framework for representing malignant neoplasms.

2   PROJECT MOTIVATION
    This research developed out of a number of interests. The first is that we recognized a need to connect cancer data from multiple sources with differing levels of granularity. Some important levels include: (1) diagnosis and treatment information about the patient and how the patient responds to treatment; (2) anatomical information about the organs in which the cancer originates; (3) pathology information about the tissues removed during procedures, such as tumor tissues and lymph nodes; (4) cellular information, such as data obtained from flow cytometry and immunohistochemistry; and (5) molecular information, such as genomic sequencing. Providing a framework for tying these kinds of data together is essential for cancer research because it provides the basis for the use of advanced ontology-based querying and analytical methods that allow for data integration across multiple sources and scales.

3   CURRENT CLASSIFICATION SYSTEMS, TERMINOLOGIES, AND ONTOLOGIES
    A number of existing classification systems, ontologies, and terminologies have terms for representing malignant neoplasms. Prominent examples include the International Statistical Classification of Diseases 10th Revision (ICD-10), the International Classification of Diseases for Oncology (ICD-O), the National Cancer Institute Thesaurus (NCIT), and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). However, since many of these information organization systems do not share a common upper-level framework, it is not easy to leverage information contained in other terminologies and ontologies. For instance, SNOMED CT does not have terms for checkpoint inhibitors, whereas the NCI Thesaurus does. Ideally, we would like to use terms from each system (i.e., SNOMED CT and NCIT), but due to differences in their relations and hierarchical structures, it is difficult to do so. For example, the term ‘metastasis’ denotes a disorder in SNOMED CT, but denotes the spread of cancer (i.e., a process) in the NCIT. OBO Foundry ontologies, in contrast, are generally designed using the Basic Formal Ontology (BFO) as their upper-level framework, and this enables the creation of domain-specific ontologies whose terms can be reused by other OBO Foundry ontologies. For instance, the Drug On-


Duncan et al.



tology (Hanna et al. 2013) uses terms from the Chemical Entities of Biological Interest (ChEBI) (Hastings et al. 2013) ontology to represent drug ingredients.

3.1       International Statistical Classification of Diseases
    Due to its long history and widespread adoption, the International Statistical Classification of Diseases (ICD)1 is perhaps the most relevant system for classifying diseases. Maintained by the World Health Organization (WHO), ICD is a globally recognized healthcare classification system consisting of hierarchically structured codes that represent diseases, disorders, and other health-related issues.2 In relation to the current topic, the International Classification of Diseases for Oncology (ICD-O)3 has codes for representing a number of pertinent characteristics of a malignant neoplasm, such as the anatomical site of the neoplasm, the neoplasm’s histology (e.g., small cell, clear cell), and behavior (e.g., if it has metastasized). For example, an ovarian adenocarcinoma is represented using the following combination of codes:

      •    C56 – the site code for an ovary
      •    8140/3 – 8140 is the code for a neoplasm arising from glandular epithelial tissue, and ‘/3’ represents that the neoplasm is malignant

    The advantage of ICD’s coding system is that it allows diseases to be easily grouped and counted for statistical and reporting purposes. For instance, to find all patients who have an adenocarcinoma, you only have to look for patients whose histological code begins with ‘814’ and whose behavior code is 3 or greater. However, there are two related noteworthy drawbacks to implementing ICD as an ontology. First, ICD does not contain codes for many of the important cancer-related entities that need to be represented, such as treatments and molecular disorders. This shortcoming is compounded by ICD’s lack of formal relations that would allow codes to be linked to other information. Thus, even if we created code lists for the missing entities, we would still be faced with the task of creating well-defined relations that would allow this information to be linked to ICD codes.

3.2       National Cancer Institute Thesaurus
    The National Cancer Institute Thesaurus (NCIT) is a reference terminology developed by the National Cancer Institute (Sioutos et al. 2007). It contains over 100,000 concepts with textual definitions and 400,000 cross links between its concepts.4
    In our examination of the NCIT, we found that many of the definitions in the malignant neoplasms branch were adequate and the hierarchy was rich enough to suit our purposes. However, when we examined other branches of the NCIT, certain problems became apparent. In particular, we found the definitions for cell types related to cancer to be inadequate. Consider the following NCIT concepts and definitions:

      •    Abnormal Cell (C12913): An abnormal human cell type which can occur in either disease states or disease models.
      •    Neoplastic Cell (C12922): Cells of, or derived from, a tumor.
      •    Malignant Cell (C12917): Cells of, or derived from, a malignant tumor.

    The definition for Abnormal Cell suffers from its being circular (i.e., an abnormal cell is defined as being an abnormal cell type), and thus the definition does not provide any new information. Furthermore, the definition specifically states that an abnormal cell is a human cell. This prevents the NCIT from consistently modeling data about abnormal cells from non-human species despite the fact that the NCIT does contain concepts for mouse diseases, such as Mouse Carcinoma (C24010). Given the importance of mouse models in cancer research, not being able to represent data from mouse studies correctly is a severe limitation.
    The definitions for Neoplastic Cell and Malignant Cell do not provide much clarity about how these cells relate to neoplasms. Since a neoplasm may also contain normal cell types, more details are needed about what it means to be a neoplastic cell other than being derived from a tumor. Furthermore, while a metastasis may be said in some sense to derive from a tumor, this cannot be said of the originating neoplastic cells that first started proliferating during the tumor formation process. Lastly, it needs to be pointed out that these cell types form a hierarchy. A Malignant Cell is a type of Neoplastic Cell, and Neoplastic Cell is a type of Abnormal Cell. This information is not contained in the textual definitions in an Aristotelian fashion, although it is represented in NCIT’s taxonomic relations.

3.3       Systematized Nomenclature of Medicine Clinical Terms
    Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a comprehensive health terminology that provides a standardized way to represent clinical information in an electronic health record.5 Although SNOMED CT has a large number of terms for clinical findings and disorders, it does not have well-developed terms for other entities

1 For brevity, we use the general term ‘ICD’ to refer to the number of different versions of ICD, such as ICD-10 and ICD-O.
2 http://www.who.int/classifications/icd/en, accessed 2017-06-20.
3 https://training.seer.cancer.gov/icdo3, accessed 2017-06-20.
4 https://ncit.nci.nih.gov/ncitbrowser, accessed 2017-06-21.
5 http://www.snomed.org/snomed-ct/what-is-snomed-ct, accessed 2017-06-21.
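As an illustration of the ICD-O grouping described in Section 3.1, the following is a minimal sketch of selecting adenocarcinoma patients by histology prefix and behavior digit. The class name, function names, and patient records are hypothetical and not part of ICD-O or of the authors' work. The sketch assumes that behavior digit 3 (the malignant marker in the 8140/3 example) and above count as malignant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcdOCode:
    """Hypothetical container for one ICD-O coded diagnosis."""
    site: str        # site code, e.g. "C56"
    histology: str   # four-digit histology code, e.g. "8140"
    behavior: int    # behavior digit after the slash, e.g. 3


def parse_icdo(site: str, morphology: str) -> IcdOCode:
    """Split a morphology string such as '8140/3' into histology and behavior."""
    histology, behavior = morphology.split("/")
    return IcdOCode(site=site, histology=histology, behavior=int(behavior))


def is_malignant_adenocarcinoma(code: IcdOCode) -> bool:
    # Assumption: prefix '814' marks the adenocarcinoma group and a
    # behavior digit of 3 or more marks a malignant neoplasm.
    return code.histology.startswith("814") and code.behavior >= 3


# Made-up records for illustration only.
records = {
    "patient_a": parse_icdo("C56", "8140/3"),  # the ovarian example from the text
    "patient_b": parse_icdo("C56", "8140/0"),  # same histology, lower behavior digit
    "patient_c": parse_icdo("C50", "8500/3"),  # different histology prefix
}

matches = sorted(pid for pid, code in records.items()
                 if is_malignant_adenocarcinoma(code))
print(matches)
```

Note that the grouping works purely on string prefixes and digits. Nothing in the codes themselves links a diagnosis to treatments or molecular findings, which is the limitation the section goes on to describe.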






related to neoplasms. For example, the concept Tumor cell (SCTID 252987004) is defined as a subtype of the concept Abnormal cell (SCTID 39266006), but this does not specify if the concept Tumor cell represents malignant cells.6 Furthermore, the concept Malignant tumor cells (SCTID: 88400008) is defined as being a subtype of the concept Malignant neoplasm, primary (SCTID: 86049000).7 This classification is incorrect for at least two reasons. First, although malignant tumor cells are often part of a malignant neoplasm, they are not a kind of malignant neoplasm. A malignant neoplasm (as stated above) will also include a number of non-cancer cells as part of its makeup. Second, even if we accept that a malignant tumor cell is a kind of malignant neoplasm, this definition is incorrect because a malignant tumor cell is also found in a metastasis (metastatic offshoot of a primary tumor). Finally, SNOMED CT classifies Malignant tumor cells as a kind of Morphologic abnormality (SCTID 4147007), and not a Disorder (SCTID 64572001). In SNOMED CT, the distinction between a morphologic abnormality and a disorder is that some underlying pathological process supports a disorder.8 However, the reason that a cell becomes malignant is that underlying pathological processes (resulting from dysregulation) are occurring within it.

3.4     Disease Ontology
    The Disease Ontology (DO) is an OBO Foundry ontology built for the purposes of providing the biomedical community with consistent, standardized, and reusable definitions to represent the range of human diseases (Schriml et al. 2015). Although we found the DO to have decent coverage for cancer types, there are two difficulties with it that made it unsuitable for our purposes.
    First, the DO is not consistent in its use of the terms ‘cancer’ and ‘neoplasm’. In DO, cancer is defined as a kind of disposition. This means that cancer is not a material thing (i.e., does not have mass), but rather is a kind of latent potential that is actualized when cells start proliferating out of control. Malignant neoplasms, by contrast, are material objects that come into being due to uncontrolled cell proliferation. In the DO, however, there are a number of terms in the cancer branch that reference neoplasms as material things and not the disposition of cancer. For example, ovary neuroendocrine neoplasm is defined as a subtype of ovarian cancer. Because of DO’s inconsistent use of the terms ‘cancer’ and ‘neoplasm’ and our remaining true to the OBO Foundry principles, we decided it would be beneficial to the development of our ontology to use the term ‘malignant neoplasm’ and avoid using the term ‘cancer’ when possible.
    Second, the DO is missing needed formal axioms that relate entities having the disposition of cancer to the anatomical structures in which these entities are located. For instance, the DO term ovary epithelial cancer does not have axioms that formally relate the disposition to the epithelial cells that are part of the ovary. The lack of these axioms can make it difficult to query data modeled using the DO. For example, it is not possible to query for the most common anatomical structures in which malignant neoplasms are found.

3.5       On carcinomas and other pathological entities
    In Smith et al. (2005b), the Ontology for Biomedical Reality (Rosse et al. 2005) is modified to account for material anatomical entities, material pathological entities, and pathological formations. Material anatomical entities are anatomical structures (e.g., organs, cells) or bodily substances (e.g., blood) that are found in a healthy organism. Anatomical structures are defined as being material anatomical entities that have an inherent 3D structure generated by the coordinated expression of the organism’s own structural genes (Smith et al. 2005b). They include both canonical and variant anatomical structures. Canonical anatomical structures belong to ‘idealized’ healthy human beings. Variant anatomical structures are entities that deviate from the norm (e.g., having extra fingers), but are not pathological in the sense discussed below.
    An anatomical entity is defined as being a material pathological entity when (Smith et al. 2005b):

      •    It has come into being as a result of changes in some pre-existing canonical anatomical structure through processes other than the expression of the normal complement of genes of an organism of the given type.
      •    It is predisposed to have health-related consequences for the organism in question manifested by symptoms and signs.

    Material pathological entities include pathological structures and pathological bodily substances. These are anatomical structures and body substances, respectively, that host some kind of pathological formation, a formation being pathological when it affects an organism’s physiological processes to the degree that they give rise to signs and symptoms. For instance, a carcinoma is a pathological formation that arises within an anatomical structure, such as an ovary.
    A high-level summary of the hierarchy for material anatomical and material pathological entities is depicted below:

      •    material anatomical entity
               o    anatomical structure
                        §    canonical anatomical structure
                        §    variant anatomical structure

6 http://browser.ihtsdotools.org, accessed 2017-06-22.
7 Ibid.
8 https://confluence.ihtsdotools.org/display/DOCEG/6.1.1+Clinical+
+-+definition, accessed 2017-06-22.





               o    portion of canonical body substance (e.g., portion of blood)
      •    material pathological entity
               o    pathological structure (e.g., neoplasm)
               o    portion of pathological substance (e.g., portion of pus)

    Pathological formations are then related to their hosts and the entities out of which they originate using the following relations from the Open Biomedical Ontology (Smith et al. 2005a):9

      •    instance of: A primitive relation that holds between a particular individual and the universal (type or kind) that the particular individual instantiates at a particular time. For example, a particular patient is an instance of a human being at a particular time.
      •    part of: A primitive relation between instances of parts and wholes at a particular time. For example, a particular mass of malignant epithelial tissue is part of a particular ovary at a particular time.
      •    is a: A is a B means that A and B are universals and for all times t every particular individual i, if i instance of A at t, then i instance of B at t. For example, a human being is a mammal.
      •    derived from:10 A primitive relation between two distinct instances i, j and times t, t’ such that changes in i at t result in a new, second entity j at t’. For example, a particular blastocyst derived from a particular zygote.
      •    transformation of: A transformation of B means that A and B are universals and, for all times t, if i instance of A at t, then there is an earlier time t’ at which i was an instance of B.

    As an example, suppose a patient (patient1) has a carcinoma (carcinoma1) that originated within her ovary (ovary1). We represent this using the axioms:

          ovary1 at t part of patient1 at t
          carcinoma1 at t part of ovary1 at t
          carcinoma1 at t instance of pathological structure at t

    Because carcinomas arise from the epithelial tissue lining of organs, we can assert the following about the patient’s tumor:

          carcinoma1 at t derived from some epithelial cell at some t’ prior to t

    And, since the part of relation is transitive, we infer that:

          carcinoma1 at t part of patient1 at t

    Although this inference is trivial, the advantage of representing the patient’s tumor in this manner is that we are not required to explicitly state this within an information system using the ontology. Rather, we let the computer system handle this through automated inferencing.
    The benefit of doing this becomes apparent when we consider the multiple ways we classify malignant neoplasms. A malignant neoplasm may be classified according to:

      •    The cell type from which the neoplasm originates, e.g., carcinomas arise from epithelial cells, and sarcomas arise from non-epithelial cells.
      •    The organ in which the neoplasm develops, e.g., an ovarian carcinoma originates in the ovary.
      •    The organ system to which the organ of origin belongs, e.g., an ovarian carcinoma is a kind of reproductive system cancer.
      •    The anatomical site or region in which the organ of origin is found, e.g., a tongue carcinoma is a kind of head and neck cancer.

    When such classification information is axiomatized, we can then query the information system along these multiple axes without having to maintain complex data structures that explicitly assert this information. For instance, we can now query an information system for all carcinomas (i.e., malignant neoplasms that are derived from epithelial cells) that belong to patients’ reproductive systems without having to explicitly link each kind of carcinoma (e.g., ovarian, uterine, testicular) to the organ and associated organ system.

4   OUR PROPOSED ONTOLOGY
    While we consider the work of Smith et al. to be a significant improvement over the aforementioned classification systems and terminologies, a number of ontologies have been developed after this work was published. We take advantage of these more recent ontologies as follows. First, we make use of the terms and relations from the Cell Ontology (CL) (Diehl et al. 2016) to represent the types of cells from which a malignant neoplasm arises. Moreover, as a result of our work, the CL added the terms abnormal cell, neoplastic cell, and malignant cell in order to better represent cell types that play integral roles in tumor formation:

      •    abnormal cell: A cell found in an organism or derived from an organism exhibiting a phenotype that

9 Hereafter, relations are represented in bold.
10 In the referenced Open Biomedical Ontology relations, the relation is named ‘derives from’. However, to avoid confusion, we use the term as presented in the paper.
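The patient1 example and the multi-axis query from Section 3.5 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: the time index t is dropped, plain Python sets stand in for an ontology reasoner, and every identifier other than patient1, ovary1, and carcinoma1 (for example reproductive_system1, sarcoma1, femur1) is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Asserted part_of facts at a single time t (time index omitted).
part_of: Set[Tuple[str, str]] = {
    ("carcinoma1", "ovary1"),
    ("ovary1", "patient1"),
    ("ovary1", "reproductive_system1"),   # hypothetical organ-system instance
    ("reproductive_system1", "patient1"),
    ("sarcoma1", "femur1"),               # hypothetical second tumor
    ("femur1", "patient1"),
}

# Universal instantiated by selected individuals (illustrative labels).
instance_of: Dict[str, str] = {
    "reproductive_system1": "reproductive system",
}

# Cell type each neoplasm is derived from (illustrative labels).
derived_from: Dict[str, str] = {
    "carcinoma1": "epithelial cell",
    "sarcoma1": "non-epithelial cell",
}


def transitive_closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


inferred = transitive_closure(part_of)


def carcinomas_in(system_type: str) -> List[str]:
    """Neoplasms derived from epithelial cells that are part of some
    instance of the given organ-system type, using inferred part_of."""
    return sorted(
        tumor
        for tumor, cell in derived_from.items()
        if cell == "epithelial cell"
        and any(
            part == tumor and instance_of.get(whole) == system_type
            for part, whole in inferred
        )
    )


# carcinoma1 part_of patient1 is never asserted; it follows from transitivity.
print(("carcinoma1", "patient1") in inferred)
print(carcinomas_in("reproductive system"))
```

Only the direct links are stated (tumor to organ, organ to organ system). The link from carcinoma1 to the reproductive system and to the patient is derived, which is the point the text makes about not having to assert each link explicitly.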





             deviates from the expected phenotype of any native            Third, in order to account for Smith et al.’s distinction
             cell type of that organism. Abnormal cells are typi-      between material pathological entities and material anatom-
             cally found in disease states or disease models.          ical entities, we adopt OGMS’ account of a disease (as a
                                                                       disposition) being based on a disorder:
        •    neoplastic cell: An abnormal cell exhibiting
             dysregulation of cell proliferation or programmed                  disorder: A material entity which is clinically ab-
             cell death and capable of forming a neoplasm, an                   normal and part of an extended organism. Disor-
             aggregate of cells in the form of a tumor mass or an               ders are the physical basis of disease.
             excess number of abnormal cells (liquid tumor)
                                                                           Since OGMS uses the Basic Formal Ontology (BFO) as
             within an organism.
                                                                       its upper-level framework and a disease in OGMS is a type of
        •    malignant cell: A neoplastic cell that is capable of      BFO disposition, it cannot (like all dispositions) exist on its
             entering a surrounding tissue.                            own. Rather, a disease must be borne by a disorder whose
                                                                       structural abnormalities serve as a disease’s basis. For ex-
    Second, an important criterion in Smith et al.’s definition
                                                                       ample, a sprained ankle is a disorder in the sense that the
of an entity being pathological is that it is predisposed to
                                                                       physical structures are clinically abnormal, and these physi-
have health-related consequences (Smith et al. 2005b). To
                                                                       ological abnormalities are the reason that a sprained ankle is
more precisely account for predispositions of this sort, we
                                                                       disposed to swell.
adopt the Ontology for General Medical Sciences’ (OGMS)
                                                                           Fourth, to relate a disease to the disorder upon which it
model of disease (Scheuermann et al. 2009). In OGMS, a
                                                                       is based, we define the has material basis in relation as
disease is a type of disposition that is manifested (or realized)
                                                                       follows:
during those processes that compromise an organism’s
physiological health. This permits us to represent that an                      has material basis in: A primitive relation be-
organism may have a disease even though the disease is not                      tween an instance of a disease i and an instance of
currently being realized. A malignant neoplasm, for in-                         a disorder j at particular time t in which i exists be-
stance, may shed malignant cells that remain dormant in the                     cause of the physical makeup of some part of j at
patient until at some later time they begin to proliferate.                     time t.
During this dormant period, these malignant cells possess
                                                                           In addition to relating a disease to its basis, we must also
the disposition for undergoing uncontrolled cell prolifera-
                                                                       account for the processes that realize (or make manifest) an
tion, although the disposition is not being realized. Similar-
                                                                       instance of a disease. For this we use OGMS’ term patho-
ly, the genome within a native cell may have mutations in
                                                                       logical bodily process:
its BRCA1 or BRCA2 genes, but the cell may behave nor-
mally until certain cellular processes uncover the pathologi-                   pathological bodily process: A bodily process that
cal effects of these mutations. Using the dispositional ac-                     is clinically abnormal.
count of disease, we then incorporate the Disease Ontolo-
                                                                           As observed in the definition, a pathological bodily pro-
gy’s (DO) representation of cancer as follows:
                                                                       cess is a type of bodily process. However, the term bodily
        •    disease: A disposition (i) to undergo pathological        process is not defined in OGMS.
             processes that (ii) exists in an organism because of          Fifth, in order to account for the temporal development
             one or more disorders in that organism.11                 of malignant neoplasms, we make use of the Relations On-
                                                                       tology’s derives from and develops from relations (Smith
        •    disease of cellular proliferation: A disease that is
                                                                       et al. 2005a). The derives from relation is similar to the
             characterized by abnormally rapid cell division.
                                                                       aforementioned derived from relation, but adds the criteria
        •    cancer: A disease of cellular proliferation that is       that the originating entity ceases to exist when the new enti-
             malignant and primary, characterized by uncon-            ty is created and the newly created entity inherits a signifi-
             trolled cellular proliferation, local cell invasion and   cant portion of its matter from the originating entity. For
             metastasis.                                               example, the assertion:
    Recall that above we criticized the DO for its incon-                       abnormal cell derives from native cell
sistent usage of the term ‘neoplasm’. However, given the
                                                                       entails that a particular native cell no longer exists once the
need to represent the dispositional aspect of cancer, we find
                                                                       abnormal cell derived from it comes into existence.
DO’s hierarchy appropriate for characterizing cancer as we
                                                                          The develops from relation also represents new entities
are clear and consistent about which sense of ‘cancer’ we
                                                                       that arise from previously existing entities, but does not re-
are using.
                                                                       quire that the originating entity cease to exist. This allows
                                                                       us to represent that an instance of a secondary neoplasm
11
     DO uses the OGMS term disease.






develops from an instance of a primary neoplasm without having to commit to the primary
neoplasm’s ceasing to exist.
    Sixth, given the importance of representing the anatomical structures in which malignant
neoplasms form, we incorporate Uberon’s anatomical structure and OGMS’ pathological
anatomical structure terms, and define them as follows (Mungall et al. 2012):

    •    material anatomical entity: An anatomical entity that has mass.

    •    anatomical structure: A material anatomical entity that is a single connected
         structure with inherent 3D shape generated by coordinated expression of the
         organism's own genome.

    •    pathological anatomical structure: A material entity that comes into being as a
         result of changes in some pre-existing anatomical structure through processes other
         than the expression of the normal complement of genes of an organism of the given
         type, and is predisposed to have health-related consequences for the organism in
         question manifested by symptoms and signs.

    We note here that although intuitively a pathological anatomical structure is a type of
anatomical structure, for reasons that will be discussed below, we classify them in separate
hierarchies. Moreover, we assert that a particular pathological anatomical structure (1)
develops from an instance of a previously existing anatomical structure, and (2) has part an
instance of a disorder. These two assertions define both necessary and sufficient conditions
for an entity to be a pathological anatomical structure.
    Lastly, with the above modifications in place, we define the following terms necessary
for an ontology of malignant neoplasms:

    •    dysregulation of cell proliferation: A pathological bodily process during which cell
         proliferation occurs at a level not normal for that cell type in its native context.

    •    neoplasm: A disorder that results from dysregulation of cell proliferation
         (uncontrolled cell proliferation).

    •    malignant neoplasm: A neoplasm that has acquired the disposition to invade
         surrounding tissues and spread to remote anatomical sites.

    •    primary neoplasm: A malignant neoplasm that is found in the site where the malignant
         cells first began proliferating.

    •    secondary neoplasm: A malignant neoplasm that develops from a primary neoplasm.

    A summary of our proposed ontology of malignant neoplasms is depicted in Figure 1.

5   DISCUSSION
    We began our work in order to build an application ontology to assist us in analyzing
data in an ovarian cancer patient registry (work in progress). Because of our commitment to
OBO Foundry principles and ontological realism, we began our ontology development by
considering existing ontologies, including OGMS and DO, and related ontologies such as Uberon
and CL. Our aim has been to reuse ontology classes where possible and create new classes and
hierarchies where existing ontologies either are missing classes or provide faulty modeling
of the domain.
    We have found the NCIT to be a very useful source of information about cancer-related
entities, their definitions, and their relationships to each other. Although the NCIT is very
large and has been developed over many years, it remains a terminology rather than an
ontology. For example, the NCIT includes the term Disease or Disorder defined as:

        Any abnormal condition of the body or mind that causes discomfort, dysfunction, or
        distress to the person affected or those in contact with the person. The term is
        often used broadly to include injuries, disabilities, syndromes, symptoms, deviant
        behaviors, and atypical variations of structure and function.

    This definition does not adequately distinguish between the processes and material
entities that result in abnormal conditions. This distinction is important for precisely
representing the nature of a malady. If a cancer patient has difficulty breathing due to
metastatic tumors spreading throughout the lungs, both the difficulty in breathing and the
tumors are abnormal conditions, and hence would both be classified using the term Disease or
Disorder (C2991). But, in reality, the process of breathing is a different kind of entity
from a tumor, which is a material entity. There are past and current efforts to redevelop
NCIT or at least sections of it into a proper ontology. Our hope is that these efforts will
make the NCIT more aligned with OBO Foundry principles. One important result of our work was
the addition of the abnormal cell, neoplastic cell, and malignant cell types to CL. These CL
classes parallel the naming and relationships of the NCIT concepts, but as discussed above,
we chose to write new definitions that better define these cell types and do not limit their
applicability unnecessarily.
    In considering the Disease Ontology, we found it to be a useful catalog of cancer types,
but as discussed above, we find that there is confusion as to whether neoplasms are
dispositions or disorders. Because of our need to represent pathological findings, we need to
reflect that these findings






are about disorders (which are material entities) that are observed by pathologists, and not
about dispositions, which are not directly observable.
    An important finding of our work is that OBO Foundry ontologies have difficulty
representing abnormal or pathological entities. Two prominent examples are pathological
anatomical structures and pathological processes. Intuitively, a pathological anatomical
structure is a kind of anatomical structure. For instance, an ovary containing a carcinoma is
still an instance of an ovary. However, the standard definition (with some variations) for
anatomical structure found in Uberon, the Common Anatomy Reference Ontology, the Foundational
Model of Anatomy, and the Anatomical Entity Ontology does not allow for this:12

             Material anatomical entity that has inherent 3D shape and is generated by
             coordinated expression of the organism's own genome.

    The issue is that disorders (such as neoplasms and fractures) that arise within
anatomical structures are not necessarily generated by the organism’s genome. Thus, the
definition is too strong. Smith et al. are aware of this and propose an anatomical hierarchy
consisting of a top-level anatomical structure term with the subtypes canonical anatomical
structure, variant anatomical structure, and pathological anatomical structure (Smith et al.
2005a):

        •    Anatomical structure
                o Canonical anatomical structure
                o Variant anatomical structure
                o Pathological anatomical structure

    While we think this is a reasonable proposal, the lack of a definition for anatomical
structure makes it unclear what canonical, variant, and pathological structures have in
common.
    A similar problem exists for abnormal processes (such as dysregulation of cell
proliferation). The OGMS, for its part, does provide the term pathological bodily process.
But this term is orphaned from other biological processes found in other OBO ontologies. For
example, the Gene Ontology (GO) includes the term biological process:13

             Any process specifically pertinent to the functioning of integrated living
             units: cells, tissues, organs, and organisms. A process is a collection of
             molecular events with a defined beginning and end.

Again, intuitively it makes sense that a pathological bodily process should be a subtype of
biological process. However, the definition of biological process does not permit this.
    Although we do not have any concrete solutions, at this point, for how to align
pathological structures and processes with these other OBO terms, we look forward to
collaborating with the OBO Foundry community on creating a coherent structure for these
upper-level classes that is shared among all OBO Foundry ontologies. Thus, we simply leave
pathological anatomical structure as a subtype of material anatomical entity, and
pathological bodily process in its current OGMS hierarchy.
    Our goal is to contribute to the oncology domain by creating a strong and consistent
ontological foundation for providing metadata and data analysis of patient cancer data for
both research and clinical applications including clinical decision support. The ontological
framework described herein attempts to solve some continuing issues in the representation of
cancer as a disease and the disorders (neoplasms) in which it presents. Our framework is
intended to be useful for the description and classification of data used in cancer diagnosis
and treatment. In future work, we will be adding classes to represent additional entities
associated with cancer such as laboratory methods and results, treatments, and outcomes. We
hope our ontology will support other oncology researchers in exploiting the full potential of
patient data registries and other cancer-related datasets.

12
     Definitions retrieved from www.ontobee.org, accessed 2017-07-06.
13
     Definition retrieved from www.ontobee.org, accessed 2017-07-06.

ACKNOWLEDGEMENTS
We gratefully acknowledge support as follows. William Duncan and Carmelo Gaudioso received
support from the Clinical Data Network, a Roswell Park Cancer Institute Cancer Center Support
Grant shared resource funded by NCI P30CA16056. Alexander Diehl received support from NCATS
5UL1TR001412. All three authors received support from NCI P50CA159981.

REFERENCES
Arp, R., Smith, B., & Spear, A.D. (2015). Building Ontologies With Basic Formal Ontology. The
    MIT Press. doi:10.7551/mitpress/9780262527811.001.0001.
Diehl, A. D., Meehan, T. F., Bradford, Y. M., Brush, M. H., Dahdul, W. M., Dougall, D. S., …
    Mungall, C. J. (2016). The Cell Ontology 2016: enhanced content, modularization, and
    ontology interoperability. Journal of Biomedical Semantics, 7(44).
    doi:10.1186/s13326-016-0088-7. PMCID: PMC4932724.
    https://github.com/obophenotype/cell-ontology.
Hanna, J., Joseph, E., Brochhausen, M., & Hogan, W. R. (2013). Building a drug ontology based
    on RxNorm and other sources. Journal of Biomedical Semantics, 4(44).
    doi:10.1186/2041-1480-4-44. PMCID: PMC3931349.
Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V.,
    Owen, G., Turner, S., Williams, M., & Steinbeck, C. (2013). The ChEBI reference database
    and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids
    Research, 41(Database issue), D456–D463. doi:10.1093/nar/gks1146.
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon,
    an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5.
    doi:10.1186/gb-2012-13-1-r5. PMCID: PMC3334586. http://uberon.github.io.
Rosse, C., Kumar, A., Mejino, J. L., Cook, D. L., Detwiler, L. T., & Smith, B. (2005). A
    Strategy for Improving and Integrating Biomedical Ontologies. AMIA Annual Symposium
    Proceedings, 2005, 639–643. PMCID: PMC1560467.
Scheuermann, R.H., Ceusters, W., & Smith, B. (2009). Toward an ontological treatment of
    disease and diagnosis. San Francisco: Proceedings of the 2009 AMIA Summit on
    Translational Bioinformatics, 2009, 116–120. PMCID: PMC3041577.
    https://github.com/OGMS/ogms.
Schriml, L. M., & Mitraka, E. (2015). The Disease Ontology: fostering interoperability
    between biological and clinical human disease-related data. Mammalian Genome, 26(9-10):
    584–589. doi:10.1007/s00335-015-9576-9. PMCID: PMC4602048. http://disease-ontology.org.
Sioutos, N., de Coronado, S., Haber, M.W., Hartel, F.W., Shaiu, W.L., & Wright, L.W. (2007).
    NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular
    information. Journal of Biomedical Informatics, 40(1): 30–43.
    doi:10.1016/j.jbi.2006.02.013. PMID: 16697710.
Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus,
    F., Rector, A.L., & Rosse, C. (2005a). Relations in biomedical ontologies. Genome
    Biology, 6(5): R46. doi:10.1186/gb-2005-6-5-r46. PMCID: PMC1175958.
Smith, B., Kumar, A., Ceusters, W., & Rosse, C. (2005b). On Carcinomas and Other Pathological
    Entities. Comparative and Functional Genomics, 6(7-8): 379–387. doi:10.1002/cfg.497.
    PMCID: PMC2447494.
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J.,
    Eilbeck, K., Ireland, A., Mungall, C. J., The OBI Consortium, Leontis, N., Rocca-Serra,
    P., Ruttenberg, A., Sansone, S.-A., Scheuermann, R. H., Shah, N., Whetzel, P. L., &
    Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support
    biomedical data integration. Nature Biotechnology, 25(11): 1251–1255. PMCID: PMC2814.




Figure 1. This figure illustrates our proposed ontology for representing malignant neoplasms. Our upper-level framework
consists of classes imported from the Ontology for General Medical Science (OGMS), Cell Ontology (CL), Disease Ontology
(DO), and Uberon. We extend the upper-level framework by adding the classes dysregulation of cell proliferation,
neoplasm, malignant neoplasm, primary neoplasm, and secondary neoplasm.
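
As a rough illustration of the added classes listed in the caption, their subsumption links can be sketched in Python. This is only a sketch under stated assumptions: the parent of each class is read off the genus of its textual definition in this paper (for example, a neoplasm is defined as "a disorder"), the names IS_A, ancestors, and is_a are hypothetical helpers introduced here, and nothing below is taken from the authors' ontology files.

```python
# Hypothetical sketch: each class maps to the parent named in the genus of its
# textual definition in the paper. This is not the authors' implementation.
IS_A = {
    "dysregulation of cell proliferation": "pathological bodily process",
    "neoplasm": "disorder",
    "malignant neoplasm": "neoplasm",
    "primary neoplasm": "malignant neoplasm",
    "secondary neoplasm": "malignant neoplasm",
}


def ancestors(term):
    """Return the chain of parents of `term`, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, candidate):
    """True if `candidate` is `term` itself or one of its ancestors."""
    return term == candidate or candidate in ancestors(term)


if __name__ == "__main__":
    print(ancestors("secondary neoplasm"))
    print(is_a("primary neoplasm", "disorder"))
    print(is_a("dysregulation of cell proliferation", "disorder"))
```

In this sketch the process class and the disorder classes sit under separate roots, which mirrors the distinction drawn in the text between processes and material entities.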


