Can a Convolutional Neural Network Support Auditing of NCI Thesaurus Neoplasm Concepts?

Hao Liu1, Ling Zheng2, Yehoshua Perl1, James Geller1, Gai Elhanan3

1 Department of Computer Science, New Jersey Institute of Technology, Newark, NJ USA, {hl395, perl, geller}@njit.edu
2 CSSE Department, Monmouth University, West Long Branch, NJ USA, zdzhengling@gmail.com
3 Applied Innovation Center, Desert Research Institute, Reno, NV USA, gelhanan@gmail.com

    Abstract—We present a Machine Learning methodology               assurance? Knowledge enrichment mines external sources for
using a Convolutional Neural Network to perform a specific case      new knowledge that does not exist in the ontology. However,
of ontology Quality Assurance, namely discovery of missing           QA discovers incorrect or missing knowledge. Consider
IS-A relationships for Neoplasm concepts in the National Cancer      missing IS-A relationships from an existing concept A to a
Institute Thesaurus (NCIt). The training step checking all
“uncles” of a concept is computationally intensive. To shorten the
                                                                     concept B. If the concept B is already in the ontology, then
time and to improve the accuracy, we define a restricted             adding an IS-A link between A and B is considered correcting
methodology to check only uncles that are similar to each current    an omission error. If concept B is not in the ontology and is
concept. The restricted technique yields higher classification       added together with adding an IS-A from concept A to it, then
recall (compared to the unrestricted one) when testing against       this is knowledge enrichment. We note that curators of some
known errors found by domain experts who manually reviewed           ontologies, e.g., NCIt, are less interested in knowledge
Neoplasm concepts in a prior study. The results are encouraging      enrichment, unless required by users, than in quality assurance.
and provide impetus for further improvements to our technique.       In this paper, we attempt to use ML to address the task of
                                                                     detecting missing IS-A links between two existing concepts.
   Keywords—CNN; Deep Learning; Neoplasm Hierarchy;
National Cancer Institute Thesaurus; Quality Assurance;              This task is more challenging than knowledge enrichment,
Abstraction Network; Machine Learning                                since it requires a judgement that concept A is a specialization
                                                                     of concept B. For knowledge enrichment we only recognize
                                                                     that a concept is missing in the ontology and then insert it into
                       I.    INTRODUCTION
                                                                     the proper place.
    Ontologies play a major role in enabling precise
communications and in support of healthcare applications, e.g.           In an unpublished study, we trained a Convolutional Neural
EHR systems. Many ontologies are large and complex. For              Network (CNN) deep learning model to insert new concepts
example, the National Cancer Institute Thesaurus (NCIt) [1],         into the SNOMED CT ontology, i.e., an enrichment problem.
serving cancer researchers inside and outside NIH, contains          In the present work, we train a CNN deep learning model to
135,243 concepts interrelated by 480,141 links in the April          find missing parent/child errors in the Neoplasm subhierarchy
2018 release. Due to their size and complexity, errors in            of NCIt. The vector representations of concepts are obtained
ontologies are unavoidable. Users of ontologies such as the          from an unsupervised neural network language model. The
SNOMED ontology are concerned about errors [2]. Thus,                model is evaluated by its classification recall on an unseen
quality assurance (QA) is essential in the lifecycle of              dataset. We check the model’s classification recall by testing
ontologies [3]. For a summary of auditing (QA) techniques for        against 18 missing parent/child errors found by domain experts
ontologies and in particular for SNOMED and NCIt, see [4, 5].        in a prior study [9]. Due to the size of the Neoplasm
                                                                     subhierarchy, the application of the training methodology is
    However, QA resources for ontologies are typically scarce,       computation-intensive and time consuming.
while QA tasks are labor-intensive and time-consuming.
Therefore, automated or semi-automated techniques that can               In previous research we have introduced Abstraction
either help in auditing an ontology or narrow down the places        Networks (AbNs) [10, 11]. An AbN provides a compact
to look for errors are highly desirable. Missing                     summarization and visual simplification of an ontology. The
parent/child errors are particularly interesting to ontology         SABOC (Structural Analysis of Biomedical Ontologies Center)
curators, as the IS-A links are the backbone structure of an         team at NJIT has demonstrated that Abstraction Networks are
ontology, facilitating the inheritance of lateral relationships      an effective tool to support quality assurance of ontologies [9,
(called roles in NCIt).                                              12]. An area taxonomy [3], a type of Abstraction Network, is
                                                                     composed of meta-concepts called areas, connected by child-
    Machine Learning (ML) has proven successful in                   of links. An area (see Background section) represents a group
many fields, e.g., knowledge mining. ML was previously used          of concepts with the same structure.
in knowledge enrichment for ontologies [6-8]. However, can
ML be used for quality assurance of ontologies in spite of the          To accelerate the processing and improve recall, we modify
major difference between knowledge enrichment and quality            the CNN methodology to limit its consideration, for each
concept, to the similar concepts of its area (in our formal sense
of area). The modified, restricted methodology achieves 0.81
recall on the unseen testing data. On the 18 known errors, it
detects twice as many errors as the unrestricted methodology
(10 vs. 5). The results for detecting missing IS-A links are not
yet strong enough. However, the performance in recognition of
known errors is encouraging and supports further improvement
of our methodology with respect to CNNs and the use of AbNs.

                                                                      (called “roles”), e.g. Disease Has Associated Anatomic Site.

                                                                          Due to the NCI’s cancer focus, the Neoplasm subhierarchy
                                                                      of NCIt is composed of 9,955 concepts. It is a core component
                                                                      of the Disease, Disorder or Finding hierarchy, the largest
                                                                      hierarchy with 35,081 concepts, and is modeled in more detail
                                                                      than the non-neoplasm concepts in that hierarchy.

                                                                      D. Areas and Area Taxonomy
                       II.    BACKGROUND                                  An area taxonomy [3, 20, 21] is a compact Abstraction
                                                                      Network summarizing the structure (roles) of an ontology. It is
A. Doc2vec                                                            composed of areas and child-of relationships connecting areas.
                                                                      An area is a group of concepts having the same set of role
    Numeric representation of variable-length texts, ranging
                                                                      types. A concept can be in only one area, i.e., areas are disjoint.
from sentences to documents, is a challenging task. Doc2vec, or
                                                                      A concept that has no parent in its area is called a root of the
Paragraph Vectors [13], an extension of word2vec (word
                                                                      area. An area may have multiple roots. If a root concept of area
embedding) [14], maps variable-length texts to fixed-length
                                                                      B has a parent concept in area A, then there is a child-of
vectors. It is an unsupervised framework that learns continuous
                                                                      relationship from area B to area A.
distributed vector representations from unlabeled text data of a
paragraph/document, while preserving the inter-relationships              Fig. 1(a) is an excerpt of 12 Neoplasm concepts from NCIt.
of the text in the numeric format. In such vector                     Concepts are represented as rounded-corner boxes and the
representations, similar pieces of text are close to each other in    arrows denote IS-A relationships. Concepts with the same set
Euclidean or cosine distance in lower dimensional vector              of role types are enclosed within a colored dashed rectangle.
spaces. Doc2vec inherits the semantics of the words in the            For example, both Benign Neoplasm and Tumorlet have the
context and takes the word order into consideration when              two role types Disease Excludes Abnormal Cell and Disease
constructing the representation. The latter advantage is              Has Abnormal Cell, so they reside in the left green dashed
important to our problem, as word order in our setup carries the      rectangle. Fig. 1(b) shows the area taxonomy for Fig. 1(a).
concepts’ topological/hierarchical order in the ontology. This is     Each colored, dashed rectangle in Fig. 1(a) becomes an area
useful information for feature learning. To the best of our           with the same color in Fig. 1(b). An area is labeled by its role
knowledge, this is the first study to derive vector                   type set and the number of concepts it summarizes. Areas with
representations for biomedical ontology classes via Doc2vec.          the same number of role types have the same color. For
                                                                      example, there are two areas colored in green, since both have
B. CNN                                                                two role types. Skin Neoplasm is the root concept of the red
     Convolutional Neural Networks (CNN), initially invented          area and its parent Neoplasm by Site is in the grey area. Hence,
for image recognition, have been widely used for various              there is a child-of relationship from the red area to the grey
applications, including vision, speech recognition, and               area, denoted as a bold arrow in Fig. 1(b).
language translation. CNN models have also been successfully
applied to solve various Natural Language Processing (NLP)
problems such as search query retrieval [15], semantic parsing
[16] and sentence modeling [17]. CNN utilizes convolving
filters to automatically learn and extract local features from
various layers, regardless of the input size. This makes CNN a
very powerful tool for classification or prediction tasks, e.g.
text classification [18] and relation extraction [19], even if the
data or features have not been manually labeled for learning
purposes. To the best of our knowledge, this is the first effort to   Fig. 1. (a) Excerpt of 12 concepts from the Neoplasm subhierarchy. (b) The
adopt the CNN model for ontology quality assurance.                         area taxonomy for (a) with 4 areas.
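
As an illustration of the area and child-of definitions in Section II.D, the following minimal sketch derives an area taxonomy from two plain dictionaries. The function name, the input layout, and the role types assigned in the toy example are our own assumptions for illustration; only the IS-A link from Skin Neoplasm to Neoplasm by Site is taken from Fig. 1.

```python
from collections import defaultdict


def build_area_taxonomy(role_types, parents):
    """Group concepts into areas and derive child-of links between areas.

    role_types: dict mapping concept -> iterable of role type names.
    parents:    dict mapping concept -> iterable of parent concepts (IS-A).
    Returns (areas, roots, child_of); an area is identified by its
    frozenset of role types.
    """
    # An area is the set of concepts with exactly the same set of role types.
    area_of = {c: frozenset(r) for c, r in role_types.items()}
    areas = defaultdict(set)
    for concept, key in area_of.items():
        areas[key].add(concept)

    roots = defaultdict(set)
    child_of = set()
    for concept, key in area_of.items():
        parent_keys = {area_of[p] for p in parents.get(concept, ()) if p in area_of}
        # A root has no parent inside its own area.
        if key not in parent_keys:
            roots[key].add(concept)
            # Each area that holds a parent of a root is a parent area.
            for parent_key in parent_keys:
                child_of.add((key, parent_key))
    return dict(areas), dict(roots), child_of


# Toy example: the role types below are hypothetical placeholders.
toy_roles = {
    "Neoplasm by Site": [],
    "Skin Neoplasm": ["role type X"],
    "hypothetical child of Skin Neoplasm": ["role type X"],
}
toy_parents = {
    "Skin Neoplasm": ["Neoplasm by Site"],
    "hypothetical child of Skin Neoplasm": ["Skin Neoplasm"],
}
toy_areas, toy_roots, toy_child_of = build_area_taxonomy(toy_roles, toy_parents)
```

In the toy example, Skin Neoplasm is the only root of its area, and one child-of link leads from that area to the area of Neoplasm by Site, mirroring the bold arrow described for Fig. 1(b).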

C. Neoplasms of NCIt
                                                                                                  III.    METHOD
    NCIt is published monthly by the National Cancer Institute
(NCI) in OWL and flat file formats. It is a cancer reference              We describe two methodologies, the unrestricted
terminology that is widely used. It covers cancer-related             methodology and the refined restricted methodology. The ML
terminology in various fields, e.g., clinical care and                training problem is viewed as a binary classification task: given
translational and basic research. Concepts are linked to other        a concept pair, we classify it into a positive category (there is
concepts (parent concepts) in the same hierarchy by IS-A              an IS-A link) or a negative category (there is no IS-A link). We
relationships. A concept may have multiple parent concepts.           train a Convolutional Neural Network (CNN) model to solve
The semantics of concepts are defined by lateral relationships        this classification problem.
Unrestricted Methodology: To train a CNN model to yield            subset of a concept’s uncles, which are structurally similar to
high precision, we must carefully choose the training data for     the investigated concept. To provide such a “closely related”
both categories. The source of training samples for the positive   subset we partition the Neoplasm subhierarchy into sets of
category is in the IS-A hierarchy of the ontology. The challenge   concepts of similar structure. In doing this we are availing
is in the choice for the negative samples as we cannot use the     ourselves of a powerful mechanism that derives an area
full set of unconnected pairs. For the Neoplasm subhierarchy       taxonomy from an ontology [3]. An area taxonomy is an
of NCIt with 9955 concepts, the size of this set is 99,075,537     Abstraction Network that clusters together groups of concepts
(= 9955*9954 – 16533). We subtract 16533 existing IS-A link        according to their roles. All concepts in one area have exactly
pairs from the potential missing parent/child errors. Training     the same roles. Concepts in different areas differ in at least one
pairs should not be chosen randomly. We need to choose pairs       role from each other.
where there is a reasonable likelihood for an IS-A link, not
                                                                        In the area taxonomy each cluster of concepts constitutes
pairs that obviously have no taxonomic relation. For example,
                                                                   an area. Due to the high average number of roles per concept in
a body part concept and a drug concept are not related by an
                                                                   the Neoplasm subhierarchy, the number of areas is large, and
IS-A relationship and would be a bad training pair. Negative
                                                                   the average size of an area is small. Note that AbNs are
training samples should be near misses, close to the
                                                                   automatically derived from the ontology so the limitation
“hyperplane of separation” in SVM terms.
                                                                   process is automatic [22]. By selecting pairs only within areas,
    To address these two problems of magnitude and recall in       we narrow down 37,147 negative samples to 10,574 more
the ML training, we limit the negative samples for the             closely related “uncle – nephew” pairs, of the same magnitude
unrestricted methodology to “uncle - nephew” pairs, which are      as the 16,533 positive samples.
near misses: pairs that link a concept to a
                                                                       Since all uncles in an area will have exactly the same roles
sibling of its parent. By only choosing “uncle - nephew”
                                                                   as the current concept, they will be similar to it. Thus the recall
pairs, we guide the model to learn the underlying features used    of the CNN training model is expected to be higher than for the
to distinguish IS-A-connected concept pairs from similarly         unrestricted model trained with all uncles of a concept, many
positioned concept pairs that have a high potential to have        of which are not similar to the current concept.
secondary IS-A links but are not connected by IS-A links. There
are a total of 37,147 such “uncle - nephew” pairs in the Neoplasm     The following description is related to both methodologies.
subhierarchy of NCIt.                                              Overall, our methodology comprises the following four steps:
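
The selection of “uncle - nephew” negative samples, and its restriction to pairs within one area, can be sketched as follows. This is a minimal illustration under our own assumptions: the function name and the dictionary-based inputs are hypothetical, and the optional `area_of` argument stands in for the area taxonomy. The last lines reproduce the count of unconnected ordered pairs quoted in the text.

```python
def uncle_nephew_pairs(parents, area_of=None):
    """Return the set of (nephew, uncle) pairs of an IS-A hierarchy.

    An uncle is a sibling of one of the nephew's parents that is not itself
    a parent of the nephew.
    parents: dict mapping concept -> set of parent concepts.
    area_of: optional dict mapping concept -> area identifier; when given,
             only pairs whose two concepts lie in the same area are kept.
    """
    # Invert the parent map to look up the children of every concept.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)

    pairs = set()
    for nephew, ps in parents.items():
        for parent in ps:
            for grandparent in parents.get(parent, ()):
                for uncle in children[grandparent]:
                    # Skip the parent itself and any IS-A-connected concept.
                    if uncle == parent or uncle == nephew or uncle in ps:
                        continue
                    if area_of is not None and area_of.get(uncle) != area_of.get(nephew):
                        continue
                    pairs.add((nephew, uncle))
    return pairs


# Unconnected ordered pairs in the Neoplasm subhierarchy, as quoted in the text.
n_concepts, n_isa_links = 9955, 16533
candidate_pairs = n_concepts * (n_concepts - 1) - n_isa_links
```

Calling the function without `area_of` corresponds to the unrestricted selection; passing a concept-to-area mapping corresponds to the restricted one.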
    The unrestricted model must consider all uncles of a           A. Document Embedding
concept (that are not connected to that concept by an IS-A
                                                                       The CNN model requires its input in the format of fixed-
link). Due to the size of the Neoplasm subhierarchy, the
                                                                   length feature vectors. Thus, before sending concept pairs for
number of uncles is large. For example, there are 24, 15 and 15
                                                                   training, we need to transform each concept into its
concepts with 10, 11 and 12 children, respectively and the
                                                                   corresponding vector representation with fixed length.
maximum number of children of a concept is 60. Each
grandchild of a concept with 15 children has 14 uncles. If the         The Paragraph Vector (Doc2vec) framework introduced by
average number of children of each sibling is 5, then there are    [13] generates fixed-length feature vectors from variable-
70 concepts with 14 uncles each. Applying the proposed             length pieces of text, as it was designed for text corpus
technique to select negative samples from the whole hierarchy      processing. Thus, the problem to overcome in applying
results in a large number of “uncle - nephew” pairs. This is       Doc2vec to ontologies is to find the vector representation of
computationally expensive for training the model and leads to      single concepts. However, an IS-A link is defined by a pair of
low accuracy by distracting the model from learning subtle         concepts, thus a joint representation of pairs is needed that is
features that distinguish between IS-A and “uncle - nephew”        also compatible with the input required by CNN.
links within similar groups of concepts. In other words, not all
                                                                       To derive the vector representation of a single concept, we
“uncle – nephew” pairs are equally useful for training
                                                                   need text “descriptions” of the concept. We recast a concept
purposes. We need pairs that are similar to existing IS-A-
                                                                   into a document such that it preserves hierarchical and partial
connected pairs, yet are not themselves IS-A-connected.
                                                                   semantic information of the concept:
   Additionally, many machine learning models work best
with balanced training sets. The number of positive and                 The document of a concept contains the concept ID, the
negative samples should be approximately equal. The number         name(s) of its ancestor(s), the name(s) of the concept itself, the
of positive training instances in our problem domain is given      name(s) of its child(ren) and the names of its grandchild(ren),
and fixed. The number of potential negative training samples is    if they exist. In this way, the document implicitly maintains the
much larger. Thus a cogent way has to be used to select a          hierarchical relationships of the ontology.
number of negative training samples that is closer to the             For example, the document representation of the concept
number of positive training samples.                               Malignant Nipple Neoplasm (Fig. 2) is “c5213: Neoplasm →
The Restricted Methodology: To cope with these problems,           Neoplasm by site → Breast neoplasm → Malignant Breast
we introduced the restricted approach. The restricted approach     Neoplasm → Nipple Neoplasm→ Malignant Nipple Neoplasm
limits the number of negative samples by only choosing a           → Female Malignant Nipple Neoplasm → Male Malignant
Nipple Neoplasm →Nipple Carcinoma.” Thus, the generated             Doc2vec. The CNN model architecture is shown in Fig. 3. The
distributed vector representation maintains the most important      input to the CNN model is two 128x1 dimension vectors and
hierarchical relationship semantics.                                the output is a 2x1 dimension vector.
    The document vectors are derived using the Distributed
Memory version of Paragraph Vector (PV-DM) [13] via the
Gensim [23] Doc2Vec implementation. Each vector has 128
dimensions. Pairs of concepts connected by an IS-A
link are represented by the concatenation of the document
vectors of the two concepts, with the child concept first.
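
The recasting of a concept into a document and the concatenation of two document vectors can be sketched as follows. The helper names are hypothetical, and the text does not state how the three trailing names of the example divide into children and grandchildren, so the example passes all three as children. The vectors themselves would come from a trained PV-DM model (in Gensim, `Doc2Vec` with `dm=1` and `vector_size=128`); that training step is omitted so that the sketch has no dependencies.

```python
ARROW = " → "


def concept_document(concept_id, ancestors, name, children=(), grandchildren=()):
    """Build the text document of one concept.

    The names are listed in hierarchical order: ancestors (root first),
    the concept itself, its children, then its grandchildren.
    """
    names = [*ancestors, name, *children, *grandchildren]
    return f"{concept_id}: " + ARROW.join(names)


def pair_vector(child_vec, parent_vec):
    """Concatenate two document vectors, child concept first."""
    if len(child_vec) != len(parent_vec):
        raise ValueError("both document vectors must have the same length")
    return list(child_vec) + list(parent_vec)


# The example from the text (Fig. 2).
doc = concept_document(
    "c5213",
    ancestors=["Neoplasm", "Neoplasm by site", "Breast neoplasm",
               "Malignant Breast Neoplasm", "Nipple Neoplasm"],
    name="Malignant Nipple Neoplasm",
    children=["Female Malignant Nipple Neoplasm",
              "Male Malignant Nipple Neoplasm", "Nipple Carcinoma"],
)

# Placeholder vectors stand in for the 128-dimensional Doc2vec output.
pair = pair_vector([0.0] * 128, [1.0] * 128)
```

With 128-dimensional document vectors, each concept pair is represented by 256 numbers, with the child's vector in the first half.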


                                                                    Fig. 3. CNN model architecture

                                                                        Some structural details of this model are summarized as
                                                                    follows:
                                                                        • There are four convolution layers, each followed by a
                                                                          max pooling layer. This choice was informed by
                                                                          previous research. The first convolution layer has 18
                                                                          filters with kernel size = 1. The number of filters doubles
                                                                          with each subsequent convolution layer. We use stride
                                                                          =1, meaning we slide the filters one number (position)
Fig. 2. Concept document derivation for Malignant Nipple Neoplasm         at a time over the input. The pooling size is 2 for all
B. Training and Testing Data                                              max pooling layers.
    The positive samples are directly extracted from the                • The Adam [24] optimization algorithm for stochastic
hierarchy as all the concept pairs connected via an IS-A link.            gradient descent is used for training with the learning
For example, (Malignant Nipple Neoplasm, Nipple Neoplasm)                 rate set to 0.001.
is a positive sample, because Malignant Nipple Neoplasm is a
Nipple Neoplasm (Fig. 2). As mentioned above, there are                 • The ReLU (Rectified Linear Unit) activation function is
16,533 positive samples in the Neoplasm subhierarchy. We                  used in every convolution layer, because it has proven
randomly picked 2,000 positive samples for testing. The                   successful in recent research projects. This corresponds
remaining 14,533 (=16,533-2,000) samples are used in a ratio              to a “rectifier function” from electrical engineering,
of 80% for training and 20% for validation. The positive samples          to a “rectifier function” from electrical engineering,
are treated equally for both models.                                      half-wave pass through one-to-one.
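
To make the structural details of the CNN model concrete, the following sketch traces the feature-map shape through the four convolution and max pooling blocks and defines ReLU. It relies on assumptions that the text does not confirm: the two 128-dimensional input vectors are treated as one sequence of length 256, every convolution layer (not only the first) uses kernel size 1 without padding, and pooling windows do not overlap. All function names are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit: block negative values, pass positive ones."""
    return max(0.0, x)


def conv_output_length(length, kernel_size=1, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel_size) // stride + 1


def pool_output_length(length, pool_size=2):
    """Output length of non-overlapping 1-D max pooling."""
    return length // pool_size


def layer_shapes(input_length=256, first_filters=18, n_blocks=4):
    """Return (length, filters) after each convolution + max pooling block."""
    shapes = []
    length, filters = input_length, first_filters
    for _ in range(n_blocks):
        length = conv_output_length(length)   # kernel size 1, stride 1
        length = pool_output_length(length)   # pooling size 2
        shapes.append((length, filters))
        filters *= 2                          # the number of filters doubles
    return shapes


shapes = layer_shapes()
flattened_size = shapes[-1][0] * shapes[-1][1]
```

Under these assumptions the four blocks produce 18, 36, 72 and 144 feature maps, and each pooling step halves the sequence length.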

    For the restricted model, there are 10,574 potential pairs      D. Test against Reviewed Data
where both concepts of each pair are from the same area. We
                                                                        Traditionally, machine learning models are tested with k-
randomly picked 2,000 for testing. As noted above, there are
                                                                    fold cross validation. Thus, all known data is partitioned into k
37,147 “uncle – nephew” negative sample pairs in the
                                                                    folds, the model is trained with k-1 folds and tested with the
hierarchy. However, for the unrestricted model, we use the
                                                                    remaining fold. This process is repeated k times, with resulting
same 2,000 negative test pairs used for the restricted model. This is
                                                                    precision, recall and F1 values averaged. We have augmented
done to enable performance comparison between the two
                                                                    this testing by human expert quality assurance results.
models. Similar to the way we handle the positive samples, the
remaining 8,574 (=10,574-2,000) samples for the restricted              In a previous study [9], domain experts reviewed 190
model and the remaining 35,147 (=37,147-2,000) for the              concepts from the Neoplasm subhierarchy and reported 18
unrestricted model are divided in an 80%/20% ratio for              missing parent errors. This data was used as ground truth in
training and validation, respectively.                              this study to check the sensitivity/recall of our model’s
                                                                    performance.
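
The sample bookkeeping of this section (hold out 2,000 test pairs, split the remainder 80/20 into training and validation sets, and resample the negative samples to the number of positive samples) can be sketched as follows. The helper names, the fixed seeds, and the rounding of the 80% share are our own assumptions, and the text does not state whether resampling is applied before or after the split, so the two helpers are kept independent.

```python
import random


def holdout_split(samples, n_test=2000, train_fraction=0.8, seed=0):
    """Shuffle, hold out n_test samples, split the rest into train/validation."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_train = round(train_fraction * len(rest))
    return rest[:n_train], rest[n_train:], test


def resample(samples, target_size, seed=0):
    """Down-sample without replacement, or up-sample by drawing duplicates."""
    rng = random.Random(seed)
    samples = list(samples)
    if target_size <= len(samples):
        return rng.sample(samples, target_size)
    extra = rng.choices(samples, k=target_size - len(samples))
    return samples + extra


# Integer ids stand in for the 16,533 positive concept pairs.
train_pos, val_pos, test_pos = holdout_split(range(16533))

# Negative pools after removing the 2,000 test pairs, balanced to 14,533.
unrestricted_neg = resample(range(35147), 14533)   # down-sampled
restricted_neg = resample(range(8574), 14533)      # up-sampled
```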
   In addition, we down-sampled the negative samples for the
unrestricted model and up-sampled for the restricted model to
14,533 samples, in order to balance the number of samples for                                 IV.    RESULTS
both categories, as customary in Machine Learning.                      We report our CNN model’s performance in the following
                                                                    three aspects:
C. CNN Model
    We trained a CNN model with 4 convolution layers on top
of vectors derived from the Neoplasm subhierarchy of NCIt via
                                                     Fig. 4. ROC curve of the two models

A. Testing recall and AUC                                                of the processed concept. Both ideas save processing time
    The testing recall is 0.75 and 0.81 for the unrestricted and         while improving the accuracy, by training on unconnected pairs of
restricted models, respectively. Fig. 4 (a) and (b) show the             concepts similar to the original IS-A links of the processed
Receiver Operating Characteristic (ROC) curves of the testing            concept. The uncles are similarly positioned as siblings of the
performance for the unrestricted and restricted models,                  parent of the processed concept. The uncles within the area
respectively. The AUC (area under the curve) scores, as the              share the same roles as the processed concept.
measure of test accuracy, are 0.84 and 0.90, respectively.                   Table II compares the performance of the two models. As
                                                                         can be expected, the restricted model utilizing training with
B. Confirmed errors found by domain experts                              similar concepts achieves higher performance. We evaluated
    The restricted model detected 10 out of the 18 errors that           the results of the two ML models based on a list of errors that
domain experts found [9], while the unrestricted model                   domain experts found in a previous study [9]. Such an
detected only five errors, all contained in the above 10 errors.         evaluation is usually not available for ML studies. An
Table I shows two missing parent examples confirmed by both              interesting observation is that the set of five errors confirmed
models, and two examples confirmed only by the restricted                by the unrestricted model is a subset of the 10 errors found by
model.                                                                   the restricted model. The reason for this may be that the two
                                                                         models use the same negative test pairs where the uncles are
C. Training time efficiency                                              from the same area as the nephew. This choice was made in
     Each model is trained for 2000 epochs, with batch size =            order to be able to compare performance of the two models,
2000. We recorded the duration of the training. With the same            but tends to unnecessarily limit the unrestricted model to finding
computer hardware configuration, training the unrestricted and           error pairs of similar concepts. In the future, we will perform
restricted models took 1116 and 1110 seconds, respectively.              experiments where the unrestricted and restricted models will
                                                                         use disjoint negative training data in an effort to optimize the
                                                                         results of each model rather than to compare them.

TABLE I.       MISSING PARENT ERRORS CONFIRMED BY THE TWO MODELS

    Child                              Model        Missing Parent
    Reproductive Endocrine Neoplasm    Both         Endocrine Neoplasm
    Basophilic Adenocarcinoma          Both         Anterior Pituitary Gland Neoplasm
    Breast Tubular Adenoma             Restricted   Tubular Adenoma
    Cutaneous Glomangioma              Restricted   Benign Skin Neoplasm

                         V.    DISCUSSION

TABLE II.       TWO MODELS PERFORMANCE COMPARISON

                                        Unrestricted Model   Restricted Model   Difference
    Confirmed Errors                    5                    10                 5
    Corresponding Recall (out of 18)    0.28                 0.56               0.28
    Testing Recall                      0.75                 0.81               0.06
    Testing AUC                         0.84                 0.90               0.06
    Training Time (sec)                 1116                 1110               6
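
The recall rows of Table II follow from simple ratios, which the short sketch below reproduces. The variable names are ours; the counts and testing recalls are those reported in Section IV. It also shows that the 0.06 gap in testing recall is an absolute difference, which corresponds to a relative improvement of about 8%.

```python
def recall(true_positives, total_positives):
    """Fraction of the known positives that the model detects."""
    return true_positives / total_positives


known_errors = 18
detected = {"unrestricted": 5, "restricted": 10}
known_error_recall = {
    model: round(recall(count, known_errors), 2)
    for model, count in detected.items()
}

testing_recall = {"unrestricted": 0.75, "restricted": 0.81}
absolute_gain = round(
    testing_recall["restricted"] - testing_recall["unrestricted"], 2
)
relative_gain = round(absolute_gain / testing_recall["unrestricted"], 2)
```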

    To the best of our knowledge, this is the first published                From Table II we can see that the restricted model performs
attempt to use Machine Learning (ML) for QA of ontologies.               0.06 (6 percentage points) better than the unrestricted model in testing recall. The area under
Such a technique can prepare a subset of pairs of concepts as            the curve in Fig. 4(b) is 0.06 larger than in Fig. 4(a), reflecting a
candidates for missing IS-A links (omission errors), optimizing          better classification by the restricted model than by the unrestricted
the use of scarce QA resources.                                          model. The document vectors used in this study were derived
    In this paper, we discussed two ML approaches. The                   using the Distributed Memory version of Paragraph Vector
unrestricted model utilizes all uncle concepts of the processed          (PV-DM) that works well for most tasks, as stated in the
concept. The restricted model is further taking advantage of the         original paper [13]. However, it is also recommended in the
area taxonomy of the ontology to utilize only uncles in the area         paper to combine Paragraph Vector with Distributed Bag of
Words (PV-DBOW) to obtain more consistent results. The more accurate the
vector representations of concepts are, the better the recall that should be
expected. This is left for future work.

    The recall obtained is not high enough for reliable QA for missing IS-A
links. For example, out of a random subset of 20 suggested errors, a domain
expert (GE) confirmed only one error pair, Hair Follicle Neoplasm missing the
parent Dermal Neoplasm, since hair follicles reside in the dermal layer of the
skin. In the NCIt, Dermal Neoplasm has 13 children, representing a mix based
on cell origin as well as malignancy status. Currently, Hair Follicle Neoplasm
has Skin Appendage Neoplasm as a parent. Dermal Neoplasm and Skin Appendage
Neoplasm are siblings. However, Hair Follicle Neoplasm has only a very
indirect relationship to the dermis, through the Disease Has Primary Anatomic
Site role with Hair Follicle as the target value and the Anatomic Structure Is
Physical Part Of role of Hair Follicle with the target value of Dermis. Adding
Dermal Neoplasm as a parent, or even possibly replacing Skin Appendage
Neoplasm with it, might be better.

    A higher recall will imply fewer suggested errors with a higher percentage
of errors confirmed by domain experts. In future research, we will further
explore the properties of the two models, as well as properties of ML
processing, in an effort to fine-tune and utilize both models and so increase
the recall.

                         VI.      LIMITATIONS

    We presented a supervised training technique; therefore, “annotated
corpora” are required for its applicability beyond finding missing IS-A
relationships in the NCI neoplasm taxonomy. The results are also limited,
because the models are not trained on more general data for more general
problems.

                        VII.      CONCLUSION

    We explored whether ML methods can be applied to the task of ontological
QA, in particular whether ML can help in detecting missing IS-A links in an
ontology. Two models were presented. Our application of ML to the Neoplasm
subhierarchy of NCIt demonstrated that the restricted model performs better
than the unrestricted one. However, the performance of the restricted model is
not yet sufficient for QA. In future research, we will explore improvements in
ML processing and more accurate restrictions for concepts in the training
stage to improve the performance of QA of ontologies.

                         ACKNOWLEDGMENT

    Research reported in this publication was partially supported by the
National Cancer Institute of the National Institutes of Health under Award
Number R01CA190779. The content is solely the responsibility of the authors
and does not necessarily represent the official views of the National
Institutes of Health.

                               REFERENCES
[1]  S. de Coronado, M. W. Haber, N. Sioutos, M. S. Tuttle, and L. W.
     Wright, "NCI Thesaurus: using science-based terminology to integrate
     cancer research results," Stud Health Technol Inform, vol. 107, no. Pt 1,
     pp. 33-7, 2004.
[2]  G. Elhanan, Y. Perl, and J. Geller, "A survey of SNOMED CT direct
     users, 2010: impressions and preferences regarding content and quality,"
     J Am Med Inform Assoc, vol. 18 Suppl 1, pp. i36-44, 2011.
[3]  H. Min, Y. Perl, Y. Chen, M. Halper, J. Geller, and Y. Wang, "Auditing
     as Part of the Terminology Design Life Cycle," J Am Med Inform Assoc,
     vol. 13, no. 6, pp. 676-690, Nov-Dec 2006.
[4]  X. Zhu, J. W. Fan, D. M. Baorto, C. Weng, and J. J. Cimino, "A review
     of auditing methods applied to the content of controlled biomedical
     terminologies," J Biomed Inform, vol. 42, no. 3, pp. 413-25, Jun 2009.
[5]  J. Geller, Y. Perl, M. Halper, and R. Cornet, "Special issue on auditing
     of terminologies," J Biomed Inform, vol. 42, no. 3, pp. 407-11, 2009.
[6]  M. Arguello Casteleiro et al., "A case study on sepsis using PubMed and
     Deep Learning for ontology learning," Informatics for Health:
     Connected Citizen-Led Wellness and Population Health, vol. 235, pp.
     516-520, 2017.
[7]  J. A. Minarro-Giménez, O. Marin-Alonso, and M. Samwald, "Exploring
     the application of deep learning techniques on medical text corpora,"
     Stud Health Technol Inform, vol. 205, pp. 584-588, 2014.
[8]  İ. Pembeci, "Using Word Embeddings for Ontology Enrichment,"
     International Journal of Intelligent Systems and Applications in
     Engineering, vol. 4, no. 3, pp. 49-56, 2016.
[9]  L. Zheng, H. Min, Y. Chen, J. Xu, J. Geller, and Y. Perl, "Auditing
     National Cancer Institute thesaurus neoplasm concepts in groups of high
     error concentration," Appl Ontol, vol. 12, no. 2, pp. 113-130, 2017.
[10] M. Halper, H. Gu, Y. Perl, and C. Ochs, "Abstraction networks for
     terminologies: Supporting management of "big knowledge"," Artif Intell
     Med, vol. 64, no. 1, pp. 1-16, May 2015.
[11] Y. Wang, M. Halper, D. Wei, Y. Perl, and J. Geller, "Abstraction of
     complex concepts with a refined partial-area taxonomy of SNOMED," J
     Biomed Inform, vol. 45, no. 1, pp. 15-29, Feb 2012.
[12] Y. Wang et al., "Auditing Complex Concepts of SNOMED using a
     Refined Hierarchical Abstraction Network," J Biomed Inform, vol. 45,
     no. 1, pp. 1-14, 2012.
[13] Q. V. Le and T. Mikolov, "Distributed Representations of Sentences and
     Documents," CoRR, vol. abs/1405.4053, 2014.
[14] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of
     word representations in vector space," arXiv:1301.3781, 2013.
[15] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil, "Learning semantic
     representations using convolutional neural networks for web search," in
     Proc. of the 23rd Int. Conf. World Wide Web, 2014, pp. 373-374: ACM.
[16] W.-t. Yih, X. He, and C. Meek, "Semantic parsing for single-relation
     question answering," in Proceedings of the 52nd Annual Meeting of the
     Association for Computational Linguistics (Volume 2: Short Papers),
     2014, vol. 2, pp. 643-648.
[17] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, "A convolutional
     neural network for modelling sentences," arXiv preprint
     arXiv:1404.2188, 2014.
[18] Y. Kim, "Convolutional neural networks for sentence classification,"
     arXiv preprint arXiv:1408.5882, 2014.
[19] Y. Lin, S. Shen, Z. Liu, H. Luan, and M. Sun, "Neural relation
     extraction with selective attention over instances," in Proceedings of the
     54th Annual Meeting of the Association for Computational Linguistics
     (Volume 1: Long Papers), 2016, vol. 1, pp. 2124-2133.
[20] Y. Wang, M. Halper, H. Min, Y. Perl, Y. Chen, and K. A. Spackman,
     "Structural methodologies for auditing SNOMED," J Biomed Inform,
     vol. 40, no. 5, pp. 561-581, 2007.
[21] H. Min, L. Zheng, Y. Perl, M. Halper, S. de Coronado, and C. Ochs,
     "Relating Complexity and Error Rates of Ontology Concepts," Methods
     of Information in Medicine, vol. 56, no. 03, pp. 200-208, 2017.
[22] C. Ochs, J. Geller, Y. Perl, and M. A. Musen, "A unified software
     framework for deriving, visualizing, and exploring abstraction networks
     for ontologies," J Biomed Inform, vol. 62, pp. 90-105, 2016.
[23] R. Rehurek and P. Sojka, "Software framework for topic modelling with
     large corpora," in Proceedings of the LREC 2010 Workshop on New
     Challenges for NLP Frameworks, 2010: Citeseer.
[24] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization,"
     arXiv preprint arXiv:1412.6980, 2014.