<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>EmEL++: Embeddings for EL++ Description Logic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sutapa Mondal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumit Bhatia</string-name>
          <email>sumitbhatia@in.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raghava Mutharaju</string-name>
          <email>raghava.mutharaju@iiitd.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>In A. Martin, K. Hinkelmann</institution>
          ,
          <addr-line>H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.)</addr-line>
          ,
          <institution>Proceedings of the AAAI 2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021) - Stanford University</institution>
          ,
          <addr-line>Palo Alto, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge graph (KG) embedding models have recently gained increased attention. However, most of the existing models for KG embeddings ignore the structure and characteristics of the underlying ontology. In this work, we present EmEL++, an ontology-based embedding model for the EL++ description logic. EmEL++ maps the classes and the relations in an ontology to an n-dimensional vector space such that the relationships between the classes and relations in the ontology are preserved in the vector space. We evaluate the proposed embeddings on four different datasets and show that they outperform traditional knowledge graph embeddings on the subsumption reasoning task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The embeddings produced by such methods are not suited for reasoning tasks such as
classification, satisfiability and consistency checking. Kulmanov et al. [7] have recently proposed EL
Embeddings (ElEm) and have described a geometric interpretation of embedding EL++
ontologies in an n-dimensional vector space (Section 3). However, the ElEm embeddings are limited
in terms of their coverage of EL++ constructs, as they ignore the role-oriented constructs in such
ontologies. Further, the use case they consider is predicting protein-protein interactions,
which is modeled as a traditional link prediction task in knowledge bases. In this work, we
address the limitations of ElEm embeddings and propose EmEL++ embeddings. We build upon
the framework introduced by Kulmanov et al. [7] and extend it to offer more complete
coverage of the EL++ semantics (Section 3.1) for performing the subsumption reasoning task. Baader
et al. [8] have shown that all the standard reasoning tasks in EL++ ontologies can be reduced
to the subsumption task. Thus, to the best of our knowledge, we present the first attempt at
performing reasoning tasks by embedding ontologies in a vector space. We compare our proposed
approach using four ontologies of different sizes and characteristics. Our evaluation shows that
EmEL++ outperforms the traditional KG embeddings at the reasoning task and is also able to
better preserve the characteristics of the underlying ontologies. We would also like to emphasize
that performing reasoning in the vector space is critical, as it has the potential to speed up the
reasoning process significantly. As we describe in Section 4, the subsumption task in vector
space involves computing distances between the source class and all the other classes in the
ontology. In the worst case, this is an O(N) operation, where N is the number of classes. Thus,
irrespective of the complexity of the underlying ontology, the subsumption task can be
performed in O(N) time. Further, with techniques such as semantic hashing or binarized
embeddings [9], the similarity-based search operations can be performed in O(1) time.
Therefore, we believe that embedding-based approaches, despite their lower accuracies than standard
reasoners and lack of theoretical performance guarantees, offer a promising direction of future
research for developing more efficient reasoners, especially for more complex description logics
such as SROIQ (OWL 2 DL).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        A wide range of methods for computing KG embeddings have been proposed. Node2Vec [10]
introduced the idea of learning features for networks, addressing the scalability challenge.
Although its results are decent for the link prediction task, it makes assumptions of
conditional independence in the feature space that may not hold in real-world scenarios.
Eventually, this concept became popular for KGs, wherein a fact is represented as a triple of
the form (h, r, t). Semantic matching KG models exploit similarity-based scoring functions and
match the latent semantics of entities and relations based on their vector representations. For
example, [4] proposed RESCAL, which uses multiple matrices to represent relations among entities,
but scalability remains an issue. [3] proposed DistMult to overcome the challenges of RESCAL.
DistMult is similar to RESCAL, but it ensures a low number of parameters for
relations by restricting the relation matrices to be diagonal. Later, translation-based models for KG embeddings used
distance-based scoring functions. This family of techniques gained the most attention due to its
simple way of measuring the correctness of a fact: the plausibility of a fact is measured as the distance between the
entities after a translation carried out by the relation. TransE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], TransH [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and TransR [6] are
different variants of this idea. TransE, the most representative model, treats relations
as translations such that, given a fact, the relation vector r minimizes the distance between h and t
in the vector space. TransH interprets a relation as a translation operator on a hyperplane,
whereas TransR models entities and relations in two distinct spaces and performs the translation in the
corresponding relation space. [11] proposed a novel approach inspired by the theory of quantum
logic to embed a knowledge base (KB) expressed in description logic. Existing works on ontology
embedding such as Onto2Vec [12] focused on using word2vec as the underlying model. Most
of this work focuses on encoding the entities and relations, but it falls short in handling the
complex relations in an ontology. Recently, [13] pointed out the lack of expressivity of the classical
approaches to modeling relations for KG embeddings; moreover, their work indicates that
geometric models are a better way to learn embeddings for ontologies. Further, [14] present a new
approach using deep learning with knowledge-based systems to emulate the reasoning structure.
They perform experiments on one synthetic and one non-synthetic dataset, whereas in our
work we look at multiple datasets with different characteristics and reason about their varying
performances. Finally, [7] tries to overcome the drawbacks of KG embeddings, but it does not
address all the EL++ constructs that are relevant to capture the relations present in an ontology;
further, its evaluation is focused only on the link prediction task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <sec id="sec-3-1">
        <title>3.1. Embedding  ++ Ontologies in a Vector Space</title>
        <p>A description logic signature is a tuple ⟨N_C, N_R, N_I⟩, where N_C, N_R, and N_I are infinite, mutually disjoint sets of concept names, role names, and individual names, respectively.</p>
        <sec id="sec-3-1-1">
          <title>Normal Forms</title>
          <p>In the following discussion, {A, B, C, C1, C2, D, ⊤, ⊥} ∈ N_C, {r, s, r1, r2} ∈ N_R, and {a, b} ∈ N_I.</p>
          <p>All the axioms in the EL++ description logic can be reduced to one of the normal forms [15]
as follows:
1. All the TBox axioms are in one of the four normal forms (C1 ⊑ D, C1 ⊓ C2 ⊑ D, ∃r.C1 ⊑ D, and C1 ⊑ ∃r.C2).
2. The bottom concept can only appear on the right side of the axioms, and can only
appear in the first three normal forms.
3. All role inclusions are of the form r ⊑ s or r1 ∘ r2 ⊑ s.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Normalization of ABox Axioms</title>
          <p>Further, the instantiation and role assertion axioms in the ABox can be converted into TBox
axioms as follows:</p>
          <p>C(a) ⟶ {a} ⊑ C
r(a, b) ⟶ {a} ⊑ ∃r.{b}</p>
          <p>Thus, with the above transformations, all the axioms in an EL++ ontology can be reduced to
one of the normalized forms, and the task of embedding ontologies in a vector space requires
us to learn mapping functions for classes and relations that are part of the normal forms.</p>
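<p>The ABox-to-TBox rewriting above can be sketched in a few lines; the following is our own illustrative code, where axioms are represented as plain tuples and all names are hypothetical, not the paper's actual data structures.</p>

```python
def abox_to_tbox(assertion):
    """Rewrite an ABox assertion into a TBox axiom over nominals:
    C(a) becomes {a} ⊑ C, and r(a, b) becomes {a} ⊑ ∃r.{b}."""
    if assertion[0] == "instance":            # C(a)  ->  {a} ⊑ C
        _, concept, ind = assertion
        return ("subclass", "{%s}" % ind, concept)
    if assertion[0] == "role":                # r(a,b) -> {a} ⊑ ∃r.{b}
        _, role, a, b = assertion
        return ("subclass_exists", "{%s}" % a, role, "{%s}" % b)
    raise ValueError("unknown assertion type")
```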
          <p>
            Typically, mapping the entities of interest to a vector space requires learning a mapping
function subject to certain constraints, encoded in the form of an objective function that is
optimized during the training phase. These objective functions are designed such that
specific properties of the underlying entities are retained in the vector space. For example,
the word2vec [16] model for word embeddings minimizes the distance between contextually
similar words, RDF2Vec [17] adapts the language modeling approach to capture local information
from graph sub-structures, and the TransE [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] model for KGs interprets relations as translations between entity vectors. Similarly, in this work we learn
mapping functions that can embed EL++ ontologies in a vector space. In order to do so, we
build upon and extend the framework proposed by Kulmanov et al. [7] that interprets a class in
the ontology as an n-ball (defined by its radius and center) in the vector space. Let us consider
two classes C and D such that C ⊑ D. Let these two classes be represented by their respective
n-balls B_C and B_D in the vector space such that B_C : {c⃗, r_C} and B_D : {d⃗, r_D}, where c⃗ and d⃗
are the centers and r_C and r_D are the radii of the respective n-balls. Geometrically, if C ⊑ D,
the mapping function should aim to ensure that B_C lies inside B_D (Figure 1(a)). Similarly, if
C and D are disjoint, the respective n-balls should not overlap with each other in the vector
space (Figure 1(b)). Further, similar to the TransE model [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], the relations in the ontology are
interpreted as translations operating on the classes. More specifically, if C ⊑ ∃r.D, the center
of the n-ball representing C can be moved to the center of the n-ball representing D (Figure 1(c)).
          </p>
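<p>The geometric conditions above reduce to simple vector tests; a minimal sketch follows (the function names are ours, not the paper's).</p>

```python
import numpy as np

def balls_disjoint(c1, r1, c2, r2):
    """Disjoint classes map to non-overlapping n-balls: the distance
    between the centers must be at least the sum of the radii."""
    return np.linalg.norm(np.subtract(c1, c2)) >= r1 + r2

def translate(center, relation_vec):
    """A relation acts as a translation: for C ⊑ ∃r.D, adding the
    relation vector to the center of B_C should land at the center of B_D."""
    return np.add(center, relation_vec)
```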
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Loss Functions</title>
        <p>With the intuitive framework described above, let us now describe the objective functions
that should be optimized during the training phase to learn the mapping functions. Let f_η :
C ∪ R ⟼ R^n be the mapping function that maps each class and relation to a unique vector
in the n-dimensional embedding space. For C ∈ C, the resulting vector corresponds to the
centre of the n-ball representing the class. Further, let r_η : C ⟼ R^+ be the mapping function
that maps each class to a non-negative real number that represents the radius of the n-ball
corresponding to class C. Thus, the pair (f_η, r_η) of functions represents the operations
needed to embed an EL++ ontology into an n-dimensional space. We now describe the various
loss functions that represent the different constructs in EL++. The total loss that needs to be
minimized during the learning process is the sum of the individual loss functions.
3.2.1. Loss Functions for the Four Normal Forms:
As described before, the first normal form (C ⊑ D), when embedded in a vector space, can
be interpreted geometrically as two n-balls such that the n-ball corresponding to class C lies
inside the n-ball corresponding to D. Hence, our mapping functions f_η and r_η should bring the
centers of the two classes closer to each other, and give the sub-class a smaller radius than the
super-class. The loss function presented in Equation 1 captures this intuition and penalizes
the mappings that do not adhere to these constraints. Also note that in addition to the above
constraints, we also add a margin loss (γ) and a normalization loss that brings the centres of the
n-balls of all the classes onto the unit sphere.</p>
        <p>loss_{C⊑D}(c, d) = max(0, ‖f_η(c) − f_η(d)‖ + r_η(c) − r_η(d) − γ) + | ‖f_η(c)‖ − 1 | + | ‖f_η(d)‖ − 1 |   (1)

Here, the first term penalizes mappings in which the centers are far away from each other or the radius of the sub-class is larger than that of the super-class.
In the vector space, the second normal form, i.e., C ⊓ D ⊑ E, implies that the n-ball for class
E should completely engulf the area of intersection of the n-balls for classes C and D. The first
term in the loss function (Equation 2) imposes a penalty if the classes C and D are disjoint. The
second and third terms together enforce that the center of the n-ball for class E lies in the area
of intersection of the n-balls for classes C and D such that it satisfies the normal form. Finally, the
fourth term requires the radius of the n-ball of E to be greater than the smallest radius among the
n-balls of C and D.

loss_{C⊓D⊑E}(c, d, e) = max(0, ‖f_η(c) − f_η(d)‖ − r_η(c) − r_η(d) − γ) + max(0, ‖f_η(c) − f_η(e)‖ − r_η(c) − γ)
+ max(0, ‖f_η(d) − f_η(e)‖ − r_η(d) − γ) + max(0, min(r_η(c), r_η(d)) − r_η(e) − γ)
+ | ‖f_η(c)‖ − 1 | + | ‖f_η(d)‖ − 1 | + | ‖f_η(e)‖ − 1 |   (2)</p>
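<p>The loss in Equation 1 can be written down directly; the following is a small NumPy sketch of our own (illustrative, not the paper's TensorFlow implementation), with γ as the margin parameter.</p>

```python
import numpy as np

def nf1_loss(c, d, rc, rd, gamma=0.0):
    """Loss for C ⊑ D (Equation 1): penalize centers that are far apart
    and a sub-class radius larger than the super-class radius, plus a
    regularization term pushing both centers onto the unit sphere."""
    c, d = np.asarray(c, float), np.asarray(d, float)
    hinge = max(0.0, np.linalg.norm(c - d) + rc - rd - gamma)
    reg = abs(np.linalg.norm(c) - 1.0) + abs(np.linalg.norm(d) - 1.0)
    return hinge + reg
```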
        <p>
          The first two normal forms are concerned with the mappings of classes and the properties of
their respective n-balls in the vector space. The next two normal forms, i.e., NF3 and NF4,
involve relations and how they are associated with the classes. Recall that, similar to TransE [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ],
we consider the relations in the ontology as translations that operate on class instances. Consider
the normal form C ⊑ ∃r.D. In the vector space, C and D are represented as two n-balls B_C and
B_D, respectively. If f_η(r) is the vector for r in the vector space, then adding f_η(r) to a point in B_C
should move it to a point in B_D (i.e., r translates the points in B_C to points in B_D). The following
loss functions capture these semantics as expressed by the third and fourth normal forms.

loss_{C⊑∃r.D}(c, d, r) = max(0, ‖f_η(c) + f_η(r) − f_η(d)‖ + r_η(c) − r_η(d) − γ) + | ‖f_η(c)‖ − 1 | + | ‖f_η(d)‖ − 1 |   (3)

loss_{∃r.C⊑D}(c, d, r) = max(0, ‖f_η(c) − f_η(r) − f_η(d)‖ − r_η(c) − r_η(d) − γ) + | ‖f_η(c)‖ − 1 | + | ‖f_η(d)‖ − 1 |   (4)
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.2.2. Handling Bottom Concept (⊥):</title>
      <p>Recall from the discussion in Section 3 that the bottom concept can appear only on the right-hand
side of the first three normal forms [8]. We now present the loss functions for each of the
three special cases. The first normal form with the bottom concept, C ⊑ ⊥, indicates that class C is unsatisfiable.
Thus, in the vector space, we represent this constraint by reducing the radius of class C to zero.
This is achieved by the following loss function.</p>
      <p>loss_{C⊑⊥}(c) = r_η(c)   (5)</p>
      <p>Next, the second normal form with the bottom concept is C ⊓ D ⊑ ⊥, indicating that C and
D are disjoint. In the vector space, this means that the n-balls of classes C and D are
non-overlapping. This is captured by the following loss function.</p>
      <p>⊓⊑⊥
(,  ) = max (0, (  ( ) +   ( ) −‖‖  ( ) −   ( )‖‖ +  )) + |‖  ( )‖‖ − 1 | +|‖‖  ( )‖‖ − 1 ||| (6)
|‖ | |
| | |</p>
      <p>Finally, the third normal form ∃r.C ⊑ ⊥ indicates that, in the vector space, translating C by r
results in an unsatisfiable class. We already require the radius of unsatisfiable classes to be zero
(Equation 5) and, since translation does not change the radius of the original class, we have the
following loss function.</p>
      <p>∃.⊑⊥</p>
      <p>(,  ) =   ( )
3.2.3. Loss Functions for Role Inclusions and Role Chains:
The role vectors in our proposed framework serve the purpose of translating one class to
another class. The constraints considered until now have imposed restrictions on the role vectors
based on their relations with the  -balls of the concerned classes. We now present two loss
functions to capture the constraints imposed by role inclusions and role chains in the
ontology. The role inclusion of  ⊑  implies that the vectors   ( ) and   ( ) in the vector space
should be nearby because any translation produced by  should also be producible by  plus
both the vectors should be in the same direction. This intuition is captured by the following
loss function represented by Equation 8. Herein, the first term is indicative of the distance that
ensures the vectors   ( ) and   ( ) lie in near vicinity of each other. The second term captures
the directional aspect of roles in vector space such that they tend to be in same direction.</p>
      <p>⊑ (,  ) = max (0,‖‖  ( ) −   ( )‖‖ −  ) + ||||| 1 − ‖‖    (( ))‖‖.‖‖    ((  ))‖‖ ||||| + |||‖‖  ( )‖‖ − 1 ||| +|||‖‖  ( )‖‖ − 1 ||| (8)
Next, we consider the hierarchy defined by the role chain  1◦ 2 ⊑  . In the vector space, this
implies that if class  can be translated to class  by successive application of  1 and  2, it can
(5)
(7)
also be translated to  directly by the vector for role  while preserving the direction of role
vectors. The following loss function captures this behavior represented by Equation 9.
  ⋢∃. (, ,  ) = max (0,   ( ) +   ( ) −‖‖  ( ) +   ( ) −   ( )‖‖ +  ) + |||‖‖  ( )‖‖ − 1 ||| +|||‖‖  ( )‖‖ − (11|||0)
Thus, the total loss for learning the embedding function is the sum of all the loss functions
given by Equations 1- 10. Further, we also add the constraint that radius of the satisfiable
classes are non-negative and penalize the total loss for learning negative radius for classes.</p>
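<p>A role inclusion loss of the kind in Equation 8 combines a distance term with a cosine-direction term; a minimal NumPy sketch of our own follows (illustrative only).</p>

```python
import numpy as np

def role_inclusion_loss(r_vec, s_vec, gamma=0.0):
    """Loss for a role inclusion r ⊑ s: penalize role vectors that are
    far apart (distance term) or point in different directions (cosine
    term), plus unit-norm regularization on both vectors."""
    r_vec, s_vec = np.asarray(r_vec, float), np.asarray(s_vec, float)
    dist = np.linalg.norm(r_vec - s_vec)
    cos = np.dot(r_vec, s_vec) / (np.linalg.norm(r_vec) * np.linalg.norm(s_vec))
    return (max(0.0, dist - gamma)
            + abs(1.0 - cos)
            + abs(np.linalg.norm(r_vec) - 1.0)
            + abs(np.linalg.norm(s_vec) - 1.0))
```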
      <sec id="sec-4-1">
        <title>3.3. Training and Implementation</title>
        <p>Given an EL++ ontology, we first normalize the ontology to generate the normal forms. These
normal forms then constitute a set of TBox statements, wherein each axiom is treated as a
positive sample. This normalization is performed using the OWL API and the APIs provided
by the jCel reasoner, which implements the normalization rules [18]. We then introduce
negative samples using the third normal form. We randomly generate corrupted axioms of the form
C ⊑ ∃r.D by replacing C or D with C′ or D′ such that neither C′ ⊑ ∃r.D nor C ⊑ ∃r.D′
is an asserted axiom in the ontology. The training process thus learns
ontology embeddings such that the asserted facts hold true.</p>
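<p>The corruption procedure can be sketched as follows; this is our own illustration, in which axioms are represented as (C, r, D) tuples and all names are hypothetical.</p>

```python
import random

def corrupt_axiom(axiom, classes, asserted, rng=random):
    """Negative sampling for an axiom C ⊑ ∃r.D: replace C or D with a
    random class so that the corrupted axiom is not asserted."""
    c, r, d = axiom
    while True:
        new = rng.choice(classes)
        # corrupt either the sub-class side or the filler side
        candidate = (new, r, d) if rng.random() < 0.5 else (c, r, new)
        if candidate not in asserted:
            return candidate
```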
        <p>The code for training the embeddings and performing the optimization is implemented in Python using the
TensorFlow library, and the Adam optimizer [19] is used for updating the embeddings. We start
the learning process by initializing the embedding vectors for classes and relations with random
values. We process the training samples in mini-batches for each of the losses defined for the
normal forms, along with the losses for roles, and update the embeddings based on the
total loss, i.e., the sum of all the loss functions. The update process is carried out until saturation or for a
fixed number of epochs.</p>
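<p>As a toy illustration of this training loop, the sketch below optimizes only NF1 axioms with a hand-derived subgradient and plain NumPy; the paper's actual implementation uses TensorFlow with Adam over all the losses, so treat this purely as a sketch under those simplifying assumptions.</p>

```python
import numpy as np

def train_nf1(pairs, n_classes, dim=2, epochs=200, lr=0.05, gamma=0.0, seed=0):
    """Toy mini-batch-of-one training loop for NF1 axioms C ⊑ D.
    Centers and radii are updated by a manual subgradient of
    max(0, ||c_C - c_D|| + r_C - r_D - gamma)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_classes, dim))
    radii = np.full(n_classes, 0.5)
    for _ in range(epochs):
        for c, d in pairs:                              # each pair encodes C ⊑ D
            diff = centers[c] - centers[d]
            dist = np.linalg.norm(diff) + 1e-12
            if dist + radii[c] - radii[d] - gamma > 0:  # hinge is active
                grad = diff / dist
                centers[c] -= lr * grad                 # pull centers together
                centers[d] += lr * grad
                radii[c] = max(radii[c] - lr, 0.0)      # shrink sub-class ball
                radii[d] += lr                          # grow super-class ball
    return centers, radii
```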
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments and Results</title>
      <sec id="sec-5-1">
        <title>4.1. Datasets</title>
        <p>We use the following four commonly used and publicly available ontologies of varying sizes and
different characteristics.</p>
        <p>1. SNOMED CT [20] is one of the most comprehensive ontologies of clinical terms, with
more than 989186 TBox statements involving 307712 classes and 60 relations.
2. Anatomy [21] is an ontology that captures linkages of different phenotypes to genes. It
consists of 278883 TBox statements involving 106495 classes and 218 relations.
3. Gene Ontology (GO) [22] unifies the representation of genes across all species. It consists
of 130094 TBox statements, 45907 classes and 16 relations.
4. GALEN [23] also represents clinical information. It consists of 84537 TBox statements
with 24353 classes and 1010 relations.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Baselines</title>
        <p>We consider the following commonly used knowledge graph embedding models for comparison with
our proposed EmEL++ embeddings.</p>
        <p>
          1. TransE [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], one of the most frequently used embedding models for knowledge graphs,
introduced the idea of translation-based embeddings, where the relation between two entities
is interpreted as a translation operation between them.
2. TransH [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is an extension of TransE that better handles reflexive, one-to-many,
many-to-one, and many-to-many relations. TransH considers relations as hyperplanes in the
embedding space; the translation operation is then performed over the projections of the
entities on the hyperplane.
3. DistMult [3], a matrix factorization based embedding model, has been found empirically
to perform well at compositional reasoning tasks.
4. EL Embeddings (ElEm) [7] is one of the first embedding models for the EL++
description logic. Our proposed model is an extension of ElEm
and enhances it by introducing additional constraints for a more comprehensive
coverage of the EL++ description logic.
        </p>
        <p>Table 2 reports the best performing hyper-parameters for each model, where n indicates the dimension of the embedding vectors and γ is the margin loss parameter.</p>
        <p>We use the pykeen framework [24] for the implementations of the TransE, TransH, and DistMult
embedding models. For the ElEm embeddings, we used the source code provided by the authors
(https://github.com/bio-ontology-research-group/el-embeddings).
Our implementation of EmEL++ is publicly available at https://github.com/kracr/EmELpp.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Experimental Protocol</title>
        <p>For learning the embeddings with the different models, we first normalize the ontologies as described
in Section 3. Next, we remove 30% of the subclass relation pairs from the normalized ontology
to be used for validation (20%) and testing (10%). The remaining ontology, with 70% of the subclass
relation pairs, is used as the training set for learning the embedding functions. Further, we take
an inference set that consists of the inferences drawn on the training set using the standard ELK
reasoner to evaluate the performance of the learned embeddings. We perform hyper-parameter
tuning using the 20% validation set and report the performance of the fine-tuned models on the
test set. The hyper-parameters tuned for all the models are the dimension of the embedding
vectors and the margin parameter γ. We consider n = {50, 100, 200} and γ = {−0.1, 0, 0.1},
yielding nine different settings. The best performing hyper-parameters for each of the models
are reported in Table 2.</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Reasoning Performance of EmEL++</title>
        <p>We chose subsumption as the main task to evaluate the effectiveness of the proposed EmEL++
embeddings. Note that once we have embedded the ontologies in a vector space, we have
to reduce all the tasks we want to accomplish to operations that can be performed in an
n-dimensional space. We reduce the task of subsumption in the embedding vector space to a
distance-based operation. Given a test instance of the form C ⊑ D, we take C as our source class
and rank all the other classes in the ontology in increasing order of their distance from C in the
vector space. We then compare the effectiveness of the different embedding models based on the
rank at which D is present in the ranked list. An embedding model that successfully captures
the subclass relation between the two classes should be able to assign vector representations
to the two classes that are very close to each other, hence producing a lower rank for D. We
report and evaluate the performance using six metrics. Hits at ranks 1, 10 and 100 report the
fraction of test cases for which the expected class was found within the top 1, 10 and 100 ranks,
respectively. A median rank of r means that for 50% of the test cases, the correct answer was
found within the top r ranks.</p>
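<p>The distance-based ranking and the hits metrics can be sketched with a few lines of NumPy; the function names below are ours, not the paper's.</p>

```python
import numpy as np

def subsumption_rank(source_idx, target_idx, centers):
    """Rank of the true super-class among all classes, ordered by
    increasing distance of their centers from the source class."""
    dists = np.linalg.norm(centers - centers[source_idx], axis=1)
    dists[source_idx] = np.inf                    # do not rank the source itself
    order = np.argsort(dists)
    return int(np.where(order == target_idx)[0][0]) + 1  # 1-based rank

def hits_at_k(ranks, k):
    """Fraction of test cases whose true class ranks within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)
```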
      </sec>
      <sec id="sec-5-5">
        <title>4.5. Preserving Ontology Characteristics in Vector Space</title>
        <p>Next, we compare ElEm and EmEL++ in terms of retaining the underlying characteristics of the
ontology in the vector space. Recall that both models map the classes in an ontology to n-balls
in the vector space. Further, the mapping is such that the n-ball of a super-class subsumes the
n-balls of its sub-classes. Thus, for a test instance C ⊑ D, we check whether the n-ball of class C
lies inside the n-ball of class D in the vector space. Note that since we have the centers and
radii of the corresponding n-balls, this can be checked easily. Table 4 presents the training,
testing, and inference accuracy obtained by the two embedding models for this task. We
report accuracy values, i.e., the fraction of instances where the subsumption relation between
the classes was maintained in the vector space. Note that this is a much stricter criterion:
even if the subclass n-ball lies slightly outside the n-ball of the superclass, it is considered
a failure. We observe from Table 4 that EmEL++ outperforms the ElEm embeddings for all the
datasets and across all settings. This indicates that EmEL++ embeddings are better at preserving
the class relationships in the mapped vector space than the ElEm embeddings.</p>
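<p>The containment test used in this evaluation reduces to a one-line check on centers and radii; a minimal sketch of our own follows.</p>

```python
import numpy as np

def ball_inside(center_sub, r_sub, center_sup, r_sup):
    """For a test instance C ⊑ D: the n-ball of C lies inside the n-ball
    of D iff the distance between the centers plus the sub-class radius
    does not exceed the super-class radius."""
    dist = np.linalg.norm(np.subtract(center_sub, center_sup))
    return bool(dist + r_sub <= r_sup)
```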
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>We proposed EmEL++, an embedding model for EL++ ontologies. EmEL++ builds upon and
extends the previously proposed ElEm embeddings by focusing on role inclusions and role
chains, and offers a more complete coverage of EL++. Experiments with four different
ontologies showed that EmEL++ outperforms traditional KB embeddings on the subsumption
reasoning task. Further, when compared with the ElEm embeddings, it is better able to preserve the
underlying semantics of the ontologies in the vector space. We have also shown how to
perform the subsumption reasoning task in a vector space, which is an O(N) operation in the worst
case. We believe this is an important capability that offers exciting directions for future work.
</p>
      <p>[3] B. Yang, W.-t. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, arXiv preprint arXiv:1412.6575 (2014).
[4] M. Nickel, V. Tresp, H.-P. Kriegel, A three-way model for collective learning on multi-relational data, in: ICML, volume 11, 2011, pp. 809–816.
[5] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: International Conference on Machine Learning (ICML), 2016.
[6] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[7] M. Kulmanov, W. Liu-Wei, Y. Yan, R. Hoehndorf, EL embeddings: Geometric construction of models for the description logic EL++, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, AAAI Press, 2019, pp. 6103–6109.
[8] F. Baader, S. Brandt, C. Lutz, Pushing the EL envelope, LTCS-Report LTCS-05-01, Chair for Automata Theory, Institute for Theoretical Computer Science, Dresden University of Technology, Germany, 2005. See http://lat.inf.tu-dresden.de/research/reports.html.
[9] V. Misra, S. Bhatia, Bernoulli embeddings for graphs, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[10] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[11] D. Garg, S. Ikbal, S. K. Srivastava, H. Vishwakarma, H. Karanam, L. V. Subramaniam, Quantum embedding of knowledge for reasoning, in: Advances in Neural Information Processing Systems, 2019, pp. 5595–5605.
[12] F. Z. Smaili, X. Gao, R. Hoehndorf, Onto2Vec: Joint vector-based representation of biological entities and their ontology-based annotations, Bioinformatics 34 (2018) i52–i60.
[13] Ö. Özçep, M. Leemhuis, D. Wolter, Cone semantics for logics with negation, in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI, 2020, pp. 1820–1826.
[14] A. Eberhart, M. Ebrahimi, L. Zhou, C. Shimizu, P. Hitzler, Completion reasoning emulation for the description logic EL+, arXiv preprint arXiv:1912.05063 (2019).
[15] F. Baader, S. Brandt, C. Lutz, Pushing the EL envelope, in: IJCAI, volume 5, 2005, pp. 364–369.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[17] P. Ristoski, H. Paulheim, RDF2Vec: RDF graph embeddings for data mining, in: International Semantic Web Conference, Springer, 2016, pp. 498–514.
[18] J. Mendez, jcel: A modular rule-based reasoner, in: ORE, 2012.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, CoRR abs/1412.6980 (2014).
[20] K. Donnelly, SNOMED-CT: The advanced terminology and coding system for eHealth, Studies in Health Technology and Informatics 121 (2006) 279.
[21] C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, M. A. Haendel, Uberon, an integrative multi-species anatomy ontology, Genome Biology 13 (2012) R5.
[22] G. O. Consortium, The Gene Ontology (GO) database and informatics resource, Nucleic Acids Research 32 (2004) D258–D261.
[23] A. Rector, J. Rogers, P. Pole, The GALEN high level ontology, 1996.
[24] M. Ali, H. Jabeen, C. T. Hoyt, J. Lehmann, The KEEN Universe, in: International Semantic Web Conference, Springer, 2019, pp. 3–18.
[25] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, Journal of the American Society for Information Science and Technology 58 (2007) 1019–1031.
[26] L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A: Statistical Mechanics and its Applications 390 (2011) 1150–1170.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding by translating on hyperplanes</article-title>
          ,
          <source>in: Twenty-Eighth AAAI conference on artificial intelligence</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>