<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods and Metrics for Knowledge Base Engineering and Integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgos Stoilos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Geleta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Szymon Wartak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sheldon Hall</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Khodadadi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yizheng Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ghadah Alghamdi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renate A. Schmidt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Babylon Health</institution>
          ,
          <addr-line>London, SW3 3DD</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, The University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Today a wealth of knowledge and data is distributed using Semantic Web standards. Especially in the (bio)medical domain, several sources like the SNOMED CT, NCI, MedDRA, MeSH, and ICD-10 ontologies, and many more, are distributed in RDF and OWL. These can be aligned and integrated in order to create one large medical Knowledge Base. However, integrating different and largely heterogeneous sources is far from trivial. First, although distributed in OWL, many of the ontologies may not strictly follow the semantics of subClassOf, as they were originally intended for faceted search or for use as thesauri. Second, even when they do follow strict ontological guidelines, different ontologies may conceptualise the same domain in radically different ways. Analysing and understanding these sources before integrating them is highly beneficial. Third, monitoring and understanding how the structure of the Knowledge Base changes (evolves) after the integration is also crucial, since changes to its structure may affect applications that are built on top of it. In the current paper we report on our Knowledge Base construction pipeline, which is based on ontology integration. We focus on the various metrics, techniques, and tools we have developed in order to assist in achieving this large-scale integration task. Our work was motivated by the need for a medical Knowledge Base to support digital healthcare services developed at Babylon Health. We present results on the metrics used to analyse various sources and the results of running the pipeline on several medical ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Today a wealth of knowledge and data is distributed using Semantic Web
technologies and standards. For example, the Linked Open Vocabularies effort [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
contains more than 600 ontologies for various subjects like geography,
multimedia, security, geometry, and more. Especially in the biomedical domain, a large
number of sources have been developed during the previous decades, such as the
SNOMED CT<sup>3</sup>, NCI [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], MedDRA<sup>4</sup>, and MeSH<sup>5</sup> ontologies, and many more, while
BioPortal [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a repository of more than 600 biomedical ontologies.
Identifying the common entities between these vocabularies and integrating them [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
is beneficial for building ontology-based applications, as one could unify the
complementary information that these vocabularies contain, building a "complete"
Knowledge Base (KB). Such correspondences can be identified using ontology
matching (alignment) techniques [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        However, identifying correspondences between ontologies is just the first step
towards performing the actual integration. Besides classes with their respective
labels, sources distributed in Semantic Web standards usually come with a class
hierarchy. Depending on the purpose for which the ontologies were initially created,
these hierarchies may exhibit significant incompatibilities. First, it is not
uncommon that sources are created with the intention of supporting faceted
search or acting as thesauri. In this case the semantics of subClassOf may not be
the intended subset relationship. Examples of such sources in the biomedical
domain are coding systems like Read Codes, MeSH, and ICD-10. Second, even
if the sources follow the strict semantics of RDF(S) and OWL and even if they
model the same domain, they may still exhibit structural incompatibilities due
to the way they conceptualise the domain. For example, in the NCI ontology
proteins are declared to be disjoint from anatomical structures, whereas in the
FMA ontology proteins are subclasses of anatomical structures. In this case a
naive integration can lead to many undesired logical consequences such as
unsatisfiable classes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This may even be the case if an ontology O<sub>e</sub> is designed
as an extension of another ontology O, simply because it is developed by a different
independent community which decides to alter the structure of O. Last but not
least, various services built on top of the KB may also impose requirements on
the structure and properties of the KB. Consequently, there is a pressing need to
develop methods that will help us analyse, monitor, and evaluate the content of
(large) Knowledge Bases [
        <xref ref-type="bibr" rid="ref18 ref8 ref9">8, 18, 9</xref>
        ].
      </p>
      <p>
        A significant effort to create a large medical KB recently started at
Babylon Health<sup>6</sup> using ontology matching and integration [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This medical KB is
intended to support various services within Babylon like text annotation,
understanding, reasoning, drug prescribing, clinical healthcare, and more. As this is a
critical domain, the content of the KB needs to be consistent, accurate, and of
high quality, raising the need to monitor the evolution process closely. This is far
from trivial, as some of the sources used are quite large and manual inspection
is impossible. Consequently, a set of metrics has been implemented to analyse
the content and structure of the KB and ensure integrity along various
dimensions. Some of these metrics are inspired by well-known logic-based measures
(including consistency and coherence) while others are inspired by services that depend on
the KB. For example, text annotation and Named Entity Disambiguation
services require that the KB exhibits a low level of "ambiguity" so that it is as
easy as possible to associate classes (IRIs) from the KB with words in user text.
The collected statistics are inspected in order to determine whether the integration
was successful or the pipeline needs to be re-run with different parameters. The
metrics are also used to assist us in analysing candidate sources for integration
by assessing their structural compatibility with the current KB. Some of these
metrics have been presented previously in the literature; others are novel
or refine previous metrics in order to adapt them to our case.
        <fn id="fn3"><label>3</label><p>https://www.snomed.org/</p></fn>
        <fn id="fn4"><label>4</label><p>https://www.meddra.org/</p></fn>
        <fn id="fn5"><label>5</label><p>https://meshb.nlm.nih.gov/</p></fn>
        <fn id="fn6"><label>6</label><p>https://www.babylonhealth.com/</p></fn>
      </p>
      <p>In the current paper we present our ontology integration pipeline with
emphasis on the evaluation and analysis steps, presenting the metrics we have
implemented. We have used these to analyse the structure of many well-known medical
sources and we report on the results. Next, we report statistics about using the
pipeline to integrate many popular medical ontologies like SNOMED CT, NCI,
and FMA. To the best of our knowledge, the full versions of these ontologies have
never actually been integrated (unified) before; only smaller fragments of them
have been aligned<sup>7</sup>, and the computed mappings were never used to actually
merge them. Finally, we developed a highly optimised logical difference analysis
tool to analyse the Australian and Canadian country extensions with respect to
the base SNOMED CT international version; using it, some discrepancies
were found in the Australian extension.</p>
    </sec>
    <sec id="sec-2">
      <title>Ontologies and Ontology Matching</title>
      <p>For brevity, throughout the paper we will mostly use Description Logic notation.
However, sometimes we will also use a triple notation, e.g., instead of A ⊑ B we
may write ⟨A rdfs:subClassOf B⟩ and instead of A ⊑ ∃R.B we may write ⟨A R B⟩.
For a set of real numbers S we use ⊕S to denote the sum of its elements. For p an
ontology prefix and C some class, if we wish to make explicit the ontology in which C
appears we use the notation p:C. Hence, for distinct IRI prefixes p<sub>1</sub> ≠ p<sub>2</sub>, p<sub>1</sub>:C
and p<sub>2</sub>:C denote distinct classes. For an ontology O we use Sig(O) to denote
the set of class names that appear in O. Given an ontology O we assume that
each class C in O has at least one triple of the form ⟨C skos:prefLabel v⟩ and
zero or more triples of the form ⟨C skos:altLabel v<sub>i</sub>⟩. For a given class C the function
pref(C) returns the string value v in the triple ⟨C skos:prefLabel v⟩. An ontology O
is called coherent if every C ∈ Sig(O) \ {⊥} is satisfiable.</p>
      <p>In the literature, the notion of a Knowledge Base is almost identical to that
of an ontology, i.e., a set of axioms describing the entities of a domain. In the
following, we loosely use the term "Knowledge Base" (KB) to mean a possibly
large ontology that has been created by integrating various other ontologies, but
formally we assume a KB is an OWL ontology.</p>
      <p>
        Ontology matching (or ontology alignment) is the process of discovering
correspondences (mappings) between the entities of two ontologies O<sub>1</sub> and O<sub>2</sub>. To
represent mappings we use the formulation presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. That is, a mapping
between O<sub>1</sub> and O<sub>2</sub> is a 4-tuple of the form ⟨C, D, ρ, n⟩, where C ∈ Sig(O<sub>1</sub>),
D ∈ Sig(O<sub>2</sub>), ρ ∈ {≡, ⊒, ⊑} is the mapping type, and n ∈ (0, 1] is the confidence
value of the mapping. Moreover, we interpret mappings as DL axioms, that
is, ⟨C, D, ρ, n⟩ can be seen as the axiom C ρ D where the confidence degree
is attached as an annotation. Hence, for a mapping ⟨C, D, ρ, n⟩, when we write
O ∪ {⟨C, D, ρ⟩} we mean O ∪ {C ρ D}, while for a set of mappings M, O ∪ M
denotes the set O ∪ {m | m ∈ M}. When not relevant, and for simplicity, we
will often omit ρ and n and simply write ⟨C, D⟩. A matcher is an algorithm that
takes as input two ontologies and returns a set of mappings.
        <fn id="fn7"><label>7</label><p>http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2017/results/</p></fn>
      </p>
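      <p>The 4-tuple representation of mappings can be sketched in code as follows. This is a minimal illustration only: the names Mapping, as_axiom, and integrate, as well as the ASCII relation keywords, are hypothetical and not part of the pipeline described in this paper, and axioms are treated as plain strings.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import NamedTuple

# Hypothetical ASCII keywords standing in for the mapping types
# (equivalence, subsumed-by, subsumes).
RELATION_KEYWORDS = {
    "equiv": "EquivalentTo",
    "sub": "SubClassOf",
    "sup": "SuperClassOf",
}


class Mapping(NamedTuple):
    source: str        # class C from Sig(O1), e.g. "snmd:Fever"
    target: str        # class D from Sig(O2), e.g. "nci:Fever"
    relation: str      # mapping type: "equiv", "sub" or "sup"
    confidence: float  # confidence value n in (0, 1]


def as_axiom(m: Mapping) -> str:
    """Read the mapping as an axiom 'C rho D'; the confidence is dropped."""
    if not 0.0 < m.confidence <= 1.0:
        raise ValueError("confidence must lie in (0, 1]")
    return f"{m.source} {RELATION_KEYWORDS[m.relation]} {m.target}"


def integrate(ontology, mappings):
    """Return the union of an axiom set O and a set of mappings M."""
    return set(ontology) | {as_axiom(m) for m in mappings}
```
]]></preformat>
      <p>For instance, integrate(axioms, [Mapping("snmd:Fever", "nci:Fever", "equiv", 0.9)]) adds the string "snmd:Fever EquivalentTo nci:Fever" to the axiom set.</p>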
    </sec>
    <sec id="sec-3">
      <title>Building Large Knowledge Bases</title>
      <p>
        In this section we briefly present the ontology matching and integration pipeline
we designed for building a large KB [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], mostly through illustrative examples.
Example 1. Consider an ontology-based application that uses the SNOMED
CT ontology O<sub>snmd</sub> as a KB. Although SNOMED CT is a large and
well-engineered ontology, some relevant medical information may be missing. For
example, for the disease "Ewing Sarcoma" SNOMED CT only contains the axiom
snmd:EwingSarcoma ⊑ snmd:Sarcoma and no relations to signs or symptoms. In
contrast, the NCI ontology O<sub>nci</sub> contains the following axiom about this disease:
        <disp-formula><tex-math>\text{nci:EwingSarcoma} \sqsubseteq \exists\, \text{nci:mayHaveSymptom}.\text{nci:Fever}</tex-math></disp-formula>
We can use ontology matching to establish links between the related entities
in O<sub>snmd</sub> and O<sub>nci</sub> and then integrate them in order to enrich our KB. More
precisely, using a matching algorithm we can identify the following mappings:
        <disp-formula><tex-math>m_1 = \langle \text{snmd:EwingSarcoma}, \text{nci:EwingSarcoma}, \equiv \rangle</tex-math></disp-formula>
        <disp-formula><tex-math>m_2 = \langle \text{snmd:Fever}, \text{nci:Fever}, \equiv \rangle</tex-math></disp-formula>
and hence replace our KB with O′<sub>snmd</sub> := O<sub>snmd</sub> ∪ O<sub>nci</sub> ∪ {m<sub>1</sub>, m<sub>2</sub>}. Then O′<sub>snmd</sub>
contains the knowledge that "Ewing sarcoma may have fever as a symptom". ◊
However, it is well known that, due to differences in the structure of the two
ontologies, the integration may introduce undesired consequences like unsatisfiable
classes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or new subClassOf relations to the initial KB [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Example 2. Consider again the SNOMED CT and NCI ontologies. Both contain
classes for the notions of "soft tissue disorder" and "epicondylitis". Hence, it is
reasonable for a matching algorithm to compute the following mappings:
        <disp-formula><tex-math>m_1 = \langle \text{snmd:SoftTissueDisorder}, \text{nci:SoftTissueDisorder}, \equiv \rangle</tex-math></disp-formula>
        <disp-formula><tex-math>m_2 = \langle \text{snmd:Epicondylitis}, \text{nci:Epicondylitis}, \equiv \rangle</tex-math></disp-formula>
However, in NCI we have O<sub>nci</sub> ⊨ nci:Epicondylitis ⊑ nci:SoftTissueDisorder, while
in SNOMED CT O<sub>snmd</sub> ⊭ snmd:Epicondylitis ⊑ snmd:SoftTissueDisorder. Hence,
for the integrated ontology KB<sub>int</sub> := O<sub>snmd</sub> ∪ O<sub>nci</sub> ∪ {m<sub>1</sub>, m<sub>2</sub>} we will have
        <disp-formula><tex-math>\mathit{KB}_{\mathit{int}} \models \text{snmd:Epicondylitis} \sqsubseteq \text{snmd:SoftTissueDisorder},</tex-math></disp-formula>
introducing a relation between classes of O<sub>snmd</sub> that did not originally hold.</p>
      <p>Algorithm 1 KnowledgeBaseConstruction(<italic>KB</italic>, O, Config).
Input: The current KB <italic>KB</italic>, a new ontology O, and a configuration Config.</p>
      <p>
        To repair this issue, the typical approach followed in the literature removes some
of the computed mappings, i.e., either m<sub>1</sub> or m<sub>2</sub> [
        <xref ref-type="bibr" rid="ref12 ref5 ref7">7, 5, 12</xref>
        ]. But removing
mappings will cause the KB to contain two different classes for the same real-world
entity with a large overlap in their labels. An alternative approach, studied in
depth in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], removes axioms from the new ontology. In this example, writing KB<sup>1</sup><sub>int</sub> for the
integrated ontology of Example 2, we can
compute KB<sup>2</sup><sub>int</sub> := KB<sup>1</sup><sub>int</sub> \ {nci:Epicondylitis ⊑ nci:SoftTissueDisorder} and hence
KB<sup>2</sup><sub>int</sub> ⊭ snmd:Epicondylitis ⊑ snmd:SoftTissueDisorder, as desired. ◊
      </p>
      <p>In addition, various services that are built on top of the KB may impose
other types of requirements on the KB and integrating sources may negatively
impact them.</p>
      <p>Example 3. Assume that our SNOMED CT-based KB is used to support medical
text annotation and Named Entity Disambiguation services. These take a user
text like "I have a severe pain in my head" and annotate it with classes from
the KB. More precisely, the words "severe", "pain", and "head" would be associated
with the respective classes in the SNOMED CT ontology, assigning meaning to
the text. For example, the word "severe" would be associated with the SNOMED
CT class Severe.</p>
      <p>Assume now that NCI is integrated with SNOMED CT. NCI contains the class
SevereAdverseEvent with an alternative (synonymous) label "severe".
Consequently, after the integration there are two different classes that can be associated
with the word "severe". The choice of which class to use may have a significant
impact on the application and the interoperability of services. ◊</p>
      <p>
        Our ontology integration approach is given in Algorithm 1. The algorithm
accepts as input the current KB <italic>KB</italic>, a new ontology O which will be used to enrich
<italic>KB</italic>, and a configuration Config, which is used to tune and change various
parameters like thresholds and weights. In brief, the algorithm first saturates the input
ontology using an OWL reasoner such as HermiT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as well as additional custom
saturation rules. As the KB is loaded into a triple-store for scalable SPARQL query
answering, this step is meant to improve the completeness of SPARQL query
answering [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Subsequently, it applies a set of matchers in order to compute a set
of mappings between <italic>KB</italic> and O, aggregates them using a different weight for
each matcher (matcher.w), and finally removes those mappings that fall below a
certain threshold (Config.Align.thr). As mentioned previously, newly introduced
subClassOf relations between symbols of the current KB are eliminated; this
is performed in method postProcess. This method is quite involved and details
can be found in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
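      <p>The aggregation and thresholding step described above might look roughly as follows. The body of Algorithm 1 is not reproduced in this text, so the weighted-average combination rule used here is an assumption made purely for illustration, as are the function and parameter names; weights stands in for matcher.w and threshold for Config.Align.thr.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def aggregate(matcher_outputs, weights, threshold):
    """Combine per-matcher scores into one confidence per class pair.

    matcher_outputs: {matcher_name: {(C, D): score in (0, 1]}}
    weights:         {matcher_name: weight}   (role of matcher.w)
    threshold:       minimum confidence kept  (role of Config.Align.thr)

    Assumption: scores are combined by a weighted average, and a matcher
    that does not propose a pair contributes 0 for that pair.
    """
    total_weight = sum(weights[name] for name in matcher_outputs)
    combined = defaultdict(float)
    for name, scores in matcher_outputs.items():
        for pair, score in scores.items():
            combined[pair] += weights[name] * score / total_weight
    # Drop every mapping whose aggregated confidence is below the threshold.
    return {pair: conf for pair, conf in combined.items() if conf >= threshold}
```
]]></preformat>
      <p>Raising the threshold trades recall for precision; in the pipeline such parameters are re-tuned when the collected statistics indicate that an integration run was not successful.</p>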
      <p>In the current paper we focus on the final analysis step of our integration
pipeline. After the matching and mapping post-processing, a set of analysers
are applied. These analysers compute various metrics on the given KBs and
populate two models on which a diff operation is applied. These differences are
then compared against "expectations" that are set before starting the pipeline.
For example, if we integrate an ontology which is rich in alternative labels then
we expect that such types of axioms in the initial KB increase by a proportional
amount. The various statistics and metrics that are used in function analyse are
presented in detail in the next section.</p>
    </sec>
    <sec id="sec-4">
      <title>Analysing Knowledge Bases</title>
      <p>
        We have grouped the metrics we are using into two categories. The first one
contains metrics that characterise the integrity of the KB, that is, the
"correctness" of its content according to either well-known notions or service-induced
properties. The second category includes metrics about the actual content of
the KB, like assessing its completeness. Many of these metrics are inspired by
work on ontology and knowledge base evaluation [
        <xref ref-type="bibr" rid="ref16 ref2 ref9">16, 9, 2</xref>
        ] or Linked Data
quality analysis [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]; some are adapted or extended to fit our use case or are newly
proposed.
      </p>
      <p>
        <bold>KB Integrity.</bold>
Integrity can be measured using standard logic-based notions, but additional
application-specific metrics are also relevant to our use case. In the following we
present in detail the metrics we have defined, grouping them into various
subcategories.
<bold>KB Coherence.</bold> Perhaps one of the most fundamental properties of every KB
is that it is coherent, that is, it does not contain unsatisfiable classes. Formally,
for our KB <italic>KB</italic> and any class A in <italic>KB</italic> we should have KB ⊭ A ⊑ ⊥. This check
can be performed using existing OWL reasoners. Unfortunately, when we are
dealing with large KBs possibly containing billions of statements, such checks are
at best time-consuming, if possible at all. Consequently, our coherence checking
algorithms are based on approximate methods and techniques inspired by the
DL-Lite language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. More precisely, the function analyse internally computes
the following set:
      </p>
      <p>
        <disp-formula><tex-math>\{ C \in \mathit{KB} \mid \mathit{KB} \models_{\mathit{rdfs}} C \sqsubseteq A \sqcap B \text{ where } A \sqcap B \sqsubseteq \bot \in \mathit{KB} \}</tex-math></disp-formula>
where ⊨<sub>rdfs</sub> denotes the entailment relation under the RDFS semantics. This
check is implemented by the following SPARQL query:
        <preformat>select ?a ?b ?s where {
  ?s rdfs:subClassOf ?a ; rdfs:subClassOf ?b .
  { select ?a ?b where { ?a owl:disjointWith ?b . } }
}
group by ?a ?b ?s
order by ?a</preformat>
<bold>Entailment Invariability.</bold> Metrics related to entailment invariability capture
the aspects discussed in Example 2. Although such changes are eliminated by
function postProcess, this dimension is still analysed in order to ensure that
everything worked as desired in the pipeline. The amount of entailment changes
between the original and a new ontology can be captured by the notion of logical
difference [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Definition 1. Let Σ be a signature and let O and O′ be two OWL 2 ontologies.
The Σ-deductive difference between O and O′ (denoted diff<sub>Σ</sub>(O, O′)) is defined
as the set of axioms α satisfying: (i) Sig(α) ⊆ Σ, (ii) O ⊭ α, and (iii) O′ ⊨ α.
This notion can be used as follows: given the initial KB <italic>KB</italic>, a new
ontology O, and a set of mappings M<sub>f</sub> computed between them, we ideally want
diff<sub>Σ</sub>(KB, KB ∪ O ∪ M<sub>f</sub>) = ∅, where Σ = Sig(KB). Computing logical
differences between expressive, let alone large, KBs is very challenging, if possible
at all. Nevertheless, a highly optimised system called LDiff-FAME has been
implemented within the scope of this project as an adaptation of our system
FAME [
        <xref ref-type="bibr" rid="ref19 ref20">20, 19</xref>
        ]. These systems are based on the notion of uniform interpolation,
which is similar to logical difference, but LDiff-FAME is highly optimised to scale
over large ontologies. As shown in the evaluation section, LDiff-FAME was able
to compute logical differences between SNOMED CT versions.
      </p>
      <p>
        LDiff-FAME has not yet been applied to the full scale of our integrated
KB. Instead we used the approximate version of the above definition introduced
in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where the difference is only computed with respect to axioms of the form
A ⊑ B. In addition, we also look for differences in axioms of the form
A ⊑ ∃R.A. Such axioms imply a form of self-loop, e.g., saying that Leg ⊑
∃partOf.Leg, which we also want to exclude from our KB. In contrast
to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], however, we do allow new entailments of the form A ⊑ ∃R.B for
B ≠ A. Such new knowledge is in fact desired, as it implies the enrichment
of our KB with new relations (see also Example 1). As before, the above set
is computed using SPARQL queries over triple-stores and no OWL reasoner is
used.
      </p>
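      <p>The approximate difference restricted to axioms of the form A ⊑ B, together with the self-loop check, can be sketched in memory as follows. This is a simplified illustration and not the pipeline's implementation, which uses SPARQL queries over a triple-store: entailment is approximated here by the transitive closure of asserted subclass edges, an equivalence mapping is assumed to be encoded as two subclass edges, and all function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def subsumption_closure(edges):
    """Transitive closure of asserted (sub, super) pairs."""
    supers = {}
    for sub, sup in edges:
        supers.setdefault(sub, set()).add(sup)
    closure = set()
    for start in supers:
        stack, seen = list(supers[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(supers.get(node, ()))
        closure |= {(start, ancestor) for ancestor in seen}
    return closure


def approx_diff(kb_edges, integrated_edges, signature):
    """Subsumptions between distinct classes of `signature` that follow
    from the integrated edges but not from the original KB edges."""
    before = subsumption_closure(kb_edges)
    after = subsumption_closure(integrated_edges)
    return {(a, b) for (a, b) in after - before
            if a in signature and b in signature and a != b}


def self_loops(triples):
    """Triples (A, R, A), i.e. axioms of the form 'A subClassOf R some A'."""
    return {(a, r, b) for (a, r, b) in triples if a == b}
```
]]></preformat>
      <p>On the data of Example 2, with an empty set of KB edges, the NCI edge from nci:Epicondylitis to nci:SoftTissueDisorder, and both mappings encoded in both directions, approx_diff reports the single new pair (snmd:Epicondylitis, snmd:SoftTissueDisorder).</p>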
      <p>
        <bold>Graph-based Invariability.</bold> The above metrics are based on well-defined
notions from logic. In addition to those, a number of graph-based metrics can be
identified to further analyse ontologies and comprehend their structural
properties. The metrics currently implemented are the following:
1. Path lengths: The maximum and average path length of the subClassOf
hierarchy are computed using a depth-first algorithm.
2. Tangledness: This notion was introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as a way to quantify the
multi-hierarchical nature of classes in an ontology, that is, how often a class
has more than one parent (a fork) compared to the total number of classes
in the ontology. Building on this we have defined our version of tangledness,
which measures the fork-rejoin points of each class, that is, how many times
a class has two or more parents and what their least common subsumer is.
Instead of only counting how many times a node has more than one parent
normalised with the cardinality of the graph [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we also quantify the joins of
these forks. First we define the set of least-common-subsumers (lcs) of two
classes A and B as:
        <disp-formula><tex-math>\mathrm{lcs}(A, B) := \{ C \mid \mathit{KB} \models A \sqsubseteq C, B \sqsubseteq C \text{ and for every } D \text{ s.t. } \mathit{KB} \models A \sqsubseteq D, B \sqsubseteq D \text{ we have } \mathit{KB} \models C \sqsubseteq D \}.</tex-math></disp-formula>
      </p>
      <sec id="sec-4-1">
        <title>Now for every class in the KB we define the following:</title>
        <p><disp-formula><tex-math>\mathrm{tang}(A, \mathit{KB}) := \{ E \in \mathit{KB} \mid \{ A \sqsubseteq C_1, A \sqsubseteq C_2 \} \subseteq \mathit{KB},\ C_1 \neq C_2 \text{ and } E \in \mathrm{lcs}(C_1, C_2) \}.</tex-math></disp-formula>
As we will see in Section 5, these sets can help us understand the
multi-hierarchical nature of data sources and assess their internal structure. A
SPARQL query computing the above set is:
        <preformat>select ?s ?d1 ?d2 ?anc where {
  ?s sesame:directSubClassOf ?d1 ; sesame:directSubClassOf ?d2 .
  filter (?d1 != ?d2) .
  ?d1 rdfs:subClassOf ?anc . ?d2 rdfs:subClassOf ?anc .
  filter not exists { ?d1 rdfs:subClassOf ?anc2 .
                      ?d2 rdfs:subClassOf ?anc2 . ?anc2 rdfs:subClassOf ?anc . }
}</preformat>
In addition, we have defined the sum of fork-rejoin points that occur on the
descendants of a class. This number can help us identify parts of the KB in
which large structural changes occurred after the integration.</p>
        <p><disp-formula><tex-math>\mathrm{tang}^{\downarrow}(A, \mathit{KB}) := \oplus \{ \sharp\, \mathrm{tang}(C, \mathit{KB}) \mid \mathit{KB} \models C \sqsubseteq A \}.</tex-math></disp-formula>
<bold>Label integrity/ambiguity.</bold> As motivated by Example 3, we wish to avoid
having distinct classes sharing labels. Obviously this is impossible in general,
as language is inherently ambiguous. For example, SNOMED CT alone already
contains several classes that have overlapping labels, e.g., two classes with the label
"foot", one for the unit of measurement and one for the body part. One could
use the "types" of these classes<sup>8</sup> to disambiguate; however, this is still a hard
problem for text annotation services. For this reason we have developed methods
to measure and eliminate duplication as much as possible. Let 𝒞 ⊆ Sig(KB) be
a set of types in KB. For every C ∈ 𝒞 we define the following set:
        <disp-formula><tex-math>\text{amb-lab}(C, \mathit{KB}) := \{ \ell \mid A \neq B \text{ exist s.t. } \mathit{KB} \models \{ A \sqsubseteq C, B \sqsubseteq C \}, \text{ and } \{ \langle A\ \text{skos:label}\ \ell \rangle, \langle B\ \text{skos:label}\ \ell \rangle \} \subseteq \mathit{KB} \}.</tex-math></disp-formula>
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>The following SPARQL query computes the above set:</title>
        <p><preformat>select ?l where {
  ?s1 rdfs:subClassOf :C ; skos:prefLabel|skos:altLabel ?l .
  ?s2 rdfs:subClassOf :C ; skos:prefLabel|skos:altLabel ?l .
  filter (?s1 != ?s2)
}</preformat>
Moreover, we also define amb-lab(KB) = ⊕<sub>C ∈ 𝒞</sub> ♯amb-lab(C, KB).</p>
        <p>As we will show in the evaluation section, single ontologies may come with
a significant amount of ambiguity due to the "loose" way ontology authors may
use synonyms and alternative labels. To reduce ambiguity we have developed a
set of heuristics which can be used in the integration pipeline. Some of these
heuristics are the following:
          <list list-type="bullet">
            <list-item><p>If ℓ appears as a preferred label in one class and as an alternative label in the
other, then delete the latter.</p></list-item>
            <list-item><p>If ℓ appears in two classes, one of which is a super-class of the other, and the
label in the sub-class is not its preferred label, then delete the label from the
sub-class.</p></list-item>
            <list-item><p>If ℓ appears in two classes that have a common direct super-class (i.e., in
two sibling classes), then delete the label from both of them and create an
intermediate parent containing this label.</p></list-item>
          </list>
        </p>
        <p>Clearly, these heuristics do not completely eliminate ambiguity, as there are more
combinations not covered by them.</p>
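        <p>The label ambiguity measure defined in this section can be sketched over in-memory dictionaries as follows. This is only an illustration of the definition under simplifying assumptions: the entailed superclasses of every class are assumed to be precomputed, preferred and alternative labels are pooled into one set per class, and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def ambiguous_labels(type_class, ancestors, labels):
    """Labels shared by two or more distinct classes below `type_class`.

    ancestors: {class: set of its entailed superclasses}
    labels:    {class: set of preferred and alternative labels}
    """
    owners = defaultdict(set)
    for cls, cls_labels in labels.items():
        if type_class in ancestors.get(cls, set()):
            for label in cls_labels:
                owners[label].add(cls)
    return {label for label, classes in owners.items() if len(classes) >= 2}


def total_ambiguity(types, ancestors, labels):
    """Sum of the per-type counts, mirroring the KB-level measure."""
    return sum(len(ambiguous_labels(t, ancestors, labels)) for t in types)
```
]]></preformat>
        <p>Comparing total_ambiguity before and after an integration run gives one number against which an expectation can be set; the per-type sets indicate which labels the heuristics above should be applied to.</p>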
        <p>
          <bold>KB Completeness Assessment.</bold>
The completeness or coverage that a knowledge base provides for the underlying
domain is another relevant notion for which metrics have been defined in the
literature [
          <xref ref-type="bibr" rid="ref16 ref9">16, 9</xref>
          ]. Quantifying completeness is in general impossible, since to do
so one would need to know all possible knowledge against which the content
of the KB should be compared. Hence, at best we can analyse the content of the
KB and assess, by manual inspection and by consulting domain experts, what type
of content the KB is missing.
          <fn id="fn8"><label>8</label><p>By "types" we mean top-level classes which are commonly used to group classes into
categories, e.g., the type of Leg is BodyPart and that of Malaria is Disease; types are
defined by the KB engineer.</p></fn>
        </p>
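        <p>
          For some relation R ∈ Sig(KB) let dom(R) := {C | ⟨R rdfs:domain C⟩ ∈ KB}
and ran(R) := {D | ⟨R rdfs:range D⟩ ∈ KB}. Our function analyse computes the
following sets:
1. Relation usage:
          <disp-formula><tex-math>\mathrm{usage}(R) := \sharp \{ \langle A\ R\ B \rangle \in \mathit{KB} \mid \mathit{KB} \models \{ A \sqsubseteq C, B \sqsubseteq D \},\ C \in \mathrm{dom}(R),\ D \in \mathrm{ran}(R) \}</tex-math></disp-formula>
          <disp-formula><tex-math>\mathrm{usage}(C, R) := \sharp \{ \langle A\ R\ B \rangle \in \mathit{KB} \mid \mathit{KB} \models \{ A \sqsubseteq C, B \sqsubseteq D \},\ D \in \mathrm{ran}(R) \}</tex-math></disp-formula>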
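The above measures are quite similar to the metric freq(R, C) defined in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
but here we only count triples that fall within the "specified" use of the
relation R, that is, within its defined domain and range. However, in a large
ontology integration project, and due to the scale of the alignment problem,
relations may start to "connect" classes whose types fall outside the domain
and range of the relation. We call this issue drift and measure it according
to the following sets, where ⟨A R B⟩ ∈ KB:
          <disp-formula><tex-math>\mathrm{drift}_{o}(R) := \sharp \{ \langle A\ R\ B \rangle \mid \mathit{KB} \models A \sqsubseteq C \text{ for some } C \in \mathrm{dom}(R), \text{ and } \mathit{KB} \not\models B \sqsubseteq D \text{ for all } D \in \mathrm{ran}(R) \}</tex-math></disp-formula>
          <disp-formula><tex-math>\mathrm{drift}_{s}(R) := \sharp \{ \langle A\ R\ B \rangle \mid \mathit{KB} \models B \sqsubseteq D \text{ for some } D \in \mathrm{ran}(R) \text{ and } \mathit{KB} \not\models A \sqsubseteq C \text{ for all } C \in \mathrm{dom}(R) \}</tex-math></disp-formula>
          <disp-formula><tex-math>\mathrm{drift}_{so}(R) := \sharp \{ \langle A\ R\ B \rangle \mid \mathit{KB} \not\models A \sqsubseteq C,\ \mathit{KB} \not\models B \sqsubseteq D \text{ for all } C \in \mathrm{dom}(R) \text{ and } D \in \mathrm{ran}(R) \}</tex-math></disp-formula>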
Next, for a relation R with domain C ∈ dom(R) we measure the percentage
of classes of this type that have this relation associated with them:
          <disp-formula><tex-math>\mathrm{extension}(C, R) := \frac{\sharp \{ A \mid A \sqsubseteq C \wedge \mathrm{usage}(A, R) &gt; 0 \}}{\sharp \{ A \mid A \sqsubseteq C \}}</tex-math></disp-formula>
The above metric can also be used for data type properties like skos:definition
to measure how many classes in the ontology have associated definitions. In
this case we have extension(owl:Thing, skos:definition). A similar metric in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
is the normalised frequency.
2. Class undefinedness: For a class A we define its level of "undefinedness"
as the set of relations that should be defined for this class according to their
specified domains.
        </p>
        <p><disp-formula><tex-math>\mathrm{undef}(A) := \{ R \in \mathit{KB} \mid \mathit{KB} \models A \sqsubseteq C \text{ for some } C \in \mathrm{dom}(R) \text{ and } \mathit{KB} \not\models \langle A\ R\ B \rangle \text{ for every } B \}</tex-math></disp-formula>
After all metrics are extracted, a diff operation is performed on several of them
(line 19, Algorithm 1) in order to provide an intuition about the delta in the
content and integrity of the KB. In addition to the above metrics, a diff is also
performed on every class of the KB to compute its change of labels, properties, and
ancestors. Many of these diffs are compared against pre-set "expectations", the
violation of which can raise errors or warnings depending on how critical a metric
is. For example, coherence is a strong requirement; hence, if after integration the
new KB is not coherent, an error is raised and the set of unsatisfiable concepts
is put in the report. Other expectations can be that the number of triples for
some subset of properties should increase. For example, if the new ontology is
rich in "has symptom" relations then we expect that "completeness" with
respect to this relation increases. Another example is the length of subClassOf
paths, which should not increase significantly. More formally, for KB, O, M<sub>f</sub> the
sets computed up to line 16 in Algorithm 1, we should have:
        <disp-formula><tex-math>\mathrm{depth}(\mathit{KB} \cup O \cup M_f) \leq \mathrm{depth}(\mathit{KB}) + w \cdot \mathrm{depth}(O)</tex-math></disp-formula>
for some w ∈ (0, 1].</p>
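        <p>The depth expectation above can be checked with a few lines of code. The sketch below is an illustration under stated assumptions rather than the pipeline's implementation: hierarchies are given as sets of (sub, super) edges, the merged hierarchy is assumed to be acyclic (so mapping edges are given in one direction only), depth is taken to be the longest subClassOf path counted in edges, and both function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def max_depth(edges):
    """Longest subClassOf path (counted in edges) of an acyclic hierarchy."""
    parents = {}
    for sub, sup in edges:
        parents.setdefault(sub, set()).add(sup)
    memo = {}

    def depth(node):
        # A class without parents has depth 0; otherwise 1 + deepest parent.
        if node not in memo:
            memo[node] = 1 + max(
                (depth(p) for p in parents.get(node, ())), default=-1
            )
        return memo[node]

    return max((depth(n) for n in parents), default=0)


def depth_expectation_holds(kb_edges, onto_edges, mapping_edges, w):
    """Check depth(KB u O u M) <= depth(KB) + w * depth(O), w in (0, 1]."""
    if not 0.0 < w <= 1.0:
        raise ValueError("w must lie in (0, 1]")
    merged = set(kb_edges) | set(onto_edges) | set(mapping_edges)
    return max_depth(merged) <= max_depth(kb_edges) + w * max_depth(onto_edges)
```
]]></preformat>
        <p>A violated expectation of this kind would be reported as a warning or an error depending on how critical the metric is considered to be.</p>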
        <p>
          To further assist in inspecting the content of the KB, a graphical tool has
been developed at Babylon in which the diffs can be visualised; Figure 1 depicts
a screenshot. Users can navigate the KB hierarchy and click on concepts to view
their diff with respect to the previous KB version. New information resulting
from the integration is highlighted with different colours (e.g., green for new
ancestors). A weighted formula on the diff of each class is also computed in
order to identify the most "changed" classes in the KB and select a subset of
them for doctor-based verification [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Fine-tuning the formula and developing
a doctor-based verification methodology are currently under further investigation.
        </p>
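        <p>One possible shape of such a weighted formula is sketched below. The weights, the aspect names, and the example classes are hypothetical and chosen only for illustration; the formula actually used in the tool is not specified here.</p>
        <preformat><![CDATA[
```python
def change_score(class_diff, weights):
    """Weighted sum of the number of changed items per aspect of one class."""
    return sum(weights.get(aspect, 0.0) * len(items)
               for aspect, items in class_diff.items())


def most_changed(diffs, weights, k):
    """Return the k class names with the highest change score, highest first.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    ranked = sorted(diffs, key=lambda name: (-change_score(diffs[name], weights), name))
    return ranked[:k]


# Hypothetical weights: a new ancestor is assumed to matter more than a new label.
WEIGHTS = {"labels": 1.0, "properties": 2.0, "ancestors": 3.0}

# Hypothetical per-class diffs: aspect -> items added by the integration.
diffs = {
    "Fever": {"labels": ["pyrexia"], "properties": [], "ancestors": []},
    "Heart": {"labels": [], "properties": ["partOf"], "ancestors": ["Organ", "BodyPart"]},
    "Cough": {"labels": ["tussis", "coughing"], "properties": [], "ancestors": []},
}

# The two highest-scoring classes would be the first candidates for review.
review_candidates = most_changed(diffs, WEIGHTS, 2)
```
]]></preformat>
        <p>Ranking by a single score keeps the selection for verification simple; a real deployment would likely need to calibrate the weights against reviewer feedback.</p>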
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>Algorithm 1 has been fully implemented in a highly modular and configurable
pipeline and used to build a large medical KB at Babylon Health. Several medical
sources have been considered. As mentioned, determining whether two sources
can be integrated "smoothly", monitoring the whole process at such a large scale,
and fine-tuning the parameters of the pipeline are not trivial. Initially, we used
the tangledness and label ambiguity metrics to analyse the following popular
medical data sources: SNOMED CT, NCI, MedDRA, MeSH, ICD-10, Read2,
CTV3 (a successor of Read2), and FMA. The results are depicted in Table 1,
where $\mathsf{count}(\mathsf{tang}) = \sharp\{A \mid \mathsf{tang}(A) &gt; 0\}$.</p>
      <p>As can be seen, SNOMED CT is a highly multi-hierarchical ontology, followed
by NCI and, to some extent, MeSH, although for the almost 8K classes
that had a fork-rejoin in MeSH the rejoin point was owl:Thing. In MedDRA all
forks have owl:Thing as a least common subsumer, while in ICD-10 and Read2
there are no forks, which confirms that all these sources are classification systems.
Interestingly, the FMA ontology also does not contain any forks. This ontology
models the human anatomy, and it is perhaps reasonable to assume that a body
part is not classified as two different things at the same time. Note that CTV3 is
a successor of Read2 for which a more ontological approach was apparently followed.
Regarding ambiguity, we note that NCI exhibits a high degree of ambiguity.
After closer inspection we concluded that the alternative labels in NCI do not
represent synonyms but are used in a loose way to indicate similar terms (see
also Example 3).</p>
      <p>
        Based on the above results we selected SNOMED CT as our seed ontology
and used Algorithm 1 to build a medical KB. On top of SNOMED CT we
have so far integrated NCI (which contains 130K classes and 143K subClassOf
axioms), CHV (which contains 57K classes and 0 subClassOf axioms), and FMA
(which contains 104K classes and 255K subClassOf axioms). CHV is a flat list of
layman terms for medical concepts from which we only integrated label (synonym)
information; this is also why CHV was not included in the previous analysis. Due to
the high ambiguity in NCI, its alternative labels were given lower weight in the
matching process. Statistics about the KBs that we created after each integration
step are depicted in Table 2. As can be seen, our postProcess method ensures
that no new subClassOf axioms are introduced between the symbols of the seed
ontology (LDiff row), something that does not happen when using other ontology
matching frameworks (see also the evaluation in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Moreover, our heuristics do
reduce ambiguity significantly, and a doctor-based evaluation showed that they
are about 95% correct.
      </p>
      <p>In addition to the above statistics, we have also applied our completeness
assessment metrics after each integration step to monitor how the content changes.
These metrics helped in at least the following cases:
– NCI is rich in skos:definition; hence, our expectation was that the number of
such triples would significantly increase in the KB. The first integration of
NCI was rejected since there was no increase, due to an implementation bug.
– Integration of NCI introduced a range drift on property partOf, whose range
is BodyParts. This was because the NCI "part of" property was declared
to be a sub-property of this property, while the range of the NCI relation
is either BodyPart or Substance. Consequently, the sub-property axiom was
removed.
– As expected, the integration of CHV introduced many alternative labels to
existing classes in the KB, while that of FMA introduced many partOf relations between
body parts. Moreover, the integration of NCI introduced many hasFinding
relations between diseases and symptoms.</p>
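      <p>The range drift in the second case can be illustrated with a small check over declared ranges. This is a simplified sketch: the function <monospace>range_drift</monospace>, the dictionary layout, and the property name <monospace>nci:partOf</monospace> are hypothetical, class names are simplified, and only declared ranges are compared, with no subclass reasoning.</p>
      <preformat><![CDATA[
```python
def range_drift(sub_property_of, ranges):
    """Report sub-property axioms whose sub-property has a wider declared range.

    sub_property_of -- iterable of (sub, super) property name pairs
    ranges          -- property name -> set of declared range classes
    Returns a dict mapping each offending (sub, super) pair to the range
    classes of the sub-property that the super-property does not declare.
    """
    drift = {}
    for sub, sup in sub_property_of:
        if sup not in ranges:
            continue  # a super-property without a declared range cannot drift
        extra = ranges.get(sub, set()) - ranges[sup]
        if extra:
            drift[(sub, sup)] = extra
    return drift


# Illustrative data modelled on the NCI case described in the text.
RANGES = {
    "partOf": {"BodyPart"},
    "nci:partOf": {"BodyPart", "Substance"},
}
AXIOMS = [("nci:partOf", "partOf")]

report = range_drift(AXIOMS, RANGES)
```
]]></preformat>
      <p>Here the report would flag the sub-property axiom because of the class Substance; dropping that axiom, as was done in the pipeline, leaves nothing to report.</p>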
      <p>We have also considered the Canadian and Australian extensions of SNOMED
CT, denoted by $\mathcal{O}^{ca}_{snmd}$ and $\mathcal{O}^{au}_{snmd}$, respectively. These extensions add more labels and
country-specific concepts, and provide additional local variations and
customisations relevant to the healthcare communities of these countries. Ideally we should
have $\mathsf{diff}_{\Sigma}(\mathcal{O}_{snmd}, \mathcal{O}^{\alpha}_{snmd}) = \emptyset$ for $\alpha \in \{au, ca\}$ and $\Sigma = \mathsf{Sig}(\mathcal{O}_{snmd})$. If this is the
case then these can be "safely" integrated into the existing KB. Note that no
alignment is required, as the country extensions reuse the IRIs of SNOMED CT
and any new IRI denotes a new class not in SNOMED CT.</p>
      <p>We used LDiff-FAME to compute the above sets, obtaining the following
results: for $\alpha = ca$ the above set is indeed empty and the Canadian extension
simply enriches SNOMED CT with additional labels and classes without
affecting its structure; this is not the case for the Australian extension, for which
there are 67 strongest inferred axioms in the above set. One example is a class
subsumption $A \sqsubseteq B \in \mathcal{O}^{au}_{snmd}$ which in $\mathcal{O}_{snmd}$ is $B \sqsubseteq A$. How to properly deal with
Australian SNOMED CT remains part of future work. SNOMED International
was made aware of these cases.</p>
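      <p>The idea behind this check can be conveyed with a deliberately crude approximation that only looks at atomic subsumptions between named classes. It is not how LDiff-FAME works, since a full logical difference requires forgetting over expressive axioms; the functions <monospace>entailed_subsumptions</monospace>, <monospace>signature</monospace>, and <monospace>atomic_diff</monospace>, as well as the toy ontologies, are hypothetical.</p>
      <preformat><![CDATA[
```python
def entailed_subsumptions(axioms):
    """Transitive closure of (sub, super) class pairs, without trivial A <= A."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def signature(axioms):
    """All class names that occur in a set of (sub, super) pairs."""
    return {name for pair in axioms for name in pair}


def atomic_diff(base, extended, sig):
    """Atomic subsumptions over sig entailed by `extended` but not by `base`."""
    before = entailed_subsumptions(base)
    after = entailed_subsumptions(extended)
    return {(a, b) for a, b in after - before if a in sig and b in sig}


# Toy base ontology and two toy extensions (illustrative names only).
BASE = {("B", "A"), ("A", "Finding")}
CANADIAN = BASE | {("CaOnly", "A")}             # only adds a new class
AUSTRALIAN = {("A", "B"), ("B", "Finding")}     # reverses a base subsumption

ca_diff = atomic_diff(BASE, CANADIAN, signature(BASE))
au_diff = atomic_diff(BASE, AUSTRALIAN, signature(BASE))
```
]]></preformat>
      <p>In this toy setting the first extension yields an empty difference over the base signature, whereas the second yields a subsumption that holds in the opposite direction in the base ontology, mirroring the kind of case reported above.</p>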
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We have presented a framework for large KB construction and engineering which
is based on ontology integration. The framework has been used to build a large
medical Knowledge Graph by integrating the popular and well-known ontologies
SNOMED CT, NCI, and FMA, as well as labels from the CHV vocabulary. To the
best of our knowledge, no ontology integration at this scale has been performed
in the past; all previous matching efforts have used much smaller versions of these ontologies,
and the computed mappings were never actually used to merge them.</p>
      <p>To help us decide which sources to use and how to configure our integration
pipeline, a set of analysers was implemented. They were used at the end of each
pipeline run in order to evaluate the integration process and assess the quality
of the final enriched KB. To further assist in the verification process, a graphical
user interface was also built.</p>
      <p>We have presented our results from applying several of these metrics to
well-known medical data sources that are currently under investigation at Babylon.
Our results revealed interesting properties of these sources, such as the ambiguity
of their labels and the multi-hierarchical nature of their structure. The metrics and
verification mechanism assisted in fixing several errors in the pipeline, assessing
the success of the integration, and fine-tuning it. Finally, the LDiff-FAME
system helped determine that the Australian extension of SNOMED CT is not a
conservative extension.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>DL-Lite: Tractable description logics for ontologies</article-title>
          .
          <source>In: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI)</source>
          . pp.
          <volume>602</volume>
          –
          <issue>607</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catenacci</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciaramita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
          </string-name>
          , J.:
          <article-title>Modelling ontology evaluation and validation</article-title>
          .
          <source>In: Proceedings of the 3rd European Semantic Web Conference</source>
          , ESWC. pp.
          <volume>140</volume>
          –
          <issue>154</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>HermiT: An OWL 2 reasoner</article-title>
          .
          <source>Journal of Automated Reasoning (JAR) 53</source>
          ,
          <fpage>245</fpage>
          –
          <fpage>269</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Golbeck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fragoso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartel</surname>
            ,
            <given-names>F.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oberthaler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The national cancer institute's thesaurus and ontology</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <volume>75</volume>
          –
          <fpage>80</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Llavori</surname>
          </string-name>
          , R.B.:
          <article-title>Ontology integration using mappings: Towards getting the right logical consequences</article-title>
          .
          <source>In: Proceedings of the 6th European Semantic Web Conference</source>
          ,
          <source>(ESWC)</source>
          . pp.
          <volume>173</volume>
          –
          <issue>187</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The logical difference problem for description logic terminologies</article-title>
          .
          <source>In: Proceedings of the 4th International Joint Conference on Automated Reasoning, IJCAR</source>
          . pp.
          <volume>259</volume>
          –
          <issue>274</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An efficient method for computing alignment diagnoses</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Web Reasoning and Rule Systems</source>
          , RR. pp.
          <volume>182</volume>
          –
          <issue>196</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          , Muhleisen, H.,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Sieve:
          <article-title>Linked data quality assessment and fusion</article-title>
          .
          <source>In: Proceedings of the 2012 Joint EDBT/ICDT Workshops</source>
          . pp.
          <volume>116</volume>
          –
          <issue>123</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rashid</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torchiano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Knowledge base quality assessment using temporal analysis. Submitted to the Semantic Web Journal, under review (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Salvadores</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          :
          <article-title>Bioportal as a dataset of linked biomedical ontologies and terminologies in RDF</article-title>
          .
          <source>Semantic Web</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ),
          <volume>277</volume>
          –
          <fpage>284</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
          </string-name>
          , J.:
          <article-title>Ontology matching: State of the art and future challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <volume>158</volume>
          –
          <fpage>176</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Solimando</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerrini</surname>
          </string-name>
          , G.:
          <article-title>Detecting and correcting conservativity principle violations in ontology-to-ontology mappings</article-title>
          .
          <source>In: Proceedings of the 13th International Semantic Web Conference (ISWC)</source>
          . pp.
          <volume>1</volume>
          –
          <issue>16</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Repairing ontologies for incomplete reasoners</article-title>
          .
          <source>In: Proceedings of the 10th International Semantic Web Conference (ISWC-11)</source>
          , Bonn, Germany (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geleta</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shamdasani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khodadadi</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A novel approach and practical algorithms for ontology integration</article-title>
          .
          <source>In: Proceedings of the International Semantic Web Conference (ISWC)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vandenbussche</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atemezing</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poveda-Villalon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vatant</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Linked open vocabularies (LOV): A gateway to reusable semantic vocabularies on the web</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <volume>437</volume>
          –
          <fpage>452</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Vrandecic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Ontology evaluation</article-title>
          .
          <source>In: Handbook on Ontologies</source>
          , pp.
          <volume>293</volume>
          –
          <fpage>313</fpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherif</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          , Buhmann, L.,
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>User-driven quality evaluation of DBpedia</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Semantic Systems</source>
          . pp.
          <volume>97</volume>
          –
          <issue>104</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietrobon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Quality assessment for linked data: A survey</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <volume>63</volume>
          –
          <fpage>93</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Forgetting concept and role symbols in $\mathcal{ALCOIH}\mu^{+}(\nabla, \sqcap)$-ontologies</article-title>
          . In: Kambhampati,
          <string-name>
            <surname>S</surname>
          </string-name>
          . (ed.)
          <source>Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI'16)</source>
          . pp.
          <volume>1345</volume>
          –
          <fpage>1352</fpage>
          . AAAI Press/IJCAI (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>FAME: An automated tool for semantic forgetting in expressive description logics</article-title>
          .
          <source>In: Proceedings of the 9th International Joint Conference On Automated Reasoning (IJCAR). Lecture Notes in Artificial Intelligence</source>
          , vol.
          <volume>10900</volume>
          , pp.
          <volume>19</volume>
          –
          <fpage>27</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>