OntoDrift: a Semantic Drift Gauge for Ontology Evolution Monitoring

Giuseppe Capobianco [0000-0001-9702-8189], Danilo Cavaliere [0000-0003-2859-0447], and Sabrina Senatore [0000-0002-7127-4290]

Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Fisciano (SA) 84084, Italy
{dcavaliere,ssenatore}@unisa.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. This paper presents OntoDrift, an approach to detect and assess the semantic drift between temporally distinct versions of an ontology. The semantic drift is evaluated at the concept level, by considering the main features of an ontology concept (e.g., intension, extension, labels, URIs), and at the structural level, by inspecting the taxonomic relations among concepts (e.g., subclass, superclass, equivalent class). New measures are defined to evaluate the semantic drift between individual concepts from different ontology versions, and between entire ontology versions. OntoDrift extends identity-based approaches so that the drift between ontology versions is assessed not only on the concepts the versions have in common, but also on the concepts added and removed during the ontology evolution, improving the drift assessment. OntoDrift can also be run over large ontology versions, as shown in a case study on DBpedia. Experiments on various ontologies show the potential of OntoDrift in assessing the semantic drift between ontology versions.

Keywords: Semantic drift · Ontologies · Similarity measures

1 Introduction

An ontology allows the representation of knowledge about a domain of interest as a shareable, formal, and machine-understandable conceptualization. In many fields, such as video surveillance [2] and bioinformatics [1], where the knowledge domain tends to change over time, the ontology evolution process needs to be managed. Since the ontology reflects the domain it describes, changes in the domain unavoidably affect the ontology dynamics. Changes in the domain imply changes to the meaning of concepts, which are generally referred to as semantic drifts [5, 8]. These changes affect the representation of concepts, as well as the relations among them across consecutive ontology versions. Automatic tools for semantic drift assessment are needed to help experts deal with the demanding, expensive and time-consuming task of ontology management. Semantic drift has been widely explored in linguistics [7], [4], but these methods focus on text rather than on changes expressed in Semantic Web formalisms. Some works [3], [8] explored the semantic drift among ontology versions by considering changes in both the structure and the content of the ontology. In [3], drift assessment is achieved by clustering the ontology population; other solutions [5] introduce linguistics-based methods to detect changes in the textual concept description, or exploit well-known models, such as the vector space model, to detect changes in concept features [10]. To assess the semantic drift between two ontology versions, two approaches are generally used: the morphing-chain and the identity-based one [8]. The former compares each concept A_i in the first ontology version O_i to each concept B_j in the second version O_j. The latter assumes that the concept identity is known, i.e., each concept A_i in the ontology version O_i is known to correspond (or match) to a unique concept B_j in O_j.
Both methods have advantages and drawbacks: the morphing-chain approach performs poorly and is unsuited for large ontologies [9]; the identity-based approach achieves better performance, but it does not consider unmatched concepts across versions when assessing the drift between two ontology versions [9]. Beyond the choice of approach, drift evaluation depends on which concept aspects are considered. The morphing-chain framework in [8] assesses the drift among ontology versions on a concept notion that includes three aspects: label, intension and extension. This notion does not take into account many other concept aspects, such as the concept URI and the taxonomic relations. Our approach, instead, introduces a new notion of concept that takes all these aspects into account; it extends the identity-based method with additional measures to provide a more refined assessment of the semantic drift at the level of single concepts and of entire ontology versions.

The remainder of the paper is organized as follows: Section 2 focuses on the concept definition and the semantic drift measures. Section 3 shows the potential of the approach in the drift assessment. Section 4 highlights the benefits of the approach through comparisons with the reference framework. The last section is devoted to the conclusions.

2 Semantic Drift Assessment: notions and measures

OntoDrift has been designed to evaluate the semantic drift at the concept and ontology levels. The approach defines the ontology Concept in terms of multiple aspects related to the class name (e.g., labels), intensional and extensional aspects (i.e., properties and instances), Concept identity (e.g., URI) and structural relations, i.e., taxonomic relations with other concepts (e.g., equivalent classes, subclasses, superclasses). This notion of Concept involves many more distinct aspects, concerning both the Concept features and its relations, compared with other approaches, which exclusively consider the label and the intensional and extensional aspects [8]. OntoDrift introduces new measures to assess the semantic drift of a concept, considering its multiple aspects, and between two ontology versions, enhancing the identity-based approaches so that the concepts added and removed throughout the ontology update are also taken into account.

2.1 The Concept and its Aspects

A concept is defined as an ontology class that can have properties and relationships with other concepts. A generic Concept is shown in Figure 1 along with the aspects used for assessing the ontology drift: the aspects inherited from the reference approach [8] are in cyan, the extended ones are in yellow, and the newly defined aspects are in red.

Fig. 1: A Concept schema with its aspects.

Formally, let O^t be the ontology (version) updated at date t; each concept is related to an object (another concept, a literal, etc.) according to the ⟨subject, predicate, object⟩ triple relation. Let us define the ontology version as O^t = ⟨C^t, R^t, I^t, T^t, V^t⟩, where C^t is the set of classes (concepts), R^t is the set of relations, I^t is the set of individuals, T^t is the set of data types, and V^t is the set of values; the aspects of a Concept c^t ∈ C^t are defined as follows.

– URI aspect. A URI is a string that uniquely identifies a Concept c^t ∈ C^t. Formally:

    c^t_{uri} = u                                                          (1)

where u is the subject (i.e., a URI) of the triple ⟨u, rdf:type, owl:Class⟩^t.

– Labels aspect. The set of the labels used to refer to a specific Concept c^t, possibly in different languages (when the ontology is multilingual).
Each item of this set is taken from the objects of the Concept triples having rdfs:label as predicate. Let us define the labels aspect as:

    c^t_{labl} = { l | ⟨c^t, rdfs:label, l⟩^t }                            (2)

where l is a text string representing a label.

– Intensional aspect. The set of properties linked to the Concept by triples having rdfs:domain or rdfs:range as predicate. More formally, let p be a generic property (p ∈ R^t); the intensional aspect of the Concept c^t is defined as:

    c^t_{ints} = c^t_d ∪ c^t_r                                             (3)

where c^t_d and c^t_r are the sets of properties having c^t as domain and as range, respectively:

    c^t_d = { p | ⟨p, rdfs:domain, c^t⟩^t }                                (4)
    c^t_r = { p | ⟨p, rdfs:range, c^t⟩^t }                                 (5)

– Subclasses aspect. The set of URIs identifying the Concepts that are explicit subclasses of the Concept c^t. It is built by taking the subjects of the triples having rdfs:subClassOf as predicate and the analyzed Concept c^t as object:

    c^t_{sub} = { s | ⟨s, rdfs:subClassOf, c^t⟩^t }                        (6)

where s ∈ C^t is a triple subject (i.e., a class identified by a URI).

– Superclasses aspect. The set of URIs identifying the ancestor Concepts of the analyzed Concept c^t, i.e., the parent Concepts asserted through rdfs:subClassOf:

    c^t_{sup} = { s | ⟨c^t, rdfs:subClassOf, s⟩^t }                        (7)

where s ∈ C^t is a triple object representing a URI.

– Equivalent classes aspect. The set of URIs identifying all the Concepts equivalent to the Concept c^t, i.e., all the objects of the triples whose predicate is owl:equivalentClass and whose subject is c^t:

    c^t_{eq} = { e | ⟨c^t, owl:equivalentClass, e⟩^t }                     (8)

where e ∈ C^t is a class (concept) identified by a URI.

– Extensional aspect. The set of URIs identifying all the individuals of the Concept c^t. Each individual is the subject of a triple linked to c^t by the property rdf:type:

    c^t_{ext} = { x | ⟨x, rdf:type, c^t⟩^t }                               (9)

where x ∈ I^t is a URI identifying an individual.

According to the aspects defined above, a concept c^t ∈ C^t of the ontology version O^t can be described as:

    c^t = ⟨c^t_{uri}, c^t_{labl}, c^t_{ints}, c^t_{sub}, c^t_{sup}, c^t_{eq}, c^t_{ext}⟩    (10)
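For illustration only, the following sketch shows how the aspect sets of Equations (1)-(9) could be collected with the rdflib Python library; it is not the OntoDrift implementation, and the helper names (concept_aspects, ontology_concepts) and the parsing details are our own assumptions. The domain and range property sets of Equations (4)-(5) are kept separate because the similarity measures of Section 2.2 compare them separately.

    # Illustrative sketch (not the OntoDrift code): collect the aspect sets of
    # Eqs. (1)-(9) for every owl:Class in one ontology version, using rdflib.
    from rdflib import Graph, RDF, RDFS, OWL

    def concept_aspects(g, c):
        """Aspect sets of the class URI c in graph g (Eqs. (1)-(9))."""
        return {
            "uri":  {str(c)},                                             # Eq. (1)
            "labl": {str(l) for l in g.objects(c, RDFS.label)},           # Eq. (2)
            "d":    {str(p) for p in g.subjects(RDFS.domain, c)},         # Eq. (4)
            "r":    {str(p) for p in g.subjects(RDFS.range, c)},          # Eq. (5)
            "sub":  {str(s) for s in g.subjects(RDFS.subClassOf, c)},     # Eq. (6)
            "sup":  {str(s) for s in g.objects(c, RDFS.subClassOf)},      # Eq. (7)
            "eq":   {str(e) for e in g.objects(c, OWL.equivalentClass)},  # Eq. (8)
            "ext":  {str(x) for x in g.subjects(RDF.type, c)},            # Eq. (9)
        }

    def ontology_concepts(path):
        """Load one ontology version and index its concepts by URI."""
        g = Graph()
        g.parse(path, format="xml")  # assuming an RDF/XML serialization
        return {str(c): concept_aspects(g, c)
                for c in g.subjects(RDF.type, OWL.Class)}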
2.2 Semantic drift assessment at concept level

The semantic drift among ontology versions is assessed by considering the drift on pairs of Concepts, where the concepts in a pair belong to distinct ontology versions. OntoDrift introduces similarity measures to assess the drift between two concepts. Given two Concepts A and B, belonging to two ontology versions O^t and O^{t'} (A ∈ C^t and B ∈ C^{t'}), the similarity measure on each of the aspects introduced in Section 2.1 is defined as follows.

– Similarity on the URI aspect. The similarity on the URI aspect of two Concepts consists of checking whether or not the two Concepts have the same identifier, i.e., whether they describe the same resource. Recall that each URI in an ontology uniquely identifies a resource, which can be a Concept, a relation, an individual, a datatype, etc. Let us assume that if two concepts from different ontology versions have the same URI, they are identical. For this reason, the similarity on the URI aspect is 1 when the URIs coincide and 0 otherwise. Let A and B be two Concepts; the similarity on the URI aspect is defined as:

    sim_{uri}(A_{uri}, B_{uri}) = 1 if A_{uri} = B_{uri}, 0 otherwise       (11)

where A_{uri} and B_{uri} are the URI aspects of the Concepts A and B, respectively.

– Similarity on the labels, subclasses, superclasses, equivalent classes and extensional aspects. These aspects are name sets and are compared through the Jaccard index [6], which evaluates the drift by counting how many elements (names) the two concepts have in common, relative to all the elements of that aspect. For each aspect in Equations (2), (6)-(9), the measure considers the set of elements (precisely, the element names) that describe that aspect. For example, if the aspect is the label (Equation (2)), the sets of label names associated with the two concepts A and B are compared. The same evaluation applies to the other aspects: in general, for an aspect a among the possible aspect names {labl, sub, sup, eq, ext}, the similarity value is defined as:

    sim_a(A_a, B_a) = |A_a ∩ B_a| / |A_a ∪ B_a|                            (12)

where A_a and B_a are the name sets of the concepts A and B, respectively, on the aspect named a. The sim_a values lie in the range [0, 1], where 0 means no overlap between the two sets and 1 means that the two sets are equal (same set of names). The higher the value, the more similar the Concepts A and B are on the aspect a.

– Similarity on the intensional aspect. Since the intensional aspect involves triples whose predicate is either rdfs:domain or rdfs:range, the concepts are compared on the sets of their domain and range properties, respectively. If A and B play the role of domain in the triple ⟨p, rdfs:domain, c⟩ (i.e., c = A or c = B), the similarity sim_d is evaluated on the sets of domain properties of the two concepts (see Equation (4)) by using the Jaccard index (Equation (12)). Similarly, the similarity sim_r between the two Concepts A and B on the sets of range properties (see Equation (5)) is given by Equation (12). The similarity between the two Concepts A and B on the intensional aspect is then calculated as the weighted mean of sim_d and sim_r.

– All-aspects similarity between two concepts. The overall similarity asim between two Concepts A and B from two ontology versions is computed by combining the similarities assessed on the individual aspects, weighted by the size of the corresponding aspect sets:

    asim(A, B) = [ Σ_{a∈Γ} sim_a(A_a, B_a) · (|A_a| + |B_a|) ] / [ Σ_{a∈Γ} (|A_a| + |B_a|) ]    (13)

where A_a and B_a are the name sets of the concepts A and B, respectively, on the aspect a ∈ Γ, and Γ is the set of all the aspect names defined in Equations (1)-(9), i.e., Γ = {uri, labl, sub, sup, eq, ext, d, r}.

If the asim value is 1, the two concepts are equal; otherwise, a value in [0, 1) describes the degree of similarity between the concepts. The measure asim can be used to analyze the drift of a concept as it changes over time, through a concept chain assembled across succeeding ontology versions. More formally, given O^{t_1}, O^{t_2}, ..., O^{t_n}, the n successive versions of the ontology O, the similarity between two Concepts A^{t_i} and B^{t_{i+1}}, selected from the two successive ontology versions O^{t_i} and O^{t_{i+1}}, is assessed according to Equation (13).
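To make the aggregation in Equations (11)-(13) concrete, the following sketch computes asim over the aspect dictionaries built in the previous listing. It is only an illustrative reading of the measures: the handling of two empty aspect sets is our own convention (with the size-based weights of Equation (13) such an aspect receives weight zero, so the choice does not affect the result), and the URI similarity of Equation (11) is obtained as the Jaccard index of the two singleton URI sets.

    # Illustrative sketch of the concept-level measures of Section 2.2,
    # operating on the aspect dictionaries returned by concept_aspects().
    ASPECTS = ["uri", "labl", "d", "r", "sub", "sup", "eq", "ext"]  # the set Γ

    def jaccard(a, b):
        """Jaccard index of Eq. (12); two empty sets are treated as equal."""
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def asim(A, B):
        """All-aspects similarity of Eq. (13), weighted by aspect-set sizes.
        For the URI aspect, the Jaccard index of two singleton sets reduces
        to the 0/1 comparison of Eq. (11)."""
        num = den = 0.0
        for a in ASPECTS:
            w = len(A[a]) + len(B[a])
            num += jaccard(A[a], B[a]) * w
            den += w
        return num / den if den else 1.0

Under this reading, the intensional similarity is obtained implicitly: the domain and range sets enter the sum as two separate aspects (d and r), which amounts to a weighted mean of sim_d and sim_r with weights proportional to the respective set sizes, consistently with Γ in Equation (13).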
2.3 Semantic drift assessment at ontology version level

To determine how the ontology evolves and how its semantics changes across versions, the semantic drift is also evaluated at the level of entire ontology versions. Comparing two ontology versions O^{t_i} = ⟨C^{t_i}, R^{t_i}, I^{t_i}, T^{t_i}, V^{t_i}⟩ and O^{t_j} = ⟨C^{t_j}, R^{t_j}, I^{t_j}, T^{t_j}, V^{t_j}⟩ means finding correspondences among the ontology concepts: for a concept A^{t_i} ∈ C^{t_i} in the ontology O^{t_i}, there must be a concept B^{t_j} ∈ C^{t_j} in O^{t_j} such that the two concepts can be considered equivalent. In the Semantic Web, a resource is unequivocally identified by a URI (Uniform Resource Identifier), i.e., each resource has its own URI, different from that of any other resource. Starting from this assumption, two concepts A^{t_i} and B^{t_j}, belonging to two different ontology versions, are considered equal if they have the same URI (Equation (11)). The concepts whose URIs are unchanged across the versions are considered in common between the versions and are represented by the intersection set C^{t_i} ∩ C^{t_j}; all the concepts present in the two ontologies are represented by the union set C^{t_i} ∪ C^{t_j}. The semantic drift between the two ontology versions O^{t_i} and O^{t_j} is then derived from the overall similarity (osim) over the concepts of the two ontologies that share the same URI. The osim measure is defined as:

    osim(O^{t_i}, O^{t_j}) = [ Σ_{A^{t_i} ∈ C^{t_i}, B^{t_j} ∈ C^{t_j}, A^{t_i}_{uri} = B^{t_j}_{uri}} asim(A^{t_i}, B^{t_j}) / |C^{t_i} ∩ C^{t_j}| ] · K    (14)

where A^{t_i}_{uri} and B^{t_j}_{uri} are the URI aspects of the Concepts A^{t_i} and B^{t_j}, respectively; asim is the all-aspects similarity between two concepts (Equation (13)); and K = |C^{t_i} ∩ C^{t_j}| / |C^{t_i} ∪ C^{t_j}| is the ratio between the number of concepts in common between the two ontologies and the number of all the distinct concepts in the two ontologies. Let us notice that K provides an important contribution to the similarity calculation, because it allows considering not just the concepts in common between the two ontology versions (C^{t_i} ∩ C^{t_j}), but also the remaining ones in C^{t_i} ∪ C^{t_j}, i.e., the concepts added or removed during the ontology evolution. This way, the higher the number of concepts added or removed between the versions, the higher the semantic drift between the ontology versions.
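A minimal sketch of Equation (14), again under the assumptions of the previous listings (concepts indexed by their URI strings), may read as follows; the behaviour when the two versions share no concept is our own convention.

    # Illustrative sketch of Eq. (14): mean asim over the concepts shared by
    # URI, scaled by K = |common| / |union|.
    def osim(concepts_i, concepts_j):
        common = concepts_i.keys() & concepts_j.keys()
        union = concepts_i.keys() | concepts_j.keys()
        if not common:
            return 0.0  # assumption: no shared URIs means complete drift
        mean_sim = sum(asim(concepts_i[u], concepts_j[u])
                       for u in common) / len(common)
        return mean_sim * (len(common) / len(union))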
3 A Case Study

This section shows the benefits of the OntoDrift methods and measures through a case study. Five consecutive DBpedia versions have been selected: DBpedia 3.7, DBpedia 3.8, DBpedia 3.9, DBpedia 2015-04 and DBpedia 2015-10 (the ontology versions are available at https://wiki.dbpedia.org/develop/datasets). The semantic drift of the concept Sport across the DBpedia versions is shown in Figure 2 as a chain connecting the concepts Sport of the different ontology versions, with labels reporting the similarity values calculated on the concept pairs.

Fig. 2: The similarity asim on the concept Sport (marked in red) across consecutive DBpedia versions. The other concepts are the most similar to Sport.

The chain reveals which version pairs have the highest drift (e.g., DBpedia 3.8 and DBpedia 3.9, with asim = 0.61) or the lowest one (e.g., DBpedia 2015-04 and DBpedia 2015-10, with asim = 0.96). This concept-by-concept view allows analyzing how the concept evolves through consecutive versions of the ontology and provides the corresponding semantic drift values. The other concepts shown in the figure are the ones most similar to Sport, after Sport itself.

The semantic drift on pairs of DBpedia versions is assessed by applying the overall similarity measure (osim, see Equation (14)). Figure 3 presents a comparison between the two versions DBpedia 3.7 and DBpedia 2015-04.

Fig. 3: The semantic drift between two DBpedia versions.

The Venn diagram depicts three sets: the set of concepts in DBpedia 3.7, the set of concepts in DBpedia 2015-04 and the intersection set (i.e., the concepts in both versions). The identity-based solutions for semantic drift evaluate the similarity only on the concepts in the intersection [8]. Our similarity measure osim, instead, includes the factor K (see Equation (14)) so that the semantic drift between the versions also accounts for the concepts that are not in both versions. In fact, the drift between versions DBpedia 3.7 and DBpedia 2015-04 is around 34% (osim = 0.66) without K, and around 78% (osim = 0.22) with K. Thanks to K, the drift assessed by OntoDrift is more accurate, since it considers the concepts added and removed across versions (in our case study, DBpedia 2015-04 contains many more new concepts than DBpedia 3.7).

4 Approach Evaluation

This section presents a comparison between OntoDrift and the framework presented in [8], called Semadrift. A two-step comparison is carried out: the first step demonstrates how much OntoDrift improves the drift assessment on a single concept, whereas the second shows the effectiveness of our drift measures on entire ontology versions. The selected ontologies are Tate [8] and OWL-S Profile (https://www.w3.org/Submission/OWL-S/), which respectively describe the cataloging of artworks and the services offered by service providers.

Fig. 4: OntoDrift (OD) vs. Semadrift (SD): similarity evaluation between the Concept Equipment of the ontology versions Tate 2004 and Tate 2006.

The drift evaluated on a single concept is shown for the Concept Equipment, on the Tate versions Tate 2004 and Tate 2006, as reported in Figure 4a. The similarity is calculated on each concept aspect from the two ontology versions by using OntoDrift (OD) and Semadrift (SD). The two approaches are compared on each concept aspect in common (in yellow) and extended (in cyan); similarity is also provided on the newly introduced aspects (in red). The labels aspect does not change, so the two approaches report the same similarity on this aspect (1.0). No drift is found on the intensional and extensional aspects, which are defined in the same way in both approaches. The similarities evaluated on the newly introduced aspects, such as superclasses (sim_sup = 0.33), subclasses (sim_sub = 0.22) and equivalent classes (sim_eq = 0), highlight some changes in the ontology. Thanks to these aspects, OntoDrift reveals a semantic drift on Equipment across the two versions (asim = 0.43, Equation (13)), whereas Semadrift considers the concept unchanged (whole similarity = 1, cf. [8]), as displayed in Figure 4b. The OntoDrift similarity measures can better detect extensions or upgrades in the knowledge modeling by considering the concept identifier and the taxonomic relations (e.g., subclass, superclass, equivalent class).

The assessment of the semantic drift at ontology level is shown in Table 1, where the similarity between two ontology versions is calculated by OntoDrift through the osim measure (Equation (14)) and by Semadrift through the whole similarity measure. Let us notice that the OntoDrift similarity measure yields a more sensitive evaluation of the semantic drift on the entire versions. In fact, osim considers more aspects than the Semadrift whole similarity, including labels and taxonomic relations.
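Pairwise comparisons such as those reported in Table 1 can be scripted on top of the earlier sketches; the following hypothetical driver (with placeholder file names standing for the downloaded ontology versions) simply computes osim for every version pair.

    # Hypothetical driver producing a Table 1-style report; file names are
    # placeholders for the locally downloaded ontology versions.
    versions = ["tate_2003.owl", "tate_2004.owl", "tate_2006.owl"]
    cached = {v: ontology_concepts(v) for v in versions}
    for i, vi in enumerate(versions):
        for vj in versions[i + 1:]:
            print(f"{vi} - {vj}: osim = {osim(cached[vi], cached[vj]):.2f}")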
OntoDrift shows lower similarity values than Semadrift between consecutive versions of OWL-S Profile, due to the many concept taxonomic relations (i.e., some concepts are extended with subclasses, superclasses and equivalent classes) that OntoDrift evaluates. In the Tate ontology, many concepts are added over time, some changes are applied to single concepts and only small changes affect the relations. Since OntoDrift is quite sensitive to concept changes and extensions, it returns more refined assessments on all the versions. For instance, for the versions from Tate 2004 to Tate 2013, Semadrift assesses a stable drift (i.e., similarity in the range [0.22, 0.25]), while OntoDrift assesses more variable drifts (i.e., similarity in the range [0.49, 1.00]). Additionally, OntoDrift improves the identity-based approach, which considers only matching concepts across ontology versions, by evaluating the drift also on the unmatched concepts across ontology versions (see Equation (14)).

Table 1: Semantic drift evaluation at ontology level

Compared ontology versions               OntoDrift  Semadrift
OWL-S Profile 1.0 - OWL-S Profile 1.1    0.26       0.65
OWL-S Profile 1.0 - OWL-S Profile 1.2    0.26       0.65
OWL-S Profile 1.1 - OWL-S Profile 1.2    0.49       0.66
Tate 2003 - Tate 2004                    0.99       0.29
Tate 2003 - Tate 2006                    0.64       0.27
Tate 2003 - Tate 2007                    0.56       0.24
Tate 2003 - Tate 2011                    0.49       0.23
Tate 2003 - Tate 2012                    0.49       0.23
Tate 2003 - Tate 2013                    0.49       0.23
Tate 2004 - Tate 2006                    0.64       0.26
Tate 2004 - Tate 2007                    0.56       0.23
Tate 2004 - Tate 2011                    0.49       0.22
Tate 2004 - Tate 2012                    0.49       0.23
Tate 2004 - Tate 2013                    0.49       0.23
Tate 2006 - Tate 2007                    0.59       0.23
Tate 2006 - Tate 2011                    0.53       0.23
Tate 2006 - Tate 2012                    0.53       0.23
Tate 2006 - Tate 2013                    0.53       0.23
Tate 2007 - Tate 2011                    0.88       0.24
Tate 2007 - Tate 2012                    0.88       0.24
Tate 2007 - Tate 2013                    0.88       0.24
Tate 2011 - Tate 2012                    1.00       0.24
Tate 2011 - Tate 2013                    1.00       0.24
Tate 2012 - Tate 2013                    1.00       0.25

5 Conclusion

The paper presented OntoDrift, an approach to assess the semantic drift on Concepts across different ontology versions. The approach provides a novel definition of Concept, which includes a wide set of related features, called aspects. Similarity measures are defined to assess the semantic drift between concepts and between ontology versions by considering the multiple-aspect concept definition. The benefits of the approach are various: first of all, the semantic drift assessment is more accurate, because it is evaluated on multiple aspects, including not only concept labels, intension and extension, but also the URIs and the taxonomic relations. The method can be used to assess the drift among ontology versions and knowledge graphs (e.g., DBpedia), thanks to the identity-based approach design. Additionally, the identity-based approach is extended to consider not only the concepts in common among ontology versions, but also those added and removed during the ontology evolution, providing more refined drift assessments.

References

1. Burek, P., Scherf, N., Herre, H.: A pattern-based approach to a cell tracking ontology. Procedia Computer Science 159, 784-793 (2019). Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 23rd International Conference KES2019
2. Cavaliere, D., Loia, V., Senatore, S.: Towards an ontology design pattern for UAV video content analysis. IEEE Access 7, 105342-105353 (2019)
3. Fanizzi, N., d'Amato, C., Esposito, F.: Conceptual clustering and its application to concept drift and novelty detection.
In: Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications (ESWC'08), pp. 318-332. Springer-Verlag, Berlin, Heidelberg (2008)
4. Frermann, L., Lapata, M.: A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics 4, 31-45 (2016)
5. Gulla, J., Solskinnsbakk, G., Myrseth, P., Haderlein, V., Cerrato, O.: Semantic drift in ontologies. In: WEBIST 2010, vol. 2, pp. 13-20 (2010)
6. Hamers, L., Hemeryck, Y., Herweyers, G., Janssen, M., Keters, H., Rousseau, R., Vanhoutte, A.: Similarity measures in scientometric research: the Jaccard index versus Salton's cosine formula. Information Processing & Management 25(3), 315-318 (1989)
7. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), pp. 1489-1501 (2016)
8. Stavropoulos, T., Andreadis, S., Kontopoulos, E., Kompatsiaris, I.: Semadrift: a hybrid method and visual tools to measure semantic drift in ontologies. Journal of Web Semantics 54, 87-106 (2019)
9. Wang, S., Schlobach, S., Klein, M.: Concept drift and how to identify it. Journal of Web Semantics 9(3), 247-265 (2011)
10. Wittek, P., Daranyi, S., Kontopoulos, E., Moysiadis, T., Kompatsiaris, I.: Monitoring term drift based on semantic consistency in an evolving vector field (2015)