<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-domain Semantic Drift Measurement in Ontologies Using the SemaDrift Tool and Metrics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thanos G. Stavropoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Efstratios Kontopoulos</string-name>
          <email>skontopo@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albert Meroño Peñuela</string-name>
          <email>albert.merono@vu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stavros Tachos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stelios Andreadis</string-name>
          <email>andreadisst@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioannis Kompatsiaris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Technologies Institute</institution>
          ,
          <addr-line>Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Detecting and measuring semantic drift across different versions of ontologies over time is a novel area of research that is rapidly gaining attention. Nevertheless, only a few relevant practical methods and tools exist, and even fewer are flexible enough to be applied efficiently to multiple domains. As the often domain-specific nature of ontologies may render methods and tools for measuring semantic drift ineffective, this paper presents the application and findings of the SemaDrift suite of methods and tools in several domains, illustrating novel insights for the first time. SemaDrift was developed in the context of the PERICLES FP7 project, aimed at Digital Preservation; its domain-independent text and structural similarity measures, available both as a software library and as a Protégé plugin for end-users, are now applied to the Dutch Historical Census and the BBC Sport Ontology. The two different domains demonstrate its applicability and its ability to pinpoint the location, nature, origins and destinations of concept drift.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic drift</kwd>
        <kwd>concept drift</kwd>
        <kwd>semantic change</kwd>
        <kwd>ontologies</kwd>
        <kwd>Protégé</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As the world continuously changes, concepts and their underlying semantics also
change over time. In digital environments that are consequently also subject to
continual change, ensuring that digital content remains understandable – and from this
perspective accessible and reusable – poses a formidable challenge. The evolution of
semantics is an active area of research, especially challenged by the lack of universal
metrics to address specificities and peculiarities pertinent to each domain.</p>
      <p>
        Evolving semantics, also referred to as semantic change, observes and measures
the phenomenon of change in the meaning of concepts within knowledge
representation models, along with their potential replacement by other meanings over time. In
the Semantic Web [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], knowledge is typically represented
by ontologies. Consequently, semantic change can
have drastic consequences on the use of ontologies in Semantic Web and Linked Data
applications. In this setting, semantic change, i.e. the structural difference of the same
concept in two ontologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], relates to various lines of research. Such examples are
concept and topic shift [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], concept change [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], semantic decay [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], ontology
versioning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and evolution [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A brief disambiguation of these terms can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Semantic drift can be defined as the phenomenon of ontology concepts gradually
changing as our knowledge of the world evolves, obtaining possibly different
meanings, as interpreted by various user communities or in a different context, risking their
rhetorical, descriptive and applicative power [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Concept drift can refer to this
language-related phenomenon, but also to abrupt parameter value changes in data mining
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        This paper presents findings in two vastly different domains, obtained by applying a
novel set of universal, domain-agnostic semantic drift metrics
using the SemaDrift suite of tools and metrics. The metrics, initially presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
are embedded in software tools that offer the means for domain experts to
assess drift without programming knowledge. Namely, the SemaDrift plugin for the
Protégé platform<sup>1</sup> aims at assisting a wider audience to monitor and manage concept
drift and was developed in the context of the PERICLES FP7 project<sup>2</sup>, integrating and
extending existing studies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and previously developed open, reusable methods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The domains studied in this paper are (a) the CEDAR dataset containing historical
Dutch census data, and (b) the BBC Sport Ontology for representing competitive
sports events. In the historical census domain, the metrics help pinpoint historical
occupation qualities for the population between 1869 and 1930, almost on the fly.
Most importantly, using the same tool we move on to the BBC Sport Ontology where
the metrics pinpoint the location, nature, origins and destinations of concept drift,
across six versions. The ontologies used in this work and the SemaDrift outputs are
publicly available at https://github.com/skontopo/MEPDaW2017.</p>
      <p>The rest of the paper is structured as follows: Section 2 presents related work in
metrics and tools for measuring drift. Section 3 presents the underlying metrics and
the SemaDrift framework. Sections 4 and 5 present the two proof-of-concept
scenarios and report on our findings. Conclusions and directions for future work are listed in
the final section.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Measures of semantic richness of Linked Data concepts have been investigated in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
showing that increasing reuse of a concept decreases its semantic richness. Other
studies have examined change detection between two ontologies at a structural or content
level [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while ontology mapping investigates the relationships and correspondences
between two ontologies [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Concept drift has been measured either by clustering
while populating ontologies [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or by applying linguistic techniques on textual
concept descriptions [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. A vector space model by random indexing has been utilized to
track changes in an evolving text collection [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and to visualize the drift of
vocabularies in a diachronic sample of the Linked Open Data cloud [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. A strategy to represent
change has been based on ontology evolution [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, most of these techniques
are not directly applicable to Semantic Web constructs or present limited statistical
data.
        <fn id="fn1"><label>1</label><p>The Protégé Ontology Editor: http://protege.stanford.edu</p></fn>
        <fn id="fn2"><label>2</label><p>PERICLES FP7 project: www.pericles-project.eu</p></fn>
      </p>
      <p>
        An appealing solution we have adopted transfers the notions of label, extension
and intension from machine learning concept drift to semantic drift, further defining
them in ontology terms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A key reason for choosing the specific approach lies in
the fact that the authors of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] consider both linguistic aspects and the structure
of the ontology itself, which makes it, to the best of our knowledge, the most complete
methodology available.
      </p>
      <p>
        Much philosophical debate examines how, and by which properties, a concept can
be identified across time, and how this can be appropriately formalized [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Some have utilized the
notions of perdurance and endurance [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], so as to seek identity, by defining rigid
properties that have to be persistent across instances and, thus, can identify entities
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In this work, we adopt, implement and integrate the methods in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] into a
familiar application for knowledge engineers, targeting not only the lack of reproducible
cross-domain metrics for semantic drift but also the lack of similar graphical user
interfaces.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Semantic Drift Metrics and the SemaDrift Platform</title>
      <sec id="sec-3-1">
        <title>Semantic Drift Metrics</title>
        <p>
          The drift metrics considered here implement and extend previous work in the field of
concept drift ([
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]), where highly applicable notions and metrics for
measuring concept drift in the context of data mining have successfully been transferred to
semantic drift. The method to measure concept drift in semantics considers two basic
factors: (a) the different aspects of change and (b) whether concept identity is known
or not. The aspects of change can be:
          <list list-type="bullet">
            <list-item><p>Label, which refers to the description of a concept, via its name or title;</p></list-item>
            <list-item><p>Intension, which refers to the characteristics implied by it, via its properties;</p></list-item>
            <list-item><p>Extension, which refers to the set of things it extends to, via its number of
instances.</p></list-item>
          </list>
        </p>
        <p>Meanwhile, the correspondence of a concept across versions can be either known or
unknown, resulting in two different approaches for measuring change:
          <list list-type="bullet">
            <list-item><p>Identity-based approach (i.e. known concept identity): The extent of shift
or stability of a concept’s meaning is assessed under the assumption that its
identity is known across ontologies. For instance, considering an ontology A and its
evolution, ontology B, each concept of A is known to correspond to a single,
known concept of B.</p></list-item>
            <list-item><p>Morphing-based approach (i.e. unknown concept identity): Each concept
pertains to just a single moment in time (ontology), while its identity is unknown
across versions (ontologies), as it constantly evolves/morphs into new, even highly
similar, concepts. Therefore, its change has to be measured in comparison to every
concept of an evolved ontology.</p></list-item>
          </list>
        </p>
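        <p>
          The currently proposed method adopts the more general morphing-based approach
and considers drift as the dissimilarity of two maximally similar concepts in two
versions [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Although several methods have been proposed to seek identity
correspondence across versions [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], they can still be domain- or model-dependent, requiring
ad-hoc expert knowledge in the form of annotations, user input or explicit
identities. In order to measure change, the meaning of each concept at a given point in
time is defined as a set of the three different aspects, as follows:
          <disp-formula><tex-math><![CDATA[C_t = \langle \mathrm{label}_t(C),\ \mathrm{int}_t(C),\ \mathrm{ext}_t(C) \rangle]]></tex-math></disp-formula>
where <inline-formula><tex-math>C_t</tex-math></inline-formula> denotes the meaning of concept <inline-formula><tex-math>C</tex-math></inline-formula> at point <inline-formula><tex-math>t</tex-math></inline-formula>. Each of its aspects, <inline-formula><tex-math>\mathrm{label}_t(C)</tex-math></inline-formula>,
<inline-formula><tex-math>\mathrm{int}_t(C)</tex-math></inline-formula> for intensional and <inline-formula><tex-math>\mathrm{ext}_t(C)</tex-math></inline-formula> for extensional, is measured as follows:
          <disp-formula><tex-math><![CDATA[\mathrm{label}_t(C) = \{\, l \mid \langle C, \texttt{rdfs:label}, l \rangle \in T_t \,\}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\mathrm{int}_t(C) = \{\, \langle s, p, o \rangle \in T_t \mid (s = C \lor o = C),\ p \text{ is an OWL object or datatype property} \,\}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\mathrm{ext}_t(C) = \{\, i \mid \langle i, \texttt{rdf:type}, C \rangle \in T_t \,\}]]></tex-math></disp-formula>
where <inline-formula><tex-math>T_t</tex-math></inline-formula> is the set of all triples in version <inline-formula><tex-math>t</tex-math></inline-formula> of the ontology. More concretely:
          <list list-type="bullet">
            <list-item><p>label is the rdfs:label of a concept (a string);</p></list-item>
            <list-item><p>intension is a set of triples, i.e. the properties that involve the concept, calculated
as the union of all RDF triples with <inline-formula><tex-math>C</tex-math></inline-formula> in the subject or object position of OWL
object or datatype properties;</p></list-item>
            <list-item><p>extension is a set of strings, i.e. the names of instances with the concept as value
of rdf:type.</p></list-item>
          </list>
        </p>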
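        <p>To make the three aspects concrete, the following Python sketch extracts them from a small set of triples. It is an illustration under stated assumptions rather than the SemaDrift implementation: the triples, the <monospace>ex:</monospace> names and the helper functions are hypothetical, and property usage is simplified to plain triples that have the concept as subject or object, whereas an OWL ontology would typically express it through property domains and ranges.</p>
        <preformat preformat-type="code"><![CDATA[
```python
RDFS_LABEL = "rdfs:label"
RDF_TYPE = "rdf:type"


def label(concept, triples):
    """Labels attached to the concept via rdfs:label."""
    return {o for s, p, o in triples if s == concept and p == RDFS_LABEL}


def intension(concept, triples, properties):
    """Triples over the given properties with the concept as subject or object."""
    return {
        (s, p, o)
        for s, p, o in triples
        if p in properties and (s == concept or o == concept)
    }


def extension(concept, triples):
    """Names of the instances typed with the concept via rdf:type."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == concept}


# Hypothetical mini-ontology; every identifier below is illustrative only.
triples = {
    ("ex:Team", RDFS_LABEL, "Team"),
    ("ex:Team", "ex:competesIn", "ex:Competition"),
    ("ex:Player", "ex:memberOf", "ex:Team"),
    ("ex:teamA", RDF_TYPE, "ex:Team"),
    ("ex:teamB", RDF_TYPE, "ex:Team"),
}
# Assumed set of OWL object/datatype properties declared in the ontology.
properties = {"ex:competesIn", "ex:memberOf"}

print(label("ex:Team", triples))
print(sorted(intension("ex:Team", triples, properties)))
print(sorted(extension("ex:Team", triples)))
```
]]></preformat>
        <p>Keeping each aspect as a plain string or set means that any string or set similarity measure can be applied to it without further conversion.</p>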
        <p>
          Due to the morphing-based approach, each concept’s drift is measured as the average
drift to all concepts of the next ontology. Comparisons for strings are made using the
Monge-Elkan algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], found to optimally suit strings in ontologies such as
CamelCase or snake_case, and Jaccard similarity for sets.
        </p>
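        <p>The two comparison functions can be sketched in Python as follows. This is a minimal illustration, not the SemaDrift code: the tokenizer, the use of <monospace>difflib.SequenceMatcher</monospace> as the inner token similarity of Monge-Elkan, and the convention that two empty sets are fully similar are all assumptions made for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Split CamelCase or snake_case identifiers into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def inner_sim(a, b):
    """Assumed inner token similarity (a stand-in for e.g. Jaro-Winkler)."""
    return SequenceMatcher(None, a, b).ratio()


def monge_elkan(a, b):
    """Average, over the tokens of a, of the best match among the tokens of b.

    Note that the measure is asymmetric: monge_elkan(a, b) may differ
    from monge_elkan(b, a).
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 1.0 if ta == tb else 0.0
    return sum(max(inner_sim(x, y) for y in tb) for x in ta) / len(ta)


def jaccard(x, y):
    """Size of the intersection divided by the size of the union."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumption: two empty sets count as identical
    return len(x & y) / len(x | y)


print(monge_elkan("CompetitionType", "competition_type"))
print(monge_elkan("RoundType", "CompetitionType"))
print(jaccard({"final", "semi-final"}, {"final", "quarter-final"}))
```
]]></preformat>
        <p>Tokenizing before comparison is what lets differently cased identifiers with the same words be recognized as equal, which is the property the text attributes to Monge-Elkan for ontology names.</p>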
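        <p>In detail, if <inline-formula><tex-math>N_2</tex-math></inline-formula> is the total number of concepts <inline-formula><tex-math>C_i</tex-math></inline-formula> in version <inline-formula><tex-math>t_2</tex-math></inline-formula>, we define the label, intensional
and extensional drifts of <inline-formula><tex-math>C</tex-math></inline-formula> between versions <inline-formula><tex-math>t_1</tex-math></inline-formula> and <inline-formula><tex-math>t_2</tex-math></inline-formula> as follows:
          <disp-formula><tex-math><![CDATA[\mathrm{label}_{t_1 \to t_2}(C) = \frac{1}{N_2} \sum_{i=1}^{N_2} \mathrm{sim}_{str}\big(\mathrm{label}_{t_1}(C),\ \mathrm{label}_{t_2}(C_i)\big)]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\mathrm{int}_{t_1 \to t_2}(C) = \frac{1}{N_2} \sum_{i=1}^{N_2} \mathrm{sim}_{jac}\big(\mathrm{int}_{t_1}(C),\ \mathrm{int}_{t_2}(C_i)\big)]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\mathrm{ext}_{t_1 \to t_2}(C) = \frac{1}{N_2} \sum_{i=1}^{N_2} \mathrm{sim}_{jac}\big(\mathrm{ext}_{t_1}(C),\ \mathrm{ext}_{t_2}(C_i)\big)]]></tex-math></disp-formula>
where <inline-formula><tex-math>\mathrm{sim}_{str}</tex-math></inline-formula> denotes the Monge-Elkan string similarity and <inline-formula><tex-math>\mathrm{sim}_{jac}</tex-math></inline-formula> the Jaccard set similarity.</p>
        <p>
          As all metrics have a range of [0, 1], their average (without using weights in the
general case) can be considered as an overall, whole aspect:
          <disp-formula><tex-math><![CDATA[\mathrm{whole}_{t_1 \to t_2}(C) = \frac{\mathrm{label}_{t_1 \to t_2}(C) + \mathrm{int}_{t_1 \to t_2}(C) + \mathrm{ext}_{t_1 \to t_2}(C)}{3}]]></tex-math></disp-formula>
        </p>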
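        <p>The averaging step can be illustrated with a short Python sketch. The function names and the toy extensions are hypothetical. The values passed to <monospace>whole</monospace> reuse the label (0.750) and extensional (1.000) figures reported for hisco:-1 in Section 4; the intensional value of 1.0 is an assumption made here (two empty intensions treated as fully similar), chosen because it is consistent with the reported overall value of 0.917.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean


def jaccard(x, y):
    """Jaccard similarity; two empty sets are assumed to be fully similar."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def aspect_average(value, next_values, sim):
    """Average similarity of one aspect of a concept in the first version
    to the same aspect of every concept in the second version."""
    return mean([sim(value, other) for other in next_values])


def whole(label_value, int_value, ext_value):
    """Unweighted mean of the three aspect values."""
    return (label_value + int_value + ext_value) / 3


# Hypothetical extensions: one concept at t1, three concepts at t2.
ext_t1 = {"p1", "p2"}
ext_t2 = [{"p1", "p2"}, {"p2", "p3"}, set()]

print(aspect_average(ext_t1, ext_t2, jaccard))
# Label and extension values from Section 4; the intension value is assumed.
print(round(whole(0.750, 1.0, 1.000), 3))
```
]]></preformat>
        <p>Passing the similarity function as an argument keeps the averaging logic identical for all three aspects, with only the comparison changing between strings and sets.</p>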
      </sec>
      <sec id="sec-3-2">
        <title>The SemaDrift Tool to Measure Semantic Drift</title>
        <p>While the SemaDrift metrics discussed above can be used directly in third-party
scripts and software via the SemaDrift API Library, a domain expert or researcher
may not possess the ability to do so. Especially in the case that several domain
ontologies need to be explored, as in this study, researchers need to use a common tool to
work fast, reliably and without further adaptations as a common point of reference.
The SemaDrift Protégé plugin serves this purpose<fn id="fn3"><label>3</label><p>The SemaDrift suite is available at:
http://mklab.iti.gr/project/semadrift-measure-semanticdrift-ontologies.</p></fn>.</p>
        <p>The plugin’s main panel is shown in Fig. 1. The tool provides a subset of the basic
functions of the underlying SemaDrift API in a graphical manner. For that purpose, it
exposes some of its functions and accommodates the outcomes in suitable user
controls using the Java Swing library. This edition of the plugin focuses on ontology
pairs, i.e. two versions of the same ontology, in order to provide more insight into
them and their differences, which also fits the Protégé workspace philosophy.
Usually, users work on a single ontology at a time, which is always displayed as a tree
hierarchy of classes in the left pane. Plugins occupy the right pane, which is
free to accommodate their functions (Fig. 1).</p>
        <p>As a first step, the user has to select the pair of ontologies for which to measure
drift. To take advantage of the environment, the plugin assumes that the first selected
ontology is the one currently loaded in Protégé, allowing also its in-depth
visualization, reasoning and query execution. The second ontology can be selected from the
SemaDrift pane using the “Browse” button to look through local or remote storage.</p>
        <p>After both ontologies are available, pressing the “Measure Drift” button
displays the SemaDrift metric results. Stability, as a measure of drift, is shown in two
sections: overall average stability per aspect and concept pair stability for all aspects.
The first section constitutes the most generic, abstract measure of drift. It displays a
table with the average drift of all concepts from the former ontology to the latter, for
each of the four aspects: label, intension, extension, and whole. The
measurements are derived using the metrics and algorithms for each aspect described in the
previous section, yielding a value from zero (no similarity) to one (full similarity).</p>
        <p>The second section of results is displayed in respective tables. Each table row
corresponds to a concept of the former ontology and each column to a concept of the
latter. Consequently, each cell holds the similarity metric (i.e. concept stability)
between each pair of concepts. These similarity values between pairs can further be
utilized by users for different purposes, for example to generate similarity graphs or
morphing chains such as those demonstrated in the rest of the paper.</p>
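        <p>One possible way to derive a morphing chain from such pairwise tables is sketched below in Python. It assumes that a chain follows, from version to version, the concept with the highest stability value. This is an interpretation made for illustration; the concept names, the numbers and the functions are hypothetical and not part of the SemaDrift API.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def best_match(row):
    """Return (concept, stability) for the most similar next-version concept."""
    target = max(row, key=row.get)
    return target, row[target]


def morphing_chain(start, matrices):
    """Follow the most similar concept through consecutive version pairs.

    Each matrix maps a concept of one version to a row of stability
    values against every concept of the next version.
    """
    chain = [(start, None)]
    current = start
    for matrix in matrices:
        if current not in matrix:
            break  # the concept has no row in this version pair
        current, score = best_match(matrix[current])
        chain.append((current, score))
    return chain


# Hypothetical stability tables for versions 1 to 2 and 2 to 3.
v1_to_v2 = {"A": {"A": 0.9, "B": 0.4}, "B": {"A": 0.3, "B": 0.5}}
v2_to_v3 = {"A": {"A": 0.6, "C": 0.8}, "B": {"A": 0.2, "C": 0.1}}

print(morphing_chain("A", [v1_to_v2, v2_to_v3]))
print(morphing_chain("B", [v1_to_v2, v2_to_v3]))
```
]]></preformat>
        <p>Each element of the returned list pairs a concept with the stability value that led to it, so the chain can be drawn directly as a labelled path.</p>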
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Dutch Historical Censuses</title>
      <p>
        Census data are essentially time series of systematic population records, and hence an
important source for studying semantic drift of concepts involving culture and
economics. In the Netherlands, the Dutch historical censuses are 17 country-wide
population reports performed between 1795 and 1971, once every 10 years. In each of these
reports, the government counted the population of the country and its demographic,
occupational, and housing characteristics. In 1971, this detailed reporting stopped due
to social concerns about privacy. Nevertheless, the exhaustive, detailed, and aggregated
characteristics<fn id="fn4"><label>4</label><p>Notably, the microdata registers (i.e. individual survey data), upon which these censuses are
built, have been lost over time. Hence, the numbers are only aggregations, with no tracking
information leading to the original individuals.</p></fn> of these censuses have continued to attract the attention of historians
and social scientists [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], who nowadays study them via a collection of 507
machine-readable spreadsheets, containing 2,288 census tables<fn id="fn5"><label>5</label><p>See http://volkstellingen.nl/</p></fn>. To provide more
systematic and universal access and to make the results of studies on this dataset reproducible, recent
efforts have published it as Linked Data (LD) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. This LD version of the Dutch
historical census is hereafter referred to as the CEDAR (Center of Excellence for
Document Analysis and Recognition) dataset, after its creators.
      </p>
      <p>
        SemaDrift was used to analyze semantic drift in the CEDAR dataset and gain
insights as to how occupational concepts, describing citizen jobs, changed in the period
1869-1930. For each of these two years, these occupational concepts are described
with three attributes: (a) the occupational concepts themselves are SKOS concepts
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] using URIs of the Historical International Standard Classification of Occupations
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] (HISCO); (b) the number of persons having a specific job are associated with
those SKOS concepts; and (c) a number of labels (in Dutch) are associated with these
SKOS concepts. This implies that the constraints inherent to these data with respect
to intensions, extensions, and labels are as follows: intensions do not exist for these
occupational concepts, since no further formal descriptions are available; extensions
are restricted by the cardinality of the concepts; and labels are assigned and abundant
for all concepts in both years. An example is shown in Fig. 2.
      </p>
      <p>A data transformation step is required before feeding the dataset into the tool,
not only to address the format but also to generate more meaningful properties. Namely,
the data, originally in the format shown in Fig. 2, are transformed into two OWL ontologies
for the years 1869 and 1930. To do so, we convert every HISCO skos:Concept to
an owl:Class, assigning to each of them, as rdfs:label, the labels found in the original data. Furthermore, to
obtain an accurate representation of extensions, we unroll the integer counts, as seen
in Fig. 2, and generate as many anonymous instances as specified by these integers.
This is done because, in a proper ontological representation, the extensional
aspect refers to instances and not to numerical properties. Finally, we assign these
anonymous instances to their corresponding HISCO owl:Class using an rdf:type
relation, so that each of them represents one person who held the job indicated by
the class.</p>
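      <p>The unrolling step can be sketched in Python as follows. This is not the transformation script used in the study: the function, the blank-node naming scheme and the person counts are hypothetical, and triples are kept as plain tuples instead of being serialized to OWL.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import count


def unroll(counts, year):
    """Turn per-class person counts into class and instance triples.

    counts maps a class identifier to the number of persons recorded
    for it; one anonymous instance is generated per person.
    """
    ids = count(1)
    triples = []
    for cls, persons in counts.items():
        triples.append((cls, "rdf:type", "owl:Class"))
        for _ in range(persons):
            node = f"_:person{year}_{next(ids)}"  # assumed naming scheme
            triples.append((node, "rdf:type", cls))
    return triples


# Hypothetical counts; the real figures come from the census tables.
counts_1869 = {"hisco:97125": 3, "hisco:21110": 2}
triples_1869 = unroll(counts_1869, 1869)

for triple in triples_1869:
    print(triple)
```
]]></preformat>
      <p>After this step, the extension of a class is simply the set of subjects typed with it, so the set-based comparison applies to census counts without a separate numeric metric.</p>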
      <p>Fig. 2. Excerpt of the original CEDAR data. Census counts are modeled as RDF Data Cube
observations, which carry information about the occupation class and the number of persons
belonging to it.</p>
      <p>After running SemaDrift on the two ontologies, the average per-aspect
stability and the concept-per-concept stability are generated. The latter is used to draw
morphing chains for topics of interest. The resulting table shows, first, that the most stable
concept between both versions is the occupational class hisco:-1 (stability of 0.917;
see Fig. 3), the class for occupations that cannot be classified elsewhere in HISCO.
This is due to high label (0.750) and extensional (1.000) stability, which suggests
that both the coherency of data coders w.r.t. unclassifiable jobs, and the population
holding those jobs, remained stable in this period.</p>
      <p>
        According to previous studies in extensional semantic drift in this dataset [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ],
other interesting classes from 1869 with expected extensional drift are:
        <list list-type="bullet">
          <list-item><p>hisco:97125, loaders of ships, trucks, wagons or airplanes. These workers do not
appear again in 1930, and the stability w.r.t. similar classes, like hisco:97145
(storehouse workers), is significantly lower (0.479). Their closest matches in terms
of stability are varnishers and stone polishers (0.717);</p></list-item>
          <list-item><p>hisco:21110, general managers. This group does appear in 1930, but the
similarity of their classes has greatly drifted (0.511). Many other occupational jobs,
with loose semantic similarity, display more stability w.r.t. the original class;</p></list-item>
          <list-item><p>hisco:41025, working proprietors. Similarly, this group of workers shows a great
deal of drift, to the extent of not having an equivalent class in 1930. This might be
due to historical reasons, i.e. the late industrialization in the Netherlands and its
effects on evolving old small business owners into upper-class company investors.
Notably, the related class hisco:43200, commercial agents, displays some
stability (0.405).</p></list-item>
        </list>
      </p>
      <p>According to these results, we can extract various conclusions from the evolution of
occupational concepts in the CEDAR dataset. We notice that the metrics shown in an
extension drift analysis strictly based on concept cardinalities are mostly useful when
an underlying stable schema that holds over time is previously given (in our case,
HISCO). These metrics can be used to better assess the quality of expert-curated
annotations and mappings (e.g. from strings to HISCO codes), but also to evaluate to
what extent these mappings can be reused in different conditions. In this respect, the
SemaDrift results can be used to evaluate the time period for which a certain ontology
class can be safely used to access database content that evolves over time.</p>
      <p>
        It is important to underline some intrinsic limitations in the study of semantic drift
within the CEDAR dataset. Besides the lack of concept intensions, many of these
limitations are related to the problem of identity, as also reported by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. First, the
identity between classes cannot be assumed even between those of identical HISCO
codes, since these are convoluted and culturally changing time periods. Secondly, the
identity between instances of these classes is even more volatile. Human annotated
identity information such as the existence of owl:sameAs links between instances of
different time periods would greatly improve the outcomes of the extensional drift
analysis, but would require manual labor. Finally, using class cardinalities as a proxy for
the extension <inline-formula><tex-math>\mathrm{ext}_t(C)</tex-math></inline-formula> is a natural limitation of the CEDAR dataset. However, we use these
cardinalities: (a) to show that <inline-formula><tex-math>\mathrm{ext}_t(C)</tex-math></inline-formula> can be flexibly adapted to scenarios where instance
data is limited; (b) coherently with classic measures of ontology evolution [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], e.g.
leveraging the fact that concepts with a decreasing number of instances over time are
likely to disappear or be merged with others; and concepts with an increasing number
of instances are likely to be split and/or specialized.
      </p>
      <p>The initial data transformation effort required in this scenario is justified, as the
tool assumes a proper ontology format (OWL) and design (instances instead of numeric
properties, according to their meaning). However, the tool itself, with the existing
morphing-based metrics, proved very useful for quickly gaining
insights regarding the evolution of semantic concepts that would otherwise require
serious labor.</p>
    </sec>
    <sec id="sec-5">
      <title>The BBC Sport Ontology</title>
      <p>The BBC is one of the pioneers in the field of ontology-based technologies, using them at
an industrial level since 2010<fn id="fn6"><label>6</label><p>BBC ontologies homepage: http://www.bbc.co.uk/ontologies</p></fn>. In the past, they found that conventional content
management systems impose serious limitations on the flexibility of the ways that content
is served, limiting the richness of the experience they offer to their visitors. To
overcome these limitations and enhance the experience for website users, they turned to
ontologies and Linked Data. An additional key benefit is that this approach also
significantly reduces the time it takes for editors to create content that is easily
discoverable across the website.</p>
      <p>One of the first ontologies developed by the BBC was the Sport Ontology<fn id="fn7"><label>7</label><p>BBC Sport Ontology homepage: http://www.bbc.co.uk/ontologies/sport</p></fn>, which
initially started as an effort to represent information about the competitions, teams,
players and matches of the 2010 World Cup. However, although it originated as a specific
use case, the Sport Ontology has since been significantly extended and is now
applicable to representing a wide range of competitive sporting events. The BBC now uses
this ontology to support its sports coverage, including coverage of both the 2012
London Olympics and the 2014 Brazil World Cup.</p>
      <p>The ontology’s significance for the BBC, along with its potential applicability in
various sports-related deployments, has led to our inclusion of the Sport Ontology in this
study. However, compared to the case study presented in the previous section, the
scope of this analysis is different, in the sense that we are investigating design
decisions from version to version, possibly influenced by the company’s intended
end-user applications and the public’s demonstrated preference for certain pertinent
aspects. Table 1 contains information regarding the versions of the Sport Ontology
studied in this paper<fn id="fn8"><label>8</label><p>Note that versions prior to v2.10 were not available online on the BBC website.</p></fn>.</p>
      <p>The six different versions of the Sport Ontology were loaded in SemaDrift. As
derived by the tool (but also implied in the table), the ontology is extremely stable with
regard to its intensional aspect (i.e. classes and properties), with most classes
demonstrating a perfect stability of 1. This implies that the set of concepts included in the
ontology is almost finalized, and that the ontology itself has matured enough, rarely
undergoing significant modifications in its structure. An exception was observed for
classes CompetitiveSportingGroup and
CompetitiveSportingOrganisation, whose stabilities were reported at an average of 0.9 each, due to changes in
domains and ranges of respective properties in versions 2.11 and 2.12. After
consulting the ontology documentation, we deduced that these changes coincide with some
corrections to the corresponding properties introduced by the BBC’s ontology engineers.</p>
      <p>On the other hand, the ontology is less stable extensionally, which is mostly due to
instances being added to specific classes in versions 2.11, 2.12 and 2.13, indicating
that these versions of the ontology were populated with certain individuals.
More specifically, the initially empty (i.e. no instances) class RoundType was
populated with 12 instances in version 2.12 (e.g. final, quarter-final, semi-final
etc.) and with 4 additional instances in version 2.13. Additionally, class
CompetitionType, which initially had 17 instances (e.g. domestic-cup, european-cup,
international etc.), was populated with 4 additional instances in version 2.13.
Overall, version 2.13 was the one where the extension of the ontology was finalized.
Fig. 4 shows the relevant morphing chains, illustrating the drifts of the “whole”
aspect for the three ontology versions involving the respective classes. Finally, the
changes in the two most recent versions of the ontology (3.0 and 3.2) were minimal,
thus indicating that the ontology has eventually stabilized.</p>
      <p>In conclusion, the above study offers an insight into the design choices of the six
most recent versions of the BBC Sport Ontology, e.g. corrections in the schema,
population with additional types of sporting competitions etc. Admittedly, the
investigation would probably be more intriguing if we could study earlier versions of the
ontology, but unfortunately we were unable to retrieve them at the time of preparing
this work.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>This paper employed novel ways to measure semantic drift in two different domains:
historical censuses and competitive sports events. SemaDrift is a suite of tools, a
software library, and an application, that can measure drift aspects for ontologies
on-the-fly. Linked Data for the Dutch historical census from 1869 to 1930 were
transformed to OWL and processed to reveal interesting insights into the semantic change in
the population’s occupation concepts. Moreover, semantic drift was studied for six
versions of the BBC Sport Ontology. Using the same tool to gain insights into
unrelated domains demonstrated its universal and cross-domain properties. Its
usefulness is also shown, as it gives access to insights that are otherwise hard to obtain, such as assessing
the nature of the drift (extensional), locating it in time and tracking the migration of
meaning from concept to concept through morphing chains.</p>
      <p>
        Future work will focus on expanding to more domains and extending the
tools. As this study has shown, the tool can handle most ontologies
out-of-the-box, enabling researchers without programming knowledge to use it directly. However,
the historical censuses revealed a minor change not only in format but also in
ontology design. Since both issues were resolved by writing a transformation script,
such scripts may be incorporated into the tool for future use. Furthermore, the lack of
matching identities, elaborated on in previous studies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], may be handled by
alternative metrics. Future efforts could help define a gold standard against which approaches can be compared
in terms of precision and recall. Finally, planned additions to the tool’s GUI include handling
more ontologies, adding visual aids, and drawing morphing chains,
evolving the tool into a one-stop shop for semantic drift measurement.
      </p>
      <p>Acknowledgments. This research received funding from the European Commission
Seventh Framework Programme under Grant Agreement Number FP7-601138
PERICLES.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The semantic web</article-title>
          .
          <source>Sci. Am</source>
          .
          <volume>284</volume>
          ,
          <fpage>28</fpage>
          -
          <lpage>37</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Tury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bieliková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>An approach to detection ontology changes</article-title>
          .
          <source>In: Workshop proceedings of the sixth international conference on Web engineering - ICWE '06</source>
          . p.
          <fpage>14</fpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Concept drift and how to identify it</article-title>
          .
          <source>J. Web Semant</source>
          .
          <volume>9</volume>
          ,
          <fpage>247</fpage>
          -
          <lpage>265</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Uschold</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Creating, integrating and maintaining local and global ontologies</article-title>
          .
          <source>In: Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI</source>
          <year>2000</year>
          ), Berlin, Germany.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pareti</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A Linked Data Scalability Challenge: Concept Reuse Leads to Semantic Decay</article-title>
          .
          <source>In: Proceedings of the ACM Web Science Conference. ACM Press-Association for Computing Machinery</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yildiz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ontology Evolution and Versioning</article-title>
          .
          <source>Tech. Report</source>
          , TU Vienna. (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Stojanovic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stojanovic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>User-driven ontology evolution management</article-title>
          .
          <source>Knowl. Eng. Knowl. Manag. Ontol. Semant. Web</source>
          .
          <fpage>133</fpage>
          -
          <lpage>140</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Stavropoulos</surname>
            ,
            <given-names>T.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreadis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riga</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontopoulos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitzias</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kompatsiaris</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>A Framework for Measuring Semantic Drift in Ontologies</article-title>
          .
          <source>In: 1st Int. Workshop on Semantic Change &amp; Evolving Semantics (SuCCESS'16)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , Leipzig, Germany (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wittek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daranyi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontopoulos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moysiadis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kompatsiaris</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Monitoring term drift based on semantic consistency in an evolving vector field</article-title>
          .
          <source>In: 2015 International Joint Conference on Neural Networks (IJCNN)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE, Killarney, Ireland (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>What is Linked Historical Data?</article-title>
          <source>In: Proceedings of the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014)</source>
          . pp.
          <fpage>282</fpage>
          -
          <lpage>287</lpage>
          . Springer International Publishing (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stavropoulos</surname>
            ,
            <given-names>T.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreadis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontopoulos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riga</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitzias</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kompatsiaris</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>SemaDrift: A Protégé Plugin for Measuring Semantic Drift in Ontologies</article-title>
          . In:
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darányi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meroño Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kontopoulos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.)
          <source>1st International Workshop on Detection, Representation and Management of Concept Drift in Linked Open Data (Drift-a-LOD) in conjunction with the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW)</source>
          . pp.
          <fpage>34</fpage>
          -
          <lpage>41</lpage>
          . CEUR Workshop Proceedings Vol
          <volume>1799</volume>
          , Bologna, Italy (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kalfoglou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schorlemmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology mapping: the state of the art</article-title>
          .
          <source>Knowl. Eng</source>
          . (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Fanizzi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Conceptual Clustering: Concept Formation, Drift and Novelty Detection</article-title>
          .
          <source>In: The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC</source>
          <year>2008</year>
          , Tenerife, Canary Islands, Spain, June 1-5,
          <year>2008</year>
          , Proceedings. pp.
          <fpage>318</fpage>
          -
          <lpage>332</lpage>
          . Springer (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gulla</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solskinnsbakk</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myrseth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Semantic Drift in Ontologies</article-title>
          .
          <source>In: Proceedings of 6th International Conference on Web Information Systems and Technologies (WEBIST)</source>
          , Valencia, Spain. pp.
          <fpage>13</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wittek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darányi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Visualizing the Drift of Linked Open Data Using Self-Organizing Maps</article-title>
          .
          <source>In: Drift-a-LOD Workshop at the 20th International Conference on Knowledge Engineering and Knowledge Management</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Guarino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welty</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A Formal Ontology of Properties</article-title>
          . In:
          <source>Knowledge Engineering and Knowledge Management Methods, Models, and Tools</source>
          . pp.
          <fpage>97</fpage>
          -
          <lpage>112</lpage>
          . Springer Berlin Heidelberg (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guarino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oltramari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Sweetening ontologies with DOLCE</article-title>
          .
          <source>In: International Conference on Knowledge Engineering and Knowledge Management</source>
          . pp.
          <fpage>166</fpage>
          -
          <lpage>181</lpage>
          . Springer Berlin Heidelberg (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Monge</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The Field Matching Problem: Algorithms and Applications</article-title>
          .
          <source>In: 2nd Intl. Conf. Knowledge Discovery and Data Mining (KDD)</source>
          . pp.
          <fpage>267</fpage>
          -
          <lpage>270</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ashkpour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandemakers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The aggregate Dutch historical censuses: Harmonization and RDF</article-title>
          .
          <source>Hist. Methods A J. Quant. Interdiscip. Hist</source>
          .
          <volume>48</volume>
          ,
          <fpage>230</fpage>
          -
          <lpage>245</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashkpour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guéret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>CEDAR: the Dutch historical censuses as linked open data</article-title>
          .
          <source>Semant. Web</source>
          .
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SKOS simple knowledge organization system reference</article-title>
          . (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Leeuwen</surname>
            ,
            <given-names>M.H.D. van</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>HISCO: Historical international standard classification of occupations</article-title>
          . Leuven: Leuven University Press (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guéret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Detecting and reporting extensional concept drift in statistical linked data</article-title>
          . Presented at the (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>