=Paper= {{Paper |id=Vol-1799/Drift-a-LOD2016_paper_2 |storemode=property |title=On the Semantics of Concept Drift: Towards Formal Definitions of Concept Drift and Semantic Change |pdfUrl=https://ceur-ws.org/Vol-1799/Drift-a-LOD2016_paper_2.pdf |volume=Vol-1799 |authors=Antske Fokkens,Serge Ter Braake,Isa Maks,Davide Ceolin |dblpUrl=https://dblp.org/rec/conf/ekaw/FokkensBMC16 }} ==On the Semantics of Concept Drift: Towards Formal Definitions of Concept Drift and Semantic Change== https://ceur-ws.org/Vol-1799/Drift-a-LOD2016_paper_2.pdf
     On the Semantics of Concept Drift:
Towards Formal Definitions of Semantic Change

     Antske Fokkens1, Serge ter Braake2, Isa Maks1, and Davide Ceolin3

     1 Computational Linguistics, VU University Amsterdam, Netherlands
       antske.fokkens@vu.nl, isa.maks@vu.nl
     2 Media Studies, University of Amsterdam, Netherlands
       sergeterbraake@gmail.com
     3 Computer Science, Web and Media department, VU University Amsterdam,
       Netherlands
       d.ceolin@vu.nl



      Abstract. Semantic change and concept drift are studied in many
      different academic fields. Different domains have different understandings
      of what a concept and, thus, concept drift is, making it harder for
      researchers to build upon work in other disciplines. In this paper, we aim to address
      this challenge and propose definitions for these phenomena which apply
      across fields. We provide formal definitions and illustrate how concept
      drift and related phenomena can be modeled in RDF through the use
      of context. We explain and support the definitions through an example
      from historical research and argue that a formal modeling of semantic
      change in RDF can help to better interpret data.

      Keywords: Concept Drift · Semantic Change · Digital Humanities ·
      RDF


1   Introduction
Semantic change and concept drift are important topics of study in many
academic fields, including history, linguistics, philosophy, political science and
computer science. However, not every discipline uses these terms in the same
way. This is not a problem in itself, but it does create confusion when these fields
communicate or work together, as in digital humanities projects. In this
paper, we aim to tackle this challenge, with a particular focus on the relation
between concept drift in Semantic Web research and in other disciplines.
    We define concept drift as a change in the intension (the definitions and
associations) of a concept, while the (rigid) core of the concept remains stable
[11, 16]. If the core of the concept changes as well, and we have a new concept,
we speak of conceptual replacement, following Kuukkanen [11, p. 367-370].
We argue that these definitions are applicable across domains. We outline how
concept drift relates to other forms of semantic change (which are sometimes
considered concept drift as well). The main contributions of this paper are the
following:
 1. We propose definitions of concept drift and related phenomena that are
    applicable across domains.
 2. We show how contexts can be used to model concept drift and related phe-
    nomena in RDF.
 3. We illustrate the complex interaction of various forms of semantic change
    and argue formal modeling of these processes can improve research across
    domains.

    The rest of this paper is structured as follows. Section 2 provides an overview
of work on concept drift in various disciplines. We introduce our definitions and
formal models in Section 3. In Section 4, we illustrate the complex interaction
between concept drift and related phenomena through an example from history.
We conclude with a discussion in Section 5.


2     Related Work

Historians have been studying how concepts and their associations change over
time for decades, to understand historical events [13]. Concept drift also receives
plenty of attention from digital humanities researchers.4 Computer scientists and
historians from the Translantis project, for example, have used vector coordi-
nates to see how vocabularies shift over time, which in turn can help to detect
concept change. Rather than taking a word as a starting point, they use the
meaning, the concept, as an anchor, and monitor the evolving set of words that
are used to denote it [10].
    In sociology, some make a distinction between normal concepts and contested
and contestable concepts, like ‘democracy’ and ‘freedom’, which leave room for
lots of debate [5]. Contested concepts are considered to be of great interest to
study in order to see how the political climate changes in a certain period of time.
Historical linguists study causes of shifts in word meanings in general [8]. They
tend to focus on linguistic mechanisms of semantic change such as metonymy
and metaphor rather than on meaning shifts caused by socio-cultural reasons.
Historical linguists generally look at a much broader range of concepts than
sociologists. Within the Semantic Web, researchers seem to focus on a wider
range of changes for concepts, where some consider any change related to a
concept a form of concept change, regardless of whether it is a change in its
intension, its labeling or its extension [16].
    Within the humanities and social sciences a clear definition of ‘concept’ is
often lacking [11, p. 363]. Recently the philosophers Betti and Van den Berg have
argued that to study the history of thoughts (and concepts), researchers should
make use of more explicit models, frameworks, to avoid the scent of arbitrariness
in what they are doing [1].
    Cognitive science has shown that people link many associations with con-
cepts, some being more important than others [11, p. 354]. For example, the
concept of ‘pig’ has the associations of ‘mammal’, ‘four-legged’ and ‘farm
animal’. While a three-legged pig living in a castle still is a pig, few would say that
a creature that looks like a pig but lays eggs actually is a pig. Any definition of a
concept has a sense of arbitrariness and is based on choices of what associations
to include [14, p. 35-37]. It therefore is desirable to study concepts with as much
flexibility as possible.

 4  http://www.helsinki.fi/collegium/events/conceptual change/abstracts.pdf

    Kuukkanen makes a distinction between the core of a concept, determined
by a historian, and its margins. By making this distinction, it is easier to de-
velop a vocabulary to talk about semantic changes. When the core of a concept
changes he speaks of conceptual replacement [11, p.367-370]. Wang et al. speak
of the ‘rigid core’ of a concept in this respect, and when that rigid core changes,
the concept is not the same concept anymore. What belongs to the rigid core
should ideally be determined by human domain experts (‘oracles’) [16, p. 8]. We
adopt Kuukkanen’s and Wang et al.’s view that it is useful to distinguish between core
and margin associations and that domain experts are the ones to decide which
properties are part of the concept’s core. We define concept drift as changes in
the concept’s associations while the core stays intact. This means we limit our
definition of concept drift to what Wang et al. call changes in intension. In the
next section we will elaborate on this definition and other forms of semantic
change that have been treated as concept drift in the Semantic Web literature.
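
The distinction between concept drift and conceptual replacement can be sketched in a few lines of Python. This is only an illustration of the definitions above, not part of the original proposal: the `ConceptView` structure, the function name and the choice of ‘mammal’ as the core of ‘pig’ are assumptions made for the example. In practice the core would be fixed by a domain expert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptView:
    """A concept as seen in one context (hypothetical structure)."""
    core: frozenset    # properties a domain expert marks as the rigid core
    margin: frozenset  # remaining, context-dependent associations


def classify_change(old: ConceptView, new: ConceptView) -> str:
    """Name the kind of intensional change between two views of a concept."""
    if old.core != new.core:
        return "conceptual replacement"  # the core itself changed
    if old.margin != new.margin:
        return "concept drift"           # core intact, associations changed
    return "no intensional change"


# Assumption for illustration only: an expert marks 'mammal' as the core of 'pig'.
pig = ConceptView(frozenset({"mammal"}),
                  frozenset({"four-legged", "farm animal"}))
castle_pig = ConceptView(frozenset({"mammal"}),
                         frozenset({"three-legged", "lives in a castle"}))
egg_layer = ConceptView(frozenset({"lays eggs"}),
                        frozenset({"four-legged", "farm animal"}))

print(classify_change(pig, castle_pig))
print(classify_change(pig, egg_layer))
```

Comparing sets for equality is the crudest possible test. A graded measure of how far the margin has moved would be a natural refinement, but it is outside the scope of this sketch.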


3     Modeling Semantic Change in RDF

Wang et al.’s research forms a central piece in work on concept drift within the
Semantic Web community. They provide definitions of three forms of concept
drift as well as formulas to calculate changes. We first address their definitions
and explain how they relate to the ones we propose. We then introduce formal
ways of modeling concept drift, which is not covered by Wang et al. [16].


3.1    Semantic change and concept drift

Wang et al. state that concept drift can involve changes in the concept’s
intension (its associations), in its extension (its reference) or in its label (words used
to refer to it). We follow Frege’s [6] view that the sense of a concept lies in its
intension and thus only intensional changes are concept drift. We see changes in
extension and labels as phenomena of semantic change related to concept drift.
Changes in extension ([1]) and changes in label can be relevant to study when
investigating concept drift, but this is not necessarily the case. We will illustrate
the relevance of extensional changes for three types of concepts.
    The concept of ‘pig’ has an extension that changes with every pig that is
born or slaughtered. Changes in this concept’s extension are not relevant for
investigating semantic change or ontological representation.5 Other concepts have
extensions that inherently change (without the sense of the concept changing),
where changes are relevant for formal modeling. The concept of ‘government’ is
an example: a change in government does not mean the concept of ‘government’
changed, but representing which government is in place is relevant ontological
information. Finally, there are concepts where the extension is closely connected
to the intension of the concept. This is the case for Wang et al.’s example of the
European Union, which is partially defined by its member states.

 5  The approximate number of living pigs may be relevant for certain use cases, but it
    is difficult to find a study where the exact extension is important to know.

    Semantic changes that concern the relation between labels (words) and the
concepts they refer to are also often related to concept drift. When a new or
different label is used for a concept, this often reflects an intensional change. For
instance, when someone chooses to use the word pig to refer to a ‘police officer’,
this invokes connotations that reflect the way they look at the police. We also
observe changes where the same label ends up referring to a different concept. A
well-known example of such change is the English word cute (based on Frermann
and Lapata (2016) [7], following the Oxford English Dictionary [15]). In the
early eighteenth century, the word appeared meaning ‘clever’. In the late nineteenth
century the meaning had shifted to ‘cunning’. It then shifted to its modern-day
meaning of ‘sweet/attractive’. We assume that, in this case, the concepts
of ‘clever’, ‘cunning’ and ‘sweet/attractive’ have not changed. Such changes are
forms of what Kuukkanen calls conceptual replacement. In Section 4 we will
see that conceptual replacement often interacts with concept drift.


3.2    Modeling semantic change and concept drift

We propose to model the phenomena outlined above through the use of contexts
and Lemon (lexicon model for ontologies) [12].6 Lemon is a W3C-standard that
can describe complex lexical information about words and link them to ontolo-
gies in the Semantic Web. Lexical knowledge is linked to ontological knowledge
through so-called lexicalSense nodes. Lexical knowledge may refer to spelling
variants (e.g. American vs. British spelling) or morpho-syntactic information
such as part-of-speech or plural forms. The lexicalSense that links this knowl-
edge to the ontology can be used to define contextual and usage conditions, such
as the connotations of ‘pig’ as a reference to a police officer or the time span in
which cute referred to ‘clever’.7 As such, Lemon directly provides the means to
model label changes for concepts as well as conceptual replacement.
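
As a rough illustration of how usage conditions attached to a sense can capture conceptual replacement, the sketch below stores a time span on each sense of cute and looks up the concept a label pointed to in a given year. It is plain Python rather than Lemon, the class and function names are hypothetical, and the year boundaries are placeholders chosen for the example, not dated attestations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LexicalSense:
    """Links a label to a concept for a limited time span (hypothetical)."""
    label: str
    concept: str
    valid_from: int             # first year, inclusive (placeholder value)
    valid_until: Optional[int]  # last year, inclusive; None means still in use


# Placeholder boundaries: the text only says 'early 18th century',
# 'late nineteenth century' and 'modern day'.
CUTE_SENSES = [
    LexicalSense("cute", "clever", 1700, 1869),
    LexicalSense("cute", "cunning", 1870, 1919),
    LexicalSense("cute", "sweet/attractive", 1920, None),
]


def concepts_for(label: str, year: int, senses: List[LexicalSense]) -> List[str]:
    """Return the concepts a label referred to in a given year."""
    return [
        s.concept
        for s in senses
        if s.label == label
        and s.valid_from <= year
        and (s.valid_until is None or year <= s.valid_until)
    ]


print(concepts_for("cute", 1750, CUTE_SENSES))
print(concepts_for("cute", 2016, CUTE_SENSES))
```

The concepts themselves stay untouched here. Only the link between label and concept carries a time span, which is what separates this case from concept drift.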
    Changes in the intension or extension of the concept, on the other hand,
are not captured in the lexicalSense (since the label of the concept may remain
stable). We follow Bouquet et al. [2] and place information dependent on a
context in the same graph. Figure 1 illustrates how the concept ‘democracy’ has
drifted. We define the core meaning as ‘voting power for people’. In addition, the
concept can be associated with various properties that can be context dependent.
In Figure 1, the first context (c1) associates democracy with secret ballots and a
multiple party system and assumes there is a minimum age. This could represent
how a (well-functioning) democracy is perceived in Europe in the 21st century.
Context 2 (c2) not only assumes an age limitation, but also restricts voting rights
by gender and income, fitting the context of many European countries in the
late 19th century. We can imagine other variations that capture how democracy
was seen in (e.g.) the Soviet Union, Ancient Rome or Athens. Note that we can
also use these contexts to model extensional changes (e.g. to represent which
countries were part of the European Union at a specific time).

                     Fig. 1. Democracy in different contexts

 6  http://lemon-model.net
 7  https://github.com/cltl/GRaSP/blob/master/examples/concept-drift/cuteLemon.png
    illustrates cute’s conceptual replacement.
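
The idea of placing context-dependent statements in one graph per context can be sketched with plain Python quads (subject, predicate, object, context). The predicate names below are hypothetical and are not taken from Figure 1. The statements only encode what the text says about c1 and c2, so the absence of a statement in a context is not a historical claim.

```python
# Each statement carries the context (graph) in which it holds.
# Predicate names are illustrative assumptions.
QUADS = {
    ("democracy", "core", "voting power for people", "c1"),
    ("democracy", "associatedWith", "secret ballots", "c1"),
    ("democracy", "associatedWith", "multiple party system", "c1"),
    ("democracy", "votingRestriction", "minimum age", "c1"),
    ("democracy", "core", "voting power for people", "c2"),
    ("democracy", "votingRestriction", "minimum age", "c2"),
    ("democracy", "votingRestriction", "gender", "c2"),
    ("democracy", "votingRestriction", "income", "c2"),
}


def view(concept, context, quads):
    """Split a concept's statements in one context into core and margin."""
    core, margin = set(), set()
    for subject, predicate, obj, ctx in quads:
        if subject == concept and ctx == context:
            (core if predicate == "core" else margin).add((predicate, obj))
    return core, margin


def compare_contexts(concept, ctx_a, ctx_b, quads):
    """Report whether the core is shared and how the margins differ."""
    core_a, margin_a = view(concept, ctx_a, quads)
    core_b, margin_b = view(concept, ctx_b, quads)
    return {
        "core_stable": core_a == core_b,
        "only_in_" + ctx_a: margin_a - margin_b,
        "only_in_" + ctx_b: margin_b - margin_a,
    }


result = compare_contexts("democracy", "c1", "c2", QUADS)
print(result["core_stable"])
print(sorted(result["only_in_c2"]))
```

With a stable core and differing margins, the comparison matches the definition of concept drift. The same quad layout could hold extensional statements, such as membership of the European Union in a given period.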

3.3   Using formal representations
Conceptual changes can be detected by examining changes in an ontology [16].
As Horrocks points out, “in computer science an ontology is an engineering
artefact, usually a model of (some aspect of) the world; it introduces vocabu-
lary describing various aspects of the domain being modelled, and provides an
explicit specification of the intended meaning of the vocabulary” [9]. Ontologies
can be changed to improve their ability to model the world. However, we must
distinguish between the case when this improvement takes place because pre-
vious versions of an ontology were inadequate, imprecise, erroneous (ontology
versioning), and when it happens because a change that occurred in the world
needs to be tracked. By making the context in which a certain definition applies
explicit, we can distinguish changes in the world from changes that are extensions
or corrections of the ontology. This furthermore allows us to use the ontology for
interpreting data from or about the past. Concept changes in history started long
before the Semantic Web existed, and interpretations from historical contexts
will therefore need to be added directly as representations of the past. We will
describe what this might look like in DBpedia and GeoNames.
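
One possible reading of this distinction, sketched in Python: if each ontology version maps contexts to sets of statements, an edit inside an existing context looks like a correction or extension of the ontology, while a newly added context records a different state of the world. The function name, the labels it returns and the sample data are assumptions made for illustration. Real cases will need expert judgment.

```python
def classify_edits(old_version, new_version):
    """Compare two ontology versions given as {context: set of triples}."""
    report = {}
    for context in sorted(old_version.keys() | new_version.keys()):
        if context not in old_version:
            # A context that was not modelled before.
            report[context] = "added context"
        elif context not in new_version:
            report[context] = "removed context"
        elif old_version[context] != new_version[context]:
            # Same context, different statements: correction or extension.
            report[context] = "revised within context"
        else:
            report[context] = "unchanged"
    return report


# Hypothetical sample versions.
V1 = {"c1": {("democracy", "associatedWith", "secret ballots")}}
V2 = {
    "c1": {
        ("democracy", "associatedWith", "secret ballots"),
        ("democracy", "associatedWith", "multiple party system"),
    },
    "c2": {("democracy", "votingRestriction", "income")},
}

print(classify_edits(V1, V2))
```
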
    DBpedia contains definitions of several forms of democracy, including Soviet
Democracy, Direct Democracy, Representative Democracy and Pseudo-Democracy.
They are defined as dct:subject of dbpedia:Democracy. They can be linked to
specific societies through the dct:subject predicate as well (e.g. Soviet Democ-
racy is linked to the Soviet Union and Direct Democracy to Ancient Greek
society). DBpedia does however not make explicit that the concept of ‘Direct
Democracy’ is a form of democracy that was practiced in Athens during a
specific time and that thus ‘Democracy’ in the context of this society refers to a
direct democracy. In our model these relations could be made explicit by
creating an equivalence link (e.g. skos:exactMatch or owl:sameAs) between
‘Democracy’ and ‘Direct Democracy’ for the
context of Athens in the Ancient Greek society. A scholar interpreting Plato may
use this information to deduce that the negative view of democracy expressed in
Plato’s work is not necessarily aimed at modern-day democracy.
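
A context-scoped equivalence of this kind can be sketched as a lookup keyed on both context and concept. The context names and the table are hypothetical and do not reflect how DBpedia stores its data. The two entries follow the links mentioned above.

```python
# (context, general concept) -> concept it stands for in that context.
CONTEXT_EQUIVALENCES = {
    ("Ancient Athens", "Democracy"): "Direct Democracy",
    ("Soviet Union", "Democracy"): "Soviet Democracy",
}


def resolve(concept, context, equivalences):
    """Return the context-specific reading of a concept, if one is recorded."""
    return equivalences.get((context, concept), concept)


# A reader of Plato would interpret 'Democracy' in the Athenian context.
print(resolve("Democracy", "Ancient Athens", CONTEXT_EQUIVALENCES))
# Without a recorded equivalence the general concept is returned unchanged.
print(resolve("Democracy", "21st-century Europe", CONTEXT_EQUIVALENCES))
```
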
     GeoNames provides information about locations and geo-political entities.
One of GeoNames’ challenges is that such entities constantly change: countries
and cities appear and vanish and some areas have a controversial political status.
GeoNames indicates that countries that no longer exist such as Czechoslovakia
are ‘historical political entities’, but it currently does not offer any information
about the country’s context. Though this may be easy to add for a country from
the past that was relatively stable throughout its existence, this becomes more
challenging for countries such as Poland, which are current countries, but have
changed drastically throughout history (from being one of the largest empires
in Europe to completely disappearing as a political entity twice to moving West
after World War II). If GeoNames wants to accurately represent this information,
time-dependent contexts are essential.
     The examples above are still relatively straightforward cases of concept drift
and semantic change. In the next section, we will illustrate that the interaction
between various phenomena can become quite complex and that for accurate
historical interpretation, a wide range of concepts needs to be taken into account.


4   Semantic changes in history

In 1541, Catharina de Chasseur (see [3] for her biography) was arrested on
suspicion of forgery and distribution of false coins. Many historians have described
and interpreted this episode in different ways over the past two centuries. One
contemporary had testified that Catharina, the huysfrou of Lord Van Assendelft,
was quaet ende boos. Read as modern Dutch, these two expressions would literally
translate to ‘housekeeper’ and ‘angry and angry’. In the sixteenth century,
however, these words meant ‘wife’, ‘evil’ and ‘bad’. Trained historians are
aware of such semantic changes and will use the necessary dictionaries to inter-
pret their source material well.
     Both boos and quaet refer to a different concept in modern Dutch (though
they may occur with their old meaning in some fixed expressions). The changes
related to huysfrou are slightly more complex. Conceptual replacement has taken
place, because the modern word huisvrouw no longer refers to spouse. However, it
still typically refers to a woman taking care of the household for her own family.
At the same time, its original meaning ‘wife’ has had a stable core meaning (a
person of the female gender who is legally married to another person), but outside
its core the intension heavily depends on context (time, place and culture). Even
though a wife still is a female person married to another person, this other
person could only be a person of the male gender in the sixteenth century.
The sixteenth-century ‘wife’ would furthermore be more associated with ‘family
ties’, ‘status’ and ‘bearer of children’ than with ‘romantic love’ and ‘equal’ or
‘other/better half’. This concept drift, the concept of ‘wife’ with a different
intension, is imperative for the historian to identify, because it can lead to
different interpretations of the past.
    Catharina de Chasseur was the wife of the Lord Van Assendelft. The Van
Assendelfts opposed this marriage because Catharina was not of nobility. To
them, she did not fit the concept of ‘wife’ for their son, because she was socially
mismatched. While modern observers could judge the Van Assendelft family’s
behavior as cruel and uncaring, they behaved mostly in line with the customs of
their time. As such, the correct interpretation of sources is even more dependent
on concept drift than on words changing meaning (conceptual replacement).
    The example of huysfrou and ‘wife’ illustrates the sometimes complex inter-
action between a shift in label and concept drift. The current word huisvrouw
may have lost the core meaning of ‘spouse’, but it still shares many associations
with its original lexicalSense.8 It may be an interesting topic of investigation
how the concept drift of ‘wife’ relates to the lexical semantic change of huysfrou:
did the drifting concept induce the lexical semantic change, or the other way
around? The RDF representation proposed in this paper allows researchers to
make both forms of change explicit, supporting this kind of investigation.


5     Conclusion and Discussion
The main goal of this paper was to show how various disciplines look at con-
cept drift in order to increase understanding and hence facilitate collaboration
between disciplines. We argued that out of the three forms of concept drift ad-
dressed by Wang et al. [16], only intensional change is (likely to be) understood
as concept drift across domains. Extensional change and label change are related
phenomena that can be indicative of concept drift, but this is not always the
case. The relevance of extensional change in particular depends on the kind of
concept under consideration.
    We showed how various forms of semantic change can be modeled using lemon
to link lexical entries to concepts in ontologies and using contexts to define how
a specific concept was seen or what its extension was in a specific time and
place. Making this information explicit can allow researchers to better interpret
data [4] and distinguish ontological changes due to changes in the concepts from
those that are corrections or extensions of the ontology. We described how the
model might be used in DBpedia and GeoNames.
    Obtaining this information, however, is challenging. Recent advances in
distributional semantics have been shown to be highly effective in picking up
semantic change from text [7], but they have mainly been evaluated on known cases
of conceptual replacement. Capturing concept drift requires domain expertise
and the examples above show that for a field such as history, relevant changes
may apply to concepts ranging from everyday terms such as ‘wife’ to highly
philosophical terms such as ‘freedom’. On the other hand, concept drift is intensively studied
and there is a vast amount of knowledge about semantic change. If we can show
that Semantic Web technology can help them, experts may be swayed to help
the Semantic Web and represent their knowledge in RDF, leading to models
showing how to interpret terms and concepts in various contexts over time.

 8  An illustration of the concept and lexicalSense of huysfrou and ‘wife’ can be found at:
    https://github.com/cltl/GRaSP/blob/master/examples/concept-drift/spouce.gif

6    Acknowledgments
This work was supported by the Amsterdam Academic Alliance Data Science
(AAA-DS) Program Award to the UvA and VU Universities and NWO VENI
grant 275-89-029. We would like to thank anonymous reviewers and the work-
shop’s organizers for their feedback. All remaining errors are our own.

References
 1. Betti, A., van den Berg, H.: Modelling the History of Ideas. British Journal for the
    History of Philosophy 22(4), 812–835 (2014)
 2. Bouquet, P., Serafini, L., Stoermer, H.: Introducing context into rdf knowledge
    bases. In: SWAP. vol. 5, pp. 14–16 (2005)
 3. ter Braake, S.: Chasseur, Catharina de. In: Digitaal Vrouwenlexicon van Nederland
    (2006)
 4. Ceolin, D., Noordegraaf, J., Aroyo, L.: Capturing the ineffable: collecting, analysing
    and automating web document quality assessments. In: EKAW (2016), to appear
 5. Collier, D., Hidalgo, F.D., Maciuceanu, A.O.: Essentially Contested Concepts.
    Journal of Political Ideologies 11(3), 211–226 (2006)
 6. Frege, G.: Über Sinn und Bedeutung. Zeitschrift für Philosophie und
    philosophische Kritik 100, 25–50 (1892)
 7. Frermann, L., Lapata, M.: A bayesian model of diachronic meaning change. Trans-
    actions of the Association for Computational Linguistics 4, 31–45 (2016)
 8. Geeraerts, D.: Theories of lexical semantics. Oxford University Press (2010)
 9. Horrocks, I.: Ontologies and the semantic web. Commun. ACM 51(12), 58–67 (Dec
    2008), http://doi.acm.org/10.1145/1409360.1409377
10. Kenter, T., Wevers, M., Huijnen, P., De Rijke, M.: Ad Hoc Monitoring of Vocab-
    ulary Shifts over Time. Proceedings of the 24th ACM International on Conference
    on Information and Knowledge Management pp. 1191–1200 (2015)
11. Kuukkanen, J.M.: Making sense of conceptual change. History and Theory
    47 (October), 351–372 (2008)
12. McCrae, J., Aguado-De-Cea, G., Buitelaar, P., Cimiano, P., Declerck, T.,
    Gómez-Pérez, A., Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., Wunner, T.:
    Interchanging lexical resources on the semantic web. Lang. Resour. Eval. 46(4),
    701–719 (Dec 2012), http://dx.doi.org/10.1007/s10579-012-9182-3
13. Richter, M.: The History of Political and Social Concepts. A critical introduction.
    Oxford University Press, New York and Oxford (1995)
14. Saeed, J.I.: Semantics. Blackwell Publishers (1997)
15. Stevenson, A.: Oxford dictionary of English. Oxford University Press, USA (2010)
16. Wang, S., Schlobach, S., Klein, M.: Concept drift and how to identify it. Web
    Semantics: Science, Services and Agents on the World Wide Web 9(3), 247–265
    (2011)


