<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Challenges of Knowledge Graph Evolution from an NLP Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tabea Tietz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marieke van Erp</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KNAW Humanities Cluster, DHLab</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Karlsruhe Institute of Technology, Institute AIFB</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>71</fpage>
      <lpage>76</lpage>
      <abstract>
        <p>Knowledge graphs often express static facts, but concepts and entities change over time. In this position paper, we describe challenges that arise from combining NLP and KG evolution in the digital humanities domain, based on preliminary experiments.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Evolution</kwd>
        <kwd>NLP</kwd>
        <kwd>Cultural Heritage</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge graphs (KGs) intend to represent what we consider true about (part of) the world. KGs are created at a certain point in time and can be considered static snapshots of the real world [<xref ref-type="bibr" rid="ref8">8</xref>]. However, "Knowledge lives. It is not static, nor does it stand alone" [<xref ref-type="bibr" rid="ref2">2</xref>]. Concepts continuously change over time and can vary between social contexts and locations; we live in a world of infinite variation and variability. Such concept changes may result from technological developments, changing social constructs, political decisions, globalization, etc. For example, our understanding of family as a concept has changed drastically over the years, e.g. with same-sex marriage being legalized in a growing number of states. Likewise, the concept of a country can change in tangible properties such as borders, official language(s) and rulers, but also in latent ways such as its citizens' identities and the perception of its culture by foreigners. These concepts are manifested not only in a culture's norms and values, but are also documented through photographs, newspapers, books, music, film and advertisements.</p>
      <p>Digital humanities research often involves the understanding of cultural heritage data. Recently, novel methods involving Natural Language Processing (NLP) supported by KGs have entered the humanities research community [<xref ref-type="bibr" rid="ref5">5</xref>]. Hence, the evolution of real-world concepts within a KG in combination with NLP is especially relevant. In this setting, evolution can be understood in two ways:</p>
      <list list-type="order">
        <list-item>
          <p>Natural language text can form the content of a KG through NLP. In this case, evolution refers to the text itself: historical texts mirror societies in varying realities and contexts and thereby define what is modeled in a KG. NLP is thus an essential part of the process of KG evolution.</p>
        </list-item>
        <list-item>
          <p>Alternatively, a KG may be created or evolve independently of automated NLP processes. In this case, evolution means that classes, instances and values are created or altered by a source outside of the reality in which a text was authored or analyzed. NLP is then not part of the initial process of evolution, but applies whatever reality is defined in the KG to its source text.</p>
        </list-item>
      </list>
      <p>Representing the fluidity of a concept within KGs poses a number of challenges, especially with respect to its cultural, temporal and geographical contexts. The goal of this position paper is to describe the challenges that arise from combining NLP and KG evolution in the digital humanities domain, based on preliminary experiments on the concept of apple pie. Furthermore, we present strategies for addressing these challenges.</p>
      <p>The remainder of the paper is structured as follows. Section 2 presents related work on knowledge graph evolution. The use case of apple pie recipes is briefly described in Section 3. In Section 4, the problems of KG evolution in combination with NLP are discussed based on this use case, and strategies for tackling them are presented. Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Concept drift over time is studied in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This analysis is based on theories of concept identity and concept morphing. The authors define the meaning of a concept in terms of its intension, extension and label. The intension changes when properties are added or removed, the extension changes when the instances of the concept in the ontology change, and the label changes when the name of the concept changes.
      </p>
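      <p>As a minimal illustration of these three aspects, the following Python sketch compares two versions of a concept. It assumes that a concept version can be reduced to a label, a set of properties (intension) and a set of instance identifiers (extension), and it uses Jaccard similarity as a simple stand-in for the similarity measures discussed in [<xref ref-type="bibr" rid="ref9">9</xref>]. The names <monospace>ConceptVersion</monospace>, <monospace>jaccard</monospace> and <monospace>drift</monospace> as well as all example values are hypothetical.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptVersion:
    """Hypothetical snapshot of a concept at one point in time."""
    label: str
    intension: frozenset  # properties that characterise the concept
    extension: frozenset  # identifiers of the concept's instances


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a.union(b)
    if not union:
        return 1.0
    return len(a.intersection(b)) / len(union)


def drift(old: ConceptVersion, new: ConceptVersion) -> dict:
    """Compare two versions of a concept along label, intension and extension."""
    return {
        "label_changed": old.label != new.label,
        "intension_similarity": jaccard(old.intension, new.intension),
        "extension_similarity": jaccard(old.extension, new.extension),
    }


# Invented example values, for illustration only.
pie_v1 = ConceptVersion(
    "apple pie",
    frozenset({"hasApple", "hasFlour", "hasFat"}),
    frozenset({"recipe_a", "recipe_b"}),
)
pie_v2 = ConceptVersion(
    "apple pie",
    frozenset({"hasApple", "hasFlour", "hasFat", "hasSweetener"}),
    frozenset({"recipe_b", "recipe_c"}),
)
print(drift(pie_v1, pie_v2))
```
      </preformat>
      <p>With these invented values the label is unchanged, the intension similarity is 0.75 and the extension similarity is one third; at which values a change should count as drift remains a design decision.</p>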
      <p>
        Once concept drift has been detected, maintaining KGs with respect to changing entities is the next challenge. Nishioka and Scherp [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] mainly focus on the verification of changes to ensure high data quality. They found that evolutionary patterns in KGs are similar to those in social networks, and their results contribute to a more efficient and reliable KG editing process. This work approaches KG evolution from a different angle than the present paper: the authors attribute errors in KGs to vandalism and carelessness, but do not address the issue that a definition of a concept may be true at one point in time but not at another.
      </p>
      <p>
        Knowledge graphs are dynamic: facts related to an entity are added or removed over time. Multiple versions of a knowledge graph therefore each represent a snapshot of the graph at some point in time, and entities evolve as facts are added or removed. Approaches for automatically generating a summary from different versions of a knowledge graph are still limited. The authors of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] propose an approach that creates a summary graph capturing the temporal evolution of entities across different versions of a knowledge graph, so that entity summary graphs can be used for documentation generation, profiling or visualization.
      </p>
      <p>
        The goal of this position paper is to investigate challenges of KG evolution from an NLP perspective and to outline visions for future digital humanities research. These challenges are based on a preliminary analysis of apple pie recipes extracted from historical Dutch and American newspapers.
      </p>
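      <p>To illustrate the kind of raw input that entity evolution summaries start from, the following Python sketch computes, for one entity, which facts were added or removed between consecutive KG snapshots. It is a plain set difference under the assumption that each snapshot is available as a set of (subject, predicate, object) tuples; it is not the summarization approach of [<xref ref-type="bibr" rid="ref8">8</xref>], and the <monospace>ex:</monospace> identifiers are invented.</p>
      <preformat preformat-type="code">
```python
def entity_changes(snapshots: list, entity: str) -> list:
    """List facts about `entity` added or removed between consecutive KG versions.

    `snapshots` is a chronologically ordered list of (version_label, triples),
    where `triples` is a set of (subject, predicate, object) tuples.
    """
    changes = []
    for (old_label, old), (new_label, new) in zip(snapshots, snapshots[1:]):
        # Keep only the facts whose subject is the entity of interest.
        old_facts = {t for t in old if t[0] == entity}
        new_facts = {t for t in new if t[0] == entity}
        changes.append({
            "from": old_label,
            "to": new_label,
            "added": sorted(new_facts - old_facts),
            "removed": sorted(old_facts - new_facts),
        })
    return changes


# Hypothetical snapshots using an invented "ex:" namespace.
snapshots = [
    ("v1", {("ex:ApplePie", "ex:hasIngredient", "ex:Apple"),
            ("ex:ApplePie", "ex:hasIngredient", "ex:Lard"),
            ("ex:AppleStrudel", "ex:hasIngredient", "ex:Apple")}),
    ("v2", {("ex:ApplePie", "ex:hasIngredient", "ex:Apple"),
            ("ex:ApplePie", "ex:hasIngredient", "ex:Butter")}),
]
for change in entity_changes(snapshots, "ex:ApplePie"):
    print(change)
```
      </preformat>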
      <p>
        In order to study the evolution of apple pie recipes over time, data from Dutch and American newspapers was collected. As van Erp et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] remarked, recipes from newspapers reflect tastes and viewpoints in a certain time period and can offer an understanding of food cultures. Unlike cookbooks, which provide a static collection of recipes, newspapers are therefore an invaluable data source for studying evolution. For this contribution, the ingredients of apple pie recipes and their quantities in different contexts (i.e., time and location) were analyzed. Since recipes from historical newspapers are not easily accessible, a small selection of recipes from digitized newspapers was made to provide a proof of concept and to illustrate our ideas. The selection comprises recipes published in one of four Dutch newspapers (Trouw, Het Parool, Volkskrant and NRC Handelsblad) or one of three American newspapers (Evening Star, Wilmington Morning Star and Pacific Commercial Advertiser) between 1857 and 1995, resulting in 347 apple pie recipes. The recipes were transformed into a structured format, including the available context information (e.g. date, location and language of the publication). Finally, 12 recipes with publication dates spread over the period 1857-1995 were investigated in a preliminary analysis. The recipes with extracted ingredients are visualized in Figure 1 and are available on GitHub<sup>5</sup>.
      </p>
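      <p>The following Python sketch indicates how such structured records might be used to compare ingredients across contexts. The <monospace>Recipe</monospace> fields, the grouping by publication country and decade, and the example records are assumptions made for illustration; they do not reflect the schema or the content of the published dataset.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical structured record for one newspaper recipe."""
    newspaper: str
    year: int
    country: str   # publication country, e.g. "NL" or "US"
    language: str  # publication language, e.g. "nl" or "en"
    ingredients: list = field(default_factory=list)


def ingredient_share(recipes: list) -> dict:
    """Share of recipes per (country, decade) that mention each ingredient."""
    totals = Counter()
    mentions = defaultdict(Counter)
    for r in recipes:
        key = (r.country, r.year // 10 * 10)
        totals[key] += 1
        # set() so that an ingredient listed twice counts once per recipe
        mentions[key].update(set(r.ingredients))
    return {
        key: {ing: n / totals[key] for ing, n in counts.items()}
        for key, counts in mentions.items()
    }


# Invented records, for illustration only.
recipes = [
    Recipe("Newspaper A", 1952, "NL", "nl", ["apple", "flour", "butter", "sugar"]),
    Recipe("Newspaper A", 1958, "NL", "nl", ["apple", "flour", "margarine", "sugar"]),
    Recipe("Newspaper B", 1955, "US", "en", ["apple", "flour", "lard", "sugar"]),
]
print(ingredient_share(recipes)[("NL", 1950)])
```
      </preformat>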
      <p>The concept of apple pie is seemingly simple: it should always contain apple, a kind of flour, a sweetener and a fat. So what is there to evolve? The challenges arising from our preliminary analysis are presented in the following section.</p>
      <sec id="sec-2-1">
        <title>5 https://pimpmypie.github.io/</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Challenges and (Possible) Strategies</title>
      <p>During the preliminary analysis of the recipe data, we identified the following challenges.</p>
      <p><bold>Spatio-temporal context</bold> When extracting knowledge from historical sources, several spatio-temporal contexts often have to be taken into account. For example, an article published in an American newspaper in 1995 that describes a typical Hungarian apple pie recipe from the 1950s entails multiple contexts. Here we can distinguish the spatio-temporal metadata of the concept itself (in this example, the recipe) from the metadata of its source (i.e. the newspaper article). This provenance information makes it possible to trace the evolution of the concept over time and across geographic regions.</p>
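      <p>A minimal Python sketch of this distinction, assuming a simple model in which every statement carries two separate contexts (a place and a year range): one for the described concept and one for the source that describes it. The classes <monospace>Context</monospace> and <monospace>Statement</monospace> and the <monospace>ex:</monospace> identifiers are hypothetical; an actual KG would more likely express such provenance with an established vocabulary.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical spatio-temporal context: a place and an inclusive year range."""
    place: str
    start_year: int
    end_year: int

    def covers(self, place: str, year: int) -> bool:
        return self.place == place and self.end_year >= year >= self.start_year


@dataclass(frozen=True)
class Statement:
    """A fact about a concept plus two separate provenance contexts."""
    fact: tuple               # (concept, property, value)
    concept_context: Context  # when/where the described concept applies
    source_context: Context   # when/where the describing document appeared


def facts_valid_in(statements, place, year, use_source=False):
    """Select facts by their concept context (default) or by their source context."""
    selected = []
    for s in statements:
        ctx = s.source_context if use_source else s.concept_context
        if ctx.covers(place, year):
            selected.append(s.fact)
    return selected


# The example from the text: a 1950s Hungarian recipe printed in a 1995 US newspaper.
hungarian_pie = Statement(
    fact=("ex:HungarianApplePie", "ex:hasIngredient", "ex:Apple"),
    concept_context=Context("Hungary", 1950, 1959),
    source_context=Context("United States", 1995, 1995),
)
print(facts_valid_in([hungarian_pie], "Hungary", 1955))
print(facts_valid_in([hungarian_pie], "Hungary", 1955, use_source=True))
```
      </preformat>
      <p>Querying by concept context retrieves the statement for 1950s Hungary, whereas querying by source context places it in the United States in 1995.</p>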
      <p><bold>Cultural context</bold> What is considered true in one cultural setting may not be true in another. For instance, the traditional apple strudel could be considered a type of apple pie in some cultures; in the area formerly belonging to the Austro-Hungarian Empire, however, a clear distinction is made between the two desserts, even though their ingredient lists are rather similar. Contextual information such as cookbook indexes (or, more generally, taxonomies) can help resolve this issue.</p>
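      <p>As a sketch of how taxonomies could make the cultural context explicit, the following Python snippet keeps one hypothetical taxonomy per cultural context and answers "is a kind of" questions against the selected one. The context names and taxonomy entries are invented for illustration and are not derived from actual cookbook indexes.</p>
      <preformat preformat-type="code">
```python
# Hypothetical context-specific taxonomies (narrower concept -> broader concept).
TAXONOMIES = {
    "generic": {"apple strudel": "apple pie", "apple pie": "pie"},
    "austro_hungarian": {"apple strudel": "strudel", "apple pie": "pie"},
}


def is_a(concept: str, target: str, context: str) -> bool:
    """Walk up the taxonomy of the given cultural context until `target` is found."""
    taxonomy = TAXONOMIES[context]
    seen = set()
    while concept not in seen:
        if concept == target:
            return True
        seen.add(concept)
        if concept not in taxonomy:
            return False
        concept = taxonomy[concept]
    return False  # a cycle in the taxonomy was detected


print(is_a("apple strudel", "apple pie", "generic"))
print(is_a("apple strudel", "apple pie", "austro_hungarian"))
```
      </preformat>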
      <p><bold>Units</bold> Extracting and understanding the ingredient units used in the texts was found to be a major challenge in this use case. In modern sources, this involves modern units and their conversion between, e.g., the imperial and the metric systems (kilogram, pound, litre, cup); in historical sources, it also includes units that are (usually) no longer customary today, e.g. ell or zentner. Furthermore, less tangible units are sometimes used, e.g. "a load of butter" or "two deep plates of apples", which pose a greater challenge for the automated detection and interpretation of values and quantities. Resources for historical measures exist and can be employed, but imprecise quantities will require human interpretation.</p>
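      <p>The following Python sketch shows one way to normalize quantities to grams and millilitres while leaving vague units for human interpretation. The conversion table is an assumption: the international pound (453.59237 g) and the US customary cup (about 236.59 ml) are modern definitions, the zentner is taken as the German metric value of 50 kg, and historical or regional variants of these units (as well as units such as the ell) are deliberately left out.</p>
      <preformat preformat-type="code">
```python
# Assumed conversion factors to grams (mass) and millilitres (volume).
MASS_G = {
    "g": 1.0,
    "kg": 1000.0,
    "pound": 453.59237,   # international avoirdupois pound; historical pounds differ
    "zentner": 50_000.0,  # assumption: German metric Zentner of 50 kg
}
VOLUME_ML = {
    "ml": 1.0,
    "litre": 1000.0,
    "cup": 236.5882365,   # assumption: US customary cup; other cup sizes exist
}


def normalise(amount: float, unit: str):
    """Return (value, base_unit), or None if the unit needs human interpretation."""
    unit = unit.lower().strip()
    if unit in MASS_G:
        return amount * MASS_G[unit], "g"
    if unit in VOLUME_ML:
        return amount * VOLUME_ML[unit], "ml"
    # Vague or unknown units such as "load" or "deep plate" are not guessed.
    return None


print(normalise(2, "pound"))
print(normalise(1, "load"))
```
      </preformat>
      <p>Returning <monospace>None</monospace> rather than a guessed value keeps imprecise quantities visible, so they can be routed to human interpretation instead of entering the KG as seemingly exact numbers.</p>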
      <p>
        <bold>Language</bold> When attempting to generate a KG from historical text sources that captures the fluidity of concepts, a number of challenges for NLP arise. In our use case, we found figures of speech such as metaphors (which are likely to appear in newspapers) that complicated the correct detection of apple pie recipes. For instance, the following "recipe" was found: "Take 1000 kilos of bombs, a few hundred hand grenades, as many boxes of cartridges, go to Vienna with them, make a coup there and wait until you get arrested. Then the apple strudel will be ready [...]"<sup>6</sup>. Without correctly detecting the metaphor, bombs, grenades and cartridges would be added to the KG as ingredients. Previous research has started to investigate the combination of deep learning and KGs to detect metaphors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
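      <p>Dedicated metaphor detection, as investigated in [<xref ref-type="bibr" rid="ref1">1</xref>], is beyond the scope of a short example. The following Python sketch only illustrates a much weaker safeguard that could complement it: before extracted ingredients are added to the KG, they are checked against a food vocabulary, and texts with too many non-food "ingredients" are flagged for human review. The vocabulary <monospace>FOOD_TERMS</monospace> and the threshold of 0.3 are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Hypothetical food vocabulary; in practice it could be drawn from a food ontology.
FOOD_TERMS = {"apple", "apples", "flour", "butter", "sugar", "cinnamon", "egg", "eggs"}


def review_needed(candidate_ingredients: list, max_non_food_share: float = 0.3) -> bool:
    """Flag a text for human review if too many extracted ingredients are not food."""
    if not candidate_ingredients:
        return True  # nothing usable was extracted
    non_food = [c for c in candidate_ingredients if c.lower() not in FOOD_TERMS]
    return len(non_food) / len(candidate_ingredients) > max_non_food_share


print(review_needed(["bombs", "hand grenades", "cartridges"]))
print(review_needed(["apples", "flour", "butter", "sugar"]))
```
      </preformat>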
      <p>
        <fn id="fn6">
          <label>6</label>
          <p><ext-link ext-link-type="uri" xlink:href="http://anno.onb.ac.at/cgi-content/anno?aid=kik&amp;datum=19190907&amp;query=%22Apfelstrudel+Rezept%22~10&amp;ref=anno-search&amp;seite=7">http://anno.onb.ac.at/cgi-content/anno?aid=kik&amp;datum=19190907&amp;query=%22Apfelstrudel+Rezept%22~10&amp;ref=anno-search&amp;seite=7</ext-link></p>
        </fn>
        Furthermore, the meaning of concept terms may change over time. Such changes can be captured with the help of latent representations of words and represented in the KG. One of the initial approaches constructs time series of word usage from word embeddings, generating one embedding space for each point in time [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the authors propose an approach based on three components: the first component takes time as input and generates a time vector, the second generates a word vector (independent of time), and the third combines the time and word vectors.
      </p>
      <p>
        <bold>Concept Modeling</bold> The challenges described above also raise the question of how broadly or narrowly concepts should be modeled in a KG to capture concept change. If the ontology is modeled too specifically (e.g. a recipe has to include specific ingredients), recipes from economically weak years (in which these ingredients were not available) would not be considered, even though they could yield interesting results from a digital humanities viewpoint. For example, in Figure 1, the US recipe from 1857 includes citric acid instead of apples, which could provide hints of economic shortages. On the other hand, underspecified models may introduce false positives, i.e. recipes falsely detected as apple pies, as the metaphor example above emphasizes. Furthermore, it is a challenge to determine the properties that define concepts, also with respect to changes over time. If apple pie and apple strudel have rather similar lists of ingredients, the baking procedure and technique may also play a vital role in defining the concept and would then have to be modeled in a KG.
      </p>
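      <p>The embedding-based time-series idea of [<xref ref-type="bibr" rid="ref4">4</xref>] mentioned above can be sketched in Python as follows. The sketch assumes that one embedding space per period has already been trained and aligned across periods (the alignment step is omitted), and it measures how far a word's vector moves away from its vector in a reference period using cosine distance. The toy two-dimensional vectors are invented; real embeddings have far more dimensions.</p>
      <preformat preformat-type="code">
```python
import math


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def shift_series(vectors_by_period: dict, word: str, reference=None) -> list:
    """Time series of a word's distance to its vector in a reference period.

    Assumes vectors_by_period[period][word] holds vectors that were already
    aligned across periods (one embedding space per period).
    """
    periods = sorted(vectors_by_period)
    ref = reference if reference is not None else periods[0]
    base = vectors_by_period[ref][word]
    return [(p, cosine_distance(base, vectors_by_period[p][word])) for p in periods]


# Invented, already aligned toy vectors.
toy_vectors = {
    1900: {"pie": [1.0, 0.0]},
    1950: {"pie": [1.0, 1.0]},
    1990: {"pie": [0.0, 1.0]},
}
print(shift_series(toy_vectors, "pie"))
```
      </preformat>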
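      <p>The trade-off between narrow and broad concept models described in the Concept Modeling paragraph can be illustrated with a small Python sketch. It assumes, purely for illustration, that a concept definition is a list of ingredient slots, each listing acceptable alternatives, following the apple, flour, sweetener and fat pattern described earlier; both definitions and the example recipe are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Hypothetical definitions of "apple pie": each slot lists acceptable alternatives.
NARROW = [{"apple"}, {"flour"}, {"sugar"}, {"butter"}]
BROAD = [
    {"apple", "citric acid"},         # apples, or an acidic substitute
    {"flour"},
    {"sugar", "syrup", "honey"},      # any sweetener
    {"butter", "lard", "margarine"},  # any fat
]


def matches(ingredients: set, definition: list) -> bool:
    """A recipe fits a definition if every slot is filled by one of its ingredients."""
    return all(not slot.isdisjoint(ingredients) for slot in definition)


# Invented recipe in which citric acid replaces apples (other ingredients made up).
shortage_recipe = {"citric acid", "flour", "sugar", "lard"}
print(matches(shortage_recipe, NARROW), matches(shortage_recipe, BROAD))
```
      </preformat>
      <p>The narrow definition rejects the invented shortage recipe while the broad one accepts it; conversely, the broad definition would also accept other dishes that happen to combine these ingredients, which is the false-positive risk discussed above.</p>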
      <p><bold>Evaluation</bold> Finally, an evaluation of an ontology that captures concept change in a KG based on natural language text descriptions should provide measures of how well the model reacts to changes in the data. However, as described above, whether or not something is regarded as an apple pie (or any other concept) depends on many aspects, e.g. cultural background as well as the spatio-temporal setting. Hence, a concise ground truth is difficult to create, and possibly only tendencies can be given. One strategy for dealing with this is to create a crowd truth that captures the viewpoints on a cultural heritage object of a large number of human evaluators from varying backgrounds as well as of domain experts. In future work, a crowdsourcing campaign on apple pie (and further use cases) is envisioned.</p>
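      <p>The following Python sketch illustrates, in simplified form, how crowd judgements could be aggregated without collapsing them into a single label: for each object it keeps the share of "yes" votes overall and per annotator background. It is not an implementation of the CrowdTruth metrics; the background labels and the votes are invented.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def crowd_truth(judgements):
    """Aggregate (object_id, annotator_background, is_apple_pie) judgements.

    Returns, per object, the share of 'yes' votes overall and per background,
    so that disagreement is kept instead of being reduced to one label.
    """
    votes = defaultdict(lambda: defaultdict(list))
    for object_id, background, is_pie in judgements:
        votes[object_id][background].append(bool(is_pie))
    result = {}
    for object_id, by_background in votes.items():
        all_votes = [v for vs in by_background.values() for v in vs]
        result[object_id] = {
            "overall": sum(all_votes) / len(all_votes),
            "by_background": {b: sum(vs) / len(vs) for b, vs in by_background.items()},
        }
    return result


# Invented votes on whether one recipe describes an apple pie.
judgements = [
    ("recipe_1", "annotators_a", False),
    ("recipe_1", "annotators_a", False),
    ("recipe_1", "annotators_b", True),
    ("recipe_1", "annotators_b", True),
]
print(crowd_truth(judgements))
```
      </preformat>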
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
      <p>The real world is constantly changing, and knowledge that was considered true at one point in time in a specific cultural and spatial setting may not be true in another context; in other words, contexts evolve. KGs, on the other hand, are created and maintained to continuously compose knowledge, yet they are often static and reflect only one snapshot of reality. This static representation of the real world is a problem when attempting to understand historical descriptions of concepts (e.g., in newspapers), because linking historical concepts to today's understanding of the same concept may distort their meaning.</p>
      <p>In this paper, the problem is addressed on the basis of preliminary experiments on the concept of apple pie. The take-home message of this contribution is that modeling KG evolution, even for simple and contained concepts like apple pie, poses complex challenges for ongoing research in the Semantic Web community. These include the various contexts that need to be taken into account, ambiguities in language and in the units used, as well as the granularity of the model and its evaluation. Most of the challenges we identified are generalizable to numerous concepts within the digital humanities domain.</p>
      <p>In future work, more use cases (apart from apple pie) will be analyzed, and methods for representing KG evolution will be evaluated on the basis of the challenges and strategies presented in Section 4.</p>
      <p>Acknowledgement. This work was made possible by the International Semantic Web Research Summer School<sup>7</sup> in Bertinoro, July 2019. The authors would like to thank the Summer School directors, tutors, the organizing team and the fellow students, especially Mortaza Alinam, Wouter van den Berg, Lientje Maas, Fabio Mariani and Eleonora Marzi.</p>
      <sec id="sec-4-1">
        <title>7 http://semanticwebschool.org/</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alam</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Can knowledge graphs and deep learning approaches help in representing, detecting and interpreting metaphors?</article-title>
          In:
          <source>Workshop on Deep Learning for Knowledge Graphs (DL4KG) co-located with ESWC 2019</source>
          , Vol-
          <volume>2377</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371)</article-title>
          .
          <source>Dagstuhl Reports</source>
          <volume>8</volume>
          (
          <issue>9</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>111</lpage>
          (
          <year>2019</year>
          ). https://doi.org/
          <pub-id pub-id-type="doi">10.4230/DagRep.8.9.29</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>van Erp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wevers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huurdeman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Constructing a recipe web from historical newspapers</article-title>
          . In:
          <source>International Semantic Web Conference</source>
          . pp.
          <fpage>217</fpage>
          -
          <lpage>232</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Rfou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perozzi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skiena</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Statistically significant detection of linguistic change</article-title>
          . In: Gangemi, A., Leonardi, S., Panconesi, A. (eds.)
          <source>Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015</source>
          . pp.
          <fpage>625</fpage>
          -
          <lpage>635</lpage>
          . ACM (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashkpour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Erp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandemakers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breure</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scharnhorst</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantic technologies for historical research: A survey</article-title>
          .
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <fpage>539</fpage>
          -
          <lpage>564</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nishioka</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analysing the evolution of knowledge graphs for the purpose of change verification</article-title>
          . In:
          <source>2018 IEEE 12th International Conference on Semantic Computing (ICSC)</source>
          . pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          (Jan
          <year>2018</year>
          ). https://doi.org/
          <pub-id pub-id-type="doi">10.1109/ICSC.2018.00013</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rosenfeld</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erk</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Deep neural models of semantic shift</article-title>
          . In:
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</source>
          . pp.
          <fpage>474</fpage>
          -
          <lpage>484</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tasnim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collarana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graux</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orlandi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Summarizing entity temporal evolution in knowledge graphs</article-title>
          . In:
          <source>Companion Proceedings of The 2019 World Wide Web Conference</source>
          . pp.
          <fpage>961</fpage>
          -
          <lpage>965</lpage>
          . ACM (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Concept drift and how to identify it</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ),
          <fpage>247</fpage>
          -
          <lpage>265</lpage>
          (
          <year>2011</year>
          ). Semantic Web Dynamics; Semantic Web Challenge 2010
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>