<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward Predicting Impact of Changes in Evolving Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Romana Pernischová</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Dell'Aglio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew Horridge</string-name>
          <email>matthew.horridge@stanford.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Baumgartner</string-name>
          <email>baumgartner@ifi.uzh.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abraham Bernstein</string-name>
          <email>bernstein@ifi.uzh.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stanford University</institution>
          ,
          <addr-line>Stanford</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Zurich</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The updates on knowledge graphs (KGs) afect the services built on top of them. However, changes are not all the same: some updates drastically change the result of operations based on knowledge graph content; others do not lead to any variation. Estimating the impact of a change ex-ante is highly important, as it might make KG engineers aware of the consequences of their action during KG editing or may be used to highlight the importance of a new fragment of knowledge to be added to the KG for some application. The main goal of this contribution is to ofer a formalization of the problem. Additionally, it presents some preliminary experiments on three diferent datasets considering embeddings as operation. Results show that the estimation can reach AUCs of 0.85, suggesting the feasibility of this research.</p>
      </abstract>
      <kwd-group>
        <kwd>KG Evolution</kwd>
        <kwd>Impact</kwd>
        <kwd>Embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Knowledge graphs (KG) and ontologies1 change over time. Such graphs are
updated by inserting new knowledge and removing or changing outdated and wrong
information. As an example, consider the computation of a (logical)
materialization: a small change in the schema can have a big impact on the KG, significantly
vary the number of materialized axioms or even its consistency. However, not
every change in the KG leads to significant changes in the materialized graph.
The computation of the materialization consumes considerable amounts of
resources. Consequentially, the indication of a potentially big diference from the
old materialization to the new one would signal the necessity for recomputation.</p>
      <p>This topic is not at all new, but so far studies come from diferent
directions. It relates to stream processing since the process executed over a stream
1 We use the terms knowledge graph and ontology interchangably because of the
different datasets used in the case study.</p>
      <p>
        Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
"&amp;5
)
("5
&amp;



"&amp;5
of data changes results as the data comes in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, Gross et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
examine how the changes in an biomedical ontology impact previously conducted
functional analysis. Gottron and Gottron [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] evaluate how indexes are afected
by the evolution of the LOD dataset.
      </p>
      <p>
        In this poster, we present a formalization of the problem of estimating the
impact on a operation result over an evolving KG. Furthermore, we show first
experiments with three diferent datasets on embeddings operations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Problem of Predicting the Impact</title>
      <p>
        Fig. 1 shows the conceptualization of the impact prediction problem. A
knowledge graph K is a set of triples (s; p; o), where s and o are two resources connected
by a predicate/relation p. We define an evolving knowledge graph K as a sequence
(K1; K2; : : : ; Kt; Kt+1; : : :), where Kt denotes the KG at the time instant t. This
definition of evolving KG is similar to the one of ontology stream proposed by
Ren and Pan in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Let Kt and Kt+1 be two consecutive versions of K. The
update of K between t and t + 1 is described by a set of changes t. indicates a
set of edits that are authored by one or more agents, such as ontology engineers
or maintenance bots.
      </p>
      <p>We define op( ) as a function which applies to a KG and produces a result
R. op( ) takes as arguments a KG and zero or more additional parameters if
necessary. When the operation op( ) is applied to K, it will create a sequence of
results R = (R1; R2; : : : ; Rt; : : :), where Rt is the result of op( ) on Kt.</p>
      <p>Given Kt and Kt+1, the respective results Rt and Rt+1 can be the same (if
the changes t do not afect the result of op( )), or they can difer. We model
this comparison through the function impactop(Rt; Rt+1), which represents the
impact that the evolution had on the results of op( ).</p>
      <p>Therefore, the goal of this research is to build an estimator of impactop( ),
which does not require the computation of all op( ) results. We indicate the
estimator with imdpactop(Kt; t) since it takes as arguments a knowledge graph and
the set of changes leading to a new version. Moreover, op( ) should be considered
as well, since diferent operations may lead to diferent impact functions.</p>
      <p>Toward Predicting Impact of Changes in Evolving Knowledge Graphs</p>
    </sec>
    <sec id="sec-3">
      <title>Case study</title>
      <p>
        We discuss our model in one case study considering embeddings as op( ).
Definition of Impact. To evaluate embeddings of two snapshots, we make the
comparison using neighborhoods. Taking the 100 closest neighbors of a particular
node, we compare how many of these are also in the neighborhood of the same
node in the embedding of the next version. The aggregation for the whole graph
is then done via the sum and the mean, producing two impact measures.
Definition of Features. 2 Diferent measures have been established in the
network analysis community such as degree centrality or clustering indicators.
Further, we calculate sparseness and graph entropy measures. To use these measures
as features, we calculated a simple diference between the graph measure value in
two versions of the graph. Other features are based on change actions. A change
action is an item in . We used the classification introduced by Hartung, Gross,
and Rahm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to identify high-level change actions such as move, merge, split,
add, and delete node. Subsequently, we calculate degree, closeness,
betweenness, and degree centrality for the afected nodes.
      </p>
      <p>
        Data. We test the impacts and features defined above through experiments on
three datasets: Bear-B-instant (BB), the Gene Ontology (GO) and an anonymized
and de-identified WebProtege Ontology (WP). The BB dataset was produced
using the 100 most volatile subjects of the DBpedia [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In our experiments, we
use 3’923 datapoints. GO is a well known ontology in the biomedical domain
and has been maintained by the Gene Ontology Consortium since 2000 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It
has (only) 99 versions producing 90 datapoints that we were able to use. Lastly,
we randomly selected a WebProtege Ontology (WP) based on three criteria: it
(1) has at least 2’000 snapshots, (2) was edited by more than two users, and
(3) has less than 30’000 axioms.
      </p>
      <p>The imdpact estimator. We split the data into a training and testing datasets,
where the training dataset contains 70% of the data points. For learning, we use
10-fold cross validation. The area under the receiver operating characteristics
curve (AUC; also called c-statistic) is calculated to compare the models after
testing. To calculate the AUC for regression, we determine a threshold for the
observed impact to turn the regression into a binary classification. Features were
selected using Pearson correlation (corr ) and 10-fold Ridge regression (ridge).
Results. We test the performance of imdpact by running three experiments on
the considered datasets and impact measures. Table 1 reports the AUC results
of the experiments. For the BB dataset, performance does not exceed an AUC
of 0.63. On the other hand, for WP performance reaches 0.851. With GO, the
only dataset including change action features, the performance is 0.8 and higher
on multiple occasions.
2 The complete list of the features can be found in the source code at http://tinyurl.</p>
      <p>com/y29t77v4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Following the butterfly efect, a small change in a KG can lead to large diferences
in operation results. This poster presents the problem setting and definition
along with a case study using embeddings over three datasets and two impact
measures. The successful experiments show that it is possible to predict the
impact. We can therefore decide when an update to the operation is necessary,
potentially saving valuable resources.</p>
      <p>This study triggers new questions. For instance, is it possible to construct a
general model spanning over diferent datasets and operations? We need to gain
a deeper understanding of the common factors among datasets and operations.
Which is why we plan to run further experiments in this direction.</p>
      <p>This study clearly shows that the investigation of KG-evolution and the
impact of changes on KGs is fruitful and needs to be intensified.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blake</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolinski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwight</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eppig</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Issel-Tarver</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasarskis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matese</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringwald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherlock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Gene Ontology: Tool for the unification of biology</article-title>
          .
          <source>Nat. Genet</source>
          .
          <volume>25</volume>
          ,
          <string-name>
            <surname>25</surname>
            <given-names>EP</given-names>
          </string-name>
          - (
          <year>05 2000</year>
          /05/01/online)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lecue</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , H.:
          <article-title>Learning from Ontology Streams with Semantic Concept Drift</article-title>
          . In: IJCAI. pp.
          <fpage>957</fpage>
          -
          <lpage>963</lpage>
          . ijcai.
          <source>org</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Evaluating Query and Storage Strategies for RDF Archives</article-title>
          . In: SEMANTICS. pp.
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
          . ACM (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Perplexity of Index Models over Evolving Linked Data</article-title>
          .
          <source>In: ESWC. LNCS</source>
          , vol.
          <volume>8465</volume>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>175</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prüfer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelso</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>Impact of ontology evolution on functional analyses</article-title>
          .
          <source>Bioinformatics</source>
          <volume>28</volume>
          (
          <issue>20</issue>
          ),
          <fpage>2671</fpage>
          -
          <lpage>2677</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hartung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <string-name>
            <surname>COnto-Dif</surname>
          </string-name>
          :
          <article-title>Generation of complex evolution mappings for life science ontologies</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>46</volume>
          (
          <issue>1</issue>
          ),
          <fpage>15</fpage>
          -
          <lpage>32</lpage>
          (
          <year>Feb 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          :
          <article-title>Optimising ontology stream reasoning with truth maintenance system</article-title>
          .
          <source>In: CIKM</source>
          . pp.
          <fpage>831</fpage>
          -
          <lpage>836</lpage>
          . ACM (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Knowledge Graph Embedding: A Survey of Approaches and Applications</article-title>
          .
          <source>IEEE Trans Knowl Data Eng</source>
          <volume>29</volume>
          (
          <issue>12</issue>
          ),
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>