        Toward Predicting Impact of Changes in
             Evolving Knowledge Graphs

    Romana Pernischová1 , Daniele Dell’Aglio1 , Matthew Horridge2 , Matthias
                  Baumgartner1 , and Abraham Bernstein1

                     1 University of Zurich, Zurich, Switzerland
           {pernischova,dellaglio,baumgartner,bernstein}@ifi.uzh.ch
                      2 Stanford University, Stanford, USA
                         matthew.horridge@stanford.edu



        Abstract Updates to knowledge graphs (KGs) affect the services built
        on top of them. However, not all changes are equal: some updates
        drastically change the results of operations based on the knowledge
        graph's content; others do not lead to any variation. Estimating the
        impact of a change ex ante is highly important, as it can make KG
        engineers aware of the consequences of their actions during KG editing,
        or highlight the importance of a new fragment of knowledge to be added
        to the KG for some application. The main goal of this contribution is
        to offer a formalization of the problem. Additionally, it presents
        preliminary experiments on three different datasets with embeddings as
        the operation. Results show that the estimation can reach AUCs of up
        to 0.85, suggesting the feasibility of this research.

        Keywords: KG Evolution · Impact · Embeddings.


1     Introduction

Knowledge graphs (KGs) and ontologies1 change over time. Such graphs are
updated by inserting new knowledge and by removing or correcting outdated and
wrong information. As an example, consider the computation of a (logical)
materialization: a small change in the schema can have a big impact on the KG,
significantly varying the number of materialized axioms or even affecting its
consistency. However, not every change in the KG leads to significant changes
in the materialized graph. Since the computation of the materialization consumes
considerable amounts of resources, an indication of a potentially big difference
between the old materialization and the new one would signal the necessity for
recomputation.
    This topic is not entirely new, but existing studies approach it from dif-
ferent directions. It relates to stream processing, since a process executed over a stream
1
    We use the terms knowledge graph and ontology interchangeably because of the
    different datasets used in the case study.
    Copyright © 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).

[Figure 1: diagram omitted]

Figure 1: General model of the problem setting, with Kt being the KG at time t,
δt the changes leading to Kt+1 , op(·) the operation executed on the KG, Rt the
result of op(·), impactop (·) the impact, and impact̂op (·) its estimation.


of data updates its results as the data comes in [2]. Moreover, Gross et al. [5]
examine how changes in a biomedical ontology impact previously conducted
functional analyses. Gottron and Gottron [4] evaluate how indexes are affected
by the evolution of a LOD dataset.
    In this poster, we present a formalization of the problem of estimating the
impact on an operation's result over an evolving KG. Furthermore, we show first
experiments on three different datasets with embeddings as the operation [8].


2   The Problem of Predicting the Impact

Fig. 1 shows the conceptualization of the impact prediction problem. A knowl-
edge graph K is a set of triples (s, p, o), where s and o are two resources connected
by a predicate/relation p. We define an evolving knowledge graph K as a sequence
(K1 , K2 , . . . , Kt , Kt+1 , . . .), where Kt denotes the KG at the time instant t. This
definition of evolving KG is similar to the one of ontology stream proposed by
Ren and Pan in [7]. Let Kt and Kt+1 be two consecutive versions of K. The
update of K between t and t + 1 is described by a set of changes δt , i.e., a set
of edits authored by one or more agents, such as ontology engineers or
maintenance bots.
    We define op(·) as a function which applies to a KG and produces a result
R. op(·) takes as arguments a KG and zero or more additional parameters if
necessary. When the operation op(·) is applied to K, it will create a sequence of
results R = (R1 , R2 , . . . , Rt , . . .), where Rt is the result of op(·) on Kt .
    Given Kt and Kt+1 , the respective results Rt and Rt+1 can be the same (if
the changes δt do not affect the result of op(·)), or they can differ. We model
this comparison through the function impactop (Rt , Rt+1 ), which represents the
impact that the evolution had on the results of op(·).
    Therefore, the goal of this research is to build an estimator of impactop (·)
that does not require the computation of all op(·) results. We denote the esti-
mator by impact̂op (Kt , δt ), since it takes as arguments a knowledge graph and
the set of changes leading to the new version. Moreover, op(·) should be considered
as well, since different operations may lead to different impact functions.
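As a minimal sketch of this setting (all names are illustrative and not from an actual implementation), a KG version can be modeled as a set of (s, p, o) triples, a change set δt as additions and deletions, and the exact impact computed by running op(·) on both versions:

```python
def apply_changes(kg, additions, deletions):
    """K_{t+1}: remove the deleted triples from K_t, then add the new ones."""
    return (kg - deletions) | additions

def impact(op, kg_t, additions, deletions):
    """Exact impact of delta_t on op: run op on both versions and compare.

    Here the comparison is the symmetric-difference size, assuming op
    returns a set; the paper leaves impact_op(R_t, R_{t+1})
    operation-specific.
    """
    kg_next = apply_changes(kg_t, additions, deletions)
    return len(op(kg_t) ^ op(kg_next))

kg = {("zurich", "locatedIn", "switzerland")}
delta_add = {("zurich", "type", "City")}
# With op as the identity, the impact is just the number of changed triples.
print(impact(lambda k: k, kg, delta_add, set()))  # -> 1
```

The estimator impact̂op avoids computing op(kg_next) at all, predicting the impact from kg_t and the change set alone.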

3     Case study

We discuss our model in a case study with embeddings as op(·).
Definition of Impact. To compare the embeddings of two snapshots, we use
neighborhoods: taking the 100 closest neighbors of a particular node, we count
how many of them are also in the neighborhood of the same node in the embedding
of the next version. The aggregation over the whole graph is then done via the
sum and the mean, producing two impact measures.
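This neighborhood-based impact can be sketched as follows, assuming the two snapshots are given as row-aligned numpy matrices; the Euclidean distance and all names here are our assumptions, not the authors' implementation:

```python
import numpy as np

def neighborhood_impact(emb_old, emb_new, k=100):
    """Per-node impact: how many of a node's k nearest neighbors in the
    old embedding are no longer among its k nearest in the new one.
    Returns the sum and mean aggregations over all nodes."""
    def knn(emb):
        # pairwise Euclidean distances, self excluded
        d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]

    old_nn, new_nn = knn(emb_old), knn(emb_new)
    per_node = np.array([
        k - len(set(old_nn[i]) & set(new_nn[i]))  # neighbors lost
        for i in range(emb_old.shape[0])
    ])
    return per_node.sum(), per_node.mean()
```

Identical snapshots yield an impact of zero; k = 100 matches the neighborhood size used above.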
Definition of Features.2 The network analysis community has established
various measures, such as degree centrality or clustering indicators; we further
calculate sparseness and graph entropy measures. To use these measures as
features, we take the simple difference between a measure's values in two versions
of the graph. Other features are based on change actions, where a change action
is an item in δ. We use the classification introduced by Hartung, Gross, and
Rahm [6] to identify high-level change actions such as move, merge, split, add,
and delete node. Subsequently, we calculate the degree, closeness, and betweenness
centrality of the affected nodes.
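A sketch of one such graph-measure feature, assuming degree centrality is computed directly on the triple set and aggregated by the mean (the aggregation is our illustrative choice, not the authors' exact pipeline):

```python
from collections import Counter

def degree_centrality(triples):
    """Degree centrality of each node in a KG given as (s, p, o) triples."""
    deg = Counter()
    for s, _, o in triples:
        deg[s] += 1
        deg[o] += 1
    n = len(deg)
    return {v: d / (n - 1) for v, d in deg.items()} if n > 1 else {}

def centrality_diff_feature(kg_old, kg_new):
    """Feature value: simple difference of the mean degree centrality
    between two consecutive KG versions."""
    def mean_dc(triples):
        dc = degree_centrality(triples)
        return sum(dc.values()) / len(dc) if dc else 0.0
    return mean_dc(kg_new) - mean_dc(kg_old)
```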
Data. We test the impacts and features defined above through experiments on
three datasets: Bear-B-instant (BB), the Gene Ontology (GO), and an anonymized
and de-identified WebProtege ontology (WP). The BB dataset was produced
using the 100 most volatile subjects of DBpedia [3]; in our experiments, we
use 3'923 datapoints. GO is a well-known ontology in the biomedical domain
and has been maintained by the Gene Ontology Consortium since 2000 [1]. It
has (only) 99 versions, producing 90 datapoints that we were able to use. Lastly,
we randomly selected a WebProtege ontology (WP) based on three criteria: it
(1) has at least 2'000 snapshots, (2) was edited by more than two users, and
(3) has less than 30'000 axioms.
The impact̂ estimator. We split the data into training and testing datasets,
where the training dataset contains 70% of the datapoints. For learning, we use
10-fold cross validation. The area under the receiver operating characteristic
curve (AUC; also called c-statistic) is calculated to compare the models after
testing. To calculate an AUC for regression, we determine a threshold on the
observed impact, turning the regression into a binary classification. Features were
selected using Pearson correlation (corr) and 10-fold Ridge regression (ridge).
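The thresholding step can be sketched as follows (pure Python, with the AUC computed via the rank-based Mann-Whitney formulation; the threshold value itself remains a modelling choice, as in the text):

```python
def auc_from_regression(y_true, y_score, threshold):
    """Binarize observed (continuous) impact via a threshold, then compute
    the AUC of the predicted scores: the probability that a randomly chosen
    positive is ranked above a randomly chosen negative (ties count 0.5)."""
    labels = [1 if y >= threshold else 0 for y in y_true]
    pos = [s for s, lab in zip(y_score, labels) if lab == 1]
    neg = [s for s, lab in zip(y_score, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need both classes after thresholding")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking of the binarized impact gives an AUC of 1.0.
print(auc_from_regression([0.1, 0.2, 0.8, 0.9],
                          [0.2, 0.3, 0.7, 0.9], threshold=0.5))  # -> 1.0
```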
Results. We test the performance of impact̂ by running three experiments on
the considered datasets and impact measures. Table 1 reports the AUC results
of the experiments. For the BB dataset, performance does not exceed an AUC
of 0.63. On the other hand, for WP, performance reaches 0.851. With GO, the
only dataset including change action features, the performance reaches 0.8 and
higher on multiple occasions.
2
    The complete list of the features can be found in the source code at
    http://tinyurl.com/y29t77v4


                         Table 1: Results of the experiments.
                   BB                      GO                      WP
              sum        mean         sum        mean         sum        mean
           corr  ridge corr  ridge corr  ridge corr  ridge corr  ridge corr  ridge
GLM       0.534 0.553 0.569 0.612 0.500 0.811 0.843 0.830 0.796 0.796 0.622 0.617
SVMLin    0.536 0.553 0.580 0.622 0.500 0.811 0.744 0.764 0.787 0.782 0.620 0.611
SVMRad    0.564 0.535 0.600 0.640 0.650 0.661 0.753 0.694 0.681 0.767 0.851 0.764
Features      4    15     4    12    21     9    17     7    13    10     2    10


4   Conclusions
Following the butterfly effect, a small change in a KG can lead to large differences
in operation results. This poster presents the problem setting and definition,
along with a case study using embeddings over three datasets and two impact
measures. The successful experiments show that it is possible to predict the
impact. We can therefore decide when a recomputation of the operation's result
is necessary, potentially saving valuable resources.
    This study triggers new questions. For instance, is it possible to construct a
general model spanning different datasets and operations? We need to gain a
deeper understanding of the common factors among datasets and operations,
which is why we plan to run further experiments in this direction.
    This study clearly shows that the investigation of KG evolution and the im-
pact of changes on KGs is fruitful and needs to be intensified.

References
1. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis,
   A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver,
   L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin,
   G.M., Sherlock, G.: Gene Ontology: Tool for the unification of biology. Nat.
   Genet. 25(1), 25–29 (2000)
2. Chen, J., Lecue, F., Pan, J.Z., Chen, H.: Learning from Ontology Streams with
   Semantic Concept Drift. In: IJCAI. pp. 957–963. ijcai.org (2017)
3. Fernandez, J.D., Umbrich, J., Polleres, A., Knuth, M.: Evaluating Query and Stor-
   age Strategies for RDF Archives. In: SEMANTICS. pp. 41–48. ACM (2016)
4. Gottron, T., Gottron, C.: Perplexity of Index Models over Evolving Linked Data.
   In: ESWC. LNCS, vol. 8465, pp. 161–175. Springer (2014)
5. Gross, A., Hartung, M., Prüfer, K., Kelso, J., Rahm, E.: Impact of ontology evolu-
   tion on functional analyses. Bioinformatics 28(20), 2671–2677 (2012)
6. Hartung, M., Gross, A., Rahm, E.: COnto-Diff: Generation of complex evolution
   mappings for life science ontologies. Journal of Biomedical Informatics 46(1), 15–32
   (Feb 2013)
7. Ren, Y., Pan, J.Z.: Optimising ontology stream reasoning with truth maintenance
   system. In: CIKM. pp. 831–836. ACM (2011)
8. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge Graph Embedding: A Survey
   of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 29(12), 2724–2743
   (2017)