<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/mnet.2004.1337732</article-id>
      <title-group>
        <article-title>Model for Temporal Link Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp Plamper</string-name>
          <email>philipp.plamper@hs-anhalt.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver J. Lechtenfeld</string-name>
          <email>oliver.lechtenfeld@ufz.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolf von Tümpling</string-name>
          <email>wolf.vontuempling@ufz.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anika Groß</string-name>
          <email>anika.gross@hs-anhalt.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Tokyo, Japan</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Anhalt University of Applied Sciences, Department Computer Science and Languages</institution>
          ,
          <addr-line>Köthen (Anhalt), 06366</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Helmholtz Centre for Environmental Research - UFZ, Central Laboratory for Water Analytics and Chemometrics</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Helmholtz Centre for Environmental Research - UFZ, Department of Analytical Chemistry</institution>
          ,
          <addr-line>Leipzig, 04318</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Magdeburg</institution>
          ,
          <addr-line>39114</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>30</volume>
      <fpage>24</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>Many systems can be intuitively modeled as knowledge graphs using entities and their relationships. However, we often have only partial or little knowledge of the inherent processes of complex, changing systems such as biomedical, economic or ecological systems. As a result, the construction of knowledge graphs often suffers from incompleteness, which can lead to inaccurate analysis results and incorrect conclusions. A widely used approach is to monitor and analyse complex changing systems using time series of measurements. To understand a complex temporal network of processes, it is crucial to identify inherent temporal relationships and interactions. A complete temporal knowledge graph model could provide a better foundation for applications in complex systems and increase its potential to add context and connections that allow uncovering hidden or unknown relationships in data. We propose a snapshot-based knowledge graph model and a temporal link prediction algorithm to find relationships between examined objects at successive time points of multivariate time series. We evaluate and demonstrate the functionality in an environmental chemistry use case and predict the transformations of molecules for two datasets. Our approach is able to discover previously unknown relationships in a snapshot-based knowledge graph, helping to better understand the dynamics of the examined system.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge representation and reasoning</kwd>
        <kwd>Graph-based database models</kwd>
        <kwd>Temporal data</kwd>
        <kwd>Network algorithms</kwd>
      </kwd-group>
      <conference>
        <conf-name>2nd International Workshop on Knowledge Graph Reasoning for Explainable Artificial Intelligence</conf-name>
        <conf-date>December 09, 2023</conf-date>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Understanding mechanics and dynamics in complex networks has become an active field
of research in recent years [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Since the introduction of knowledge graphs, the topic has
become even more prominent [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Despite the different naming conventions, the ideas behind
complex networks and knowledge graphs have much in common. Both use a graph-based
data model to represent a system and consist of nodes and edges describing entities and their
relationships. Additionally, the nodes and edges are enriched with descriptive properties [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5, 6</xref>
        ].
The representation of network-like data in the form of graphs is often used to model real-world
applications to capture knowledge from various domains and to integrate, manage and add
value to linked data sources [7, 8].
      </p>
      <p>Most real-world systems are subject to constant change and therefore should not be treated as
a static network [9]. In a graph, this means that nodes and edges can appear or disappear over
time and that properties can change, leading to a completely new topology and dynamics. In
many complex networks the transitions between the individual time points, i.e. the relationships,
are completely unknown and can hardly be determined by experts. Nevertheless, it is common
to record quantities experimentally as time series in order to better understand how, e.g.,
processes develop over time [10, 11].</p>
      <p>In most cases, the actual temporal relationships between individual time series cannot be
derived directly from the collected multivariate time series [12, 13]. In order to track temporal
developments in the data and to increase the accuracy of graph analyses, the underlying system
should store data over a longer period of time and assign temporal properties such as timestamps
[9].</p>
      <p>We here present a new snapshot-based model for temporal knowledge graphs to find so
far unknown relationships and interactions in complex systems. The prediction is based on
measured quantities for large numbers of objects from multivariate time series such as molecule
intensities in mass spectrometry experiments or vehicle counts in traffic monitoring. Our
graph structure inherently integrates temporal properties to model and predict directed edges
between entities in multivariate time series. The snapshot-based knowledge graph model and
temporal link prediction are particularly useful when no temporal edges are known in advance
and real-time monitoring is impossible or insufficient. Our approach is generic and could be
adapted to other complex changing systems, e.g. biological, economic or traffic networks.
The temporal graph approach allows us to better represent the dynamic nature of temporal
networks and to gain new insights into temporal dynamics based on graph analysis. We evaluate
our approach to show its ability to uncover unknown information about the analysed system.
Our main contributions are
1. A novel snapshot-based knowledge graph model to represent multivariate time series as
a temporal graph structure.
2. A temporal link prediction algorithm to identify directed temporal edges between nodes
across successive time points.
3. An evaluation of the approach using two datasets representing complex environmental
chemistry systems.</p>
      <p>The methods have been implemented in a graph database management system and can be
analysed visually and statistically. In [14] we first introduced a temporal graph model specifically
designed for molecules and chemical transformations. Compared to the previous publication,
we here generalize the data model to objects and temporal edges without specific domain focus.
Moreover, we here present the generalized methodology and actual link prediction algorithm
that were not the focus of the use case in [14]. We further add a new dataset to the evaluation.</p>
      <p>The paper is further organized as follows. We first discuss related work in Section 2. We
present the snapshot-based temporal knowledge graph model in Section 3 and the temporal link
prediction algorithm in Section 4. We then evaluate the model and algorithm in an environmental
chemistry use case in Section 5 and conclude in Section 6.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related work</title>
      <p>
        The property graph model represents a directed heterogeneous multirelational network with
properties at nodes and edges. It is one of the standard models for representing data with
graph-like structure as a knowledge graph [
        <xref ref-type="bibr" rid="ref2 ref3">15, 16, 2, 17, 3, 7</xref>
        ]. Despite the lack of a standard
formalization [18, 17, 19], several graph databases support the model because it is considered
comprehensive and easy to understand [18, 20, 15]. The property graph model was originally
developed for static graphs and therefore relies on static data, but many systems
contain time-varying data. Representing time-varying data in a static graph using
static graph tools and algorithms can lead to inaccurate or false analysis results [21, 9, 22].
The need for temporal knowledge graphs and the lack of a formal definition are reflected in
the different models proposed [23, 24, 25, 26, 27]. Temporal graph models can already be
found for instance in social graphs [23, 25], logistic graphs [26] or network graphs [24]. The
proposed temporal knowledge graph models can be roughly classified into duration-labeled,
interval-labeled or snapshot-based models [23].
      </p>
      <p>A central topic in graph analysis is the search for missing structures in an incomplete network.
This task is called link mining, or more precisely link prediction in the domain of complex networks [28]
and knowledge graph completion in the domain of knowledge graphs [29, 21]. Link prediction
algorithms use the existing structure given by the nodes and edges to try to infer missing
relationships based on properties and already observed relationships. The various approaches to
predict missing relationships in a temporal knowledge graph use, for example, graph embedding
[30], machine learning methods [31], similarity-based methods, or probabilistic methods [32].
The adapted algorithms aim to predict either missing links in an existing graph or
links that will occur in the future [28, 33]. Existing approaches usually try to predict
edges based on already known structures in the graph, but lack the ability to find edges when
relationships are mostly unknown in the first place.</p>
      <p>In many domains, multivariate time series are used to understand the underlying structure
of the data [10, 11]. Many of the systems under consideration have a graph-like structure
[13], but it is often difficult or impossible to capture the relationships between the individual
time series. Proposed approaches for analysing the relationships between time series include
transforming the time series into a multilayer visibility graph [34, 13] or into recurrence networks
[35, 36], or mapping them to a graph based on causality patterns [12]. These methods focus on
comprehensive analysis to find general correlations and patterns in a complex system, rather
than inferring actual transformations or interactions between thousands to millions of objects
undergoing unknown changes. Previous network-like approaches for multivariate time series
do not preserve the snapshots while enriching them with temporal edges. We are interested in the
relationships that take place between successive snapshots and lead to changes in the measured
values of the individual time series. Our approach introduces a new knowledge graph model
that represents multivariate time series with a graph-like structure to allow the tracking of
single time series and understanding influences on their development.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Snapshot-Based Knowledge Graph model</title>
      <sec id="sec-4-1">
        <title>3.1. Scenario</title>
        <p>Our snapshot-based knowledge graph model and temporal link prediction are based on
multivariate time series and are particularly useful when actual relationships representing transformations,
interactions or flows between nodes in succeeding snapshots are unknown, although there is
evidence that they exist.</p>
        <p>We here consider multivariate time series data, i.e. there is a time series for each considered
object of interest in a larger set of objects. We assume each discrete time point in each time
series contains some kind of measurable quantity per object. This quantity changes over time
and we expect the objects to be related to each other. Depending on the domain, the repeatedly
measured quantity can take many forms, e.g. the number of transferred goods or vehicles at
monitoring stations, measured intensities of molecules, concentrations of chemical compounds
or nutrients in a water treatment plant. Potential connections may be known in advance, but
the actual relationships are unknown. For instance,
• there are chemical rules about how molecules can be transformed into other molecules,
but we do not know what transformations have actually taken place,
• we know a network of roads, but we do not know which routes have actually been used
by vehicles.</p>
        <p>Applying our approach to the chemical scenario, the quantity value is based on the measured intensities
from the mass spectra. Identified molecules become object nodes in the graph with their quantities
as a property. The molecules of the first mass spectrum form the first snapshot (snapshot 0), the molecules
of the second mass spectrum form the second snapshot (snapshot 1) of the knowledge graph, and so
on. Between two successive mass spectra, the quantities and proportions of the molecules
change, indicating that chemical transformations have taken place between the molecules [37].
We will focus on this chemical scenario for examples in the remainder of this paper, but the
approach is applicable to other time series data with repeated measurements on large object
sets.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Graph Model</title>
        <p>We propose a snapshot-based knowledge graph model to represent multivariate time series as a
temporal graph structure. Our graph model is a temporal extension of the widely used standard
“Labeled Property Graph” model [15] and belongs to the group of the snapshot-based temporal
graphs, e.g. [24, 27]. It includes one graph (or snapshot) for each discrete time point. Snapshots
consist of object nodes that will be connected by temporal edges (see Section 4) and together
form the snapshot-based knowledge graph.</p>
        <p>Figure 2 shows our temporal knowledge graph model. The model contains nodes of one
type (‘Object’) and connects them by three different types of temporal edges (‘SAME_AS’,
‘POTENTIAL’, ‘PREDICTED’). All types of temporal edges in the model connect a node in
one snapshot to a node in a subsequent snapshot and never connect nodes within the same
snapshot, as illustrated in Figure 3 (C). An ‘Object’ node describes an entity such as a molecule
in a chemical network. The node can be identified by its name and snapshot properties, e.g. a
molecule’s formula and snapshot 0, which describes the object at the first time point in the data.
In addition, each node has a quantity property that describes a value that can change over time,
such as the measured intensity of a molecule in a sample.</p>
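        <p>To make the model concrete, it can be sketched as plain Python data classes. This is a minimal in-memory illustration under our own assumptions, not the implementation used in the evaluation (which stores the graph in a database, see Section 5.2): the class names ObjectNode and TemporalEdge are hypothetical, and we assume that temporal edges always lead into the directly following snapshot.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field

# The three temporal edge types of the model.
EDGE_TYPES = ("SAME_AS", "POTENTIAL", "PREDICTED")


@dataclass(frozen=True)
class ObjectNode:
    """An 'Object' node, uniquely identified by (name, snapshot)."""
    name: str        # e.g. a molecular formula
    snapshot: int    # 0 for the first time point, 1 for the second, ...
    quantity: float  # e.g. the measured intensity


@dataclass(frozen=True)
class TemporalEdge:
    """A directed temporal edge between nodes of successive snapshots."""
    source: ObjectNode
    target: ObjectNode
    kind: str
    properties: dict = field(default_factory=dict, compare=False, hash=False)

    def __post_init__(self):
        if self.kind not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.kind}")
        # Assumption: edges end in the directly following snapshot,
        # so edges inside one snapshot or backwards in time are rejected.
        if self.target.snapshot != self.source.snapshot + 1:
            raise ValueError("temporal edges must connect successive snapshots")
```
        </preformat>
        <p>For example, an edge of kind "SAME_AS" from ObjectNode("C10H12O5", 0, 100.0) to ObjectNode("C10H12O5", 1, 120.0) is accepted, whereas swapping the two nodes raises a ValueError because the edge would point backwards in time.</p>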
        <p>Initially, each discrete time point in the time series forms a snapshot, which means that the
same object can occur in several snapshots and is recognized by its name property. An ‘Object’ node can
be uniquely identified by the combination of its name property and snapshot. Snapshots can be
distinguished and ordered by the snapshot property at the ‘Object’ nodes. ‘Object’ nodes with
the same name property are connected across successive snapshots via the ‘SAME_AS’ edges.
Every ‘SAME_AS’ edge contains the relative change of the quantities of the successive nodes as
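property, which is termed quantity trend. The calculation is shown in Formula 1:
<disp-formula id="eq1"><label>(1)</label><tex-math>\mathit{quantity\_trend} = \frac{q_{\mathrm{succ}} - q_{\mathrm{prev}}}{q_{\mathrm{prev}}}</tex-math></disp-formula></p>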
        <p>It subtracts the quantity of the previous node (<inline-formula><tex-math>q_{\mathrm{prev}}</tex-math></inline-formula>) from the quantity of the successive
node (<inline-formula><tex-math>q_{\mathrm{succ}}</tex-math></inline-formula>) and divides the result by the quantity of the previous node.
A quantity trend is interpreted as increasing, decreasing or consistent,
which is important for our temporal link prediction algorithm (see Section 4). A quantity trend
of exactly 0 is considered consistent, a trend above 0, i.e. in (0, ∞), is considered increasing, and a trend below 0,
i.e. in (−∞, 0), is considered decreasing. Additionally, an error margin e can be applied around zero, i.e. the
interval [−e/2, +e/2], to widen the range considered consistent. The error margin accounts for
the inherent analytical uncertainties caused by instrumental variability (i.e., noise).</p>
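        <p>The quantity trend of Formula 1 and its three-way interpretation can be sketched as follows. This is a minimal Python sketch assuming positive quantities (the previous quantity must not be zero); the function names are hypothetical, and we assume the error margin e is applied as the band [−e/2, +e/2] around zero.</p>
        <preformat preformat-type="code">
```python
def quantity_trend(q_prev: float, q_succ: float) -> float:
    """Formula 1: relative change from the previous to the successive node."""
    if q_prev == 0:
        raise ValueError("the quantity of the previous node must be non-zero")
    return (q_succ - q_prev) / q_prev


def classify_trend(trend: float, e: float = 0.0) -> str:
    """Interpret a quantity trend with an error margin e between 0 and 1.

    Trends inside the band [-e/2, +e/2] count as consistent.
    """
    if trend > e / 2:
        return "increasing"
    if -e / 2 > trend:
        return "decreasing"
    return "consistent"
```
        </preformat>
        <p>With the 5 % error margin used in Section 5 (e = 0.05), a trend of 0.02 is classified as consistent, whereas the same trend with e = 0 counts as increasing.</p>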
        <p>The second edge type describes the potential transformations from an ‘Object’ node to another
node in a successive snapshot and is called a ‘POTENTIAL’ edge. ‘POTENTIAL’ edges must
be determined beforehand based on existing knowledge. For instance, domain experts
know basic rules on possible interactions or transformations, and for transport networks, maps
contain all known roads. However, the transformations or interactions that actually took place
or actual routes of transportation are unknown. In the chemical scenario, the ‘POTENTIAL’
transformations might be computed based on all chemically possible transformations of the
molecules in the considered use case (e.g. single reactions or chains of reactions).</p>
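        <p>As an illustration of how ‘POTENTIAL’ edges could be derived from rule knowledge in the chemical scenario, the following Python sketch treats each transformation as a change in the element counts of a molecular formula. The small rule set, the reading of a label such as “-C1 -O1” as the loss of one carbon and one oxygen atom, and all function names are assumptions for illustration; the 22 transformations actually used are defined in [14].</p>
        <preformat preformat-type="code">
```python
import re
from collections import Counter

# Hypothetical rules: label -> change in element counts.
RULES = {
    "-C1 -O1": {"C": -1, "O": -1},
    "O2": {"O": 2},
    "-S1": {"S": -1},
}


def composition(formula: str) -> Counter:
    """Parse a formula such as 'C10H12O5' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def apply_rule(counts: Counter, delta: dict) -> Counter:
    """Apply one rule and drop zero entries so compositions compare cleanly."""
    result = Counter(counts)
    for element, change in delta.items():
        result[element] += change
    return Counter({el: n for el, n in result.items() if n})


def potential_edges(names_t, names_t1, rules=RULES):
    """Yield (source, target, rule) for every rule that maps an object of
    snapshot t onto an object present in snapshot t+1."""
    targets = {frozenset(composition(n).items()): n for n in names_t1}
    for source in names_t:
        counts = composition(source)
        for label, delta in rules.items():
            key = frozenset(apply_rule(counts, delta).items())
            if key in targets:
                yield source, targets[key], label
```
        </preformat>
        <p>For example, C10H12O5 in snapshot t and C9H12O4 in snapshot t+1 are linked by the rule “-C1 -O1”. Comparing element compositions rather than formula strings keeps the match independent of the order in which elements are written.</p>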
        <p>The third type of edges describes the predicted transformations that are likely to occur and
are called ‘PREDICTED’ edges. They are a subset of the ‘POTENTIAL’ edges and indicate
the transformations that likely occurred out of all potential transformations. For instance, the
‘PREDICTED’ edges could describe the various chemical transformations that have actually taken
place in a molecule after a conditional experiment measured as multivariate time series. The
‘PREDICTED’ edges are computed using the ‘Transformation Prediction Algorithm’ described
in Section 4.</p>
        <p>Figure 3 illustrates the step-wise construction of the snapshot-based knowledge graph from
multivariate time series data for several objects. Each discrete time point in the time series
represents a snapshot of the graph and each ‘Object’ represents a node (see Figure 3 (A and
B)). The exemplary graph consists of three consecutive snapshots (snapshots 0, 1 and 2) and four ‘Object’ nodes
(green, A-D). Each snapshot has a different composition, i.e. a node can be present in all of the
snapshots or only in some of them. Figure 3 (C) illustrates the creation of new
edges between succeeding points in time. This includes the ‘SAME_AS’ edges between ‘Object’
nodes with the same name and the ‘POTENTIAL’ edges, which describe possible transformations. The edges
are directed and always end at a node in the successive snapshot, i.e. no edges exist within a
snapshot. Note that the snapshots do not necessarily represent equidistant points in time. The
same edge color denotes the same transformation (e.g., from snapshot 0 to snapshot 1 and from snapshot 1 to snapshot 2).</p>
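        <p>The construction steps of Figure 3 (without the ‘POTENTIAL’ edges) can be sketched in Python for a small multivariate time series. The sketch assumes that missing observations are marked with None and that quantities are positive; ‘SAME_AS’ edges are created only between directly successive snapshots, and their quantity trend follows Formula 1. The function name build_snapshots is hypothetical.</p>
        <preformat preformat-type="code">
```python
def build_snapshots(series):
    """Build 'Object' nodes and 'SAME_AS' edges from a multivariate time series.

    series: {object name: [quantity per time point, or None if not observed]}
    Returns (number of snapshots, nodes, same_as), where nodes maps
    (name, snapshot) to the quantity and same_as holds
    (source_key, target_key, quantity_trend) triples.
    """
    n_snapshots = max(len(values) for values in series.values())
    nodes = {}
    for name, values in series.items():
        for t, q in enumerate(values):
            if q is not None:  # the object is present in snapshot t
                nodes[(name, t)] = q

    same_as = []
    for (name, t), q in nodes.items():
        q_next = nodes.get((name, t + 1))
        if q_next is not None:  # same object in the directly following snapshot
            same_as.append(((name, t), (name, t + 1), (q_next - q) / q))
    return n_snapshots, nodes, same_as
```
        </preformat>
        <p>For the series {"A": [10, 12, 9], "B": [5, None, 6], "D": [None, 4, 4]}, the sketch creates seven nodes in three snapshots and three ‘SAME_AS’ edges; B receives no ‘SAME_AS’ edge because it is missing from snapshot 1.</p>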
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Temporal Link Prediction</title>
      <p>At this point, the snapshot-based knowledge graph contains all the necessary information to
describe temporal data as a complex network. However, snapshots are not interconnected via
temporal edges describing the transformations in the considered system. We developed the
‘Transformation Prediction Algorithm’ to identify the transformations that likely occurred, based
on the potential transformations and the known quantity trends (e.g. from measurements).
Knowing the direction of the transformations and assuming that a decrease in quantity in one
‘Object’ node is reflected by an increase in quantity in another ‘Object’ node and vice versa, it is
possible to predict the likely transformations responsible for the dynamics in the snapshot-based
knowledge graph. For instance, in a network of molecules, potential chemical transformations
can be specified by experts, and the intensity of the molecules in samples is measured by mass
spectrometry. When one molecule is transformed into another, the quantity of the first decreases while the quantity of the second increases.
The prediction is based on the following three considerations, illustrated in Figure 4:
1. Consideration 1: If the quantity of an ‘Object’ node x increases from snapshot t to snapshot t+1
(i.e., from x<sub>t</sub> to x<sub>t+1</sub>), the transformations from other nodes in snapshot t into x<sub>t+1</sub> are
assumed to be relevant, and the transformations from x<sub>t</sub> into other nodes in snapshot t+1
are assumed to be non-relevant. Therefore, outgoing potential transformations of
the node x<sub>t</sub> are excluded and incoming potential transformations of the node x<sub>t+1</sub> are included in the
consideration of the ‘PREDICTED’ edges.
2. Consideration 2: If the quantity of an ‘Object’ node x decreases from snapshot t to snapshot t+1,
the transformations from other nodes in snapshot t into x<sub>t+1</sub> are
assumed to be non-relevant, and the transformations from x<sub>t</sub> into other nodes in snapshot
t+1 are assumed to be relevant. Therefore, outgoing potential transformations of
the node x<sub>t</sub> are included and incoming potential transformations of the node x<sub>t+1</sub> are excluded in the
consideration of the ‘PREDICTED’ edges.
3. Consideration 3: If the quantity of an ‘Object’ node x is consistent from snapshot t to snapshot t+1, the
node has a balanced number of incoming and outgoing transformations. Therefore, all outgoing potential transformations of
x<sub>t</sub> and all incoming potential transformations of x<sub>t+1</sub> are excluded.
This means that a node in snapshot t and a node in snapshot t+1 can be connected by a
‘PREDICTED’ edge only if both nodes have opposite quantity trends, i.e. the quantity of the start node
decreases while the quantity of the end node increases.</p>
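      <p>The three considerations can be written as a small lookup table stating, for each quantity trend, whether outgoing and incoming potential transformations remain in consideration. The Python sketch below (names hypothetical) combines the two entries for a single ‘POTENTIAL’ edge and confirms that only a decreasing start node and an increasing end node survive.</p>
      <preformat preformat-type="code">
```python
# Per quantity trend of a node x between snapshots t and t+1:
# (outgoing potential transformations of x_t kept?,
#  incoming potential transformations of x_t+1 kept?)
CONSIDERATIONS = {
    "increasing": (False, True),   # Consideration 1
    "decreasing": (True, False),   # Consideration 2
    "consistent": (False, False),  # Consideration 3
}


def is_candidate(source_trend: str, target_trend: str) -> bool:
    """Can a 'POTENTIAL' edge from u_t to v_t+1 become a 'PREDICTED' edge?

    source_trend: trend of u from t to t+1; target_trend: trend of v from t to t+1.
    """
    outgoing_kept = CONSIDERATIONS[source_trend][0]
    incoming_kept = CONSIDERATIONS[target_trend][1]
    return outgoing_kept and incoming_kept
```
      </preformat>
      <p>Enumerating all nine combinations of start and end trends yields exactly one admissible pair, (decreasing, increasing), which is the "opposite quantity trends" condition stated above.</p>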
      <p>Algorithm 1 describes the ‘Transformation Prediction Algorithm’ for creating the
‘PREDICTED’ edges in the snapshot-based knowledge graph. The input is a graph G consisting of
nodes and edges, together with the defined error margin e (line 1). As output, we generate
the graph G enriched with the predicted, likely occurring transformations, i.e. the new ‘PREDICTED’ edges (line 2).
To increase the readability of the pseudo code, we define two variables for the ‘POTENTIAL’ and ‘SAME_AS’
edge types (lines 4-5). The error margin e represents the uncertainty in the measured quantity
in the data and is set to a value between 0 and 1 (line 6) (e.g., 0.05 for 5 %, or 0 to neglect the error
margin). A quantity trend above +e/2 is considered increasing, while a trend below
−e/2 is considered decreasing. For each node n in graph G (line 7), we check the quantity
trend of the current node n from the preceding snapshot to the current snapshot
along its ‘SAME_AS’ edge (lines 8-9) and continue only if this quantity trend is
considered increasing (line 10) (see also step 1 in Figure 5). We then collect all nodes n′ with
a ‘POTENTIAL’ edge to the current node n (lines 11-12). For each collected node n′
(line 13), we check the quantity trend from its current snapshot to the succeeding
snapshot along its ‘SAME_AS’ edge (lines 14-15) (see also step 2 in Figure 5). If this
quantity trend is considered decreasing (line 16), we add a ‘PREDICTED’ edge
from the node n′ to the currently observed node n (line 17) (see
also step 3 in Figure 5). The process is repeated until all nodes n have been visited (line 7). As a
result, we return the graph G with all added ‘PREDICTED’ edges representing the likely occurring transformations (line 18).</p>
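      <p>The description of Algorithm 1 can be turned into a compact in-memory Python sketch. It is not the database implementation used in the evaluation: nodes are a dictionary keyed by (name, snapshot), ‘POTENTIAL’ edges are (source, target) key pairs, the ‘SAME_AS’ relationship is implied by equal names in successive snapshots, quantities are assumed positive, and nodes without a predecessor in the previous snapshot are skipped. Function names are hypothetical; comments refer to the lines of Algorithm 1 as described above.</p>
      <preformat preformat-type="code">
```python
def classify_trend(trend, e):
    """Increasing above +e/2, decreasing below -e/2, otherwise consistent."""
    if trend > e / 2:
        return "increasing"
    if -e / 2 > trend:
        return "decreasing"
    return "consistent"


def trend_between(nodes, name, t, e):
    """Trend class along the implied 'SAME_AS' edge from (name, t) to
    (name, t + 1), or None if either node is missing."""
    q_prev, q_succ = nodes.get((name, t)), nodes.get((name, t + 1))
    if q_prev is None or q_succ is None:
        return None
    return classify_trend((q_succ - q_prev) / q_prev, e)


def predict_transformations(nodes, potential, e=0.05):
    """Sketch of Algorithm 1.

    nodes:     {(name, snapshot): quantity}, quantities assumed positive
    potential: (source_key, target_key) pairs, each from snapshot t to t + 1
    returns:   list of 'PREDICTED' edges as (source_key, target_key)
    """
    incoming = {}  # target node key -> source keys of its 'POTENTIAL' edges
    for src, tgt in potential:
        incoming.setdefault(tgt, []).append(src)

    predicted = []
    for name, t in nodes:  # line 7: visit every node n
        # lines 8-10: keep n only if it increased since the previous snapshot
        if trend_between(nodes, name, t - 1, e) != "increasing":
            continue
        # lines 11-13: every n' with a 'POTENTIAL' edge into n
        for src in incoming.get((name, t), []):
            src_name, src_t = src
            # lines 14-16: n' must decrease towards the next snapshot
            if trend_between(nodes, src_name, src_t, e) == "decreasing":
                predicted.append((src, (name, t)))  # line 17
    return predicted
```
      </preformat>
      <p>Grouping the ‘POTENTIAL’ edges by their end node once, before the main loop, keeps the work per visited node proportional to its number of incoming candidate edges instead of rescanning all edges for every node.</p>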
      <p>We further assign each ‘PREDICTED’ edge a weight as an edge property. The weight indicates which
‘PREDICTED’ edges contribute most to the increase in the quantity of a node. For
instance, if a node has two incoming ‘PREDICTED’ edges describing the increase in quantity,
the edge with the higher weight has the greater influence on the increase. The weight
is calculated during link prediction according to Formula 2:
<disp-formula id="eq2"><label>(2)</label><tex-math>\mathit{weight} = q_{\mathrm{start}} \cdot \lvert \mathit{quantity\_trend}_{\mathrm{start}} \rvert</tex-math></disp-formula>
The formula for calculating the weight requires two factors. The first factor is the quantity
of the start node of the ‘PREDICTED’ edge, i.e. the node with a decreasing quantity trend
(<inline-formula><tex-math>q_{\mathrm{start}}</tex-math></inline-formula>). The second factor is the absolute value of the quantity trend at the ‘SAME_AS’ edge of
the same node (<inline-formula><tex-math>\lvert \mathit{quantity\_trend}_{\mathrm{start}} \rvert</tex-math></inline-formula>). Note that the weight of the edge increases with a high quantity
trend and a high quantity. Each weight is further normalized based on the weights of all
incoming edges of the end node.</p>
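      <p>The weighting can be sketched in Python as follows, assuming that Formula 2 multiplies the two factors named above and that normalization divides each weight by the sum of the weights of all ‘PREDICTED’ edges ending in the same node. Both readings, the data layout (nodes keyed by (name, snapshot), edges as pairs of such keys) and the function name are assumptions.</p>
      <preformat preformat-type="code">
```python
def edge_weights(predicted, nodes):
    """Weight 'PREDICTED' edges and normalize them per end node.

    Assumed reading of Formula 2: weight = q_start * |quantity_trend_start|,
    where the trend is taken along the start node's 'SAME_AS' edge.
    nodes: {(name, snapshot): quantity}; predicted: (start_key, end_key) pairs.
    """
    raw = {}
    for start, end in predicted:
        name, t = start
        q_start, q_next = nodes[start], nodes[(name, t + 1)]
        trend = (q_next - q_start) / q_start
        raw[(start, end)] = q_start * abs(trend)

    # Normalize by the summed weights of all edges ending in the same node.
    totals = {}
    for (start, end), w in raw.items():
        totals[end] = totals.get(end, 0.0) + w
    return {edge: w / totals[edge[1]] for edge, w in raw.items()}
```
      </preformat>
      <p>Under this reading, the product of a positive quantity and the absolute quantity trend equals the absolute quantity change of the start node, so the start node that loses more quantity receives the larger share of the normalized weight.</p>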
    </sec>
    <sec id="sec-6">
      <title>5. Evaluation</title>
      <p>We evaluate our graph model and algorithm based on two data sets from environmental
chemistry. One of the data sets has been previously published and discussed in more detail [37, 14].
A qualitative evaluation of our approach within the considered domain is difficult because the
considered system is complex and the processes that take place are largely unknown. Moreover,
there is currently no comparable approach to track the transformations that are likely to occur.
However, a test with a simplified system, i.e. a chemical model with known reactions, demonstrated
the general suitability of the model [14]. The suggestions generated by our approach cannot
be easily verified by straightforward labeling. Rather, the predicted transformations support the
planning of new experiments by providing a novel perspective on the complex system and
filling gaps between previously unconnected snapshot experiments.</p>
      <sec id="sec-6-1">
        <title>5.1. Datasets</title>
        <p>So-called dissolved organic matter (DOM) plays an important role in freshwater, e.g., it alters
available light for photosynthesis and must be removed during drinking water production
[38]. DOM is a complex system with largely unknown processes and chemical relationships.
Therefore, the investigation of DOM is a suitable use case that can benefit from our approach. The
extreme chemical diversity and the variety of reaction pathways make it difficult to identify the actual
transformations. Ultra-high resolution mass spectrometry (UHRMS) can be used to qualitatively and
quantitatively measure the composition of DOM. The molecules and their quantities measured
over multiple time points form the data basis of the snapshot-based knowledge graph. Figure 1
shows an example of the structure of data generated by UHRMS on DOM over multiple time points.
The first dataset considered was generated
from samples of a drinking water reservoir inflow. Within the samples, molecules were
determined over 13 time points using natural sunlight to induce photochemical transformations.
The second dataset originates from samples of a wastewater treatment plant outflow, in which
the molecules were determined over 8 time points. Here, photochemical transformations were
induced by an artificial light source.</p>
      </sec>
      <sec>
        <title>5.2. Setup</title>
        <p>We use Neo4j<sup>1</sup> as the graph database management system together with its query language
Cypher to create the snapshot-based knowledge graph. Besides manipulating the graph structure
(insert, update, delete), users of the graph database can describe patterns to interactively select
matching parts in the graph. Both datasets were transformed into nodes and properties according
to the snapshot-based knowledge graph model and loaded into the graph database. Programming
and preprocessing were done in Python using the neo4j library, an official Neo4j driver
that serves as an interface between the Neo4j graph database and Python. Both datasets were
processed with the same configuration and hardware, using an error margin of 5 %. The
open source project is publicly available on GitHub<sup>2</sup>.</p>
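        <p>Loading a snapshot-based graph into Neo4j with the official Python driver could look roughly as follows. The Cypher statements and the property names (name, snapshot, quantity, quantity_trend) mirror Section 3 but are our assumptions, not the schema of the published repository; the function names are hypothetical, and the driver object is expected to come from neo4j.GraphDatabase.driver with connection details of your own.</p>
        <preformat preformat-type="code">
```python
# Assumed schema (mirrors Section 3, not taken from the published repository):
# (:Object {name, snapshot, quantity})-[:SAME_AS {quantity_trend}]->(:Object)
CREATE_NODES = """
UNWIND $rows AS row
CREATE (:Object {name: row.name, snapshot: row.snapshot, quantity: row.quantity})
"""

# Connect nodes with the same name in directly successive snapshots and
# store the quantity trend of Formula 1 on the edge.
CREATE_SAME_AS = """
MATCH (a:Object), (b:Object)
WHERE b.name = a.name AND b.snapshot = a.snapshot + 1
CREATE (a)-[:SAME_AS {quantity_trend: (b.quantity - a.quantity) / toFloat(a.quantity)}]->(b)
"""


def node_rows(series):
    """Turn {name: [quantity per time point, or None if unobserved]} into
    one parameter row per 'Object' node."""
    return [
        {"name": name, "snapshot": t, "quantity": float(q)}
        for name, values in series.items()
        for t, q in enumerate(values)
        if q is not None
    ]


def load(driver, series):
    """Write nodes and 'SAME_AS' edges through an open neo4j driver."""
    with driver.session() as session:
        session.run(CREATE_NODES, {"rows": node_rows(series)}).consume()
        session.run(CREATE_SAME_AS).consume()
```
        </preformat>
        <p>A driver would typically be created with GraphDatabase.driver(uri, auth=(user, password)) from the neo4j package, where the URI and credentials are placeholders. The two-pattern MATCH in CREATE_SAME_AS keeps the sketch short but may be slow on large graphs.</p>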
      </sec>
      <sec id="sec-6-2">
        <title>5.3. Results and Discussion</title>
        <p>A snapshot-based knowledge graph is created from each of the two datasets. The nodes in
the graphs describe the molecules identified at a particular discrete time in the mass spectrum.
Each node has several properties, e.g. the chemical formula, the atoms it consists of (i.e.,
C, H, O, N, S) and the quantity. Between successive snapshots, molecules with the same chemical
formula are connected by ‘SAME_AS’ edges, which are enriched with the quantity trend.
The ‘POTENTIAL’ edges represent the potential chemical transformations between molecules of
successive snapshots and are based on the 22 different photochemical transformations considered
[14]. The ‘PREDICTED’ edges describe the likely occurring chemical transformations calculated
with the ‘Transformation Prediction Algorithm’ (see Algorithm 1). Note that in the current
use-case of mass spectrometry data, the algorithm only provides relative trends of molecule
abundances, not actual amounts changed. Figure 6 shows parts of the first three snapshots of
the created graph based on dataset 1. Before using the ‘Transformation Prediction Algorithm’,
it was not possible to determine the ‘PREDICTED’ edges drawn in yellow. Those novel edges
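enable new analyses and evaluations on the datasets.</p>
        <p><sup>1</sup> <ext-link ext-link-type="uri" xlink:href="https://neo4j.com/">https://neo4j.com/</ext-link> <sup>2</sup> <ext-link ext-link-type="uri" xlink:href="https://github.com/PhPlam/tegrom">https://github.com/PhPlam/tegrom</ext-link></p>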
        <p>Table 1 compares the nodes and edges of the generated snapshot-based knowledge graphs
from the two datasets. Dataset 1 has 13 snapshots while dataset 2 has 8 snapshots matching
the number of time points in the datasets. Dataset 2 has on average 36 % more nodes
per snapshot than dataset 1 and also a 27 % higher maximum number of nodes in a snapshot.
In contrast, the minimum number of nodes in a snapshot is 25 % lower in dataset 2 than in
dataset 1, reflecting a more pronounced effect of the artificial irradiation. Dataset 2 has on
average 36 % more ‘SAME_AS’ edges than dataset 1. The maximum (minimum) number of
‘SAME_AS’ edges between two snapshots is about 26 % higher (18 % lower) in dataset 2 than in
dataset 1. We observe all three possible quantity trends along ‘SAME_AS’ edges. Averaged
over all snapshots, dataset 1 (2) has 32 % (46 %) increasing, 21 % (14 %) consistent and 47 %
(40 %) decreasing quantity trends. The differences in the fractions of increasing and decreasing
quantity trends match the different maximum and minimum node counts and are consistent
with the stronger photochemical transformations induced by the artificial light (dataset 2) as
compared to the natural light (dataset 1).</p>
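        <p>Averaged trend fractions like those reported above can be computed from the trend classes of the ‘SAME_AS’ edges. The Python sketch below assumes a macro average, i.e. the shares are first computed per pair of successive snapshots and then averaged over all pairs; whether the reported values were averaged this way or pooled over all ‘SAME_AS’ edges is not stated, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter

CLASSES = ("increasing", "consistent", "decreasing")


def average_trend_shares(transitions):
    """Macro-averaged share (in percent) of each trend class.

    transitions: one list per pair of successive snapshots, holding the
    trend class of every 'SAME_AS' edge between those two snapshots.
    """
    totals = Counter()
    for trends in transitions:
        counts = Counter(trends)
        for c in CLASSES:
            # share of this class within one transition
            totals[c] += counts[c] / len(trends)
    # average the per-transition shares over all transitions
    return {c: 100 * totals[c] / len(transitions) for c in CLASSES}
```
        </preformat>
        <p>For two transitions with the classes [increasing, decreasing] and [increasing, increasing, consistent, decreasing], the sketch returns 50 % increasing, 12.5 % consistent and 37.5 % decreasing.</p>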
        <p>Using the ‘Transformation Prediction Algorithm’, we are able to identify novel, previously
unknown ‘PREDICTED’ edges for both datasets (Table 1, bottom section). On average, dataset 2 has
about 2.6 times as many ‘PREDICTED’ edges as dataset 1. The maximum (minimum)
number of ‘PREDICTED’ edges between two snapshots is about 2.4 (1.4) times higher in dataset
2 than in dataset 1. Note that we consider a closed system, i.e. no quantities enter the system from outside
or leave it, which makes the predicted edges particularly meaningful. The model results
correspond to the expected behavior of molecules in the two datasets and will be analysed in
more detail below.</p>
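        <p>The following minimal Python sketch shows one simplified way in which candidate ‘PREDICTED’ edges between two consecutive snapshots could be derived from a list of transformations such as those discussed below. It is not the authors’ ‘Transformation Prediction Algorithm’: for illustration only, it assumes that an edge links a formula whose quantity decreases to a formula whose quantity increases whenever their elemental difference equals one of the given transformations. Under the closed-system assumption, a decrease cannot be explained by a loss to the outside, which makes pairing decreasing sources with increasing targets a natural starting point. All function names are hypothetical, and edge weights are omitted.</p>
        <preformat preformat-type="python">
```python
import re
from typing import Dict, List, Tuple

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")             # e.g. "C10H12O5"
TRANSFORMATION_TOKEN = re.compile(r"(-?)([A-Z][a-z]?)(\d+)")  # e.g. "-C1 -O1"


def parse_formula(formula: str) -> Dict[str, int]:
    """Element counts of a molecular formula, e.g. {'C': 10, 'H': 12, 'O': 5}."""
    counts: Dict[str, int] = {}
    for element, n in FORMULA_TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def parse_transformation(label: str) -> Dict[str, int]:
    """Element changes of a transformation label, e.g. '-C1 -O1' -> {'C': -1, 'O': -1}."""
    delta: Dict[str, int] = {}
    for sign, element, n in TRANSFORMATION_TOKEN.findall(label):
        delta[element] = delta.get(element, 0) + (-int(n) if sign else int(n))
    return delta


def formula_difference(source: str, target: str) -> Dict[str, int]:
    """Non-zero element changes needed to turn `source` into `target`."""
    a, b = parse_formula(source), parse_formula(target)
    diff = {e: b.get(e, 0) - a.get(e, 0) for e in set(a) | set(b)}
    return {e: d for e, d in diff.items() if d != 0}


def candidate_edges(before: Dict[str, float], after: Dict[str, float],
                    transformations: List[str]) -> List[Tuple[str, str, str]]:
    """(source, target, transformation) candidates between snapshot t and t + 1.

    Assumptions of this sketch: sources lose quantity, targets gain quantity,
    and a formula missing from a snapshot has quantity 0.
    """
    deltas = {label: parse_transformation(label) for label in transformations}
    sources = [f for f in before if before[f] > after.get(f, 0.0)]
    targets = [f for f in after if after[f] > before.get(f, 0.0)]
    edges = []
    for src in sources:
        for dst in targets:
            diff = formula_difference(src, dst)
            edges.extend((src, dst, label) for label, delta in deltas.items() if diff == delta)
    return edges


before = {"C10H12O5": 100.0, "C9H12O4": 20.0}
after = {"C10H12O5": 60.0, "C9H12O4": 35.0, "C10H14O6": 10.0}
print(candidate_edges(before, after, ["-C1 -O1", "H2 O1"]))
# [('C10H12O5', 'C9H12O4', '-C1 -O1'), ('C10H12O5', 'C10H14O6', 'H2 O1')]
```
        </preformat>
        <p>In the example, only ‘C10H12O5’ loses quantity, and two of the listed transformations connect it to formulas that gain quantity. A complete method would additionally have to resolve competing candidates and assign edge weights, which this sketch does not attempt.</p>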
        <p>Each of the previously unknown ‘PREDICTED’ edges reflects one of the 22 different
photochemical transformations considered, which the ‘Transformation Prediction
Algorithm’ identifies as likely to have occurred. Figure 7 shows the average share of these
transformations in both datasets. For instance, in the sample of dataset 1 the
transformations “-C1 -O1”, “-C1” and “H2 O1” occurred most frequently, while in the sample of
dataset 2 the transformations “O2”, “-C4 -H4 -O1” and “-C2 -H2” occurred most frequently.
The comparison of the two datasets reveals that the distribution of the transformations
differs between the two samples and experiments. For instance, the share of “-S1” is much higher
(by about 606 %) in dataset 2 than in dataset 1. This is expected, since dataset 2 originates
from a wastewater treatment plant outflow, which is expected to contain more sulfur (S)-containing
surfactants (tensides) than dataset 1, which originates from a pristine drinking water reservoir
inflow. A higher share of transformations involving sulfur, i.e. “-O3 -S1”, “-S1” and “-O1 -S1”, is
thus expected, reflecting pronounced degradation of these surfactants over time. This example
shows the basic ability of the ‘Transformation Prediction Algorithm’ to make reasonable predictions
of the transformations likely occurring within the sample.</p>
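        <p>How average transformation shares and a relative difference such as “about 606 %” can be computed is illustrated by the following minimal Python sketch. It assumes that the share of a transformation is its fraction of all ‘PREDICTED’ edges between one pair of consecutive snapshots, averaged over all pairs, and that a transformation involves sulfur if its label contains the element symbol S. The function names and the example edge list are hypothetical and do not reproduce the data behind Figure 7.</p>
        <preformat preformat-type="python">
```python
import re
from collections import Counter
from typing import Dict, List, Tuple

ELEMENT = re.compile(r"[A-Z][a-z]?")  # element symbols in a label such as "-O3 -S1"


def average_shares(predicted: List[Tuple[int, str]]) -> Dict[str, float]:
    """Share of each transformation per snapshot pair t, averaged over all pairs.

    `predicted` holds one (t, transformation label) entry per PREDICTED edge.
    """
    per_pair: Dict[int, Counter] = {}
    for t, label in predicted:
        per_pair.setdefault(t, Counter())[label] += 1
    labels = sorted({label for _, label in predicted})
    shares = {label: 0.0 for label in labels}
    for counts in per_pair.values():
        total = sum(counts.values())
        for label in labels:
            shares[label] += counts[label] / total / len(per_pair)
    return shares


def involves(element: str, label: str) -> bool:
    """True if the transformation label changes the given element."""
    return element in ELEMENT.findall(label)


def relative_difference_percent(reference: float, other: float) -> float:
    """How much higher `other` is than `reference`, in percent."""
    return (other / reference - 1.0) * 100.0


predicted = [(0, "-S1"), (0, "O2"), (1, "-O3 -S1"), (1, "O2"), (1, "O2"), (1, "-C2 -H2")]
shares = average_shares(predicted)
sulfur_share = sum(s for label, s in shares.items() if involves("S", label))
print(shares["O2"], sulfur_share)                   # 0.5 0.375
print(round(relative_difference_percent(0.1, 0.7)))  # 600
```
        </preformat>
        <p>In these terms, a share that is about 7.06 times as large in dataset 2 as in dataset 1 is about 606 % higher, since 7.06 − 1 = 6.06.</p>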
        <p>The snapshot-based knowledge graph allows instant evaluation of the previously unknown
chemical transformations and easy access to trends and time courses through interactive queries
in the graph database. The result of the database query in Figure 8 shows the evolution of a
molecule (purple nodes) over time. Blue arrows represent an increasing quantity, red arrows
a decreasing quantity, and gray arrows a constant quantity. The incoming
and outgoing weighted yellow arrows represent the predicted chemical transformations. The
example illustrates how further interactive queries can be built to capture the structure and
relationships within the data.</p>
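        <p>The kind of traversal shown in Figure 8 can be mimicked outside the graph database with the following minimal Python sketch. It is not the database query behind Figure 8; it assumes a hypothetical in-memory representation in which <monospace>quantities[t]</monospace> maps each formula in snapshot t to its quantity and each ‘PREDICTED’ edge is stored as a tuple (t, source, target, transformation, weight) linking snapshot t to snapshot t + 1. The weights in the example are made-up numbers for illustration only.</p>
        <preformat preformat-type="python">
```python
from typing import Dict, List, Optional, Tuple

Quantities = List[Dict[str, float]]                 # quantities[t][formula]
Predicted = List[Tuple[int, str, str, str, float]]  # (t, source, target, transformation, weight)


def trend(before: float, after: float) -> str:
    """Quantity trend along a SAME_AS edge (arrow colours as in Figure 8)."""
    if after > before:
        return "increase"  # blue
    if before > after:
        return "decrease"  # red
    return "constant"      # gray


def molecule_history(formula: str, quantities: Quantities, predicted: Predicted) -> List[dict]:
    """Follow one formula through all snapshots, collecting its SAME_AS trend and
    its incoming and outgoing PREDICTED edges in each snapshot."""
    history = []
    for t, snapshot in enumerate(quantities):
        if formula not in snapshot:
            continue  # no node for this formula in snapshot t
        nxt = quantities[t + 1] if len(quantities) > t + 1 else {}
        same_as: Optional[str] = trend(snapshot[formula], nxt[formula]) if formula in nxt else None
        history.append({
            "t": t,
            "quantity": snapshot[formula],
            "same_as": same_as,
            # PREDICTED edges from snapshot t - 1 that end in this node
            "incoming": [(s, lab, w) for (k, s, d, lab, w) in predicted if d == formula and k + 1 == t],
            # PREDICTED edges from this node into snapshot t + 1
            "outgoing": [(d, lab, w) for (k, s, d, lab, w) in predicted if s == formula and k == t],
        })
    return history


quantities = [
    {"C10H12O5": 100.0, "C11H12O6": 40.0},
    {"C10H12O5": 60.0, "C9H12O4": 35.0, "C11H12O6": 30.0},
    {"C10H12O5": 60.0},
]
predicted = [
    (0, "C11H12O6", "C10H12O5", "-C1 -O1", 0.5),
    (0, "C10H12O5", "C9H12O4", "-C1 -O1", 0.8),
]
for step in molecule_history("C10H12O5", quantities, predicted):
    print(step["t"], step["same_as"], step["incoming"], step["outgoing"])
```
        </preformat>
        <p>For ‘C10H12O5’, the sketch reports a decreasing quantity from snapshot 0 to 1 with one outgoing edge, one incoming edge in snapshot 1, and a constant quantity from snapshot 1 to 2. In a graph database, the same information would be retrieved by a single pattern query over ‘SAME_AS’ and ‘PREDICTED’ relationships instead of list scans.</p>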
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>We presented a novel snapshot-based knowledge graph model that transforms
multivariate time series data into a feature-rich, complex and highly connected network. The nodes and
edges store not only static but also time-dependent properties. Starting from a set of potential
transformations, as they are inherent in many domains such as complex chemical networks, our
temporal link prediction algorithm computes new edges between nodes in consecutive snapshots
that represent the transformations likely to have occurred. We evaluated the snapshot-based
knowledge graph model and the temporal link prediction algorithm in an environmental chemistry
use case and demonstrated their ability to represent graph-like data as a complex network
and to identify previously unknown relationships.</p>
      <p>The temporal graph model presented in this study is not limited to specific applications
and can be readily extended to other systems and processes that benefit from the enrichment
with temporal edges. To apply and comparatively evaluate the model and algorithms in other
domains, we will develop a benchmark including gold standard datasets for link
prediction in different complex networks. Moreover, we will test different approaches, such
as answer set programming, logical reasoning and graph neural networks, and compare them
with the current approach. By modeling large temporal datasets from multivariate time series
as temporal graphs, further graph-based algorithms can be used to perform comprehensive
analyses. Future extensions of the approach could incorporate temporal clustering to analyse
temporal patterns and dynamics of recognized communities.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank Alexander Sobolev and Christin Wilske for conducting the photodegradation
experiments and collecting samples. We also thank Jan Kaesler for performing FT-ICR-MS
measurements and Peter Herzsprung for supporting data processing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics 74 (2002) 47–97. doi:10.1103/revmodphys.74.47.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Hogan, E. Blomqvist, M. Cochez, C. d'Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, A.-C. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Computing Surveys 54 (2021) 1–37. doi:10.1145/3447772.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Hofer, D. Obraczka, A. Saeedi, H. Köpcke, E. Rahm, Construction of knowledge graphs: State and challenges, arXiv preprint arXiv:2302.11509 (2023). doi:10.48550/ARXIV.2302.11509.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] N. Noy, Y. Gao, A. Jain, A. Narayanan, A. Patterson, J. Taylor, Industry-scale knowledge graphs: Lessons and challenges, Queue 17 (2019) 48–75. doi:10.1145/3329781.3332266.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] L. da F. Costa, F. A. Rodrigues, G. Travieso, P. R. V. Boas, Characterization of complex networks: A survey of measurements, Advances in Physics 56 (2007) 167–242. doi:10.1080/00018730601170527.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics Reports 424 (2006) 175–308. doi:10.1016/j.physrep.2005.10.009.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] S. Sakr, A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. Aref, M. Arenas, M. Besta, P. A. Boncz, K. Daudjee, E. D. Valle, S. Dumbrava, O. Hartig, B. Haslhofer, T. Hegeman, J. Hidders, K. Hose, A. Iamnitchi, V. Kalavri, H. Kapp, W. Martens, M. T. Özsu, E. Peukert, S. Plantikow, M. Ragab, M. R. Ripeanu, S. Salihoglu, C. Schulz, P. Selmer, J. F. Sequeda, J. Shinavier, G. Szárnyas, R. Tommasini, A. Tumeo, A. Uta, A. L. Varbanescu, H.-Y. Wu, N. Yakovets, D. Yan, E. Yoneki, The future is big graphs, Communications of the ACM 64 (2021) 62–71. doi:10.1145/3434642.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] X. Zou, A survey on application of knowledge graph, Journal of Physics: Conference Series 1487 (2020) 012016. doi:10.1088/1742-6596/1487/1/012016.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] P. Holme, Modern temporal network theory: a colloquium, The European Physical Journal B 88 (2015). doi:10.1140/epjb/e2015-60657-4.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] R. H. Shumway, D. S. Stoffer, Time Series Analysis and Its Applications, Springer International Publishing, 2017. doi:10.1007/978-3-319-52452-8.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Y. Zou, R. V. Donner, N. Marwan, J. F. Donges, J. Kurths, Complex network approaches to nonlinear time series analysis, Physics Reports 787 (2019) 1–97. doi:10.1016/j.physrep.2018.10.005.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Jiang, X. Gao, H. An, H. Li, B. Sun, Reconstructing complex network for characterizing the time-varying causality evolution behavior of multivariate time series, Scientific Reports 7 (2017). doi:10.1038/s41598-017-10759-3.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] L. Lacasa, V. Nicosia, V. Latora, Network structure of multivariate time series, Scientific Reports 5 (2015). doi:10.1038/srep15508.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. Plamper, O. J. Lechtenfeld, P. Herzsprung, A. Groß, A temporal graph model to predict chemical transformations in complex dissolved organic matter, Environmental Science &amp; Technology (2023). doi:10.1021/acs.est.3c00351.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] I. Robinson, Graph databases: New opportunities for connected data, 2015.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] M. A. Rodriguez, P. Neubauer, Constructions from dots and lines (2010). doi:10.48550/ARXIV.1006.2361.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] O. Hartig, Reconciliation of RDF* and property graphs, arXiv preprint arXiv:1409.3288 (2014). doi:10.48550/ARXIV.1409.3288.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] R. Angles, The property graph database model, in: AMW, 2018.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] M. Ciglan, A. Averbuch, L. Hluchy, Benchmarking traversal operations over graph databases, in: 2012 IEEE 28th International Conference on Data Engineering Workshops, IEEE, 2012. doi:10.1109/icdew.2012.47.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] R. K. Kaliyar, Graph databases: A survey, in: International Conference on Computing, Communication &amp; Automation, IEEE, 2015. doi:10.1109/ccaa.2015.7148480.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] B. Cai, Y. Xiang, L. Gao, H. Zhang, Y. Li, J. Li, Temporal knowledge graph completion: A survey, 2022. doi:10.48550/ARXIV.2201.08236.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] P. Holme, J. Saramäki, Temporal networks, Physics Reports 519 (2012) 97–125. doi:10.1016/j.physrep.2012.03.001.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>