<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Schema Evolution and Reproducibility of Long-term Hydrographic Data Sets at the IOW</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tanja Auge</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erik Manthey</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susanne Jurgensmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susanne Feistel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Heuer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leibniz Institute for Baltic Sea Research Warnemunde</institution>
          ,
          <addr-line>Germany</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Rostock</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>National and international exploration of the Baltic Sea ecosystem can be traced back to the 19th century. In its quite long history, the Leibniz Institute for Baltic Sea Research Warnemunde (IOW) is the only research institution in Germany that has made interdisciplinary research of the Baltic Sea its central mission. The IOW hosts data from more than 130 years of research work. Using the example of hydrographic datasets that have been created over a period of about 50 years, this paper examines changes in the data and the associated schemas that have resulted from the continuous development and refinement of measurement methods over time. The paper focuses on the schema development operators: What kind of schema development has taken place over the years, and what are the important basic schema development operators that can be identified? It classifies well-known schema evolution operators which can be expressed as schema mappings, and defines two new operators for merging and splitting attributes, up to now not considered in other research works. These operators have proven to be essential for the development of a new universal schema for the central oceanographic database of the IOW, the IOWDB.</p>
      </abstract>
      <kwd-group>
        <kwd>Schema Evolution</kwd>
        <kwd>Baltic Sea</kwd>
        <kwd>Long-term Data</kwd>
        <kwd>Research Data Management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        National and international exploration of the Baltic Sea ecosystem can be traced
back to the 19th century. In its quite long history, the Leibniz Institute for Baltic
Sea Research Warnemunde (IOW) is the only research institution in Germany
that has made interdisciplinary research of the Baltic Sea its central mission.
The IOW hosts data from more than 130 years of research work. Due to their
origin they were stored in various formats and on diverse storage media.
The optimal preparation of such research data is part of the so-called FAIR
principles (Findable, Accessible, Interoperable, Reusable). These principles
define "characteristics that contemporary data resources, tools, vocabularies and
infrastructures should exhibit to assist discovery and reuse by third-parties"
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. They should ensure sustainable research data management by processing
the data and their metadata.
      </p>
      <p>The IOW publishes many of its newer data online on IOWMeta (https://iowmeta.io-warnemuende.de), realizing
the F and A of FAIR. We are therefore concentrating on interoperability (I) and
reusability (R) to ensure the reproducibility of the data evaluations. To be able to
determine exactly those research data that have an influence on the evaluation
result, we have to automatically compute the evaluation queries, the schema
and data evolution stages as well as the provenance queries in a homogeneous
framework (see Fig. 1). This is important not only for the reproducibility of
evaluation results, but also for the updating of evaluations over time.</p>
      <p>In this paper, we focus on the process of data collection with a so-called CTD
probe (see Section 3) in the period from the 1970s to the present. Within this
time, the scientific requirements changed dramatically and were accompanied by
significant improvements in instrumentation, data acquisition and processing on
board of the research vessels and, last but not least, data storage on land.</p>
      <p>Within the scope of hydrographic data collection with a CTD probe, new
sensors were developed or existing ones improved. Correspondingly, new
measurement parameters were defined or replaced. The measurements could then be
carried out faster and more accurately. Therefore, not only the physical storage
media used at the IOW developed enormously during this time, from simple
paper records, punch cards and magnetic tapes to modern databases (see Fig.
2), but also the underlying concepts for data processing and structuring had to
be reconsidered continuously.</p>
      <p>
        However, this change in the data formats and structures leads to problems
nowadays when newer evaluations of the data have to be performed on old data
sets or long-term data ranging over some decades [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Thus, one goal is the
reproducibility of old results with the help of a new evaluation technique.
      </p>
      <p>An evaluation of the data over the entire period of time and the tracing of
individual tuples is only possible with considerable effort. In database terms the
development of the new structures can result in the creation of new attributes
or tables as well as a reassignment of the affected tuples. It is possible that</p>
    </sec>
    <sec id="sec-2">
      <title>-</title>
      <p>the variety of measured variables changes over time. Thus, past and current
schemas differ significantly in this respect as well. Columns are renamed, created
or deleted. Merging or splitting of attributes can be observed.</p>
      <p>
        To be able to support reproducibility over time, we have to combine (1) the
data evaluations (represented as database queries including some linear algebra
functions), (2) the schema evolution steps, and (3) the process of inverting these
steps by means of data provenance techniques (why- and how-provenance).
Since we already developed formal techniques based on schema mappings (such
as s-t tgds, i.e. source-to-target tuple-generating dependencies) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for steps (1)
and (3), we have to integrate schema evolution steps by the use of schema
mappings, as well [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This can be done by using formally defined schema modification
operators such as the ones introduced by the PRISM++ project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The combination of data provenance with schema and data evolution enables
us to evaluate provenance queries on evolving data and schemas. Through
inverse evolution steps, the new database can be transferred to the old schema.
Formally, we use the CHASE algorithm, a universal tool of database theory. Our
application of the CHASE for schema (and corresponding instance) mappings is
based on the s-t tgds mentioned above. It is shown in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that the Schema
Modification Operators (SMOs) and the Integrity Constraint Modification Operators
(ICMOs) can be transformed to such kind of schema mappings.
      </p>
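      <p>To make the role of such schema mappings concrete, the following minimal Python sketch (ours, not part of the IOW implementation; the relation and attribute names are invented for illustration) replays one chase step for an s-t tgd of the form Cruises(no, name) -&gt; CruisesV2(no, name, z), filling the existentially quantified attribute with a fresh labeled null:</p>
      <preformat>
```python
from itertools import count

# Chase step for the s-t tgd  Cruises(no, name) -> CruisesV2(no, name, z):
# every source tuple produces a target tuple; the existentially quantified
# attribute z is filled with a fresh labeled null.
_null_ids = count(1)

def fresh_null():
    return f"#N{next(_null_ids)}"

def chase_step(cruises):
    return [
        {"ArchiveNo": no, "Name": name, "Comment": fresh_null()}
        for (no, name) in cruises
    ]

source = [("A77-01", "Monitoring 1977"), ("A97-03", "Monitoring 1997")]
target = chase_step(source)
# target holds one CruisesV2 tuple per Cruises tuple, with distinct
# labeled nulls in the Comment attribute
```
      </preformat>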
      <p>In Section 2 we first introduce the original SMOs and ICMOs of PRISM++.
In this paper, we focus on the analysis of the schema evolution process at the
IOW, which is briefly presented in Section 3. We identify the most important
SMOs and ICMOs which are necessary to describe the IOW schema evolution,
and define two new operators MERGE Column and SPLIT Column (see Section 4).</p>
      <sec id="sec-2-1">
        <title>2 State of the Art</title>
        <p>Our work is based on three different concepts: Schema Mapping, Schema
Evolution and Schema Modification. We understand Schema Mapping as a
concrete evaluation query, Schema Evolution as the overall process of evolution and
Schema Modification as the individual step itself.
Schema Mapping: Schemas and Schema Mappings are two fundamental
components of heterogeneous data management. While schemas specify the structure
of the various databases, schema mappings describe the relationships between
them. Schema mappings can be used in particular to transform data between
two different schemas.</p>
        <p>
          Schema Evolution: Schema evolution describes the schema changes of a
database over time. It therefore gives a description of the overall process. This includes
changes in the relational schemas themselves as well as changes in the key and
integrity conditions. In order to enable earlier states to be analyzed
retrospectively and queries on other schema versions to be made, starting from the current
database state, PRISM++ [
          <xref ref-type="bibr" rid="ref10 ref5 ref6">5, 6, 10</xref>
          ] introduced special operators which describe
the changes in schemas and integrity conditions in a compact way. For defining
these operators, the authors examined several schemas from practice
concerning their modifications [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The most frequently occurring operators formed the
core of these Modification Operators, which are divided into Schema Modification
Operators (SMOs) and Integrity Constraint Modification Operators (ICMOs).
While SMOs can describe the direct changes of relations, such as the creation or
deletion of individual attributes or entire relations, ICMOs are used whenever
key or integrity conditions in relations change. Thus, these conditions can
generally be tightened or weakened by adding or discarding restrictions.
Schema Modification Operators (SMOs): The so-called Schema Modification
Operators are basic operators used to describe changing database schemas.
Thirteen operators are defined in [
          <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
          ], summarized in Table 1. Each
operator captures an atomic change; by combining them, it is possible to express
complex evolutions. The most common ones are CREATE Table, DROP Table,
ADD Column, DROP Column and RENAME Column [
          <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
          ]. These also correspond
to the operators relevant to the IOW, highlighted in gray in Table 1.
        </p>
        <p>
          Integrity Constraint Modification Operators (ICMOs): In addition to
the SMOs, there is another class of schema evolution operators. These
Integrity Constraint Modification Operators (ICMOs), first introduced in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
describe changes in integrity conditions. More precisely, the six ICMOs handle the
three restrictions PRIMARY KEY, FOREIGN KEY, and VALUE CONSTRAINT.
Attention must be paid to the enforcement policy &lt;policy&gt;, which is necessary for
the first three ICMOs. Here we can choose between
(1) CHECK: The system checks whether the current table satisfies the new primary
key. If not, no primary key is added.
(2) ENFORCE: The primary key is added in any case. Tuples that violate the new
key are deleted.
        </p>
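        <p>The semantics of the two policies can be sketched as follows (a simplified stand-in operating on rows represented as Python dictionaries; the names are illustrative, not the PRISM++ implementation):</p>
        <preformat>
```python
def add_primary_key(rows, key_attrs, policy):
    """Add a primary key over key_attrs.

    CHECK:   refuse the whole operation if any two rows share a key value.
    ENFORCE: keep the first row per key value and drop the violators.
    """
    seen, result = set(), []
    for row in rows:
        k = tuple(row[a] for a in key_attrs)
        if k in seen:
            if policy == "CHECK":
                raise ValueError("primary key violated; ICMO not applied")
            continue          # ENFORCE: drop the violating tuple
        seen.add(k)
        result.append(row)
    return result

rows = [{"ArchiveNo": 1, "Date": "01.05.1997"},
        {"ArchiveNo": 1, "Date": "01.05.1997"}]
enforced = add_primary_key(rows, ["ArchiveNo", "Date"], "ENFORCE")
# enforced keeps only the first of the two duplicate tuples
```
        </preformat>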
      <p>However, before we begin to investigate the schema evolution, we will briefly
introduce the IOW in general as well as the analyzed data sets of the IOW.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3 Research at the IOW</title>
        <p>Leibniz Institute for Baltic Sea Research Warnemunde (IOW): The
story of the IOW goes back to the German Democratic Republic (GDR) of the
1950s. At that time, the Seehydrographic Service of the GDR was founded in
order to be able to operate independently of the West German equivalent. It
included, among other things, a separate department for oceanography, which
became known as the Institute of Oceanography Warnemunde (IfM-W) in 1960.
The institute achieved international recognition through independent research
cruises and was able to establish itself as a permanent fixture in worldwide
marine research.</p>
        <p>The systematic exploration of the Baltic Sea finally began in 1964 with the
International Synoptic Recording of the Baltic Sea. In the following decades,
these regularly scheduled cruises, i.e. cruises at fixed times and places, became an
important part of Baltic Sea research within the IfM-W.</p>
        <p>After the German reunification in 1990, research and science in the whole of
Germany had to be newly regulated and ordered, which led to the foundation
of the Institute for Baltic Sea Research Warnemunde (https://www.io-warnemuende.de) in 1992.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>-</title>
      <p>IOWDB: The Oceanographic Database of IOW (IOWDB) had originally been
designed for particular internal requirements of the IOW. The IOWDB has
always been aimed at the management of historical and recent oceanographic
measurements.</p>
      <p>Research cruises have been conducted since 1949, and their data
systematically collected. The post-processing and analysis of the resulting observational
data was carried out by varying methods and with changing quality. A
substantial fraction of legacy data was successively transferred to modern storage media,
their quality was controlled and, if possible and necessary, improved by detailed
individual scientific review.</p>
      <p>The content of the database includes oceanographic readings and metadata
(mainly Baltic Sea) from 1877 to 2020 obtained during 951 research campaigns
of the IOW and cooperating institutions. As of June 2020, the IOWDB contains
more than 78 million measured samples representing georeferenced point data
from the water column, primarily from CTD profiles, hydrochemical and
biological sampling, current-meter time series, trace metal sampling and long-term
monitoring. Phyto- and zooplankton data are available for 1988 to 2018.
CTD measurement: One third of the stored data in the IOWDB are
obtained with a so-called CTD probe. Primary parameters of this instrument are
Conductivity (electrical conductivity, which is used to determine salinity), water
Temperature and Depth, which is determined by the prevailing pressure.</p>
      <p>Optional sensors, e.g. for oxygen, fluorescence and other parameters may
be added. The CTD probe is surrounded by several water samplers, allowing
additional water samples to be taken. These are then used for further analyses.</p>
      <p>The examined CTD data originates from regularly conducted monitoring
cruises. On these cruises fixed locations in the Baltic Sea are visited, so that the
changes of CTD values can be recorded. After first recording the primary data
from the cruise, the resulting records are validated. In this case validation means
that the values from missing depths are interpolated to get an approximated
value. Within this paper's research only these validated data were examined.
Next we will take a closer look at the schema evolution at the IOW. We focus
here on the evolution of the CTD data.</p>
      <sec id="sec-3-1">
        <title>4 Schema Evolution at the IOW</title>
        <p>We describe the schema and the corresponding changes over a period from 1977
to the present day solely for the data resulting from CTD measurements. It
should be noted that adding measurement data of a new type would of course
result in a new schema.</p>
        <sec id="sec-3-1-1">
          <title>4.1 Schema Changes over the Years</title>
          <p>Although data from CTD measurements exist from the early 1950s on, the data
files from the years before 1977 are very difficult to interpret. Essentially, the
files contain a large amount of measured values, but there are hardly any
descriptions or parameter names and thus insufficient schematic information. Without
historical background knowledge of the underlying process of data collection,
only vague assumptions can be made to identify the underlying schema. For this
reason, we start with a more detailed investigation in 1977.</p>
          <p>
            Because no structural changes occurred within each period, the schema
changes over the years can be divided into three blocks: one period between 1977 and
1996, another between 1997 and 2016, and the years after 2017 [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].
1977-1996: The schema is clearly defined and divided into five relations
describing the metadata of the Cruises, of the measurement Stations, the
Series, as well as the actual recorded CTD Values and the corresponding
Parameter descriptions. The five relations are connected by three relationships
(see Fig. 3): In the case of CTD measurements there is a trivalent relationship
between the Cruises, Stations and Series relations. The relation with the CTD
values is dependent on the relation containing information about the
measurement series. And the relation with the parameter descriptions is connected
to the relation with the CTD values by a "belongs to" relationship. The
set of attributes corresponding to the relations and relationships is quite limited
compared to the schemas of the following years. For example, in 1977 the schema
contained only 34 attributes, whereas in 1997 it already contained 61 attributes,
and the trend is rising.
1997-2016: However, the 1997 schema brings some major changes. The
previous entities remain unchanged, but we can observe many new attributes, a
new relation (highlighted in Fig. 3) and associated changes in integrity
conditions. Thus, in the Series and Cruises relations, the attributes to be stored
increase from 15 to 26 and from 5 to 17 attributes, respectively. The new
relation Port lay time records the lay time of a research vessel in a port and is
therefore dependent on the Cruises relation. Stations, Values and
Parameters Description remain unchanged. Furthermore, some attributes are split
while others are merged. All in all, these are some minor and major changes. We
will discuss this in more detail later on.
          </p>
          <p>After 2017: The previous structure of the schemas from 1997 to 2016 remains
unchanged; only seven attributes are added to the relations. Overall, there are
no major changes and all schemas described previously can be mapped to it.
The 2017 schema is therefore used as the universal schema, shown in Fig. 3.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4.2 Specification of Schema and Data Changes</title>
          <p>Operations that are repeatedly performed when converting schemas before 2017
include, in particular, the addition, merging and splitting of attributes (see Table
3 (a)). Since the schemas gain more information over the years, this is not
surprising. This continuous process of expansion is only supplemented a few times
by renaming individual attributes, creating the new table Port lay time and
adapting related integrity constraints in 1997. However, we will investigate this
further.</p>
          <p>
            Most evolution steps can be described easily using the SMOs and ICMOs
defined for PRISM++ [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. Only merging and splitting attributes require new
operators, which can be understood as sequences of the basic operators ADD Column
and DROP Column; they are defined in Section 4.3. Because of their special
importance at the IOW they shall be defined here as separate operators. But first
of all, let us have a look at the obvious schematic changes.
          </p>
          <p>
            Adding and deleting attributes: These types of schema changes can be
identified by the SMOs ADD Column or DROP Column. Adding attributes is one of
the most easily described schema changes and occurs comparatively often.
There are almost fifty new attributes in total. This observation is also
consistent with the results of other studies such as [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Here 38.7% of all schema
changes were due to the addition and 26.4% to the deletion of attributes.
          </p>
          <p>Looking at the relation Series, we observe the almost complete substitution
of all associated attributes. However, contrary to our initial assumption, these
attributes are not deleted irreversibly. They are rather merged into new
attributes or split into several new ones. This results in the fact that we have to develop the
two new operators MERGE Column and SPLIT Column.
Example 1. The SMO ADD Column extends Series by the possibility of a
comment, formally:</p>
          <p>ADD COLUMN Comment INTO Series.</p>
          <p>Creating relations: This form of schema change occurs only once in the data
sets examined. Since 1997, the relation Port lay time is part of the schema.
It should be noted that in addition to the SMO CREATE Table itself, additional
integrity conditions must be specified. On the one hand, it is necessary to create
a primary key. On the other hand, due to the dependence of Port lay time
on Cruises, a suitable foreign key must be specified.</p>
          <p>Adding and deleting primary keys: Creating the relation Port lay time
implies the requirement of a specialized attribute, the so-called primary key, which
clearly identifies the tuples of the new relation. Furthermore, when
mapping the schema from 1977 to the universal schema, the primary key of
Cruises is changed. This is equivalent to deleting the old primary key and
adding a new one.</p>
          <p>The primary key can be modified using the ICMOs ADD PRIMARY KEY or DROP
PRIMARY KEY. When creating a primary key, the enforcement policy must be
observed as well. It determines how tuples that violate the new integrity
condition are handled: under the CHECK policy the ICMO is not applied at all,
while under the ENFORCE policy all tuples violating the newly introduced
constraint are removed. Accordingly, the primary key should always be created
directly after or during the creation of the new table.</p>
          <p>Adding foreign keys: Since the new relation Port lay time is dependent on
Cruises, a foreign key must be specified in addition to the primary key. It
is derived as usual from the primary key of the higher-level relation.
Example 2. Let us take a closer look at the relation Port lay time. Introduced
in 1997, it is dependent on Cruises. Besides the creation of the relation itself
using the SMO</p>
          <p>CREATE Table Port lay time</p>
          <p>(ArchiveNo, Reason for stay, Date, Harbour, Duration),
a primary key consisting of the attributes ArchiveNo, Date, Harbour and
Duration must be de ned and the foreign key condition of ArchiveNo be speci ed.
Adding key and integrity constraint can be realized using the ICMOs
ALTER Table Port lay time
ADD PRIMARY KEY pkLZ</p>
          <p>(ArchiveNo, Date, Harbour, Duration)
and</p>
          <p>CHECK
ALTER Table Port lay time
ADD FOREIGN KEY fk (ArchiveNo) REFERENCES Stations(ArchiveNo)
CHECK.
Renaming an attribute: Also the SMO RENAME Column exists only a few
times. Even if this looks rather harmless on rst sight, it must be ensured that
the old evaluations can no longer be easily reproduced on the new schema. It
occurs for example in the evolution of the relation Series.</p>
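          <p>The reproducibility problem caused by a rename can be seen in a tiny sketch (ours, on plain Python dictionaries instead of the IOW database; the evaluation query is invented):</p>
          <preformat>
```python
def rename_column(rows, old, new):
    # RENAME COLUMN old IN relation TO new, applied to an instance
    return [{(new if k == old else k): v for k, v in row.items()}
            for row in rows]

series = [{"ws": 4, "Depth": 25}]
series_v2 = rename_column(series, "ws", "ws-ID")

# An old evaluation that still addresses the attribute by its former
# name no longer runs on the evolved schema:
def old_evaluation(rows):
    return [row["ws"] for row in rows]

old_evaluation(series)      # works on the old instance
try:
    old_evaluation(series_v2)
except KeyError:
    pass                    # the renamed column is no longer found
```
          </preformat>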
          <p>Example 3. When mapping the schema from 1997 to the universal schema, the
attribute ws is renamed to ws-ID. This can be described by:</p>
          <p>RENAME COLUMN ws IN Series TO ws-ID.</p>
          <p>Merging and splitting attributes: In the course of the years and decades,
attributes have repeatedly been combined at the IOW. According to PRISM++
there is no independent Schema Modification Operator for these changes. So let
us next define these two operators MERGE Column and SPLIT Column.
Example 4. The aim of the two new operators is a representation of the form
MERGE Column BeginDate AS func(StartYear,StartMonth,StartDay)</p>
          <p>INTO Series
and</p>
          <p>SPLIT Column Date IN Series TO</p>
          <p>StartDate USING func1(Date), EndDate USING func2(Date).</p>
        </sec>
        <sec id="sec-3-1-2b">
          <title>4.3 MERGE Column and SPLIT Column</title>
          <p>
            Besides the schema changes, which can already be described by the operators
defined in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] and [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], there are at least two more, which can be expressed as
composites of these together with one or more functions. These additional changes
consist in the merging and splitting of columns, as seen in Table 4. At the IOW
they occur, among other things, when time and date columns are changed.
MERGE Column: As seen in Table 4, merging two columns into a new column
is realized by executing ADD Column and DROP Column one after the other. For
creating the new attribute values a function func is used, which combines the
old attributes. Accordingly, merging consists of creating a new column and then
deleting the old ones.
          </p>
          <p>In the case of the relation Series, the three columns StartYear, StartMonth and
StartDay are merged into the new column BeginDate. The SMO MERGE Column
can be interpreted as</p>
          <p>ADD Column BeginDate AS func(StartYear,StartMonth,StartDay)</p>
          <p>INTO Series
DROP Column StartYear FROM Series;
DROP Column StartMonth FROM Series;</p>
          <p>DROP Column StartDay FROM Series.</p>
          <p>Assuming that the data type of the attribute BeginDate is a string of the format
DD.MM.YYYY, we can choose the function func as a concatenation of the form
func := CONCAT(StartDay,'.',StartMonth,'.',StartYear).</p>
          <p>However, this choice is in no way binding.</p>
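          <p>The MERGE Column semantics sketched above can be replayed on instance data in a few lines of Python (our illustration; rows are plain dictionaries and func mirrors the concatenation given in the text):</p>
          <preformat>
```python
def merge_column(rows, new, func, old_cols):
    # MERGE Column: ADD the new column computed by func,
    # then DROP the old ones
    out = []
    for row in rows:
        row = dict(row)
        row[new] = func(row)
        for c in old_cols:
            del row[c]
        out.append(row)
    return out

# func as in the text: concatenation into the format DD.MM.YYYY
func = lambda r: f"{r['StartDay']}.{r['StartMonth']}.{r['StartYear']}"

series = [{"StartYear": "1997", "StartMonth": "05",
           "StartDay": "01", "ws": 4}]
merged = merge_column(series, "BeginDate", func,
                      ["StartYear", "StartMonth", "StartDay"])
# merged[0] == {"ws": 4, "BeginDate": "01.05.1997"}
```
          </preformat>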
          <p>SPLIT Column: Splitting a column has a similar structure. Here too, new columns
are created and old ones deleted. One major difference, however, is that each
new column requires its own function, implemented in SQL, for example.</p>
          <p>So, in the case of Cruises, the column Date is split into StartDate and
EndDate. The SMO SPLIT Column can be interpreted as</p>
          <p>ADD Column StartDate AS func1(Date) INTO Cruises;
ADD Column EndDate AS func2(Date) INTO Cruises;</p>
          <p>DROP Column Date FROM Cruises.</p>
          <p>Assuming the format DD.MM.YYYY we can divide the string into two
substrings of length 10 using the functions
func1 := SUBSTRING(Date,1,10),
func2 := SUBSTRING(Date,12,10).</p>
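          <p>Analogously, SPLIT Column can be replayed over instance data (our illustration; the two lambdas mirror the SUBSTRING functions from the text, under the assumption that Date holds a combined "DD.MM.YYYY DD.MM.YYYY" string):</p>
          <preformat>
```python
def split_column(rows, old, new_funcs):
    # SPLIT Column: ADD one new column per function,
    # then DROP the old column
    out = []
    for row in rows:
        row = dict(row)
        for new, func in new_funcs.items():
            row[new] = func(row[old])
        del row[old]
        out.append(row)
    return out

cruises = [{"ArchiveNo": "A97-03", "Date": "01.05.1997 14.05.1997"}]
split = split_column(cruises, "Date", {
    "StartDate": lambda d: d[0:10],   # SUBSTRING(Date, 1, 10)
    "EndDate":   lambda d: d[11:21],  # SUBSTRING(Date, 12, 10)
})
# split[0] == {"ArchiveNo": "A97-03",
#              "StartDate": "01.05.1997", "EndDate": "14.05.1997"}
```
          </preformat>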
          <p>This static approach is only a simplified example because of the high error rate.
Date values may have been saved incorrectly (e.g. typing errors) or in a
different format (e.g. the American date format). Functions that automatically split
the source string are of course better, but would go beyond the scope of this
example.</p>
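          <p>A more tolerant variant might, for example, try several known date formats before giving up (a sketch; the list of candidate formats is an assumption, not the IOW implementation):</p>
          <preformat>
```python
from datetime import datetime

# Candidate formats: German DD.MM.YYYY, ISO YYYY-MM-DD,
# American MM/DD/YYYY
FORMATS = ["%d.%m.%Y", "%Y-%m-%d", "%m/%d/%Y"]

def parse_date(value):
    """Return a normalized DD.MM.YYYY string,
    or None if no format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%d.%m.%Y")
        except ValueError:
            continue
    # leave unparsable values (e.g. typing errors such as "32.05.1997")
    # for manual review instead of guessing
    return None
```
          </preformat>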
          <p>In total, we could identify seven merge and split operations. These contain 10
occurrences of ADD Column and 15 of DROP Column. Compared with the previous
studies we get the new distribution shown in Table 3 (b). The most common
operator remains ADD Column, but we no longer need the DROP Column operator,
at least not explicitly. Metadata is often given as strings or intervals. Since
intervals were repeatedly represented as a single attribute or as interval boundaries
(divided into two attributes), further changes are expected in the future.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>4.4 Implementation</title>
          <p>
            Even though the selection of our operators is based on the works [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] and [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], we
have not implemented them in the PRISM++ system. Instead, we decided on an
implementation in a schema modification interface to MySQL which is already
used at the IOW. All operators from Section 4.2 are implemented as described
above. The auxiliary functions funci mentioned in Section 4.3 were implemented
with appropriate UPDATE operations, so we did not have any problems with the
prototypical implementation here either.</p>
          <p>
            In this paper we classified, using CTD data as an example, the schema
evolution operators relevant for research data management at the IOW. In particular,
the addition of attributes using ADD Column as well as the merging and splitting
of attributes were considered relevant. We have redefined the required operators
MERGE Column and SPLIT Column based on the existing SMOs of [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
          <p>
            To support reproducibility over time, we must combine (1) the data evaluations, (2)
the schema evolution steps, and (3) the process of reversing these steps using data
provenance techniques. Since we have already developed formal techniques based
on schema mappings for steps (1) and (3), the schema evolution steps must
also be integrated by using schema mappings [
            <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
            ]. Now, with our extended
PRISM++ approach, we can apply the evolution operators expressed as s-t tgds
against the given research database using the CHASE algorithm. The IOW
is thus able to track the oxygen content of the Baltic Sea as well as other
interesting parameters over a period of decades.
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auge</surname>
            ,
            <given-names>T:</given-names>
          </string-name>
          <article-title>Extended Provenance Management for Data Science Applications</article-title>
          .
          <source>PhD@VLDB, CEUR Workshop Proceedings</source>
          , CEUR-WS.org (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Auge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Heuer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Combining Provenance Management and Schema Evolution</article-title>
          .
          <source>IPAW</source>
          ,
          <fpage>222</fpage>
          -
          <lpage>225</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Bock</surname>, <given-names>S.</given-names></string-name>;
          <string-name><surname>Feistel</surname>, <given-names>F.</given-names></string-name>;
          <string-name><surname>Jurgensmann</surname>, <given-names>S.</given-names></string-name>:
          <article-title>Data Management at IOW</article-title>.
          <source>Poster</source> (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Bruder</surname>, <given-names>I.</given-names></string-name>;
          <string-name><surname>Klettke</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Moller</surname>, <given-names>M. L.</given-names></string-name>;
          <string-name><surname>Meyer</surname>, <given-names>F.</given-names></string-name>;
          <string-name><surname>Jurgensmann</surname>, <given-names>S.</given-names></string-name>;
          <string-name><surname>Feistel</surname>, <given-names>S.</given-names></string-name>:
          <article-title>Daten wie Sand am Meer - Datenerhebung, -strukturierung, -management und Data Provenance fur die Ostseeforschung</article-title>.
          <source>Datenbank-Spektrum</source> <volume>17</volume>(<issue>2</issue>), <fpage>183</fpage>–<lpage>196</lpage> (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Curino</surname>, <given-names>C.</given-names></string-name>;
          <string-name><surname>Moon</surname>, <given-names>H. J.</given-names></string-name>;
          <string-name><surname>Deutsch</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Zaniolo</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++</article-title>.
          <source>PVLDB</source> <volume>4</volume>(<issue>2</issue>), <fpage>117</fpage>–<lpage>128</lpage> (<year>2010</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Curino</surname>, <given-names>C. A.</given-names></string-name>;
          <string-name><surname>Moon</surname>, <given-names>H. J.</given-names></string-name>;
          <string-name><surname>Zaniolo</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Graceful Database Schema Evolution: the PRISM Workbench</article-title>.
          <source>PVLDB</source> <volume>1</volume>(<issue>1</issue>), <fpage>761</fpage>–<lpage>772</lpage> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Curino</surname>, <given-names>C. A.</given-names></string-name>;
          <string-name><surname>Tanca</surname>, <given-names>L.</given-names></string-name>;
          <string-name><surname>Moon</surname>, <given-names>H. J.</given-names></string-name>;
          <string-name><surname>Zaniolo</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Schema Evolution in Wikipedia: Toward a Web Information System Benchmark</article-title>.
          <source>ICEIS (1)</source>, <fpage>323</fpage>–<lpage>332</lpage> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Fagin</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>Kolaitis</surname>, <given-names>P. G.</given-names></string-name>;
          <string-name><surname>Popa</surname>, <given-names>L.</given-names></string-name>;
          <string-name><surname>Tan</surname>, <given-names>W. C.</given-names></string-name>:
          <article-title>Schema Mapping Evolution Through Composition and Inversion</article-title>.
          In: <source>Schema Matching and Mapping</source>, Springer, <fpage>191</fpage>–<lpage>222</lpage> (<year>2011</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Manthey</surname>, <given-names>E.</given-names></string-name>:
          <article-title>Beschreibung der Veranderungen von Schemata und Daten am IOW mit Schema-Evolutions-Operatoren</article-title>.
          <source>Bachelor Thesis</source>, University of Rostock (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Moon</surname>, <given-names>H. J.</given-names></string-name>;
          <string-name><surname>Curino</surname>, <given-names>C.</given-names></string-name>;
          <string-name><surname>Deutsch</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Hou</surname>, <given-names>C.-Y.</given-names></string-name>;
          <string-name><surname>Zaniolo</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Managing and Querying Transaction-time Databases under Schema Evolution</article-title>.
          <source>PVLDB</source> <volume>1</volume>(<issue>1</issue>), <fpage>882</fpage>–<lpage>895</lpage> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Qiu</surname>, <given-names>D.</given-names></string-name>;
          <string-name><surname>Li</surname>, <given-names>B.</given-names></string-name>;
          <string-name><surname>Su</surname>, <given-names>Z.</given-names></string-name>:
          <article-title>An Empirical Analysis of the Co-evolution of Schema and Code in Database Applications</article-title>.
          <source>ESEC/SIGSOFT FSE</source>, ACM, <fpage>125</fpage>–<lpage>135</lpage> (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Wilkinson</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Dumontier</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Aalbersberg</surname>, <given-names>I.</given-names></string-name> et al.:
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>.
          <source>Sci Data</source> <volume>3</volume>, <elocation-id>160018</elocation-id> (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Wu</surname>, <given-names>S.</given-names></string-name>;
          <string-name><surname>Neamtiu</surname>, <given-names>I.</given-names></string-name>:
          <article-title>Schema evolution analysis for embedded databases</article-title>.
          <source>ICDE Workshops</source>, IEEE Computer Society, <fpage>151</fpage>–<lpage>156</lpage> (<year>2011</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>