<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Framework for Managing Evolving Information Resources on the Data Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marios Meimaris</string-name>
          <email>m.meimaris@imis.athena-innovation.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Papastefanatos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christos Pateritsas</string-name>
          <email>pater@imis.athena-innovation.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theodora Galani</string-name>
          <email>theodora@imis.athena-innovation.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Stavrakas</string-name>
          <email>yannis@imis.athena-innovation.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for the Management of Information Systems, Research Center “Athena”</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The web of data has brought forth the need to preserve evolving information within linked datasets; however, data preservation also requires maintaining the structural aspects of those datasets. In this paper, we present a linked data approach for preserving and archiving open heterogeneous datasets that evolve over time at both the structural and the semantic layer, taking into account the requirements for modelling evolving linked datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Evolution</kwd>
        <kwd>Change Management</kwd>
        <kwd>Linked Data Dynamics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Data Web has brought forth a need to treat the web as a dynamic accumulation
of facts, created within collaborative environments, that can be processed and
combined in order to extract new knowledge. In this context, the benefits of evolution
management fall into two categories: (i) quality control and maintenance, and (ii)
data exploitation. Evolution management addresses the following challenges: dataset
synchronization, link maintenance, schema and entity evolution, and versioning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Data-aware practices make persistence, accessibility and usability value-adding
attributes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this paper, we advocate addressing the problem along
multiple dimensions through a framework that combines versioning, provenance, change
detection and quality control. As a basis for this framework, we present a conceptual
model that represents constructs relevant to these dimensions and treats both simple
and complex changes as first-class citizens.
      </p>
      <p>Section 2 reviews related work, Section 3 discusses the requirements for evolving
LOD datasets, Section 4 presents the evolution space model and its components, and
Section 5 concludes the paper and discusses future directions.</p>
      <p>* Work supported by the EU project DIACHRON.</p>
      <p>Work supported by the EU/Greece funded KRIPIS: MEDA Project.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] the authors extend HTTP with a temporal dimension for accessing past
versions of web documents and LD resources. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors extend a web server with
versioning functionality. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors address multi-versioning for
XML documents by using deltas between sequential versions. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposes a method
for archiving scientific data stored in XML documents by targeting individual elements
in the tree and pushing time down to the child nodes in order to record changes, an
approach also followed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] differentiates between the document-centric and
entity-centric perspectives of LOD change dynamics, a distinction we partially adopt,
as will be described further on. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] computes the semantic and structural differences
between versions of an RDFS graph. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] deals with dataset dynamics in distributed
LD and identifies three levels of requirements for change dynamics: (1)
vocabularies for describing dynamics and representing changes, (2) protocols for change
propagation, and (3) algorithms for change detection. The authors implement a change
detection framework that incorporates these points into a unified scheme.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Requirements for evolving information resources</title>
      <p>Most of the challenges in LD dynamics stem from the decentralized nature of
publishing and curating interdependent datasets across disparate sites. In contrast with
traditional settings where evolution is performed within a controlled and monitored
environment, the Data Web poses new requirements for dataset evolution dynamics:</p>
      <sec id="sec-3-1">
        <title>Persistent identification and reference</title>
        <p>An identifier mechanism is needed that reifies the id information; e.g., primary keys must be converted to persistent, citable
URIs. Representations must capture both temporal and time-agnostic characteristics;
thus, identifiers must be able to abstract from time.</p>
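        <p>As an illustration of this requirement, the following minimal Python sketch mints a time-agnostic (diachronic) URI for an entity from its source primary key and derives time-stamped instantiation URIs from it. The base namespace <monospace>http://example.org/</monospace>, the path layout and the function names are hypothetical assumptions; the paper does not prescribe a particular URI scheme.</p>

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Hypothetical base namespace; not prescribed by the framework.
BASE = "http://example.org/"


def diachronic_uri(dataset: str, table: str, primary_key: str) -> str:
    """Time-agnostic URI: depends only on the entity's identity, never on time."""
    return f"{BASE}{quote(dataset)}/{quote(table)}/{quote(primary_key)}"


def instantiation_uri(diachronic: str, recorded_at: datetime) -> str:
    """Time-aware URI: the diachronic URI qualified by a UTC timestamp."""
    stamp = recorded_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{diachronic}/version/{stamp}"


entity = diachronic_uri("census", "region", "42")
v1 = instantiation_uri(entity, datetime(2013, 1, 1, tzinfo=timezone.utc))
print(entity)  # http://example.org/census/region/42
print(v1)      # http://example.org/census/region/42/version/20130101T000000Z
```

        <p>In this sketch the diachronic URI depends only on the dataset, the relation and the key, so it stays stable across versions, while each instantiation URI can still be cited on its own.</p>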
        <p>
          Simple and complex changes. Changes can be asserted at multiple levels,
depending on their semantic richness. In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], changes are organized hierarchically, with additions and deletions serving as
the building blocks for higher-level changes. A formal hierarchical representation
model is required, and it must be possible to define complex changes over higher-level
semantic structures [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
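        <p>To make the change hierarchy concrete, the Python sketch below treats two versions of a dataset as sets of triples, computes the simple (low-level) changes as set differences, and then combines a deletion and an addition that share a subject and a predicate into one higher-level "update" change. The triple representation and the update rule are simplifying assumptions for illustration only; they are not the change language or the detection algorithm of [<xref ref-type="bibr" rid="ref13">13</xref>].</p>

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def low_level_changes(old: set, new: set) -> tuple:
    """Simple changes: triples added in `new` and triples deleted from `old`."""
    return new - old, old - new


def detect_updates(added: set, deleted: set) -> list:
    """Hypothetical complex change: a deletion and an addition on the same
    (subject, predicate) pair are reported as one value update."""
    updates = []
    for d in sorted(deleted):
        for a in sorted(added):
            if (a.s, a.p) == (d.s, d.p):
                updates.append(("update", d.s, d.p, d.o, a.o))
    return updates


v1 = {Triple("ex:r1", "ex:population", "100"), Triple("ex:r1", "ex:name", "North")}
v2 = {Triple("ex:r1", "ex:population", "120"), Triple("ex:r1", "ex:name", "North"),
      Triple("ex:r2", "ex:name", "South")}

added, deleted = low_level_changes(v1, v2)
print(detect_updates(added, deleted))
# [('update', 'ex:r1', 'ex:population', '100', '120')]
```

        <p>Here the addition of <monospace>ex:r2</monospace> remains a simple change, whereas the paired deletion and addition on <monospace>ex:r1</monospace> are lifted to a single higher-level change.</p>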
        <p>Temporal and provenance annotations. Provenance management enables trust,
interoperability and licensing by capturing dataset lineage. It can be modelled at
many granularities, from whole datasets down to individual facts. We consider time to be part of
provenance, which makes temporal provenance a direct enabler of dataset versioning and
evolution. We adopt the standard partitioning of time into transaction time (when a fact is recorded in the archive) and valid time (when the fact holds in the modelled reality).</p>
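        <p>The two time dimensions can be sketched as a bitemporal annotation on each fact, shown below in Python. The class and field names are hypothetical, intervals are treated as half-open, and an open end is represented by <monospace>None</monospace>; this is an illustration, not the framework's provenance vocabulary. The example stores a population value that the archive later corrected, so querying with an earlier transaction time returns what the archive believed at that time.</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    predicate: str
    value: str
    valid_from: date                # valid time: when the fact starts to hold
    valid_to: Optional[date]        # None: still valid in the modelled reality
    recorded_at: date               # transaction time: inserted into the archive
    superseded_at: Optional[date]   # None: current belief of the archive


def _within(t: date, start: date, end: Optional[date]) -> bool:
    # Half-open interval [start, end); an open end means "until now".
    return t >= start and (end is None or end > t)


def as_of(facts, valid_at: date, known_at: date) -> list:
    """Facts that held at `valid_at`, as the archive recorded them at `known_at`."""
    return [f for f in facts
            if _within(valid_at, f.valid_from, f.valid_to)
            and _within(known_at, f.recorded_at, f.superseded_at)]


facts = [
    TemporalFact("ex:r1", "ex:population", "100", date(2010, 1, 1), None,
                 date(2011, 1, 1), date(2012, 6, 1)),
    # A later correction of the same real-world fact.
    TemporalFact("ex:r1", "ex:population", "110", date(2010, 1, 1), None,
                 date(2012, 6, 1), None),
]
print([f.value for f in as_of(facts, date(2011, 6, 1), date(2012, 1, 1))])  # ['100']
print([f.value for f in as_of(facts, date(2011, 6, 1), date(2013, 1, 1))])  # ['110']
```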
        <p>Common abstraction model. LOD datasets use heterogeneous data models, including
standard, ad hoc and proprietary formats. Diachronic preservation should exhibit
format independence, data traceability and reproducibility, and an overall common
denominator for data that originate from different models.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Support for low-level and high-level preservation</title>
        <p>The model must be able to capture both the evolution of the structural aspects of datasets and the evolution of
information entities across time.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Multi-versioning and longitudinal querying</title>
        <p>The framework must answer several types of queries, either within a version or across sets of versions. It should support
dataset listing, complete/partial retrieval, longitudinal queries and change queries.</p>
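        <p>The query types listed above can be illustrated over a toy archive in which each dataset instantiation is a set of triples keyed by a version label. The Python sketch answers a partial retrieval (one record in one version), a longitudinal query (the history of one record attribute across all versions) and a change query (the versions in which that attribute changed). The in-memory layout and the function names are assumptions made for illustration, and attributes are assumed to be single-valued.</p>

```python
# Hypothetical archive: version label -> set of (subject, predicate, object) triples.
ARCHIVE = {
    "2011": {("ex:r1", "ex:population", "100"), ("ex:r1", "ex:name", "North")},
    "2012": {("ex:r1", "ex:population", "110"), ("ex:r1", "ex:name", "North")},
    "2013": {("ex:r1", "ex:population", "110"), ("ex:r1", "ex:name", "Northern")},
}


def retrieve(version: str, subject: str) -> dict:
    """Partial retrieval: all attributes of one record within a single version
    (assumes single-valued attributes)."""
    return {p: o for s, p, o in ARCHIVE[version] if s == subject}


def history(subject: str, predicate: str) -> list:
    """Longitudinal query: the value of one attribute in every version, in order."""
    return [(v, retrieve(v, subject).get(predicate)) for v in sorted(ARCHIVE)]


def changed_in(subject: str, predicate: str) -> list:
    """Change query: versions whose value differs from the preceding version."""
    h = history(subject, predicate)
    return [v for (_, prev), (v, cur) in zip(h, h[1:]) if prev != cur]


print(history("ex:r1", "ex:population"))
# [('2011', '100'), ('2012', '110'), ('2013', '110')]
print(changed_in("ex:r1", "ex:name"))  # ['2013']
```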
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Modelling Evolution: the 2x2 Model Space</title>
      <p>The model space comprises two dimensions: time awareness and semantic awareness. At its core lies the notion of the evolving entity, an abstraction over all entities. Evolving entities are identifiable and citable objects. The main entity types are:</p>
      <list list-type="simple">
        <list-item>
          <label>(i)</label>
          <p>Datasets: conceptual entities that represent a particular dataset both from a time-agnostic point of view (diachronic datasets) and from a time-aware point of view (dataset instantiations).</p>
        </list-item>
        <list-item>
          <label>(ii)</label>
          <p>Schema objects: the schema-related entities of the archived datasets, as defined by each dataset’s source model.</p>
        </list-item>
        <list-item>
          <label>(iii)</label>
          <p>Data objects: records and record attributes. A record represents a data entry about a particular evolving entity. Records are uniquely identified in order to allow record-level annotations.</p>
        </list-item>
        <list-item>
          <label>(iv)</label>
          <p>Diachronic resources: concepts defined declaratively over a dataset. They provide a curation mechanism for defining contexts of evolution and relating high-level changes to them.</p>
        </list-item>
        <list-item>
          <label>(v)</label>
          <p>Record, schema and resource sets: collections of the corresponding entities that exist within a particular dataset instantiation. In this way, they become pluggable and interchangeable across versions.</p>
        </list-item>
        <list-item>
          <label>(vi)</label>
          <p>Changes: these are grouped into change sets between two instantiations of the same dataset. When applied to collections of entities, change sets are specialized into record set, schema set and resource set changes.</p>
        </list-item>
      </list>
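      <p>One possible in-memory reading of these entity types is sketched below in Python: a diachronic dataset groups its time-aware instantiations, each instantiation holds a schema set and a record set, and a record set change relates two instantiations of the same dataset. All class and field names are illustrative assumptions, resource sets and diachronic resources are omitted for brevity, and the sketch does not reproduce the model's RDF vocabulary.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Attribute = Tuple[str, str]  # (property, value) pair: a record attribute


@dataclass(frozen=True)
class Record:
    uri: str                          # records are uniquely identified
    attributes: FrozenSet[Attribute]


@dataclass
class DatasetInstantiation:
    uri: str
    timestamp: str
    schema_set: FrozenSet[str]        # schema objects, e.g. classes and properties
    record_set: FrozenSet[Record]


@dataclass
class DiachronicDataset:
    uri: str                          # time-agnostic identifier
    instantiations: Dict[str, DatasetInstantiation] = field(default_factory=dict)

    def add(self, inst: DatasetInstantiation) -> None:
        self.instantiations[inst.timestamp] = inst


@dataclass
class RecordSetChange:
    """A change set specialized to record sets, between two instantiations."""
    old: DatasetInstantiation
    new: DatasetInstantiation

    @property
    def added(self) -> FrozenSet[Record]:
        return self.new.record_set - self.old.record_set

    @property
    def deleted(self) -> FrozenSet[Record]:
        return self.old.record_set - self.new.record_set


r1 = Record("ex:rec/1", frozenset({("ex:name", "North")}))
r2 = Record("ex:rec/2", frozenset({("ex:name", "South")}))
ds = DiachronicDataset("ex:dataset/regions")
ds.add(DatasetInstantiation("ex:dataset/regions/2012", "2012",
                            frozenset({"ex:Region"}), frozenset({r1})))
ds.add(DatasetInstantiation("ex:dataset/regions/2013", "2013",
                            frozenset({"ex:Region"}), frozenset({r1, r2})))
change = RecordSetChange(ds.instantiations["2012"], ds.instantiations["2013"])
print([r.uri for r in change.added])  # ['ex:rec/2']
```

      <p>Because records and sets are immutable values in this sketch, the same record set could be shared by several instantiations, which loosely mirrors the idea of sets being pluggable across versions.</p>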
      <p>Datasets are assigned diachronic identifiers and are linked to their temporal versions.
Low-level and high-level changes are computed between versions. We propose a set
of rules to map common source data models to the 2x2 model space. For relational models, we extend
R2RML: relations are mapped to classes and columns to properties, both typed as schema
objects, while tuples create records and fields create record attributes. Multidimensional
models are modelled as profiles of the Data Cube Vocabulary: measures, dimensions
and attributes are mapped to properties, and observations create records and record
attributes. In ontologies, classes and properties are typed as schema objects, while
triples create record attributes and groups of triples with the same subject create records.</p>
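      <p>The relational and ontology mapping rules can be sketched in Python under simplifying assumptions: a relation is given as a table name, a primary-key column and a list of row dictionaries; RDF input is a list of (subject, predicate, object) triples; and the <monospace>ex:</monospace> prefix and URI patterns are hypothetical. The sketch only mirrors the correspondences stated above; it is not an R2RML processor, treating objects of <monospace>rdf:type</monospace> as classes is an added heuristic, and the Data Cube mapping is not covered.</p>

```python
from collections import defaultdict

BASE = "ex:"          # hypothetical namespace prefix
RDF_TYPE = "rdf:type"


def map_relation(table: str, key: str, rows: list) -> tuple:
    """Relational rule: the relation becomes a class and its columns become
    properties (schema objects); each tuple becomes a record and each field
    a record attribute."""
    columns = sorted({c for row in rows for c in row})
    schema_objects = [f"{BASE}{table}"] + [f"{BASE}{table}/{c}" for c in columns]
    records = {}
    for row in rows:
        record_uri = f"{BASE}{table}/{row[key]}"  # primary key -> record URI
        records[record_uri] = {(f"{BASE}{table}/{c}", str(v)) for c, v in row.items()}
    return schema_objects, records


def map_triples(triples: list) -> tuple:
    """Ontology rule: classes and properties become schema objects, each triple
    becomes a record attribute, and triples sharing a subject form one record."""
    schema_objects = set()
    records = defaultdict(set)
    for s, p, o in triples:
        schema_objects.add(p)      # every predicate is treated as a property
        if p == RDF_TYPE:
            schema_objects.add(o)  # assumption: objects of rdf:type are classes
        records[s].add((p, o))
    return sorted(schema_objects), dict(records)


rel_schema, rel_records = map_relation(
    "region", "id", [{"id": 1, "name": "North"}, {"id": 2, "name": "South"}])
print(rel_schema)                          # ['ex:region', 'ex:region/id', 'ex:region/name']
print(sorted(rel_records["ex:region/1"]))  # [('ex:region/id', '1'), ('ex:region/name', 'North')]

onto_schema, onto_records = map_triples([
    ("ex:r1", "rdf:type", "ex:Region"),
    ("ex:r1", "rdfs:label", "North"),
    ("ex:r2", "rdfs:label", "South"),
])
print(onto_schema)           # ['ex:Region', 'rdf:type', 'rdfs:label']
print(sorted(onto_records))  # ['ex:r1', 'ex:r2']
```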
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we have presented our position on evolution management on
the Data Web, as well as the challenges and requirements for the preservation and
evolution management of heterogeneous web datasets. We have proposed a model for
evolution whose components reside in a 2x2 space, where objects are separated by
their temporal dependence and by their curator-imposed evolution semantics, and we
have shown how three common data models (relational, multidimensional and
ontological) can be mapped to it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Manyika</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <article-title>"Big data: The next frontier for innovation, competition, and productivity."</article-title>
          <source>McKinsey Global Institute</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Stavrakas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <article-title>"Diachronic Linked Data: Towards Long-Term Preservation of Structured Interrelated Information."</article-title>
          <source>arXiv preprint arXiv:1205.2292</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Van de Sompel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          et al.
          <article-title>"An HTTP-based versioning mechanism for linked data."</article-title>
          <source>arXiv preprint arXiv:1003.3661</source>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Van de Sompel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          et al.
          <article-title>"Memento: Time travel for the web."</article-title>
          <source>arXiv preprint arXiv:0911.1112</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dyreson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          et al.
          <article-title>"Managing versions of web documents in a transaction-time web server."</article-title>
          <source>In Proceedings of WWW2004</source>
          , pp.
          <fpage>422</fpage>
          -
          <lpage>432</lpage>
          . ACM (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lam</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>"Managing and querying multi-version XML data with update logging."</article-title>
          <source>In Proceedings of the 2002 ACM symposium on Document engineering</source>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          . ACM (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al.
          <article-title>"Archiving scientific data."</article-title>
          <source>ACM Transactions on Database Systems (TODS)</source>
          <volume>29</volume>
          , no.
          <issue>1</issue>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Papastefanatos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          et al.
          <article-title>"Capturing the history and change structure of evolving data."</article-title>
          <source>In Proceedings of DBKDA 2013</source>
          , pp.
          <fpage>235</fpage>
          -
          <lpage>241</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <article-title>"Towards dataset dynamics: Change frequency of linked open data sources."</article-title>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Völkel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Groza</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>"SemVersion: An RDF-based ontology versioning system."</article-title>
          <source>In Proceedings of the IADIS international conference WWW/Internet</source>
          , vol.
          <volume>2006</volume>
          , p.
          <fpage>44</fpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Popitsch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Haslhofer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>"DSNotify: handling broken links in the web of data."</article-title>
          <source>In Proceedings of WWW2010</source>
          , pp.
          <fpage>761</fpage>
          -
          <lpage>770</lpage>
          . ACM (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <article-title>"Dataset dynamics compendium: A comparative study."</article-title>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Papavasileiou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          et al.
          <article-title>"High-level change detection in RDF(S) KBs."</article-title>
          <source>ACM Transactions on Database Systems (TODS)</source>
          <volume>38</volume>
          , no.
          <issue>1</issue>
          , article 1 (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>