=Paper= {{Paper |id=None |storemode=property |title=Publishing Provenance Information on the Web using the Memento Datetime Content Negotiation |pdfUrl=https://ceur-ws.org/Vol-813/ldow2011-paper02.pdf |volume=Vol-813 |dblpUrl=https://dblp.org/rec/conf/www/CoppensMDHJW11 }} ==Publishing Provenance Information on the Web using the Memento Datetime Content Negotiation== https://ceur-ws.org/Vol-813/ldow2011-paper02.pdf
Publishing Provenance Information on the Web using the Memento Datetime Content Negotiation

Sam Coppens
Ghent University - IBBT, Multimedia Lab
Gaston Crommenlaan 8/201, Ghent, Belgium
sam.coppens@ugent.be

Erik Mannens
Ghent University - IBBT, Multimedia Lab
Gaston Crommenlaan 8/201, Ghent, Belgium
erik.mannens@ugent.be

Davy Van Deursen
Ghent University - IBBT, Multimedia Lab
Gaston Crommenlaan 8/201, Ghent, Belgium
davy.vandeursen@ugent.be

Patrick Hochstenbach
Boekentoren - Ghent University Library
Rozier 9, Ghent, Belgium
patrick.hochstenbach@ugent.be

Bart Janssens
Descartes Systems Group
Duwijckstraat 17, Lier, Belgium
bjanssens@descartes.com

Rik Van de Walle
Ghent University - IBBT, Multimedia Lab
Gaston Crommenlaan 8/201, Ghent, Belgium
rik.vandewalle@ugent.be

ABSTRACT
In Belgium, we developed a digital long-term preservation archive to preserve the information from our heritage institutions. This platform harvests the information from the institutions, preserves the information for the long term and disseminates the information as Linked Open Data. Our platform produces many different versions of the harvested data to keep the information accessible over time when, e.g., mapping the metadata or transcoding the multimedia files, but it also produces a lot of provenance information relating all those different versions of a resource. For publishing this information as Linked Open Data, we extended our Linked Open Data server with Memento datetime content negotiation. Next to this, we extended the Memento framework to also publish the provenance information of those datetime content negotiated versions using an HTTP provenance link header for automatic discovery of the provenance information. This way, our framework allows the information of a resource to be published as Linked Open Data, including all its previous versions and their provenance information, in a web-accessible manner.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: General

General Terms
Design, Management, Standardization

Keywords
Linked Open Data, Memento datetime content negotiation, Provenance

Copyright is held by the author/owner(s).
LDOW2011, March 29, 2011, Hyderabad, India.

1.   INTRODUCTION
Many organisations and private persons still possess a lot of material which is stored on analogue carriers. Much of this material is part of important cultural heritage. At this moment, the analogue carriers are degrading and continuously losing quality, making the data inaccessible. While we are still able to see wall paintings from millennia ago, many documents from merely a decade or two decades ago have become inaccessible, e.g., WordPerfect files. Some refer to this situation as the Digital Dark Age [4]. Digital long-term preservation forms the solution for this issue. A digital long-term archive has the necessary processes in place to withstand many long-term preservation risks, e.g., bitrot, file formats becoming obsolete, etc. These preservation processes make sure the content remains intact and accessible over time.

The project Archipel¹ initiates the dissemination and digital long-term preservation of the cultural heritage in Flanders, Belgium, and researches the problems encountered with digital long-term preservation. In this project, we developed a platform that harvests data coming from various institutions (libraries, archival institutions, the art sector (museums), and the broadcasters), preserves the data for the long term and disseminates the data as Linked Open Data [1] (LOD) Dublin Core² records.

To guarantee the long-term preservation of the harvested content, our platform has the necessary processes in place to keep the information intact and interpretable, in line with the Open Archival Information System (OAIS) reference model [5] for the long-term preservation of information. These processes rely heavily on the provenance information of the harvested data, but at the same time also produce a lot of provenance information. This provenance information is modelled using a semantic implementation of the PREMIS 2.0 data dictionary³, i.e., PREMIS OWL⁴.

Our developed platform generates many different versions of the harvested data, i.e., metadata and referenced multimedia files, via its preservation processes. These resources, their previous versions and their provenance information, relating the different versions, will be published on the Web as LOD. When preserving information for the long term and publishing the information as LOD at the same time, different problems arise. First of all, we need to have persistent URIs for our resources, which will publish the information of a certain version of the resource. Another problem involves the enrichments that occur on the resources before publishing them as LOD. These enrichments will not always remain valid over time. We need a way of preserving the temporality of these enrichments. The last problem tackled in this paper is the publication of the provenance information on the Web in a way that allows automatic discovery of the provenance information.

To solve these problems, our developed platform is extended with the Memento⁵ [13] datetime content negotiation. This datetime content negotiation makes it possible to select the appropriate version, called memento in the Memento framework, of the archived information and to publish it on a persistent URI. This datetime content negotiation will also solve the problem of preserving the temporality of the enrichments of the archived information. The different versions of the archived information are linked to each other via their provenance information. To publish the provenance information of each version on the Web, we extended the Memento framework to offer provenance links using a special Hypertext Transfer Protocol (HTTP) [8] link header for automatic discovery of the provenance information.

In this paper, we present how our digital long-term preservation platform is able to publish the provenance information on the Web. First, Section 2 describes some related work on this topic. Then, in Section 3, we introduce our semantic layered metadata model, which allows the archive to deal with the diversity of metadata records coming from diverse institutions and to track the provenance of the harvested data. Section 4 describes the distributed architecture of the archive and its processes. Section 5 explains the publication of the content and its provenance information using the Memento framework, extended to provide provenance information. We end with a conclusion in Section 6.

¹ http://www.archipelproject.be
² http://dublincore.org/
³ http://www.loc.gov/standards/premis/
⁴ http://multimedialab.elis.ugent.be/users/samcoppe/ontologies/Premis/index.html
⁵ http://www.mementoweb.org

2.   RELATED WORK
Interest in digital preservation can be seen by the multitude of projects in this area. Planets (Preservation and Long-term Access through Networked Services)⁶ was especially aimed at defining guidelines for preservation planning. However, it did not tackle the integration of different existing metadata formats, or the dissemination of the metadata as LOD. Likewise, the Prestospace (Preservation towards storage and access) project's objective was to provide technical solutions and integrated systems for a complete digital preservation of all kinds of audio-visual collections⁷. The project was especially focussed on the underlying technologies, e.g., automated generation of metadata or detection of errors in content [11], but it did not use a standardised, semantic preservation model to support the archiving, nor did it tackle the problem of publishing the generated provenance information to the Web.

The CASPAR project (Cultural Artistic and Scientific knowledge for Preservation, Access, and Retrieval) presented technologies for digital preservation⁸. The OAIS Reference Model was chosen as the base platform, and the project was focused on implementing the different steps in the preservation workflow. It focused more on preservation services than on describing the preservation information. BOM Vlaanderen⁹, a national research project, was aimed at preservation and disclosure of audio-visual content in Flanders. Additionally, it looked at ways to unify different metadata standards currently used for describing audio-visual content. Current trends focus on integrating different media archives. PrestoPRIME researches and develops practical solutions for the long-term preservation of digital media objects, programmes and collections, and finds ways to increase access by integrating the media archives with European on-line digital libraries in a digital preservation framework¹⁰.

The related work discussed so far focused on digital long-term preservation, not on the more general problem of making provenance information available on the Web. For the work done in this area, the work of the W3C Provenance Incubator Group¹¹ is the major reference. This incubator group produced working definitions for provenance information, provided a state-of-the-art understanding and developed a roadmap for development and possible standardisation of provenance on the Web. This work included defining key dimensions for provenance, collecting use cases, designing three flagship scenarios from the use cases, creating mappings between existing provenance vocabularies, examining how provenance could fit in the Web architecture and providing a state-of-the-art report on the current provenance activities. Their work is summarised in a final report [6]. The first flagship scenario describes a news aggregator site that assembles news items from a variety of data sources, e.g., news sites, blogs and tweets. The provenance records of these data providers can help with verification, credit and licensing. This flagship scenario could be covered by publishing the provenance information using our framework. What still forms a problem is the lack of a standardised metadata model for publishing provenance on the Web. In our framework, we publish the provenance information as Linked Open Data using PREMIS OWL. This information is only interoperable in the long-term preservation context, where PREMIS is well known, not in a Web context. Such a standardised provenance model for the Web is still a major research area. The work of the W3C Provenance Incubator Group was a first step in that direction.

Another interesting work in the area of publishing provenance for linked data is the paper of Olaf Hartig and Jun Zhao published at IPAW [7]. In that paper they describe the Provenance Vocabulary¹² used for describing the provenance information as Linked Open Data. Next to this, they also offer ways of publishing this provenance information for Linked Data. They discuss how provenance can be added to Linked Data objects, how provenance can be included into RDF dumps and how the provenance information can be queried using SPARQL endpoints. This work enables provenance for Linked Data, but it does not offer solutions for automatic discovery of the provenance information or ways of publishing provenance on the Web beyond using semantic web technologies. Future work could involve publishing the provenance information using this vocabulary, which is more suited for publication on the Web than PREMIS OWL, which is intended to be a data model for digital long-term archives. The mapping table, relating various provenance vocabularies, produced by the W3C Incubator Group¹³ will be the reference for this work.

Figure 1: Data Model of the PREMIS 2.0 Data Dictionary.

Figure 2: Layered data model for the long-term archive.

⁶ http://www.planets-project.eu/
⁷ http://prestospace.org/project/index.nl.html
⁸ http://www.casparpreserves.eu/
⁹ https://projects.ibbt.be/bom-vl
¹⁰ http://www.prestoprime.org/
¹¹ http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
¹² http://purl.org/net/provenance/

¹³ http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Vocabulary_Mappings

3.   LAYERED METADATA MODEL
Descriptive metadata schemes describe the content of the harvested data: subject, author, date of creation, file format, etc. This metadata makes it possible to manage and search the complete digital archive. When archiving data coming from different sectors like the broadcast sector, the libraries, the cultural sector, and the archival sector, a problem arises concerning descriptive metadata. Many of the institutions already have descriptive metadata using domain-specific metadata models. To deal with this diversity of metadata models, the descriptive metadata is mapped to Dublin Core RDF [12] and is archived along with the data in their original metadata format, e.g., MARC, so there is no information loss. This gives the archive the necessary tools to search the whole archive. When the data of interest is found, the original metadata that is stored as data can still be presented to the users.

DC RDF was chosen as format for the descriptive metadata, as it is a broadly accepted descriptive schema. The power of this schema is its simplicity and generality. It consists of only fifteen fields, among them creator, subject, coverage, description, and date. It can answer the basic questions: Who, What, Where, and When. All the fields in DC are optional and repeatable. This makes it relatively easy to map almost all descriptive metadata schemes to DC RDF, as many institutions already support DC. This choice will also benefit the publication of the diverse records coming from the institutions as LOD, as will be discussed in Section 5.

To store the preservation metadata, we developed a semantic binding of the PREMIS 2.0 Data Dictionary. The PREMIS 2.0 Data Dictionary was especially designed for storing provenance information in the context of digital long-term preservation and is in line with the requirements of OAIS. This PREMIS OWL schema is currently undergoing a standardisation process and will soon be published on a more stable URL of the Library of Congress. The PREMIS 2.0 Data Dictionary is described by a data model, which consists of five semantic units or classes important for digital preservation purposes:

   • Intellectual Entities: a part of the content that can be considered as an intellectual unit for the management and the description of the content. This can be, for example, a book, a photo, or a database.

   • Object: a discrete unit of information in digital form, typically multimedia objects related to the intellectual entity.

   • Event: an action that has an impact on an object or an agent.

   • Agent: a person, institution, or software application that is related to an event of an object or is associated with the rights of an object.

   • Rights: description of one or more rights, permissions of an object or an agent.

Intellectual entities, events, and rights are directly related to an object, whereas an agent can only be related to an object through an event or through rights, as can be seen in Figure 1. This way, not only the changes to an object are stored, but the event involved in this change is also described. These relationships offer the necessary tools to properly store the provenance of an archived object. The
rights metadata needed for preservation are covered by the                                                                      @prefix     rdf:                       < h t t p : / / www. w3 . org /1999/02/22 − r d f−syntax−ns#> .
                                                                                                                                @prefix     rdfs:                      < h t t p : / / www. w3 . org / 2 0 0 0 / 0 1 / r d f−schema#> .
rights entity, which relates to the agent entity and the ob-                                                                    @prefix     owl:                       < h t t p : / / www. w3 . org / 2 0 0 2 / 0 7 / owl#> .
                                                                                                                                @prefix     premisowl:                 < h t t p : / / m u l t i m e d i a l a b . e l i s . ugent . be / users / samcoppe /
ject entity. The binary metadata, technical metadata and                                                                                  o n t o l o g i e s / Premis / premis . owl#> .
structural metadata are encapsulated in the PREMIS data
                                                                                                                                                                    a                                  premisowl:Event ;
dictionary via the description of the object entity. Examples                                                                               premisowl:eventIdentifier                                           ;
                                                                                                                                            premisowl:eventType                                                " dissemination migration " ;
of an PREMIS OWL Object entity, Event entity, Rights en-                                                                                    premisowl:eventDateTime                                            "2010−08−06T00:00:00 .002 " ;
                                                                                                                                            premisowl:eventDetail                                              " ImageMagick " ;
tity and Agent entity are given in the resp. Listing 1, Listing                                                                             premisowl:eventOutcomeInformation                                  < event1OutcomInformation > ;
2, Listing 3, and Listing 4.                                                                                                                premisowl:linkingAgent                                              ;
                                                                                                                                            premisowl:linkingObject                                             ;
                                                                                                                                            premisowl:linkingObject                                             ;
                                                                                                                                                    .

@prefix     rdf:                       < h t t p : / / www. w3 . org /1999/02/22 − r d f−syntax−ns#> .                                                                                 a             premisowl:EventIdentifier ;
@prefix     rdfs:                      < h t t p : / / www. w3 . org / 2 0 0 0 / 0 1 / r d f−schema#> .                                premisowl:identifierType                                                " LocalDCMS " ;
@prefix     owl:                       < h t t p : / / www. w3 . org / 2 0 0 2 / 0 7 / owl#> .                                         premisowl:identifierValue                                               " E002 . 1 " ;
@prefix     premisowl:                 < h t t p : / / m u l t i m e d i a l a b . e l i s . ugent . be / users / samcoppe /           .
          o n t o l o g i e s / Premis / premis . owl#> .
                                                                                                                                             a
                       a                                              premisowl:File ;                                       premisowl:EventOutcomeInformation ;
        premisowl:preservationLevel                                             ;                            premisowl:eventOutcome                                                  " successful " ;
        premisowl:significantProperties                                        ;                         .
        premisowl:objectCharacteristics                                        ;
        premisowl:originalName                                                 " 0001h . t i f " ;
        premisowl:storage                                                       ;
        premisowl:environment                                                   ;                          Listing 2:                    PREMIS OWL Event instance in N3
        premisowl:linkingEvent                                                  ;
        premisowl:linkingRightsStatement                                       ;
                                                                                                                               notation.
        premisowl:linkingIntellectualEntity                                     .

      a                                              premisowl:PreservationLevel ;
        premisowl:preservationLevelValue                                       "0" ;
        premisowl:preservationLevelRole
        premisowl:preservationLevelDateAssigned
                                                                               " master copy " ;
                                                                               "2010−07−29T14:41:28 " .
                                                                                                                                 Employing a data model with the original metadata, the
                                                                                                                               mapped Dublin Core RDF descriptions and the PREMIS
 a                                               premisowl:SignificantProperties
       ;                                                                                                                       OWL metadata for storing the provenance leads to a lay-
         premisowl:significantPropertiesType                                   " behavior " ;
         premisowl:significantPropertiesValue                                  " hyperlinks traversable " .                    ered, semantic metadata model, which the archive uses for
 a                 premisowl:ObjectCharacteristics
                                                                                                                               management, dissemination and preservation purposes, as
       ;                                                                                                                       depicted in Figure 2.
         premisowl:compositionLevel              "0" ;
         premisowl:fixity                        ;
         premisowl:size                          " 20800896 " ;
         premisowl:format
         premisowl:creatingApplication
                                                  ;
                                                 ;                                                4.         ARCHITECTURE
         premisowl:objectCharacteristicsExtension<
               object1CharacteristicsExtension>.                                                                                 In this section, our architecture of the digital long-term
                 a                                              premisowl:Fixity ;
                                                                                                                               preservation archive is described. In this networked world,
        premisowl:messageDigestAlgorithm
        premisowl:messageDigest
                                                                               "MD5" ;
                                                                               " 36
                                                                                                                               various resources are linked to each other. We do not want to
               b03197ad066cd719906c55eb68ab8d " ;                                                                              build yet another central e-depot, but a distributed network
        premisowl:messageDigestOriginator                                      " LocalDCMS " .
                                                                                                                               of storage components. For this reason, the platform will
                  a
        premisowl:formatDesignation
                                                                               premisowl:Format ;
                                                                                ;
                                                                                                                               have a service oriented architecture14 (SOA). This SOA will
        premisowl:formatRegistry                                                .                       make use of a central service hub, which will offer the needed
      a                                              premisowl:FormatDesignation ;                   services for the platform. The objectives of our platform are
        premisowl:formatName                                                   " image / t i f f " ;
        premisowl:formatVersion                                                " 6.0 " .                                       twofold:
          a                                             premisowl:FormatRegistry ;
        premisowl:formatRegistryName                                           "PRONOM" ;                                           • Disseminate the content and provenance information
        premisowl:formatRegistryKey                                            " fmt /10 " ;
        premisowl:formatRegistryRole                                           " specification " .                                    as LOD.
    a                                             premisowl:CreatingApplication ;
        premisowl:creatingApplicationName                                      " Adobe Photoshop " ;                                • Enable long-term preservation.
        premisowl:creatingApplicationVersion                                   "CS2" ;
        premisowl:dateCreatedByApplication                                     "2006−09−20T08:29:02 " .

                a                                              premisowl:Storage ;
                                                                                                                                 Our architecture is depicted in Figure 3. The green arrow
        premisowl:contentLocation                                               ;                      indicates the dissemination path, the red arrow stipulates
        premisowl:storageMedium                                                " disk " .
                                                                                                                               the preservation path. The basic components of our archi-
         a                                             premisowl:ContentLocation ;
        premisowl:contentLocationType                                          " filepath " ;
                                                                                                                               tecture are:
        premisowl:contentLocationValue                                         " amserver " .

            a                                              premisowl:Environment ;                              • Repositories: these are the repositories of the diverse
        premisowl:environmentCharacteristic
        premisowl:environmentPurpose
                                                                               " recommended " ;
                                                                               " render " ;
                                                                                                                                      institutions, which have their content published on-
        premisowl:environmentPurpose                                           " edit " ;                                             line, using the OAI-PMH protocol [10], depicted in
        premisowl:software                                                      ;
        premisowl:hardware                                                      .                                   Figure 3 in box 1.
                                  a                          premisowl:Software ;
        premisowl:swName
        premisowl:swVersion
                                                                               " Adobe Acrobat " ;
                                                                               " 5.0 " ;
                                                                                                                                    • Shared Repositories: for those institutions that have
        premisowl:swType                                                       " renderer " .                                         not published their content on-line, our Archipel
              a                                              premisowl:Hardware ;                                   project foresees several shared repositories, using Omeka 15
        premisowl:hwName
        premisowl:hwType
                                                                               " I n t e l x86 " ;
                                                                               " processor " ;                                        or MediaMosa 16 , which will publish their content on-
        premisowl:hwOtherInformation                                           " 60 mhz minimum " .                                   line using the OAI-PMH protocol. This is shown in
                                                                                                                                      Figure 3 in box 2.
Listing 1: PREMIS OWL Object Instance in N3
                                                                                                                               14
Notation.                                                                                                                         http://opengroup.org/projects/soa/
                                                                                                                               15
                                                                                                                                  http://omeka.org/
                                                                                                                               16
                                                                                                                                  http://www.mediamosa.org/
 @prefix     rdf:                       < h t t p : / / www. w3 . org /1999/02/22 − r d f−syntax−ns#> .                                     • Identity Service: with this distributed architecture an
 @prefix     rdfs:                      < h t t p : / / www. w3 . org / 2 0 0 0 / 0 1 / r d f−schema#> .
 @prefix     owl:                       < h t t p : / / www. w3 . org / 2 0 0 2 / 0 7 / owl#> .                                               identity server is needed for authentication across the
 @prefix     premisowl:                 < h t t p : / / m u l t i m e d i a l a b . e l i s . ugent . be / users / samcoppe /
           o n t o l o g i e s / Premis / premis . owl#> .
                                                                                                                                              different systems, shown in box 6 of Figure 3.
                              a                                        premisowl:License ;
        premisowl:rightsStatementIdentifier                                     ;                                             For building our distributed, digital long-term preserva-
        premisowl:rightsBasis
        premisowl:licenseInformation
                                                                                " license " ;
                                                                                ;
                                                                                                                                      tion platform, we need an integration server to orchestrate
        premisowl:rightsGranted                                                  ;                                    the different processes, based on SOA technology. An Enter-
        premisowl:linkingObject                                                  ;
        premisowl:linkingObject                                                  ;                                           prise Service Bus (ESB) provides the open, standards-based
        premisowl:linkingAgent                                                  <> ;
               .                                                                                                                      connectivity infrastructure for the service oriented architec-
                                                a
                                                                                                                                      ture and allows these services to exchange data with one
       premisowl:RightsStatementIdentifier ;                                                                                          another as they participate in our processes. Orchestration
         premisowl:identifierType                                          "URL" ;
         premisowl:identifierValue                                         " h t t p : / / a r c h i p e l l o d . demo . i b b t .   between services is handled by a workflow engine. This en-
               be:8080 / r i g h t s / r e s o u r c e / d i s s e m i n a t i o n " ;
         .                                                                                                                            gine is integrated in the service bus architecture and sup-
                              a          premisowl:LicenseInformation ;
                                                                                                                                      ports the execution of the preservation processes. An exe-
        premisowl:licenseIdentifier                          ;                                                    cutable preservation process is defined by a control flow that
        premisowl:licenseTerms                                " Here comes t h e a c t u a l t e x t o f
               t h e l i c e n s e . ( under development ) " ;                                                                        consists of a combination of basic and structured activities.
        premisowl:licenseNote                                 " These o b j e c t s may be
               disseminated . " ;                                                                                                     For the communication, the ’Simple Object Access Protocol’
        .                                                                                                                             (SOAP)[2] is used, a protocol specification for exchanging
                                      a               premisowl:LicenseIdentifier ;                               structured information between services. This integration
        premisowl:identifierType                                          "URL" ;
        premisowl:identifierValue                                         " h t t p : / / a r c h i p e l l o d . demo . i b b t .    server is built using the Porthus 18 .NET Integration server.
              be:8080 / l i c e n s e / r e s o u r c e / d i s s e m i n a t i o n " ;
        .                                                                                                                                The whole preservation/dissemination cycle starts with a
                                                   a            premisowl:LicenseInformation ;
                                                                                                                                      harvesting process, which will harvest the metadata and
        premisowl:act                                                           ;                                 the referenced files. The harvested metadata is described
        premisowl:termOfGrant                                                   ;
        .                                                                                                                             using several descriptive metadata formats, e.g., MARC,
                                              a             premisowl:TermOfGrant ;                              DC, or CDWA. For management and dissemination pur-
        premisowl:startDate
        .
                                                                                 "2009−09−01T08:30:00 " ;                             poses this metadata needs to be mapped to DC RDF. For
                                                                                                                                      this, we rely on a mapping service, which will map the
Listing 3: PREMIS OWL Rights instance in N3                                                                                           incoming metadata to DC descriptions.
notation.                                                                                                                                If the content is also to be preserved, the original metadata
                                                                                                                                      record, the mapped DC RDF record and the referenced files
                                                                                                                                      get packed into a Submission Information Package (SIP),
                                                                                                                                      according to the OAIS specifications by the SIP creator
       • Integration Server : this server provides an integration                                                                     service. For this SIP, the BagIt [3] package format is used.
         layer for orchestrating all the needed processes, which                                                                      This SIP package is then delivered to the CMS, using the
         are all implemented as web services, e.g., transcoding                                                                       SIP ingest service.
         services. Box 3 of Figure 3 shows this.                                                                                         When ingesting this BagIt package into the CMS, it has to
                                                                                                                                      be supplemented with the preservation information to form
       • LOD server : this server is used for the dissemination
                                                                                                                                      an Archival Information Package (AIP) in the OAIS termi-
         of the content and the provenance information, with a
                                                                                                                                      nology. This package holds all the different versions of the
         triple store as a storage back-end, shown in box 4 of
                                                                                                                                      metadata and the multimedia files, referenced by the meta-
         Figure 3.
                                                                                                                                      data files. For this preservation information, we will use
       • CMS : The CMS will store the archived content, using                                                                         our PREMIS OWL ontology. During this ingest process, all
         persistent identifiers and cloud storage, depicted in box                                                                    files in the package get a PREMIS Object description, re-
         5 of Figure 3. For this Fedora Commons 17 is used.                                                                           lated to the mapped DC RDF description, thus becoming
                                                                                                                                      the PREMIS intellectual entity. For this we rely on a char-
17
     http://fedora-commons.org/                                                                                                       acterisation service, which will identify the file format of
                                                                                                                                      the files and model the files as PREMIS Objects. Every ac-
                                                                                                                                      tion performed on such a PREMIS Object, will get related to
 @prefix     rdf:                       < h t t p : / / www. w3 . org /1999/02/22 − r d f−syntax−ns#> .
                                                                                                                                      that Object and will be modeled as a PREMIS Event. This
 @prefix     rdfs:                      < h t t p : / / www. w3 . org / 2 0 0 0 / 0 1 / r d f−schema#> .                              way, the platform is able to store and track the provenance
 @prefix     owl:                       < h t t p : / / www. w3 . org / 2 0 0 2 / 0 7 / owl#> .
 @prefix     premisowl:                 < h t t p : / / m u l t i m e d i a l a b . e l i s . ugent . be / users / samcoppe /         of the descriptive metadata and the referenced multimedia
           o n t o l o g i e s / Premis / premis . owl#> .
                                                                                                                                      files.
                                                           a            premisowl:Event ;
             premisowl:agentIdentifier                                           ;
                                                                                                                                         The next thing within the workflow is the migration of
             premisowl:agentType                                                " person " ;                                          the stored, related multimedia files. These files get migrated
             premisowl:agentName                                                "Sam Coppens " ;
             premisowl:linkingAgent                                              ;                                            to a file format, defined by the archive's preservation plans.
             premisowl:linkingObject                                             ;
             premisowl:linkingObject                                             ;                                           Such a preservation plan can stipulate, e.g., that all image
                     .
                                                                                                                                      files must be migrated to the TIFF file format to keep the
                                                         a             premisowl:AgentIdentifier ;                          image information accessible for long-term preservation pur-
        premisowl:identifierType                                                 " OpenID " ;
        premisowl:identifierValue                                                " h t t p : / / smcoppens .                          poses, or, e.g., that all image files must be migrated to the
              a r c h i p e l o p e n I D . be " ;
        .                                                                                                                             JPEG file format to keep the image information accessible
                                                                                                                                      for dissemination purposes. For this, we need migration
Listing 4:                    PREMIS OWL Agent instance in N3                                                                         services, which can then migrate various incoming file for-
notation.                                                                                                                             18
                                                                                                                                           http://www.porthus.be
                          Figure 3: Architecture of the long-term preservation platform.
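
To illustrate the SIP creation step, the following Python sketch packs an original metadata record, its mapped DC RDF record and a referenced file into a BagIt-style directory. It is a minimal sketch under stated assumptions: the function names `create_sip` and `verify_sip`, the example file names, the BagIt version string and the choice of SHA-256 manifests are hypothetical and are not taken from the platform's SIP creator service. Only the `bagit.txt` declaration, the `data/` payload directory and the checksum manifest follow the BagIt package format.

```python
import hashlib
from pathlib import Path


def create_sip(bag_dir: Path, payload: dict) -> Path:
    """Write the payload files into a BagIt-style bag and return its directory.

    `payload` maps a file name to its bytes (hypothetical interface).
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name in sorted(payload):
        (data_dir / name).write_bytes(payload[name])
        digest = hashlib.sha256(payload[name]).hexdigest()
        # One manifest line per payload file: checksum, then the relative path.
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration; the version number is an assumption for this sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_sip(bag_dir: Path) -> bool:
    """Recompute every checksum listed in the manifest (fixity check)."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

A fixity check such as `verify_sip` could be run by the SIP ingest service before the package is supplemented with preservation information.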



                                                                sources like the OpenCalais infrastructure19 for extracting
                                                                these named entities, GeoNames 20 for enriching the loca-
                                                                tions, DBPedia 21 for enriching the persons, organisations
                                                                and events, BibNet 22 for authors, singers and music bands
                                                                enrichment, and Toerisme Vlaanderen 23 for touristic infor-
                                                                mation enrichment on locations. This way, our approach
                                                                provides i) unique identifiers for the resource and ii) formalised
                                                                knowledge about this resource. We will not only disseminate the
                                                                intellectual entity, i.e., the descriptive metadata, but also
                                                                the preservation information, so the end-user
                                                                has access to all the information available about that object.
                                                                   If the harvested content does not need to be preserved, it
                                                                is directly routed to our enrichment service, which will
                                                                interlink the data with external data sources after harvesting
Figure 4: Schematic Overview of the Service Bus                 and mapping the metadata. This enriched DC description
and its Connected Services.                                     then gets ingested into the triple store of the LOD server,
                                                                which automatically publishes the enriched DC records as
                                                                LOD.
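
The two paths through the platform can be summarised in a short Python sketch. It only records the order in which the services described above are invoked; the class `Record`, the function `run_workflow` and the step names are hypothetical labels chosen for illustration and do not correspond to the actual service interfaces on the service bus.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    metadata_format: str  # e.g. "MARC", "DC" or "CDWA"
    steps: list = field(default_factory=list)


def run_workflow(record: Record, preserve: bool) -> Record:
    """Append the service invocations for one harvested record, in order."""
    record.steps += ["harvest", "map_to_dc_rdf"]
    if preserve:
        # Preservation path: SIP -> AIP in the CMS -> migration.
        record.steps += [
            "create_sip",
            "ingest_sip",
            "characterise",            # PREMIS Object descriptions
            "migrate",                 # recorded as a PREMIS Event
            "characterise_migrated",
        ]
    # Dissemination path: taken by every record, preserved or not.
    record.steps += ["enrich", "lod_ingest"]
    return record
```

Content that does not need to be preserved skips the CMS entirely and goes from mapping straight to enrichment and LOD ingest.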


                                                                5.   PUBLICATION
mats to the appropriate file format according to the               Our architecture, described in the previous section,
preservation plans. This migration will extend the AIP package with     ingests all the harvested and generated information into our
the extra migrated data stream. This data stream is then        triple store. This information, including the provenance in-
passed to the characterisation service to get a PREMIS Ob-      formation, needs to be disseminated as Linked Open Data.
ject description of the generated data stream and the preser-   For this dissemination, we want to have stable URIs [9], e.g.,
vation information is also extended with a description of the   http://../record/VTi/1/oai:archipel1.demo.ibbt.be:10
migration service as a PREMIS Event relating the source         for the harvested original resources. These resources change
object to the migrated object.                                  over time via the preservation processes. Every version of
   During the last phase, the archived information is moved
                                                                19
to the LOD server for dissemination of the information.            http://www.opencalais.com/
                                                                20
For this, the descriptive DC RDF metadata will get en-             http://www.geonames.org
                                                                21
riched by the enrichment service before it gets ingested           http://dbpedia.org
                                                                22
into the LOD server’s triple store by the LOD ingest ser-          http://www.bibnet.be/
                                                                23
vice. For the enrichment service, the platform relies on data      http://www.toerismevlaanderen.be
                                                                  1: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ----> URI-R
                                                                  2: UA <-- HTTP 200; Link: URI-G ------------------ URI-R
                                                                  3: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ----> URI-G
                                                                  4: UA <-- HTTP 302; Location: URI-Mj; Vary; Link:
                                                                         URI-R, URI-T, URI-Mj ---------------------- URI-G
                                                                  5: UA --- HTTP GET URI-Mj; Accept-Datetime: Tj --> URI-Mj
                                                                  6: UA <-- HTTP 200; Memento-Datetime: Tj; Link:
                                                                         URI-R, URI-T, URI-G, URI-Mj --------------- URI-Mj

                                                                     Listing 5: Typical Memento HTTP interaction
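
Steps 3 and 4 of Listing 5 can be sketched in Python as follows. This is a minimal sketch, not our server implementation: the function names, the `example.org` URIs used in the usage note and the fallback to the first memento for requests that predate all versions are assumptions made for illustration. The creation datetimes stand in for the datetimes of the PREMIS creation events of the mementos.

```python
from bisect import bisect_right
from email.utils import parsedate_to_datetime


def select_memento(mementos, accept_datetime):
    """Return the (created, URI-M) pair valid at the requested datetime.

    `mementos` is a list of (timezone-aware creation datetime, URI-M) pairs,
    sorted by creation datetime. `accept_datetime` is the value of the
    Accept-Datetime request header (an HTTP date).
    """
    requested = parsedate_to_datetime(accept_datetime)
    created = [moment for moment, _ in mementos]
    # Latest memento created at or before the requested datetime.
    index = bisect_right(created, requested) - 1
    # Assumption: requests older than every version get the first memento.
    return mementos[max(index, 0)]


def timegate_response(uri_r, uri_t, mementos, accept_datetime):
    """Build the status code and headers of step 4 in Listing 5."""
    _, uri_m = select_memento(mementos, accept_datetime)
    link = ", ".join([
        f'<{uri_r}>; rel="original"',
        f'<{uri_t}>; rel="timemap"',
        f'<{uri_m}>; rel="memento"',
    ])
    headers = {
        "Location": uri_m,
        "Vary": "negotiate, accept-datetime, accept",
        "Link": link,
    }
    return 302, headers
```

For three versions created in 2009, 2010 and 2011, a request carrying `Accept-Datetime: Tue, 01 Jun 2010 00:00:00 GMT` would be redirected to the 2010 version, since that version was the one valid at the requested time.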



                                                                  negotiation for that original resource. The timegate redi-
                                                                  rects the user agent to the appropriate memento, which re-
                                                                  sponds with a memento-datetime. This memento-datetime
                                                                  gives the datetime the resource was created. This datetime
                                                                  of a memento is retrieved using the provenance information
                                                                  of that memento. The provenance of every memento is mod-
                                                                  eled as a PREMIS OWL Object instance relating to Event
                                                                  instances. Such an Object instance always has a creation
                                                                  event. The datetime of this creation event is used for the
Figure 5: Schematic Overview of the Content                       Memento datetime content negotiation. Listing 5 gives an
Negotiation.                                                      example of such an HTTP interaction.
                                                                     Next to the two new headers, Memento also introduces
                                                                  some new values for the existing HTTP headers: Vary and
                                                                  Link. The value for the Vary header in our case will be
the resource has another URI, e.g., http://../record/VTi/         negotiate, accept-datetime, accept. This Vary header indicates
1/oai:archipel1.demo.ibbt.be:10_V3. To link from the              that the content negotiation was performed in two dimen-
original resource with a stable URI to the appropriate ver-       sions, i.e., the datetime content negotiation and the media
sion URI, we extended our Linked Open Data server with the        type content negotiation. The relation types for the Link
Memento datetime content negotiation24 , besides the medi-        header Memento introduced are original, for referencing the
atype content negotiation. This mechanism allows the plat-        original resource, timegate, for indicating the timegate,
form to publish the information on persistent URIs. Based         timemap for linking to the timemap, and memento for refer-
on the Memento datetime content negotiation the right ver-        encing to various mementos for an original resource. These
sion of that resource is selected and published as LOD. This      Link headers allow automatic discovery of the timegate, the
mechanism is depicted in Figure 5 and explained in publi-         timemap, the original resource and several other mementos.
cation [14].                                                         Introducing this Memento datetime content negotiation is
                                                                  justified from our digital long-term preservation perspective.
5.1     Memento Datetime Content Negotiation                      A problem we were facing publishing information as Linked
 The Memento framework is based on HTTP and HTTPS                 Open Data and preserving it at the same time, involved the
URIs and introduces several concepts:                             enrichments. These enrichments do not always remain valid
                                                                  over time. That is why these enrichments are mostly left
     • Original Resource (URI-R): This resource is archived
                                                                  out of the metadata to be stored for the long term. If the
       for the long-term and has several versions.
                                                                  data providers of the enrichments also support the datetime
     • Memento (URI-Mj): This resource refers to one of the       content negotiation, a memento with enrichments would ref-
       versions of an original resource.                          erence that memento of the enrichment when it was valid.
                                                                  In other words, the Memento datetime content negotiation
     • TimeGate (URI-G): The TimeGate for an original re-         also preserves the temporality of the information. This jus-
       source is a resource that supports the datetime content    tifies storing also the enrichments of the metadata records
       negotiation.                                               for the long-term.
     • TimeMap (URI-T): A TimeMap for an original re-
       source lists the URIs of all the mementos of that orig-    5.2    Publishing Provenance
       inal resource.                                                In our platform, every version (memento) of a harvested
                                                                  resource (original resource) has a PREMIS OWL Object de-
  The Memento framework is based on HTTP request and              scription. This Object description describes the provenance
response headers. The framework introduces two new head-          of that object and is related through events to object de-
ers: Accept-Datetime and Memento-Datetime. The Accept-            scriptions of other versions/mementos of that original re-
Datetime header is used to ask for the version of the original    source. This allows our platform to include in the response
resource valid on that time. If a user agent requests an origi-   of the request for a memento a provenance link header which
nal resource for a specific datetime, the server responds with    includes the link to the LOD published PREMIS OWL Ob-
a link to the timegate, which can do the datetime content         ject description (URI-Pj ) of that memento. This provenance
24                                                                link header will allow automatic discovery of the provenance
 http://datatracker.ietf.org/doc/
draft-vandesompel-memento/                                        information.
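
The provenance link header described above can be sketched in Python from both the server and the client side. This is a minimal sketch under stated assumptions: the function names, the regular expression and the `example.org` URIs in the usage note are hypothetical, and the Link values carry only the `rel` attribute.

```python
import re
from email.utils import format_datetime
from typing import Optional


def memento_headers(uri_r, uri_t, uri_g, uri_m, uri_p, created):
    """Build the response headers a memento returns, including URI-Pj.

    `created` is the timezone-aware UTC datetime of the creation event of
    the memento, taken from its PREMIS OWL Object description.
    """
    links = [
        (uri_r, "original"),
        (uri_t, "timemap"),
        (uri_g, "timegate"),
        (uri_m, "memento"),
        (uri_p, "provenance"),  # the relation type added by our extension
    ]
    return {
        "Memento-Datetime": format_datetime(created, usegmt=True),
        "Link": ", ".join(f'<{uri}>; rel="{rel}"' for uri, rel in links),
    }


# Simplified pattern: assumes each link value has the form <URI>; rel="type".
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def discover_provenance(link_header: str) -> Optional[str]:
    """Client side: return URI-Pj from a Link header, or None if absent."""
    for uri, rel in LINK_PATTERN.findall(link_header):
        if rel == "provenance":
            return uri
    return None
```

A client that receives these headers can call `discover_provenance(headers["Link"])` and dereference the returned URI to obtain the PREMIS OWL Object description of the memento.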
  1: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ----> URI-R
  2: UA <-- HTTP 200; Link: URI-G ------------------ URI-R
  3: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ----> URI-G
  4: UA <-- HTTP 302; Location: URI-Mj; Vary; Link:
         URI-R, URI-T, URI-Mj ---------------------- URI-G
  5: UA --- HTTP GET URI-Mj; Accept-Datetime: Tj --> URI-Mj
  6: UA <-- HTTP 200; Memento-Datetime: Tj; Link:
         URI-R, URI-T, URI-G, URI-Mj, URI-Pj ------- URI-Mj

  Listing 6: Extended Memento HTTP interaction with provenance information

  We extended the Memento framework with a new concept:

   • Provenance (URI-Pj): This resource refers to the provenance of the
     selected version/memento of the original resource.

  To allow this resource to be automatically discovered, we
extended the Memento framework with a special value for
the existing HTTP header Link referencing the provenance
information. The relation type for this Link header is provenance
for the current provenance record (URI-Pj). A typical
HTTP interaction, requesting a certain memento, is shown
in Listing 6. In our framework steps 1 and 2 of the shown
                                                                    came from and if that data provider is a trusted party also
interaction are skipped, because the URI the original re-
                                                                    to make a judgment regarding the quality and trustworthi-
sources are published on is also the timegate for the original
                                                                    ness of the information. The PREMIS OWL model also
resources.
                                                                    allows describing the rights information in the provenance
  The provenance records are themselves also datetime con-
                                                                    of a resource, such as licenses, copyrights, rights granted,
tent negotiable. So they become mementos of an original
                                                                    etc. A license checker could use these additional links to
provenance resource. Doing this gives some extra benefits.
                                                                    browse through the provenance records of the mementos of
The Memento framework defined some extra relation types
                                                                    an original resource and check that none of them violates
for the HTTP Link header referencing a memento. When
                                                                    the rights information of another memento.
applied to a provenance record of a memento of an original
                                                                       A shortcoming of making provenance records also date-
resource, they get the following definitions:
                                                                    time content negotiable, is that all events happening on a
   • first memento (URI-M0): This resource refers to the            preserved resource more recent than the datetime asked for
     provenance of the first version/memento of the original        will be left out of the provenance description. Hence, the
     resource.                                                      provenance information would then only contain links to
                                                                    older versions/mementos of the preserved resource and the
   • last memento (URI-Mn): This resource refers to the             links to the more recent versions are lost.
     provenance of the last version/memento of the original            To improve the automatic discovery of the provenance in-
     resource.                                                      formation of a memento, our platform will inject the prove-
                                                                    nance link of the memento also in the HTML and RDF de-
   • memento (URI-Mj): This resource refers to the prove-           scriptions of that memento. This will enhance the prove-
     nance of the selected version/memento of the original          nance discovery, because not all clients will be able to in-
     resource.                                                      tercept the provenance link header. For the HTML repre-
   • previous memento (URI-Mi): This resource refers to             sentation of the memento, our framework includes a HTML
     the provenance of the previous version/memento of the          link tag in the head of the HTML document. This link has
     selected version/memento of the original resource.             a relation type of provenance, e.g., . For the RDF representation, our platform in-
     provenance of the next version/memento of the se-              jects a triple denoting the provenance information of that
     lected version/memento of the original resource.               memento. For linking this provenance record (PREMIS
                                                                    OWL Object intance), the PREMIS OWL object property
   • timemap (URI-T): A TimeMap for a provenance record             linkingObject is used. An example of such an injected triple
     of an original resource lists the URIs of the provenance       in the RDF description of a memento is:  premis:lin-
                                                                    kingObject .
nance header link, referencing the provenance information of           In some cases, it might be convenient to store the prove-
that memento. This provenance record is on itself also a me-        nance of the provenance information. An example of this in
mento. The response of this memento includes a timemap              our framework is the characterisation process. This process
link header pointing to a URI (URI-T ) listing the URIs of          identifies a memento of an original resource and creates a
the provenance records of all mementos of that original re-         PREMIS OWL Object instance of it. This can be the meta-
source. This way, an agent can have immediately an overall          data record or a multimedia file referenced in a metadata
view on the provenance of an original resource.                     record. In case of a file, the Object description is being en-
   These extra links could be very helpful in processing the        riched with information from the Preserv2 format registry25 .
provenance information. Our PREMIS OWL model allows                 This is an enrichment event occurring on provenance infor-
describing digital signatures, signing the versions/memen-          mation. This could be described in the provenance of the
tos of that original resource. A quality checker could in-          provenance information. Another example of this are digi-
vestigate the quality and trustworthiness of the published          tal signatures. Our PREMIS OWL model allows describing
information. This quality checker could investigate the dig-        these digital signatures applied to a stored memento, but
ital signature of the last version. If this was signed by a         digital signatures can also be used to sign provenance in-
trusted party and the digital signature is still valid, the qual-   formation. When including a provenance Link header in
ity checker could immediately move on to the provenance of
                                                                    25
the first memento to check where the signed information                  http://p2-registry.ecs.soton.ac.uk/
the response to a provenance record, the provenance of the provenance
information can be discovered.
   Looking at the 5-star deployment scheme26 of Tim Berners-Lee, this
framework could add two more stars for indicating the rating of a Linked Open
Data provider. A sixth star could go to Linked Open Data providers that
support the Memento datetime content negotiation. This sixth star will
indicate to, e.g., a long-term preservation archive, that the enrichments
coming from that provider could also be stored for the long term, as discussed
earlier. A seventh star could go to Linked Open Data providers not only
supporting the Memento datetime content negotiation, but also using this
framework to publish their provenance records as Linked Open Data. This
seventh star will indicate that the data provider publishes provenance
information and, hence, that it is possible to make trust judgments over that
data using quality checkers or license checkers, as mentioned above.

5.3   Implementation
   For implementing this framework, we used Jena TDB as triplestore for the
back-end. This is a large-scale persistent triplestore which supports SPARQL.
On top of this triplestore, the LOD server was built using Apache Tomcat as
HTTP web server. This LOD server has a servlet which will do the datetime and
the mediatype content negotiation and will redirect from the original
resource, published on a persistent URI, to the appropriate version/memento of
that original resource. This servlet will form the timegate. In addition, we
have servlets to serve the appropriate mediatype of the information (HTML and
RDF), which will also insert the provenance information. The resources that
will be published with this timegate are the harvested collections and
records. As explained in the previous section, we do not offer datetime
content negotiation for the provenance information. For this information, we
have a separate servlet only supporting media type content negotiation.
   In addition to the LOD server supporting the datetime content negotiation,
we have an integration server which will provide the needed preservation
processes. These preservation processes will generate the different versions
of the harvested information. This integration server was built using the
Porthus .NET Integration server.
   The LOD server will soon be publicly available for demonstration on the URL
http://archipellod.demo.ibbt.be:8080/. It will support the datetime content
negotiation, and this can be tested in the Mozilla web browser using the
Memento plugin27.

6.    CONCLUSIONS
  In this article, we have presented a distributed, digital long-term archive
relying on semantic technologies. Our platform is able to harvest data, store
it for the long term, and disseminate it as LOD. This data comes from very
diverse institutions, each using domain-specific metadata formats. For this,
we have developed a layered, semantic metadata model. The top layer lets the
archive deal with the diverse data coming from the institutions. For this
layer, DC RDF was chosen. The bottom layer will enable the long-term
preservation processes and consists of a semantic version of the PREMIS 2.0
data dictionary, i.e., PREMIS OWL. Using this ontology, it is possible to
store the metadata needed for the preservation services. It forms the data
model for the archive.
   An SOA was designed for this distributed archive. This SOA in combination
with an ESB makes it possible to modify and expand the current setup of
processes and to communicate with all the distributed preservation and
dissemination services. This platform produces many different versions of the
stored information and also produces provenance information, which relates the
different versions of the stored information. To publish these different
versions of a preserved resource and their provenance information, our
platform relies on the Memento datetime content negotiation. We extended this
framework to also include HTTP provenance header links for automated discovery
of the provenance information. This approach allows us to disseminate the
versioned information of the preserved resources on persistent URIs, depending
on the datetime content negotiation to redirect to the appropriate
version/memento of the original stored resource. Combining datetime content
negotiation with the publication of the provenance information links the
provenance information to the datetime dimension of a certain stored resource.
It also makes it possible to store even the enrichments of the LOD published
and preserved resources, because the temporality of these enrichments is also
preserved. Finally, the framework allows discovering the provenance
information of the other existing versions of an original resource, bringing
provenance information to the Web. This can all be tested on our publicly
available LOD server, published on the following URL:
http://archipellod.demo.ibbt.be:8080/

7.    ACKNOWLEDGMENTS
   The research activities that have been described in this paper were funded
by Ghent University, K.U. Leuven, VRT-medialab, the Interdisciplinary
Institute for Broadband Technology (IBBT) through the Archipel-project
(50[...] Innovation by Science and Technology in Flanders (IWT), the Fund for
Scientific Research-Flanders (FWO-Flanders), and the European Union. Special
thanks go out to the work package 4 partners. The work described was carried
out by this team. This team consists of Gert Goossens, Bart Janssens, and Raf
Vandesande from Porthus28, Descartes, Filip Borloo working for VTi29, Inge Van
Nieuwerburgh and Patrick Hochstenbach from Boekentoren30, Kris Buytaert from
Inuits31 and Matthias Vandermaesen from Krimson32.

26 http://www.w3.org/DesignIssues/LinkedData.html
27 https://addons.mozilla.org/en-US/firefox/addon/mementofox/
28 http://www.porthus.be/default2.aspx
29 http://www.vti.be
30 http://www.boekentoren.be
31 http://www.inuits.be
32 http://www.krimson.be
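
The provenance discovery described above (a Link header with relation type provenance, with a link tag in the HTML head as a fallback for clients that cannot read the header) can be illustrated from the client side with a minimal Python sketch. It is not the platform's implementation, which is built from servlets on Apache Tomcat. The function names, the simplified Link header parser, and all example.org URIs are assumptions made for illustration; only the relation type provenance comes from the text.

```python
import re
from html.parser import HTMLParser

# Simplified grammar for an HTTP Link header: <URI>; name=value; name="value", ...
# Assumption: parameter names are plain tokens and quoted values contain no quotes.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)'
)
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value):
    """Return a list of (uri, set_of_relation_types) for one Link header value."""
    links = []
    for uri, params in _LINK_VALUE.findall(value):
        rels = set()
        for name, quoted, bare in _PARAM.findall(params):
            if name.lower() == "rel":
                # A rel parameter may carry several space-separated relation types.
                rels.update((quoted or bare).lower().split())
        links.append((uri, rels))
    return links


class _ProvenanceLinkFinder(HTMLParser):
    """Collects href values of <link> tags whose rel includes 'provenance'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "provenance" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_provenance_uris(link_header=None, html=None):
    """Prefer the HTTP Link header; fall back to link tags in the HTML document."""
    if link_header:
        uris = [uri for uri, rels in parse_link_header(link_header)
                if "provenance" in rels]
        if uris:
            return uris
    if html:
        finder = _ProvenanceLinkFinder()
        finder.feed(html)
        return finder.hrefs
    return []


# Hypothetical response header mirroring step 6 of Listing 6
# (URI-R, URI-T, URI-G, URI-Mj, URI-Pj); every URI is a placeholder.
example_header = (
    '<http://example.org/resource>; rel="original", '
    '<http://example.org/timemap>; rel="timemap", '
    '<http://example.org/resource>; rel="timegate", '
    '<http://example.org/memento/2>; rel="memento"; '
    'datetime="Tue, 01 Mar 2011 10:00:00 GMT", '
    '<http://example.org/provenance/2>; rel="provenance"'
)
print(find_provenance_uris(link_header=example_header))
# prints ['http://example.org/provenance/2']
```

The header is tried first because it is available for every media type, while the HTML fallback only helps when an HTML representation was negotiated. A client handling the RDF representation would instead look for the injected premis:linkingObject triple.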