<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Timestamped URLs as Persistent Identifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lars Gleim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Decker</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Information Systems, RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Data Management for Evolving Data on the Web</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Uniform and persistent resource identifiers play a crucial role in sustainable data management and reuse in evolving knowledge graphs. Recent work identified explicit resource revisioning and immutability as important but insufficiently implemented and standardized components of corresponding data management systems. We propose the implementation of a global, persistent identifier system built upon time-based immutable resource revisioning of generic HTTP resources, which are identified by their URL and resolved via time-based HTTP content negotiation, building upon existing Web standards. Supporting both distributed resource archival and state synchronization, the system would provide solutions to the problems of citation (referencing particular resource revisions), archiving (retrieving specific revisions), synchronization (change monitoring), and sustainability (preserving at scale, ensuring long-term access).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the following, we provide an overview of existing resource identification and
resolution standards and their shortcomings with respect to evolving data on the Web. We then
propose the implementation of a persistent identifier (PID) system that reuses existing URLs as PIDs through
the combination of dated URIs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a resolution mechanism based on the HTTP
Memento protocol [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], inspired by FactID [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We further argue that this system
simultaneously addresses the citation, archiving, synchronization, and sustainability
problems. Finally, we discuss open issues and the potential impact of the proposed
approach.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Existing Identification Standards</title>
      <p>
        Identifiers are generally divided into two categories: handles and locators. Handles are
explicitly location-independent resource names that may be resolved using
specialized resolution services and protocols, such as the Handle System [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and its
popular implementation Digital Object Identifier (DOI) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In contrast, locators function
like actionable pointers and enable direct resource dereferencing. The single most
important locator of the Internet is the Uniform Resource Locator (URL) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], core to the
World Wide Web and the principal foundation of resource-oriented architectures. In contrast
to generic identifiers, PIDs are designed to enable particularly long-lasting, i.e., persistent,
resource identification. Ideally, they are never re-assigned, always identify the same
resource, and remain resolvable indefinitely. Persistence, however, is purely
a matter of the service offered by the resource provider and not a property of any specific naming
syntax [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        While handle-based systems were traditionally described as advantageous in terms
of identifier persistence and flexibility (since their resolution process may flexibly adapt
to changes of the underlying resource and itself provide additional services such as
functioning as a metadata store for the identified resources [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]), more recent accounts
argue that the http: URI scheme can be employed to achieve similar features [
        <xref ref-type="bibr" rid="ref15 ref17">15,17</xref>
        ].
Well-known examples of such URL-based identification schemes include PURL [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and
identifiers.org [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which use HTTP redirection to resolve Persistent URLs into regular
URLs according to the semantics of Cool URIs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Even though virtually all identifier
systems are effectively used and resolved via HTTP-based dereferencing gateways
nowadays [
        <xref ref-type="bibr" rid="ref2 ref7">2,7</xref>
        ], URL-based resolvers have been criticized since their inception [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
mainly because of their dependence on the survival of their host domain name and because HTTP
redirection does not allow multiple resolution, both of which limit long-term sustainability.
      </p>
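      <p>To make the redirection mechanism concrete, the following Python sketch models a PURL-style resolver as a pure function over a lookup table; the table, paths, hosts, and function name are hypothetical and not taken from any PURL implementation. Resolution answers with HTTP status 302 and a Location header pointing to the current URL. Because a single redirect names exactly one target, the sketch also makes the single-resolution limitation criticized above explicit.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple

# Hypothetical redirect table: persistent path -> current location of the resource.
REDIRECTS: Dict[str, str] = {
    "/net/example/dataset": "https://data.example.com/v3/dataset.ttl",
}


def resolve_purl(path: str, table: Optional[Dict[str, str]] = None) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value) for a persistent path.

    A redirect carries exactly one Location, so a PURL-style resolver
    cannot point a single persistent URL at several copies at once.
    """
    target = (REDIRECTS if table is None else table).get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND.value, None
    return HTTPStatus.FOUND.value, target


if __name__ == "__main__":
    print(resolve_purl("/net/example/dataset"))
    # Re-pointing the entry moves the target; the persistent URL stays the same.
    REDIRECTS["/net/example/dataset"] = "https://mirror.example.net/dataset.ttl"
    print(resolve_purl("/net/example/dataset"))
```
]]></preformat>
      <p>Re-pointing a table entry changes where the persistent URL leads without changing the URL itself; the persistence guarantee therefore rests entirely on whoever maintains the table and the domain that hosts it.</p>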
      <p>
        To address this criticism, the Archival Resource Key (ARK) identifier system [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
introduced standardized and replaceable dereferencing gateways, combining aspects of
both handles and locators. To identify a resource, a reusable handle, consisting of a naming
authority, a resource name, and an optional variant qualifier, is employed. A locator
may then be derived from this handle by prefixing it with the hostname of an HTTP
dereferencing gateway and persistence service for the resource identified by the handle.
ARK thus enables the preservation of resource copies in multiple locations (i.e., different
persistence services), providing a partial solution to the sustainability problem. It also
supports the management and retrieval of different versions of resources using plain
HTTP requests, providing a possible solution to the archiving problem. Nevertheless, ARK’s
general applicability to the Web of data is limited by its fixed URI scheme (which deviates significantly
from common URLs), its lack of synchronization support, its inhomogeneous
resource version management, and its general lack of practical adoption.
      </p>
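      <p>The handle-to-locator derivation described above can be sketched in a few lines of Python. The sketch assumes the common ark:/NAAN/Name[Qualifier] layout (also accepting the newer form without the slash after the colon); 12345 is the NAAN conventionally used in ARK examples, while the gateway hostnames and helper names are hypothetical. Because the handle itself carries no host, the same ARK yields one locator per persistence service.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import List

# Simplified ARK syntax: "ark:/" NAAN "/" Name [Qualifier].
ARK_PATTERN = re.compile(r"^ark:/?([^/]+)/([^/.]+)(.*)$")


@dataclass(frozen=True)
class Ark:
    naan: str       # Name Assigning Authority Number
    name: str       # resource name assigned by that authority
    qualifier: str  # optional hierarchy/variant qualifier, may be ""

    def __str__(self) -> str:
        return f"ark:/{self.naan}/{self.name}{self.qualifier}"


def parse_ark(text: str) -> Ark:
    match = ARK_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(*match.groups())


def locators(ark: Ark, gateways: List[str]) -> List[str]:
    """Derive one HTTP locator per dereferencing gateway by prefixing the handle."""
    return [f"https://{host}/{ark}" for host in gateways]


if __name__ == "__main__":
    ark = parse_ark("ark:/12345/x54xz321/s3/f8.05v.tiff")
    # Hypothetical persistence services hosting copies of the same object.
    for url in locators(ark, ["archive-a.example.org", "archive-b.example.net"]):
        print(url)
```
]]></preformat>
      <p>Any gateway can be replaced or supplemented without changing the handle, which is the property that lets ARK spread copies over several persistence services.</p>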
      <p>
        All discussed approaches require the assignment, registration, and management of
PIDs, typically distinct from and unrelated to already existing URL identifiers, creating
the need for additional identifier mapping and discovery [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. An identifier system for the
holistic management of dynamically evolving data on the Web – providing compatibility
with ubiquitous generic URLs, HTTP-resolution, and sustainable means for archiving
and synchronization with minimal overhead – is missing to date.
      </p>
    </sec>
    <sec id="sec-3">
      <title>A Persistent Identifier for Evolving Data on the Web</title>
      <p>
        Instead of employing and managing single-purpose PIDs, we propose to employ existing
URLs as PIDs through a simple resource revisioning mechanism, creating and exposing
immutable, persistently identifiable revisions of arbitrary resources, as recently proposed
by the abstract FactID scheme [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We propose a concrete implementation inspired by
the ARK system, combining existing Web standards to create global, persistent resource
identifiers from any URL using time-based content revisioning.
      </p>
      <p>
        Citation. Revisiting the citation problem, the fundamental challenge persistent
identifiers try to solve is that resources tend to change, move, or disappear over time. To
address this issue, we employ the basic notion that at any given fixed point in time, any
URL identifying a regular resource (or any locator, for that matter) identifies exactly one
resource. A simple persistent identifier can thus be created by combining
an existing URL with a given point in time. That is also the underlying idea of Larry
Masinter’s ‘duri’ dated URI scheme [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A duri takes the form
      </p>
      <preformat>duri:&lt;timestamp&gt;:&lt;embeddedURI&gt;</preformat>
      <p>
        where &lt;embeddedURI&gt; is an absolute URI as defined in RFC 3986 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
&lt;timestamp&gt; is a date-time as defined in RFC 3339 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], allowing for the specification of an
arbitrary time resolution. The meaning of such a duri is “the resource that was
identified by the &lt;embeddedURI&gt; at the time given” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. As such, duris provide a simple
solution to the citation problem.
      </p>
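      <p>A duri can be built and split again with the Python standard library alone. As a simplification, the sketch assumes that timestamps are RFC 3339 date-times which may be truncated (e.g., to a year or month) or carry fractional seconds and a UTC offset, reflecting the arbitrary resolution mentioned above; it does not implement the draft’s full grammar, and the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Tuple

# RFC 3339-style timestamp that may be truncated (e.g. "2020", "2020-10",
# "2020-10-01") or carry fractional seconds and a UTC offset.
_TIMESTAMP = (
    r"\d{4}(?:-\d{2}(?:-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})?)?)?)?"
)
# The greedy timestamp pattern consumes the colons inside the time of day,
# so the next ":" is the separator before the embedded URI.
_DURI = re.compile(rf"^duri:({_TIMESTAMP}):(.+)$")
# An absolute URI starts with a scheme followed by ":" (RFC 3986, Sect. 3.1).
_ABSOLUTE_URI = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def make_duri(timestamp: str, uri: str) -> str:
    """Combine a timestamp and an absolute URI into a dated URI."""
    if not re.fullmatch(_TIMESTAMP, timestamp):
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    if not _ABSOLUTE_URI.match(uri):
        raise ValueError(f"not an absolute URI: {uri!r}")
    return f"duri:{timestamp}:{uri}"


def parse_duri(duri: str) -> Tuple[str, str]:
    """Split a duri into (timestamp, embedded URI)."""
    match = _DURI.match(duri)
    if match is None:
        raise ValueError(f"not a duri: {duri!r}")
    return match.group(1), match.group(2)
```
]]></preformat>
      <p>For example, make_duri("2020-10", "https://example.org/a") yields duri:2020-10:https://example.org/a, and parse_duri splits it back into its two parts.</p>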
      <p>
        Archiving. To also address the archiving problem, a corresponding resolution
mechanism is required. A suitable candidate for duri resolution is the Memento protocol [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
which enables the retrieval of Mementos – historic states of resources – via time-based
HTTP content-negotiation. The Memento framework distinguishes four logical
components: Original Resource, Memento, TimeGate, and TimeMap. A Memento ⟨u, t⟩
captures the state of an Original Resource with URI u at a given point in time t (exposed
via the Memento-Datetime HTTP header). Mementos are intended to be immutable
and may optionally be associated with one or more distinct Memento URIs (URI-Ms) for
referenceability. Such a URI-M must further identify the URL of its Original Resource in
an HTTP Link header. Using the Accept-Datetime HTTP request header, historic
states of Original Resources may then be requested from a so-called TimeGate through
time-based content negotiation, and are served either directly in the HTTP response
or through HTTP redirection to external archive locations, providing a simple solution to the
archiving problem.
      </p>
      <p>[Figure 1. Left: persistent identification and archiving; a URL is retrieved, the data is persisted, and URL + retrieval time form the duri. Right: FactID resolution; the duri is split into URL + creation time and resolved via HTTP GET URL with Accept-Datetime: time.]</p>
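      <p>On the client side, resolving a duri against a TimeGate mainly requires converting its timestamp into the RFC 1123-style HTTP date format that RFC 7089 prescribes for the Accept-Datetime header. The Python sketch below prepares such a request with urllib without sending it, and reads the Memento-Datetime header of a response; the TimeGate URL and helper names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Mapping
from urllib.request import Request


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP date (always expressed in GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def memento_request(timegate: str, moment: datetime) -> Request:
    """Prepare a datetime-negotiated GET request for a TimeGate.

    If the Original Resource acts as its own TimeGate, ``timegate`` is
    simply the URL embedded in the duri.
    """
    return Request(timegate, headers={"Accept-Datetime": http_date(moment)}, method="GET")


def memento_datetime(headers: Mapping[str, str]) -> datetime:
    """Read the Memento-Datetime response header as an aware datetime."""
    return parsedate_to_datetime(headers["Memento-Datetime"])
```
]]></preformat>
      <p>Passing the prepared request to urllib.request.urlopen would send it; since urllib follows redirects by default, a TimeGate that answers with a redirect leads straight to the selected Memento, whose Memento-Datetime header can then be checked with memento_datetime.</p>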
      <p>Synchronization. The Memento protocol also enables revision
discovery and synchronization through TimeMaps, which list the available
Mementos ⟨u, t*⟩ at points in time t* for a given Original Resource u, i.e., a history of
available revisions with their respective Memento URIs. Thus, exposing
up-to-date TimeMaps for resources enables straightforward change monitoring as well as the discovery,
retrieval, and thus synchronization of resource state.</p>
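      <p>Change monitoring via TimeMaps can be sketched in Python as follows. RFC 7089 serializes TimeMaps in the application/link-format of RFC 6690; the simplified parser below assumes that every parameter value is quoted and ignores all entries whose rel value does not include memento. The helper names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple

# One link: <target> followed by ;param="value" pairs. Only quoted parameter
# values are handled (datetime values contain commas, so no naive splitting).
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
_PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(body: str) -> List[Tuple[str, datetime]]:
    """Return (URI-M, Memento-Datetime) pairs from a link-format TimeMap, oldest first."""
    mementos = []
    for target, params in _LINK.findall(body):
        attrs = dict(_PARAM.findall(params))
        # rel may hold several space-separated values, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((target, parsedate_to_datetime(attrs["datetime"])))
    return sorted(mementos, key=lambda pair: pair[1])


def new_since(body: str, last_seen: Optional[datetime]) -> List[Tuple[str, datetime]]:
    """Change monitoring: Mementos created after the last synchronized revision."""
    mementos = parse_timemap(body)
    if last_seen is None:
        return mementos
    return [m for m in mementos if m[1] > last_seen]
```
]]></preformat>
      <p>A client that remembers the Memento-Datetime of the last revision it synchronized only needs to fetch the TimeMap periodically and retrieve the URI-Ms returned by new_since.</p>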
      <p>Sustainability. Depending on the application scenario, TimeGates, TimeMaps, and/or
Mementos may all be provided by the original resource provider. However, it is equally
possible to deploy all components independently of each other, optionally with
redundancy. Notably, external TimeGates and Memento storage enable archiving to
be conducted by third parties and on demand, as already offered by archive.org.
Thus, adoption does not hinge on the support of any individual group or organization;
instead, the scheme may be adopted by interested users while remaining backward compatible with existing resources on
the Web. Individual Mementos may further be resolved to multiple URI-Ms, i.e., different
storage locations (e.g., via TimeMaps), supporting explicit redundancy. As such, the
Memento protocol also provides technological primitives to address the sustainability
problem by enabling flexible Memento resolution through one or more TimeGates and
TimeMaps (similar to a handle), resolution of Mementos to multiple underlying URIs,
and efficient monitoring of resource changes.</p>
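      <p>Resolving a Memento against several URI-Ms, as described above, amounts to trying redundant storage locations in turn. The sketch below is a minimal illustration and not part of the Memento specification: fetch is an injected, hypothetical callable (for instance, one wrapping urllib.request.urlopen) that raises OSError when a location is unreachable, and the mirror URLs are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Iterable, List, Tuple

Fetch = Callable[[str], bytes]


def fetch_with_fallback(uri_ms: Iterable[str], fetch: Fetch) -> Tuple[str, bytes]:
    """Return (URI-M, content) from the first storage location that answers.

    ``fetch`` must raise OSError for unreachable locations; urllib's URLError
    and the built-in ConnectionError are both subclasses of OSError.
    """
    failures: List[str] = []
    for uri_m in uri_ms:
        try:
            return uri_m, fetch(uri_m)
        except OSError as error:
            failures.append(f"{uri_m}: {error}")
    raise LookupError("no replica reachable:\n" + "\n".join(failures))


if __name__ == "__main__":
    replicas = {"https://mirror-b.example.net/m/42": b"revision 42"}

    def fake_fetch(uri: str) -> bytes:
        # Stand-in for an HTTP GET; only the second mirror is "online".
        if uri not in replicas:
            raise ConnectionError(f"{uri} unreachable")
        return replicas[uri]

    print(fetch_with_fallback(
        ["https://mirror-a.example.org/m/42", "https://mirror-b.example.net/m/42"],
        fake_fetch,
    ))
```
]]></preformat>
      <p>Since Mementos are intended to be immutable, every replica should return the same content, so the order of uri_ms only expresses a preference (e.g., nearest mirror first).</p>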
      <p>
        Usage. Given the URL of an existing resource on the Web, a persistently
referenceable and immutable version of it can be created by duplicating it to a Memento archive
server and subsequently identifying it with a duri composed of its original URL and
retrieval time, as illustrated in the left half of Figure 1. Given such a duri, the original
resource state may then be retrieved from an archive through an HTTP GET request
employing the Memento Accept-Datetime header with the Memento’s creation
time as specified in the &lt;timestamp&gt; part of the duri. The right half of Figure 1
illustrates the simplest case where the Original Resource serves as its own TimeGate.
An overview of further retrieval patterns is given in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
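      <p>The create-then-resolve workflow can be simulated end to end with an in-memory stand-in for a Memento archive. In this sketch (all names hypothetical), archiving a URL stores an immutable snapshot under its retrieval time and returns the corresponding duri; resolution acts like a TimeGate that returns the latest snapshot taken at or before the requested time. This is one plausible selection policy, since RFC 7089 leaves the choice of the best-matching Memento to the server.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import bisect
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryArchive:
    """Toy Memento store: per Original Resource, a time-sorted list of snapshots."""

    def __init__(self, clock: Optional[Callable[[], datetime]] = None) -> None:
        # Injectable clock so that tests can control the "retrieval time".
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._store: Dict[str, List[Tuple[datetime, bytes]]] = {}

    def archive(self, url: str, content: bytes) -> str:
        """Store an immutable snapshot and return the duri that identifies it."""
        retrieved_at = self._clock()
        bisect.insort(self._store.setdefault(url, []), (retrieved_at, content))
        stamp = retrieved_at.isoformat(timespec="microseconds").replace("+00:00", "Z")
        return f"duri:{stamp}:{url}"

    def resolve(self, url: str, accept_datetime: datetime) -> Optional[bytes]:
        """TimeGate-like lookup: the latest snapshot taken at or before the given time."""
        history = self._store.get(url, [])
        times = [moment for moment, _ in history]
        index = bisect.bisect_right(times, accept_datetime)
        return history[index - 1][1] if index else None
```
]]></preformat>
      <p>Feeding the time of an earlier duri back into resolve returns exactly the snapshot that duri was minted for, even after newer revisions have been archived; this sketch expects timezone-aware datetimes throughout.</p>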
      <p>Open Challenges. While the combination of the duri scheme and the Memento protocol already
provides a good basis for the implementation of an interoperable data management
system for evolving knowledge graphs and data on the Web, several limitations need to
be addressed.</p>
      <p>While the Memento protocol does allow for the retrieval of arbitrary data types, its
timestamp format (the HTTP date format) limits the temporal resolution of revisions to full seconds, clashing
with the arbitrary temporal precision of the RFC 3339 timestamps employed by the duri
scheme. Since many high-frequency applications, e.g., in the Internet of Things, require
sub-second precision, we propose to adopt the arbitrary-resolution RFC 3339 timestamps already employed in the duri scheme
for time-based content negotiation in a revised version of the Memento protocol. In addition, future work should explore
extending the protocol to cover Memento creation, not only retrieval, to support further use
case scenarios such as simple push-based state synchronization over HTTP, as well as to
provide HTTP REST APIs with the capability to uniquely identify created resource
revisions using Memento headers in response to HTTP PUT and POST requests. Lastly,
a promising extension of the duri scheme would be to allow for empty timestamps, with the
semantics of resolving the current state of the Original Resource directly
whenever possible, or otherwise retrieving the latest available Memento, enabling
explicit referencing of the most recent revision of a resource.</p>
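      <p>The precision mismatch is easy to reproduce: converting two RFC 3339 timestamps that differ only in their fractional seconds into the HTTP date format used by Accept-Datetime and Memento-Datetime yields identical header values, so a TimeGate cannot tell the two revisions apart. The Python sketch uses only the standard library; the helper name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_http_date(rfc3339: str) -> str:
    """Convert an RFC 3339 timestamp to an HTTP date, as a Memento client must."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    moment = datetime.fromisoformat(rfc3339.replace("Z", "+00:00"))
    # HTTP dates have whole-second resolution; format_datetime drops the fraction.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


first = "2020-10-01T12:00:00.250Z"
second = "2020-10-01T12:00:00.750Z"
print(to_http_date(first))   # Thu, 01 Oct 2020 12:00:00 GMT
print(to_http_date(second))  # Thu, 01 Oct 2020 12:00:00 GMT
print(to_http_date(first) == to_http_date(second))  # True: distinct revisions collide
```
]]></preformat>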
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we proposed the implementation of a global, time-based, and
resource-oriented PID system to address the sustainable management and preservation of evolving
resources in knowledge graphs and on the Web. By combining existing URLs with
timestamps through the duri URI scheme, we described a hybrid persistent identification
scheme reusing URLs as handles. In conjunction with a resolution mechanism based
upon the HTTP Memento protocol, such URL-based PIDs may serve as both locators
and actionable handles (through a TimeGate) at the same time. Resource revisions may
be physically stored anywhere, on a task-appropriate level of granularity and possibly
duplicated, synchronized and archived in multiple locations in support of long-term
sustainability. The proposed scheme further holds the potential to serve as a unified
identification system for data ranging from high-precision time series to static concepts, and
to provide PIDs for formats ranging from generic binary data to graph data
formats such as RDF.</p>
      <p>As such, the proposed approach provides a possible joint solution to the
citation, archiving, synchronization, and sustainability problems, compatible with existing
resource-oriented architectures and thus supporting interoperable data sharing, reuse,
and integration for the increasing quantity of data published on the Web.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fielding</surname>
            ,
            <given-names>R.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masinter</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          :
          <article-title>Uniform Resource Identifier (URI): Generic Syntax</article-title>
          .
          <source>RFC</source>
          <volume>3986</volume>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Crossref Display Guidelines</article-title>
          (
          <year>2017</year>
          ), https://doi.org/10.13003/5jchdy
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gleim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennekamp</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>FactDAG: Formalizing Data Interoperability in an Internet of Production</article-title>
          . IEEE Internet of Things Journal pp.
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gleim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Open Challenges for the Management and Preservation of Evolving Data on the Web</article-title>
          . In: MEPDaW @ ISWC (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ibarra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganzarain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Igartua</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          :
          <article-title>Business model innovation through Industry 4.0: A review</article-title>
          .
          <source>Procedia Manufacturing</source>
          <volume>22</volume>
          ,
          <fpage>4</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. ISO 26324:
          <year>2012</year>
          :
          <article-title>Information and documentation - Digital object identifier system</article-title>
          . Standard, International Organization for Standardization, Geneva, CH
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Juty</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Novere</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laibe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Identifiers.org and MIRIAM Registry: community resources to provide persistent identification</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>40</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D580</fpage>
          -
          <lpage>D586</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kunze</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Towards Electronic Persistence Using ARK Identifiers</article-title>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lannom</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boesch</surname>
            ,
            <given-names>B.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Handle System Overview</article-title>
          .
          <source>RFC</source>
          <volume>3650</volume>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Masinter</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          :
          <article-title>The 'tdb' and 'duri' URI schemes, based on dated URIs</article-title>
          (
          <year>2012</year>
          ), https://datatracker.ietf.org/doc/html/draft-masinter-dated-uri-10, work in progress
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Date and Time on the Internet: Timestamps</article-title>
          . RFC
          <volume>3339</volume>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sauermann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Völkel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Cool URIs for the Semantic Web</article-title>
          (
          <year>2007</year>
          ), http://www.w3.org/TR/cooluris/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Shafer</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weibel</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jul</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The PURL Project</article-title>
          .
          <source>Journal of Library Administration</source>
          <volume>34</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>123</fpage>
          -
          <lpage>125</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
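          14.
          <string-name>
            <surname>Van de Sompel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nelson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :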
          <article-title>HTTP Framework for Time-Based Access to Resource States - Memento</article-title>
          . RFC
          <volume>7089</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Van de Sompel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shankar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping</article-title>
          .
          <source>International Journal of Digital Curation</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>331</fpage>
          -
          <lpage>342</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Stone</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Competitive Evaluation of PURLs</article-title>
          (
          <year>2000</year>
          ), http://web.mit.edu/handle/www/purl-eval.html
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orchard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>URNs, Namespaces and Registries</article-title>
          (
          <year>2006</year>
          ), https://www.w3.org/2001/tag/doc/URNsAndRegistries-50
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>