<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linked Data at the Swiss Federal Archives</article-title>
        <subtitle>Status Report</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dr. Jean-Luc Cochard</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Swiss Federal Archives</institution>
          ,
          <addr-line>Archivstrasse 24, 3003 Bern</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked Data is attracting increasing interest from the Swiss public administration. The Swiss Federal Archives are playing a leading role in this respect by investing significantly in the deployment of an infrastructure for publishing data in LD. This approach has enabled the institution to acquire in-depth knowledge on the subject and to consider integrating LD into its core applications and into the services it offers to the public.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>RDF</kwd>
        <kwd>triplestore</kwd>
        <kwd>Archival Information System</kwd>
        <kwd>Database</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>aLOD related activities</title>
      <p>The archives participating in aLOD activities have set themselves the following goals:</p>
      <list list-type="order">
        <list-item>
          <p>To examine the opportunities that LD offers for fulfilling the mission of archival institutions.</p>
        </list-item>
        <list-item>
          <p>Based on real descriptive metadata sets from their respective AIS, to transform and unify these metadata into LD datasets.</p>
        </list-item>
        <list-item>
          <p>In doing so, to formulate "best practices" for transforming existing inventories (metadata) into LD.</p>
        </list-item>
        <list-item>
          <p>To communicate and disseminate the achievements of the project within the archival community and to internal users, but also beyond, for example to researchers in the digital humanities, and to exchange with the actors who contribute to the implementation of LD technologies (GLAM and beyond).</p>
        </list-item>
        <list-item>
          <p>To demonstrate the potential for third-party reuse of descriptive metadata in LD when it is made freely available (OGD), for example in the context of hackathons on cultural data.</p>
        </list-item>
      </list>
      <p>
        The participating archives exported data from their AIS as CSV files, which were
then converted into LD using an ad hoc data model, since the RiC-O data model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
was not yet available when this work was undertaken. Several particularities had to be
taken into account to bring these data into a common form:
      </p>
      <p>The level of detail of the inventories varied widely from one institution
to another. The data model therefore had to be enriched as new datasets were
integrated.</p>
      <p>The contents were written in different languages, French or German in this case.
Fortunately, RDF<sup>2</sup> handles this aspect well with language tags.</p>
      <p>The dates came in a variety of numerical formats, or even as free text. This
aspect took the longest to deal with and still did not result in a clean
and reusable solution.</p>
      <p>For each institution, a data export procedure had to be put in place. Even for
institutions that used the same AIS, no generic solution was possible, because the
content structures were quite different.</p>
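      <p>The following minimal sketch illustrates the kind of conversion described above: CSV rows become N-Triples, titles carry a language tag, and dates are normalised where a known format is recognised. It is not the pipeline used in the project. The namespace, the property names, the column names and the two date formats are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import csv
import io
from datetime import datetime
from urllib.parse import quote

# Hypothetical namespace and properties, not the ad hoc model used in the project.
BASE = "https://example.org/record/"
TITLE = "<https://example.org/schema/title>"
DATE = "<https://example.org/schema/date>"
XSD_DATE = "<http://www.w3.org/2001/XMLSchema#date>"

# Assumed input formats; real inventories contain many more variants.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def normalise_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def row_to_triples(row):
    """Convert one CSV row (id, title, lang, date) into N-Triples lines."""
    subject = "<" + BASE + quote(row["id"], safe="") + ">"
    title = escape_literal(row["title"])
    triples = [f'{subject} {TITLE} "{title}"@{row["lang"]} .']
    iso = normalise_date(row["date"])
    if iso is not None:
        triples.append(f'{subject} {DATE} "{iso}"^^{XSD_DATE} .')
    else:
        # Keep textual dates as plain literals rather than guessing a value.
        raw = escape_literal(row["date"])
        triples.append(f'{subject} {DATE} "{raw}" .')
    return triples


def convert(csv_text):
    """Convert a whole CSV export into a list of N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_triples(row))
    return triples


SAMPLE = """id,title,lang,date
f1,Rapport annuel,fr,03.11.1921
f2,Jahresbericht,de,um 1900
"""

if __name__ == "__main__":
    for line in convert(SAMPLE):
        print(line)
```
]]></preformat>
      <p>In this sketch, a date that cannot be parsed is kept as an untyped literal, which preserves the source value but leaves the normalisation problem described above unsolved.</p>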
      <p>The data conversion produces triples such as those associated with the AFS record with
the signature "B0#1000/1483#3792*" (see Fig. 1). Thanks to this uniform
representation of the data from the different archives, and because these Linked Data are
directly accessible on the web via LINDAS and its SPARQL interface, it was
possible to build an experimental prototype for visualising all these data (see Fig.
2). This visualisation includes a histogram of the number of records per date, which
is unusual in archival web portals but could help to identify the density of
information over time on a specific subject.</p>
      <fig id="fig-1">
        <label>Fig. 1.</label>
        <caption>
          <p>Example of an entity from the AIS of the AFS, of type "File", converted to LD with the ad hoc data model used in 2015.</p>
        </caption>
      </fig>
      <fn-group>
        <fn id="fn-2">
          <label>2</label>
          <p>Resource Description Framework: a formal model to define graph structures.</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-2">
      <title>LINDAS</title>
      <p>LINDAS, as a Linked Data hosting infrastructure, has been enhanced since its first release
to become a production-grade infrastructure. This enhancement was carried out between 2017
and 2020. Its general structure is described schematically below (see Fig. 3). At the
centre are several triplestores that support the testing, integration and finally production
of new datasets. Data conversion can be a recurring or a one-off process. In either case,
an ETL pipeline is implemented, and its execution can be scheduled to follow
updates of the source data.</p>
      <p>
        To ease the definition of these conversion processes, the Data Cube Creator tool,
specialised in the conversion of OLAP cubes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], has been implemented. This tool
allows configuring the conversion of this type of data without having in-depth knowledge
of the W3C cube model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], used for this purpose. This auxiliary solution allows many
administrations to publish data in LD as LOGD. In addition, to enrich data
documentation, the Schema Manager tool allows the modelling and publication of schemas and
ontologies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This is central to the long-term archiving of LD, as the description of
the modelling schemas is as important as the data itself to define the semantics of a
dataset.
      </p>
      <p>
        To complete the infrastructure, the graphical visualisation of data hosted in LINDAS
can be configured using the Visualize tool [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This solution accelerates
the adoption of LD, as it facilitates the production of interactive graphical
representations in web pages or digital reports once the data has been converted to LD and published
in LINDAS.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Linked Data as core technology of an AIS</title>
      <p>We believe that LD is the optimal solution for publishing data and making it accessible
on the web. The question we asked ourselves, with regard to our core activities as an
archive, is whether LD, and more specifically the RDF model, could be used as the central
database of an AIS. In 2018 and 2019, two studies<sup>3</sup> were conducted by research
institutes to verify certain aspects of this technology in relation to our own issues.</p>
      <p>This work showed that some triplestore suppliers can deliver
solutions well suited to our needs. For example, Stardog version 5.2 allowed us to
build a graph of 10 billion triples by loading files of 100 million triples each, with a stable
average execution time of 20 minutes per file. This is far more data than we
estimate we will eventually have to manage in our AIS: 100 to 500 million triples.</p>
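      <p>As a rough plausibility check, the load figures quoted above can be worked through in a few lines. The sketch assumes that the files were loaded one after another, which the text does not state explicitly. All names are illustrative.</p>
      <preformat><![CDATA[
```python
# Figures taken from the text; sequential loading is an assumption.
TOTAL_TRIPLES = 10_000_000_000      # size of the test graph
TRIPLES_PER_FILE = 100_000_000      # size of each input file
MINUTES_PER_FILE = 20               # average load time per file
EXPECTED_MAX_TRIPLES = 500_000_000  # upper estimate for the AIS


def load_summary():
    """Derive file count, total load time, throughput and headroom."""
    files = TOTAL_TRIPLES // TRIPLES_PER_FILE
    total_hours = files * MINUTES_PER_FILE / 60
    triples_per_second = TRIPLES_PER_FILE / (MINUTES_PER_FILE * 60)
    headroom = TOTAL_TRIPLES / EXPECTED_MAX_TRIPLES
    return files, total_hours, triples_per_second, headroom


if __name__ == "__main__":
    files, hours, rate, headroom = load_summary()
    print(f"{files} files, about {hours:.1f} hours in total")
    print(f"about {rate:,.0f} triples per second")
    print(f"test graph is {headroom:.0f} times the upper AIS estimate")
```
]]></preformat>
      <p>Under these assumptions the test corresponds to 100 files, roughly 33 hours of loading, and a test graph 20 times larger than the upper estimate for the AIS.</p>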
      <p>Updates are crucial operations, implemented with Delete and Insert functions.
In our test, 1 million updates were performed in 12 seconds on average. Finally, SPARQL
queries of varying complexity, combined with Insert operations and issued at different
frequencies, had sub-second response times.</p>
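      <p>To make the notion of an update as a combined Delete and Insert more concrete, the sketch below builds a SPARQL 1.1 Update request that replaces the title of one record. It is an illustration only, not the workload used in the studies. The prefix, the property name and the record IRI are hypothetical.</p>
      <preformat><![CDATA[
```python
def escape_literal(text: str) -> str:
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def build_title_update(record_iri: str, new_title: str, lang: str) -> str:
    """Return a DELETE/INSERT request that replaces a record's title.

    The OPTIONAL pattern lets the request also succeed when the record
    has no title yet: nothing is deleted, and the new title is inserted.
    """
    literal = f'"{escape_literal(new_title)}"@{lang}'
    return (
        "PREFIX ex: <https://example.org/schema/>\n"  # hypothetical prefix
        f"DELETE {{ <{record_iri}> ex:title ?old }}\n"
        f"INSERT {{ <{record_iri}> ex:title {literal} }}\n"
        f"WHERE  {{ OPTIONAL {{ <{record_iri}> ex:title ?old }} }}"
    )


if __name__ == "__main__":
    print(build_title_update("https://example.org/record/f1",
                             "Jahresbericht 1921", "de"))
```
]]></preformat>
      <p>The resulting string would be sent to the update endpoint of the triplestore. How that endpoint is addressed depends on the product and is not shown here.</p>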
      <p>Therefore, we are confident that, if the triplestore is installed on suitable servers, this
technology will perform well as the core database of an AIS.</p>
      <p>
        Another issue that has been studied is whether the RDF model is as expressive as
Property Graphs (e.g. Neo4j [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). Fortunately, the evolution of RDF to RDF-star [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
and correspondingly of SPARQL to SPARQL-star, considerably reduces the expressive
advantage of Property Graphs while preserving the advantage of RDF as a W3C
open standard. Accordingly, the RiC-O standard is written in RDF but is designed to evolve
quickly into RDF-star once this new standard is approved.
      </p>
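      <p>The expressive gap in question concerns statements about statements, which Property Graphs express as properties on edges. The sketch below contrasts classic RDF reification with the quoted-triple syntax of the RDF-star Community Group report by generating both forms for the same annotation. The namespace, the resource names and the certainty property are hypothetical.</p>
      <preformat><![CDATA[
```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "https://example.org/"  # hypothetical namespace


def iri(local: str) -> str:
    """Return a full IRI in angle brackets for a local name."""
    return f"<{EX}{local}>"


def annotate_reified(s, p, o, ann_p, ann_o, stmt="stmt1"):
    """Describe the triple (s, p, o) with classic RDF reification."""
    node = iri(stmt)
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> {s} .",
        f"{node} <{RDF}predicate> {p} .",
        f"{node} <{RDF}object> {o} .",
        f"{node} {ann_p} {ann_o} .",
    ]


def annotate_rdf_star(s, p, o, ann_p, ann_o):
    """Describe the same triple with an RDF-star quoted triple."""
    return [f"<< {s} {p} {o} >> {ann_p} {ann_o} ."]


if __name__ == "__main__":
    args = (iri("file1"), iri("createdBy"), iri("agent7"),
            iri("certainty"), '"0.8"')
    for line in annotate_reified(*args) + annotate_rdf_star(*args):
        print(line)
```
]]></preformat>
      <p>In this example, reification needs five triples and an extra node, whereas the RDF-star form needs a single triple, which is close to how a Property Graph attaches a property to an edge.</p>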
      <p>In our opinion, there is no reason why an AIS should not be developed with a
triplestore at its core as a central database.</p>
    </sec>
    <sec id="sec-4">
      <title>Future developments</title>
      <p>Beyond serving as the core of an AIS, LD can also play other roles. Here are
two areas we are considering working on in the coming years.</p>
      <sec id="sec-4-1">
        <title>Publication of database content</title>
        <p>
          By publishing datasets in LD, public administrations take a first step towards publishing
entire databases for public reuse. Archives, however, also hold databases in their own
holdings, ideally in SIARD format [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Unfortunately, this format is not designed for
publishing data and their structure on the web. A conversion from SIARD to LD seems to
be a promising and feasible way to fill this gap.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Testing RiC-O</title>
        <p>RiC-O in its current version 0.2 is a very promising proposal that still needs to be tested
in the very different contexts of Swiss archives. To this end, LINDAS and its data
conversion environment will allow us to test the conversion of the descriptive metadata of
our inventories according to the RiC-O standard. Only then will it be possible to
identify gaps in the model and to establish best practices for carrying out
this conversion task.</p>
        <fn-group>
          <fn id="fn-3">
            <label>3</label>
            <p>These reports have not been published, but the author can provide a copy on request.</p>
          </fn>
        </fn-group>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Godby</surname>, <given-names>Carol Jean</given-names></string-name>: <article-title>-Data Model of Bibliographic Description: A Working Paper</article-title>. Dublin, Ohio: OCLC Research (<year>2013</year>), https://www.oclc.org/content/dam/research/publications/library/2013/2013-05.pdf, last accessed 2021/07/02.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. 5-Star Open Data, https://5stardata.info/en, last accessed 2021/07/02.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. aLOD homepage, http://www.alod.ch, last accessed 2021/07/02.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. RiC-O Version 0.2 homepage, https://www.ica.org/standards/RiC/RiC-O_v0-2.html, last accessed 2021/07/02.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. OLAP Cube homepage, https://en.wikipedia.org/wiki/OLAP_cube, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. The RDF Data Cube Vocabulary, https://www.w3.org/TR/vocab-data-cube/, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Zazuko Ontology Manager, https://zazuko.com/products/ontology-manager/, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Visualize homepage, https://www.visualize.admin.ch/en, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Neo4j homepage, https://neo4j.com/, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. RDF-star and SPARQL-star Community Group Report, https://w3c.github.io/rdf-star/cgspec/editors_draft.html, last accessed 2021/07/04.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. SIARD Suite homepage, https://github.com/sfa-siard, last accessed 2021/07/04.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>