<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LDR: A 2nd-gen, National GeoLD System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>SURROUND Australia Pty Ltd.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Australia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian National University</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <addr-line>Geoscience</addr-line>
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The 2020 Australian bushfire crisis and the global COVID-19 pandemic are examples of complex crisis events where the use of data from multiple sources was sought. In 2018 - 2020, Australia built several Linked Data “spines” - themed collections of interoperable reference data that simplify data integration from multiple sources in particular domains. The spatial data spine, Loc-I (Location Index), consists of 7 nationally-significant spatial datasets, such as the Australian Statistical Geography Standard. Loc-I delivered Linked Data forms of its datasets and provided infrastructure for their use as a single system. Described here is Loc-I for Disaster Recovery, a scenario deployment of Loc-I. We discuss the original Loc-I design, this project's key requirements and other differences, such as integration with traditional spatial data systems, and how this system is pushing the development of spatial and Semantic Web standards, such as DGGS and GeoSPARQL.</p>
      </abstract>
      <kwd-group>
        <kwd>Location Index</kwd>
        <kwd>Loc-I</kwd>
        <kwd>GeoSPARQL</kwd>
        <kwd>DGGS</kwd>
        <kwd>Spatial Data on the Web</kwd>
        <kwd>Australia</kwd>
        <kwd>national data infrastructure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>Australia suffers large floods and bushfires, so the Australian Government is
committing substantial resources over multiple years to new cross-agency data
sharing initiatives<sup>3</sup> that will “connect and leverage the Commonwealth’s extensive
climate and natural disaster risk information to further prepare for and build
resilience to natural disasters”.</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>3</sup> “Australia commits to climate resilience”, https://minister.awe.gov.au/ley/media-releases/australia-commits-climate-resilience</p>
    </sec>
    <sec id="sec-3">
      <title>Demonstrator Projects</title>
      <p>Several demonstrator projects for an anticipated new data sharing regime
were conducted in early 2021. Traditional methods of data aggregation are
being tested, such as data pooling in shared facilities, standardising web services
and cross-cataloguing datasets, but so are forward-looking methods. In
particular, Semantic Web (SW) and Linked Data (LD) technologies<sup>4</sup> are being used to
integrate different, but relatively similar, datasets that are published in a
distributed manner, and Discrete Global Grid System (DGGS) spatial data
methods are being used to integrate spatial data from multiple sources. In 2019-2020,
Geoscience Australia tested DGGS data integration for information relevant to
bushfires, which includes burned/burning areas, vegetation cover and
demographics.</p>
      <p>This paper describes the SW/LD and DGGS approaches to publishing
distributed and harmonised data that are being implemented by a Geoscience Australia
(GA) project, which we will refer to as this project. The project extends the
approach taken by the Location Index project described in the next section.</p>
      <sec id="sec-3-1">
        <title>Loc-I: The Location Index</title>
        <p>
          In 2018 - 2020, Australian spatial data and research agencies (CSIRO &amp;
Geoscience Australia, working for the Australian Bureau of Statistics) implemented a
“national and authoritative, also federated, index for Australian spatial data
using Semantic Web technologies [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]”. This system, known as the Location Index
(Loc-I) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], aims to “better geospatially integrate and analyze data across
government portfolios and information domains”. The main use case addressed by
Loc-I is to greatly reduce the time government workers spend on data
analysis that uses spatial information, by providing pre-integrated, authoritative spatial
datasets that can be used in online, open data scenarios, within secure data
integration environments and across the two. The project deals with data from
multiple domains, see Figure 1. Some of the interesting aspects of Loc-I’s design
include:
∗ federated publication of datasets via standard Linked Data APIs
∗ use of VoID Linkset<sup>5</sup> instances to crosswalk datasets
− these are independently selectable, meaning that a specific
crosswalk, of potentially many, may be chosen for use
∗ use of a Geometry Data Service<sup>6</sup> for spatial integration
− this service extends the common use of GeoSPARQL [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] by storing
Geometry instances separately from the Feature instances they are the
geometries for. This allows the geometry data to be managed in a
PostGIS database<sup>7</sup> rather than in a triplestore, as is usual for GeoSPARQL data.
∗ several different clients for different uses
− such as Excelerator<sup>8</sup>, used to upload data according to one spatial
reference system and download it reapportioned according to another
        </p>
        <p>Loc-I’s datasets are from many domains including environmental (the
Australian Hydrological Geospatial Fabric<sup>9</sup>, a collection of surface hydrology
features), human/census (the Australian Statistical Geography Standard spatial
areas)<sup>10</sup>, and cartographic/administrative (the National Composite Gazetteer of
Australia)<sup>11</sup>.</p>
        <p><sup>4</sup> By “Linked Data”, as opposed to “linked data” or “data linkage” etc., we mean
systems and data that implement a number of Semantic Web technologies (RDF, OWL,
SKOS, SPARQL, etc.), primarily defined by a series of World Wide Web Consortium
(W3C) standards. The W3C’s definition of Semantic Web is that it is a “Web of Data”,
an evolved Internet able to be queried by machines which can draw inferences from it.</p>
        <p><sup>5</sup> https://www.w3.org/TR/void/</p>
        <p><sup>6</sup> The service is online at https://gds.loci.cat/</p>
        <p><sup>7</sup> https://postgis.net/</p>
        <p><sup>8</sup> https://loci.cat/excelerator.html</p>
        <p><sup>9</sup> Original, non-RDF dataset: http://www.bom.gov.au/water/geofabric/, and the
online LD version implemented by Loc-I: http://linked.data.gov.au/dataset/geofabric</p>
        <p><sup>10</sup> Non-RDF dataset: https://geo.abs.gov.au/arcgis/services/ASGS2016/MB/MapServer/WFSServer, LD version: http://linked.data.gov.au/dataset/asgs2016</p>
        <p><sup>11</sup> LD version: https://linked.data.gov.au/dataset/placenames</p>
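        <p>The reapportioning that a crosswalk between datasets enables can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a crosswalk gives the fraction of each source area that falls within each target area; the area identifiers, the fractions and the function name are hypothetical, and this is not a description of Excelerator's actual algorithm.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def reapportion(values, crosswalk):
    """Redistribute per-area values onto a second set of areas.

    values:    {source_area_id: number}
    crosswalk: {(source_area_id, target_area_id): fraction of the source
               area lying inside the target area} (hypothetical weights)
    """
    totals = defaultdict(float)
    for (source, target), fraction in crosswalk.items():
        if source in values:
            totals[target] += values[source] * fraction
    return dict(totals)


# Hypothetical example: two statistical areas mapped onto two catchments.
population = {"stat-area-A": 100.0, "stat-area-B": 50.0}
crosswalk = {
    ("stat-area-A", "catchment-1"): 0.25,
    ("stat-area-A", "catchment-2"): 0.75,
    ("stat-area-B", "catchment-2"): 1.0,
}
by_catchment = reapportion(population, crosswalk)
```
]]></preformat>
        <p>Because several crosswalks between the same two datasets may exist, a caller of such a function would pass in whichever one suits the analysis, which mirrors the independent selection of Linksets described above.</p>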
        <p>The Loc-I architecture is shown in Figure 2. It shows that
the Loc-I Data Cache, which is a multi-graph triplestore, obtains its data by
“pulling” RDF datasets through APIs that both interpret non-RDF data for
online delivery and are able to create static RDF versions of the datasets. All
Loc-I datasets conform to the Loc-I Ontology<sup>12</sup>, which imports the GeoSPARQL<sup>13</sup>
and DCAT<sup>14</sup> ontologies. Alongside the Cache is a traditional spatial database,
PostGIS<sup>15</sup>, used to perform fast geometry intersections.</p>
        <p><sup>12</sup> http://linked.data.gov.au/def/loci</p>
        <p><sup>13</sup> http://www.opengis.net/doc/IS/geosparql/1.0</p>
        <p><sup>14</sup> https://www.w3.org/TR/2014/REC-vocab-dcat-20140116/</p>
        <p><sup>15</sup> https://postgis.net/</p>
      </sec>
      <sec id="sec-3-2">
        <title>Loc-I for Disaster Recovery</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data Validity</title>
      <p>
        This project’s datasets are Loc-I datasets and its Knowledge Graph (KG) is
similar to the Loc-I cache; however, conformance to Loc-I is not easily testable because
Loc-I provided no data validators. This project implements formal profiles, which
are specifications defining dependencies and validation tooling. This project uses
profiles to state requirements for data publication by API, for dataset suitability for the
KG and for use and display by clients. The profiles are defined using The
Profiles Vocabulary [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and are all listed in the project’s LD catalogue<sup>16</sup>.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Discrete Global Grid System (DGGS) use</title>
      <p>
        Loc-I aspired to use DGGS geometries<sup>17</sup> but never did so in practice: DGGS data was
produced but not used in direct support of Loc-I use. In 2020, Geoscience
Australia evaluated DGGS integration of data relating to bushfires in Australia
(vegetation, population and bushfire extent information) and from this
established some new DGGS integration methods. Also, SURROUND Australia
implemented DGGS data delivery via Linked Data APIs for the OGC’s Testbed 16
interoperability experiment [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Using the GA DGGS methods and SURROUND
tooling, this project has produced DGGS versions of all Feature instances’
geometries, has stored them alongside traditional geometries within the KG (a
triplestore) and has implemented GeoSPARQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] functions within the
triplestore’s SPARQL extension libraries (Apache Jena’s ARQ<sup>18</sup>) that work with DGGS
geometry representations. These functions obviate the need for
Loc-I’s Geometry Data Service and thus reduce infrastructure complexity.
      </p>
      <p>An important enabling factor in this use of DGGS with GeoSPARQL is the
inclusion of DGGS geometry serializations within version 1.1 of GeoSPARQL,
which was motivated by Loc-I project requirements. This version is currently
under review and is expected to be published around the time of this paper’s
publication. Working documents are available<sup>19</sup>.</p>
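      <p>To indicate why DGGS geometries can make spatial relation functions simple enough to run inside a triplestore, the following minimal Python sketch tests intersection between two geometries that are each represented as a set of DGGS cell identifiers. It assumes a hierarchical identifier scheme in which a child cell's identifier is its parent's identifier with one more character appended; the cell identifiers and function names are hypothetical, and this is not the project's Apache Jena implementation.</p>
      <preformat><![CDATA[
```python
def cells_overlap(cell_a, cell_b):
    """Two cells overlap when they are equal or one is an ancestor of the other.

    Assumes child identifiers extend their parent's identifier, so ancestry
    reduces to a string-prefix test.
    """
    return cell_a.startswith(cell_b) or cell_b.startswith(cell_a)


def dggs_intersects(geometry_a, geometry_b):
    """Intersection test between two geometries given as sets of cell IDs."""
    return any(
        cells_overlap(a, b) for a in geometry_a for b in geometry_b
    )


# Hypothetical cell identifiers at mixed resolutions.
fire_extent = {"R78", "R791"}
statistical_area = {"R785", "R786"}
distant_area = {"P12"}

overlapping = dggs_intersects(fire_extent, statistical_area)
disjoint = dggs_intersects(fire_extent, distant_area)
```
]]></preformat>
      <p>Under these assumptions the test needs only string comparison and no coordinate arithmetic, which suggests how such functions could replace calls to a separate spatial database.</p>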
    </sec>
    <sec id="sec-6">
      <title>Observations data use</title>
      <p>
        Loc-I anticipated that observational data - human/industry statistics or
natural-world observation data - would be used with its spatial data. This project
implements two such datasets: 1. population data taken from the 2016
Australian census; 2. “exposure” data per statistical area - this is data about the
vulnerability of physical infrastructure to natural hazards. This project has
developed an “Observations Dataset” profile (see the project catalogue<sup>16</sup>) that
defines the characteristics of a Loc-I-compatible observations dataset using the
profiling mechanisms mentioned above.
      </p>
      <p><sup>16</sup> https://w3id.org/l4dr/explorer</p>
      <p>
        <sup>17</sup> See the defining Abstract Specification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for indications of potential benefits of
DGGS and the more recent OGC Engineering Report [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for current thinking about
how to integrate DGGS use within traditional spatial infrastructure.
      </p>
      <p><sup>18</sup> https://jena.apache.org/documentation/query/extension.html</p>
      <p><sup>19</sup> See https://opengeospatial.github.io/ogc-geosparql/ for the GeoSPARQL
Standards Working Group’s working documents.</p>
    </sec>
    <sec id="sec-7">
      <title>Knowledge Graph (KG) importing</title>
      <p>This project’s KG includes Loc-I datasets as well as new Loc-I-conformant
datasets. To avoid duplication, it intends to import Loc-I content unchanged;
however, the additional requirements this project currently has (see below) mean
that Loc-I datasets must be extended and thus reuse of Loc-I datasets or the
data cache (see Figure 2) is not yet possible. For now, a “Loc-I 2 KG” has been
created and imported into this project’s KG (see Figure 3), but this will be removed
when Loc-I implements this project’s elements.</p>
    </sec>
    <sec id="sec-8">
      <title>Data and metadata management</title>
      <p>Operational management of data was out of scope for Loc-I, which was a technical
demonstrator only, so its data was mostly ungoverned in the project: individual
researchers loaded datasets into the Loc-I Cache ad hoc. This project has a strong
requirement to demonstrate ongoing operations and will continuously absorb
new and updated data, so it has a strong requirement to manage content to
assure currency and sustainable growth. For this reason, it has implemented a
sophisticated application layer on top of its KG, the SURROUND Ontology
Platform<sup>20</sup>, used to track, select for use, update and overall govern datasets. This
application supports provenance absorption (for datasets that contain
provenance) and generation (for data processing contained within the platform) as
well as managed item (dataset, ontology, vocabulary) status tracking for over 20
classes of semantic asset. These classes include TBox items such as ontologies
and vocabularies, as well as ABox datasets, but also specialised forms of these
asset classes, such as Linksets (datasets that crosswalk others) and Profiles, which
are TBox objects that use and constrain, but don’t define, other TBox assets. The
platform can also run workflows for repetitive data absorption (pulling
non-RDF data from source locations, converting it to RDF and presenting it) and
run other calculations on top of data, such as FAIR Score<sup>21</sup> rating.</p>
    </sec>
    <sec id="sec-9">
      <title>Clients</title>
      <p>Loc-I implemented some generic and specialised clients for its data holdings<sup>22</sup>.
This project can reuse some, such as IDer Down<sup>23</sup> - used to download IDs for all
Feature type instances - due to the same data structures being used. However,
this project is also charged with demonstrating integration of Linked Data with
traditional spatial web data delivery. For this reason, information flows between a
traditional web globe<sup>24</sup> and a Linked Data browser<sup>25</sup>, with panels of per-Feature
information accessible within the globe supplied by KG queries. Previous spatial
web data displays only present simple key/value pairs of information
per Feature, but this system presents graph data which can be followed. Also, the
management requirement, described above, has necessitated an administrative
interface to this project’s KG, which Loc-I never had.</p>
      <p><sup>20</sup> https://surroundaustralia.com/sop</p>
      <p><sup>21</sup> Scored for datasets rated against the FAIR Principles: https://www.go-fair.org/fair-principles/</p>
      <p><sup>22</sup> See https://loci.cat/#datasets-and-applications for a list</p>
      <p><sup>23</sup> https://excelerator.loci.cat/iderdown</p>
    </sec>
    <sec id="sec-10">
      <title>More standardized Dataset APIs</title>
      <p>
        Loc-I implemented LD APIs for spatial datasets that followed standard LD
protocols and the data model negotiation protocols of Content Negotiation by Profile
(ConnegP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Content within these APIs was all discoverable since top-level
elements - dataset declarations - linked to their content registers and registers linked
to individual Features; however, no strict or common spatial API structure was
used. This project implements APIs as both LD APIs and OGC API:
Features [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] APIs<sup>26</sup>. This is possible because ConnegP implementations are
able to select data models and formats per API endpoint using general
mechanics (HTTP headers or URI query strings) that can be constrained to meet
OGC API: Features requirements. ConnegP APIs are also used to deliver the
observations datasets, but these are not conformant with OGC API: Features since
they don’t contain any geometry information - they link to spatial datasets’
Features for their data’s spatial information.
      </p>
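      <p>The two ConnegP mechanics mentioned above, HTTP headers and URI query strings, can be sketched in Python as follows. The sketch only constructs requests and sends nothing. The endpoint, profile URI and profile token are hypothetical, and the Accept-Profile header and the _profile and _mediatype query string keys are used here on the assumption that they match the ConnegP draft's naming; they should be checked against the specification and against a given API's own documentation.</p>
      <preformat><![CDATA[
```python
from urllib.parse import parse_qs, urlencode, urlsplit


def profile_headers(profile_uri, media_type):
    """HTTP request headers asking for a resource according to a profile."""
    return {"Accept": media_type, "Accept-Profile": f"<{profile_uri}>"}


def profile_query_url(resource_url, profile_token, media_type):
    """The same request expressed as URI query string arguments."""
    query = urlencode({"_profile": profile_token, "_mediatype": media_type})
    return f"{resource_url}?{query}"


# Hypothetical endpoint, profile URI and profile token.
ENDPOINT = "https://example.org/dataset/collections/areas/items"
headers = profile_headers(
    "https://example.org/profile/oai", "application/geo+json"
)
url = profile_query_url(ENDPOINT, "oai", "application/geo+json")
```
]]></preformat>
      <p>Either form leaves the resource's URI path unchanged, which is what would allow one endpoint to answer both as a general LD API and, when constrained to a suitable profile and format, in the shape a spatial client expects.</p>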
      <sec id="sec-10-1">
        <title>Conclusions</title>
        <p>This project is both a reuser of Loc-I systems and an extender of them. Core
benefits of spatial Linked Data - harmonised use of distributed
datasets, human- and machine-readable web content - and of Semantic Web
methods - inferencing, ontology modelling - are preserved; however, new spatial data indexing is
applied (Discrete Global Grid System use), management of the project's total data holdings
is enabled, data validators have been created and new clients are delivered. The resulting
system is a proto-operational system as opposed to a proof-of-concept.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Future Work</title>
      <p>This project will operate in test mode until July 2021, the likely date of full
production, when the system will be highly dependent on uninterrupted data supply
to guarantee currency. To ensure this, inter-agency data supply chain management
- started in the Loc-I project but not completed - must be finalised. For data to
be delivered by owner agencies as Linked Data, assistance will need to be given
to those agencies so that they can make Semantic Web and Linked Data versions of
their data for delivery via APIs. This will require strong motivation from central
government data users to ensure these requirements are met, as implementation
is a socio-technical challenge, not purely a technical one.</p>
      <p><sup>24</sup> TerriaJS (https://terria.io/) at https://w3id.org/l4dr/globe</p>
      <p><sup>25</sup> https://w3id.org/l4dr/explorer. Allows for browsing of content in the project’s KG, as
opposed to LD dereferencing of resources accomplished by dataset APIs.</p>
      <p><sup>26</sup> See an example of such an API online at https://w3id.org/l4dr/provinces or browse
the project catalogue, as linked to in previous footnotes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Atkinson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Car</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          :
          <article-title>The Profiles Vocabulary</article-title>
          . W3C Working Group Note, World Wide Web Consortium (May
          <year>2020</year>
          ), https://www.w3.org/TR/dx-prof/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Car</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Box</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sommer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Location Index: A Semantic Web Spatial Data Infrastructure</article-title>
          . In:
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web</source>
          . pp.
          <fpage>543</fpage>
          -
          <lpage>557</lpage>
          . Lecture Notes in Computer Science, Springer International Publishing (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-21348-0_35
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Clemens</given-names>
            <surname>Portele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Panagiotis (Peter) A.</given-names>
            <surname>Vretanos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Charles</given-names>
            <surname>Heazel</surname>
          </string-name>
          :
          <source>OGC API - Features - Part 1: Core</source>
          . OGC Implementation Standard 17-069r3, Open Geospatial Consortium (Oct
          <year>2019</year>
          ), http://www.opengis.net/doc/IS/ogcapi-features-1/1.0
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gibb</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cochrane</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Purss</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>OGC Testbed-16: DGGS and DGGS API Engineering Report</source>
          . Engineering Report OGC 20-039r2, Open Geospatial Consortium (Jan
          <year>2021</year>
          ), https://docs.ogc.org/per/20-039r2.html
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Perry</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herring</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>OGC GeoSPARQL - A Geographic Query Language for RDF Data</source>
          . OGC Implementation Standard, Open Geospatial Consortium (
          <year>2012</year>
          ), http://www.opengis.net/doc/IS/geosparql/1.0
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Purss</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Topic 21: Discrete Global Grid Systems Abstract Specification</source>
          . Abstract Specification 15-104r5, Open Geospatial Consortium (Aug
          <year>2017</year>
          ), http://www.opengis.net/doc/AS/dggs/1.0
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>