=Paper= {{Paper |id=Vol-2977/paper6 |storemode=property |title=LDR: A 2nd-gen, National GeoLD System (short paper) |pdfUrl=https://ceur-ws.org/Vol-2977/paper6.pdf |volume=Vol-2977 |authors=Nicholas John Car,Irina Bastrakova |dblpUrl=https://dblp.org/rec/conf/esws/Car21 }} ==LDR: A 2nd-gen, National GeoLD System (short paper)== https://ceur-ws.org/Vol-2977/paper6.pdf
        LDR: A 2nd-gen, National GeoLD System

                         Nicholas J. Car1[0000−0002−8742−7730] and
                         Irina Bastrakova2[0000−0002−4643−7289] *
                     1
                         SURROUND Australia Pty Ltd., Australia &
                          Australian National University, Australia
                          nicholas.car@surroundaustralia.com
                                   2
                                     Geoscience Australia
                               irina.bastrakova@ga.gov.au




         Abstract. The 2020 Australian bushfire crisis and the global COVID-
         19 pandemic are examples of complex crisis events where the use of data
         from multiple sources was sought. In 2018 – 2020, Australia built sev-
         eral Linked Data “spines” - themed collections of interoperable reference
         data that simplify data integration from multiple sources in particular
         domains. The spatial data spine, Loc-I (Location Index), consists of 7
         nationally-significant spatial datasets, such as the Australian Statistical
         Geography Standard. Loc-I delivered Linked Data forms of its datasets
         and provided infrastructure for their use as a single system.

         Here described is Loc-I for Disaster Recovery, a scenario deployment of
         Loc-I. We discuss original Loc-I design, this project’s key requirements
         and other differences, such as integrating with traditional spatial data
         systems, and how this system is pushing the development of spatial and
         Semantic Web standards, such as DGGS and GeoSPARQL.

         Keywords: Location Index · Loc-I · GeoSPARQL · DGGS · Spatial
         Data on the Web · Australia · national data infrastructure



1       Introduction

1.1     Motivation

Australia suffers large floods and bushfires, so the Australian government is
committing substantial resources over multiple years to new cross-agency data
sharing initiatives3 that will “connect and leverage the Commonwealth’s extensive
climate and natural disaster risk information to further prepare for and build
resilience to natural disasters”.
    *
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
  3
    “Australia commits to climate resilience”, https://minister.awe.gov.au/ley/media-releases/australia-commits-climate-resilience
2          Car N.J. et al.

1.2      Demonstrator Projects

Several demonstrator projects for an anticipated new data sharing regime were
conducted in early 2021. Traditional methods of data aggregation are being
tested, such as data pooling in shared facilities, standardising web services
and cross-cataloguing datasets, and so are forward-looking methods. In
particular, Semantic Web (SW) and Linked Data (LD) technologies4 are being used
to integrate different, but relatively similar, datasets that are published in a
distributed manner, and Discrete Global Grid System (DGGS) spatial data methods
are being used to integrate spatial data from multiple sources. In 2019-2020,
Geoscience Australia tested DGGS data integration for information relevant to
bushfires, including burned/burning areas, vegetation cover and demographics.
     This paper describes the SW/LD and DGGS approaches to publishing
distributed and harmonised data that are being implemented by a Geoscience
Australia (GA) project, referred to here as this project. The project extends the
approach taken by the Location Index project described in the next section.


2        Loc-I: The Location Index

In 2018 - 2020, Australian spatial data and research agencies (CSIRO & Geoscience
Australia, working for the Australian Bureau of Statistics) implemented a
“national and authoritative, also federated, index for Australian spatial data
using Semantic Web technologies [2]”. This system, known as the Location Index
(Loc-I) [2], aims to “better geospatially integrate and analyze data across
government portfolios and information domains”. The main use case addressed by
Loc-I is to greatly reduce the time government workers spend on data analysis
that uses spatial information, by providing pre-integrated, authoritative spatial
datasets that can be used in online, open data scenarios, within secure data
integration environments and across the two. The project deals with data from
multiple domains, see Figure 1. Some of the interesting aspects of Loc-I’s design
include:

    ∗ federated publication of datasets via standard Linked Data APIs
    ∗ use of VoID Linkset 5 instances to crosswalk datasets
       − these are independently-selectable for use meaning that a specific cross-
          walk, of potentially many, may be selected for use
    ∗ use of a Geometry Data Service 6 for spatial integration
     4
     By “Linked Data”, as opposed to “linked data” or “data linkage” etc., we mean
systems and data that implement a number of Semantic Web technologies (RDF, OWL,
SKOS, SPARQL, etc.), primarily defined by a series of World Wide Web Consortium
(W3C) standards. The W3C’s definition of Semantic Web is that it is a “Web of Data”,
an evolved Internet able to be queried by machines which can draw inferences from it.
   5
     https://www.w3.org/TR/void/
   6
     The service is online at https://gds.loci.cat/
                                                  Loc-I for Disaster Recovery        3




Fig. 1. A project brochure image, from [2], of Loc-I with respect to Australian govern-
ment Environment, Society and Economy data


    − this service extends the common use of GeoSPARQL [5] by storing
       Geometry instances separately from the Feature instances they are the
       geometries for. This allows the geometry data to be managed in a
       PostGIS database7 , rather than in a triplestore as is usual for
       GeoSPARQL data.
 ∗ several different clients for different uses
    − such as Excelerator 8 , used to upload data according to one spatial ref-
       erence system and download it reapportioned according to another
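
    The crosswalk and reapportionment ideas in the list above can be sketched in a
few lines. This is a minimal illustration only: the linkset names, feature
identifiers and the area fractions attached to each link are hypothetical, and
the paper does not say how Loc-I Linksets carry weights or how Excelerator
calculates its results.

```python
from collections import defaultdict

# Hypothetical linksets. Each maps (source feature, target feature) to the
# assumed fraction of the source feature's area that lies inside the target.
LINKSETS = {
    "meshblock-to-catchment": {
        ("mb:A", "cc:1"): 1.0,
        ("mb:B", "cc:1"): 0.25,
        ("mb:B", "cc:2"): 0.75,
    },
    "sa1-to-catchment": {
        ("sa1:X", "cc:2"): 1.0,
    },
}


def reapportion(values, linkset_id):
    """Redistribute per-source values onto target features using one linkset."""
    links = LINKSETS[linkset_id]  # the caller selects which crosswalk to use
    out = defaultdict(float)
    for (source, target), fraction in links.items():
        if source in values:
            out[target] += values[source] * fraction
    return dict(out)


population = {"mb:A": 100, "mb:B": 40}
result = reapportion(population, "meshblock-to-catchment")
print(result)
```

Keeping each crosswalk as a separately named set of links is what makes it
selectable: a different linkset between the same two datasets could be
substituted without touching the data being reapportioned.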
    Loc-I’s datasets are from many domains including environmental (the Aus-
tralian Hydrological Geospatial Fabric 9 , a collection of surface hydrology fea-
tures), human/census (the Australian Statistical Geography Standard spatial ar-
eas) 10 , and cartographic/administrative (the National Composite Gazetteer of
Australia)11 .
   7
      https://postgis.net/
   8
      https://loci.cat/excelerator.html
    9
      Original, non-RDF dataset: http://www.bom.gov.au/water/geofabric/, and the
online LD version implemented by Loc-I: http://linked.data.gov.au/dataset/geofabric
  10
      Non-RDF dataset: https://geo.abs.gov.au/arcgis/services/ASGS2016/MB/MapServer/WFSServer, LD version: http://linked.data.gov.au/dataset/asgs2016
  11
      LD version: https://linked.data.gov.au/dataset/placenames

    Loc-I’s architecture is shown in Figure 2. It shows that the Loc-I Data Cache,
which is a multi-graph triplestore, obtains its data by “pulling” RDF datasets
through APIs that both interpret non-RDF data for online delivery and are also
able to create static RDF versions of the datasets. All Loc-I datasets conform to
the Loc-I Ontology12 which imports the GeoSPARQL13 and DCAT14 ontologies.
Alongside the Cache is a traditional spatial database, PostGIS15 , used to
perform fast geometry intersections.




    Fig. 2. An informal architecture diagram of Loc-I’s Linked Data infrastructure.




    12
       http://linked.data.gov.au/def/loci
    13
       http://www.opengis.net/doc/IS/geosparql/1.0
    14
       https://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
    15
       https://postgis.net/

3        Loc-I for Disaster Recovery
3.1      Data Validity
This project’s datasets are Loc-I datasets and its Knowledge Graph (KG) is
similar to the Loc-I cache; however, conformance to Loc-I is not easily testable
because Loc-I provided no data validators. This project implements formal
profiles, which are specifications defining dependencies and validation tooling.
It uses profiles to state requirements for data publication by API, for dataset
suitability for the KG and for use and display by clients. The profiles are
defined using The Profiles Vocabulary [1] and are all listed in the project’s LD
catalogue16 .
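
The kind of check a profile validator performs can be sketched as follows. The
paper does not describe how its validators are built, so this is a plain-Python
stand-in under stated assumptions: the profile content, the prefixed names and
the sample triples are all hypothetical, and a real validator would more likely
use a dedicated constraint language than hand-written code.

```python
# A hypothetical profile: for each class, the properties that every instance
# of that class must carry.
PROFILE = {
    "geo:Feature": {"rdfs:label", "geo:hasGeometry"},
    "dcat:Dataset": {"dcterms:title", "dcterms:publisher"},
}


def validate(triples, profile):
    """Return (subject, missing property) pairs; an empty list means conformant."""
    types = {}
    props = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
        else:
            props.setdefault(s, set()).add(p)
    errors = []
    for s, classes in sorted(types.items()):
        for cls in sorted(classes):
            for required in sorted(profile.get(cls, set())):
                if required not in props.get(s, set()):
                    errors.append((s, required))
    return errors


data = [
    ("ex:f1", "rdf:type", "geo:Feature"),
    ("ex:f1", "rdfs:label", "Feature one"),
    ("ex:f1", "geo:hasGeometry", "ex:g1"),
    ("ex:f2", "rdf:type", "geo:Feature"),
    ("ex:f2", "rdfs:label", "Feature two"),
]
report = validate(data, PROFILE)
print(report)
```

Here the second Feature is reported because it has no geometry, which is the
sort of conformance failure that could not be detected automatically in Loc-I.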

3.2      Discrete Global Grid System (DGGS) use
Loc-I aspired to use DGGS geometries17 but never did so in practice: DGGS data
was produced but not used in direct support of Loc-I use. In 2020, Geoscience
Australia evaluated DGGS integration of data relating to bushfires in Australia
(vegetation, population and bushfire extent information) and from this
established some new DGGS integration methods. Also, SURROUND Australia
implemented DGGS data delivery via Linked Data APIs for the OGC’s Testbed 16
interoperability experiment [4]. Using the GA DGGS methods and SURROUND tooling,
this project has produced DGGS versions of all Feature instances’ geometries, has
stored them alongside traditional geometries within the KG (a triplestore) and
has implemented GeoSPARQL [5] functions within the triplestore’s SPARQL extension
libraries (Apache Jena’s ARQ18 ) that work with DGGS geometry representations.
These functions obviate the need for Loc-I’s Geometry Data Service and thus
reduce infrastructure complexity.
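
    To indicate why DGGS geometries make such functions simple enough to run
inside a triplestore, the sketch below implements two GeoSPARQL-style topological
tests over sets of cell identifiers. It assumes hierarchical identifiers in which
a child cell’s identifier is its parent’s identifier plus one more character (as
in rHEALPix-style schemes); the paper does not name the identifier scheme, the
cell values are made up, and these are plain Python functions rather than the
project’s Jena extension functions.

```python
def cell_within(a, b):
    """True if cell a is the same as, or a descendant of, cell b."""
    return a.startswith(b)


def sf_within(cells_a, cells_b):
    """True if every cell of geometry A lies inside a single cell of geometry B.

    Simplification: a coarse A cell that is covered only by the union of
    several finer B cells is not recognised as within.
    """
    return all(any(cell_within(a, b) for b in cells_b) for a in cells_a)


def sf_intersects(cells_a, cells_b):
    """True if any cell of A overlaps any cell of B (one contains the other)."""
    return any(
        cell_within(a, b) or cell_within(b, a)
        for a in cells_a
        for b in cells_b
    )


# Hypothetical cell sets for three geometries
burn_area = {"R7852", "R7853"}
statistical_area = {"R78"}
other_area = {"R781", "S0"}

print(sf_within(burn_area, statistical_area))
print(sf_intersects(burn_area, other_area))
print(sf_intersects(statistical_area, other_area))
```

Under this assumption, spatial relations reduce to string prefix comparisons,
which need no separate spatial database.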
    An important enabling factor in this use of DGGS with GeoSPARQL is the
inclusion of DGGS geometry serializations within version 1.1 of GeoSPARQL, which
was motivated by Loc-I project requirements. This version is currently under
review and is expected to be published around the time of this paper’s
publication. Working documents are available19 .

3.3      Observations data use
Loc-I anticipated observational data - human/industry statistics or natural-
world observation data - would be used with its spatial data. This project
implements two such datasets: 1. population data taken from the 2016 Aus-
tralian census; 2. “exposure” data per statistical area - this is data about the
    16
     https://w3id.org/l4dr/explorer
    17
     See the defining Abstract Specification [6] for indications of potential benefits of
DGGS and the more recent OGC Engineering Report [4] for current thinking about
how to integrate DGGS use within traditional spatial infrastructure.
  18
     https://jena.apache.org/documentation/query/extension.html
  19
     See https://opengeospatial.github.io/ogc-geosparql/ for the GeoSPARQL
Standards Working Group’s working documents

vulnerability of physical infrastructure to natural hazards. This project has
developed an “Observations Dataset” profile (see the project catalogue16 ) that
defines the characteristics of a Loc-I-compatible observations dataset using the
profiling mechanisms mentioned above.


3.4      Knowledge Graph (KG) importing

This project’s KG includes Loc-I datasets as well as new Loc-I-conformant
datasets. To avoid duplication, it intends to import Loc-I content unchanged.
Currently, however, this project’s additional requirements (see below) mean that
Loc-I datasets must be extended, so direct reuse of Loc-I datasets or the data
cache (see Figure 2) is not possible. For now, a “Loc-I 2 KG” has been created
and imported into this project’s KG (see Figure 3), but this will be removed when
Loc-I implements this project’s elements.


3.5      Data and metadata management

Operational management of data was out of scope for Loc-I, which was a technical
demonstrator only, so its data was mostly ungoverned within the project:
individual researchers loaded datasets into the Loc-I Cache ad hoc. This project
must demonstrate ongoing operations and will continuously absorb new and updated
data, so it has a strong requirement to manage content to assure currency and
sustainable growth. For this reason, it has implemented a sophisticated
application layer on top of its KG, the SURROUND Ontology Platform 20 , used to
track, select for use, update and overall govern datasets. This application
supports provenance absorption (for datasets that contain provenance) and
generation (for data processing contained within the platform) as well as managed
item (dataset, ontology, vocabulary) status tracking for over 20 classes of
semantic asset. These classes include TBox items such as ontologies and
vocabularies, ABox datasets, and specialised forms of these asset classes, such
as Linksets (datasets that crosswalk others) and Profiles, which are TBox objects
that use and constrain, but don’t define, other TBox assets. The platform can
also run workflows for repetitive data absorption (pulling non-RDF data from
source locations, converting it to RDF and presenting it) and run other
calculations on top of data, such as FAIR Score 21 rating.
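
The conversion step of such an absorption workflow might look like the sketch
below, which turns rows of tabular data into N-Triples lines. It is generic
Python and not the platform’s own workflow code; the namespace, the column names
and the population property are hypothetical.

```python
import csv
import io

BASE = "https://example.org/dataset/demo/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
POPULATION = BASE + "population"  # hypothetical property


def literal(text):
    """Escape a string for use as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def csv_to_ntriples(csv_text):
    """Convert rows with id, name and population columns into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["id"] + ">"
        count = int(row["population"])  # fails early on non-numeric input
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f'{subject} <{POPULATION}> "{count}"^^<{XSD_INTEGER}> .')
    return lines


source = "id,name,population\nsa1-001,North Area,1520\nsa1-002,South Area,980\n"
triples = csv_to_ntriples(source)
print("\n".join(triples))
```

Because the output is regenerated from the source each time, a workflow like
this can be rerun on a schedule to keep the converted data current.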


3.6      Clients

Loc-I implemented some generic and specialised clients for its data holdings22 .
This project can reuse some, such as IDer Down 23 - used to download IDs for all
    20
      https://surroundaustralia.com/sop
    21
      Scores for datasets rated against the FAIR Principles: https://www.go-fair.org/fair-principles/
   22
      See https://loci.cat/#datasets-and-applications for a list
   23
      https://excelerator.loci.cat/iderdown

Feature type instances - due to the same data structures being used. However,
this project is also charged with demonstrating integration of Linked Data with
traditional spatial web data delivery. For this reason, information flows between
a traditional web globe24 and a Linked Data browser25 , with panels of per-Feature
information accessible within the globe supplied by KG queries. Previous spatial
web data displays presented only simple key/value pairs of information per
Feature, but this system presents graph data which can be followed. Also, the
management requirement, described above, has necessitated an administrative
interface to this project’s KG, which Loc-I never had.

3.7      More standardized Dataset APIs
Loc-I implemented LD APIs for spatial datasets that followed standard LD
protocols and the data model negotiation protocols of Content Negotiation by
Profile (ConnegP) [1]. Content within these APIs was all discoverable since
top-level elements (dataset declarations) linked to their content registers and
registers linked to individual Features; however, no strict or common spatial API
structure was used. This project implements APIs as both LD APIs and OGC API:
Features [3] APIs26 . This is possible because ConnegP implementations can select
data models and formats per API endpoint using general mechanics (HTTP headers or
URI query strings) that can be constrained to meet OGC API: Features
requirements. ConnegP APIs are also used to deliver the observations datasets,
but these are not conformant with OGC API: Features since they don’t contain any
geometry information: they link to spatial datasets’ Features for their data’s
spatial information.
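
The two request mechanics mentioned above, HTTP headers and URI query strings,
can be sketched from the client side as follows. The endpoint is the example API
cited in this paper; the profile URI and the profile token are hypothetical, and
the `Accept-Profile` header and the `_profile` / `_mediatype` argument names are
assumed here to follow the ConnegP drafts, so they should be checked against the
API actually being called. No request is sent; the code only builds one.

```python
from urllib.parse import urlencode

ENDPOINT = "https://w3id.org/l4dr/provinces"  # example API cited in this paper
PROFILE_URI = "https://example.org/profile/features"  # hypothetical profile URI
PROFILE_TOKEN = "features"  # hypothetical short token for the same profile


def connegp_headers(profile_uri, mediatype):
    """HTTP-header style: data model in Accept-Profile, format in Accept."""
    return {"Accept-Profile": f"<{profile_uri}>", "Accept": mediatype}


def connegp_qsa_url(endpoint, profile_token, mediatype):
    """Query-string style: data model and format as URI query arguments."""
    query = urlencode({"_profile": profile_token, "_mediatype": mediatype})
    return endpoint + "?" + query


headers = connegp_headers(PROFILE_URI, "text/turtle")
url = connegp_qsa_url(ENDPOINT, PROFILE_TOKEN, "application/geo+json")
print(headers)
print(url)
```

Either form lets one endpoint serve the same resource according to different
data models, which is what allows a single API to answer both as a Linked Data
API and as an OGC API: Features API.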


4        Conclusions
This project is both a reuser of Loc-I systems and an extender of them. Core
benefits of spatial Linked Data (harmonised use of distributed datasets, human-
and machine-readable web content) and of Semantic Web methods (inferencing,
ontology modelling) are preserved; however, new spatial data indexing is applied
(Discrete Global Grid System use), management of the project’s total data
holdings is enabled, data validators are created and new clients are delivered.
The resulting system is a proto-operational system as opposed to a
proof-of-concept.

4.1      Future Work
This project will operate in test mode until July 2021 and then, most likely, in
full production, when the system will be highly dependent on uninterrupted data
supply to
    24
     TerriaJS (https://terria.io/) at https://w3id.org/l4dr/globe
    25
     https://w3id.org/l4dr/explorer. Allows for browsing of content in project’s KG, as
opposed to LD dereferencing of resources accomplished by dataset APIs.
  26
     See an example of such an API online at https://w3id.org/l4dr/provinces or browse
the project catalogue, as linked to in previous footnotes




Fig. 3. An informal architecture diagram of the LDR project’s Linked Data infrastruc-
ture


guarantee currency. To ensure this, inter-agency data supply chain management
- stated in the Loc-I project but not completed - must be finalised. For data to
be delivered by owner agencies as Linked Data, assistance will need to be given
to those agencies to be able to make Semantic Web and Linked Data versions of
their data for delivery via APIs. This will require strong motivation from central
government data users to ensure these requirements are met as implementation
is a socio-technical challenge, not purely a technical one.


References
1. Atkinson, R., Car, N.J.: The Profiles Vocabulary. W3C Working Group Note, World
   Wide Web Consortium (May 2020), https://www.w3.org/TR/dx-prof/
2. Car, N.J., Box, P.J., Sommer, A.: The Location Index: A Semantic Web Spatial
   Data Infrastructure. In: Hitzler, P., Fernández, M., Janowicz, K., Zaveri, A.,
   Gray, A.J., Lopez, V., Haller, A., Hammar, K. (eds.) The Semantic Web. pp. 543–
   557. Lecture Notes in Computer Science, Springer International Publishing (2019).
   https://doi.org/10.1007/978-3-030-21348-0_35
3. Portele, C., Vretanos, P.A., Heazel, C.: OGC API - Features - Part 1: Core.
   OGC Implementation Standard 17-069r3, Open Geospatial Consortium (Oct 2019),
   http://www.opengis.net/doc/IS/ogcapi-features-1/1.0
4. Gibb, R., Cochrane, B., Purss, M.: OGC Testbed-16: DGGS and DGGS API Engi-
   neering Report. Engineering Report OGC 20-039r2, Open Geospatial Consortium
   (Jan 2021), https://docs.ogc.org/per/20-039r2.html
5. Perry, M., Herring, J.: OGC GeoSPARQL - A Geographic Query Language for
   RDF Data. OGC Implementation Standard, Open Geospatial Consortium (2012),
   http://www.opengis.net/doc/IS/geosparql/1.0
6. Purss, M.: Topic 21: Discrete Global Grid Systems Abstract Specification.
   Abstract Specification 15-104r5, Open Geospatial Consortium (Aug 2017),
   http://www.opengis.net/doc/AS/dggs/1.0