<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Quality Metrics to Measure the Standards Conformance of Geospatial Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Beyza Yaman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Thompson</string-name>
          <email>kevin.thompson@osi.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob Brennan</string-name>
          <email>rob.brennang@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ordnance Survey Ireland</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes three new Geospatial Linked Data (GLD) quality metrics that help evaluate conformance to standards. Standards conformance is a key quality criterion, for example for FAIR data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets, which showed a wide variation in standards conformance. This is the first set of Linked Data quality metrics developed specifically for GLD.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        to having geospatial data printed as cartographic products or data sales and
distribution at data.geohive.ie [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, the United Nations Global Geospatial
Information Management (UN-GGIM) framework<sup>3</sup> highlights the importance of
standards conformance of data for quality. Thus, there is a need to monitor
and report on the standards conformance of OSi GLD. This requires quantitative
measurement in order to provide continuous upward reporting to the Irish
government, European Commission and UN; to enable more sophisticated data
quality monitoring within the organisation; and to provide feedback to managers
within OSi for engineering teams.
      </p>
      <p>This paper investigates the research question: To what extent can quality
metrics derived from geospatial data standards be used to assess the standards
conformance and quality of GLD? To answer it, a set of applicable standards for
GLD from the International Organization for Standardization (ISO), Open
Geospatial Consortium (OGC) and World Wide Web Consortium (W3C) was reviewed to
identify a set of testable conformance points for each standard. Then, in
consultation with the OSi quality team, a new set of metrics was prioritised and
developed to evaluate each conformance point. The first three metrics, which
were implemented in the Luzzu open source quality assessment framework, are
presented here. A set of existing open GLD datasets was then evaluated for
standards conformance quality by computing these metrics<sup>4</sup>.</p>
      <p>
        This work was carried out as part of the LinkedDataOps project<sup>5</sup> [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
which implements an end-to-end (e2e) quality assessment framework based on the Luzzu framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
project defines roles and responsibilities to ensure liability for data quality with
policies and procedures. It supports the process by means of the proposed
standards while maintaining the performance for good decision-making. Continuous
validation of quality and standards conformance will be performed by data
experts and engineers in the OSi data production pipeline using an e2e standards
reporting tool developed in this project.
      </p>
      <p>The contributions of this paper are three new GLD quality metrics for
standards conformance and a study of their use on open GLD datasets. The rest of
this paper is structured as a description of the new metrics and a discussion of
an evaluation of open GLD datasets using them.</p>
      <p>Three New GLD Metrics for Standards Conformance</p>
      <p>
        A year-long series of internal workshops with stakeholders across OSi identified
and evaluated a set of relevant standards for GLD. The standards fell into two
main groups: geospatial datasets and metadata (ISO/TC 211 Geographic
information/Geomatics committee ISO 19000 series) and Geospatial Linked Data
(OGC's GeoSPARQL and W3C Best Practices for Spatial Data and Web of
Data recommendations), totaling 15 standards including data, metadata and
schema definitions. Suitable conformance points were identified, e.g., OGC's
GeoSPARQL defines 30 requirements for GLD and there are 14 best practices
identified for GLD by W3C [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Each of these could become a target conformance
point, with metrics developed to measure it.
      </p>
      <p>
        <fn id="fn3"><label>3</label><p>http://ggim.un.org/meetings/GGIM-committee/8th-Session/documents/Standards_Guide_2018.pdf</p></fn>
        <fn id="fn4"><label>4</label><p>https://github.com/beyzayaman/standard-quality-metrics</p></fn>
        <fn id="fn5"><label>5</label><p>linkeddataops.adaptcentre.ie</p></fn>
      </p>
      <p>
        In consultation with OSi, the most essential conformance points were
identified. Ease of implementation was also used to guide the choices, to
enable rapid prototyping. Three new metrics were implemented (M1 from OGC
GeoSPARQL<sup>6</sup> and ISO 19125-1<sup>7</sup>, M2 from W3C Best Practices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and M3 from
ISO 19157<sup>8</sup>). Together these metrics enable the assessment of a dataset in terms
of standards conformance including metadata, spatial reference systems and
geometry classes. Note that each metric described below is computed as a rate
(Equation 1) over the whole dataset, giving values in the range [0, 1], as is best
practice for quality metrics. This is not repeated in each definition; instead, each
definition gives only the base calculation (e), the set of instances satisfying the
conditions of the metric, which must then be inserted into Equation 1. (Prefix
for geo: http://www.opengis.net/ont/geosparql#.)
      </p>
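      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math><![CDATA[M = \sum_{i=1}^{e} \frac{e(i)}{\mathit{size}(e)}]]></tex-math>
        </disp-formula>
      </p>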
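      <p>As an illustration only, the rate in Equation 1 can be sketched in Python. The sketch assumes one reading of the equation, namely the number of instances satisfying a metric's condition divided by the number of instances examined; the function name conformance_rate and the convention of returning 0.0 for an empty input are assumptions and are not taken from the Luzzu implementation.</p>
      <preformat><![CDATA[
```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def conformance_rate(instances: Iterable[T], conforms: Callable[[T], bool]) -> float:
    """Return the fraction of instances for which `conforms` is true.

    Assumption: Equation 1 is read as (conforming instances) divided by
    (instances examined). An empty input yields 0.0 by convention.
    """
    total = 0
    conforming = 0
    for instance in instances:
        total += 1
        if conforms(instance):
            conforming += 1
    return conforming / total if total else 0.0


# Hypothetical example: 3 of the 4 instances satisfy the condition.
example = conformance_rate([2, 4, 6, 7], lambda n: n % 2 == 0)
print(example)  # 0.75
```
]]></preformat>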
      <p>M1, Geometry Extension Object Consistency Check: This metric
addresses the requirement "All RDFS Literals of type geo:wktLiteral shall obey
a specified syntax and ISO 19125-1." According to the OGC GeoSPARQL
requirements, WKT serialization regulates geometry types with ISO 19125
Simple Features, and GML serialization regulates them with ISO 19107 Spatial
Schema. Metric Computation: For each entity in the dataset that is a member of
class geo:Geometry, this metric checks whether a geo:asWKT or geo:asGML
property is used, and reports the rate of such entities.</p>
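      <p>The M1 computation described above might look as follows in a minimal Python sketch. It assumes the dataset is available as plain (subject, predicate, object) string triples and that geometries are typed directly as geo:Geometry (no subclass inference is performed); the function name m1_rate and the ex: identifiers are hypothetical, and this is not the Luzzu implementation.</p>
      <preformat><![CDATA[
```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def m1_rate(triples):
    """Fraction of geo:Geometry instances that carry geo:asWKT or geo:asGML.

    Assumption: `triples` is an iterable of (subject, predicate, object)
    strings, and only direct rdf:type geo:Geometry statements are counted.
    """
    triples = list(triples)
    geometries = {s for s, p, o in triples if p == RDF_TYPE and o == GEO + "Geometry"}
    serialized = {s for s, p, o in triples if p in (GEO + "asWKT", GEO + "asGML")}
    if not geometries:
        return 0.0
    return len(geometries & serialized) / len(geometries)


# Hypothetical data: ex:g1 has a WKT serialization, ex:g2 has none.
sample = [
    ("ex:g1", RDF_TYPE, GEO + "Geometry"),
    ("ex:g1", GEO + "asWKT", "POINT(-6.26 53.35)"),
    ("ex:g2", RDF_TYPE, GEO + "Geometry"),
]
print(m1_rate(sample))  # 0.5
```
]]></preformat>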
      <p>
        <disp-formula>
          <tex-math><![CDATA[e := \{\, e \mid \forall e \in \mathit{class}(\text{geo:Geometry}) \;\; \mathit{hasWKT}(e) \lor \mathit{hasGML}(e) \,\}]]></tex-math>
        </disp-formula>
      </p>
      <p>
        M2, Links to Spatial Things Check: This metric addresses the
requirement "Use appropriate relation types to link Spatial Things where source and
target of the hyperlink are Spatial Things". W3C thus recommends using
appropriate relation types to link Spatial Things, where a Spatial Thing is any object
with spatial extent (i.e., size, shape, or position), such as people or places [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Metric Computation: The
metric computes the rate of entities having links to external Spatial Things in other
datasets and internal spatial links within the dataset.
      </p>
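      <p>
        <disp-formula>
          <tex-math><![CDATA[e := \{\, e \mid \forall e \in \mathit{class}(\text{geo:Geometry}) \;\; \mathit{hasST}(e) \,\}]]></tex-math>
        </disp-formula>
      </p>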
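      <p>A rough Python sketch of the M2 idea follows. The text does not specify which predicates count as appropriate relation types or how Spatial Things are recognised, so both are passed in by the caller here; the function name m2_rate, the parameter names and the sample identifiers are all hypothetical, and the sketch is not the Luzzu implementation.</p>
      <preformat><![CDATA[
```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def m2_rate(triples, entities, spatial_things, link_predicates):
    """Fraction of `entities` linked to some other Spatial Thing.

    Assumptions: `triples` holds (subject, predicate, object) strings,
    `spatial_things` is the set of identifiers treated as Spatial Things
    (internal or external), and `link_predicates` is the caller's choice
    of relation types that count as spatial links.
    """
    entities = set(entities)
    linked = {
        s
        for s, p, o in triples
        if s in entities and p in link_predicates and o in spatial_things and o != s
    }
    return len(linked) / len(entities) if entities else 0.0


# Hypothetical data: only ex:a links to another Spatial Thing.
sample_triples = [
    ("ex:a", OWL_SAME_AS, "gadm:x"),
    ("ex:b", "http://www.w3.org/2000/01/rdf-schema#label", "B"),
]
rate = m2_rate(sample_triples, {"ex:a", "ex:b"}, {"ex:a", "ex:b", "gadm:x"}, {OWL_SAME_AS})
print(rate)  # 0.5
```
]]></preformat>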
      <p>M3, Consistent Polygon and Multipolygon Usage Check: This
metric addresses the requirement "Polygons and multipolygons shall form a closed
circuit". Polygons are topologically closed structures; thus, the starting point
and end point of a polygon should be equal to provide a consistent geometric</p>
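      <p>The closure condition for M3 (start point equal to end point) can be illustrated with a small Python sketch. It assumes polygon and multipolygon geometries are given as WKT strings, treats every innermost parenthesised coordinate list as a ring, and additionally assumes that a ring needs at least four coordinate pairs; the function name rings_closed is hypothetical and the check is deliberately simpler than a full WKT parser.</p>
      <preformat><![CDATA[
```python
import re

# Innermost parenthesised groups, i.e. the coordinate list of each ring.
RING = re.compile(r"\(([^()]+)\)")


def rings_closed(wkt: str) -> bool:
    """Return True if every ring in a POLYGON or MULTIPOLYGON WKT is closed.

    Assumptions: the input is a polygon-type WKT string, coordinates are
    separated by commas, and a ring must contain at least four points.
    """
    rings = RING.findall(wkt)
    if not rings:
        return False
    for ring in rings:
        points = [tuple(float(v) for v in pt.split()) for pt in ring.split(",")]
        if len(points) < 4 or points[0] != points[-1]:
            return False
    return True


print(rings_closed("POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))"))  # True
print(rings_closed("POLYGON((0 0, 4 0, 4 4, 0 4))"))       # False
```
]]></preformat>
      <p>Applied to every polygon or multipolygon literal in a dataset, the proportion of literals passing such a check would give a rate in the sense of Equation 1.</p>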
      <sec id="sec-1-1">
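        <p>
          <fn id="fn6"><label>6</label><p>https://www.ogc.org/standards/geosparql</p></fn>
          <fn id="fn7"><label>7</label><p>https://www.iso.org/standard/40114.html</p></fn>
          <fn id="fn8"><label>8</label><p>https://www.iso.org/standard/32575.html</p></fn>
        </p>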
        <p>Four open GLD datasets were assessed using the new metrics implemented in
Luzzu (Table 1). This section discusses the performance of each dataset with respect to
the given metrics. The datasets were: OSi's Irish national mapping Linked Open
Data<sup>9</sup>; Ordnance Survey UK's United Kingdom mapping Linked Open Data<sup>10</sup>;
LinkedGeoData<sup>11</sup>, provided by the University of Leipzig by converting
OpenStreetMap data to Linked Data; and Greece LD<sup>12</sup>, provided by the University of
Athens as part of the TELEIOS project. The metric values shown in Table 1
are the mean value of the metric for all GLD resources in the dataset. Specific
discussion of each metric's results is provided below.</p>
        <p>Geometry Extension Object Consistency Check (M1): The OS UK and
Greek GLD datasets do not conform to the standards because they use non-standard,
specialized ontologies (e.g., strdf:WKT instead of geo:wktLiteral; prefix for
strdf: http://strdf.di.uoa.gr/ontology#). OSi and LinkedGeoData conform to the
standards for every geospatial entity in the dataset.</p>
        <p>Links to Spatial Things (M2): The LinkedGeoData dataset
has links to the GADM dataset<sup>13</sup>, while the OSi dataset has links to the Logainm
dataset<sup>14</sup>. OS UK provides two different granularities, county and Europe, within
the dataset. This shows that every LinkedGeoData instance has a connection with another
spatial thing and that this dataset has the highest interoperability between datasets.</p>
        <p>Polygon and Multipolygon Check (M3): OSi, OS UK and Greek GLD
include polygons and multipolygons in their datasets, whereas entities are
represented only by points in LinkedGeoData, and some Greek GLD entities are
represented as waterlinestring (note that the full OSi dataset was computed with
sampling, so its value is estimated and denoted with *). This means that the data in
LinkedGeoData and Greek GLD were not represented (or only partially represented)
as boundaries. This is due to having different spatial dimensions rather than
geospatial polygon data, even though polygon data is very important for GIS
applications: polygons are essential to building things, as otherwise it is not
known where anything begins and ends.</p>
      </sec>
      <sec id="sec-1-2">
        <p>
          <fn id="fn9"><label>9</label><p>http://data.geohive.ie/downloadAndQuery.html</p></fn>
          <fn id="fn10"><label>10</label><p>https://data.ordnancesurvey.co.uk/datasets/boundary-line</p></fn>
          <fn id="fn11"><label>11</label><p>http://linkedgeodata.org/Datasets?show_files=0</p></fn>
          <fn id="fn12"><label>12</label><p>http://linkedopendata.gr/dataset</p></fn>
          <fn id="fn13"><label>13</label><p>http://gadm.geovocab.org/</p></fn>
          <fn id="fn14"><label>14</label><p>https://www.logainm.ie/en/inf/proj-machines</p></fn>
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>Three new GLD quality metrics have been defined based on an analysis of GLD
standards. They have been implemented in the Luzzu quality assessment
framework and used to assess four open GLD datasets. This has shown that i) it is
fruitful to use standards conformance points as a basis for new quality metrics
and ii) despite the availability of best practice advice and standards for
GLD, there is still a very low level of conformance to GLD standards in the
GLD cloud. The ability to make this standards conformance assessment in an
objective, quantitative, automated way is an advance in the state of the art.
The metrics have limitations due to their simplicity and to the flexibility of Linked
Data, and hence the heterogeneity of real datasets. However, this approach is still
useful for publishers like OSi who wish their data to conform to the requirements
and best practices published by standardization organisations.</p>
      <p>Acknowledgements. This research received funding from the
European Union's Horizon 2020 research and innovation programme under Marie
Sklodowska-Curie grant agreement No. 801522, by Science Foundation
Ireland and co-funded by the European Regional Development Fund through the
ADAPT Centre for Digital Content Technology [grant number 13/RC/2106] and
Ordnance Survey Ireland.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J.</given-names>
            <surname>Debattista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          .
          <article-title>Luzzu - a methodology and framework for linked data quality assessment</article-title>
          .
          <source>Journal of Data and Information Quality (JDIQ)</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>32</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Clinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>McNerney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nautiyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lavin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          .
          <article-title>Ireland's authoritative geospatial linked data</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <fpage>66</fpage>
          –
          <lpage>74</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Karam</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Melchiori</surname>
          </string-name>
          .
          <article-title>Improving geo-spatial linked data with the wisdom of the crowds</article-title>
          .
          <source>In Proceedings of the joint EDBT/ICDT 2013 workshops. ACM</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Athanasiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Both</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hladky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Le Grange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Sherif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          , et al.
          <article-title>Managing geospatial linked data in the GeoKnow project</article-title>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Mostafavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Edwards</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jeansoulin</surname>
          </string-name>
          .
          <article-title>An ontology-based method for quality assessment of spatial data bases</article-title>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Tandy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>van den Brink</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Barnaghi</surname>
          </string-name>
          .
          <article-title>Spatial data on the web best practices</article-title>
          . W3C Working Group Note,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>da Silva Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          .
          <source>Scientific Data</source>
          ,
          <volume>3</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Yaman</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Brennan</surname>
          </string-name>
          .
          <article-title>LinkedDataOps: linked data operations based on quality process cycle</article-title>
          .
          <source>In EKAW (Posters &amp; Demos)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>