=Paper= {{Paper |id=Vol-2977/paper7 |storemode=property |title=Assessing quality of R2RML mappings for OSi's Linked Open Data portal (short paper) |pdfUrl=https://ceur-ws.org/Vol-2977/paper7.pdf |volume=Vol-2977 |authors=Alex Randles,Declan O'Sullivan |dblpUrl=https://dblp.org/rec/conf/esws/RandlesO21 }} ==Assessing quality of R2RML mappings for OSi's Linked Open Data portal (short paper)== https://ceur-ws.org/Vol-2977/paper7.pdf
Assessing Quality of R2RML Mappings for OSi’s Linked
                   Open Data Portal

       Alex Randles1*[0000-0001-6231-3801] and Declan O’Sullivan1[0000-0003-1090-3548]
                 1 ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland

            {alex.randles,declan.osullivan}@adaptcentre.ie



       Abstract. As the number of geospatial Linked Data datasets being published
       grows, so does the need to ensure their quality and trustworthiness. The quality
       of these datasets is most often assessed after they have been published;
       however, due to the authoritative nature of geospatial data, we propose
       bringing quality assessment earlier, into the Linked Data generation process
       itself. Creating these datasets requires the definition of artifacts called
       ‘uplift mappings’. These uplift mappings use the R2RML specification
       language to define the relationship between the non-RDF geospatial data and its
       Linked Data RDF representation. This paper describes a mapping quality
       framework which assesses and refines the quality of the R2RML uplift mappings
       using a number of quality metrics. We demonstrate the use of our framework
       in the publication pipeline for Ordnance Survey Ireland’s (OSi) Linked
       Open Data portal for geospatial data, http://data.geohive.ie. The use of the
       R2RML quality framework early in the publication pipeline provides significant
       confidence in the quality of the resulting geospatial Linked Data published
       through the portal.

       Keywords: Geospatial data, Linked data, Data Quality, Uplift mappings.


1      Introduction

Increasingly, geospatial data is being exposed using W3C’s Linked Data1 approach,
which allows this data to be easily consumed in a machine-readable manner using
standard web technologies, thus making the interlinking of multiple data sources
much easier. However, due to the expectation that geospatial data provided by
National Mapping Agencies is authoritative, a high level of quality control is
required throughout the creation process.
   An example of one such project involves a collaboration between the Science
Foundation Ireland ADAPT Research Centre2 and Ordnance Survey Ireland (OSi)3.


* “Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons
   License Attribution 4.0 International (CC BY 4.0).”
1 https://www.w3.org/standards/semanticweb/data
2 ADAPT homepage at http://www.adaptcentre.ie
3 OSi homepage at https://www.osi.ie/


The resulting Linked Open Data portal, available at data.geohive.ie (see Fig. 1),
takes selected geospatial data stored using a relational database model called
Prime2 and makes it available as Linked Open Data [1]. Prime2 stores information
on over 45,000,000 spatial objects representing key geospatial features in Ireland.
Converting the relational data stored in Prime2 into the RDF format needed for
Linked Open Data required the creation of the OSi Spatial Ontology4, as no suitable
existing ontology was found to accurately represent OSi’s geospatial data. R2RML5
uplift mappings are created by domain experts to specify how the geospatial data in
relational format is to be transformed into RDF according to the OSi Spatial Ontology.




                      Fig. 1. data.geohive.ie Technical Architecture

   In this paper, we describe the quality improvement which can be offered by our
R2RML quality framework in the production of the R2RML uplift mappings. Assessing
and refining the quality of the uplift mappings used to create the geospatial
Linked Open Data prevents errors within the uplift mappings from causing a
significant number of quality issues within the resulting RDF dataset [1]. The R2RML
quality framework allows users to produce higher quality mappings and datasets, while
also facilitating the maintenance and reuse of those mappings [2].
   The remainder of this paper is organized as follows: Section 2 gives a general
overview of our R2RML quality framework. Section 3 walks through our framework
executed on an example from OSi’s set of geospatial R2RML mappings. Section 4
discusses related work. Section 5 concludes the paper and discusses future work.


4 OSi Spatial Ontology at http://ontologies.geohive.ie/osi
5 R2RML specification at https://www.w3.org/TR/r2rml/


2       Mapping Quality Framework

In this section we briefly describe the mapping quality framework under development
which assesses and refines the quality of R2RML mappings used to generate RDF
datasets. The rationale for choosing R2RML as our target language for mapping
quality assessment and refinement is that it is the W3C recommendation for mapping
relational databases to RDF datasets and has wide uptake.
   We previously designed a mapping quality framework [2] using the SHACL constraint
language6, which can be used to validate any data in RDF format. Within this
previous framework, a machine-readable report on R2RML mappings is generated
using SHACL’s validation report vocabulary. SPARQL queries are then used to
update and refine the mappings, which is possible because R2RML mappings are
themselves defined in RDF. However, SHACL is not designed specifically for
validating mappings, and we concluded that a new, domain-specific framework
design would allow users to capture more detailed provenance and metadata about
the quality of the mappings.
   Our updated framework design is split into two main stages: mapping assessment
and mapping refinement. The framework is designed as a web-based Python application
which executes SPARQL queries on the mapping using the RDFLib7 library, allowing
the framework to query and update the mappings. Furthermore, the machine-readable
reports generated by our framework are defined in a domain-specific vocabulary
called the Mapping Quality Vocabulary (MQV)8 [3], which we developed to enable
quality metadata and provenance information relating to the assessment and
refinement of mappings to be captured and published.
   Our framework design involves the users uploading an R2RML mapping and an
optional local ontology. A local ontology refers to an ontology which is not available
remotely. After these have been uploaded, each remote vocabulary used within the
mapping is fetched and stored in a local cache to speed up execution. These
vocabularies are queried by the framework in order to generate vocabulary-specific
quality metrics. Furthermore, the quality metrics are designed such that the
framework can suggest semi-automatic refinements to rectify identified violations
of quality metrics9. These refinements can be selected by the users and executed
on the mapping in the framework in order to produce a refined, quality-improved
R2RML mapping. Moreover, the framework uses MQV to capture metadata and provenance
relating to the quality assessment and refinement of the mappings.


6 SHACL specification at https://www.w3.org/TR/shacl/
7 RDFLib documentation at https://rdflib.readthedocs.io/en/stable/
8 Mapping Quality Vocabulary specification available at https://alex-randles.github.io/MQV/
9 Quality metrics and refinements at https://docs.google.com/spreadsheets/d/1165CWRjE3gDxyLy3qL9BBukHu7oR_zXVznGvxubUxbU/edit?usp=sharing
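
The vocabulary caching step described in this section can be illustrated with a minimal Python sketch. It is not the framework's actual implementation: the class name VocabularyCache, the fetch_remote helper and the hashed file-name scheme are all assumptions made for illustration, and only the standard library is used.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_remote(url: str) -> bytes:
    """Download a vocabulary document (hypothetical default fetcher)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


class VocabularyCache:
    """Keeps a local copy of each remote vocabulary after its first request."""

    def __init__(self, cache_dir, fetch: Callable[[str], bytes] = fetch_remote):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch

    def _path_for(self, url: str) -> Path:
        # Hash the URL so that any namespace IRI maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".cache")

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()  # cache hit: no network access
        data = self.fetch(url)        # cache miss: fetch once, then store
        path.write_bytes(data)
        return data
```

Passing the fetcher in as a parameter keeps the cache testable without network access; a later assessment run over a mapping that reuses the same vocabularies would then read them from disk.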


3      Demonstration Walkthrough

   In this section we present a running example to demonstrate the quality assessment
and refinement of a sample R2RML mapping10. The mapping has been extracted from
the R2RML mappings used to generate OSi’s linked data for data.geohive.ie, in this
case relating to the geometry of a townland. For illustrative purposes this R2RML
mapping has been edited to include an undefined property (geo:asWTK), rather than the
correct defined property (geo:asWKT). If this minor spelling mistake were not spotted
before execution, it could easily result in every triple generated from the townland
relational table using this R2RML mapping being incorrectly represented in the
resulting linked data dataset. Fig. 2 shows a screenshot of our framework’s user
interface after it has assessed the quality of the sample mapping. The framework
highlights, in red under the “Location” heading on the right-hand side of the figure,
the predicate and object that violate one of the quality metrics for R2RML mappings.
This enables a user to quickly identify the issue in the R2RML mapping.




Fig. 2. Screenshot of R2RML Mapping Quality Assessment Framework: Violation reporting &
Refinement selection


10 Sample R2RML Mapping at https://github.com/alex-randles/GeoLD2021-Paper-Examples/blob/main/sample_mapping.ttl


A machine-readable quality report11, shown in Listing 1, is generated using the
Mapping Quality Vocabulary (MQV). This report describes the violation
(ex:violation-0) which was shown in human-readable format in Fig. 2. The
quality report details important information relating to the violation, such as the
quality metric (mqv:metricD2) which detected the violation, its location within the
mapping (<#TownlandTriplesMap>) and a result message which describes the
violation in a human-readable format ("Usage of undefined Property.").



 ex:violation-0      a mqv:MappingViolation ;
     mqv:hasLocation   "predicateObjectMap1" ;
     mqv:hasValue      geo:asWTK ;
     mqv:inTripleMap   <#TownlandTriplesMap> ;
     mqv:isDescribedBy mqv:metricD2 ;
     mqv:resultMessage "Usage of undefined Property." ;
     mqv:wasRefinedBy ex:refinement-0 .

                        Listing 1: Extract of quality report generated
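
The check behind a report entry such as ex:violation-0 can be sketched in a few lines of Python. This is an illustrative simplification, not the framework's code: it assumes the predicates have already been extracted from the mapping as (triples map, location, predicate) tuples, and that the set of defined properties (here a small, abbreviated set under the geo: prefix) has already been read from the cached vocabulary. The Violation class and the function name are hypothetical; the field comments indicate which MQV property from Listing 1 each value would populate.

```python
from dataclasses import dataclass

# Assumed, abbreviated set of properties defined by the vocabulary behind the
# geo: prefix; the framework would obtain these by querying the vocabulary.
DEFINED_PROPERTIES = frozenset({"geo:asWKT", "geo:asGML", "geo:hasGeometry"})


@dataclass(frozen=True)
class Violation:
    identifier: str  # e.g. ex:violation-0
    triple_map: str  # mqv:inTripleMap
    location: str    # mqv:hasLocation
    value: str       # mqv:hasValue
    metric: str      # mqv:isDescribedBy
    message: str     # mqv:resultMessage


def check_undefined_properties(predicate_usages, defined=DEFINED_PROPERTIES):
    """Return one Violation per predicate that the vocabulary does not define.

    predicate_usages: iterable of (triple_map, location, predicate) tuples.
    """
    violations = []
    for triple_map, location, predicate in predicate_usages:
        if predicate in defined:
            continue
        violations.append(Violation(
            identifier=f"ex:violation-{len(violations)}",
            triple_map=triple_map,
            location=location,
            value=predicate,
            metric="mqv:metricD2",
            message="Usage of undefined Property.",
        ))
    return violations


# The misspelled property from the running example is reported as a violation.
usages = [("<#TownlandTriplesMap>", "predicateObjectMap1", "geo:asWTK")]
for violation in check_undefined_properties(usages):
    print(violation)
```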

   After quality violations have been detected within an R2RML mapping, they
should be refined to prevent violations within the mapping from being replicated
within the generated Linked Data dataset [1]. The violation detected within this
mapping, which relates to an ‘undefined property’, can be refined either
semi-automatically or manually. Semi-automatic refinement involves the framework
suggesting several properties similar to the undefined property, while also allowing
the users to input a new property into the framework. Manual refinement involves the
users editing the mapping manually using a text editor or similar. If the user chooses
to semi-automatically refine the mapping using our framework, a refined mapping
and validation report12 are output. The validation report details the refinement
(ex:refinement-0) associated with the violation detected within the mapping.
Furthermore, the refinement is associated with the SPARQL query
(mqv:hasRefinementQuery) which created the refined mapping.
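
One plausible way to realise the semi-automatic suggestion step is to rank the properties defined by the vocabulary by string similarity to the undefined one, and then to build a SPARQL 1.1 Update for the property the user selects. The sketch below uses Python's standard difflib module for the ranking. The function names are hypothetical, the similarity measure is an assumption rather than the framework's documented method, and the generated query assumes that the mapping states its predicates with the rr:predicate shortcut and that geo: denotes the GeoSPARQL namespace; the query actually recorded in the validation report may differ.

```python
from difflib import get_close_matches


def suggest_properties(undefined, defined, limit=3):
    """Rank defined properties by string similarity to the undefined one."""
    return get_close_matches(undefined, sorted(defined), n=limit, cutoff=0.6)


def build_refinement_query(old, new):
    """Build a SPARQL 1.1 Update that swaps one constant predicate for another.

    Assumes predicates are stated with the rr:predicate shortcut and that
    both terms use the geo: prefix declared below.
    """
    return (
        "PREFIX rr: <http://www.w3.org/ns/r2rml#>\n"
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
        f"DELETE {{ ?pom rr:predicate {old} }}\n"
        f"INSERT {{ ?pom rr:predicate {new} }}\n"
        f"WHERE  {{ ?pom rr:predicate {old} }}"
    )


# Abbreviated, assumed list of properties defined under the geo: prefix.
defined = {"geo:asWKT", "geo:asGML", "geo:hasGeometry"}
suggestions = suggest_properties("geo:asWTK", defined)
if suggestions:
    # In the framework the user would choose; here the best match is taken.
    print(build_refinement_query("geo:asWTK", suggestions[0]))
```

Because an R2RML mapping is itself an RDF graph, an update of this shape could be executed against the parsed mapping to produce the refined version, with the query text stored alongside the refinement for provenance.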


4      Related work

EvaMap [4] is a framework which generates a global quality score for each mapping
and provides feedback to the users; however, this feedback is not machine-readable. A
test-driven approach [5] extends an existing framework called RDFUnit13 in
order to execute SPARQL queries on the mappings. The quality report generated can
be represented using the RDFUnit ontology, which has not been designed for the
purpose of capturing mapping provenance and metadata. Resglass [6] is a framework
which uses a rule-driven methodology to rank mapping rules based on a score. However,
no machine-readable quality report is generated and the rules are inspected
by experts based on the scores. Another approach [1] extends an existing quality
assessment tool called Luzzu14. This approach does not refine the violations detected
within the mappings.


11 Quality report at https://github.com/alex-randles/GeoLD2021-Paper-Examples/blob/main/quality_report.ttl
12 Validation report at https://github.com/alex-randles/GeoLD2021-Paper-Examples/blob/main/validation_report.ttl
13 http://rdfunit.aksw.org/


5      Conclusion and Future Work

Exposing geospatial data in RDF format requires the definition of artifacts called
mappings, which define the relationship between the data sources. Creating suitable
mappings requires the knowledge of domain experts [7]. However, this creation process is
error-prone and can result in poor quality geospatial Linked Data being published.
Furthermore, the authoritative nature of this data means that consumers expect it to
be of high quality.
   Introducing mapping quality assessment and refinement into the geospatial Linked
Data publication process will result in higher quality and more trustworthy data being
published and consumed by third parties. This paper describes and demonstrates a
mapping quality framework which implements several quality metrics and refinements
which focus on common quality issues found within mappings.
   Future work will include the implementation of further metrics and refinements,
which will give the framework more expressive capabilities for improving the quality
of mappings. Furthermore, we plan an extensive system and user evaluation of the
framework, as well as improvements based on the evaluation results.

Acknowledgements. This research was conducted with the financial support of the
SFI AI Centre for Research Training under Grant Agreement No. 18/CRT/6223 at the
ADAPT SFI Research Centre at Trinity College Dublin. The ADAPT SFI Centre for
Digital Media Technology is funded by Science Foundation Ireland through the SFI
Research Centres Programme and is co-funded under the European Regional
Development Fund (ERDF) through Grant #13/RC/2106.


References

1.   Junior, A.C., Debattista, J., O’Sullivan, D.: Assessing the Quality of R2RML
     Mappings. In: Kaffee, L.-A., Endris, K.M., Vidal, M.-E., Comerio, M., Sadeghi, M.,
     Chaves-Fraga, D., and Colpaert, P. (eds.) Joint Proceedings of the 1st International
     Workshop on Semantics for Transport and the 1st International Workshop on
     Approaches for Making Data Interoperable co-located with 15th Semantics Conference
     (SEMANTiCS 2019), Karlsruhe, Germany, September 9, 2019. CEUR-WS.org (2019).
2.   Randles, A., Crotti Junior, A., O’Sullivan, D.: A Framework for Assessing and
     Refining the Quality of R2RML mappings. In: Proceedings of the 22nd International
     Conference on Information Integration and Web-Based Applications & Services.
     Association for Computing Machinery, New York, NY, USA (2020).
     https://doi.org/10.1145/3428757.3429089.
3.   Randles, A., Crotti Junior, A., O’Sullivan, D.: Towards a vocabulary for mapping
     quality assessment. To Appear Proc. 15th Int. Work. Ontol. Matching 19th Int.
     Semant. Web Conf. (ISWC), 2020. (2020).
4.   Moreau, B., Serrano-Alvarado, P.: Assessing the Quality of RDF Mappings with
     EvaMap. In: 17th Extended Semantic Web Conference (ESWC2020) (2020).
5.   Dimou, A., Kontokostas, D., Freudenberg, M., Verborgh, R., Lehmann, J., Mannens,
     E., Hellmann, S., Van de Walle, R.: Assessing and refining mappings to RDF to
     improve dataset quality. In: Lecture Notes in Computer Science (including subseries
     Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). pp. 133–
     149. Springer Verlag (2015). https://doi.org/10.1007/978-3-319-25010-6_8.
6.   Heyvaert, P., De Meester, B., Dimou, A., Verborgh, R.: Rule-driven inconsistency
     resolution for knowledge graph generation rules. Semant. Web. 10, (2019).
     https://doi.org/10.3233/SW-190358.
7.   Mcglinn, D.K., Brennan, R., Debruyne, C., Meehan, A., McNerney, L., Clinton, E.,
     Kelly, P., O’Sullivan, D.: Publishing authoritative geospatial data to support
     interlinking of building information models. Autom. Constr. 124, 103534 (2021).
     https://doi.org/10.1016/j.autcon.2020.103534.

14 https://github.com/Luzzu/Framework/tree/V5