=Paper= {{Paper |id=Vol-2751/short-6 |storemode=property |title=LinkedDataOps:Linked Data Operations based on Quality Process Cycle |pdfUrl=https://ceur-ws.org/Vol-2751/short6.pdf |volume=Vol-2751 |authors=Beyza Yaman,Rob Brennan |dblpUrl=https://dblp.org/rec/conf/ekaw/YamanB20 }} ==LinkedDataOps:Linked Data Operations based on Quality Process Cycle== https://ceur-ws.org/Vol-2751/short6.pdf
    LinkedDataOps:Linked Data Operations based
             on Quality Process Cycle?

                          Beyza Yaman and Rob Brennan

              ADAPT Centre, Dublin City University, Dublin, Ireland
                 {beyza.yaman,rob.brennan}@adaptcentre.ie



      Abstract. Quality assessment is extremely relevant to measure the util-
      ity of data and it is especially critical for geospatial data due to its impor-
      tance in daily life, e.g. navigation, and self-piloted vehicles. This paper
      describes a new end-to-end framework for quality-oriented continuous
      development and improvement of data based on standards compliance.
      The implemented methods build upon the open-source Luzzu framework
      with an open-source standards-agnostic dashboard to visualize and ana-
      lyze quality metric observations in a data production pipeline.


1    Introduction
Linked Data has a life-cycle and data quality issues in Linked Open Data (LOD)
are the result of a combination of data and process-related factors in this life-
cycle. This dynamic process requires continuous improvement, in contrast to
the static releases that typify most LOD datasets. In particular, geospatial data
suffers from high demands on quality and if not met, these can cause major
problems in real life, such as the navigation problems leading to the Irish coast
guard helicopter crash in 20171 .
    While there is an extensive number of studies on the quality of traditional
data, the topic of Geospatial Linked Data (GLD) has received little attention.
GLD problems include i) a lack of quality metrics for GLD, e.g., geo-political
boundaries datasets are required to define the extent of town or county bound-
aries (spatial things) as polygons. However, there are no well-known LOD quality
metrics to check the conformance of the polygon shapes (e.g. if they’re closed
or not). ii) a lack of the end to end (e2e) data quality cycle considering data
transformation, objective assessment metrics or root causes of problems [3–5].
Whereas, many quality aspects can be achieved by assuring standards confor-
mance, e.g., Findable in FAIR principles implies the availability of appropriate
catalog metadata like W3C DCAT.
    Thus, the research question we are tackling is “To what extent can we imple-
ment effective methods and tools for quality-oriented continuous development
and improvement of Linked Data deployments taking into account the e2e data
?
  Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
1
  Irish coast guard helicopter crash, t.ly/COas
2        B. Yaman and R. Brennan




     Fig. 1. OSi Geospatial Data Publishing Pipeline with Quality Control Points



lifecycle”. The objective of the Elite-S Marie Sklodowska-Curie Cofund Action
LinkedDataOps project2 is to meet dynamic quality needs by providing new
continuous data quality tools and methodologies.
    This project will be realized in the Ordnance Survey Ireland (OSi) data
publication use-case. OSi’s national geospatial digital infrastructure (Fig. 1)
encompasses surveying and data capture, image processing, translation to the
Prime2 object-oriented spatial model of over 50 million spatial objects tracked
in time and provenance, conversion to the multi-resolution data source (MRDS)
database for printing as cartographic products or data sales and distribution at
data.geohive.ie [2]. Thus, managing data quality throughout the data pipeline
and lifecycle is key to OSi. Moreover, the United Nations Global Geospatial
Information Management (UN-GGIM) framework highlights the importance of
standards conformance of data for quality. Thus there is a need for monitoring
and reporting on the standards conformance of OSi GLD. For example, to pro-
vide continuous upward reporting to the Irish government, European Commis-
sion and UN. Inspired by the DevOps methodology, LinkedDataOps will enable
sophisticated data quality monitoring within the organization.
    LinkedDataOps implements an e2e quality assessment framework based on
the Luzzu framework [1]. The project defines roles and responsibilities to ensure
liability for data quality with policies and procedures. It supports the process by
means of the proposed standards while maintaining the performance for good
decision-making. Continuous validation of quality and standards conformance
will be performed by data experts and engineers in the OSi data production
pipeline using an e2e standards reporting tool developed in this project (Fig. 3).
    Contributions of this project are as follows: i) Defining new standards com-
pliant quality and FAIR metrics according to existing data governance standards
ii) Implementing a novel tool based on the existing state of the art approaches
on data quality which will be integrated to OSi Linked Data cycle to improve
the quality of data iii) Publishing data and quality metadata reusing standard
vocabularies iv) Implementing an open-source dashboard for e2e data quality
management based on a unified quality graph v) Deploying a system based on
the case study in OSi.
2
    linkeddataops.adaptcentre.ie
    LinkedDataOps:Linked Data Operations based on Quality Process Cycle           3

2   LinkedDataOps Approach
The overall scope of this work is to improve the quality and service outcomes of
an organization while conforming to the standards and support good decision-
making. The following approach is employed in order to achieve this goal: i)
Quality assessment is performed for the transformation phase from relational
data to Linked Data automatically. Quality constraints are defined for R2RML
mappings for high quality transformation of data. The tool is integrated with the
Luzzu framework (Fig.2, Step 1). ii) Implementation of the geospatial data qual-
ity metrics and FAIR assessment metrics is performed. Aligned with the OSi’s
standard compliance objectives, relevant metrics are defined for the geospatial
data at hand and then they are integrated with Luzzu framework to measure the
quality, standards conformance and FAIRness of the dataset. Existing quality
metadata definition of the Luzzu are extended by those metrics in both dataset
and triple levels via standard vocabularies (Fig.2, Step 2).




           Fig. 2. LinkedDataOps Workflow (arrows indicate data input)

    iii) Quality problems are detected while consuming the data, and errors
should be fixed to publish the OSi dataset with high characteristics. For this
reason, feedback on the data is gathered from the experts using an automatic
standards compliance reporting portal. The given feedback is used to improve the
data. Moreover, logs are analyzed for incompatibilities between software versions
and data versions to verify the efficiency of the tool. Validation of the tool is
realized by the integrity checking of the input and output of the tool (Fig.2, Step
3). iv) Continuous monitoring of the data for inconsistencies is performed, thus,
automation of the steps is realized for data profiling. Different quality assessment
results are saved as a W3C data cube with different versions of the assessment
and quality metadata along with their assessment date and time (Fig.2, Step 4).
The detected errors are fixed via using an extension for the Luzzu tool.


3   Results and Conclusions
The purpose of a dashboard is to provide meaningful insights to the user by
depicting significant correlations for the given data. A proof of concept Linked-
DataOps user dashboard was implemented (Fig. 3) to visualize and analyze
quality metric observations. The interactive dashboard allows users to visualize
the i) data quality along the e2e pipeline ii) data quality metrics and their scores
4       B. Yaman and R. Brennan




                    Fig. 3. OSi End-to-End Dashboard Reporting


for specific datasets (associated with stages or systems within the pipeline) iii)
historical quality changes over time (illustrated in Fig. 3) iv) pass/fail quality
status of a specific dataset for a given quality threshold, for example w.r.t the
different standardization approaches.
    Conclusions This paper presents the LinkedDataOps end-to-end geospa-
tial data standards compliance framework for quality-oriented continuous data
development and improvement. Although it is still a proof of concept, however,
there is already an open source e2e dashboard available. The research is being
carried out with OSi where the framework is planned to be deployed.
    Acknowledgement This research received funding from European Union’s
Horizon 2020 research and innovation programme under Marie Sklodowska-Curie
grant agreement No. 801522, by Science Foundation Ireland and co-funded by the
European Regional Development Fund through the ADAPT Centre for Digital
Content Technology [grant number 13/RC/2106] and Ordnance Survey Ireland.

References
1. J. Debattista, S. Auer, and C. Lange. Luzzu—a methodology and framework for
   linked data quality assessment. Journal of Data and Information Quality (JDIQ),
   8(1):1–32, 2016.
2. C. Debruyne, A. Meehan, É. Clinton, L. McNerney, A. Nautiyal, P. Lavin, and
   D. O’Sullivan. Ireland’s authoritative geospatial linked data. In International Se-
   mantic Web Conference, pages 66–74, 2017.
3. R. Karam and M. Melchiori. Improving geo-spatial linked data with the wisdom of
   the crowds. In Proceedings of the joint EDBT/ICDT 2013 workshops, pages 68–74.
   ACM, 2013.
4. J. Lehmann, S. Athanasiou, A. Both, A. Garcı́a-Rojas, G. Giannopoulos, D. Hladky,
   J. J. Le Grange, A.-C. N. Ngomo, M. A. Sherif, C. Stadler, et al. Managing geospatial
   linked data in the geoknow project., 2015.
5. M.-A. Mostafavi, G. Edwards, and R. Jeansoulin. An ontology-based method for
   quality assessment of spatial data bases. 2004.
               LinkedDataOps:Linked Data Operations based on Quality
                                  Process Cycle
                                                           Beyza Yaman and Rob Brennan
                                                      ADAPT Centre, Dublin City University, Ireland
                                                       beyza.yaman,rob.brennan@adaptcentre.ie




Problem Statement                                                              LinkedDataOps Approach
• Data available from poorly controlled sources                                 • LinkedDataOps [2, 3] : Inspired by the DevOps methodology
• Dynamic sources causing inconsistencies                                       • Quality assessment for the transformation phase from rela-
• Geospatial data suffers from high demands on quality                            tional data to Linked Data
• e.g. transportation, navigation, GIS guidance, and self-piloted               • Implementation of the geospatial data quality metrics and
  vehicles                                                                        FAIR assessment metrics
                                                                                • Detecting the root causes of problems by analysing the errors
                                                                                  occurring in the pipeline
Proposal                                                                        • Monitoring the data for inconsistencies continuously
• An end-to-end (e2e) quality-oriented continuous development
  framework
                                                                               LinkedDataOps Workflow
• Improvement of data based on standards compliance
• Implemented methods build upon the Luzzu framework
• An open-source dashboard to
→ visualize quality metric observations
→ analyze quality scores in a data production pipeline


Contributions
• Implementing a novel tool to be integrated to OSi Linked Data
  cycle                                                                        Data Governance Dashboard
• Defining new standards compliant quality and FAIR metrics
                                                                               The interactive dashboard allows users to visualize the
• Publishing data and quality metadata reusing standard vo-
                                                                                • data quality along the e2e pipeline
  cabularies
                                                                                • data quality metrics and their scores for specific datasets
• Implementing a data governance dashboard for e2e data qual-
  ity management                                                                • historical quality changes over time
• Deploying a system based on the case study in OSi                             • pass/fail quality status of a dataset for a given quality thresh-
                                                                                  old


OSi Data Publishing Pipeline                                                   OSi End to End Dashboard Reporting




      References
       [1] Debattista, Jeremy et al. , Luzzu—a methodology and framework for linked data quality assessment, Journal of Data and Information Quality
           8.1 (2016): 1-32
       [2] This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the ELITE-S COFUND
           Marie Skłodowska-Curie grant agreement No. 801522.
       [3] Project Website: linkeddataops.adaptcentre.ie