<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alex Randles</string-name>
          <email>alex.randles@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Declan O'Sullivan</string-name>
          <email>declan.osullivan@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Keeney</string-name>
          <email>john.keeney@est.tech</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liam Fallon</string-name>
          <email>liam.fallon@est.tech</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre for Digital Content, Trinity College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ericsson Software Technology</institution>
          , Athlone, Co. Westmeath,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The linked data generation process is a complex process which involves multiple stakeholders in the definition of mapping artefacts. The creation of these mapping artefacts is error prone, and their quality should be assessed prior to the generation of the linked data: producing high quality mappings results in high quality linked data and provides higher confidence to linked data consumers. Furthermore, the source data of these mappings should be regularly monitored to detect data changes which could impact the quality of the resulting linked data dataset. A process designed to offer fresh and high-quality data will benefit both data consumers and producers. In this paper we describe a process designed to improve quality within the linked data generation process, and we describe its application to a real-world cloud monitoring use case, which demonstrates the feasibility and effectiveness of the process.</p>
      </abstract>
      <kwd-group>
        <kwd>Cloud native monitoring</kwd>
        <kwd>Linked data</kwd>
        <kwd>Data quality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Cloud monitoring data typically contains time series information, with metric values stored
alongside a corresponding timestamp. However, commonly used cloud monitoring platforms such as
Prometheus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] do not yet make this data easily available in a machine-readable form, which limits
the usability and utility of such data for analysis. Converting the monitoring data into an analysis-ready,
machine-readable format (such as the W3C linked data representation3) enables additional
knowledge discovery.
      </p>
      <p>
        Transforming the time series data collected from a system such as Prometheus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] into linked data
format requires the definition of mapping artefacts, which define the rules for the transformation.
Creating these mapping artefacts can be a complex, time-consuming task which is frequently error
prone [2]. It is argued that dealing with quality issues in the mapping artefacts themselves
ensures that data quality issues do not propagate into the resulting dataset [3]. The MQV framework
[4,5], which is designed to assess and refine the quality of these mappings, can be used
to produce high quality mappings. Details of the core assessment and refinement aspects of the MQV
framework are presented in a paper in the research and innovation track of the SEMANTiCS 2022
conference. The framework has recently been extended to include a change detection component. The
component is designed to ensure the freshness of linked data generated through the mapping artefacts
by detecting and analyzing source data changes. The framework enables the impact of these detected
changes to be assessed so that the necessary action can be taken to ensure the freshness of the associated
linked data for consumers.
      </p>
      <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>3 W3C linked data representation at https://www.w3.org/standards/semanticweb/data</p>
      <p>In this paper we discuss the evolution of the MQV framework and its application
to a real-world cloud monitoring use case. The paper is structured as follows: Section 2 describes the
background &amp; motivation of this work; Section 3 discusses the application of the framework to the
cloud monitoring use case; Section 4 describes related work; and Section 5 concludes the paper and
describes future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background &amp; Motivation</title>
      <p>
        Prometheus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a widely used open-source system designed for cloud native monitoring. Prometheus
collects and stores its metrics as time series data. The main component of the Prometheus ecosystem is
the Prometheus server, which scrapes time series data from target servers. The Prometheus API allows
queries to be executed which return metric information. However, the API cannot return data in a graph
format, which limits the analysis, linkability and expressiveness of the data.
      </p>
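      <p>As an illustration of the data involved, the sketch below flattens a Prometheus instant-query JSON response into rows suitable for relational storage. The response literal is fabricated, but it follows the documented shape of a vector result from the /api/v1/query endpoint:</p>

```python
import json

# Illustrative instant-query response following the shape returned by
# GET /api/v1/query?query=up (the values themselves are made up).
RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "instance": "localhost:9100", "job": "node"},
       "value": [1656000000.0, "1"]}
    ]
  }
}
""")

def flatten(response):
    """Flatten a Prometheus vector response into
    (metric, instance, timestamp, value) rows for relational storage."""
    rows = []
    for item in response["data"]["result"]:
        labels = item["metric"]
        ts, val = item["value"]
        rows.append((labels.get("__name__"), labels.get("instance"),
                     float(ts), float(val)))
    return rows
```

      <p>Note that Prometheus returns sample values as strings, so the sketch converts them to floats before storage.</p>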
      <p>In our research we propose to transform the Prometheus time series data into a format more
amenable to analysis and interlinking, namely linked data. The benefits of this transformation include easier data
processing and interlinking with semantically similar data. Linking the data with other monitoring data
could be used to drive an intent-based system, which can define and accomplish required goals
automatically. Within the state of the art, limited work could be found which tackles the challenge of
transforming Prometheus time series data into a linkable graph format. It is hoped the approach will
provide greater insight and flexibility when performing cloud native monitoring.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Use case</title>
      <p>
        A framework named the Prometheus RDF Generator has been designed to transform statistical data
from Prometheus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] into linked data format. Figure 1 shows a screenshot of the current implementation
of the framework.
The process starts with the users entering the IP address of the Prometheus server and selecting a
metric from a predefined list, which is shown on the left side of the figure. Thereafter, an API request
is made to the server which will return a response, which is shown on the right side of the figure. The
response is stored in a relational database which allows the data to be converted into linked data format
using R2RML [6] mappings. The mapping uses an ontology currently under development which has
been designed specifically to model the time series metric data. The result is expressive,
machine-readable information which can be easily queried and interlinked with other related information.
      </p>
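      <p>The actual R2RML mappings used are linked later in this section. As a rough, hand-rolled illustration of what such a mapping produces when applied to the relational store, consider the sketch below; the ex: vocabulary terms and the obs/{id} IRI template are placeholders, not the ontology under development:</p>

```python
import sqlite3

# Toy relational store standing in for the generator's database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (id INTEGER PRIMARY KEY, name TEXT, ts REAL, value REAL)")
db.execute("INSERT INTO metrics VALUES (1, 'up', 1656000000.0, 1.0)")

EX = "http://example.org/ns#"  # placeholder namespace, not the authors' ontology

def uplift(conn):
    """Emit (subject, predicate, object) triples for each metrics row, in the
    spirit of an R2RML triples map with a subject template .../obs/{id}."""
    triples = []
    for row_id, name, ts, value in conn.execute(
            "SELECT id, name, ts, value FROM metrics"):
        subj = f"http://example.org/obs/{row_id}"
        triples += [
            (subj, "rdf:type", EX + "Observation"),
            (subj, EX + "metricName", name),
            (subj, EX + "timestamp", ts),
            (subj, EX + "value", value),
        ]
    return triples
```

      <p>An R2RML processor derives exactly this kind of row-to-triples expansion from the declarative mapping, rather than from hand-written code as above.</p>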
      <p>The MQV framework [4,5] is designed to assess and refine the quality of mappings and its main
functionality is described in a paper in the research and innovation track of the SEMANTiCS 2022
conference. The paper describes the framework design and a user evaluation which was conducted with
a sample size of 58 participants. Improvements have been made to the framework based on the
evaluation such as improved interface aesthetics. Furthermore, in the time since the submission of the
SEMANTiCS paper, the framework has been extended to include a change detection component and
applied as part of an internship with Ericsson Software Technology.</p>
      <p>The component is designed to ensure the freshness of the linked data by detecting and analyzing the
changes which occur in the source data. The component regularly monitors the source data to detect
new changes and uplifts them into linked data, represented using the Change Detection Ontology
(CDO)4. The ontology has been specifically designed by us to represent source data changes. Moreover,
the component allows users to define a notification policy, which defines when users will be notified
of changes. The policy is defined by users entering thresholds into the user interface. These thresholds
define how many changes or types of changes should be detected before a notification is sent. The user
interface input is uplifted into linked data and represented using Rei policy ontology [7]. Mappings
associated with the source data can be uploaded to the component, which can be analyzed to detect the
impact of these changes on them and the resulting linked data. For example, if data has been added to
the source data of the mapping, the dataset should be regenerated to capture the new information and
ensure the freshness of the data. Figure 2 shows a screenshot of the newer version of the framework.</p>
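      <p>To illustrate the threshold logic of such a notification policy, a minimal sketch follows; the policy shape and field names are assumptions for illustration, not the Rei-based representation used by the framework:</p>

```python
from collections import Counter

def should_notify(detected_changes, policy):
    """Return True once the detected changes meet any threshold in the policy.

    detected_changes: list of change-type strings, e.g. ["insert", "delete"].
    policy: {"total": overall threshold, "per_type": {change type: threshold}}
            (an assumed, illustrative policy shape).
    """
    counts = Counter(detected_changes)
    # Overall threshold: notify once enough changes of any kind accumulate.
    if sum(counts.values()) >= policy.get("total", float("inf")):
        return True
    # Per-type thresholds: notify when any single change type crosses its limit.
    return any(counts[ctype] >= limit
               for ctype, limit in policy.get("per_type", {}).items())
```

      <p>For example, a policy with a per-type threshold of one deletion triggers a notification as soon as a single delete change is detected, regardless of the overall total.</p>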
      <p>The framework allows users to select two modes of operation. These modes include i) Mapping
Assessment and ii) Change Detection. These two modes have been used during the design of the
Prometheus RDF Generator system and their usage is outlined below.</p>
      <p>Mapping Assessment. The mapping assessment component was used to assess the quality of the
mappings5 which were used to uplift the monitoring information. The local ontology functionality
previously described in [4,5] was utilized, as the ontology used to represent the information
is still under development. The mappings were uploaded to the framework and a validation report6 was
generated, which contains quality assessment and refinement information. The generated report shows
that no quality issues were detected within the mappings; therefore, no quality refinement
information is contained in the report.</p>
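      <p>The assessment metrics themselves are described in [4,5]. Purely to give a flavour of this kind of structural check, the sketch below flags triples maps that lack a subject class or a predicate, over a simplified dictionary representation of a mapping; this is an illustrative stand-in, not the MQV framework's implementation:</p>

```python
def assess_mapping(triples_maps):
    """Report simple structural violations over a simplified mapping
    representation: each triples map is a dict whose 'subjectMap' may carry a
    'class' and whose 'predicateObjectMaps' should each name a 'predicate'."""
    violations = []
    for name, tmap in triples_maps.items():
        if "class" not in tmap.get("subjectMap", {}):
            violations.append((name, "subject map has no class"))
        for i, pom in enumerate(tmap.get("predicateObjectMaps", [])):
            if "predicate" not in pom:
                violations.append((name, f"predicate-object map {i} has no predicate"))
    return violations
```

      <p>A well-formed mapping yields an empty violation list, corresponding to the empty refinement section of the generated validation report.</p>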
      <p>
        Change Detection. The component is currently being used to continuously analyze the monitoring
data which is collected from the Prometheus server [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The data is stored within a relational
database by the Prometheus RDF Generator, which the change detection component can monitor in
order to detect new changes. Other source data formats can be exposed through a URL. Furthermore,
the mappings used to uplift the data into linked data format have been uploaded, which allows the
framework to assess the impact of changes on the mappings and the resulting linked data dataset. Moreover,
a notification policy has been defined to ensure we are notified of the changes when necessary. An
example of a change detected when applying the framework to the use case relates to the addition of a node exporter to
the Prometheus server. A node exporter is software designed to collect metric information from a target
server by exporting server and OS level information [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The addition of the node exporter to the
Prometheus environment was detected by the change detection component and represented in graph
format. Figure 3 shows the components involved in the detection.
The data from the Prometheus server is fed into the Prometheus RDF Generator, where it is
processed and stored in a relational database before being uplifted into linked data format. The change
detection component is connected to the database and regularly queries it to detect new changes. These
changes are transformed into linked data format. The extract of the graph shows where the system has
detected the new target server (:insertChange1) and the time it was detected
(:detectionTime1).
4 Change Detection Ontology specification at https://alex-randles.github.io/Change-Detection-Ontology/
5 R2RML mappings used to uplift Prometheus data at
https://github.com/alex-randles/Semantics-2022-DemoPaper/blob/main/prometheus_mapping.ttl
6 Validation report generated by framework at
https://github.com/alex-randles/Semantics-2022-DemoPaper/blob/main/validation_report.ttl
      </p>
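      <p>A minimal sketch of this snapshot-comparison step, assuming the monitored table snapshots are keyed by row id; the cdo: terms below are placeholders patterned on the graph extract, not necessarily the ontology's actual vocabulary:</p>

```python
def detect_inserts(previous, current):
    """Compare two snapshots of the monitored table (dicts keyed by row id)
    and describe each newly inserted row as triples. The :insertChangeN and
    :detectionTimeN naming mirrors the graph extract above; the cdo: terms
    are placeholders, not necessarily the ontology's actual vocabulary."""
    triples = []
    for n, row_id in enumerate(sorted(set(current) - set(previous)), start=1):
        change = f":insertChange{n}"
        triples += [
            (change, "rdf:type", "cdo:Insert"),
            (change, "cdo:detectionTime", f":detectionTime{n}"),
            (change, "cdo:affectedRow", str(row_id)),
        ]
    return triples
```

      <p>Running this comparison on each polling cycle yields exactly one change resource per newly detected row, such as the :insertChange1 node in the graph extract.</p>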
    </sec>
    <sec id="sec-4">
      <title>4. Related work</title>
      <p>The state of the art in mapping quality frameworks and dataset change frameworks has been reviewed.</p>
      <p>Mapping Quality Frameworks. Resglass [8] is a rule-driven approach which detects inconsistencies in RML [9]
mappings, the rules used to generate linked data datasets. The approach of [3]
is test driven, extending an existing RDF test-case-based architecture to assess
and refine the quality of RML mappings. The approach of [2] extends an existing quality assessment
framework to target R2RML [6] mappings, using metrics commonly used to assess the quality of the
resulting dataset. Furthermore, that framework is extensible, allowing additional metrics to be added.</p>
      <p>Dataset Change Frameworks. DSNotify [10] is an approach designed to address change detection
and link maintenance in linked data datasets. The approach includes a monitor component which
regularly analyses the dataset for resource changes. sparqlPuSH [11] is a framework which enables
proactive notification of updates within linked datasets. Users can subscribe to a subset of the dataset
using a SPARQL query and will be notified when content within the subset changes. DELTA-LD [12]
is designed to address broken links and synchronization of a linked dataset by identifying the delta
between two versions of a dataset. The approach detects changes by generating features from the
properties and objects within the dataset which can be used for comparison.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future work &amp; Conclusion</title>
      <p>Future work includes the completion of the development and deployment of the Prometheus RDF
Generator. Furthermore, the ontology used to model the time series data will be evaluated using
methods such as ontology competency questions [13], thereby ensuring the ontology conforms to
the design requirements. A second user evaluation of the MQV framework will be completed, which
will evaluate the additional change detection component functionality. The evaluation will involve users
interacting with the system using predefined tasks and standardized methods such as the Post-Study
System Usability Questionnaire [14]. Furthermore, the ontology used within the change detection
component to represent change information will be evaluated in a similar manner using the aforementioned
methods.</p>
      <p>Deploying the framework in a real-world use case is beneficial to demonstrate the usability and
effectiveness of the design. The framework ensures that the mappings used within the Prometheus RDF
Generator are of good quality, thus ensuring high quality data for consumers. Furthermore,
the change detection component allows changes within the source data to be detected and analyzed,
allowing the impact of changes on the mappings and the resulting dataset to be assessed and, therefore,
promoting fresh linked data. It is hoped the combination of these two components within the use case
will aid the data transformation process while demonstrating the feasibility of the design for others.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>This research was conducted with the financial support of the SFI AI Centre for Research Training
under Grant Agreement No. 18/CRT/6223 at the ADAPT SFI Research Centre at Trinity College
Dublin. The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland
through the SFI Research Centres Programme and is co-funded under the European Regional
Development Fund (ERDF) through Grant # 13/RC/2106. Application of the MQV framework in the
Prometheus use case has been undertaken as part of an internship in Ericsson Software Technology.
7. References</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>Prometheus [Internet]. Available from: https://prometheus.io/</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>Junior AC, Debattista J, O'Sullivan D. Assessing the Quality of R2RML Mappings. In: Joint Proceedings of the International Workshop On Semantics For Transport and on Approaches for Making Data Interoperable, co-located with the 15th Semantics Conference, Karlsruhe, Germany. CEUR-WS; 2019. (CEUR Workshop Proceedings; vol. 2447).</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>Dimou A, Kontokostas D, Freudenberg M, Verborgh R, Lehmann J, Mannens E, et al. Assessing and refining mappings to RDF to improve dataset quality. In: Lecture Notes in Computer Science. Springer Verlag; 2015. p. 133-49.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>Randles A, O'Sullivan D. Assessing quality of R2RML mappings for OSi's Linked Open Data portal. In: 4th International Workshop on Geospatial Linked Data at ESWC 2021. 2021.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>Randles A, O'Sullivan D. Evaluating Quality Improvement techniques within the Linked Data Generation Process. In: 18th International Conference on Semantic Systems. 2022.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>Das S, Sundara S, Cyganiak R. R2RML: RDB to RDF Mapping Language. W3C Recommendation [Internet]. 2012. Available from: http://www.w3.org/TR/r2rml/</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>Kagal L, et al. Rei: A policy language for the me-centric project. 2002.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>Heyvaert P, De Meester B, Dimou A, Verborgh R. Rule-driven inconsistency resolution for knowledge graph generation rules. Semantic Web. 2019;10(6).</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>Dimou A, Vander Sande M, Colpaert P, Verborgh R, Mannens E, Van de Walle R. RML: a generic language for integrated RDF mappings of heterogeneous data. In: LDOW. 2014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>Popitsch N, Haslhofer B. DSNotify - A solution for event detection and link maintenance in dynamic datasets. Journal of Web Semantics. 2011 Sep;9(3):266-83.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>Passant A, Mendes PN. sparqlPuSH: Proactive notification of data updates in RDF stores using PubSubHubbub [Internet]. Available from: http://www.ldodds.com/blog/2010/04/rdf-datasetnotifications/</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>Singh A, Brennan R, O'Sullivan D. DELTA-LD: A Change Detection Approach for Linked Datasets.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>Bezerra C, Freitas F, Santana F. Evaluating ontologies with Competency Questions. In: Proceedings - 2013 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IATW 2013. 2013.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>Lewis JR. Psychometric Evaluation of the PSSUQ Using Data from Five Years of Usability Studies. International Journal of Human-Computer Interaction. 2002 Sep;14(3-4):463-88.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>