<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring GDPR Compliance Over Provenance Graphs Using SHACL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harshvardhan J. Pandit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Declan O'Sullivan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dave Lewis</string-name>
          <email>dave.lewis@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, Trinity College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic web technologies provide an open and adaptable framework for compliance regarding the General Data Protection Regulation (GDPR). Our previous work in this regard demonstrates the use of SPARQL for querying provenance of consent and personal data lifecycles for compliance. We extend this work through our model for evaluation of GDPR compliance using SHACL to validate the correctness and completeness of information. The model describes the creation of a compliance graph consisting of information required to document and demonstrate compliance linked to specific articles and obligations within the GDPR using the GDPRtEXT vocabulary.</p>
      </abstract>
      <kwd-group>
        <kwd>GDPR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Provenance is an important category of information for
compliance with the General Data Protection Regulation (GDPR) due to
obligations surrounding how consent and personal data are collected, stored, used,
and shared. Semantic web technologies have been shown to provide an open
and extensible framework for the representation and querying of information
related to such obligations [
        <xref ref-type="bibr" rid="ref1 ref2 ref4 ref5">1,2,4,5</xref>
        ]. Our previous work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in this regard
demonstrated the modelling and querying of provenance information related to GDPR
compliance obligations using semantic web technologies. It provided a
proof-of-concept demonstration<sup>1</sup> for the querying of compliance-related information using
SPARQL queries based on the GDPR readiness checklist published by Ireland's
Data Protection Commissioner's office<sup>2</sup>.
      </p>
      <p>
        We extend this work through our model for evaluation of GDPR compliance
using SHACL<sup>3</sup> to validate the correctness and completeness of information. The
model describes the creation of a compliance graph consisting of information
required to document and demonstrate compliance linked to specific articles
and obligations within the GDPR using the GDPRtEXT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] vocabulary.
      </p>
      <p>1 https://w3id.org/GDPRep/checklist-demo</p>
      <p>2 http://gdprandyou.ie/</p>
      <p>3 https://www.w3.org/TR/shacl/</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The SPECIAL consent, transparency and compliance framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] defines RDF
vocabularies representing data subjects' consent as usage policies and data
processing and sharing events as provenance logs. It performs GDPR compliance
checking using OWL reasoning and uses a modular architecture that
demonstrates the feasibility of semantic-web-based approaches for GDPR compliance.
Agarwal et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] extend ODRL to represent GDPR obligations in their
compliance assessment tool, where the obligations are linked to their relevant articles in
the GDPR. The work described in this paper takes a similar approach, with the main
differences being its focus on provenance and its use of SHACL for validation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Validation Model</title>
      <p>Our model for GDPR compliance, as depicted in Fig. 1, consists of three parts:
querying (covered in previous work), validating retrieved information (described
in this paper), and generating documentation (planned future work). The
provenance information is represented using the GDPRov vocabulary and is linked to
concepts within the GDPR using the GDPRtEXT vocabulary. The model is further
explained with an example use-case in an online article<sup>4</sup>.</p>
      <p>The feasibility of validating individual data instances at large scale is
questionable due to the complexity of the analysis. We currently focus only on the model of
the system as an abstract representation of the processes and artefacts and their
interactions. Compliance is therefore an assessment of the organisation's data
practices rather than an investigation of a particular data subject's activities.</p>
      <p>4 http://openscience.adaptcentre.ie/projects/CDMM/compliance/model.html</p>
      <p>The requirements for creating validation shapes are gathered from an
analysis of the GDPR legal document, various informative articles published by Data
Protection Authorities and commercial organisations, and auditing organisations
such as European Privacy Seal (EuroPriSe). These are then used to create a set
of questions (similar to those within the GDPR readiness checklist) which
retrieve information associated with compliance. The questions are then expressed
as SPARQL queries, which can be adapted into shapes for compliance
validation using the SHACL-SPARQL extension.</p>
      <p>Validation is performed using SHACL (Shapes Constraint Language), a
language for validating RDF graphs against a set of conditions termed shapes,
which are themselves expressed as RDF graphs. The validation occurs in two distinct
stages. The first stage (completeness) ensures the presence of required information and is
carried out before any compliance queries are executed. Ideally, the completeness
of the graph is maintained in a continuous fashion. The second stage
(compliance) checks for conformance to specific obligations to evaluate compliance. The
process of validation is captured using PROV-O to record its execution.</p>
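      <p>The ordering of the two stages (completeness before compliance) can be illustrated with a minimal plain-Python sketch. It is not SHACL and not the authors' implementation: the graph is a small in-memory set of triples, and every identifier in the ex: namespace, the list of required properties, and the opt-in rule are hypothetical stand-ins for what SHACL shapes would express declaratively.</p>
      <preformat>
```python
# Plain-Python sketch of the two validation stages described above.
# All property names (ex:...) and the sample data are hypothetical.

# The "graph" is a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:ConsentStep", "rdf:type", "ex:ConsentAcquisitionStep"),
    ("ex:ConsentStep", "ex:hasPurpose", "ex:Marketing"),
    ("ex:ConsentStep", "ex:consentIsOptIn", "true"),
}

# Stage 1 (completeness): properties that must be present on a consent step.
REQUIRED = ["ex:hasPurpose", "ex:consentIsOptIn"]


def objects(graph, subject, predicate):
    """Return all objects for a subject/predicate pair."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def check_completeness(graph, subject):
    """Return the required properties that are missing for the subject."""
    return [p for p in REQUIRED if not objects(graph, subject, p)]


def check_compliance(graph, subject):
    """Stage 2: test a hypothetical obligation that consent must be opt-in."""
    return objects(graph, subject, "ex:consentIsOptIn") == ["true"]


def validate(graph, subject):
    """Run completeness first; only evaluate compliance on complete data."""
    missing = check_completeness(graph, subject)
    if missing:
        return {"complete": False, "missing": missing, "compliant": None}
    compliant = check_compliance(graph, subject)
    return {"complete": True, "missing": [], "compliant": compliant}


result = validate(GRAPH, "ex:ConsentStep")
print(result)
```
      </preformat>
      <p>In this sketch an incomplete graph yields no compliance verdict at all, which mirrors the requirement that completeness is established before any compliance queries are executed.</p>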
      <p>Compliance validation takes place on a separate graph, termed the compliance
graph, which is constructed from queries that retrieve and structure the required
information. The information within the compliance graph is linked to relevant
obligations within the text of the GDPR using the GDPRtEXT ontology.
Compliance validation is then performed using SHACL and the assessment is added
to the graph. The purpose of this graph is to represent information relevant to
compliance, to keep it separate from the data graph which may change with time,
to capture a snapshot of the state of compliance, and to assist in the generation
of compliance documentation.</p>
      <p>The compliance validation process occurs in two stages. In the first, shapes
are validated and the results are linked to specific obligations and added to the
compliance graph. The second stage of validation tests the outcomes of the first stage
against their linked obligations, which allows shapes to be reused to test different
obligations and allows validation results to be linked directly to GDPR articles based on
fulfilled obligations. At the end of the compliance validation process, the
compliance graph contains the information required to answer the compliance queries,
their results linked to specific obligations within the GDPR, and an indication of
the compliance status for GDPR articles.</p>
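      <p>The roll-up from shape results to obligations and then to articles might look like the following plain-Python sketch. The shape names, obligation names, and both mappings are hypothetical and are not GDPRtEXT identifiers. The rule that an article is compliant only when all of its linked obligations are fulfilled is an assumption made for illustration.</p>
      <preformat>
```python
# Sketch of the two-stage compliance validation: shape results are linked to
# obligations, then obligations are rolled up to GDPR articles.
# Shape names, obligation names, and the mappings are hypothetical.

# Stage 1 output: whether each shape conformed.
SHAPE_RESULTS = {
    "ConsentIsOptInShape": True,
    "ConsentHasPurposeShape": True,
    "ConsentWithdrawalShape": False,
}

# A shape can be reused to test several obligations.
OBLIGATION_SHAPES = {
    "ObligationSpecificConsent": ["ConsentHasPurposeShape"],
    "ObligationUnambiguousConsent": ["ConsentIsOptInShape", "ConsentHasPurposeShape"],
    "ObligationRightToWithdraw": ["ConsentWithdrawalShape"],
}

# Obligations grouped under the article they relate to (illustrative labels).
ARTICLE_OBLIGATIONS = {
    "Article4-11": ["ObligationSpecificConsent", "ObligationUnambiguousConsent"],
    "Article7-3": ["ObligationRightToWithdraw"],
}


def obligation_status(shape_results):
    """An obligation is fulfilled when every linked shape conformed."""
    return {
        obligation: all(shape_results[s] for s in shapes)
        for obligation, shapes in OBLIGATION_SHAPES.items()
    }


def article_status(obligations):
    """An article is marked compliant when all its obligations are fulfilled."""
    return {
        article: all(obligations[o] for o in linked)
        for article, linked in ARTICLE_OBLIGATIONS.items()
    }


obligations = obligation_status(SHAPE_RESULTS)
articles = article_status(obligations)
print(articles)
```
      </preformat>
      <p>Keeping the shape-to-obligation mapping separate from the shape results is what lets a single shape, here the hypothetical ConsentHasPurposeShape, contribute to more than one obligation.</p>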
      <p>Documentation and demonstration of compliance can then be performed as
SPARQL queries on the compliance graph, and persisted as a compliance
report using the EARL vocabulary. Further investigation of how and why the
compliance status was achieved is possible by exploring the validation reports
and queries present in the compliance graph. This can be exploited to create a
tool for top-down exploration of compliance that can be used to document and
demonstrate compliance in an interactive fashion.</p>
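      <p>As a rough illustration of persisting outcomes with EARL, the sketch below builds one assertion per article as plain Python tuples with prefixed names. The earl: terms used (earl:Assertion, earl:assertedBy, earl:subject, earl:test, earl:result, earl:TestResult, earl:outcome, earl:passed, earl:failed) come from the EARL schema. Every ex: identifier, the input data, and the choice to use the article as the test criterion are assumptions for this example.</p>
      <preformat>
```python
# Sketch: turning article-level outcomes into EARL-style assertions.
# Triples are plain tuples using prefixed names; the ex: identifiers and the
# input data are hypothetical, the earl: terms come from the EARL schema.


def earl_assertions(article_outcomes, asserter="ex:ComplianceValidator"):
    """Return a list of triples describing one earl:Assertion per article."""
    triples = []
    for index, (article, passed) in enumerate(sorted(article_outcomes.items())):
        assertion = f"ex:assertion{index}"
        result = f"ex:result{index}"
        outcome = "earl:passed" if passed else "earl:failed"
        triples += [
            (assertion, "rdf:type", "earl:Assertion"),
            (assertion, "earl:assertedBy", asserter),
            (assertion, "earl:subject", "ex:ComplianceGraph"),
            (assertion, "earl:test", article),
            (assertion, "earl:result", result),
            (result, "rdf:type", "earl:TestResult"),
            (result, "earl:outcome", outcome),
        ]
    return triples


report = earl_assertions({"ex:Article7-3": False, "ex:Article4-11": True})
for triple in report:
    print(" ".join(triple), ".")
```
      </preformat>
      <p>In practice these triples would be added to the compliance graph with an RDF library so that they can be queried with SPARQL alongside the validation reports.</p>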
    </sec>
    <sec id="sec-4">
      <title>Conclusion &amp; Future Work</title>
      <p>
        This paper extends our previous work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] on querying compliance-related
information into a model for evaluating GDPR compliance based on provenance
of consent and personal data lifecycles using the validation mechanism provided
by SHACL. The approach creates a compliance graph consisting of
information required for answering compliance-related queries as well as results
for conformance to various GDPR obligations. It uses the GDPRtEXT
vocabulary to link validation results to specific concepts and articles within the GDPR,
which allows the creation of interactive compliance documentation. The model
is further described through an example use-case in an online article<sup>5</sup>.
      </p>
      <p>In terms of future work, we plan to create a proof-of-concept
demonstration of using SHACL to test compliance obligations with a focus on interactive
documentation as described in this paper. There is also the possibility of using
graph reduction and summarising techniques to simplify the validation process
by representing legal obligations as patterns and testing them for compliance.
Finally, we plan to explore compliance coverage as a measure to compare our
work with similar work.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work is supported by the ADAPT Centre for Digital Content Technology
which is funded under the SFI Research Centres Programme (Grant 13/RC/2106)
and is co-funded under the European Regional Development Fund.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyskal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antunovic</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Legislative Compliance Assessment: Framework, Model and GDPR Instantiation</article-title>
          .
          <source>In: Annual Privacy Forum (APF 2018)</source>
          (in-press) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dullaert</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milosevic</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wenning</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drozd</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raschke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A Scalable Consent, Transparency and Compliance Architecture</article-title>
          .
          <source>In: Proceedings of the Posters and Demos Track of the Extended Semantic Web Conference (ESWC 2018)</source>
          (in-press) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pandit</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fatema</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Sullivan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>GDPRtEXT - GDPR as a Linked Data Resource</article-title>
          .
          <source>In: The Semantic Web. Lecture Notes in Computer Science</source>
          , Springer, Cham (Jun
          <year>2018</year>
          ). https://doi.org/10.1007/978-3-319-93417-4_31
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pandit</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies</article-title>
          .
          <source>In: Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn 2017)</source>
          (
          <year>2017</year>
          ), http://ceur-ws.org/Vol-1951/#paper-06
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pandit</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Sullivan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Queryable Provenance Metadata For GDPR Compliance</article-title>
          .
          <source>In: SEMANTiCS 2018 - 14th International Conference on Semantic Systems</source>
          (in-press). Vienna, Austria (
          <year>2018</year>
          ), https://s3-eu-west-1.amazonaws.com/harshp-media/research/publications/2018_conference_queryable_provenance_metadata_for_GDPR_compliance.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>