<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Real-world Assessment of Policy-Protected OBDA (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Divya Baura</string-name>
          <email>divya.baura@umu.se</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Calvanese</string-name>
          <email>diego.calvanese@unibz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Umeå Universitet</institution>
          ,
          <addr-line>Umeå</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Within the Ontology-Based Data Access (OBDA) framework, users can query relational data sources using an ontology to which the source is linked via declarative mappings. In a world where data sharing is widespread, ensuring privacy while managing data poses a significant challenge. Controlled Query Evaluation (CQE) is a privacy-preserving query answering framework in the presence of ontologies, where policies representing confidential information are used to devise suitable censors that enforce data protection. The integration of CQE within OBDA was recently proposed through the Policy-Protected OBDA (PPOBDA) framework, which is based on embedding policies into mappings. This framework is essentially theoretical, and the effectiveness with which PPOBDA policies are able to capture real-world privacy requirements has not been assessed so far. In this work, we carry out such an evaluation, utilizing the well-known MIMIC-III hospital dataset, which has recently been mapped, by adopting the OBDA framework, to the Fast Healthcare Interoperability Resources (FHIR) ontology. We identify relevant privacy requirements by analyzing the legal regulations on data sharing expressed in HIPAA of US Federal Law and GDPR of the EU, show how they can be expressed via PPOBDA policies, and analyze the impact of these policies on the answers to a set of representative queries. Our analysis exposes both strengths and weaknesses of the PPOBDA framework in relation to these practically relevant privacy regulations. Furthermore, we perform a performance evaluation of the OBDA framework implemented over the MIMIC-III dataset via the FHIR ontology, assessing the overhead introduced by the PPOBDA policies and its implications for such a real-world use case.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontology-Based Data Access (OBDA) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] provides a powerful framework for querying relational data
sources using ontologies. It supports user-friendly access to data by allowing queries over a conceptual
vocabulary of ontologies while relying on mappings to translate these queries to the underlying data.
Our work focuses on OBDA systems that use lightweight description logics, particularly OWL 2 QL.
While OBDA offers efficient query answering, it also raises privacy concerns, as sensitive information
can be derived from the data retrieved through the mappings, combined with the inferences via the
ontology axioms.
      </p>
      <p>
        To address such concerns, Controlled Query Evaluation (CQE) has emerged as a privacy-preserving
mechanism within OBDA [
        <xref ref-type="bibr" rid="ref3">3, 4, 5</xref>
        ]. CQE enforces privacy through policies that specify which information
must be protected, using a censor to filter query answers accordingly. Building on this, the
Policy-Protected OBDA (PPOBDA) framework introduces policies that are first-order denial assertions encoded
in mappings, aiming for stronger integration of privacy and data access [6].
      </p>
      <p>In this work, we explore the application of PPOBDA in the healthcare domain, where privacy is
paramount due to legal and ethical considerations. We reference major regulatory frameworks such as
the Health Insurance Portability and Accountability Act (HIPAA) [7] and the General Data Protection
Regulation (GDPR) [8] to identify key privacy requirements [9] and then use these requirements to
test the relevance of policies expressed in the PPOBDA framework. Our case study uses the MIMIC-III
hospital dataset [10] structured via the OMOP Common Data Model [11] and mapped to the FHIR
ontology [12, 13, 14]. Due to the size and complexity of FHIR, we apply ontology modularization
techniques to manage the execution process.</p>
      <p>We present representative PPOBDA policies, analyze their effectiveness in expressing real-world
privacy constraints, and assess their limitations, particularly in addressing practices like data de-identification.
Our experimental evaluation investigates how privacy policies impact query results and performance.
These findings contribute to understanding how policy-driven privacy mechanisms can be practically
applied in OBDA systems.</p>
      <p>Code and resources are available at https://github.com/divyabaura/PPOBDA-policies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>In this section, we outline the steps undertaken to construct and validate a privacy-aware
Ontology-Based Data Access (OBDA) framework applied to a real-world healthcare dataset.</p>
      <p>Data Source and Standardization. We utilize the MIMIC-III clinical dataset [10], a large,
de-identified dataset containing health-related information from ICU patients. To harmonize the data
with existing standards, we employ the open-source mimic-omop ETL tool [15] to transform MIMIC-III
into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM),
specifically version 5.4. This transformation facilitates the use of standardized vocabularies from the OHDSI
initiative [16].</p>
      <p>Ontology Selection and Module Extraction. To align the data with a semantic model, we adopt
the FHIR Ontology [13], which provides an OWL-based formalization of FHIR resources. Due to its size
(over 1,450 classes), we extract a relevant module using the Syntactic Locality Module Extractor [17]
with the STAR method. The seed signature includes essential clinical concepts such as Patient, Address,
and CodeableConcept. This ensures that the extracted module retains semantic integrity while being
computationally tractable.</p>
      <p>Metadata Extraction and Integration. We leverage Ontop’s CLI [18] to extract metadata from the
transformed OMOP database. This metadata is represented as a JSON file and includes schema-level
information such as table names, column data types, and foreign key constraints. It forms the structural
bridge between the database and the ontology, enabling OBDA reasoning.</p>
      <p>Mapping Specification. We incorporate existing R2RML mappings from Xiao et al. [14], linking
OMOP CDM elements to FHIR RDF representations. These mappings span over 100 data elements
across 11 OMOP tables and 11 FHIR classes, and they allow SPARQL queries to be translated into SQL
over the relational schema while respecting the semantics of FHIR.</p>
      <p>System Setup and Execution. We have deployed the OBDA system using Ontop and connected
it to a PostgreSQL instance containing the OMOP-transformed MIMIC-III data. The system supports
semantic querying via SPARQL and enforces policy-aware access through the integration of the PPOBDA
framework, which will be detailed in subsequent sections.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Legal Analysis and Results</title>
      <p>We combine our analysis of how PPOBDA addresses key legal privacy requirements of HIPAA and
GDPR with experimental results on PPOBDA policy enforcement over the MIMIC-III dataset. To provide
an intuitive understanding of how PPOBDA policies affect the instances of a class C, assume that all
denials in the set P of policies containing an atom that unifies with C(x) are of the form
∀x, y⃗. (C(x) ∧ φ_i(x, y⃗) → ⊥), for i ∈ {1, …, n}. Then, the PPOBDA mapping reformulation algorithm
replaces the atom C(x) with C(x) ∧ ⋀_{1≤i≤n} ¬∃y⃗. φ_i(x, y⃗) [19]. This expression is rewritten w.r.t. the
TBox, unfolded w.r.t. the original mappings, and incorporated in the source part of the new mapping
assertions, which thereby express the PPOBDA policies.</p>
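      <p>As an illustration only, the filtering effect of this reformulation can be sketched in Python over explicitly materialized extensions. The function name protected_instances and the set-based encoding are assumptions made for this sketch; the actual algorithm [19] operates on mapping assertions, after rewriting w.r.t. the TBox and unfolding, rather than on materialized sets.</p>
      <preformat>
```python
from typing import Iterable, Set, Tuple


def protected_instances(
    class_members: Set[str],
    denial_bodies: Iterable[Set[Tuple[str, ...]]],
) -> Set[str]:
    """Return every member x of the class for which no denial body holds.

    Each element of denial_bodies is the extension of one formula
    phi_i(x, y...), given as tuples whose first component is x.
    """
    blocked = set()
    for phi in denial_bodies:
        # Existential quantification over y: only the x component matters.
        blocked.update(row[0] for row in phi)
    # C(x) AND, for every i, NOT EXISTS y. phi_i(x, y)
    return class_members - blocked
```
      </preformat>
      <p>For example, with members {a, b, c} and two denial bodies that hold for a and c respectively, only b remains visible as an instance of the class.</p>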
      <p>Protected Health Information (PHI). Under HIPAA, PHI comprises 18 identifiers (demographic,
geographic, and medical data) that require strict protection [20]. We have defined within PPOBDA
several policies to safeguard PHI by ensuring that identifiable attributes are concealed. We list here a
few meaningful examples, and refer to the GitHub repository for the full PPOBDA specification:</p>
      <p>δ_1 : ∀p. ∀g. ∀a. Patient.gender(p, g) ∧ Patient.address(p, a) → ⊥</p>
      <p>δ_2 : ∀m. ∀r. ∀p. MedicationStatement.subject(m, r) ∧ link(r, p) → ⊥</p>
      <p>δ_3 : ∀p. ∀g. Patient.id(p) ∧ Patient.generalPractitioner(p, g) → ⊥</p>
      <p>Policy δ_1 blocks any query combining gender and address; δ_2 prohibits linking medication statements
to patient identity; δ_3 hides the pairing of a patient ID and the practitioner assigned to that patient. A
Limited Data Set (LDS) consists of health information with certain identifiers removed, reducing the
chances of identifying an individual. Embedding δ_1 into mapping m_1 satisfies an LDS by only returning
patient gender when location_id IS NULL, ensuring that address is never exposed alongside gender.</p>
      <p>De-identified Data. HIPAA’s de-identification methods (Expert Determination, Safe Harbor) rely
on transformations (e.g., truncating ZIP codes) that reduce the risk of re-identification [21]. In contrast,
the only effect of the additional negated atoms introduced in mappings through PPOBDA policies is
to filter out entire tuples from the result (when they violate a policy). These atoms cannot
induce any transformation on the result, and in particular cannot apply any of the available de-identification
functions (e.g., obfuscation, truncation, anonymization, or generalization). Therefore, denials are not
suited to perform de-identification in the PPOBDA framework.</p>
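      <p>The row-filtering behaviour discussed above for the gender and address policy can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the three-column person table is a simplified stand-in for the OMOP table, and the two SQL strings are hypothetical source queries, not the mappings from the repository.</p>
      <preformat>
```python
import sqlite3


def build_demo_db():
    """Create a toy person table; only patient 1 has an address on file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, "
        "gender_source_value TEXT, "
        "location_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, "F", 10), (2, "M", None), (3, "F", None)],
    )
    return conn


# Hypothetical source part of the gender mapping before policy embedding.
ORIGINAL_GENDER_SOURCE = "SELECT person_id, gender_source_value FROM person"

# Hypothetical source part after embedding the gender/address denial:
# gender is exposed only for patients without a recorded address.
PROTECTED_GENDER_SOURCE = (
    "SELECT person_id, gender_source_value FROM person "
    "WHERE location_id IS NULL"
)


def gender_rows(conn, source_sql):
    """Evaluate a source query and return its rows ordered by patient."""
    return conn.execute(source_sql + " ORDER BY person_id").fetchall()
```
      </preformat>
      <p>In this toy instance, the protected source query drops the one patient with a recorded address and returns the other two unchanged. The stored values themselves are never altered, which is why a transformation such as ZIP-code truncation cannot be obtained in this way.</p>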
      <p>Right to Erasure (RTE). The Right to Erasure (RTE), defined in Article 17 of the GDPR [22], allows
individuals to request deletion of their personal data under certain conditions. This is essential for
protecting user privacy, especially in data access systems like OBDA. We consider two approaches to
supporting RTE in the OBDA setting:</p>
      <p>(1) RTE in PPOBDA. While OBDA systems typically lack control over the underlying data sources,
PPOBDA can simulate erasure by ensuring requested data is excluded from query results, even if not
physically deleted. Upon a user’s RTE request, appropriate denial policies can be added to censor access
to the relevant data. For instance, the following policies can hide sensitive patient data:</p>
      <p>δ_4 : ∀e. ∀l. ∀n. Encounter.location(e, l) ∧ Location.name(l, n) → ⊥</p>
      <p>δ_5 : ∀p. ∀c. ∀d. Procedure.code(p, c) ∧ Procedure.performedDateTime(p, d) → ⊥</p>
      <p>δ_6 : ∀p. ∀o. Patient.id(p) ∧ Observation.subject(o, p) → ⊥</p>
      <p>Policies δ_4–δ_6 respectively block results revealing encounter locations, procedure timestamps, or
observation subjects for erased patients. Though the records in the underlying data sources remain intact,
this method ensures that the data is functionally inaccessible.</p>
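      <p>One possible way to picture this simulated erasure is sketched below. It is an illustration under stated assumptions: the erasure_request table, the function names, and the SQL are hypothetical and are not taken from the PPOBDA implementation. The sketch only shows that censoring through a negated condition hides rows without deleting them.</p>
      <preformat>
```python
import sqlite3


def make_db():
    """Create a toy source with observations and an erasure registry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        "observation_id INTEGER PRIMARY KEY, person_id INTEGER, value TEXT)"
    )
    # Hypothetical registry of patients who invoked the right to erasure.
    conn.execute("CREATE TABLE erasure_request (person_id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?)",
        [(1, 100, "heart rate 72"), (2, 100, "temp 36.8"), (3, 200, "heart rate 88")],
    )
    return conn


def request_erasure(conn, person_id):
    """Record an erasure request; no source row is deleted."""
    conn.execute("INSERT OR IGNORE INTO erasure_request VALUES (?)", (person_id,))


# Source query with the censoring condition embedded as NOT EXISTS.
VISIBLE_OBSERVATIONS = """
    SELECT o.observation_id, o.person_id
    FROM observation AS o
    WHERE NOT EXISTS (
        SELECT 1 FROM erasure_request AS e WHERE e.person_id = o.person_id
    )
    ORDER BY o.observation_id
"""


def visible_observations(conn):
    """Return the observations still exposed to queries."""
    return conn.execute(VISIBLE_OBSERVATIONS).fetchall()


def stored_observation_count(conn):
    """Return how many observation rows physically remain in the source."""
    return conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```
      </preformat>
      <p>After request_erasure(conn, 100), the two observations of patient 100 no longer appear in visible_observations(conn), while stored_observation_count(conn) still reports all three rows.</p>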
      <p>(2) RTE via Ontology-Based Updates. For full RTE compliance, ontology-based updates can be used
to translate high-level deletion requests into source-level deletions [23]. In this approach, deletions
specified at the ontology level are compiled into the minimal necessary changes to the source database.
However, such updates may introduce side-effects. For example, deleting a patient’s marital status
might also remove their name and patient status if they are stored in the same row of a source table,
due to shared mappings. This highlights the complexity of achieving RTE via physical deletion.</p>
      <p>Right to Rectification (RTR). RTR, as defined in Article 16 of the GDPR, grants individuals the right
to request corrections to their personal data if it is inaccurate or incomplete [24]. This right ensures
that data subjects can maintain the accuracy and integrity of their information, preventing incorrect or
outdated records from being used. As with RTE, PPOBDA alone cannot modify source values or adjust query
answers to reflect corrections. Instead, ontology-based updates [23] can translate rectification requests
into source updates, but may produce side-effects (e.g., changing eligibility facts tied to corrected birth
dates). Hence, in general, RTR compliance is managed similarly to RTE compliance.</p>
      <p>Experimental Results. We evaluated PPOBDA on MIMIC-III data (via OMOP CDM and FHIR) using
the six policies presented above and 17 queries. For each query and policy, we measured the average
execution time (over three runs) and the result count. Our key observations are: (i) Each policy successfully
suppresses results for at least one query, demonstrating precise enforcement of privacy constraints.
(ii) Queries unaffected by a given policy return the same result count and exhibit comparable execution
times to the baseline. (iii) Embedding privacy policies introduces negligible overhead, indicating that
policy-induced filtering does not impede performance.</p>
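      <p>The measurement procedure described above (average execution time over three runs, plus the result count) can be sketched as a small harness. This is an illustration under assumptions: it times SQL queries on an SQLite connection, whereas the evaluation ran SPARQL queries through Ontop, and the function name measure is hypothetical.</p>
      <preformat>
```python
import sqlite3
import time
from statistics import mean


def measure(conn, sql, runs=3):
    """Run a query several times; return (average seconds, result count)."""
    timings = []
    count = 0
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
        # The count is the same on every run for a read-only query.
        count = len(rows)
    return mean(timings), count
```
      </preformat>
      <p>Calling measure once on a baseline query and once on its policy-protected counterpart gives the two quantities compared in the observations above: a change in result count indicates that the policy affects the query, and the difference in average time estimates the overhead.</p>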
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <p>We showed that PPOBDA can enforce many HIPAA/GDPR requirements, such as hiding PHI
combinations and blocking “forgotten” data from query results, while remaining efficient. At the same
time, denial assertions as policies cannot perform true de-identification (obfuscation/truncation) or
delete/rectify data at the source. To bridge these gaps, we are extending PPOBDA so that mappings
can apply simple transformations (e.g., value obfuscation) and so that ontology-based updates can be invoked to
propagate erasure/rectification requests down to the database. These enhancements aim to achieve full
regulatory compliance without sacrificing query performance.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research has been partially supported by the Wallenberg AI, Autonomous Systems and Software
Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the HEU project CyclOps (GA
n. 101135513), by the Province of Bolzano and FWF through project OnTeGra (DOI 10.55776/PIN8884924),
by the Province of Bolzano and EU through projects ERDF-FESR 1078 CRIMA, and ERDF-FESR 1047
AI-Lab, by MUR through the PRIN project 2022XERWK9 S-PIC4CHU, and by the EU and MUR through
the PNRR project PE0000013-FAIR.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Chat-GPT-4o for grammar and spelling
checks. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.
</p>
      <p>[4] B. C. Grau, E. Kharlamov, E. V. Kostylev, D. Zheleznyakov, Controlled query evaluation for Datalog and OWL 2 Profile ontologies, in: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI), AAAI Press, 2015, pp. 2883–2889.</p>
      <p>[5] D. Lembo, R. Rosati, D. F. Savo, Revisiting controlled query evaluation in description logics, in: Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), ijcai.org, 2019, pp. 1786–1792.</p>
      <p>[6] G. Cima, D. Lembo, L. Marconi, R. Rosati, D. F. Savo, Controlled query evaluation in ontology-based data access, in: Proc. of the 19th Int. Semantic Web Conf. (ISWC), volume 12506 of Lecture Notes in Computer Science, Springer, 2020, pp. 128–146. doi:10.1007/978-3-030-62419-4_8.</p>
      <p>[7] W. Moore, S. Frye, Review of HIPAA, Part 1: History, protected health information, and privacy and security rules, J. of Nuclear Medicine Technology 47 (2019) 269–272.</p>
      <p>[8] P. Voigt, A. von dem Bussche, The EU General Data Protection Regulation (GDPR) – A Practical Guide, Springer, 2017.</p>
      <p>[9] The HIPAA Privacy Rule, 2025. URL: https://www.hhs.gov/hipaa/for-professionals/privacy/.</p>
      <p>[10] A. E. Johnson, T. J. Pollard, L. Shen, L.-w. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, R. G. Mark, MIMIC-III, a freely accessible critical care database, Scientific Data 3 (2016) 1–9.</p>
      <p>[11] OMOP Common Data Model, 2025. URL: https://ohdsi.github.io/CommonDataModel.</p>
      <p>[12] HL7 FHIR Release 5, 2023. URL: http://www.hl7.org/fhir/structuredefinition.html.</p>
      <p>[13] FHIR Ontology, 2025. URL: http://build.fhir.org/fhir.ttl.</p>
      <p>[14] G. Xiao, E. R. Pfaff, E. Prud’hommeaux, D. Booth, D. K. Sharma, N. Huo, Y. Yu, N. Zong, K. J. Ruddy, C. G. Chute, G. Jiang, FHIR-Ontop-OMOP: Building clinical knowledge graphs in FHIR RDF with the OMOP Common Data Model, J. of Biomedical Informatics 134 (2022) 104201.</p>
      <p>[15] Mapping the MIMIC-III database to the OMOP schema, 2018. URL: https://github.com/MIT-LCP/mimic-omop.</p>
      <p>[16] I. Reinecke, M. Zoch, C. Reich, M. Sedlmayr, F. Bathelt, The usage of OHDSI OMOP – A scoping review, in: German Medical Data Sciences 2021: Digital Medicine: Recognize – Understand – Heal, IOS Press, 2021, pp. 95–103. doi:10.3233/SHTI210546.</p>
      <p>[17] Syntactic Locality Module Extractor, 2020. URL: https://owlcs.github.io/owlapi/apidocs_5/uk/ac/manchester/cs/owlapi/modularity/SyntacticLocalityModuleExtractor.html.</p>
      <p>[18] Ontop, 2025. URL: https://github.com/ontop/ontop/releases.</p>
      <p>[19] D. Baura, D. Calvanese, L. Marconi, Implementing controlled query evaluation in OBDA, in: Proc. of the Joint Ontology Workshops Episode 10: The Tukker Zomer of Ontology (JOWO), volume 3882 of CEUR Workshop Proceedings, CEUR-WS.org, 2024. URL: https://ceur-ws.org/Vol-3882/st4dm-1.pdf.</p>
      <p>[20] Other requirements relating to uses and disclosures of protected health information, 2013. URL: https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.514.</p>
      <p>[21] De-identification of Protected Health Information, 2025. URL: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/.</p>
      <p>[22] Art. 17 GDPR Right to erasure, 2025. URL: https://gdpr-info.eu/art-17-gdpr/.</p>
      <p>[23] R. E. Wandji, D. Calvanese, Ontology-based update in virtual knowledge graphs via schema mapping recovery, in: Proc. of the 8th Int. Joint Conf. on Rules and Reasoning (RuleML+RR), volume 15183 of Lecture Notes in Computer Science, Springer, 2024, pp. 59–74.</p>
      <p>[24] Art. 16 GDPR Right to rectification, 2025. URL: https://gdpr-info.eu/art-16-gdpr/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Linking data to ontologies</article-title>
          ,
          <source>J. on Data Semantics</source>
          <volume>10</volume>
          (
          <year>2008</year>
          )
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          ,
          <article-title>Ontology-based data access: A survey</article-title>
          ,
          <source>in: Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI)</source>
          ,
          <publisher-name>IJCAI Org.</publisher-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.24963/ijcai.2018/777</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bonatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sauro</surname>
          </string-name>
          ,
          <article-title>A confidentiality model for ontologies</article-title>
          ,
          <source>in: Proc. of the 12th Int. Semantic Web Conf. (ISWC)</source>
          , volume
          <volume>8218</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2013</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>32</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-642-41335-3_2</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>