<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Autorità Garante per la Protezione dei Dati Personali, Piazza Venezia</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multilevel Database Decomposition Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabrizio Baiardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cosimo Comella</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Sammartino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Pisa</institution>
          ,
          <addr-line>Largo Bruno Pontecorvo 3, 56127 Pisa, PI</addr-line>
        </aff>
      </contrib-group>
      <volume>11</volume>
      <issue>00187</issue>
      <abstract>
        <p>The Multilevel Database Decomposition Framework is a strategy to enhance system robustness and minimize the impact of data breaches. The framework prioritizes robustness against cyber threats over minimizing data redundancy: it decomposes a database into smaller ones to restrict user access according to the least privilege principle. For this purpose, each database the decomposition produces is uniquely associated with a set of users, and the decomposition ensures that each user can access all and only the data his/her operations need. This minimizes both the data a user can access and the impact of an impersonation attack. To prevent an intrusion from spreading across the databases it produces, the framework supports alternative allocation strategies that map the databases onto distinct virtual or physical entities according to the required robustness. This flexibility in allocation management reinforces defenses against evolving cyber threats and is the main advantage of the decomposition. As a counterpart of better robustness, some tables are replicated across the databases the decomposition returns, and their updates must be propagated to prevent inconsistencies among the copies of a table in distinct databases. We present a performance analysis to evaluate the overhead of each allocation. This offers insights into how the framework can satisfy distinct security requirements. We use these results to evaluate the effectiveness of the framework for healthcare applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Decomposition</kwd>
        <kwd>Database Allocation</kwd>
        <kwd>Impact Assessment</kwd>
        <kwd>GDPR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Multilevel Database Decomposition Framework (MDDF) is an innovative approach that
targets the protection of personal information, with a focus on the healthcare sector. It is designed to
implement relational databases with high robustness because it fully satisfies the least privilege
principle, to effectively mitigate contemporary threats and safeguard sensitive healthcare information
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        The key notion of MDDF is the decomposition of one relational database into a set of databases
defined according to the user operations. This minimizes user access rights [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] because each
user can access all and only the data his/her operations need. As a consequence, MDDF strongly
reduces the blast radius of a successful intrusion, i.e., the data that may be leaked. Consider, as an
example, an impersonation attack [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] where a threat agent impersonates a legitimate user: the agent can only access the
data in the database associated with that user. This strongly improves overall data security.
      </p>
      <p>To prevent the spread of intrusions across the decomposed databases, MDDF supports
alternative allocation strategies. While the simplest allocation maps all databases onto the same
machine, confinement and robustness are enhanced by distributing databases across distinct
containers or physical/virtual machines. The ability to choose the allocation according to the
required robustness is a fundamental improvement that MDDF offers with respect to managing
only the user access rights on the resulting databases.</p>
      <p>
        The databases that MDDF produces may share tables or subsets of tables. Hence, these
tables are replicated across databases, and updates must be propagated to maintain consistency
among the multiple copies of a table and to offer the same data view to each user. Updating
multiple copies introduces both complexity and overhead that are proportional to the robustness of the
separation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Another problem of replication is higher memory utilization. We
believe this issue can be neglected given the more robust separation among the
various databases and the ability to use one copy to restore consistent information after any
attack against availability. From this perspective, the amount of memory MDDF uses is
always lower than that of solutions based upon distributed ledgers.
      </p>
      <p>The paper unfolds as follows: Sect. 2 reviews related works, the least
privilege principle, and other concepts underlying the proposed framework. Sect. 3 describes
MDDF, emphasizing how the database decomposition optimally adheres to the least privilege principle.
Sect. 4 exemplifies the application of MDDF to a healthcare scenario. Lastly, Sect. 5 reports performance
figures on the overhead of updating multiple copies as a function of the allocation of databases.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>
        The principle of least privilege (PoLP) suggests minimizing the access rights of each system user
so the user never owns access rights he/she does not need. This has several implications for the
management of access rights and prevents the adoption of strategies based upon a hierarchical
level of privileges. Systems satisfying the PoLP minimize the impact of an intrusion where a
threat agent impersonates a legitimate user. State-of-the-art strategies to face the challenges posed
by evolving cyber threats and cloud-based solutions improve database security by merging
the PoLP with measures such as role-based access control (RBAC), encryption, tokenization,
dynamic data masking, and pseudonymization. As discussed in the following, most of these
solutions can be integrated with the MDDF to address the current risk scenarios [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        Role-Based Access Control (RBAC) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] complements PoLP by assigning access permissions
based on predefined roles. This streamlines user privilege management, ensuring individuals
have access only to resources essential for their specific organizational roles.
      </p>
      <p>
        Encryption and tokenization are crucial components in securing sensitive data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
encryption transforms data into an unreadable format, while tokenization replaces sensitive data with
non-sensitive placeholders.
      </p>
      <p>
        Dynamic Data Masking (DDM) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] obfuscates sensitive information dynamically to ensure
that only authorized individuals can access and view sensitive data.
      </p>
      <p>
        Adaptive and comprehensive security measures are essential to counter the constantly
evolving cyber threat landscape of databases that includes challenges such as ransomware, advanced
persistent threats (APTs), and insider threats [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        The widespread adoption of cloud-based database solutions introduces new security
concerns [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] because protecting data in a cloud requires guarding against unauthorized access and
data breaches from users of the same provider, as well as complying with data residency regulations.
      </p>
      <p>
        Pseudonymization is another vital facet of database security as it replaces personally
identifiable information (PII) with artificial identifiers or pseudonyms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This adds another layer
of privacy protection by making it challenging to directly associate sensitive data with specific
individuals. Pseudonymization contributes to compliance with data protection regulations as it
allows organizations to leverage data for legitimate purposes without compromising individual
privacy. The implementation of pseudonymization within database systems enhances data
security by both reducing the impact of unauthorized access to personal information and limiting
exposure in the event of a breach.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Multilevel Database Decomposition Framework</title>
      <p>This section details the decomposition step by step to outline the underlying principles. To
this end, it is essential to explain how to decompose a database and then move from the
decomposition of a database to that of its tables. Then, the section highlights the synchronization
problems generated by a decomposition where the same table is shared, i.e., it belongs to distinct
databases. In this case, some operations on the table in one of the databases resulting
from the decomposition must fire an automatic update of the other databases sharing the same table.</p>
      <sec id="sec-3-1">
        <title>3.1. General Approach</title>
        <p>
          MDDF aims to fully satisfy the principle of least privilege (PoLP) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] by decomposing a
relational database shared among a set of users so that each user can only access the data involved in
the operations he/she can invoke. To this purpose, starting from a system where each user
can access any information in the database, the framework produces a system with distinct
databases where each user is granted access to just one of the databases, i.e., to the smallest
amount of data needed to implement the operations of interest. This is achieved by distributing the
tables in the original database to several databases, each consisting of only a subset of those in
the original one. Each resulting database includes all the tables needed to implement the operations
of a class of users, and if the operations do not access an attribute of a table, the attribute is
dropped. Each user belongs to one class and is assigned access rights on the tables in one
of the databases the decomposition returns. This strategy satisfies the PoLP as it assigns all
and only the access rights the user needs. MDDF can be integrated with any of the
mechanisms discussed in the previous section, as each of them can be applied to
each of the databases. The last step of the framework maps the databases it builds to distinct
containers, virtual machines, or physical machines to confine a successful intrusion to one of the databases.
        <p>Definition and Purpose: Normal forms apply decomposition to simplify the
understanding and management of the database schema. This helps to reduce redundancy, improve
maintainability, and enhance overall database performance.</p>
        <p>The decomposition of a database, i.e., of a set of tables, returns subsets of these tables. Each
table in a subset is either an original table or a projection of an original table onto a subset of its
attributes (columns). The subsets do not form a partition because they can share some tables. Each resulting
subset is stored separately, and it can be accessed and managed independently.</p>
        <p>The decomposition is implemented through a sequence of steps that, given a database and a
set of users, can be described as follows:
• Traditional Normalization in 3NF
• Identification of User Groups
• Creation of Database Subsets
• Association of Groups and Subsets
• Configuring Access Mechanisms
• Choosing the Confinement Robustness</p>
        <p>3.2. Steps</p>
        <p>The following subsections detail each step of the framework.</p>
        <sec id="sec-3-1-1">
          <title>3.2.1. Traditional Normalization in 3NF</title>
          <p>Normalization is a process that structures data in a database to eliminate redundancy and
undesirable dependencies. The third normal form (3NF) is a widely used standard that reduces data
redundancy by removing transitive dependencies of non-key attributes on the keys.
We assume this normalization has already been applied to ensure that the database is well
structured and data is stored consistently.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2.2. Identification of User Groups</title>
          <p>This preliminary step identifies all users who interact with the database
and analyzes the operations each user needs. In this way, we create groups of users
tailored to their needs: two users belong to the same group if they invoke the same
operations on the same tables. This limits access to sensitive information to the
users who are entitled to operate on such information according to the need-to-know principle.</p>
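          <p>The grouping rule of this step can be sketched in Python as follows: each user is keyed by the set of (operation, table) pairs the user invokes, so users with identical sets fall into the same group. This is a minimal sketch under stated assumptions: the user names, operations, and tables are hypothetical, and we assume the operations of each user have already been collected, e.g., from application code or access logs.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical input: for each user, the (operation, table) pairs the user invokes.
# In practice this would be collected from application code or access logs.
USER_OPERATIONS = {
    "alice": {("read", "PatientTable"), ("write", "PrescriptionTable")},
    "bob": {("write", "PrescriptionTable"), ("read", "PatientTable")},
    "carol": {("write", "CostsTable")},
    "dave": {("read", "StatisticsTable")},
    "erin": {("write", "CostsTable")},
}


def group_users(user_operations):
    """Put two users in the same group iff they invoke the same operations on the same tables."""
    groups = defaultdict(set)
    for user, operations in user_operations.items():
        # A frozenset is hashable and order-insensitive, so equal operation sets share a key.
        groups[frozenset(operations)].add(user)
    return dict(groups)


if __name__ == "__main__":
    for operations, users in group_users(USER_OPERATIONS).items():
        print(sorted(users), "->", sorted(operations))
```
]]></preformat>
          <p>With this input, alice and bob form one group, carol and erin a second one, and dave a third; each group then receives its own database in the next step.</p>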
        </sec>
        <sec id="sec-3-1-3">
          <title>3.2.3. Creation of Database Subsets</title>
          <p>The creation of multiple databases, each a subset of the original one, is the central step of the
framework. It creates a distinct database for each user group, according to the operations the
users in the group execute and the data these operations access. We consider the worst case,
i.e., any table and any attribute a user may require should belong to the corresponding subset,
to ensure that users can access all and only the information they need. As a consequence, some
tables will be shared among users in distinct groups and appear in distinct databases. The
attributes of these shared tables implement data exchange among users in distinct groups. This
requires that an update in a table is propagated to all the databases where the table, or the
attribute, appears.</p>
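          <p>Under the worst-case assumption above, the creation of the subsets and the detection of shared tables can be sketched as follows. The group names and their requirements are hypothetical inputs; each group's database keeps only the tables and attributes listed for it, and a table is reported as shared when it appears in more than one database.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical worst-case requirements: group -> {table: attributes its operations may access}.
GROUP_NEEDS = {
    "MedicalStaff": {
        "PatientTable": {"PatientID", "MedicalHistory", "AssignedDoctorID", "AssignedNurseID"},
        "ExamTable": {"examID", "Results"},
    },
    "Patients": {
        "PatientTable": {"PatientID", "MedicalHistory"},
        "ExamTable": {"examID", "Results"},
    },
    "Admins": {"CostsTable": {"TransactionID", "Item", "Cost"}},
}


def create_subsets(group_needs):
    """Build one database per group holding only the tables and attributes it needs."""
    # Attributes a group never accesses are dropped from its copy of the table.
    return {
        f"{group}DB": {table: set(attributes) for table, attributes in tables.items()}
        for group, tables in group_needs.items()
    }


def shared_tables(subsets):
    """Return every table stored in more than one database, with those databases."""
    locations = defaultdict(set)
    for database, tables in subsets.items():
        for table in tables:
            locations[table].add(database)
    # Updates to these tables must be propagated to all their copies.
    return {table: dbs for table, dbs in locations.items() if len(dbs) > 1}


if __name__ == "__main__":
    subsets = create_subsets(GROUP_NEEDS)
    print(shared_tables(subsets))
```
]]></preformat>
          <p>Here PatientTable and ExamTable are shared between two databases, so their updates must be propagated, while CostsTable is private to one database. A finer-grained variant could track shared attributes instead of whole tables, since only the attributes present in both copies need to be propagated.</p>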
        </sec>
        <sec id="sec-3-1-4">
          <title>3.2.4. Association of Groups and Subsets</title>
          <p>By construction, the decomposition results in a one-to-one (biunivocal) association between
user groups and the databases it returns. This association is the input to define
user access rights. Membership in a group is dynamic because it depends upon the roles
assigned by the organization.</p>
        </sec>
        <sec id="sec-3-1-5">
          <title>3.2.5. Configuring the Access Mechanism</title>
          <p>This step offers a first level of security by granting each user the rights to access all
and only the information in the database associated with the user's group. This prevents users
from accessing information in the original database that is not necessary for their work. The
detailed implementation of this step depends upon both the underlying operating system and
the database management system that have been adopted. Restricting users to access only
the information they need minimizes the blast radius of an intrusion, i.e., the amount of data that
may be lost because of data breaches and unauthorized access.</p>
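          <p>Where the DBMS supports SQL privileges, this step can be driven by the output of the previous steps. The sketch below renders a group's access specification as PostgreSQL-style column-level statements of the form GRANT SELECT (col, ...) ON table TO role. The role name and the specification are hypothetical, identifiers are left unquoted, and the statements are only generated, not executed. Since each subset already drops unused attributes, table-level grants on the subset would also satisfy the PoLP; the column lists simply make the granted attributes explicit.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical access specification for one group's database:
# table -> (privileges, attributes the group may use).
PATIENT_GRANTS = {
    "PatientTable": ({"SELECT"}, ["PatientID", "MedicalHistory"]),
    "ExamTable": ({"SELECT"}, ["examID", "Results"]),
}


def grant_statements(role, grants):
    """Emit column-level GRANT statements for one role, in a deterministic order."""
    statements = []
    for table in sorted(grants):
        privileges, columns = grants[table]
        column_list = ", ".join(columns)
        for privilege in sorted(privileges):
            # Column-level grants limit the role to the listed attributes of the table.
            statements.append(f"GRANT {privilege} ({column_list}) ON {table} TO {role};")
    return statements


if __name__ == "__main__":
    print("\n".join(grant_statements("patients_role", PATIENT_GRANTS)))
```
]]></preformat>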
        </sec>
        <sec id="sec-3-1-6">
          <title>3.2.6. Choosing the Confinement Robustness</title>
          <p>The allocation of the databases produced by the previous steps onto physical machines,
virtual machines (VMs), or containers determines the confinement level, according to the
robustness the overall system requires to protect the various kinds of information. The choice
of allocation usually depends on the criticality of the information in a database and on how
strongly the solution should confine an intrusion or an impersonation.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Confinement</title>
        <p>
          MDDF enables the designer to select different confinement levels, which is crucial to align the
allocation with the required security level, resource usage, management complexity, and other
factors. In more detail, each allocation offers a distinct robustness that also depends upon
vulnerabilities in containers or virtual machines [
          <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
          ].
        </p>
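        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Mapping</th>
                <th>Confinement Robustness</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Distinct physical machines</td>
                <td>High</td>
              </tr>
              <tr>
                <td>Distinct VMs</td>
                <td>Medium-High</td>
              </tr>
              <tr>
                <td>Distinct containers</td>
                <td>Medium</td>
              </tr>
              <tr>
                <td>Simple database decomposition</td>
                <td>Low</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>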
        <p>Table 1 outlines the confinement levels, or robustness, resulting from various database
allocations. Allocating databases to distinct physical machines provides the highest level of
confinement, while a simple database decomposition offers the lowest.</p>
        <p>Each alternative has its pros and cons:
• Distinct Physical Machines: High safety and confinement due to physical separation
but high resource usage and management complexity.
• Distinct VMs: Good confinement with customizable resource allocation, but with a
resource overhead and expertise needed for management.
• Distinct Containers: Lightweight solution with minimal overhead, but potential security
risk due to shared underlying OS.
• Simple Database Decomposition: Easier management and fewer resources, but low
confinement and potential data integrity issues.</p>
        <p>Even if an intrusion can defeat the second and the third allocations by exploiting
vulnerabilities in VMs or in containers, all the allocations that MDDF supports offer better robustness
to intrusions than a simple transformation to the third normal form. The choice of the proper
allocation should align with the specific requirements of the system of interest, balancing
confinement, resource efficiency, and management complexity.</p>
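        <p>Assuming, as the list above suggests, that resource usage grows with confinement, the choice can be sketched as picking, for each database, the least expensive allocation that reaches the required robustness. The level names and their ordering come from Table 1; the per-database requirements and the function names are hypothetical, and the sketch ignores management complexity and the vulnerabilities of specific container or VM technologies.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Robustness levels of Table 1, from the weakest to the strongest.
LEVELS = ["Low", "Medium", "Medium-High", "High"]

# Allocations ordered by increasing resource usage (as discussed above),
# each with its confinement level from Table 1.
ALLOCATIONS = [
    ("Simple database decomposition", "Low"),
    ("Distinct containers", "Medium"),
    ("Distinct VMs", "Medium-High"),
    ("Distinct physical machines", "High"),
]


def choose_allocation(required_level):
    """Return the least expensive allocation whose confinement meets required_level."""
    required_rank = LEVELS.index(required_level)  # raises ValueError for unknown levels
    return next(
        mapping
        for mapping, level in ALLOCATIONS
        if LEVELS.index(level) >= required_rank
    )


def plan(requirements):
    """Map each database to an allocation, given a hypothetical required level per database."""
    return {database: choose_allocation(level) for database, level in requirements.items()}


if __name__ == "__main__":
    print(plan({"PatientDB": "High", "AdminDB": "Medium-High", "StatDB": "Medium"}))
```
]]></preformat>
        <p>Requiring different levels for different databases yields mixed allocations, e.g., one database on a dedicated physical machine and the others in VMs.</p>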
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. An Example</title>
      <p>This example considers a scenario where a healthcare organization, i.e., a hospital,
manages information about operators, patients, prescriptions, and related administration. The
scenario assumes that, initially, the hospital uses a centralized database whose tables
record patient details, prescriptions, and all the data needed by medical
staff and patients. The overall database management system ensures that medical professionals
have access to proper patient information, while patients can conveniently retrieve their own
medical records and prescription details. Further users exist according to the various roles in
the organization. Among the possible groups of users, we also include statisticians and the
corresponding operations. The administrative staff manages costs, while statisticians analyze
aggregated data without compromising individual patient identities. When computing statistics,
the names of the patients are replaced with pseudonymous IDs, ensuring a high level of privacy,
and a mapping table simplifies the association between these pseudonymous IDs and patients.</p>
      <sec id="sec-4-1">
        <title>4.1. Users and Database</title>
        <p>The Healthcare Database (HealthDB) of the organization is designed to meet the different needs
of its users, including Patients, Medical Doctors, Nurses, Administrative Staff, and Statisticians.</p>
        <p>Users:
• Patients (Patients): This group requires read access to their medical records, prescription
details, exam results, and associated costs. Access is facilitated through a pseudonymous
ID for privacy.
• Medical Doctors (Doctors): Medical professionals need read and write access to patient
information and medical history, and the ability to prescribe medications and order exams.
• Nurses (Nurses): Nurses require read and write access to patient information and medical
history, and the ability to record exam results and administer prescribed medications.
• Administrative Staff (Admins): This group focuses on write access to the CostsTable
for billing purposes. They may also have read access to other relevant information.
• Statisticians (Statisticians): Statisticians have read-only access to aggregated data derived from
PatientTable for statistical analysis. They cannot view sensitive information such as
names and addresses.</p>
        <p>Database:
• Healthcare Database (HealthDB): This is the centralized database that stores
information of interest in the following tables:
– Patient Information Table (PatientTable): Includes pseudonymous IDs
(PatientID), comprehensive medical history (MedicalHistory), and IDs of assigned doctors
(AssignedDoctorID) and nurses (AssignedNurseID). A mapping table (MappingTable)
manages the link between real patient identities and pseudonymous IDs where each
ID is automatically generated on the first hospital visit.
– Sensitive Data Table (SensitiveDataTable): Contains sensitive patient
information such as real names (RealName), addresses (Address), and details of who pays
the bill (PayerDetails). It is linked to PatientTable via pseudonymous IDs.
– Prescription Records Table (PrescriptionTable): Stores data related to
prescriptions, including unique prescription IDs (PrescriptionID), details of prescribed
medications (MedicationDetails), dosage (Dosage), prescribing doctor IDs
(PrescribingDoctorID), and associated costs (AssociatedCost).
– Exam Records Table (ExamTable): Stores information about tests and exams,
including unique IDs (examID) and test and exam results (Results).
– Costs Table (CostsTable): Manages the costs of medicines and exams, with
transaction IDs (TransactionID), item descriptions (Item), and associated costs (Cost).
It is accessed by administrative staff for billing purposes.
– Mapping Table (MappingTable): Simplifies the mapping between pseudonymous
patient IDs (PatientID) and their real identities (RealIdentityID).
– Statistics Table (StatisticsTable): Stores aggregated and de-identified data derived from
PatientTable, accessible to statisticians for analysis. Includes aggregated data
IDs (AggregatedDataID) and de-identified aggregated data (AggregatedData).</p>
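        <p>As a concrete reference for the following steps, the centralized HealthDB can be sketched as SQLite DDL driven from the standard Python sqlite3 module. Table and column names follow the description above; the column types, the primary keys, and the PatientID columns added to PrescriptionTable and ExamTable (to link records to patients) are our assumptions, not part of the described schema.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import sqlite3

# Names follow the HealthDB description; types, keys and the "assumed link"
# columns are illustrative choices only.
HEALTHDB_DDL = """
CREATE TABLE PatientTable (
    PatientID TEXT PRIMARY KEY,          -- pseudonymous ID
    MedicalHistory TEXT,
    AssignedDoctorID TEXT,
    AssignedNurseID TEXT
);
CREATE TABLE SensitiveDataTable (
    PatientID TEXT PRIMARY KEY REFERENCES PatientTable(PatientID),
    RealName TEXT,
    Address TEXT,
    PayerDetails TEXT
);
CREATE TABLE PrescriptionTable (
    PrescriptionID TEXT PRIMARY KEY,
    PatientID TEXT REFERENCES PatientTable(PatientID),  -- assumed link
    MedicationDetails TEXT,
    Dosage TEXT,
    PrescribingDoctorID TEXT,
    AssociatedCost REAL
);
CREATE TABLE ExamTable (
    examID TEXT PRIMARY KEY,
    PatientID TEXT REFERENCES PatientTable(PatientID),  -- assumed link
    Results TEXT
);
CREATE TABLE CostsTable (
    TransactionID TEXT PRIMARY KEY,
    Item TEXT,
    Cost REAL
);
CREATE TABLE MappingTable (
    PatientID TEXT PRIMARY KEY,
    RealIdentityID TEXT UNIQUE
);
CREATE TABLE StatisticsTable (
    AggregatedDataID TEXT PRIMARY KEY,
    AggregatedData TEXT
);
"""


def create_healthdb(path=":memory:"):
    """Create the centralized HealthDB schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(HEALTHDB_DDL)
    return conn


if __name__ == "__main__":
    conn = create_healthdb()
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)
```
]]></preformat>
        <p>In this sketch, direct identifiers such as RealName and Address appear only in SensitiveDataTable, while PatientTable holds pseudonymous IDs.</p>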
      </sec>
      <sec id="sec-4-2">
        <title>4.2. MDDF Implementation Steps</title>
        <p>The implementation of MDDF adopts a systematic approach to satisfy the unique needs of user
groups of the healthcare system. The key steps involve identifying users and their specific needs,
computing the access permissions to satisfy these needs, and ensuring secure data management
within the Healthcare Database (HealthDB). The following is a short, high-level
overview of these implementation steps.</p>
        <p>1. Identify Users and Their Needs:
a) Patients
• Needs:
– Read access to their own medical records (MedicalHistory in PatientTable).
– Read access to their prescription details (MedicationDetails, Dosage in PrescriptionTable).
– Read access to their exam records (Results in ExamTable).
• Required Access: Read access to PatientTable, PrescriptionTable, ExamTable
in the Healthcare Database (HealthDB).
b) Doctors
• Needs:
– Read and write access to patient information for assigned patients (PatientTable).
– Write access to prescribe medications (PrescriptionTable).
– Write access to order exams (ExamTable).
• Required Access: Read and write access to PatientTable, DoctorsTable,
PrescriptionTable, ExamTable in the Healthcare Database (HealthDB).
c) Nurses
• Needs:
– Read and write access to patient information for assigned patients (PatientTable).
– Write access to record exam results (ExamTable).
– Write access to administer prescribed medications (PrescriptionTable).
• Required Access: Read and write access to PatientTable, NursesTable,
ExamTable, PrescriptionTable in the Healthcare Database (HealthDB).
d) Administrative Staff (Admins)
• Needs:
– Write access to the CostsTable for billing purposes.
• Required Access: Write access to CostsTable in the Healthcare Database
(HealthDB).
e) Statisticians
• Needs:
– Read-only access to the aggregated data derived from PatientTable for statistical
analysis (StatisticsTable).
• Required Access: Read-only access to StatisticsTable in the Healthcare Database
(HealthDB).</p>
        <p>2. Create Database Subsets</p>
        <p>This step produces the distinct database subsets of HealthDB for the various user groups.
These databases are MedDB for medical staff, PatientDB for patients, AdminDB for
administrative staff, and StatDB for statisticians. Since each database only includes the relevant
tables and fields that the designated user roles require, we adhere to the principle of least
privilege. The structure of each database the decomposition returns is listed below:
Subset MedDB: This subset includes only the tables in the main database that are relevant
to Medical Staff. Key tables comprise:
Subset PatientDB: This subset includes only the tables in the main database relevant to
Patients. Key tables include:
Subset AdminDB: This subset includes only the tables in the main database relevant
to Administrative Staff. Key tables comprise:
• CostsTable: TransactionID, Item, Cost
Subset StatDB: This subset includes only the tables in the main database relevant to Statisticians.
Key tables comprise:
• StatisticsTable: AggregatedDataID, AggregatedData
These subsets ensure that each user group can access all and only the information the
corresponding roles require, according to the principle of least privilege.</p>
        <p>3. Associating User Groups and Database Subsets: The one-to-one association between
a user group and one of the databases the decomposition returns is the fundamental input
for the next step, which defines user access rights.</p>
        <p>• Define User-Subset Mapping: The association between user groups and database
subsets maps Medical Staff into the MedDB subset and Patients into the PatientDB subset.
Doctors should be mapped into the MedDB subset (with access to fields like DoctorID, Name,
Specialty), Nurses into the MedDB subset (with access to fields like NurseID, Name,
AssignedPatients), and Administrative Staff into the AdminDB subset (with access to
fields like TransactionID, Item, Cost). Statisticians are mapped into the StatDB
subset (with access to fields like AggregatedDataID, AggregatedData).
• Use Mapping for Access Control: Starting from the mappings previously defined,
we can configure access control mechanisms. The mapping implies assigning to each
user in a user group the corresponding access rights and permissions on the
corresponding database subset.
• Regularly Update Mapping: User groups should be reviewed periodically,
because the group of a user can change to reflect changes in roles, responsibilities,
and database structure. This ensures that access control remains aligned with
organizational requirements.</p>
        <p>The overall decomposition process ensures that the mapping between users and database
subsets is clearly defined, that access control mechanisms are implemented according to this
mapping, and that updates adapt the mapping to organizational changes.</p>
        <p>4. Configure Access Mechanisms: After creating the database subsets, it is essential to
define who has access to each subset and what level of access is allowed:
a) Patients should have read access to the PatientDB subset, allowing them to view
their pseudonymous medical records, prescription details (fields:
PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID), and exam records
(fields: examID, Results).
b) Doctors should have read and write access to the MedDB subset, similar to Medical
Staff, with the additional ability to prescribe medications (fields: PrescriptionID,
MedicationDetails, Dosage, PrescribingDoctorID) and order exams (fields: examID,
Results).
c) Nurses should have read and write access to the MedDB subset, as Medical Staff,
but with the additional ability to record exam results (fields: examID, Results)
and administer prescribed medications (fields: PrescriptionID, MedicationDetails,
Dosage, PrescribingDoctorID).
d) Administrative Staff (Admins) should have write access to the AdminDB
subset, to enable them to manage costs associated with medicines and exams (fields:
TransactionID, Item, Cost).
e) Statisticians should have read-only access to the StatDB subset, to enable them
to analyze aggregated and de-identified data derived from PatientTable for statistical
purposes (fields: AggregatedDataID, AggregatedData).</p>
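        <p>Item a) grants patients read access to their own records only, a restriction on rows that table-level or column-level grants cannot express. One hedged way to approximate it, sketched below with SQLite, is to expose patients only to queries filtered on the pseudonymous ID obtained at authentication. The toy data, the PatientID column in ExamTable, and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import sqlite3


def build_patientdb():
    """Create a toy PatientDB subset with made-up rows (PatientID in ExamTable is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ExamTable (examID TEXT PRIMARY KEY, PatientID TEXT, Results TEXT);
        INSERT INTO ExamTable VALUES
            ('E-1', 'P-001', 'result A'),
            ('E-2', 'P-002', 'result B'),
            ('E-3', 'P-001', 'result C');
        """
    )
    return conn


def own_exam_results(conn, authenticated_id):
    """Return only the exam records of the patient identified at login."""
    # The pseudonymous ID comes from the authentication layer and is bound as a
    # parameter, so the caller cannot inject SQL that reaches other patients' rows.
    cursor = conn.execute(
        "SELECT examID, Results FROM ExamTable WHERE PatientID = ? ORDER BY examID",
        (authenticated_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_patientdb()
    print(own_exam_results(db, "P-001"))  # [('E-1', 'result A'), ('E-3', 'result C')]
```
]]></preformat>
        <p>A production DBMS could enforce the same restriction inside the database instead, e.g., through per-patient views or PostgreSQL row-level security policies, rather than relying on the application layer.</p>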
        <p>After defining the relationships between the user groups and the various database subsets,
some further procedures have to be adopted to guarantee security and proper data management.
Below, we detail each procedure:
• Authentication and Authorization: Every user must be authenticated, so that only
authorized users can access data in their respective subsets. This may
include the use of usernames and passwords, two-factor authentication, or other
secure authentication methods.
Additionally, it is crucial to assign specific roles and permissions based on user
responsibilities. For example, Medical Staff and Doctors must have the role "Medical
Professional" with full permissions for the MedDB subset, while Patients must have
appropriate read access roles for the PatientDB subset. Nurses and Administrative
Staff must also be assigned roles with the relevant permissions.
• Auditing and Monitoring: An audit system tracks who accesses
the data and which operations are invoked. This system helps to detect suspicious
activities or unauthorized access. As an example, it is critical to record who accesses
tables in the MedDB and AdminDB subsets and to track the updates to these data.
These procedures ensure that an organization properly manages data and security
according to the specific needs and responsibilities of each type of user.</p>
        <p>5. Synchronize Shared Tables: This step identifies shared tables, i.e., tables that appear in
distinct database subsets. In this example, multiple copies of several tables exist. Hence,
it is essential to synchronize the updates of these tables to ensure data consistency and
integrity despite replication.</p>
        <p>When multiple users or divisions can update a shared table, the following steps apply:
• Identify Common Tables: This first step identifies the tables that hold data
shared among subsets. For instance, in our healthcare organization, tables such
as PatientTable (fields: PatientID, MedicalHistory, AssignedDoctorID,
AssignedNurseID) and examsTable (fields: examID, Results) are shared between the MedDB
and PatientDB subsets, and their synchronization is crucial to avoid data inconsistencies.
• Implement Synchronization Rules: This step defines the rules that synchronize data
among subsets. These rules cover updates, insertions,
and deletions. For example, when medical staff update patient data in the MedDB
subset, these changes must be reflected in the PatientDB subset so that patients
can access the updated information. Similar synchronization rules have to be
established for the tables shared among other subsets.
• Plan Synchronization: This step plans when data synchronization occurs. For instance,
updates to critical data should be propagated immediately so that all copies stay aligned. Less
critical data can tolerate weaker consistency. For example, updates for
administrative staff can be propagated once a day, usually at night, while a weekly update may
be appropriate for statisticians. The synchronization strategy should consider the
needs of all user groups, including Medical Staff, Patients, Doctors, Nurses, and
Administrative Staff.</p>
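        <p>The planning step can be sketched as a per-group policy that bounds how far each group's copy of a shared table may lag behind the master copy: zero for critical data, one day for administrative staff, one week for statisticians, as in the examples above. The Python sketch below is illustrative only; the class SyncPlan, its methods, and the group keys are hypothetical, and updates are treated as opaque values.</p>
        <preformat>
```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

IMMEDIATE = timedelta(0)  # lag budget meaning "propagate at once"


@dataclass
class SyncPlan:
    """Hypothetical per-group synchronization plan for one shared table."""

    max_lag: dict    # user group -> timedelta the group's copy may lag behind
    start: datetime  # instant at which all copies are known to be aligned
    pending: dict = field(default_factory=dict)    # group -> queued updates
    last_sync: dict = field(default_factory=dict)  # group -> last flush time

    def record_update(self, update, now):
        """Queue an update for every group, then flush the groups that are due."""
        for group in self.max_lag:
            self.pending.setdefault(group, []).append(update)
        return self.flush_due(now)

    def flush_due(self, now):
        """Return {group: updates} for the groups whose lag budget has expired."""
        flushed = {}
        for group, lag in self.max_lag.items():
            last = self.last_sync.get(group, self.start)
            if self.pending.get(group) and now - last >= lag:
                flushed[group] = self.pending.pop(group)
                self.last_sync[group] = now
        return flushed


if __name__ == "__main__":
    plan = SyncPlan(
        max_lag={
            "MedicalStaff": IMMEDIATE,
            "AdministrativeStaff": timedelta(days=1),
            "Statisticians": timedelta(weeks=1),
        },
        start=datetime(2024, 1, 1),
    )
    # Flushes MedicalStaff only.
    print(plan.record_update("u1", datetime(2024, 1, 1, 9)))
    # Flushes MedicalStaff and AdministrativeStaff; Statisticians keep waiting.
    print(plan.record_update("u2", datetime(2024, 1, 2, 2)))
```
        </preformat>
        <p>To stay short, the sketch flushes lazily, when a new update arrives; a real deployment would also flush the batched groups on a timer, e.g. at night for administrative staff.</p>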
        <p>The synchronization strategy should tune the synchronization overhead
to the urgency and relevance of the data for each user group while preserving consistency
across different subsets and ensuring the integrity of all the organization's data.</p>
        <p>6. Optimizing Confinement Robustness: The optimization of confinement robustness defines the degree of
physical and logical separation among the database subsets according to data sensitivity
and security needs. The steps to follow are:
• Data Classification: This step classifies data according to its sensitivity level. For example,
personal information (RealName, Address, PayerDetails) and medical history
(MedicalHistory) should be classified as highly sensitive, while public or non-sensitive
data are classified as less critical.
• Assigning Security Levels: This step assigns a security level to each database subset based
on the data classification. For instance, the financial database (CostsTable) requires the
highest security level, while the other databases may require lower ones.
• Allocation Choice: To show the flexibility of MDDF, we assume that the most
cost-effective solution hosts the PatientDB subset on a dedicated
physical machine, maps the other databases onto a second physical machine, and
uses virtualization to separate these databases.</p>
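        <p>The classification and level-assignment steps can be expressed as a simple rule: a subset is as sensitive as the most sensitive field it stores. The Python sketch below assumes a hypothetical three-level scale; only the fields listed as highly sensitive (RealName, Address, PayerDetails, MedicalHistory) come from the text, and every other field defaults to an intermediate level.</p>
        <preformat>
```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level sensitivity scale (higher = more sensitive)."""

    PUBLIC = 0
    INTERNAL = 1
    HIGH = 2


# Fields the text classifies as highly sensitive. Every other field falls back
# to the default level passed to subset_level (an assumption of this sketch).
FIELD_LEVEL = {
    "RealName": Level.HIGH,
    "Address": Level.HIGH,
    "PayerDetails": Level.HIGH,
    "MedicalHistory": Level.HIGH,
}


def subset_level(fields, default=Level.INTERNAL):
    """A subset is as sensitive as the most sensitive field it stores."""
    return max((FIELD_LEVEL.get(f, default) for f in fields), default=Level.PUBLIC)


if __name__ == "__main__":
    print(subset_level(["PatientID", "MedicalHistory"]).name)
    print(subset_level(["TransactionID", "Item", "Cost"]).name)
```
        </preformat>
        <p>Under this rule, a subset that copies even one highly sensitive field inherits the highest level, which is one more reason to keep such fields out of the subsets that do not need them.</p>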
        <p>– Separate Physical Machine (PatientDB): The PatientDB subset should be
allocated to a dedicated physical server for maximum confinement. This allocation is
required when the strongest confinement is a system requirement.
– Virtual Machines (VMs): We show how to allocate the other subsets onto
three virtual machines (VMs) that run on another physical machine. In
this way, the VMs offer logical separation within a single shared physical server.
The other databases are mapped onto the VMs as follows:
∗ VM Prescription: A virtual machine that manages prescription data
(PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID, AssociatedCost).
∗ VM Exam Costs: A virtual machine devoted to exam cost data
(TransactionID, Item, Cost).
∗ Containerization (Mapping and Statistics): Within the "Exam
Costs" VM, MappingTable and StatisticsTable are allocated to distinct containers.
Containers offer lightweight and efficient confinement,
striking a balance between complete confinement and resource efficiency.</p>
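        <p>One way to reason about this allocation is to record, for each database, the chain of execution environments that hosts it and to compute the outermost layer that separates two databases: distinct physical machines give the strongest separation, containers within the same VM the weakest. The Python sketch below follows the placement described above, but the host, VM, and container identifiers (host-A, host-B, vm-prescription, and so on) are hypothetical.</p>
        <preformat>
```python
# Isolation layers, from the outermost to the innermost.
LAYERS = ("physical machine", "virtual machine", "container")

# Each database maps to the chain of environments hosting it. Identifiers are
# hypothetical; the structure follows the allocation described in the text.
PLACEMENT = {
    "PatientDB": ("host-A",),
    "Prescription": ("host-B", "vm-prescription"),
    "ExamCosts": ("host-B", "vm-exam-costs"),
    "MappingTable": ("host-B", "vm-exam-costs", "ctr-mapping"),
    "StatisticsTable": ("host-B", "vm-exam-costs", "ctr-statistics"),
}


def separation(db_a, db_b, placement=PLACEMENT):
    """Return the outermost layer that separates two databases, or None."""
    path_a, path_b = placement[db_a], placement[db_b]
    for depth, (env_a, env_b) in enumerate(zip(path_a, path_b)):
        if env_a != env_b:
            return LAYERS[depth]
    if len(path_a) != len(path_b):
        # One database sits in an extra inner layer, e.g. a container
        # inside the VM that directly hosts the other one.
        return LAYERS[min(len(path_a), len(path_b))]
    return None  # same environment: no isolation boundary between them


if __name__ == "__main__":
    pairs = [
        ("PatientDB", "ExamCosts"),
        ("Prescription", "ExamCosts"),
        ("MappingTable", "StatisticsTable"),
    ]
    for pair in pairs:
        print(pair, "->", separation(*pair))
```
        </preformat>
        <p>Checking every pair of databases that must not share an intrusion against the layer returned by such a function is a simple way to verify that an allocation meets the required confinement.</p>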
        <p>This example outlines how MDDF can support distinct degrees of physical and
logical confinement across and within physical and virtual machines according to
the security needs of each database subset.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Comparison of Alternative Synchronization Solutions</title>
      <p>This section discusses the overhead of alternative solutions to synchronize tables shared among
distinct database subsets. In particular, it compares the features and capabilities of two solutions
based, respectively, on triggers and on events.</p>
      <p>A trigger is a code fragment that the database executes automatically when a predefined
event occurs, such as an insertion, update, or deletion on a table. Triggers preserve the integrity
of the database by ensuring that some actions are executed whenever specific changes occur.</p>
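      <p>The following Python sketch illustrates trigger-based change capture with the standard sqlite3 module: triggers on the master copy of PatientTable log every insertion and update into a change-log table, and a replay function applies the logged changes to the copy held by another subset. SQLite, the ChangeLog table, and the replay function are assumptions of this sketch, not the Trigger-API evaluated in Section 5.1; deletions, which would need a third trigger, are omitted for brevity.</p>
      <preformat>
```python
import sqlite3

# Master copy (hypothetically MedDB): triggers log every insertion and update
# on PatientTable into ChangeLog. Deletions would need a third trigger.
MASTER_SCHEMA = """
CREATE TABLE PatientTable (PatientID INTEGER PRIMARY KEY, MedicalHistory TEXT);
CREATE TABLE ChangeLog (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    PatientID INTEGER,
    MedicalHistory TEXT
);
CREATE TRIGGER log_insert AFTER INSERT ON PatientTable
BEGIN
    INSERT INTO ChangeLog (PatientID, MedicalHistory)
    VALUES (NEW.PatientID, NEW.MedicalHistory);
END;
CREATE TRIGGER log_update AFTER UPDATE ON PatientTable
BEGIN
    INSERT INTO ChangeLog (PatientID, MedicalHistory)
    VALUES (NEW.PatientID, NEW.MedicalHistory);
END;
"""

# Copy held by another subset (hypothetically PatientDB): same table, no triggers.
REPLICA_SCHEMA = (
    "CREATE TABLE PatientTable (PatientID INTEGER PRIMARY KEY, MedicalHistory TEXT);"
)


def replay(master, replica, last_seq):
    """Apply ChangeLog entries newer than last_seq to the replica.

    Returns the sequence number of the last applied entry, to be passed
    to the next call.
    """
    rows = master.execute(
        "SELECT seq, PatientID, MedicalHistory FROM ChangeLog "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, patient_id, history in rows:
        # Upsert: insert the row, or overwrite the existing copy of it.
        replica.execute(
            "INSERT OR REPLACE INTO PatientTable VALUES (?, ?)",
            (patient_id, history),
        )
        last_seq = seq
    replica.commit()
    return last_seq


if __name__ == "__main__":
    med = sqlite3.connect(":memory:")
    med.executescript(MASTER_SCHEMA)
    pat = sqlite3.connect(":memory:")
    pat.executescript(REPLICA_SCHEMA)

    med.execute("INSERT INTO PatientTable VALUES (1, 'no known allergies')")
    med.execute(
        "UPDATE PatientTable SET MedicalHistory = 'penicillin allergy' "
        "WHERE PatientID = 1"
    )
    med.commit()

    mark = replay(med, pat, 0)
    print(mark, pat.execute("SELECT * FROM PatientTable").fetchall())
```
      </preformat>
      <p>Because the triggers run inside the master database, every change is captured regardless of which application issues it; the replay step, instead, can run immediately or on the schedule chosen for each subset.</p>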
      <p>An alternative solution adopts SymmetricDS, an open-source tool that supports data replication
and synchronization among databases. Several companies and organizations use it to
replicate data across distributed environments, such as bank branches, remote offices, or geographically
separated data centers (e.g. content delivery networks). In particular, it is used by OpenMRS, a
collaborative open-source project that develops software to support the delivery of health care in
developing countries.</p>
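      <p>In an event-based design, by contrast, the component that modifies a shared table publishes a change event, and each subset holding a copy of that table subscribes to it. The Python sketch below is a generic in-process illustration of this pattern; the EventBus and SubsetCopy classes are hypothetical and do not reproduce the interfaces or configuration of SymmetricDS.</p>
      <preformat>
```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process bus: table name -> subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, table, callback):
        self._subscribers[table].append(callback)

    def publish(self, table, key, row):
        # Deliver the change to every subset that holds a copy of the table.
        for callback in self._subscribers[table]:
            callback(key, row)


class SubsetCopy:
    """A subset's local copy of one shared table, kept as a dict keyed by ID."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, key, row):
        if row is None:
            self.rows.pop(key, None)  # a None row encodes a deletion
        else:
            self.rows[key] = dict(row)


if __name__ == "__main__":
    bus = EventBus()
    med_copy, patient_copy = SubsetCopy("MedDB"), SubsetCopy("PatientDB")
    for replica in (med_copy, patient_copy):
        bus.subscribe("examsTable", replica.apply)

    bus.publish("examsTable", 7, {"examID": 7, "Results": "normal"})
    print(med_copy.rows, patient_copy.rows)
    bus.publish("examsTable", 7, None)
    print(med_copy.rows, patient_copy.rows)
```
      </preformat>
      <p>Unlike a trigger, publication is the writer's responsibility: an update that bypasses the bus is not propagated. In exchange, the writer is decoupled from the number and location of the copies.</p>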
      <sec id="sec-5-1">
        <title>5.1. Performance Comparison of the Two Solutions</title>
        <p>This section compares the performance of the two synchronization solutions: Trigger-API and
SymmetricDS. A series of experiments evaluates their effectiveness
under various conditions.</p>
        <p>The experiments measure, for both solutions:
• Latency (ms) in the local, remote, and distributed configurations;
• CPU utilization under light, medium, and heavy workloads;
• Conflict resolution time (ms) with 100%, 90%, and 80% of conflicts resolved;
• Two-way synchronization time (ms), from main to remote and from remote to main;
• Throughput under variable load (tps), during peak activity and during calm periods.</p>
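        <p>The conflict-resolution and two-way synchronization experiments concern rows that are updated at both the main and the remote site before the copies are reconciled. As a minimal illustration, the Python sketch below reconciles two copies with a last-writer-wins rule on a per-row timestamp, breaking ties in favor of the main site. This rule, the Version record, and the site names are assumptions for illustration, not necessarily the policy used by either solution in the experiments.</p>
        <preformat>
```python
from dataclasses import dataclass

# Hypothetical tie-break: on equal timestamps the main site wins.
SITE_PRIORITY = {"main": 1, "remote": 0}


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # time of the last update, e.g. seconds since the epoch
    site: str         # site where the last update was made


def resolve(a, b):
    """Last-writer-wins: keep the newer version; ties go to the priority site."""
    key_a = (a.timestamp, SITE_PRIORITY[a.site])
    key_b = (b.timestamp, SITE_PRIORITY[b.site])
    return a if key_a >= key_b else b


def two_way_sync(main, remote):
    """Reconcile two copies of a table (dicts: row key -> Version) in place."""
    for key in set(main) | set(remote):
        if key not in main:
            main[key] = remote[key]  # row created remotely: copy it to main
        elif key not in remote:
            remote[key] = main[key]  # row created at main: copy it to remote
        else:
            winner = resolve(main[key], remote[key])  # conflict, or identical rows
            main[key] = remote[key] = winner


if __name__ == "__main__":
    main = {1: Version("a", 10.0, "main"), 2: Version("x", 5.0, "main")}
    remote = {1: Version("b", 12.0, "remote"), 3: Version("y", 7.0, "remote")}
    two_way_sync(main, remote)
    print(main == remote, main[1].value)
```
        </preformat>
        <p>Last-writer-wins silently discards the losing update, and this sketch cannot express deletions: a row deleted at one site would be copied back from the other. Replication tools therefore usually record deletions explicitly and may offer other resolution policies.</p>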
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>
        The Multilevel Database Decomposition Framework offers several benefits, including:
• Privacy Preservation: Database decomposition improves security and protects the privacy of
individuals by restricting each user to the subset holding the data the user needs. In healthcare,
where sensitive patient information is handled, this is crucial for complying with regulations such as GDPR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and HIPAA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The framework supports the implementation of precise access controls,
ensuring that only authorized medical staff can access specific patient information.
• Reduced Impact of Security Breaches: The healthcare sector is a prime target for
cyberattacks. Database decomposition limits the impact of security breaches, as compromising
one subset does not expose patient records in distinct databases.
• Reduced Complexity: Decomposing the database into smaller subsets gives each user
simpler data structures that are easier to understand.
• Improved Performance: Operating on smaller tables and relationships, tailored to each
user group, improves database performance.
• Efficient Data Retrieval for Medical Research: Researchers can access the specific subsets
relevant to their studies, streamlining data retrieval for research purposes. This can
contribute to advancements in healthcare through data-driven insights.
• Scalability for Growing Healthcare Systems: As healthcare systems expand, the
framework scales by adding new subsets to accommodate the growing volume of
patient data.
      </p>
      <p>As a counterpart, the proposed framework also presents some challenges:
• Management Complexity: Managing multiple subsets may increase the complexity of
database administration and require more resources for maintenance and updates.
• Risk of Data Inconsistency: Tables replicated across subsets require proper
synchronization mechanisms to prevent inconsistencies, and these mechanisms introduce overhead.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>European Union</collab>
          ,
          <article-title>Data protection in the EU</article-title>
          ,
          <year>2023</year>
          . URL: https://ec.europa.eu/info/law/law-topic/data-protection_en.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Appari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <article-title>Institutionalizing HIPAA compliance: Organizations and competing logics in U.S. health care</article-title>
          ,
          <source>Journal of Health and Social Behavior</source>
          <volume>55</volume>
          (
          <year>2014</year>
          )
          <fpage>108</fpage>
          -
          <lpage>124</lpage>
          . URL: https://doi.org/10.1177/0022146513520431. doi:10.1177/0022146513520431, PMID: 24578400.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Saltzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Schroeder</surname>
          </string-name>
          ,
          <article-title>The protection of information in computer systems</article-title>
          ,
          <source>Proc. IEEE</source>
          <volume>63</volume>
          (
          <year>1975</year>
          )
          <fpage>1278</fpage>
          -
          <lpage>1308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>A contemporary look at Saltzer and Schroeder's 1975 design principles</article-title>
          ,
          <source>IEEE Security &amp; Privacy</source>
          <volume>10</volume>
          (
          <year>2012</year>
          )
          <fpage>20</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Campobasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Allodi</surname>
          </string-name>
          ,
          <article-title>Impersonation-as-a-service: Characterizing the emerging criminal infrastructure for user impersonation at scale</article-title>
          ,
          <source>in: Proc. of the IEEE Int. Symposium on Secure Software Engineering</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2006</year>
          , p.
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Moiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sailaja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venkataswamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Database replication: A survey of open source and commercial tools</article-title>
          ,
          <source>International Journal of Computer Applications</source>
          <volume>13</volume>
          (
          <year>2011</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Al-Sayid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Aldlaeen</surname>
          </string-name>
          ,
          <article-title>Database security threats: A survey study</article-title>
          ,
          <source>in: 2013 5th International Conference on Computer Science and Information Technology</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Humayun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jhanjhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almufareh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
          <article-title>Security threat and vulnerability assessment and measurement in secure software development</article-title>
          ,
          <source>Computers, Materials and Continua</source>
          <volume>71</volume>
          (
          <year>2022</year>
          )
          <fpage>5039</fpage>
          -
          <lpage>5059</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhal</surname>
          </string-name>
          ,
          <article-title>Database intrusion detection using role and user behavior based risk assessment</article-title>
          ,
          <source>Journal of Information Security and Applications</source>
          <volume>55</volume>
          (
          <year>2020</year>
          )
          <fpage>102654</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zengin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hizal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Suaib</given-names>
            <surname>Akhter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Altunkaya</surname>
          </string-name>
          ,
          <article-title>A novel data encryption algorithm to ensure database security</article-title>
          ,
          <source>Acta Infologica</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cuzzocrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shahriar</surname>
          </string-name>
          ,
          <article-title>Data masking techniques for NoSQL database security: A systematic review</article-title>
          ,
          <source>in: 2017 IEEE International Conference on Big Data (Big Data)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4467</fpage>
          -
          <lpage>4473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Linkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Baiardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Florin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Greer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pollock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Trump</surname>
          </string-name>
          ,
          <article-title>Applying resilience to hybrid threats</article-title>
          ,
          <source>IEEE Security &amp; Privacy</source>
          <volume>17</volume>
          (
          <year>2019</year>
          )
          <fpage>78</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Data security and privacy protection for cloud storage: A survey</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>131723</fpage>
          -
          <lpage>131740</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Binjubeir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. B.</given-names>
            <surname>Ismail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Sadiq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khurram Khan</surname>
          </string-name>
          ,
          <article-title>Comprehensive survey on big data privacy protection</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>20067</fpage>
          -
          <lpage>20079</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          , et al.,
          <article-title>An access control model for preventing virtual machine escape attack</article-title>
          ,
          <source>Future Internet</source>
          <volume>9</volume>
          (
          <year>2017</year>
          )
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Wong</surname>
          </string-name>
          , et al.,
          <article-title>On the security of containers: Threat modeling, attack analysis, and mitigation strategies</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>128</volume>
          (
          <year>2023</year>
          )
          <fpage>103140</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shringarputale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Butler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. La</given-names>
            <surname>Porta</surname>
          </string-name>
          ,
          <article-title>Co-residency attacks on containers are real</article-title>
          ,
          <source>in: Proc. of the 2020 ACM SIGSAC Conf. on Cloud Computing Security Workshop</source>
          , ACM,
          <year>2020</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>