=Paper=
{{Paper
|id=Vol-3731/paper32
|storemode=property
|title=Multilevel Database Decomposition Framework
|pdfUrl=https://ceur-ws.org/Vol-3731/paper32.pdf
|volume=Vol-3731
|authors=Fabrizio Baiardi,Cosimo Comella,Vincenzo Sammartino
|dblpUrl=https://dblp.org/rec/conf/itasec/BaiardiCS24
}}
==Multilevel Database Decomposition Framework==
Fabrizio Baiardi¹, Cosimo Comella² and Vincenzo Sammartino¹

¹ Università di Pisa, Largo Bruno Pontecorvo 3, 56127 Pisa (PI)
² Autorità Garante per la Protezione dei Dati Personali, Piazza Venezia 11, 00187 Roma

ITASEC24: Italian Conference on Cybersecurity, April 8-12, 2024, Salerno, IT
fabrizio.baiardi@unipi.it (F. Baiardi); c.comella@gpdp.it (C. Comella); v.sammartino@studenti.unipi.it (V. Sammartino)
ORCID: 0000-0001-9797-2380 (F. Baiardi); 0009-0002-4632-1179 (V. Sammartino)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

The Multilevel Database Decomposition Framework is a strategy to enhance system robustness and minimize the impact of data breaches. The framework prioritizes robustness against cyber threats over minimizing data redundancy by decomposing a database into smaller ones to restrict user access according to the least privilege principle. For this purpose, each database the decomposition produces is uniquely associated with a set of users, and the decomposition ensures that each user can access all and only the data his/her operations need. This minimizes the data a user can access and the impact of an impersonation attack. To prevent the spreading of an intrusion across the databases it produces, the framework supports alternative allocation strategies that map the databases onto distinct virtual or physical entities according to the robustness of interest. This flexibility in allocation management ultimately reinforces defenses against evolving cyber threats, and it is the main advantage of the decomposition. As a counterpart of better robustness, some tables will be replicated across the databases the decomposition returns, and their updates should be properly replicated to prevent inconsistencies among the copies of a table in distinct databases. We present a performance analysis to evaluate the overhead of each allocation. This offers insights into how the framework can satisfy distinct security requirements. We use these results to evaluate the effectiveness of the framework for healthcare applications.

Keywords: Decomposition, Database Allocation, Impact Assessment, GDPR

1. Introduction

The Multilevel Database Decomposition Framework (MDDF) is an innovative approach that targets the protection of personal information with a focus on the healthcare sector. It is designed to implement relational databases with high robustness, as it fully satisfies the least privilege principle to effectively mitigate contemporary threats and safeguard sensitive healthcare information [1, 2].

The MDDF key notion is the decomposition of one relational database into a set of databases defined according to the user operations. This minimizes user access rights [3, 4] because each user can access all and only the data his/her operations need. MDDF strongly reduces the blast radius of a successful intrusion. Consider as an example the data that may be leaked due to an impersonation attack [5], where a threat agent impersonates a legitimate user. This strongly improves overall data security.

To prevent the spread of intrusions across the decomposed databases, MDDF supports alternative allocation strategies. While the simplest allocation maps all databases onto the same machine, confinement and robustness are enhanced by distributing databases across distinct containers or physical/virtual machines.
The ability to choose the allocation according to the robustness of interest is a fundamental improvement that MDDF offers with respect to the management of the user access rights on the resulting databases. The databases that the MDDF produces may share tables or subsets of tables. Hence, these tables are replicated across databases, and updates are replicated to maintain consistency among multiple copies of a table and to offer the same data view to each user. Updating multiple copies introduces both complexity and overhead that are proportional to the robustness of the separation [6]. Another problem of replication is the larger memory utilization. We believe this issue may be neglected when considering the more robust separation among the various databases and the ability of using one copy to restore consistent information after any attack against availability. From this perspective, the amount of memory the MDDF uses is always lower than that of solutions based upon distributed ledgers.

The paper unfolds as follows: Sect. 2 delves into related works and briefly reviews the least privilege principle and other concepts underlying the proposed framework. Sect. 3 describes the MDDF, emphasizing database decomposition to optimally adhere to the least privilege principle. Sect. 4 exemplifies the application of the framework. Lastly, Sect. 5 reports performance figures of the multiple update overhead as a function of the allocation of databases.

2. State of the Art

The principle of least privilege (PoLP) suggests minimizing the access rights of each system user so that the user never owns access rights he/she does not need. This has several implications for the management of access rights and prevents the adoption of strategies based upon a hierarchical level of privileges. Systems satisfying the PoLP minimize the impact of an intrusion where a threat agent impersonates a legal user. State-of-the-art strategies to face the challenges posed by evolving cyber threats and cloud-based solutions improve database security by merging the PoLP with measures such as role-based access control (RBAC), encryption, tokenization, dynamic data masking, and pseudonymization. As discussed in the following, most of these solutions can be integrated with the MDDF to address the current risk scenarios [7, 8].

Role-Based Access Control (RBAC) [9] complements PoLP by assigning access permissions based on predefined roles. This streamlines user privilege management, ensuring individuals have access only to the resources essential for their specific organizational roles.

Encryption and Tokenization are crucial components in securing sensitive data [10]: they respectively transform data into unreadable formats and replace sensitive data with non-sensitive placeholders.

Dynamic Data Masking (DDM) [11] obfuscates sensitive information dynamically to ensure that only authorized individuals can access and view sensitive data.

Adaptive and comprehensive security measures are essential to counter the constantly evolving cyber threat landscape of databases, which includes challenges such as ransomware, advanced persistent threats (APTs), and insider threats [12].

The widespread adoption of cloud-based database solutions introduces new security concerns [13] because protecting data in a cloud requires guarding against unauthorized access, data breaches from users of the same provider, and compliance with data residency regulations.
Pseudonymization is another vital facet of database security as it replaces personally identifiable information (PII) with artificial identifiers or pseudonyms [14]. This adds another layer of privacy protection by making it challenging to directly associate sensitive data with specific individuals. Pseudonymization contributes to compliance with data protection regulations as it allows organizations to leverage data for legitimate purposes without compromising individual privacy. The implementation of pseudonymization within database systems enhances data security by both reducing the impact of unauthorized access to personal information and limiting exposure in the event of a breach.

3. Multilevel Database Decomposition Framework

This section details the decomposition step by step to outline the underlying principles. To this end, it is essential to explain how to decompose a database and then move from the decomposition of a database to that of its tables. Then, the section highlights the synchronization problems generated by a decomposition where the same table is shared, i.e. it belongs to distinct databases. This requires that some operations on the table in one of the databases resulting from the decomposition fire an automatic update of the databases sharing the same table.

3.1. General Approach

MDDF aims to fully satisfy the principle of least privilege (PoLP) [3] by decomposing a relational database shared among a set of users to minimize user access to the data involved in the operations each user can invoke. To this purpose, starting from a system where each user can access any information in the database, the framework produces a system with distinct databases where each user is granted access to just one of the databases, i.e. to the smallest amount of data to implement the operations of interest. This is achieved by distributing the tables in the original database to several databases, each consisting of only a subset of those in the original one. Each resulting database includes all the tables to implement the operations of a class of users, and if the operations do not access an attribute of a table, the attribute is dropped. Each user belongs to one class and is assigned access rights on the tables in one of the databases the decomposition returns. This strategy satisfies the PoLP as it assigns all and only the access rights the user needs. MDDF can be integrated with any of the various mechanisms discussed in the previous section, as any of these mechanisms can be applied to each of the databases. The last step of the framework maps the databases it builds to distinct containers, virtual or physical machines to confine a successful intrusion to one of the databases.

Definition and Purpose: Normal forms apply decomposition to simplify the understanding and management of the database schema. This helps to reduce redundancy, to improve maintainability, and to enhance the overall database performance. The decomposition of a database, i.e. a set of tables, returns subsets of these tables. Each table is either an original table or a subset of the attributes, i.e. the columns, of an original table. The subsets of tables are not partitions because they can share some tables. Each resulting subset is stored separately, and it can be accessed and managed independently.
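As a minimal sketch of this decomposition idea, the following fragment materializes the column subset one user class needs into a separate database file. It assumes SQLite; all file, table, and column names are illustrative rather than part of the framework's specification:

```python
import sqlite3

# Build a toy source database standing in for the original, normalized one.
src = sqlite3.connect("original.db")
src.execute("""CREATE TABLE IF NOT EXISTS PatientTable
               (PatientID TEXT, MedicalHistory TEXT, AssignedDoctorID TEXT)""")
src.commit()
src.close()

# The subset database for one user class keeps only the attributes that
# the class's operations need: the projection drops every other column.
subset = sqlite3.connect("subset_group_a.db")
subset.execute("ATTACH DATABASE 'original.db' AS src")
subset.execute("""CREATE TABLE PatientTable AS
                  SELECT PatientID, MedicalHistory  -- AssignedDoctorID dropped
                  FROM src.PatientTable""")
subset.commit()
subset.execute("DETACH DATABASE src")
subset.close()
```

Repeating the same projection for every user class yields the set of databases the framework then allocates and protects.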
The decomposition is implemented through a sequence of steps that, given a database and a set of users, can be described as follows:

• Traditional Normalization in 3NF
• Identification of User Groups
• Creation of Database Subsets
• Association of Groups and Subsets
• Configuring Access Mechanisms
• Choosing the Confinement Robustness

3.2. Steps

The following subsections show in detail the steps to be followed to apply the framework.

3.2.1. Traditional Normalization in 3NF

Normalization is a process that structures data in a database to eliminate redundancy and dependency. The Third Normal Form (3NF) is a widely used standard that reduces data redundancy. We assume this normalization has already been applied to ensure that the database is well structured and data is stored consistently.

3.2.2. Identification of User Groups

Identifying users is a preliminary step that identifies all users who interact with the database and analyzes the needs of the operations of each user. In this way, we create groups of users tailored to their needs because two users belong to the same group if they invoke the same operations on the same tables. This allows access to sensitive information to be limited only to users who are entitled to operate on such information according to the need-to-know principle.

3.2.3. Creation of Database Subsets

The creation of multiple databases, each a subset of the original one, is the central step of the framework. It creates a distinct database for each user group, according to the operations the users in the group execute and the data these operations access. We consider the worst case, i.e. any table and any attribute a user may require should belong to the corresponding subset to ensure that users can access all and only the information they need. As a consequence, some tables will be shared among users in distinct groups, and they appear in distinct databases. The attributes of these shared tables implement data exchange among users in distinct groups. This requires that an update in a table is spread across all the databases where the table, or the attribute, appears.

3.2.4. Association of Groups and Subsets

The strategy to decompose the database obviously results in a one-to-one association between user groups and the databases the decomposition returns. The association is the input to define user access rights. The membership in a group is dynamic because it depends upon the roles assigned by the organization.

3.2.5. Configuring the Access Mechanism

This step offers a first level of security by granting each user the access rights to access all and only the information in the database associated with the user group. This prevents users from accessing information in the original database that is not necessary for their work. The detailed implementation of this step depends upon both the underlying operating system and the database management system that have been adopted. The restriction of users to access only the information they need minimizes the blast radius of an intrusion, i.e. the amount of data that may be lost because of data breaches and unauthorized access. A minimal sketch of this step is given below, after Sect. 3.2.6.

3.2.6. Choosing the Confinement Robustness

The allocation of the databases from the previous steps onto physical machines, virtual machines (VMs), and containers determines the confinement level according to the robustness of the overall system that is required to protect the various information. The choice of allocation usually depends on the critical level of the information in a database and on how much the solution should confine an intrusion or an impersonation.
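As anticipated in Sect. 3.2.5, the following sketch shows one way to configure the access mechanism once the group-to-database association is fixed. The group names, database names, and privilege sets are assumptions, and the generated statements use PostgreSQL-style syntax; the concrete mechanism depends on the DBMS actually adopted:

```python
# Illustrative sketch of step 3.2.5: derive the access-rights configuration
# from the group-to-database association. Group names, database names, and
# privilege sets are assumptions, not taken from the paper.
GROUP_DATABASES = {
    "medical_staff": ("med_db",     "SELECT, INSERT, UPDATE"),
    "patients":      ("patient_db", "SELECT"),
    "admin_staff":   ("admin_db",   "SELECT, INSERT, UPDATE"),
    "statisticians": ("stat_db",    "SELECT"),
}

def grant_statements():
    """Yield one CONNECT grant plus table-level grants per user group,
    so each group can reach all and only its own database."""
    for role, (database, privileges) in GROUP_DATABASES.items():
        yield f"GRANT CONNECT ON DATABASE {database} TO {role};"
        yield f"GRANT {privileges} ON ALL TABLES IN SCHEMA public TO {role};"

if __name__ == "__main__":
    for statement in grant_statements():
        print(statement)
```

Because the statements are derived from the association rather than written by hand, a change in the group-to-database mapping automatically changes the access rights that are granted.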
3.3. Confinement

MDDF enables the designer to select different confinement levels, crucial for aligning the allocation with the required security levels, resource usage, management complexity, and other factors. In more detail, each allocation offers a distinct robustness that also depends upon the vulnerabilities in containers or virtual machines [15, 16, 17].

Mapping | Confinement Robustness
Distinct physical machines | High
Distinct VMs | Medium-High
Distinct containers | Medium
Simple database decomposition | Low

Table 1. Robustness levels of alternative database mappings.

Table 1 outlines the confinement levels, or robustness, resulting from various database allocations. Allocating databases to distinct physical machines provides the highest level of confinement, while a simple database decomposition offers the lowest. Each alternative has its pros and cons:

• Distinct Physical Machines: High safety and confinement due to physical separation, but high resource usage and management complexity.
• Distinct VMs: Good confinement with customizable resource allocation, but with a resource overhead and expertise needed for management.
• Distinct Containers: Lightweight solution with minimal overhead, but potential security risk due to the shared underlying OS.
• Simple Database Decomposition: Easier management and fewer resources, but low confinement and potential data integrity issues.

Even if an intrusion can attack the second and the third allocations by exploiting vulnerabilities in VMs or in containers, all the allocations that MDDF supports offer a better robustness to intrusions than a simple transformation to the third normal form. The choice of the proper allocation should align with the specific requirements of the system of interest, balancing confinement, resource efficiency, and management complexity.

4. An Example

This example considers a scenario where a healthcare organization, i.e. a hospital, manages information about operators, patients, prescriptions, and the related administration. The scenario assumes that, initially, the hospital uses a centralized database whose tables record patient details and prescriptions and all the data to satisfy the needs of medical staff and patients. The overall database management system ensures medical professionals have access to the proper patient information, while patients can conveniently retrieve their own medical records and prescription details. Further users exist according to the various roles in the organization. Among the possible groups of users, we also include statisticians and the corresponding operations. The administrative staff manages costs, while statisticians analyze aggregated data without compromising individual patient identities. When computing statistics, the names of the patients are replaced with pseudonymous IDs, ensuring a high level of privacy, and a mapping table simplifies the association between these pseudonymous IDs and patients.
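A minimal sketch of this pseudonymization step follows, assuming SQLite; the table names anticipate those defined in Sect. 4.1, while the column types and the random-ID scheme are assumptions:

```python
import sqlite3
import uuid

# In-memory stand-in for the hospital database; only the table names come
# from the scenario, the rest is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MappingTable (PatientID TEXT PRIMARY KEY,
                               RealIdentityID INTEGER);
    CREATE TABLE PatientTable (PatientID TEXT PRIMARY KEY,
                               MedicalHistory TEXT);
""")

def pseudonymize(real_identity_id, medical_history):
    """Generate a pseudonymous ID on the first visit, and keep the
    identity mapping apart from the clinical record."""
    patient_id = uuid.uuid4().hex  # random: reveals nothing about the person
    conn.execute("INSERT INTO MappingTable VALUES (?, ?)",
                 (patient_id, real_identity_id))
    conn.execute("INSERT INTO PatientTable VALUES (?, ?)",
                 (patient_id, medical_history))
    return patient_id

pid = pseudonymize(42, "hypertension, diagnosed 2023")
print(pid)  # statisticians only ever see this value, never the identity
```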
4.1. Users and Database

The Healthcare Database (HealthDB) of the organization is designed to meet the different needs of its users, including Patients, Medical Doctors, Nurses, Administrative Staff, and Statisticians.

Users:

• Patients (Patients): This group requires read access to their medical records, prescription details, exam results, and associated costs. Access is facilitated through a pseudonymous ID for privacy.
• Medical Doctors (Doctors): Medical professionals need read and write access to patient information, medical history, and the ability to prescribe medications and order exams.
• Nurses (Nurses): Nurses require read and write access to patient information, medical history, and the ability to record exam results and administer prescribed medications.
• Administrative Staff (Admins): This group focuses on write access to the CostsTable for billing purposes. They may also have read access to other relevant information.
• Statisticians (Statisticians): Statisticians have read-only access to aggregated data from the PatientTable for statistical analysis. They cannot view sensitive information like names and addresses.

Database:

• Healthcare Database (HealthDB): This is the centralized database that stores the information of interest in the following tables (a schema sketch follows this list):
  – Patient Information Table (PatientTable): Includes pseudonymous IDs (PatientID), comprehensive medical history (MedicalHistory), and the IDs of assigned doctors (AssignedDoctorID) and nurses (AssignedNurseID). A mapping table (MappingTable) manages the link between real patient identities and pseudonymous IDs, where each ID is automatically generated on the first hospital visit.
  – Sensitive Data Table (SensitiveDataTable): Contains sensitive patient information such as real names (RealName), addresses (Address), and details of who pays the bill (PayerDetails). It is linked to PatientTable via pseudonymous IDs.
  – Prescription Records Table (PrescriptionTable): Stores data related to prescriptions, including unique prescription IDs (PrescriptionID), details of prescribed medications (MedicationDetails), dosage (Dosage), prescribing doctor IDs (PrescribingDoctorID), and associated costs (AssociatedCost).
  – Exam Records Table (ExamTable): Stores information about tests and exams, including unique IDs (ExamID) and test and exam results (Results).
  – Costs Table (CostsTable): Manages the costs of medicines and exams, with transaction IDs (TransactionID), item descriptions (Item), and associated costs (Cost). Accessed by the administrative staff for billing purposes.
  – Mapping Table (MappingTable): Simplifies the mapping between pseudonymous patient IDs (PatientID) and their real identities (RealIdentityID).
  – Statistics Table (StatisticsTable): Stores aggregated and de-identified data from the PatientTable, accessible to statisticians for analysis. Includes aggregated data IDs (AggregatedDataID) and de-identified aggregated data (AggregatedData).
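The following sketch renders this description as SQLite DDL. Only the table and attribute names come from the list above; the column types and key constraints are assumptions:

```python
import sqlite3

# Sketch of the centralized HealthDB schema described above.
conn = sqlite3.connect("health_db.sqlite")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS PatientTable (
        PatientID TEXT PRIMARY KEY, MedicalHistory TEXT,
        AssignedDoctorID TEXT, AssignedNurseID TEXT);
    CREATE TABLE IF NOT EXISTS SensitiveDataTable (
        PatientID TEXT REFERENCES PatientTable(PatientID),
        RealName TEXT, Address TEXT, PayerDetails TEXT);
    CREATE TABLE IF NOT EXISTS PrescriptionTable (
        PrescriptionID TEXT PRIMARY KEY, MedicationDetails TEXT,
        Dosage TEXT, PrescribingDoctorID TEXT, AssociatedCost REAL);
    CREATE TABLE IF NOT EXISTS ExamTable (
        ExamID TEXT PRIMARY KEY, Results TEXT);
    CREATE TABLE IF NOT EXISTS CostsTable (
        TransactionID TEXT PRIMARY KEY, Item TEXT, Cost REAL);
    CREATE TABLE IF NOT EXISTS MappingTable (
        PatientID TEXT PRIMARY KEY, RealIdentityID TEXT);
    CREATE TABLE IF NOT EXISTS StatisticsTable (
        AggregatedDataID TEXT PRIMARY KEY, AggregatedData TEXT);
""")
conn.close()
```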
4.2. MDDF Implementation Steps

The implementation of MDDF adopts a systematic approach to satisfy the unique needs of the user groups of the healthcare system. The key steps involve identifying users and their specific needs, computing the access permissions to satisfy these needs, and ensuring secure data management within the Healthcare Database (HealthDB). In the following, we delve into a short, high-level overview of these crucial implementation steps.

1. Identify Users and Their Needs:

a) Patients
• Needs:
  – Read access to their own medical records (MedicalHistory in PatientTable).
  – Read access to their prescription details (MedicationDetails, Dosage in PrescriptionTable).
  – Read access to their exam records (Results in ExamTable).
• Required Access: Read access to PatientTable, PrescriptionTable, ExamTable in the Healthcare Database (HealthDB).

b) Doctors
• Needs:
  – Read and write access to patient information for assigned patients (PatientTable).
  – Write access to prescribe medications (PrescriptionTable).
  – Write access to order exams (ExamTable).
• Required Access: Read and write access to PatientTable, DoctorsTable, PrescriptionTable, ExamTable in the Healthcare Database (HealthDB).

c) Nurses
• Needs:
  – Read and write access to patient information for assigned patients (PatientTable).
  – Write access to record exam results (ExamTable).
  – Write access to administer prescribed medications (PrescriptionTable).
• Required Access: Read and write access to PatientTable, NursesTable, ExamTable, PrescriptionTable in the Healthcare Database (HealthDB).

d) Administrative Staff (Admins)
• Needs:
  – Write access to the CostsTable for billing purposes (CostsTable).
• Required Access: Write access to CostsTable in the Healthcare Database (HealthDB).

e) Statisticians
• Needs:
  – Read-only access to aggregated data from the PatientTable for statistical analysis (StatisticsTable).
• Required Access: Read-only access to StatisticsTable in the Healthcare Database (HealthDB).

2. Create Database Subsets

This step produces the distinct database subsets of HealthDB for the various user groups: MedDB for the medical staff, PatientDB for patients, AdminDB for the administrative staff, and StatDB for statisticians. Since each database only includes the relevant tables and fields the designated user roles require, we adhere to the principle of least privilege. The structure of each database the decomposition returns is listed below, followed by a sketch that materializes these subsets:

Subset MedDB: This subset includes only the tables that in the main database are relevant to the Medical Staff. Key tables comprise:
• PatientTable: PatientID, MedicalHistory, AssignedDoctorID, AssignedNurseID
• DoctorsTable: DoctorID, Name, Specialty
• NursesTable: NurseID, Name, AssignedPatients
• ExamTable: ExamID, Results
• PrescriptionTable: PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID
• MedicineCostsTable: TransactionID, Item, Cost

Subset PatientDB: This subset includes only the tables in the main database relevant to Patients. Key tables include:
• PatientTable: PatientID, MedicalHistory, AssignedDoctorID, AssignedNurseID
• SensitiveDataTable: PatientID, RealName, Address, PayerDetails
• ExamTable: ExamID, Results
• PrescriptionTable: PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID

Subset AdminDB: This subset includes only the tables from the main database relevant to the Administrative Staff. Key tables comprise:
• CostsTable: TransactionID, Item, Cost

Subset StatDB: This subset includes only the main database tables relevant to Statisticians. Key tables comprise:
• StatisticsTable: AggregatedDataID, AggregatedData

These subsets ensure that each user group can access all and only the information the corresponding roles require, according to the principle of least privilege.
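As mentioned above, the following sketch materializes these subsets by projecting the listed columns out of the HealthDB schema sketched in Sect. 4.1. For brevity it covers only the tables defined in that sketch (DoctorsTable, NursesTable, and MedicineCostsTable would be handled in the same way), and the file names are illustrative:

```python
import sqlite3

# Which tables, and which of their columns, each subset keeps; taken from
# the listing above. The builder itself is an illustrative sketch.
SUBSETS = {
    "med_db.sqlite": {
        "PatientTable": "PatientID, MedicalHistory, AssignedDoctorID, AssignedNurseID",
        "ExamTable": "ExamID, Results",
        "PrescriptionTable": "PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID",
        "CostsTable": "TransactionID, Item, Cost",  # stands in for MedicineCostsTable
    },
    "patient_db.sqlite": {
        "PatientTable": "PatientID, MedicalHistory, AssignedDoctorID, AssignedNurseID",
        "SensitiveDataTable": "PatientID, RealName, Address, PayerDetails",
        "ExamTable": "ExamID, Results",
        "PrescriptionTable": "PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID",
    },
    "admin_db.sqlite": {"CostsTable": "TransactionID, Item, Cost"},
    "stat_db.sqlite": {"StatisticsTable": "AggregatedDataID, AggregatedData"},
}

def build_subsets(source="health_db.sqlite"):
    """Materialize each subset database by projecting the listed columns."""
    for path, tables in SUBSETS.items():
        dst = sqlite3.connect(path)
        dst.execute(f"ATTACH DATABASE '{source}' AS src")
        for table, columns in tables.items():
            dst.execute(f"CREATE TABLE {table} AS SELECT {columns} FROM src.{table}")
        dst.commit()
        dst.execute("DETACH DATABASE src")
        dst.close()

build_subsets()
```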
3. Associating User Groups and Database Subsets:

The one-to-one association between a user group and one of the databases the decomposition returns is the fundamental input for the next step, which defines the user access rights.

• Define User-Subset Mapping: Each user group is associated with a database subset: Medical Staff is mapped into the MedDB subset; Patients are mapped into the PatientDB subset; Doctors are mapped into the MedDB subset (with access to fields like DoctorID, Name, Specialty); Nurses are mapped into the MedDB subset (with access to fields like NurseID, Name, AssignedPatients); Administrative Staff is mapped into the AdminDB subset (with access to fields like TransactionID, Item, Cost); and Statisticians are mapped into the StatDB subset (with access to fields like AggregatedDataID, AggregatedData).
• Use Mapping for Access Control: Starting from the mappings previously defined, we can configure the access control mechanisms. The mapping implies the assignment of the corresponding access rights and permissions to each user in a user group on the corresponding database subset.
• Regularly Update Mapping: User groups should be reviewed with a fixed frequency because the group of a user can be updated to take into account changes in roles, responsibilities, and database structure. This ensures that access control remains aligned with the organizational requirements.

The overall decomposition process ensures that the mapping between users and database subsets is clearly defined, access control mechanisms are implemented according to this mapping, and updates occur to adapt to organizational changes.

4. Configure Access Mechanisms:

After creating the database subsets, it is essential to define who has access to each subset and what level of access is allowed:

a) Patients should have read access to the PatientDB subset, allowing them to view and access their pseudonymous medical records, prescription details (fields: PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID), and exam records (fields: ExamID, Results).
b) Doctors should have read and write access to the MedDB subset, similar to the Medical Staff, with the additional ability to prescribe medications (fields: PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID) and order exams (fields: ExamID, Results).
c) Nurses should have read and write access to the MedDB subset as Medical Staff, but with the additional ability to record exam results (fields: ExamID, Results) and administer prescribed medications (fields: PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID).
d) Administrative Staff (Admins) should have write access to the AdminDB subset, to enable them to manage the costs associated with medicines and exams (fields: TransactionID, Item, Cost).
e) Statisticians should have read-only access to the StatDB subset, to enable them to analyze aggregated and de-identified data from the PatientTable for statistical purposes (fields: AggregatedDataID, AggregatedData).

After defining the relationships between the user groups and the various database subsets, some further procedures have to be adopted to guarantee security and data management. Below, we list some further details of each procedure:

• Authentication and Authorization: It ensures that every user is authenticated, allowing only authorized users to access the data in their respective subsets. This may include the use of usernames and passwords, two-factor authentication, or other secure authentication methods. Additionally, it is crucial to assign specific roles and permissions based on user responsibilities.
For example, Medical Staff and Doctors must have the role "Medical Professional" with full permissions for the MedDB subset, while Patients must have appropriate read access roles for the PatientDB subset. Nurses and Administrative Staff must also be assigned roles with the relevant permissions.

• Auditing and Monitoring: It implements an audit system to track who accesses the data and what operations are invoked. This system helps to detect suspicious activities or unauthorized access. As an example, it is critical to record who accesses the tables in the MedDB and AdminDB subsets and to track the updates to these data.

These procedures ensure that an organization manages data and security in the proper way, according to the specific needs and responsibilities of each type of user.

5. Synchronize Shared Tables:

This step identifies shared tables, i.e. tables that appear in distinct database subsets. In this example, multiple copies of several tables exist. Hence, it is essential to synchronize the updates of these tables to ensure data consistency and integrity despite replication. When multiple users or divisions can update a shared table, the following steps occur (a sketch of the propagation follows this list):

• Identify Common Tables: This first step has to identify the tables containing data shared among subsets. For instance, in our healthcare organization, tables such as PatientTable (fields: PatientID, MedicalHistory, AssignedDoctorID, AssignedNurseID) and ExamTable (fields: ExamID, Results) are shared between the MedDB and PatientDB subsets, and synchronization is crucial to avoid data inconsistencies.
• Implement Synchronization Rules: It defines the rules to implement data synchronization among subsets. These rules cover scenarios such as updates, insertions, and deletions. For example, when the medical staff updates patient data in the MedDB subset, these changes must be reflected in the PatientDB subset so that patients have access to the updated information. Similar synchronization rules have to be established for the tables shared among other subsets.
• Plan Synchronization: It plans when data synchronization occurs. For instance, updates to critical data should be propagated immediately to ensure alignment, while less critical data can tolerate a weaker consistency. As an example, the data update for the administrative staff can occur once a day, usually at night, and a weekly update may be appropriate for statisticians.

The synchronization strategy should consider the needs of all user groups, including Medical Staff, Patients, Doctors, Nurses, and Administrative Staff. The choice of the synchronization strategy should tune the synchronization overhead to the urgency and relevance of the data for each user group while preserving consistency across the different subsets and assuring the integrity of all the organization's data.
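As announced above, here is a minimal sketch of this propagation, continuing the SQLite examples. Since SQLite triggers cannot reference tables in other attached databases, the sketch propagates the update at the application level; the paper's actual Trigger-API implementation is not shown here, and the file names are hypothetical:

```python
import sqlite3

# The database files holding a copy of the shared table; the paths match
# the subset-building sketch above.
REPLICAS = {"ExamTable": ["med_db.sqlite", "patient_db.sqlite"]}

def record_exam_result(exam_id, results):
    """Apply one update to every subset that shares ExamTable, so that
    doctors, nurses, and patients all keep seeing the same data."""
    for path in REPLICAS["ExamTable"]:
        conn = sqlite3.connect(path)
        conn.execute("UPDATE ExamTable SET Results = ? WHERE ExamID = ?",
                     (results, exam_id))
        conn.commit()
        conn.close()

record_exam_result("E-17", "negative")
```

In a DBMS with cross-database triggers or a replication tool such as SymmetricDS (Sect. 5), the same propagation can be delegated to the database layer instead of the application.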
6. Optimizing Confinement Robustness

The optimization of the confinement robustness involves the definition of the degree of physical and logical separation among the database subsets according to data sensitivity and security needs. The steps to follow are:

• Data Classification: It classifies data based on its sensitivity level. For example, personal information (RealName, Address, PayerDetails) and medical history (MedicalHistory) should be classified as highly sensitive, while public or non-sensitive data are classified as less critical.
• Assigning Security Levels: It assigns a security level to each database subset based on the data classification. For instance, the financial database (CostsTable) requires the highest security level, while the other databases may require a lower one.
• Allocation Choice: To show the flexibility of MDDF, we assume that the most cost-effective solution is the one that hosts the PatientDB subset on a separate physical machine, maps the other databases onto a further physical machine, and uses virtualization to separate these databases.
  – Separate Physical Machine (PatientDB): The PatientDB subset should be allocated to a dedicated physical server for maximum confinement. This is required when the most robust confinement is a system requirement.
  – Virtual Machines (VMs): The other subsets are allocated onto virtual machines (VMs) that run on another physical machine. In this way, the VMs offer logical separation within a single shared physical server. The other databases are mapped onto the VMs as follows:
    ∗ VM Prescription: A virtual machine that manages the prescription data (PrescriptionID, MedicationDetails, Dosage, PrescribingDoctorID, AssociatedCost).
    ∗ VM Exam Costs: A virtual machine devoted to the exam cost data (TransactionID, Item, Cost).
    ∗ Containerization (Mapping and Statistics): Within the "Exam Costs" VM, we allocate the MappingTable and the StatisticsTable to distinct containers. Containers offer lightweight and efficient confinement, striking a balance between complete confinement and resource efficiency.

This example outlines how MDDF can support distinct degrees of physical and logical confinement across and within physical and virtual machines, according to the security needs of each database subset.

5. Comparison of Alternative Synchronization Solutions

This section discusses the overhead of alternative solutions to synchronize tables shared among distinct database subsets. In particular, it compares the features and capabilities of two APIs based on, respectively, triggers and events. A trigger is a database mechanism that fires the execution of a code fragment as a response to predefined conditions to preserve the integrity of the database, ensuring that some actions are executed when specific changes occur. An alternative solution adopts SymmetricDS, an open-source tool that supports data replication and synchronization among databases. It is used by several companies and organizations to replicate distributed environments, such as bank branches, remote offices, or geographically separated data centers (e.g. content delivery networks). In particular, it is used by OpenMRS, a collaborative open-source project that develops software to support the delivery of health care in developing countries.

5.1. Performance Comparison of the Two Solutions

This section compares the performance of the two synchronization solutions: Trigger-API and SymmetricDS. A series of experiments has been implemented to evaluate their effectiveness under various conditions.
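The paper does not include the measurement harness, so the following sketch only illustrates how propagation latencies of the kind reported in Table 2 could be collected; the numbers it produces are not comparable to the authors' results:

```python
import statistics
import time

def measure_latency(sync_fn, runs=100):
    """Time `sync_fn` (e.g. the record_exam_result sketch above) and
    report the mean and standard deviation in milliseconds."""
    samples = []
    for i in range(runs):
        start = time.perf_counter()
        sync_fn(f"E-{i}", "pending")
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Example with a stand-in for a real synchronization call:
mean_ms, stdev_ms = measure_latency(lambda exam_id, results: None)
print(f"mean {mean_ms:.3f} ms, stdev {stdev_ms:.3f} ms")
```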
Experiment | Trigger-API | SymmetricDS
Latency (ms) - Local Configuration | 59 | 95
Latency (ms) - Remote Configuration | 181 | 258
Latency (ms) - Distributed Configuration | 800 | 2160
CPU Utilization - Light Workload | 5% | 25%
CPU Utilization - Medium Workload | 30% | 60%
CPU Utilization - Heavy Workload | 70% | 90%
Conflict Resolution (ms) - 100% Resolved | 76 | 180
Conflict Resolution (ms) - 90% Resolved | 55 | 70
Conflict Resolution (ms) - 80% Resolved | 40 | 60
Two-way Synchronization (ms) - Main to Remote | 161 | 215
Two-way Synchronization (ms) - Remote to Main | 166 | 256
Variable Load (tps) - Peak Activity | 63 | 51
Variable Load (tps) - Calm Period | 37 | 49

Table 2. Experiments and results for the two solutions.

Table 2 presents the results of various performance metrics for Trigger-API and SymmetricDS. In the "Latency" experiments, Trigger-API consistently outperforms SymmetricDS across all configurations, showing lower latency in local, remote, and distributed setups. In terms of CPU utilization, Trigger-API achieves better efficiency, mainly under light and medium workloads, with significantly lower CPU usage compared to SymmetricDS. Trigger-API offers superior performance in handling data conflicts for all percentages of resolved conflicts. Regarding two-way synchronization, Trigger-API performs better in both directions, showing lower synchronization times than SymmetricDS. According to the variable load testing, Trigger-API offers higher transaction rates during peak activity, while SymmetricDS shows slightly better performance during calm periods.

Overall, the table outlines a comprehensive comparison of the performance of Trigger-API and SymmetricDS, highlighting Trigger-API's superior performance in most scenarios.

6. Conclusions

The Multilevel Database Decomposition Framework offers several benefits, including:

• Preserved Privacy: In healthcare, where sensitive patient information is handled, database decomposition improves security by restricting each user to one subset with the data the user needs, thus protecting the privacy of individuals. This is crucial for complying with regulations such as the GDPR [1] and HIPAA [2]. The framework supports the implementation of precise access controls, ensuring that only authorized medical staff can access specific patient information.
• Reduced Impact of Security Breaches: The healthcare sector is a prime target for cyber-attacks. Database decomposition limits the impact of security breaches, as compromising one subset does not expose the patient records in distinct databases.
• Reduction of Complexity: By decomposing the database into smaller subsets, users can use simpler data structures that are easier to understand.
• Improved Performance: Access to smaller, more tailored tables and relationships improves database performance.
• Efficient Data Retrieval for Medical Research: Researchers can access the specific subsets relevant to their studies, streamlining data retrieval for research purposes. This can contribute to advancements in healthcare through data-driven insights.
• Scalability for Growing Healthcare Systems: As healthcare systems expand, the multilevel database decomposition framework allows for scalability. New subsets can be added to accommodate the growing volume of patient data.

As a counterpart, the proposed framework also presents some challenges:

• Management complexity: Managing multiple subsets may increase the complexity of database administration and require greater resources for maintenance and updates.
• Risk of data inconsistency: Database decomposition requires the adoption of proper synchronization mechanisms, with the resulting overhead.

References

[1] European Union, Data protection in the EU, 2023. URL: https://ec.europa.eu/info/law/law-topic/data-protection_en.
[2] D. L. Anthony, A. Appari, M. E. Johnson, Institutionalizing HIPAA compliance: Organizations and competing logics in U.S. health care, Journal of Health and Social Behavior 55 (2014) 108–124. URL: https://doi.org/10.1177/0022146513520431. doi:10.1177/0022146513520431. PMID: 24578400.
[3] J. H. Saltzer, M. D. Schroeder, The protection of information in computer systems, Proc. IEEE 63 (1975) 1278–1308.
[4] R. Smith, A contemporary look at Saltzer and Schroeder's 1975 design principles, IEEE Security & Privacy 10 (2012) 20–25.
[5] M. Campobasso, L. Allodi, Impersonation-as-a-service: Characterizing the emerging criminal infrastructure for user impersonation at scale, in: Proc. of the IEEE Int. Symposium on Secure Software Engineering, volume 1, IEEE, 2006, p. 1.
[6] S. A. Moiz, P. Sailaja, G. Venkataswamy, S. N. Pal, Database replication: A survey of open source and commercial tools, International Journal of Computer Applications 13 (2011) 1–8.
[7] N. Al-Sayid, D. Aldlaeen, Database security threats: A survey study, in: 2013 5th International Conference on Computer Science and Information Technology, 2013, pp. 60–64.
[8] M. Humayun, N. Jhanjhi, M. Almufareh, M. Khalil, Security threat and vulnerability assessment and measurement in secure software development, Computers, Materials and Continua 71 (2022) 5039–5059.
[9] I. Singh, N. Kumar, K. Srinivasa, T. Sharma, V. Kumar, S. Singhal, Database intrusion detection using role and user behavior based risk assessment, Journal of Information Security and Applications 55 (2020) 102654.
[10] S. Ibrahim, A. Zengin, S. Hizal, A. Suaib Akhter, C. Altunkaya, A novel data encryption algorithm to ensure database security, Acta Infologica 7 (2023) 1–16.
[11] A. Cuzzocrea, H. Shahriar, Data masking techniques for NoSQL database security: A systematic review, in: 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 4467–4473.
[12] I. Linkov, F. Baiardi, M. V. Florin, S. Greer, J. H. Lambert, M. Pollock, B. D. Trump, Applying resilience to hybrid threats, IEEE Security & Privacy 17 (2019) 78–83.
[13] P. Yang, N. Xiong, J. Ren, Data security and privacy protection for cloud storage: A survey, IEEE Access 8 (2020) 131723–131740.
[14] M. Binjubeir, A. A. Ahmed, M. A. B. Ismail, A. S. Sadiq, M. Khurram Khan, Comprehensive survey on big data privacy protection, IEEE Access 8 (2020) 20067–20079.
[15] J. Wu, et al., An access control model for preventing virtual machine escape attack, Future Internet 9 (2017) 20.
[16] A. Y. Wong, et al., On the security of containers: Threat modeling, attack analysis, and mitigation strategies, Computers & Security 128 (2023) 103140.
[17] S. Shringarputale, P. McDaniel, K. Butler, T. La Porta, Co-residency attacks on containers are real, in: Proc. of the 2020 ACM SIGSAC Conf. on Cloud Computing Security Workshop, ACM, 2020, pp. 53–66.