Mapping from Relational Database to Ontology Based on Initial Model Vladislav V. Moiseeva , Ivan A. Zagaichuka a Ulyanovsk State Technical University, Ulyanovsk, Russia Abstract The article describes the ontology compilation from the relational database. The process is based on the model called initial. The model is applicable for the design and redesign of existing software products and tuning of the final ontology, including its size reduction. The described model allows obtaining data from a relational schema. The data can be subsequently used for other more flexible formalizations. Moreover, the model can be extended by new parameters which cannot be derived from the schema but be quite useful for design tasks. The paper brings in a short overview of different approaches to ontology compilation based on relational data sources. Additionally, two extra strategies are presented. Finally, the article contains an example of these strategies implementation which is measured by performance tests on two different data sources. Keywords relational data schema, data model, ontology, OWL, OWL API 1. Introduction The new redesign methods are highly appreciated due to the increasing complexity of software systems. Moreover, new methods are expected to be more performed and optimal. Initially, mathematical logic and computer science formed an idea of a formal subject defi- nition [1]. Thus, the theory of ontologies started. The ontology is a specific formalization that can define relationships and contradictions between the concepts and other elements of subject domain. The ontologies are general-purpose structures that can be a base for other types of domain models. For instance, it can be an object-oriented UML schema, relational model, or another kind of formalization that is necessary for information systems. Additionally, ontolo- gies help to build a solid integration data system. Nowadays, there are many text data exchange standards that exist, for example, the SOAP protocol. The SOAP’s schema is usually described by XSD-documents [2]. Axiomatics makes ontologies useful for the design and redesign of software products. Nevertheless, there are many software products that have no subject domain ontologies. That can be explained as a lot of products are not matured and their domains seem to be in- ternally consistent. As a result, the need for ontology might be unnecessary action. However, Russian Advances in Artificial Intelligence: selected contributions to the Russian Conference on Artificial intelligence (RCAI 2020), October 10-16, 2020, Moscow, Russia " v.v.moiseev@ulstu.ru (V.V. Moiseev); zagaichuk.ivan@gmail.com (I.A. Zagaichuk)  0000-0003-2671-4653 (V.V. Moiseev); 0000-0002-4721-5960 (I.A. Zagaichuk) © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) the need appears and the initial data required for the compilation of ontology lies in the rela- tional data storage. Thus, relational storage can be converted to ontological resource due to the need of resolving contradictions within the redesign process. However, the conversion might be complicated due to the variability between the initial data source and the result [3]. For example, the process of relational database conversion into ontology (using approached [4, 5]) can be compromised by several reasons: • It seems not possible to extract all data about the subject domain from the relational model; • Some elements of the relational model are redundant; • Each relational schema has features of model representation. The data extraction of a subject domain from a relational database is high-loaded for large information systems and represents an additional difficulty. Therefore, to simplify the redesign process by ontological analysis, it is recomended to define the model that provides several abilities of: • The model transformation into an ontological resource; • The generation of other formalizations besides ontology; • The independence of the implementations of relational databases; • The exclusion of some entities and other elements of a relational data schema without losing model consistency; • Supplementation of the relational model with additional properties that may be useful for the compilation of an ontological resource. This model is called an initial because it is a base for further ontology compilation of subject domain. 2. The overview of ontology compilation strategies based on relational databases There are many strategies for transforming relational schemas into an ontology (e.g. the con- cept of direct mapping [8, 9]). It describes the matching of relation schema structures to meta- data description technology RDF (Resource Description Framework) developed by W3C. The formal mapping description is also given. The fine-tuning of the initial and result data set cannot be provided by the model. That fully describes direct mapping. W3C develops another technology named as R2RML [10] for relational schema transforma- tion into metadata. The schema helps provide the fine-tuning in a comparison with direct map- ping. The W3C recommendation does not contain the formal language description of R2RML. However, it has various examples of data conversion. There are also works describing the transformation of relational data schemas in ontology, and not just in RDF metadata. For example, [11] gives an overview of current developments in such a transformation. This article also undercovers the main types of entities and the basic rules of transformation and its own data transformation strategy. The following entity types can be distinguished according to [11]: • The strong entities without foreign keys (references); • The strong entities with foreign keys; • The weak entities: – Foreign and primary keys are the same; – The composite primary key that contains various foreign keys for all fields; – The composite primary key is and the attributes of the initial entity are not dupli- cated in referenced ones; – The composite primary key is and some simple attributes can be duplicated in ref- erenced entities. The categories can be used for the correct definition of transformation rules. For instance, the entities that have not foreign keys can be converted into ordinary ontology classes. It is also necessary to take into account the constraints of entities. For example, the NOT NULL constraint can be specified by MinCardinality in OWL notation, the UNIQUE constraint can be assigned by InverseFunctionalProperty [11]. Some transformation algorithms from relational DBMS to ontology (e.g. [5]) propose a phased transition: classification of tables, their mapping, mapping columns to data properties, map- ping relationships and mapping constraints. 3. The description of the initial data model for formation of ontologies The initial data model for ontology compilation is denoted as 𝑀. Thus, the model can be expressed as: 𝑀 =< 𝐸, 𝑅 > (1) where: 𝐸 = {𝐸1 , 𝐸2 , … , 𝐸𝑛 } - entities of relational schema 𝑅 = {𝑅1 , 𝑅2 , … , 𝑅𝑚 } - the relations between entities 𝑛 - the number of entities 𝑚- the number of relations The entity can be described as: 𝐸𝑖 = (𝑁 𝑎𝑚𝑒, 𝐴, 𝐶) (2) where: 𝐴 = {𝐴1 , 𝐴2 , … , 𝐴𝑛 } - many attributes of entity 𝑖 𝐶 = {𝐶1 , 𝐶2 , … , 𝐶𝑛 } - many constraints of entity 𝑖 𝑛 - the number of attributes of entity 𝑖 The entity attribute can be presented as: 𝐴𝑖 = (𝑁 𝑢𝑚, 𝑁 𝑎𝑚𝑒, 𝑇𝑦𝑝𝑒, 𝑃) (3) where: 𝑁 𝑢𝑚 is an ordered number of attribute i within the cortege of entity attributes 𝑁 𝑎𝑚𝑒 - the name of attribute which must be unique within the entity 𝑇𝑦𝑝𝑒 = [𝐼 𝑛𝑡, 𝐹 𝑙𝑜𝑎𝑡, 𝐷𝑜𝑢𝑏𝑙𝑒, 𝐶ℎ𝑎𝑟, 𝐷𝑒𝑐𝑖𝑚𝑎𝑙, 𝐵𝑖𝑡] - data type 𝑃 = {𝑃1 , 𝑃2 , … , 𝑃𝑛 } - many additional attribute properties Additional attribute properties may be as follows: • maximum length (for string fields); • maximum number of digits (for integer and decimal numbers); • number of decimal places (for decimal numbers); • text description. The constraint of entity 𝑃 can be written as: 𝑃𝑖 = 𝐹 (𝐶) (4) where: 𝐶 - many attributes that constraint was applied to 𝐹 - the function that defines the type of constraint The constraints can be described as follows: • 𝐹𝑃 𝐾 is a primary key constraint; • 𝐹𝑁 𝑁 is a NOT NULL constraint; • 𝐹𝑈 𝑄 is a unique constraint; • 𝐹𝐹 𝐾 is a foreign key constraint. The entity relations can be described as: 𝑅 = 𝐷(𝐹𝐹 𝐾 ,𝑖 , 𝐹𝑃𝐾 ,𝑗 , 𝑅𝑃) (5) where: 𝐷 is a function that describes the type of relation; 𝐹𝐹 𝐾 ,𝑖 is a foreign key constraint of entity 𝑖; 𝐹𝑃𝐾 ,𝑗 is a primary key constraint of entity 𝑗; 𝑅𝑃 = {𝑅𝑃1 , 𝑅𝑃2 , … , 𝑅𝑃𝑛 } - many of auxiliary parameters of relation. The conclusion that is based on (2), (3) and (5) states the definition of the R relation can be simplified as: 𝑅 = 𝐸𝑖 ◦𝑅 𝐸𝑗 ∶ {𝐸𝑖 , 𝐸𝑗 } ⊆ 𝐸 (6) where: ◦𝑅 is the operation of binding entities 𝐸𝑖 and 𝐸𝑗 . It is proposed to record the resulting model using the XML markup language using the fol- lowing rules: • 𝑆𝑐ℎ𝑒𝑚𝑎 is a key element of the model; • 𝐸𝑛𝑡𝑖𝑡𝑦 is the descriptive element; • 𝐶𝑜𝑙𝑢𝑚𝑛 is an attribute for description; • 𝑆𝑐ℎ𝑒𝑚𝑎 and Name mean the names of entities, attributes, and constraints; • 𝐼 𝑠𝑃𝑟𝑖𝑚𝑎𝑟 𝑦𝐾𝑒𝑦 is an attribute of a primary key constraint; • 𝐼 𝑠𝑁 𝑢𝑙𝑙𝑎𝑏𝑙𝑒 is NOT NULL constraint; • 𝐹 𝑜𝑟𝑒𝑖𝑔𝑛𝐾𝑒𝑦 is a constraint for the foreign keys. 4. Ontology compilation strategies There are two proposed options for the ontology compilation: 1. Filling the basic ontology of a relational data warehouse by individuals. 2. Ontology creation with entity classes and subsequent filling by individual records. The first approach is the simplest but least flexible. In this case, there is a basic ontology with the classes “entity”, “relationship”, “entity field”, “procedure”, “procedure argument”, “presen- tation”, etc. Object relations are set for the classes. For example, the basic transitive object property “is part” is defined, which is used for relationships between entities and their fields, as well as procedures with their own arguments. The particular elements of the relational schema are filled into the ontology as individuals (individuals) with the axioms of class membership (Class Assertion). This approach has a significant ability to represent any relational schema while expanding the initial ontology. This strategy allows for making similar ontologies from software products of the same subject domains. Thus, they will differ only by sets of individuals and related class axioms. Finally, that helps to simplify the further analysis for integration systems design. The main drawback of this approach appears while expanding the basic ontology. In other words, a software update is required to provide new individuals in accordance with the initial schema. Moreover, this approach does not support all the opportunities provided by ontolo- gies. For example, it is difficult to search for contradictions because the relationships between entities are represented as individuals, and not as object properties with their attributes. The second proposed strategy differs from the first that the initial data model is used as a base for a separate ontology resource that is not associated with any basic one. In this case, the ontology classes are entities, procedures, and other elements of the initial relational schema. The entity relations convert into object properties, the fields of entities, and procedure argu- ments are into the data properties. There is also possible to use relational data records to fill the ontology that are used as individuals for conducting an ontological resource solvability. The vantage of this strategy is the flexibility and application of the entire ontological tools. Although the relational schema does not support all required data for the ontology compila- tion, this can be achieved by initial schema extension. For example, it is possible to define the attributes (e.g. transitivity, symmetry, etc.) of object properties for the initial schema. The re- sulting ontology will most fully describe the domain already at the initial stage of compilation. The disadvantage of this strategy is the need to supplement the initial schema after it is obtained from the relational database or to refine the resulting ontology after it is get. 5. The implementation of the proposed strategies Since the chosen strategies include two stages (the formation of the initial model and the com- pilation of the ontology), the implementation was also divided into two stages. Firstly, a generator and designer of the initial data model have been developed with Visual C# and Windows Forms technology that allows rapid UI development. Microsoft SQL Server 2012 was chosen as a data management tool based on the relational data paradigm. The DBMS allows easy access to the description of the data schema. However, the choice of a particular DBMS was up to the experience of the authors, rather than the significant advantages of the current DBMS system over analogues. The serialization of the initial model is performed using the XML language with an attributive approach to the description of the properties of elements. That also helped to simplify the processes of work with the data model and reduced size of serialized data. Java and OWL API libraries were used for the generation of ontologies. The choice is justified by the prevalence, simplicity, and open-source type. The software helps to deserialize the initial models and generates, based on settings, the ontology in OWL XML. The format was chosen due to its readability, prevalence, and convenient debugging and overview tools (protégé and others). The basic ontology must be opened for the implementation of the first strategy. It is done using the following code: OWLOntologyManager manager = OWLManager.createOWLOntologyManager(); OWLOntology ontology = manager.loadOntologyFromOntologyDocument( new File("sql-base-ontology-1.0.0.owl")); OWLDataFactory df = manager.getOWLDataFactory(); The following are the main classes of the basic ontology (parts of duplicated code omitted): final String ontologyIRI = "urn:vladdy-moses.sql-base-ontology"; OWLClass entityTableType = df.getOWLClass(ontologyIRI + "#Table"); // ... OWLDataProperty schemaDataProp = df.getOWLDataProperty( ontologyIRI + "#schema_name"); // ... A named individual is created for each entity of relational storage and the following set of axioms is applied to it: • 𝐷𝑎𝑡𝑎𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to determine the data schema and entity name; • 𝐶𝑙𝑎𝑠𝑠𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to assign an entity to an ontology class (for example, an entity can be a table, view, or procedure). A named individual is created for each entity field and the following set of axioms is deter- mined on it: • 𝐷𝑎𝑡𝑎𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to determine the field name; • 𝐶𝑙𝑎𝑠𝑠𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to assign an individual to a class of entity fields; • 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to associate a field with an entity. Each relationship between entities (defined by a foreign key in the traditional schema) is also represented by a named individual with the following set of axioms: • 𝐷𝑎𝑡𝑎𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to determine the data schema and name of the relationship; • 𝐶𝑙𝑎𝑠𝑠𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to assign the relationship to an ontology class; • 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to describe whether a relation belongs to an entity field on the one hand and to the primary key of an external entity on the other. The new resource will be created in a similar way as in the case with the secondly proposed strategy. The class with axiom DeclarationAxiom is instantiated for each entity. Each entity’s property is converted to the data property and supports the next axioms: • 𝐷𝑒𝑐𝑙𝑎𝑟𝑎𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚; • 𝐷𝑎𝑡𝑎𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐷𝑜𝑚𝑎𝑖𝑛𝐴𝑥𝑖𝑜𝑚 is used to determine the domain of the data property. Each relationship of the relational model is converted into an object property with the fol- lowing axioms: • 𝐷𝑒𝑐𝑙𝑎𝑟𝑎𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚; • 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐷𝑜𝑚𝑎𝑖𝑛𝐴𝑥𝑖𝑜𝑚 is used to determine the domain of an object property. This strategy allows generating individuals for each database record based on this relational schema. In this case, each record has an individual with the following axioms: • 𝐶𝑙𝑎𝑠𝑠𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used to associate a record with a specific class that describes the entity; • 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used for foreign keys; • 𝐷𝑎𝑡𝑎𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦𝐴𝑠𝑠𝑒𝑟𝑡𝑖𝑜𝑛𝐴𝑥𝑖𝑜𝑚 is used for the calculation of each non-null field. The early presented strategies saving procedures coincide with the saving of the ontology and occurs as follows: OWLOntologyManager manager = OWLManager.createOWLOntologyManager(); OWLDocumentFormat documentFormat = new OWLXMLDocumentFormat(); manager.saveOntology(ontology, documentFormat, IRI.create(new File("result.owl"))); 6. Generation result It is necessary to take several databases from various subject areas to test the operability of the proposed ontology strategies compilation from a relational data schema. The data of addresses and houses from the Russian Federal Information Address System, Ulyanovsk Oblast (denoted as 𝐷𝐵1) and data of overhaul in Ulyanovsk region (denoted as 𝐷𝐵2) will be used. Both relational storages correspond to 3rd normal form. Therefore, there would not be any difficulties with the compilation of ontologies. Table 1 The characteristics of relational storages. Parameter DB1 DB2 The number of tables 2 15 The number of fields 67 338 The number of foreign keys 5 19 The number of records 651,883 2,126,054 Data DBMS size 159,936 Kb 355,176 Kb Table 2 The result of ontology generation. Parameter DB1-ST1 DB1-ST2 DB2-ST1 DB2-ST2 Ontology compilation time, s 2 36 3 93 Ontology saving time, s <1 21 <1 48 The size of Ontology (OWL XML format), Kb 114 212,687 204 569,827 Number of classes 13 2 13 15 The number of individuals 2 651,883 2 2,126,054 Number of data properties 180 2,065,082 784 5,645,010 Number of object properties 146 2,045,211 544 1,785,652 The characteristics of relational storages are presented in table 1. After the initial data received, the schemas 𝐼 𝑆1 and 𝐼 𝑆2 are formed and based on 𝐷𝐵1 and 𝐷𝐵2 respectively. Their serialized sizes are 35 Kb and 86 Kb respectively. The 𝑆𝑇 1 and 𝑆𝑇 2 strategies were used to generate ontologies and the results are in table 2. As it is seen, the size of the compiled ontology many times more than data in DBMS format (varying from 1.33 to 1.6). Firstly, it depends on the OWL XML format that keeps each of the axioms as an element of XML language and stores parameters as nested elements. However, other formats were not capable to minimize the size of the final ontology. Obviously, the 𝑆𝑇 2 strategy (assumes generation of entity classes and individual records) increases the ontology compilation time more than the 𝑆𝑇 1. It is due to the fact that data extraction from the relational database requires additional time for setting up a connection and execution of SQL requests. The saving time stays linear to the size of ontology. 7. Conclusion The proposed data model based on ontologies is useful for software products redesign. The model does not rely on the physical implementation of data storage. Moreover, it owns such properties as unity and flexibility which help to build ontologies on its base. The described strategies that are used for ontology compilation from a relational database helps to set up this process for systems based on relational data storage. The strategies are both applicable for the implementation of systems’ interfaces and for ontology compilation. The ontology can help to find contradictions within schema and data of the database. References [1] Guarino N. Formal ontology in information systems: Proceedings of the first international conference (FOIS’98), June 6-8, Trento, Italy. (1998). pp. 3-15. [2] Simple Object Access Protocol (SOAP) 1.1, W3C Note 08 May 2000, Avaliable at: https://www.w3.org/TR/2000/NOTE-SOAP-20000508/. [3] Yarushkina N., Moiseev V. Analytical Review of Data Transformation for the Task of Inte- grating Various Representations on the Example of Ontologies and Relational Databases. CEUR Workshop Proceedings, Vol. 2413 (2019). pp. 191-197. [4] Bumans G. Mapping between Relational Databases and OWL Ontologies: an Example, Avaliable at: https://www.lu.lv/materiali/apgads/raksti/756_pp_99-117.pdf. [5] Astrova A., Kalja A. Mapping of SQL Relational Schemata to OWL Ontologies, Avaliable at: http://wseas.us/e-library/conferences/2006elounda1/papers/537-193.pdf. [6] Freitas, Ricardo André Pereira. "Relational databases digital preservation." (2013), Avaliable at: https://repositorium.sdum.uminho.pt/bitstream/1822/25655/1/ Ricardo%20Andr%C3%A9%20Pereira%20Freitas.pdf [7] .edmx File Overview (Entity Framework), Avaliable at: https://docs.microsoft.com/en- US/previous-versions/dotnet/netframework-4.0/cc982042(v=vs.100), last accessed 2019/04/05. [8] A Direct Mapping of Relational Data to RDF, http://www.w3.org/TR/rdb-direct-mapping/. [9] Sequeda1 J., Arenas M., Miranker D.: A Completely Auto- matic Direct Mapping of Relational Databases to RDF and OWL, https://www.researchgate.net/publication/267232319_A_Completely_Automatic _Direct_Mapping_of_Relational_Databases_to_RDF_and_OWL. [10] R2RML: RDB to RDF Mapping Language, https://www.w3.org/TR/r2rml/#dfn-r2rml- mapping-document. [11] Louhdi M., Behja H., Alaoui S.: Transformation Rules For Building OWL Ontologies from Relational Databases, https://airccj.org/CSCP/vol3/csit3822.pdf.