Using ontology merging for the integration of information systems and the production capacity planning system

N Yarushkina1, A Romanov1, A Filippov1, A Dolganovskaya1 and M Grigoricheva1

1Ulyanovsk State Technical University, Severny Venets street, 32, Ulyanovsk, Russia, 432027

e-mail: jng@ulstu.ru, romanov73@gmail.com, al.filippov@ulstu.ru, gms4295@mail.ru

Abstract. This article describes a method of integrating the information systems of an aircraft factory with the production capacity planning system based on ontology merging. An ontological representation is formed for each relational database (RDB) of the integrated information systems by analyzing the structure of the RDB of the information system (IS). The integrating data model is formed by merging these ontological representations and serves as a mechanism for semantic integration of data sources.

1. Introduction
As part of the work on automating the production capacity planning process of the aircraft factory, it is necessary to take into account the heterogeneous information systems that automate various business processes of the factory [1]. Data consistency can be achieved by integrating the production capacity planning system with the existing information systems of the aircraft factory.
Data integration means combining data from different sources and providing it to users in a unified way. The main difficulties of data integration are:
(i) Heterogeneity of data models.
(ii) Independence of the information systems of the aircraft factory from each other.
(iii) Data can be located in different segments of the local network of the aircraft factory and/or on the Internet.
(iv) Different data formats.
(v) Different value representations.
(vi) Loss of data relevance by one of the data sources.
V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science N Yarushkina, A Romanov, A Filippov, A Dolganovskaya and M Grigoricheva

Thus, organizing the information interaction between the production capacity planning system and the existing information systems of the aircraft factory requires solving the following methodological problems [2, 3, 4, 5, 6, 7, 8, 9]:
(i) Creating an integrating data model. The integrating data model is the basis of a single user interface in the integration system.
(ii) Development of methods for building ontological representations for specific models of various data sources.
(iii) Development of methods for building the integrating data model for specific models of various data sources.
(iv) Solving the problem of data source heterogeneity.
(v) Development of mechanisms for semantic integration of data sources.

2. Ontological representation of data source
The proposed information interaction algorithm consists of the following steps:
(i) Extracting metadata from the RDB schema to automatically generate ontologies for the source and target RDBs.
(ii) Merging the ontologies to configure the correspondence between objects, attributes, and relationships of the integrated ISs, which creates the metaontology.
(iii) Using the metaontology to perform the interaction procedure on a schedule or on an event.
The metaontology is a set of settings that contains the correspondences between the data models (tables and columns) of the integrated ISs.
An ontology is a knowledge representation model of a specific problem area [10]. An ontology contains a set of classes, individuals, properties, and relations between them. An ontology is based on a dictionary of terms that reflect the concepts of the problem area. The dictionary also contains a set of rules (axioms). Terms can be combined to construct a set of statements about the state of the problem area based on the set of axioms.
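Step (i) of the algorithm above, extracting metadata from an RDB schema, can be illustrated with a minimal sketch. It assumes an in-memory SQLite database purely for convenience (the article does not say which DBMSs the factory systems use), and the `equipment` table and the `extract_metadata` function are hypothetical names that do not come from the described implementation.

```python
import sqlite3


def extract_metadata(conn):
    """Return {table name: [column descriptions]} read from the SQLite catalog."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [
            {
                "name": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in rows
        ]
    return metadata


# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment ("
    "id INTEGER PRIMARY KEY, name CHAR(200) NOT NULL, param CHAR(2))"
)
metadata = extract_metadata(conn)
for table, columns in metadata.items():
    for column in columns:
        print(table, column)
```

Other DBMSs expose the same information through their own catalogs, so only the two catalog queries would need to change for a different system.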
Many researchers currently use the ontological approach to extract metadata from an RDB schema:
(i) Relational.OWL [11] currently supports only the MySQL and DB2 database management systems (DBMS). The generated ontology contains the classes Database, Table, Column, and PrimaryKey and the properties has, hasTable, hasColumn, isIdentifiedBy, references, scale, and length. The main disadvantage of the ontology generated by Relational.OWL is its limited coverage of the domain: it does not consider, for instance, data types, foreign keys, and constraints.
(ii) OWL-RDBO [12, 13] currently supports only the MySQL, PostgreSQL and DB2 DBMSs. The generated ontology contains the classes DatabaseName, RelationList, Relation, AttributeList, and Attribute and the properties hasRelations, hasType, referenceAttribute, and referenceRelation. The main disadvantage of the ontology generated by OWL-RDBO is the presence of concepts external to the domain, such as RelationList, which groups a set of Relation instances, and AttributeList, which groups a set of attributes.
(iii) Other approaches, such as [14, 15], extract real-world relations from the RDB structure and are unable to reconstruct the original schema of the RDB.

The relational data model can be represented as the following expression:
$$RDM = \langle E, H, R \rangle, \tag{1}$$
where $E = \{E_1, E_2, \ldots, E_p\}$ is a set of RDB entities (tables); $E_i = (name, Row, Col)$ is the $i$-th RDB entity, which contains the name, the set of rows $Row$, and the set of columns $Col$; $Col_j = (name, type, constraints)$ is the $j$-th column of the $i$-th RDB entity, which has the properties name, type, and set of constraints; $H = \{H_1, H_2, \ldots, H_q\}$ is a hierarchy of RDB entities in the case of using the table inheritance function:
$$H_j = E_i \, D(x) \, E_k, \tag{2}$$
where $E_i$ and $E_k$ are RDB entities; $D(x)$ is a 'parent-child' relation between $E_i$ and $E_k$; $R = \{R_1, R_2, \ldots, R_r\}$ is a set of RDB relations:
$$R_l = E_i \mathrel{\substack{F(x) \\ G(x)}} E_k, \tag{3}$$
where $F(x)$ is an RDB relation between $E_i$ and $E_k$; $G(x)$ is an RDB relation between $E_k$ and $E_i$. The functions $F(x)$ and $G(x)$ take the values $U$ (a single relation) or $N$ (multiple relations).

The ontological representation of the RDB data model is:
$$O = \langle C, P, L, R \rangle, \tag{4}$$
where $C = \{C_1, C_2, \ldots, C_n\}$ is a set of data model ontology classes; $P = \{P_1, P_2, \ldots, P_m\}$ is a set of properties of data model ontology classes; $L = \{L_1, L_2, \ldots, L_o\}$ is a set of data model ontology constraints; $R$ is a set of data model ontology relations:
$$R = \{R_C, R_P, R_L\}, \tag{5}$$
where $R_C$ is a set of relations defining the hierarchy of data model ontology classes; $R_P$ is a set of relations defining the 'class-property' data model ontology ties; $R_L$ is a set of relations defining the 'property-constraint' data model ontology ties.

The following function is used to map the RDB structure (eq. 1) to the ontological representation (eq. 4):
$$F(RDM, O) : \{E^{RDM}, H^{RDM}, R^{RDM}\} \to \{C^{O}, P^{O}, L^{O}, R^{O}\}, \tag{6}$$
where $\{E^{RDM}, H^{RDM}, R^{RDM}\}$ is a set of RDB entities and relations between them (eq. 1); $\{C^{O}, P^{O}, L^{O}, R^{O}\}$ is a set of ontology entities (eq. 4).

The process of mapping the RDB structure into an ontological representation contains several steps:
(i) Formation of ontological representation classes. A set of ontological representation classes $C$ is formed based on the set of RDB entities: $E_i \to C_i$. The number of classes of the ontological representation must be equal to the number of RDB entities.
(ii) Formation of properties of ontological representation classes.
A set of properties $P$ of the $i$-th ontological representation class $C_i$ is formed based on the set of columns $Col$ of the $i$-th RDB entity $E_i$: $Col_j \to P_j$. The number of properties of the $i$-th ontological representation class $C_i$ must be equal to the number of columns of the $i$-th RDB entity $E_i$. The name of the $j$-th property $P_j$ is the name of the $j$-th column $Col_j$ of the RDB entity.
(iii) Formation of ontological representation constraints. A set of constraints $L$ of the properties of the $i$-th ontological representation class $C_i$ is formed based on the set of columns $Col$ of the $i$-th RDB entity $E_i$: $Col_k \to \hat{L}$. The number of constraints of the $i$-th ontological representation class $C_i$ must be equal to the number of constraints of the $i$-th RDB entity $E_i$. However, this approach has limitations, because constraints are difficult to map if they are implemented as triggers or stored procedures.
(iv) Formation of the hierarchy of ontological representation classes. If table inheritance is used in the RDB, it is necessary to form a set of ontology relationships $R_C$ between all the child and parent classes corresponding to the hierarchy of RDB entities: $H \to R_C$. The domain of the $j$-th ontological representation relationship $R_{C_j}$ is indicated by the reference to the parent class $C_{parent}$. The range of the $j$-th ontological representation relationship $R_{C_j}$ is indicated by the reference to the child class (or a set of child classes) $C_{child}$.
(v) Formation of relations between classes and properties of classes of the ontological representation. A set of ontological representation relationships $R_P$ is formed based on the set of columns $Col$ of the $i$-th RDB entity $E_i$ and the set of RDB relations $R$. Two types of relationships are formed for each $j$-th ontological representation property $P_j$:
(a) The relationship 'class-property'.
The domain of the ontological representation relationship is indicated by the reference to the $i$-th class $C_i$ to which the $j$-th property belongs, and the range is indicated by the reference to the $j$-th property $P_j$.
(b) The relationship 'property-data type class'. The domain of the $k$-th ontological representation relationship is indicated by the reference to the $j$-th property $P_j$. The range is indicated by the reference to the $l$-th class $C_l$ corresponding to the $l$-th RDB entity $E_l$, or by the reference to the $m$-th ontology class $C_m$ corresponding to the data type of the $j$-th RDB column $Col_j$.
(vi) Formation of relations between properties of classes and constraints of properties of classes of the ontological representation. A set of relations $R_L$ of the ontological representation is formed based on the set of columns $Col$ of the $i$-th RDB entity: $Col \to R_L$. The domain of the $j$-th ontological representation relationship $R_{L_j}$ is indicated by the reference to the $k$-th property $P_k$. The range of the $j$-th ontological representation relationship $R_{L_j}$ is indicated by the reference to the $k$-th constraint.

3. Integrating data model
An integrating data model must be formed based on the ontological representations obtained after mapping the RDB structure of each of the integrated information systems. The definition of an ontological system is used as the formal representation of the integrating data model:
$$O^{X} = \langle O^{META}, O^{IS}, M \rangle, \tag{7}$$
where $O^{META}$ is the integrating data model ontology (metaontology); $O^{IS} = \{O_1^{IS}, O_2^{IS}, \ldots, O_g^{IS}\}$ is a set of ontological representations of the information systems that must be integrated; $M$ is the reasoner model.
The following steps are necessary to form an integrating data model based on the set of ontological representations of the information systems that must be integrated:
(i) Formation of the universal concept dictionary for the current domain.
The process of forming the integrating data model $O^{META}$ is based on the presence of common terminology. The ontological representations $O^{IS}$ of all information systems that must be integrated should be built from a single concept dictionary. The concept dictionary is formed by an expert based on the analysis of the obtained ontological representations.
(ii) Formation of the integrating data model $O^{META}$. At this step, the set of top-level classes $C^{META}$ is added to the integrating data model $O^{META}$. The set of top-level classes $C^{META}$ describes the systems that must be integrated and is used as the basis for ontology merging.
(iii) Formation of the class hierarchy of the integrating data model $O^{META}$. At this step, the integrating data model establishes a correspondence between the class hierarchies $C^{O_i^{IS}}$ of the ontological representations $O^{IS}$ of the information systems that must be integrated.
(iv) Formation of the class properties of the integrating data model $O^{META}$. At this step, the integrating data model establishes a correspondence between the properties $P^{O_i^{IS}}$ of the ontological representations $O^{IS}$ of the information systems that must be integrated. The expert decides which class properties of the ontological representations $O^{IS}$ should be included in the integrating data model $O^{META}$.
(v) Formation of axioms of classes and properties, and checking the integrating data model $O^{META}$ for consistency. At this step, the constraints $L^{O^{IS}}$ are applied to the properties $P^{O^{IS}}$ and classes $C^{O^{IS}}$ of the integrating data model $O^{META}$ based on the constraints present in the ontological representations $O^{IS}$. After that, the resulting integrating data model $O^{META}$ should be checked for internal consistency using the reasoner $M$.
However, methods for checking the conditions of constraints still need to be developed, since the existing reasoners do not support working with such objects.
The proposed method makes it possible to configure the correspondence between the tables and fields of two RDBs. The main problem is the need for ontology merging. However, this problem can be solved by using specialized tools that automate the ontology merging process. Such tools also make it possible to separate the roles of the developer and the domain expert.
The main advantage of the proposed method is the ability to dynamically generate the SQL queries needed to select data from and insert data into the RDB based on the metaontology.

4. Example of creating the ontological representation of a data source
Consider the following example of forming the ontological representation. Table 1 shows the structure of the "Equipment and Tools" table of the aircraft factory IS.

Table 1. The "Equipment and Tools" table of the aircraft factory IS.
Column    Data type    Description
t2_ob     CHAR(200)    Name
t2_ng     NUMBER(5)    Group
t2_nn     NUMBER(6)    Position
t2_r1     CHAR         Type #1: 0 - equipment; 1 - tool; 2 - material; 6 - special tool.
t2_r2     CHAR         Type #2: 0 - standard; 1 - special.
t2_r3     CHAR         Type #3: 20 - no; 21 - design; 30 - model; 31 - design and model.
t2_p1     CHAR(2)      Parameter #1 (nullable)
t2_z1     CHAR(8)      Parameter #1 value (nullable)
t2_p2     CHAR(2)      Parameter #2 (nullable)
t2_z2     CHAR(8)      Parameter #2 value (nullable)
t2_p3     CHAR(2)      Parameter #3 (nullable)
t2_z3     CHAR(8)      Parameter #3 value (nullable)
t2_gm     BLOB         Geometric model
up_dt     DATE         Date of last update
up_us     CHAR(32)     User
t2_dc     BLOB         Attachment
t2_vid    CHAR(4)      Tooling type
t2_doc    CHAR(100)    Document name
t2_prim   CHAR(100)    Notes (nullable)
t2_yyyy   CHAR(4)      Production date

Thus, the ontological representation of the "Equipment and Tools" entity (Table 1) can be represented as:
O = ⟨
C = { Equipment and Tools (E&T), CHAR, NUMBER, BLOB, DATE },
P = { t2_ob, t2_ng, t2_nn, t2_r1, t2_r2, t2_r3, t2_p1, t2_z1, t2_p2, t2_z2, t2_p3, t2_z3, t2_gm, up_dt, up_us, t2_dc, t2_vid, t2_doc, t2_prim, t2_yyyy },
L = { nullable, ⟨length, 2⟩, ⟨length, 4⟩, ⟨length, 8⟩, ⟨length, 32⟩, ⟨length, 100⟩, ⟨length, 200⟩, ⟨length, 255⟩, ⟨precision, 5⟩, ⟨precision, 6⟩ },
R_P = { ⟨E&T, t2_ob, CHAR⟩, ⟨E&T, t2_ng, NUMBER⟩, ⟨E&T, t2_nn, NUMBER⟩, ⟨E&T, t2_r1, CHAR⟩, ⟨E&T, t2_r2, CHAR⟩, ⟨E&T, t2_r3, CHAR⟩, ⟨E&T, t2_p1, CHAR⟩, ⟨E&T, t2_z1, CHAR⟩, ⟨E&T, t2_p2, CHAR⟩, ⟨E&T, t2_z2, CHAR⟩, ⟨E&T, t2_p3, CHAR⟩, ⟨E&T, t2_z3, CHAR⟩, ⟨E&T, t2_gm, BLOB⟩, ⟨E&T, up_dt, DATE⟩, ⟨E&T, up_us, CHAR⟩, ⟨E&T, t2_dc, BLOB⟩, ⟨E&T, t2_vid, CHAR⟩, ⟨E&T, t2_doc, CHAR⟩, ⟨E&T, t2_prim, CHAR⟩, ⟨E&T, t2_yyyy, CHAR⟩ },
R_L = { ⟨E&T, t2_ob, ⟨length, 200⟩⟩, ⟨E&T, t2_ng, ⟨precision, 5⟩⟩, ⟨E&T, t2_nn, ⟨precision, 6⟩⟩, ⟨E&T, t2_p1, ⟨length, 2⟩⟩, ⟨E&T, t2_p1, nullable⟩, ⟨E&T, t2_z1, ⟨length, 8⟩⟩, ⟨E&T, t2_z1, nullable⟩, ⟨E&T, t2_p2, ⟨length, 2⟩⟩, ⟨E&T, t2_p2, nullable⟩, ⟨E&T, t2_z2, ⟨length, 8⟩⟩, ⟨E&T, t2_z2, nullable⟩, ⟨E&T, t2_p3, ⟨length, 2⟩⟩, ⟨E&T, t2_p3, nullable⟩, ⟨E&T, t2_z3, ⟨length, 8⟩⟩, ⟨E&T, t2_z3, nullable⟩, ⟨E&T, up_us, ⟨length, 32⟩⟩, ⟨E&T, t2_vid, ⟨length, 4⟩⟩, ⟨E&T, t2_doc, ⟨length, 100⟩⟩, ⟨E&T, t2_prim, ⟨length, 100⟩⟩, ⟨E&T, t2_prim, nullable⟩, ⟨E&T, t2_yyyy, ⟨length, 4⟩⟩ }
⟩.

As this example shows, the resulting ontological representation O consists of the following sets of objects:
(i) A set of classes C contains the "Equipment and Tools" table and some data types: CHAR, NUMBER, BLOB, DATE. The OWL representation of ontology O uses the Class signature to represent the table.
(ii) A set of properties P contains all columns of the "Equipment and Tools" table. The OWL representation of ontology O uses built-in data types to represent RDB data types (xsd:string, xsd:double, xsd:dateTime, xsd:base64Binary), and the Class signature to represent RDB relationships.
(iii) A set of constraints L contains all variants of restrictions for the columns of the "Equipment and Tools" table. This set is not translated to the OWL representation directly.
(iv) A set of relations between classes and properties R_P contains the ties between the table and the columns that belong to this table. The OWL representation of ontology O uses the ObjectProperties and DataProperties signatures to represent the set of relations R_P. ObjectProperties signatures are used to represent foreign keys. DataProperties signatures are used to represent columns that contain a value.
(v) A set of relations between properties and constraints R_L contains the ties between a column and the constraints of this column. OWL datatype restrictions are used to specify constraints. For example: DatatypeRestriction( xsd:integer xsd:minInclusive "5"^^xsd:integer xsd:maxExclusive "10"^^xsd:integer ).
Thus, the ontological approach is commonly used to solve the methodological problem of building an integrating data model of information systems.

5. Conclusion
This article presents the implementation of a method of integrating the information systems of the aircraft factory with the production capacity planning system. The principles of ontological engineering allow the database structure of each information system that must be integrated to be mapped into an ontological representation. According to the proposed methodology, the integrating data model is formed based on the ontological representations obtained for each information system that must be integrated.
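The mapping of a database structure into an ontological representation mentioned above (eq. 6) can be sketched as follows. This is only an illustration under stated assumptions: `Column` and `map_table` are hypothetical names, plain Python sets stand in for OWL constructs, column identifiers are written with underscores, and the class hierarchy and foreign keys are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str           # e.g. "CHAR"
    constraints: tuple = ()  # e.g. (("length", 2), "nullable")


def map_table(table_name, columns):
    """Build the sets C, P, L, R_P and R_L of eq. (4) for a single table."""
    # C: one class for the table plus one class per data type in use.
    classes = {table_name} | {col.data_type for col in columns}
    # P: one property per column, named after the column.
    properties = {col.name for col in columns}
    # L: every distinct constraint found on any column.
    constraints = {c for col in columns for c in col.constraints}
    # R_P: (class, property, data type class) ties.
    r_p = {(table_name, col.name, col.data_type) for col in columns}
    # R_L: (class, property, constraint) ties.
    r_l = {(table_name, col.name, c) for col in columns for c in col.constraints}
    return {"C": classes, "P": properties, "L": constraints, "R_P": r_p, "R_L": r_l}


# Three columns modeled on Table 1 (illustrative subset only).
columns = [
    Column("t2_ob", "CHAR", (("length", 200),)),
    Column("t2_ng", "NUMBER", (("precision", 5),)),
    Column("t2_p1", "CHAR", (("length", 2), "nullable")),
]
ontology = map_table("E&T", columns)
for key in ("C", "P", "L", "R_P", "R_L"):
    print(key, len(ontology[key]))
```

In this sketch the number of properties equals the number of columns, as the mapping steps require. Constraints implemented as triggers or stored procedures have no counterpart here.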
The proposed method allows information interaction to be organized without the participation of developers, in contrast to the traditional consolidation approach, which is based on direct data exchange. The only requirement of the proposed method is the presence of the metaontology.
The current disadvantages of the implementation of the proposed method are:
(i) The need to implement data type casting algorithms for each DBMS in case the data types do not match.
(ii) The need to adapt the implementation of the proposed method to the SQL dialect of each DBMS involved in the exchange process. An arbitrary DBMS cannot be supported by this implementation.

6. References
[1] Yarushkina N, Romanov A, Filippov A, Guskov G, Grigoricheva M and Dolganovskaya A 2019 The building of the production capacity planning system for the aircraft factory Research Papers Collection Open Semantic Technologies for Intelligent Systems 3 123-128
[2] Clark T, Barn B S and Oussena S 2012 A method for enterprise architecture alignment Practice-Driven Research on Enterprise Transformation 48-76
[3] Rouhani D B 2015 A systematic literature review on Enterprise Architecture Implementation Methodologies Information and Software Technology 1-20
[4] Medini K and Bourey J P 2012 SCOR-based enterprise architecture methodology Int. J. Comput. Integrat. Manuf.
[5] Poduval A 2011 Do more with SOA Integration: Best of Packt
[6] Caselli V, Binildas C and Barai M 2008 The Mantra of SOA. Service Oriented Architecture with Java (Birmingham, UK)
[7] Berna-Martinez V J, Zamora C, Ivette C, Perez M, Paz F, Paz L and Ramon C 2018 Method for the Integration of Applications Based on Enterprise Service Bus Technologies
[8] Evsutin O O, Kokurina A S and Meshcheryakov R V 2019 A review of the methods of embedding information in digital objects for security in the Internet of things Computer Optics 43(1) 137-154 DOI: 10.18287/2412-6179-2019-43-1-137-154
[9] Rycarev I A, Kirsh D V and Kupriyanov A V 2018 Clustering of media content from social networks using bigdata technology Computer Optics 42(5) 921-927 DOI: 10.18287/2412-6179-2018-42-5-921-927
[10] Gruber T 2019 Ontology URL: http://tomgruber.org/writing/ontology-in-encyclopedia-of-dbs.pdf
[11] de Laborda C P and Conrad S 2005 Relational.OWL: a data and schema representation format based on OWL Proceedings of the 2nd Asia-Pacific conference on Conceptual modeling 43 89-96
[12] Trinh Q, Barker K and Alhajj R 2006 Rdb2ont: A tool for generating OWL ontologies from relational database systems Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services 170
[13] Trinh Q, Barker K and Alhajj R 2007 Semantic interoperability between relational database systems Database Engineering and Applications Symposium 208-215
[14] Barrett T, Jones D, Yuan J, Sawaya J, Uschold M, Adams T and Folger D 2002 RDF representation of metadata for semantic integration of corporate information resources International Workshop Real World and Semantic Web Applications
[15] Bizer C 2003 D2R MAP - A Database to RDF Mapping Language Proc. of the 12th International World Wide Web Conference - Posters

Acknowledgments
The study was supported by:
• the Ministry of Education and Science of the Russian Federation in the framework of the project No. 2.1182.2017/4.6 "Development of methods and means for automation of production and technological preparation of aggregate-assembly aircraft production in the conditions of a multi-product production program";
• the Russian Foundation for Basic Research (Projects No. 18-47-732016, 18-47-730022, 17-07-00973, 18-47-730019).