=Paper=
{{Paper
|id=Vol-1979/paper-27
|storemode=property
|title=An Improvement on Data Interoperability with Large-Scale Conceptual Model and Its Application in Industry
|pdfUrl=https://ceur-ws.org/Vol-1979/paper-27.pdf
|volume=Vol-1979
|authors=Lan Wang,Shinpei Hayashi,Motoshi Saeki
|dblpUrl=https://dblp.org/rec/conf/er/WangHS17
}}
==An Improvement on Data Interoperability with Large-Scale Conceptual Model and Its Application in Industry==
Lan WANG¹, Shinpei HAYASHI², Motoshi SAEKI²

¹ Toshiba Corporate R&D Center, 1 Komukai-toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan. Lan.wang@toshiba.co.jp

² Tokyo Institute of Technology, Ookayama 2–12–1, Meguro-ku, Tokyo 152–8552, Japan.

'''Abstract.''' In the world of the Internet of Things, heterogeneous systems and devices need to be connected. A key issue for systems and devices is data interoperability such as automatic data exchange and interpretation. A well-known approach to solve the interoperability problem is building a conceptual model (CM). Regarding CM in industrial domains, there are often a large number of entities defined in one CM. How data interoperability with such a large-scale CM can be supported is a critical issue when applying CM into industrial domains. In this paper, evolved from our previous work, a meta-model equipped with new concepts of “PropertyRelationship” and “Category” is proposed, and a tool called FSCM supporting the automatic generation of property relationships and categories is developed. A case study in an industrial domain shows that the proposed approach effectively improves the data interoperability of large-scale CMs.

'''Keywords:''' conceptual modeling, data interoperability, property relationship

===1 Introduction===

In the world of the Internet of Things, various systems and devices need to be connected. One of the key issues is the data interoperability among systems and devices, i.e., systems and devices can exchange and interpret data automatically. A well-known approach is to build a conceptual model (CM) that unambiguously defines entities and their relationships so that different systems and devices can utilize the CM to exchange and interpret data.
For example, a CM is defined in the IEC 62264 series [1], [2], [3] for enterprise-control system integration, and the Common Information Model (CIM) is defined in IEC 61970/61968/62325 for smart grid systems to exchange data among different applications and systems. For electro-electronic systems, the CM defined in IEC 61360-4 (also called IEC CDD) is utilized among semiconductor management systems to exchange data.

One of the characteristics of a CM in industrial domains is that it normally has a large number of entities. The CIM, for example, has more than 1000 classes and thousands of properties. In order to improve data interoperability with such large-scale CMs, property relationships in a CM need to be created so that data among involved properties can be systematically exchanged and interpreted. Furthermore, as different systems normally use only part of the whole CM, it is necessary to provide CMs with the capability of collecting only necessary entities so that when a system exchanges data with others, only necessary entities are included without redundancy.

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes. In: C. Cabanillas, S. España, S. Farshidi (eds.): Proceedings of the ER Forum 2017 and the ER 2017 Demo track, Valencia, Spain, November 6th-9th, 2017, published at http://ceur-ws.org

In this paper, we propose to introduce two new types, “Category” and “PropertyRelationship”, to the meta-model of our previous work [4]. A category collects only the needed entities, such as classes, properties, and property relationships, from the user’s perspective. Category sets can be defined from different viewpoints with aggregation relations among them. Another feature of Category is that entities such as classes, properties, and property relationships can belong to multiple categories. “PropertyRelationship” is for describing relationships among properties.
Similarity between properties, transformation rules between properties, etc. can be specified using this concept. Mechanisms for the automatic generation of property relationships are also proposed. This work adopts natural language processing approaches combined with the CM structure.

A tool called the Framework for Sustainable Conceptual Modeling (FSCM) implements the proposed meta-model and mechanisms. Through a case study of creating a large-scale CM and applying it via FSCM to industrial domains, the proposed meta-model and mechanisms are shown to be effective.

The remainder of this paper is structured as follows. Section 2 introduces motivations for this research. Section 3 explains the proposed meta-model and its advantages. Section 4 elaborates on mechanisms for supporting the automatic generation of property relationships. Section 5 describes the FSCM tool. Section 6 introduces and discusses the case study. Section 7 shows related work, and Section 8 gives our conclusions.

===2 Motivation===

According to our previous work [4], the CIM for smart grid can be described with a set of structured tables (spreadsheets). The CIM described with a set of structured tables is called Parcellized-CIM. Via that work, a CIM expressed with UML models such as packages or class diagrams can be represented as Parcellized-CIM in a tabular format. Furthermore, based on the CM of Parcellized-CIM, a commercial database platform, parcimoser™, was developed for applications such as Transmission and Distribution Supervisory Control And Data Acquisition (SCADA) and Energy Management System (EMS).

However, when applying such CMs to industrial systems, several problems arise. In this paper, two typical problems are discussed. To clarify the idea, Fig. 1 shows a simplified CM, and Fig. 2 shows its corresponding manufacturing process as an explanatory example. As Fig. 1 shows, the class “Cycle” is composed of “Wheel” and “Frame”.
Each class has a set of properties, and each property is defined with a set of attributes, for example {ID, Name.en, Unit, Version}. In this example, the “Cycle” class has a set of properties with IDs P6, P7, and P10, and property P6 is defined with the attributes {P6, weight, kg, 1}. The “Wheel” class has a set of properties with IDs P1, P2, and P8. The class “Frame” has a set of properties with IDs P3, P4, P5, P6, and P7. Fig. 2 illustrates the three processes to produce a cycle corresponding to Fig. 1.

Fig. 1. A partial conceptual model for cycle manufacturing.

Fig. 2. A simplified product process for cycle manufacturing.

The two typical problems to be solved in this paper are as follows:

* (Q1) Property relationships need to be created to improve data interoperability such as consistency checking and exchangeability. In the example of Fig. 1, property P3 (inner diameter) of class “Frame” and property P2 (diameter) of class “Wheel” should have the same value. Thus, if a property relationship between P3 and P2 existed, it could be used to support their data exchange and data consistency check systematically. In our previous work, property relationships could not be defined. Furthermore, because there are thousands of properties in industrial domains, it is difficult for CM creators to create all property relationships manually.
* (Q2) Categories containing only necessary entities need to be created to filter out unnecessary information. In the example of Fig. 2, it is clear that for “Make cycle frame” systems, entities such as class “Wheel” and its properties defined in the category “Make cycle wheels” and those in “Assemble and test cycle” are not needed. For a large-scale CM with many entities, the need for a system to collect only the required subset of entities becomes more urgent.

===3 Proposed Meta-Model===

====3.1 Meta-Model Overview====

In this paper, an improved meta-model for conceptual modeling is proposed based on our previous work.
Primitive elements and their relationships are shown in Fig. 3. In Fig. 3, each entity of “TypedElement” is described by a set of attributes. For the “TypedElement” children “Class”, “Property”, and “Datatype”, the attribute sets conform to the specifications of IEC 61360-1 [5]/ISO 13584-42 [6], which is recognized as the common conceptual modeling methodology for ISO and IEC standard domain models. Typical attributes for “TypedElement” such as “ID”, “Name”, and “Definition” are specified in these standards. All available attributes for each child of “TypedElement” are described in the standard documents [5], [6].

Fig. 3. Meta-model proposed (excerpted).

Some known typical characteristics of this conceptual modeling methodology [7] are as follows:

* Each entity such as a class, a property, or a datatype should have an identifier and a version number.
* Multi-lingual definitions for name, definition, etc. in a CM are available.
* Entity-based CM version management is available.

In this research, “PropertyRelationship” and “Category” are proposed and defined as children of “TypedElement”. In general, “PropertyRelationship” describes relationships among properties, such as the “equal” relationship. For each property, any number of relationships can be defined if necessary. “Category” is defined as an aggregation of necessary entities available in a CM. A category can contain several subcategories. In brief, the proposed meta-model has the following advantages:

* Property relationships can be assigned different types or levels according to CM requirements. With created property relationships, the data (value) consistency of involved properties can be checked, and their data exchanges become available, so that data interoperability among different systems utilizing the involved properties can be improved.
* Not only classes but also other entities, such as a property, a datatype, or a created property relationship, can be grouped into categories from different users’ perspectives. As a result, redundant definitions that are not necessary for a “Category” can be filtered out.

To express the proposed meta-model, Table 1, Table 2, and Table 3, representing an excerpted CM complying with the proposed meta-model, are utilized. Entities are the same as those in Fig. 1, except for the property relationships in Table 2 and the categories in Table 3.

====3.2 PropertyRelationship and Its Advantages====

'''Overview.''' The “PropertyRelationship” is specified to express the relationships among properties. For conceptual models in industrial domains, it is normal for properties to gradually evolve or become connected to other properties under different environmental conditions or operational procedures. Principles such as types and levels for “PropertyRelationship” can be defined depending on the requirements of a CM. For the example represented in Table 2, the relationship types “Constraint” and “Reference” are defined. The former is for mandatory relationships among all involved properties. The latter is an optional relationship, meaning that a property involved in this “PropertyRelationship” can make a reference to the other when necessary. In another CM case, property relationships such as “Temporary” and “Permanent” can be defined [3]. Using the CM in Fig. 1, the three “PropertyRelationship” entities listed in Table 2 can be defined.

Table 1. Property definitions (excerpted)

{| class="wikitable"
! ID !! Name.en !! DefinitionClass !! Datatype !! Definition !! Unit !! Version
|-
| P1 || Inner diameter || Wheel (or class ID) || Real || Inner diameter for a wheel || Inch || 1
|-
| P2 || Diameter || Wheel (or class ID) || Real || Diameter for a wheel || Inch || 1
|-
| P3 || Inner diameter || Frame (or class ID) || Real || Inner diameter for a frame || cm || 1
|-
| P4 || Length || Frame (or class ID) || Real || Length of a frame || cm || 1
|}

Table 2.
Property relationships (excerpted)

{| class="wikitable"
! ID !! Name.en !! Definition !! Resource !! Target !! Relation !! RelationType !! Version
|-
| R1 || diameterRelation || sample || Frame.P3 || Wheel.P2 || Wheel.P2=Frame.P3*2.54 || Constraint || 1
|-
| R2 || weightRelation || sample || Wheel.P8, Frame.P6 || Cycle.P6 || Cycle.P6=Wheel.P8*2/1000+Frame.P6 || Constraint || 1
|-
| R3 || colorRelation || sample || Frame.P7 || Cycle.P7 || Cycle.P7=Frame.P7 || Reference || 2
|}

* As relationship R1, the property P2 (diameter) of class “Wheel” and P3 (inner diameter) of class “Frame” should have the same value (“=”). However, because the two properties use different units (inch and centimeter), R1 should be defined as “Wheel.P2=Frame.P3*2.54” to support the unit conversion between these two properties in this constraint.
* As relationship R2, the weight of a cycle (Cycle.P6) should be calculated as “Cycle.P6=Wheel.P8*2/1000+Frame.P6”, i.e., the sum of the wheels’ weight and that of the frame.
* As relationship R3, the color of a cycle (Cycle.P7) should refer to the color of the frame (Frame.P7). This relationship is defined as the “Reference” type, meaning that Cycle.P7 does not need to be exactly the same color as the frame but can make a reference to Frame.P7 when necessary.

With the approach described above, property relationships can be defined with different principles for different CMs.

'''Advantages.''' With the “PropertyRelationship” provided by the proposed meta-model, it is possible to describe the relationships among properties. These property relationships can be utilized to enhance data interoperability, such as data consistency checking and data exchangeability, thereby addressing the capability for describing property relationships raised as Q1 in Section 2. The remaining part of Q1, the automatic generation of property relationships, is addressed in Section 4.

====3.3 Category====

'''Overview.''' A “Category” is specified as a collection of necessary “TypedElements” and can be defined from various viewpoints.
For example, from a usage viewpoint, “TypedElements” can be grouped into a category such as “SCADA” or “Demand and Response” in power grid systems. From the viewpoint of a product or system lifecycle, categories such as “General Design”, “Detailed Design”, and “Validation” can be defined. Depending on the purpose of a defined conceptual model, the necessary sets of categories can be defined. With the meta-model defined in Fig. 3, an entity of “Category” can be an aggregation of several subcategories, so that all entities included in subcategories are also included in their parent category and ancestor categories.

This concept of “Category” evolved from the concept of “package” in UML, with the following additional features:

* An entity can belong to multiple categories.
* Not only classes but also all available “TypedElement” entities, such as properties and property relationships, can specify their own categories.
* Each CM can have several sets of categories defined from different perspectives.

Table 3 gives sample category definitions for the CM in Fig. 1 and its manufacturing process shown in Fig. 2. Example categories are defined corresponding to the manufacturing process of Fig. 2. Specifically, category Ca1 is an aggregation of entities for the process of “Make cycle frame”, Ca2 is for “Make wheel”, and Ca3 is for “Assemble and test cycle”. Since Ca3 contains Ca1 and Ca2, all entities included in Ca1 and Ca2 are also contained in Ca3.

Table 3. Category definitions (excerpted)

{| class="wikitable"
! ID !! Name.en !! Super Category !! ElementList !! Definition !! Version
|-
| Ca1 || MakeCycleFrame || Ca3 || {(Frame),Class}, {(P3,P4,P5,P6,P7),Property} || All entities available for functions, operations, systems etc. for making cycle frame || 1
|-
| Ca2 || MakeWheel || Ca3 || {(Wheel),Class}, {(P1,P2,P3,P8),Property}, {(R1),PropertyRelation} || All entities available for functions, operations, systems etc. for making cycle wheel || 2
|-
| Ca3 || AssembleAndTestCycle || Root || {(Cycle),Class}, {(P10),Property}, {(R2,R3),PropertyRelation} || All entities available for functions, operations, systems etc. for assembling and testing cycle || 1
|}

* In the “Make cycle frame” procedure and its relevant systems, only entities defined in category Ca1 are necessary, meaning that for systems related to the procedure, only entities listed in the “ElementList” column of Ca1 in Table 3 are necessary.
* In the “Make cycle wheels” procedure, only entities defined in the “ElementList” column of Ca2 in Table 3 are required. Because the property relationship R1 is included in Ca2, systems related to the procedure need to check whether the wheel diameter equals the inner diameter of the cycle frame.
* For the “Assemble and test cycle” procedure, Ca3, which includes all entities defined in Ca1 and Ca2, can be utilized. It can also have its own specific entities, listed in the “ElementList” column of Ca3 in Table 3. Consequently, systems relevant to the procedure must check whether each of R1, R2, and R3 is satisfied. For example, when executing R2, systems need to check whether the weight value of the cycle satisfies the relationship R2.

'''Advantages.''' The above descriptions show that category definitions containing only necessary entities can solve problem Q2 raised in Section 2. Because categories can be defined by CM users from different aspects and for various purposes, we do not discuss category generation in this paper. Further, due to space limitations, this paper does not introduce the approaches that support collecting the necessary entities for a given category. Category creation and its advantages are discussed in the case study.

===4 Generation of Property Relationship===

As described in Section 3.2, some of the property relationships can be defined when designing the CM.
Such property relationships are usually derived from the knowledge and experience of CM creators. In industrial domains, because a large number of entities are available in a CM, it is necessary to provide approaches supporting the automatic generation of property relationships. One approach in this work is to calculate similar properties in a CM and then recommend them for building property relationships automatically. Model creators can thus build and add exact property relationships to existing CMs. For this purpose, natural language processing approaches combined with the CM class distances are proposed.

====4.1 Similarity Calculation among Properties====

One approach is to use semantic similarities among properties [8] combined with structure information. As illustrated in Section 3.1, each property is described with a set of attributes such as “Name”, “Definition”, “Datatype”, or “Unit”, and the similarity between a selected property entity and other properties is calculated as

<math>S(P_1, P_2) = \frac{\sum_{a \in A} W_a \cdot S_a(P_1.a, P_2.a)}{\sum_{a \in A} W_a} \qquad (1)</math>

<math>S_a(P_1.a, P_2.a) = \frac{\vec{v}_1 \cdot \vec{v}_2}{|\vec{v}_1| \cdot |\vec{v}_2|} \qquad (2)</math>

<math>\vec{v}_i = (V_{i1}, V_{i2}, \ldots, V_{in}), \quad V_{ij} = \text{TF-IDF-Score}(Term_j), \quad i = 1, 2 \qquad (3)</math>

where <math>Term_j \in (Term_1, Term_2, \ldots, Term_n)</math>, an ordered set of terms from the values of P1.a and P2.a.

In Equations (1), (2), and (3), S(P1, P2) is the weighted average similarity score between properties P1 and P2, A is the set of attributes describing a property, a is an attribute utilized by a property, P1.a and P2.a are respectively the values of a in P1 and P2, and W_a is the weight coefficient for attribute a, set from the aspect of similarity calculation. The larger W_a is, the more critical the attribute is to a property.

When calculating the similarity score S_a(P1.a, P2.a) of attribute a between P1 and P2, we adopt natural language processing approaches such as WordNet [9] for word similarity and TF-IDF cosine similarity [10], shown in Equations (2) and (3), for sentence similarity. The vectors v1 and v2 hold the calculated TF-IDF scores for each attribute a.
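As an illustration of Equations (2) and (3), the following minimal sketch computes TF-IDF vectors and their cosine similarity for the “Definition” values of Table 1. It is not the FSCM implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names `tokenize`, `tfidf_vectors`, and `cosine` are assumptions made only for this example, since the paper does not state which TF-IDF variant is used. The resulting scores are therefore not expected to match those in Table 4.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one TF-IDF vector per document over a shared, ordered vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})  # ordered Term_1..Term_n
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed smoothed IDF; chosen for illustration only.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty values
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vectors


def cosine(v1, v2):
    """Equation (2): cosine of the angle between two score vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


# "Definition" attribute values taken from Table 1.
definitions = {
    "P1": "Inner diameter for a wheel",
    "P2": "Diameter for a wheel",
    "P3": "Inner diameter for a frame",
    "P4": "Length of a frame",
}
ids = list(definitions)
vecs = dict(zip(ids, tfidf_vectors([definitions[i] for i in ids])))

for other in ("P2", "P3", "P4"):
    print(other, "to P1:", round(cosine(vecs["P1"], vecs[other]), 3))
```

With this sketch, definitions that share most of their terms with P1 (P2 and P3) score higher than the definition of P4, which shares only a single common word.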
Take the properties listed in Table 1 as an example. Suppose we generate relationships for property P1; for each of the other properties in the same CM, the following procedure is adopted to calculate its similarity to P1. The idea is clarified with Table 4, which lists the calculated similarity scores of properties P2, P3, and P4 to P1.

Table 4. Similarities between properties

{| class="wikitable"
! Attributes !! ID !! Name.en !! DefinitionClass !! Datatype !! Definition !! Unit !! Version !! Similarity (no weight) !! Similarity (with weight)
|-
| Weight || 1 || 10 || 1 || 10 || 10 || 10 || 1 || - || -
|-
| P2 to P1 || 0 || 0.7 || 1 || 1 || 0.89 || 1 || 1 || 0.79 || 0.88
|-
| P3 to P1 || 0 || 1 || 0.5 || 1 || 0.8 || 0.82 || 1 || 0.74 || 0.98
|-
| P4 to P1 || 0 || 0 || 0.5 || 1 || 0.2 || 0.82 || 1 || 0.5 || 0.504
|}

* In the first step, a similarity score is calculated for every attribute from its value (content). For example, regarding the attribute Name.en, P1 defines Name.en as “Inner diameter” and P2 defines it as “Diameter”; the similarity score between “Inner diameter” and “Diameter” is calculated as 0.7, and this score is listed in the Name.en column of the row “P2 to P1”.
* In the second step, a weight coefficient is assigned to each attribute. An important attribute should be assigned a larger weight coefficient. For example, the attributes “Name”, “Definition”, “Datatype”, and “Unit”, which to a large degree determine property instances, should be assigned larger weight coefficients than the others.
* Finally, the similarity score between a property Px and the selected property P1 is calculated with Equation (1) as the weighted average of the attribute similarity scores of P1 and Px. In the example of Table 4, the column “Similarity (no weight)” records the similarity scores with no specific weight coefficient, i.e., with all weights set to 1, and the column “Similarity (with weight)” lists the similarity scores obtained with the weight coefficients given in the “Weight” row.

The similarity score results show that different weight coefficients affect similarity score rankings.
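The weighted average of Equation (1) can be sketched as follows, using the “Weight” row and the per-attribute scores of the “P2 to P1” and “P4 to P1” rows of Table 4 as input. The names `ATTRIBUTES`, `WEIGHTS`, `SCORES`, and `weighted_similarity` are hypothetical and introduced only for this illustration.

```python
ATTRIBUTES = ["ID", "Name.en", "DefinitionClass", "Datatype",
              "Definition", "Unit", "Version"]

# Weight coefficients W_a from the "Weight" row of Table 4.
WEIGHTS = dict(zip(ATTRIBUTES, [1, 10, 1, 10, 10, 10, 1]))

# Per-attribute similarity scores S_a from two rows of Table 4.
SCORES = {
    "P2": dict(zip(ATTRIBUTES, [0, 0.7, 1, 1, 0.89, 1, 1])),
    "P4": dict(zip(ATTRIBUTES, [0, 0, 0.5, 1, 0.2, 0.82, 1])),
}


def weighted_similarity(scores, weights):
    """Equation (1): weighted average of the attribute similarity scores."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight coefficient must be non-zero")
    return sum(weights[a] * scores[a] for a in weights) / total_weight


UNIFORM = {a: 1 for a in ATTRIBUTES}  # the "no weight" case: every W_a = 1

for px, scores in SCORES.items():
    print(px, "to P1:",
          "no weight =", round(weighted_similarity(scores, UNIFORM), 3),
          "with weight =", round(weighted_similarity(scores, WEIGHTS), 3))
```

For these two rows, the computed values are close to the entries in the last two columns of Table 4.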
In the “Similarity (no weight)” column, property P2 has the highest score, but in the “Similarity (with weight)” column, property P3 has the highest score. This difference is due to the weighting of the “DefinitionClass” attribute. Because we want to focus on property relationships among different classes, properties within the same class are given lower weight coefficients. According to this principle, we adopt the result in the Table 4 column “Similarity (with weight)”, and P3, which has the highest similarity score to P1, is recommended for systematically building a property relationship with P1. The relationship R1 displayed in Table 2 is the corresponding definition.

====4.2 Class Distance for Property Relationship Recommendations====

Structure features are also considered when ranking property relationships. When deciding the recommendation rank of generated property relationships, besides the calculated similarity score of Px and Py, the distance between the definition classes of Px and Py is also utilized. Because classes in the proposed CM have is-a (specialization) hierarchical relationships, a property can be inherited from ancestor classes by a child class. Therefore, if two properties with high similarity scores also have an ancestor-descendant or neighborhood relationship, these two properties are, in principle, highly recommended for creating a property relationship.

To calculate the distance between classes, the well-known Lowest Common Ancestor (LCA) [11] approach is utilized in this work. With LCA, the distance between classes is calculated as

<math>\mathrm{Distance}(cls_1, cls_2) = \mathrm{Dist}(root, cls_1) + \mathrm{Dist}(root, cls_2) - 2 \cdot \mathrm{Dist}(root, lca) \qquad (4)</math>

where Dist(root, cls1) is the distance from cls1 to the root class in the is-a hierarchical tree, and “lca” is the lowest common ancestor of cls1 and cls2.
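Equation (4) can be illustrated with a small sketch. The is-a hierarchy below, including the intermediate classes “Component” and “Product”, is a hypothetical example and is not taken from the paper; the function names are likewise assumptions for illustration.

```python
# Hypothetical is-a hierarchy: each class maps to its parent (None for the root).
PARENT = {
    "Root": None,
    "Component": "Root",
    "Product": "Root",
    "Wheel": "Component",
    "Frame": "Component",
    "Cycle": "Product",
}


def path_to_root(cls):
    """Return [cls, parent, ..., root] by following is-a links upward."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path


def depth(cls):
    """Dist(root, cls): number of is-a edges between the root and cls."""
    return len(path_to_root(cls)) - 1


def lowest_common_ancestor(cls1, cls2):
    """Deepest class that is an ancestor of (or equal to) both classes."""
    ancestors = set(path_to_root(cls1))
    for candidate in path_to_root(cls2):  # walks upward, so the first hit is lowest
        if candidate in ancestors:
            return candidate
    raise ValueError("classes do not share a root")


def class_distance(cls1, cls2):
    """Equation (4): Dist(root, cls1) + Dist(root, cls2) - 2 * Dist(root, lca)."""
    lca = lowest_common_ancestor(cls1, cls2)
    return depth(cls1) + depth(cls2) - 2 * depth(lca)


print(class_distance("Wheel", "Frame"))  # siblings under "Component"
print(class_distance("Wheel", "Cycle"))  # related only through the root
```

In this hypothetical tree, sibling classes are closer to each other than classes in different branches, so similar properties defined in sibling classes would be ranked higher when recommendations take class distance into account.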
Through these approaches, which use CM semantic definitions and structure features, similar properties can be collected so that the relationships among properties can be ranked and recommended systematically.

===5 Support Tool===

Fig. 4. FSCM supporting sustainability of CM.

A conceptual modeling tool called FSCM, shown in Fig. 4, was developed based on the tool parcimoser™, which was introduced in our previous work [4]. In that work, an Excel-based FSCM supports functions for components A, B, and C, briefly explained as follows:

* Component A provides functions such as CM design and generation, instance template generation, and static data input.
* Component B provides functions such as database schema design, table creation, and storage of the CM and static instance data to a database.
* Component C provides functions such as exporting and synchronizing the CM and instance data in a database to FSCM.

Continuing this work, we newly developed component D for the FSCM to automatically generate property relationships utilizing the first version of the CM and its instance data. Category creation and collection of necessary entities are newly implemented as an extension of component A. “Conceptual Model Version1” and “Conceptual Model Version2” are different versions of the same CM. In this paper, we focus only on the differences in property relationships between CM Version1 and CM Version2, namely,

CM(Version1) ∪ {generated property relationships} = CM(Version2).

Other entity changes from CM Version1 to Version2 are not discussed in this paper.

Fig. 5 shows a sample CM created by the FSCM. In Area I, the is-a hierarchical tree of classes is displayed. In Area II, detailed information on a selected class, including its category, is listed. In Area III, the properties defined for that class and those inherited from its ancestor classes can be listed. For each property, attributes such as “Name”, “Datatype(valueType)”, and “Category” can be shown.

Fig. 5.
Example of conceptual model in FSCM.

===6 Case Study===

An FSCM based on the proposed meta-model was applied to a commercial system, and the provided CM was shown to be efficient and successful in an actual industrial application. In this work, some of the standardized industrial conceptual models (http://std.iec.ch/cdd/iec61360/iec61360.nsf/TreeFrameset?OpenFrameSet and http://collaboration.iec.ch/other_sc3dworkingmaterial/IEC62656-Part3/) were adopted and defined using the FSCM tool with internal extensions. In this use case, the CM shown in Table 5 was utilized to explain the contributions of the newly proposed “Category” and “PropertyRelationship”. Table 5 lists only the necessary entity types; those not pertinent to this paper are omitted.

Table 5. Conceptual model in our case study

{| class="wikitable"
! Entity type !! CM1 entity number
|-
| Class || 1312
|-
| Property || 5629
|-
| Datatype || 55
|-
| Category (created with this work) || 24
|}

'''Category definition and discussion.''' Fig. 6 shows the example of 24 categories defined in the CM from one viewpoint. As already explained in Section 3.3, categories can be defined from different perspectives according to users’ requirements and CM utilization in individual user systems. The 24 categories in Fig. 6 are just one of the potential category sets. In Fig. 6, the class and property entities included in each category are presented. Some categories, such as Categories 1, 2, 4, and 5, clearly have only limited entities, while categories such as 17 and 19 have a large number of entities. No category contains all entities available in the CM. Especially for small categories, the category concept clearly contributes to reducing unnecessary information.

Fig. 6. Categories and their containing entities.

'''Property relationships generation.''' Component D of the FSCM shown in Fig. 4 was used to generate property relationships with the mechanisms explained in Section 4.1.
In this case study, the weight coefficients W_a of the “Name”, “Definition”, “Datatype”, and “Unit” attributes were set to 1, and those for other attributes were set to 0, as discussed in Section 4.1. Fig. 7 shows the results. In total, we obtained 15,840,006 property relationships throughout the whole CM (one for each pair of the 5,629 properties listed in Table 5), 4,508,160 of which were between properties in different categories. Notably, 1,144 property relationships received similarity scores of 1. Fig. 7 illustrates the numbers of generated property relationships with similarity scores ranging from 0 to 1. These results were adopted for the evaluations.

Fig. 7. Automatically generated property relationships at different similarity scores.

'''Evaluation of automatically generated property relationships.''' From the automatically generated property relationships with similarity scores of 1, 0.9~1, 0.8~0.9, 0.7~0.8, 0.6~0.5, and “0.5 or less”, we randomly selected 20 relationships from each similarity score range as samples to evaluate the proposed approach. In total, 120 automatically generated relationships were evaluated, and Fig. 8 shows the case study results. In Fig. 8, the horizontal x-axis represents the similarity score (S) of generated property relationships, from 1 and 0.9~1 down to “0.5 or less”. Here, 1 means S=1, 0.9 means 0.9=