
PKBD.Onto: A Plugin for Ontological Schemas Generation


Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences, Lermontov St. 134, Irkutsk, Russia

The use of Semantic Web technologies (including ontologies) for intelligent systems and knowledge base engineering is a widespread practice, especially for the tasks of conceptualization and formalization. However, the tools and approaches used for these tasks in most cases support only manual manipulation of concepts and relationships. In this regard, the use of various information sources for automated ontology engineering is relevant. One of these sources is spreadsheets. In this paper, we propose an approach for the automated creation of ontological schemas based on the analysis and transformation of spreadsheet data. A feature of our approach is an original relational canonicalized form of spreadsheets, which is used for preprocessing spreadsheets and unifying the input data. The proposed approach is implemented as a plugin (PKBD.Onto) for Personal Knowledge Base Designer, software for prototyping rule-based expert systems. The main stages of the approach, the architecture and functions of the plugin, and a case study are also described.

Keywords: Spreadsheets, Canonical Spreadsheet, Ontological Schema, OWL, Model Transformation, Code Generation

1 Introduction

The use of Semantic Web technologies, including ontologies [ 8 ], for intelligent systems and knowledge base engineering is a widespread practice. In most cases, analysts and domain experts use ontologies and dedicated software (e.g., Protégé, ONTOedit, Menthor Editor, Semaphore Ontology Editor, OntoStudio, WebOnto, Fluent Editor) for the tasks of knowledge conceptualization and formalization. However, these tools offer weak integration with external information sources (e.g., databases, texts, tables, conceptual models) when it comes to importing domain concepts and relationships, which reduces the efficiency of the ontology engineering process. Spreadsheets are one of the information sources that can be used for the automated creation of ontologies. Today, a large volume of arbitrary tables has been accumulated worldwide [ 9 ] in spreadsheet-like formats (HTML, Excel, and CSV). Arbitrary tables are a valuable data source in business intelligence and data-driven research.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In our previous papers [ 5, 14 ], we proposed an approach for the automated analysis and transformation of spreadsheets into conceptual domain models in the form of UML class diagrams. In this paper, we propose applying this approach to ontological schema generation (ontologies at the TBox level) in the OWL 2 DL format [ 6 ]. A feature of the proposed approach is the use of an original canonicalized form for representing spreadsheets, which unifies the input data.

Our approach is implemented as a plugin, PKBD.Onto, for Personal Knowledge Base Designer (PKBD) [15], software for prototyping rule-based expert systems. A case study for the proposed approach and a description of the plugin are also presented.

2 Background

2.1 Method for Spreadsheet Transformation

Rule 2: IF RH corresponds to only one CH and, at the same time, RH contains two values separated by “|”, THEN RH is transformed into a class with properties from CH and an additional property “Name” that corresponds to RH-2.

Rule 3: IF RH contains two values separated by “|” and they correspond to two CH values separated by “|”, THEN RH is transformed into the first class, CH is transformed into the second class, and a relationship is established between them.

Rule 4: IF RH corresponds to three CH values separated by “|”, THEN RH is transformed into the first class with property CH-1, CH-2 and CH-3 are transformed into the second class, and a relationship is established between them.

All obtained parent-child relationships are interpreted as associations, with a default cardinality of “1..*”.

By default, attribute values are taken from the D column.
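The condition checks of Rules 2-4 can be illustrated with a short Python sketch. It is a minimal sketch under several assumptions: a canonical row is modeled by a hypothetical `CanonicalRow` with `d`, `rh`, and `ch` fields; a rule is selected only by counting the “|”-separated values in RH and CH (reading “RH corresponds to only one CH” in Rule 2 as a single-valued CH is our interpretation); Rule 1 is not covered; and, for Rule 2, naming the class after RH-1 is likewise our assumption. The sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEP = "|"  # separator between hierarchical values in RH and CH cells


@dataclass
class CanonicalRow:
    """One row of a canonical spreadsheet (hypothetical field names)."""
    d: str   # value from the D column
    rh: str  # row heading, e.g. "Mineral|Diamond"
    ch: str  # column heading, e.g. "Hardness"


@dataclass
class ClassFragment:
    """A class with attribute-name -> value pairs (illustrative only)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def split_values(cell: str) -> List[str]:
    """Split a heading cell on the separator, dropping empty parts."""
    return [part.strip() for part in cell.split(SEP) if part.strip()]


def matching_rule(row: CanonicalRow) -> Optional[int]:
    """Return 2, 3 or 4 depending on which rule condition the row meets."""
    rh, ch = split_values(row.rh), split_values(row.ch)
    if len(rh) == 2 and len(ch) == 1:
        return 2
    if len(rh) == 2 and len(ch) == 2:
        return 3
    if len(ch) == 3:
        return 4
    return None  # Rule 1 and any other case are outside this sketch


def apply_rule_2(row: CanonicalRow) -> ClassFragment:
    """Rule 2: properties from CH plus an extra "Name" property set to RH-2.

    Assumption: the class itself is named after RH-1."""
    if matching_rule(row) != 2:
        raise ValueError("Rule 2 does not apply to this row")
    rh, ch = split_values(row.rh), split_values(row.ch)
    # By default, the attribute value comes from the D column.
    return ClassFragment(name=rh[0], attributes={ch[0]: row.d, "Name": rh[1]})
```

For example, `apply_rule_2(CanonicalRow(d="10", rh="Mineral|Diamond", ch="Hardness"))` yields `ClassFragment(name='Mineral', attributes={'Hardness': '10', 'Name': 'Diamond'})`.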

The main results of this algorithm are fragments of conceptual models. These fragments need to be aggregated, which includes clarifying the names of concepts, their properties, and relationships, as well as possibly merging and separating them.

The following rules are used for the automatic aggregation of conceptual model fragments:

Rule 1: Merge two classes from duplicate fragments of class diagrams when they have equal names.

Rule 2: Merge two classes when they have the same structure, i.e., when their sets of attributes are equal. In this case, only the first class with this structure stays in the model.

Rule 3: Merge two classes when they have similar names, since the resulting fragments of class diagrams can describe the same objects or processes. We suggest using a simple string comparison method based on the Levenshtein distance [ 10 ] to determine the similarity between two class names. If the distance is less than or equal to three, we assume the classes to be similar. Since name similarity alone is not sufficient, we also compare the structure of the classes (attribute names must partially match).

Rule 4: Create a new association between two classes if a class and an attribute are homonymous, i.e., the name of one class is equal to the name of an attribute in another class. The homonymous attribute is then removed.

Rule 5: Remove duplicate associations between classes.
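Aggregation Rule 3 can be sketched as a Levenshtein-distance check combined with a structural check, and Rule 5 as order-preserving deduplication. The sketch below is illustrative only: the case-insensitive comparison, the reading of “partially match” as “at least one shared attribute name”, and the treatment of associations as directed (source, target) pairs are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def are_similar(name1: str, attrs1: Iterable[str],
                name2: str, attrs2: Iterable[str],
                max_distance: int = 3) -> bool:
    """Rule 3: names within max_distance edits AND at least one shared attribute."""
    if levenshtein(name1.lower(), name2.lower()) > max_distance:
        return False
    return bool(set(attrs1) & set(attrs2))


def remove_duplicate_associations(
        associations: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rule 5: keep only the first occurrence of each (source, target) pair."""
    seen, unique = set(), []
    for pair in associations:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique
```

With the threshold of three, “Mineral” and “Minerals” (distance 1) are merged only if they also share an attribute name, which reflects the requirement that name similarity alone is not enough.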

Manual merging and separation operations are performed in PKBD.

2.2 PKBD: a Tool for Knowledge Base Engineering

We have used PKBD to engineer knowledge bases of expert systems, in particular in the field of industrial safety inspection (ISI) [ 1 ]. PKBD is implemented as a desktop application designed for non-programmers. Its main purpose is to prototype knowledge bases that use the formalism of logical rules.

One of the PKBD features is support for the Rule Visual Modeling Language (RVML) [ 7 ], which is considered a UML extension. Other PKBD features are the following:
• a modular architecture that provides the ability to add modules supporting knowledge programming languages (currently, CLIPS and Drools are supported);
• integrability with conceptual modeling tools for importing and exporting concepts and relationships.

The PKBD architecture defines the interaction of the following main software components:
• a knowledge base management module, which stores projects in the EKB format (a proprietary XML-like format);
• a user interface subsystem, which includes software wizards for manipulating knowledge base elements, a GUI generation module, and a Tiny RVML editor;
• a subsystem for supporting programming language modules, which connects and disconnects modules and provides access to their code-generation functions;
• a module for integration with conceptual model sources: IBM Rational Rose, StarUML, XMind, CMapTools, and TabbyXL;
• a rule engine control module, which activates a rule engine for testing knowledge bases;
• a module for interaction with the web-based software Knowledge Base Development System (KBDS) [ 3 ].

The main functions of PKBD are:
• designing elements of rule bases (fact templates, facts, and rules) by non-programmers using a set of wizards and the defined sources of conceptual models;
• checking the integrity of the developed knowledge bases (syntactic and semantic control);
• representing knowledge base elements using RVML;
• generating knowledge base code in the CLIPS format;
• testing the developed knowledge base code (logical inference) using the integrated CLIPS rule engine;
• integrating with CASE tools (IBM Rational Rose, StarUML, XMind, and CMapTools) to import and transform conceptual models in order to extract the main entities (concepts) and relationships for creating knowledge base drafts;
• integrating with TabbyXL [ 11 ] to import and transform canonical spreadsheet tables in order to extract the main entities (concepts) and relationships for creating knowledge base drafts;
• interacting with the KBDS service.

We used PKBD as an open software platform and developed the PKBD.Onto plugin for it. This plugin implements our approach for ontological schema generation in the OWL 2 DL format.

3 Proposed Approach

3.1 Method

The method for generating ontological schemas is based on the principles of model transformation, one of the key concepts in Model-Driven Engineering (MDE) [ 2 ].

From a formal point of view, our method can be represented as a chain of horizontal exogenous transformations:

$$T: CS_{CSV} \rightarrow CM_{XML} \rightarrow OS_{OWL},$$

where $CS_{CSV}$ is a source spreadsheet presented in a canonicalized form and saved in the CSV format using TabbyXL (the structure of a canonical spreadsheet is described in Section 2.1); $CM_{XML}$ is the conceptual model resulting from the spreadsheet transformation, which serves as the internal representation of domain concepts and relationships in PKBD; $OS_{OWL}$ is the target ontological schema in the OWL 2 DL format.

Using (2), let’s describe $CM_{XML}$ in more detail:

$$CM_{XML} = \langle C, DT, RL \rangle,$$

where $C$ is a set of classes; $DT$ is a set of datatypes; $RL$ is a set of relationships between elements of $C$. Let’s refine $C$ from (3) as follows:

$$C = \{c_1, \dots, c_n\}, \quad c_i = \langle name_i, AT_i \rangle, \quad i = \overline{1,n},$$

where $name_i$ is a class name and $AT_i$ is a set of class attributes:

$$AT_i = \{a_{i,1}, \dots, a_{i,k}\}, \quad a_{i,j} = \langle name_j, type_j, value_j \rangle, \quad j = \overline{1,k},$$

where $name_j$ is an attribute name; $type_j$ is an attribute datatype, $type_j \in DT$; $value_j$ is a possible attribute value.

$$RL = \{rl_1, \dots, rl_n\}, \quad rl_i = \langle name_i, type_i, lhs_i, rhs_i \rangle, \quad i = \overline{1,n},$$

where $type_i$ is a relationship type (inheritance, dependency, association, aggregation, composition, or realization); $name_i$ is a relationship name; $lhs_i$ is the left side of the relationship, $lhs_i = \langle name_{lhs}, cd_{lhs}, c_j \rangle$, where $name_{lhs}$ is the name of the class role at the left side, $cd_{lhs}$ is the cardinality of the left side, and $c_j \in C$ is a reference to the class at the left side; $rhs_i$ is the right side of the relationship, $rhs_i = \langle name_{rhs}, cd_{rhs}, c_k \rangle$, where $name_{rhs}$ is the name of the class role at the right side, $cd_{rhs}$ is the cardinality of the right side, and $c_k \in C$ is a reference to the class at the right side. Here, $cd_{lhs}, cd_{rhs} \in \{0, 0..1, 0..*, 1, 1..*\}$.
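The tuple-based definition of $CM_{XML}$ maps naturally onto plain data classes. The following Python sketch mirrors the notation above; the class names (`ModelClass`, `RelationshipEnd`, and so on) are hypothetical and do not reflect the actual PKBD implementation. As an addition for illustration, it rejects cardinalities outside the allowed set and offers a helper that checks whether every attribute type belongs to $DT$.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Allowed values for cd_lhs and cd_rhs.
CARDINALITIES = {"0", "0..1", "0..*", "1", "1..*"}


class RelationshipType(Enum):
    INHERITANCE = "inheritance"
    DEPENDENCY = "dependency"
    ASSOCIATION = "association"
    AGGREGATION = "aggregation"
    COMPOSITION = "composition"
    REALIZATION = "realization"


@dataclass
class Attribute:                 # a_ij = <name_j, type_j, value_j>
    name: str
    datatype: str                # expected to belong to DT
    value: str = ""


@dataclass
class ModelClass:                # c_i = <name_i, AT_i>
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class RelationshipEnd:           # lhs_i / rhs_i = <role name, cardinality, class>
    role: str
    cardinality: str
    target: ModelClass

    def __post_init__(self) -> None:
        if self.cardinality not in CARDINALITIES:
            raise ValueError(f"unsupported cardinality: {self.cardinality}")


@dataclass
class Relationship:              # rl_i = <name_i, type_i, lhs_i, rhs_i>
    name: str
    kind: RelationshipType
    lhs: RelationshipEnd
    rhs: RelationshipEnd


@dataclass
class ConceptualModel:           # CM_XML = <C, DT, RL>
    classes: List[ModelClass] = field(default_factory=list)
    datatypes: List[str] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def types_are_declared(self) -> bool:
        """True if every attribute datatype is listed in DT."""
        return all(attr.datatype in self.datatypes
                   for cls in self.classes for attr in cls.attributes)
```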

Using (2), let’s describe $OS_{OWL}$ in more detail:

$$OS_{OWL} = \langle C, OP, DP, DT \rangle,$$

where $C$ is a set of classes; $OP$ is a set of object properties; $DP$ is a set of datatype properties; $DT$ is a set of XML Schema datatypes. A detailed description of the OWL 2 DL specification is given in [ 6 ].

Analysis and transformation of source spreadsheets ($CS_{CSV}$) and formation of a conceptual model ($CM_{XML}$) are discussed in detail in [ 5, 14 ]. In this paper, we describe in detail how to obtain ontological schemas ($OS_{OWL}$). For this purpose, using (2), let’s describe the transformation operator ($T$):

$$T = \langle T_{CS-CM}, T_{CM-OSM}, T_{OSM-OS} \rangle,$$

$$T_{CS-CM}: CS_{CSV} \rightarrow CM_{XML}, \quad T_{CM-OSM}: CM_{XML} \rightarrow OSM, \quad T_{OSM-OS}: OSM \rightarrow OS_{OWL},$$

where $T_{CS-CM}$ is a set of rules for transforming a source spreadsheet in the CSV format into a conceptual model, for example, a UML class diagram; $T_{CM-OSM}$ is a set of rules for transforming a conceptual model into an ontological schema model; $T_{OSM-OS}$ is a set of rules for transforming an ontological schema model into OWL ontology code at the TBox level.

Here, $OSM$ is an ontological schema model designed for the unified representation and storage of knowledge extracted from various information sources. This model abstracts away from the specifics of the knowledge representation languages and dialects used to implement ontologies (e.g., OWL, RDFS).
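Because $T$ is defined as three rule sets applied in sequence, it can be read as a composition of functions. The Python sketch below shows only this structure; the three `toy_*` stages are hypothetical placeholders (for example, the CSV column names `D`, `RH`, and `CH` are assumed) and implement none of the actual transformation rules.

```python
import csv
import io
from typing import Callable, TypeVar

CS = TypeVar("CS")    # source spreadsheet representation
CM = TypeVar("CM")    # conceptual model
OSM = TypeVar("OSM")  # ontological schema model
OS = TypeVar("OS")    # target ontology code


def compose_t(t_cs_cm: Callable[[CS], CM],
              t_cm_osm: Callable[[CM], OSM],
              t_osm_os: Callable[[OSM], OS]) -> Callable[[CS], OS]:
    """Build T from its three rule sets, applied as CS -> CM -> OSM -> OS."""
    def t(source: CS) -> OS:
        return t_osm_os(t_cm_osm(t_cs_cm(source)))
    return t


# Toy stand-ins for the three rule sets (hypothetical, for illustration only).
def toy_cs_cm(csv_text: str) -> dict:
    """CS_CSV -> CM_XML: one class per distinct first RH value."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {"classes": sorted({row["RH"].split("|")[0] for row in rows})}


def toy_cm_osm(model: dict) -> dict:
    """CM_XML -> OSM: every model class becomes an ontology class."""
    return {"owl_classes": list(model["classes"])}


def toy_osm_os(osm: dict) -> str:
    """OSM -> OS_OWL: one Turtle class declaration per line."""
    return "\n".join(f":{name} a owl:Class ." for name in osm["owl_classes"])


T = compose_t(toy_cs_cm, toy_cm_osm, toy_osm_os)
```

Keeping each stage a separate function means that a stage (for example, the OWL serializer) can be replaced without touching the others, which matches the role of $OSM$ as a language-neutral intermediate model.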

So, using the sets of transformation rules $T_{CM-OSM}$ and $T_{OSM-OS}$, ontological schema generation ($OS_{OWL}$) includes four main stages.

Stage 1: Analyzing and transforming the XML structure of the PKBD internal representation of conceptual models. This stage involves extracting elements, their properties, and relationships from the XML tree (using a depth-first search over elements).

Stage 2: Forming an ontological schema model. The main objective of this stage is to obtain typical ontological fragments, i.e., sets of classes and their relationships (object and datatype properties), which describe a certain domain and are based on the extracted XML elements.

Stage 3: Generating ontological schema code in the OWL format from the ontological schema model.

Transformations themselves can be described using special transformation languages, for example, Transformation Model Representation Language (TMRL) [ 4 ]. In this work, we use a general-purpose language to implement transformations. Moreover, all transformations can be represented in tabular form (Table 1).

Stage 4: Editing the obtained ontological schema. This optional stage consists of refining (modifying) the obtained OWL code with ontology modeling tools, for example, Protégé.

So, the main result of these stages is a set of ontology classes and their properties, which define an ontological schema at the TBox level.
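Stages 1-3 can be sketched end to end in Python using only the standard library. This is a simplified illustration, not the PKBD.Onto implementation: the XML layout is a hypothetical stand-in for the PKBD (EKB) format; the mapping of attributes to datatype properties and of associations, aggregations, and compositions to object properties, the datatype table, and the reading of an inheritance `lhs` as the subclass are all assumptions; and Turtle is used as the output syntax. Element names are assumed to be valid Turtle local names, and attribute names are assumed to be unique across classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical XML layout of a conceptual model (not the actual EKB schema).
SAMPLE_XML = """
<model>
  <class name="Mineral">
    <attribute name="Hardness" type="float"/>
  </class>
  <class name="Deposit"/>
  <relationship name="foundIn" type="association" lhs="Mineral" rhs="Deposit"/>
</model>
"""

# Assumed mapping of model datatypes to XML Schema datatypes.
XSD_TYPES = {"string": "xsd:string", "integer": "xsd:integer", "float": "xsd:float"}
# Assumed relationship types that become OWL object properties.
OBJECT_LINKS = {"association", "aggregation", "composition"}


@dataclass
class SchemaModel:
    """A language-neutral ontological schema model (OSM) for the TBox."""
    classes: List[str] = field(default_factory=list)
    subclass_of: List[Tuple[str, str]] = field(default_factory=list)
    object_props: List[Tuple[str, str, str]] = field(default_factory=list)
    datatype_props: List[Tuple[str, str, str]] = field(default_factory=list)


def extract_elements(xml_text: str) -> List[ET.Element]:
    """Stage 1: collect elements in depth-first (document) order."""
    return list(ET.fromstring(xml_text.strip()).iter())


def build_schema_model(elements: List[ET.Element]) -> SchemaModel:
    """Stage 2: map classes, attributes, and relationships to OSM entries."""
    osm = SchemaModel()
    owner = None  # class most recently visited by the depth-first walk
    for el in elements:
        if el.tag == "class":
            owner = el.get("name")
            osm.classes.append(owner)
        elif el.tag == "attribute" and owner is not None:
            xsd = XSD_TYPES.get(el.get("type"), "xsd:string")
            osm.datatype_props.append((el.get("name"), owner, xsd))
        elif el.tag == "relationship":
            kind, lhs, rhs = el.get("type"), el.get("lhs"), el.get("rhs")
            if kind == "inheritance":
                osm.subclass_of.append((lhs, rhs))  # assumed: lhs is the subclass
            elif kind in OBJECT_LINKS:
                osm.object_props.append((el.get("name"), lhs, rhs))
    return osm


def to_turtle(osm: SchemaModel, base: str = "http://example.org/onto") -> str:
    """Stage 3: serialize the OSM as an OWL 2 ontology in Turtle syntax."""
    lines = [
        f"@prefix : <{base}#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"<{base}> a owl:Ontology .",
    ]
    lines += [f":{c} a owl:Class ." for c in osm.classes]
    lines += [f":{sub} rdfs:subClassOf :{sup} ." for sub, sup in osm.subclass_of]
    lines += [f":{n} a owl:ObjectProperty ; rdfs:domain :{d} ; rdfs:range :{r} ."
              for n, d, r in osm.object_props]
    lines += [f":{n} a owl:DatatypeProperty ; rdfs:domain :{d} ; rdfs:range {t} ."
              for n, d, t in osm.datatype_props]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(build_schema_model(extract_elements(SAMPLE_XML))))
```

Attributes are attached to the most recently visited `class` element; this works because `Element.iter()` visits a class before its children.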

3.2 PKBD.Onto: a Plugin for PKBD

The PKBD.Onto plugin is implemented as a dynamic-link library (DLL) that is connected at run time via the unified PKBD API.

The unified PKBD API for integration modules that import from and export to external software contains three functions:
• getting a brief description of the DLL, including its name and version (the “DllInfo” function);
• getting a detailed description of the DLL (the “About” function);
• executing the main function of the DLL, which receives a conceptual model in the PKBD format, a resulting file name, and a list of possible parameters (the “Execute” function).
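The three-function plugin contract can be mirrored as an abstract Python class to show how a host might call a plugin. This is a hypothetical illustration: the real plugin is a native DLL, and the method names, Python signatures, and the stub's behavior (it ignores the model and writes only an OWL ontology header) are assumptions rather than the actual PKBD API.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional


class PkbdPlugin(ABC):
    """Hypothetical Python mirror of the three-function PKBD plugin API."""

    @abstractmethod
    def dll_info(self) -> str:
        """Short description: plugin name and version ("DllInfo")."""

    @abstractmethod
    def about(self) -> str:
        """Detailed description of the plugin ("About")."""

    @abstractmethod
    def execute(self, model_xml: str, result_file: str,
                params: Optional[Dict[str, str]] = None) -> bool:
        """Main entry point ("Execute"): model, output file, parameters."""


class OntoPluginStub(PkbdPlugin):
    """Toy plugin that writes only an OWL ontology header (not PKBD.Onto)."""

    def dll_info(self) -> str:
        return "PKBD.Onto stub, version 0.0"

    def about(self) -> str:
        return "Writes an OWL 2 ontology header for a PKBD model (stub)."

    def execute(self, model_xml: str, result_file: str,
                params: Optional[Dict[str, str]] = None) -> bool:
        # The stub ignores model_xml; a real plugin would transform it.
        base = (params or {}).get("base", "http://example.org/onto")
        with open(result_file, "w", encoding="utf-8") as out:
            out.write("@prefix owl: <http://www.w3.org/2002/07/owl#> .\n")
            out.write(f"<{base}> a owl:Ontology .\n")
        return True


if __name__ == "__main__":
    plugin = OntoPluginStub()
    print(plugin.dll_info())
    path = os.path.join(tempfile.mkdtemp(), "schema.ttl")
    plugin.execute("<model/>", path, {"base": "http://example.org/minerals"})
    with open(path, encoding="utf-8") as f:
        print(f.read())
```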

The PKBD.Onto plugin architecture (Fig. 1) comprises the following components:
• support for the PKBD format of conceptual models, which provides access to and manipulation of model elements;
• transformation of the input model into the OWL 2 DL format;
• transformation of the input model into a set of linked data in the RDF format (which can be viewed as a means of obtaining a set of specific facts).

Fig. 1. PKBD.Onto plugin architecture (components: XML PKBD Parser, OWL DL Generator, RDF Generator).

3.3 Case Study

Currently, PKBD is used in the educational process at Irkutsk National Research Technical University (IrNRTU), Institute of Information Technology and Data Science. Therefore, as an example, let’s consider the educational task of developing an ontological schema fragment.

Information on minerals in the form of arbitrary spreadsheets is used as source data (Fig. 2). To unify the input data, the source arbitrary spreadsheet was preprocessed into a canonical spreadsheet (Fig. 3).

Next, the canonical spreadsheet is analyzed using PKBD, in particular, the PKBD.Onto plugin, and conceptual model elements are extracted. These elements can be visualized as an RVML schema (Fig. 4). The obtained model requires modification: all minerals were aggregated into a “Diamond” class (template), which must be renamed “Mineral”.

Based on the modified conceptual model (Fig. 4), we generated the code of the ontological schema in the OWL format. This code can then be verified in Protégé (Fig. 5).

4 Conclusion

In this paper, we described a method and a tool for ontological schema generation (ontologies at the TBox level) in the form of a plugin for Personal Knowledge Base Designer. Spreadsheets reduced to a canonicalized form and saved in the CSV format were used as source data. The resulting OWL ontology code is syntactically correct and can be evaluated by end users.

The PKBD.Onto plugin allows one to create rapid prototypes of spreadsheet-based ontologies for a specific domain. Modified and refined ontologies can be used for intelligent systems and knowledge base engineering [ 1 ].

Acknowledgments

This work was financially supported by the Council for Grants of the President of Russia (grant No. MK-1647.2020.9) and the Program of Fundamental Research of the Siberian Branch of the Russian Academy of Sciences, project no. IV.38.1.2 (reg. no. АААА-А17-117032210079-1) and project no. IV.38.1.3 (reg. no. АААА-А17-117032210077-7). The results were obtained using the Centre of collective usage «Integrated information network of Irkutsk scientific educational complex».

References

1. Berman, A.F., Nikolaichuk, O.A., Yurin, A.Yu., Kuznetsov, K.A.: Support of Decision-Making Based on a Production Approach in the Performance of an Industrial Safety Review. Chemical and Petroleum Engineering 50(1-2), 730-738 (2015). DOI: 10.1007/s10556-015-9970-x

2. Silva, A.R.: Model-driven engineering: A survey supported by the unified conceptual model. Computer Languages, Systems & Structures 43, 139-155 (2015). DOI: 10.1016/j.cl.2015.06.001

3. Dorodnykh, N.O.: Web-based software for automating development of knowledge bases on the basis of transformation of conceptual models. Open Semantic Technologies for Intelligent Systems 1, 145-150 (2017).

4. Dorodnykh, N.O., Yurin, A.Yu.: A domain-specific language for transformation models. CEUR Workshop Proceedings (ITAMS-2018) 2221, 70-75 (2018).

5. Dorodnykh, N.O., Yurin, A.Yu., Shigarov, A.O.: Conceptual Model Engineering for Industrial Safety Inspection Based on Spreadsheet Data Analysis. Communications in Computer and Information Science. Modelling and Development of Intelligent Systems (MDIS 2019) 1126, 51-65 (2020). DOI: 10.1007/978-3-030-39237-6_4

6. Grau, B.C., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., Sattler, U.: OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on the World Wide Web 6(4), 309-322 (2008). DOI: 10.1016/j.websem.2008.05.001

7. Grishenko, M.A., Dorodnykh, N.O., Nikolaychuk, O.A., Yurin, A.Yu.: Designing rule-based expert systems with the aid of the model-driven development approach. Expert Systems 35(5), 1-23 (2018). DOI: 10.1111/exsy.12291

8. Guarino, N.: Formal Ontology in Information Systems. In: First International Conference on Formal Ontology in Information Systems (FOIS'98) 46, 3-15 (1998).

9. Lehmberg, O., Ritze, D., Meusel, R., Bizer, C.: A large public corpus of web tables containing time and context metadata. Proceedings 25th International Conference Companion on World Wide Web, 75-76 (2016). DOI: 10.1145/2872518.2889386

10. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Tech. Rep. 8, Soviet Physics Doklady (1966).

11. Shigarov, A.O., Khristyuk, V.V., Mikhailov, A.M.: TabbyXL: Software platform for rule-based spreadsheet data extraction and transformation. SoftwareX 10, 100270 (2019). DOI: 10.1016/j.softx.2019.100270

12. Shigarov, A.O., Mikhailov, A.A.: Rule-based spreadsheet data transformation from arbitrary to relational tables. Information Systems 71, 123-136 (2017). DOI: 10.1016/j.is.2017.08.004

13. Tijerino, Y.A., Embley, D.W., Lonsdale, D.W., Ding, Y., Nagy, G.: Towards Ontology Generation from Tables. World Wide Web 8(3), 261-285 (2005). DOI: 10.1007/s11280-005-0360-8

14. Yurin, A.Yu., Dorodnykh, N.O.: A Reverse Engineering Process for Inferring Conceptual Models from Canonicalized Tables. Proceedings of the 2019 International MultiConference on Engineering, Computer and Information Sciences (SIBIRCON), 485-490 (2020). DOI: 10.1109/SIBIRCON48586.2019.8958458

15. Yurin, A.Yu., Dorodnykh, N.O.: Personal knowledge base designer: Software for expert systems prototyping. SoftwareX 11, 100411 (2020). DOI: 10.1016/j.softx.2020.100411