=Paper=
{{Paper
|id=Vol-2005/paper-02
|storemode=property
|title=Information Technology for Decision-Making Based on Integration of Case Base and the Domain Ontology
|pdfUrl=https://ceur-ws.org/Vol-2005/paper-02.pdf
|volume=Vol-2005
|authors=Tatiana V. Avdeenko,Ekaterina A. Makarova
}}
==Information Technology for Decision-Making Based on Integration of Case Base and the Domain Ontology==
Tatiana V. Avdeenko and Ekaterina A. Makarova

Novosibirsk State Technical University, Novosibirsk, Russia, avdeenko@corp.nstu.ru, e.s.makarova@corp.nstu.ru

Abstract. The paper considers an approach to creating a new information technology to support decision-making in the field of IT consulting. The proposed approach is based on storing and subsequently using decision-making cases from a case base integrated with the domain ontology. This integration is realized by establishing weighted associative links between cases and ontology concepts, and by subsequently identifying classes of semantically close precedents using the proposed algorithm for calculating the matrix of closeness of cases to the terminal ontology concepts. The proposed approach thus makes it possible to increase the relevance of retrieved cases to the current decision-making problem.

Keywords: Information technology, case-based reasoning, ontology, IT consulting

===1 Introduction===

Currently, information technology is a principal instrument of any company and plays a significant role in its functioning. Due to frequent changes in legislation and the rapid emergence of new interfaces and services, employees of organizations frequently have problems applying software products. To help solve these problems, organizations traditionally rely on consultant-analysts: workers of IT departments engaged specifically in advising on problems that appear while using the software. The consulting activity of an analyst includes identifying the user's issue, carefully analyzing it and searching for a solution, and then formulating possible answers to the question.
Requests that come from users to the analyst can be divided into the following groups by difficulty: simple requests that a consultant can answer immediately; requests of medium difficulty that require analysis of the situation; complex requests that require detailed study of the situation; and simple specifications to refine the system. The average time a consultant spends on a single request depends on the consultant's experience and on the complexity of the problem. At the same time, it was noted that an inexperienced consultant who uses his own or other people's knowledge about previous cases and solutions spends, on average, less time solving the problem. This can be explained by the fact that different users often meet the same, or very similar, problems in their work.

Thus, one can conclude that the use of even very simple means of recording and retrieving precedents that took place in the past brings the overall performance of an inexperienced consultant up to the effectiveness of an experienced one. Therefore, in the field of IT support, it is promising to build a knowledge-based system capable not only of accumulating previous experience in the case base, but also of effectively searching for cases that are semantically close to the current problem.

An important initial step in building an efficient knowledge-based system is the choice of the knowledge representation model for the knowledge base. Such systems allow one to apply logical inference algorithms to known rules and facts to extract new (unknown) facts from the knowledge base. Rules are convenient and expressive constructions for representing knowledge in various subject areas. It is with this method of knowledge representation that the "success stories" of Artificial Intelligence (AI) are connected, marking the transfer of AI from the field of the simplest solutions to the sphere of real applications.

Since the 1980s, an alternative reasoning paradigm has attracted increasing attention. Case-based reasoning (CBR) solves new problems by adapting previously successful solutions to similar problems, just as a human being does. This approach has been successfully used in medical diagnostics and in legal counseling. In our opinion, this method is suitable for creating a system for consulting users that draws on the experience of solving similar problems in the past. CBR is based on the fundamental idea that this is often how a decision-maker arrives at a decision: by finding in memory the information most closely related to the problem being solved and adapting the past conditions to new realities.

The foundations of the CBR approach were set forth in the works of Schank [1,2], who first proposed to generalize knowledge about past situations in the form of so-called scripts (knowledge containers that can be used both in the learning process and in decision-making). The model of dynamic memory later became the basis for a number of other systems: MEDIATOR [3], CHEF [4], and JULIA [5].

CBR made it possible to overcome a number of limitations of the rule-based model [6]. It does not require the construction of an explicit domain model. The complex process of knowledge acquisition is reduced to the task of accumulating cases of past decision-making (precedents) and analyzing the results of those decisions. In the simplest case, implementing an intelligent system based on CBR is reduced to identifying the main variables (features) describing the case and accumulating the data described by this set of features. However, as CBR developed, its main drawbacks appeared. One of them emerges when a very large database of precedents has been accumulated: this raises the problem of retrieving relevant cases from a huge amount of data.

It seems to us that the shortcomings of CBR could be significantly reduced by integrating it with knowledge representation models in the form of an ontology. For example, in [7], the representation of the IT application domain in the form of an ontology was used to improve the semantic search for documents by indexing documents by ontology concepts rather than by the usual keywords.

In the present paper, we propose an original method of integrating IT-consulting cases with the domain ontology by establishing relationships between cases and ontology concepts. Each case can be related to several concepts of the ontology, which allows one to describe the semantics of a precedent of an IT consultation more adequately. The proposed method can be used for the subsequent effective extraction of cases relevant to the current situation using various data mining methods, for example, the method of constructing fuzzy rules proposed in [8,9].

The paper is organized as follows. In Section 2, we describe the ontology of the application area in which the task of IT consulting is solved, as well as its implementation in the Protege editor. Section 3 describes the structure of the class whose instances are specific cases (stories) of user counseling. In Section 4, we describe the proposed mechanism for integrating cases with the ontology concepts.

===2 Domain ontology for IT consultation===

An ontology is a formal explicit description of the notions (concepts) of the application domain and the relations between these concepts [10].
The ontology can be represented by the following tuple:

<math>O = \langle C, R, S, G, T, E \rangle, \qquad (1)</math>

where <math>C = \{c_i \mid i = 1, \dots, n\}</math> is a finite non-empty set of classes (concepts) describing the basic notions of the application domain; <math>R = \{r_i \mid i = 1, \dots, m\}</math> is a finite set of binary relations between the classes, <math>R \subseteq C \times C</math>, <math>R = \{R_{ISA}\} \cup \{R_{ASS}\}</math>, where <math>R_{ISA}</math> is an antisymmetric, transitive and non-reflexive hierarchy relation "class-subclass" defining a partial order on the set of classes, and <math>R_{ASS}</math> is an associative relationship used to establish a link between the case base and the ontology concepts; <math>S = \{s_i \mid i = 1, \dots, k\}</math> is a finite set of slots (class attributes); <math>G = \{gs_i \mid i = 1, \dots, l\}</math> is a finite set of facets (slot attributes); <math>E = \{e_i \mid i = 1, \dots, u\}</math> is a finite set of class instances; <math>T</math> is a finite non-empty set that determines the controlled vocabulary of the domain terms, built on a set of basic terms <math>B = \{b_i \mid i = 1, \dots, n\}</math>, which are the names of the ontology classes:

<math>T = \bigcup_{i=1}^{n} T_i, \quad T_i = \{b_i\}, \quad \bigcap_{i=1}^{n} T_i = \varnothing. \qquad (2)</math>

The structure of the class is defined as follows:

<math>c = \langle Name_c, (\text{is-a } c_{parent}), (s_1, \dots, s_{n(c)}) \rangle, \qquad (3)</math>

where <math>c, c_{parent} \in C</math> are the ontology classes connected by the hierarchy relation <math>R_{ISA}</math>, <math>s_i \in S</math> are the class slots, and <math>Name_c \in B</math> is the class name, which is a base term of the vocabulary <math>T</math>. The hierarchy of classes (taxonomy) is formed by indicating, in the descendant class, the relation "is-a" and the name of the parent class <math>c_{parent}</math>. We call classes that have no descendants the terminal concepts of the ontology (terminals). Terminals will play the role of keywords in the semantic annotation of cases.

The structure of the slot is defined as follows:

<math>s_c = \langle Name_{S,C}, (gs_1, \dots, gs_{k(S,C)}) \rangle, \qquad (4)</math>

where <math>s_c \in S</math> is a slot of the class <math>C</math>, <math>gs_i \in G</math> are the slot facets (slot properties), and <math>Name_{S,C}</math> is the name of the slot <math>s_c</math>.

A fragment of the taxonomy (hierarchy) of the upper level concepts, which are direct descendants of the general Thing class, is shown in Fig. 1.
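
The class structure (3) and the notion of a terminal concept can be illustrated with a minimal sketch. It assumes a plain Python representation that is not part of the original work (the ontology itself was built in Protege). The names `OntClass` and `terminals` are hypothetical, and the parent links in the sample fragment are assumed for illustration only; the concept names are those used in the example of Section 4.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """One ontology class: its name, its is-a parent, and its slot names."""
    name: str
    parent: Optional[str] = None  # name of the parent class; None for the root
    slots: list[str] = field(default_factory=list)


def terminals(classes: list[OntClass]) -> set[str]:
    """Return the names of classes that have no descendants (terminals)."""
    parents = {c.parent for c in classes if c.parent is not None}
    return {c.name for c in classes} - parents


# Hypothetical fragment: the exact parent links are an assumption.
fragment = [
    OntClass("Thing"),
    OntClass("Payroll", parent="Thing"),
    OntClass("Staffing", parent="Payroll"),
    OntClass("DepartmentDirectory", parent="Staffing"),
    OntClass("PostDirectory", parent="Staffing"),
    OntClass("StaffingSpecification", parent="Staffing"),
]

terminal_names = terminals(fragment)
```

In this fragment only the three children of Staffing have no descendants, so they are the concepts that would serve as keywords.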
The ontology was created in the Protege 4.2 editor, which is free software for building ontologies.

Fig. 1. Ontological graph of the upper level concepts

The main concepts of the upper-level ontology are Accounting, Payroll, and Contractunit. The concept Accounting describes the main subsections of accounting and has twelve subordinate concepts forming the taxonomy. The concept Payroll describes the main subsections of the taxonomy "Calculations with the staff". This taxonomy covers the tasks of automating the activities of both managers (who make decisions on staff salaries) and payroll accountants. It also ensures the maintenance of mutual settlements with the employees of the enterprise and records payroll costs as part of the cost of production and services, starting with the input of documents on actual production and payment of sick leave and ending with the formation of documents for payment of wages and reporting to supervisory bodies. The concept Contractunit describes the main subsections of the subject area "Contractual Block". The contractual block is intended for automating the work of users in the sphere of registering and managing contracts with counterparties.

The hierarchy of concepts created in Protege 4.2 contains 71 concepts of the taxonomy Payroll, 82 concepts of the taxonomy Accounting, and 11 concepts of the taxonomy Contractunit.

Thus, the ontology of the application domain described above is a combination of three taxonomies of concepts connected by hierarchical relations. In principle, if the descendants of a certain parent have unequal influence on the parent concept, it is possible to introduce weight coefficients into the taxonomy. To do this, a slot containing the weight of the concept is added to each concept that has a parent.
In the present version of the ontology, it is assumed that all the children of the same parent have an identical weight equal to <math>1/G</math>, where <math>G</math> is the number of children of the given parent.

===3 Structure of Precedent class===

Case-based reasoning (CBR) is an approach that allows one to solve a new problem by using or adapting a solution previously adopted in a similar situation. By a case, we mean a description of the problem or situation together with a detailed sequence of actions taken in this situation to solve the problem. When a new situation is considered, the system finds a similar case in the knowledge base as an analog to the task being solved and tries to apply the solution of the found precedent. If necessary, a close precedent is adjusted to the current situation. After the analogy-based solution has been applied to the current problem, the results are analyzed, and a new case is then added to the case base for future use. So, the complete case description should include the following elements: a description of the situation with its features; the decision that was made in this situation; and the interpretation (result) of applying the solution.

A case can be represented in various ways, for example, with the help of tree structures, rows in databases, frames, etc. It is important to understand that the choice of a case representation should be based on the overall objectives of the system.
The major problems in case representation are: the choice of information that should be included in the description of the case, the search for a convenient case structure, and the organization of the knowledge base for optimal and efficient retrieval of cases.

The initial case representation can be simple (linear):

<math>CASE = (x_1, x_2, x_3, \dots, x_n, s), \qquad (5)</math>

where <math>x_1, x_2, x_3, \dots, x_n</math> are the values of attributes (features) identifying the situation and <math>s</math> is the solution to the problem defined in the case. Subsequently, as understanding of the problem domain deepens, the case structure may become more complex through, for example, the introduction of a hierarchy and other relationships between the attributes.

To integrate the ontology of the application domain with a description of cases, a class Precedent was created. Unlike the domain ontology classes (concepts), this class does not have a branched hierarchical structure; it has no child classes at all. The purpose of the class Precedent is to provide a complete structure for entering information about decision-making cases and to establish a semantic link between the case and the domain ontology.

The class Precedent includes three groups of properties (slots) that divide the information included in the case description structurally and substantively.

The slot Main has the following child slots:
* Decision contains a complete description of the sequence of user actions (technology) to solve the problem;
* DescriptionUser contains information about the problem that the user passes to the consultant when formulating the request;
* Error is filled with information about a technical error (if there is one) that can be solved only by reprogramming;
* the group of slots Keyword 1 ... 3 contains one or several slots that create a semantic link with the ontology keywords;
* SoftwareProduct contains information about the software in which the error occurs (1C, Axapta, etc.);
* UserRole determines the user, who can be an employee of the personnel department, an accountant, a timekeeper, an auditor, etc. (the functionality that can be used to solve the problem depends on the user's role);
* VersionProgram determines the release or version of the software product. Software products are constantly updated as developers correct errors; therefore, before solving the user's problem, it is necessary to understand which release the user is working on.

The slot Changes of the class Precedent is useful when several consultants work with the same database. With this information, one can always understand who changed the case sample and when. This slot has the child slots Period (the date and time when the case was created or changed) and User (the name of the user who made the changes to the case).

The slot File has the child slots FileDescription, containing a brief description of the file, and FileName, which determines the path to the file attached to the case. This can be a file with the error that occurs in the request or a file with a troubleshooting guide.

The structure of the Precedent class described above has the necessary completeness and non-redundancy. We determine where the problem arose (software and its version), who meets the problem (user role), and how the user sees the problem (user description, error). The consultant gives a professional definition of the problem characteristics and determines the place of the problem in the domain ontology through associative links with the ontology concepts. The case contains information about its modification by the consultant. Finally, a file that contains instructions for solving the problem can be attached to the case.
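
The slot structure of the Precedent class can be sketched as a plain data structure. This is only an illustration under stated assumptions: the original class is defined in Protege, while the Python types, the snake_case field names, the reading of "Keyword 1 ... 3" as "between one and three concept names", and all sample values below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Main:
    decision: str            # sequence of user actions that solves the problem
    description_user: str    # the problem as formulated by the user
    software_product: str    # e.g. "1C" or "Axapta"
    user_role: str           # e.g. "accountant"
    version_program: str     # release or version of the software product
    keywords: list[str]      # names of the linked ontology concepts
    error: Optional[str] = None  # technical error, if any

    def __post_init__(self) -> None:
        # Assumption: the Keyword group holds between one and three links.
        if not 1 <= len(self.keywords) <= 3:
            raise ValueError("a case needs between one and three keyword links")


@dataclass
class Changes:
    period: datetime         # when the case was created or changed
    user: str                # who made the change


@dataclass
class File:
    file_description: str    # brief description of the attached file
    file_name: str           # path to the attached file


@dataclass
class Precedent:
    main: Main
    changes: list[Changes] = field(default_factory=list)
    files: list[File] = field(default_factory=list)


# Hypothetical case instance; every value here is made up for illustration.
case = Precedent(
    main=Main(
        decision="Re-post the transfer order after correcting the date.",
        description_user="The transfer order is not shown in the report.",
        software_product="1C",
        user_role="accountant",
        version_program="(unknown)",
        keywords=["TransferOrder"],
    ),
    changes=[Changes(period=datetime(2017, 1, 1, 12, 0), user="analyst")],
)
```

Keeping Main, Changes, and File as separate types mirrors the three slot groups, so the change log and attachments can grow without touching the problem description.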

===4 Integration of cases with the ontology===

It seems promising to compare the current situation with the cases by assessing the degree of their connection with the concepts of the ontology. Thus, the closeness of cases to each other is estimated by the degree of semantic closeness of the concepts associated with these cases. To achieve that, it is necessary to determine the semantic links of newly introduced cases with the ontology concepts at the stage of creating the knowledge base.

The link of the instances of the Precedent class with the ontology concepts is established by setting the associative relation <math>R_{ASS}</math> for the slots of the Keywords group of the Precedent class (slot Main). The link is determined by explicitly specifying the name of the associated ontology concept as the slot value. To implement this associative link, the type Dclass is used as the type for the group of slots Keywords. If, for example, the <math>i</math>-th slot of the group Keywords has the type Dclass with the associated class <math>C_i</math>, then as the slot values (when creating instances of the class Precedent) we can use the classes of the set <math>Tr(C_i)</math>, the transitive closure of the concept <math>C_i</math>, which includes the class <math>C_i = C_i^0</math> and all its subclasses that are below it in the hierarchy:

<math>Tr(C_i) = \{C_i = C_i^0\} \cup ISA(C_i^0), \qquad (6)</math>

where <math>ISA(C^0) = \bigcup_{l=1}^{L} \{C^l \in C \mid \exists R_{ISA}(C^{(l-1)}, C^l)\}</math>, <math>L</math> being the maximum depth of the descendants of the class. Here, the classes Precedent and <math>C_i</math> are connected by the associative relation <math>R_{ASS}(Precedent, C_i)</math>.

When establishing the connection of a specific case with the ontology, the analyst chooses the concepts that are closest in meaning to the case. These can be either terminal concepts (having no descendants), which are the most specific, or non-terminal (intermediate) concepts that have a more general meaning.
The need for links with non-terminal concepts arises if the current problem cannot be unequivocally referred to a terminal concept, or if the analyst does not have sufficient experience and finds it easier to classify the case under a more general concept.

It should be emphasized that we allow not just one but several different links between the case and the ontology concepts. This expands the expressive possibilities of our approach and can be used when the problem arises at the junction of several concepts and its adequate description requires consideration of this interdisciplinary character.

Let, in addition to the concept name <math>C_i</math>, the weight value <math>v_i</math>, <math>0 \le v_i \le 1</math>, <math>\sum_{i=1}^{I} v_i = 1</math>, be given as the facet (property of a slot) for the <math>i</math>-th slot of the group Keywords; it establishes the strength of the relationship between the case and the corresponding ontology concept. The greater the weight, the closer in meaning the case is to the corresponding concept of the application domain.

Now let us consider how to organize the procedure of classifying and retrieving semantically close cases using the integrated model described above. In what follows, we distinguish between terminal and non-terminal concepts of the ontology. In particular, let the terminal concepts be the keywords for indexing the cases. Suppose we have <math>J</math> keywords and each keyword <math>kw_j</math>, <math>j = 1, \dots, J</math>, corresponds to the weight <math>w_j</math>, <math>j = 1, \dots, J</math>, <math>\sum_{j=1}^{J} w_j = 1</math>, which can be computed from the weights <math>v_i</math> for the cases and the weights of the hierarchy relations in the ontology.

The procedure for calculating the weights <math>w_j</math>, <math>j = 1, \dots, J</math>, can be organized as follows. Without loss of generality, we assume that the ontology concepts with which a certain case is connected do not enter into the transitive closure of each other (that is, they should not be located on the same hierarchical branch).
This assumption is quite natural, since if we can link the case instance to a more specific descendant concept, then there is no need to connect it with a more general ancestor concept. In this case, the procedure for forming the vector of weight coefficients <math>w_j</math>, <math>j = 1, \dots, J</math>, for the keywords <math>kw_j</math>, <math>j = 1, \dots, J</math>, can be represented as follows. Suppose that the considered case is related to the concepts <math>C_1, C_2, \dots, C_I</math>. First, we assign <math>w_j = 0</math>, <math>\forall j = 1, \dots, J</math>. Second, we loop over all concepts <math>C_i</math>, <math>i = 1, \dots, I</math>, connected with the precedent:
* if <math>C_i</math> is a terminal concept (<math>kw_j = C_i = C_i^0</math>), then <math>w_j = v_i</math>;
* if <math>C_i</math> is not a terminal concept, i.e., the terminal concept <math>kw_j</math> is an <math>L</math>-level descendant of the intermediate concept <math>C_i</math> (<math>kw_j = C_i^L</math>), then <math>w_j = v_i \prod_{l=1}^{L} v_i^l</math>, where <math>v_i^l</math> is the weight of the hierarchical relation from the parent concept <math>C_i^{(l-1)}</math> to the child concept <math>C_i^l</math> on the way from the concept <math>C_i</math> connected with the case instance to the terminal concept <math>kw_j</math>.

The weights of concepts that are descendants of the same parent in the ontology are considered to be equal, as discussed in Section 2.

Let us illustrate the calculation of the keyword weights with a specific example. Suppose that in the knowledge base we have a specific case associated only with concepts belonging to the transitive closure of the class Payroll with respect to the relation <math>R_{ISA}</math>; a fragment of the ontology with a fully expanded class (up to terminal concepts) is shown in Fig. 2. The expert-analyst has established three associative relationships from the case to the ontology, and two of the links are established to the terminal concepts TransferOrder and OtherOrders.
The weights of the terminals TransferOrder and OtherOrders are accordingly assumed to be equal to the corresponding weights established by the analyst, <math>v_1 = 0.3</math> and <math>v_2 = 0.1</math>. The third link is established to the non-terminal concept Staffing, which is the parent of the three terminal concepts DepartmentDirectory, PostDirectory, and StaffingSpecification. Accordingly, the weight <math>v_3 = 0.6</math>, introduced by the analyst to link this case with the concept Staffing, is divided equally among these three keywords. The weights corresponding to these keywords are equal to <math>w_j = 0.6 \cdot \tfrac{1}{3} = 0.2</math>. The other keywords remain unrelated to the case sample and accordingly have zero weights. Thus, for the subset of the eight terminal concepts OrderOnAdmission, OrderOfAssigmentOfClassRank, OrderOfDismissal, OtherOrders, TransferOrder, DepartmentDirectory, StaffingSpecification, and PostDirectory represented in Fig. 2, we have the following sub-vector of weights corresponding to the current case:

<math>\tilde{w} = (0;\ 0;\ 0;\ 0.1;\ 0.3;\ 0.2;\ 0.2;\ 0.2)^T.</math>

Since this case has relations only with the concepts of this fragment of the ontology, all remaining keywords of the ontology have zero weights.

Fig. 2. Calculating the relationships of the case with the ontology terminals

Thus, all the cases stored in the knowledge base are indexed using the ontology concepts and the corresponding keywords. Each keyword (terminal concept of the ontology) is included in the case representation with a weight calculated on the basis of the associative relationships between the case and the ontology concepts. As a result, we obtain a data table with the values of the weights <math>w_j</math>, <math>j = 1, \dots, J</math>, for each case in the case base, the cases being instances of the Precedent class. The number of rows of the data table is equal to the number of cases, and the number of columns is equal to the number <math>J</math> of the ontology terminal concepts.
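
The weight propagation procedure and the worked example above can be reproduced with a short sketch. It is not the authors' implementation: the function name `keyword_weights`, the dictionary-based hierarchy, and the recursive traversal are assumptions made for illustration. Every child of a parent receives the same edge weight 1/G, as stated in Section 2, and only the Staffing branch of the fragment is encoded because the other two links already point to terminals.

```python
from collections import defaultdict

# Part of the example fragment: parent concept -> its child concepts.
CHILDREN = {
    "Staffing": ["DepartmentDirectory", "PostDirectory", "StaffingSpecification"],
}


def keyword_weights(links, children):
    """Propagate analyst weights v_i down to terminal concepts (keywords).

    links:    dict mapping a linked concept to its weight v_i; the concepts
              are assumed not to lie on the same hierarchical branch.
    children: dict mapping a parent concept to the list of its children;
              each of the G children gets the edge weight 1/G.
    """
    weights = defaultdict(float)

    def descend(concept, w):
        kids = children.get(concept, [])
        if not kids:
            # Terminal concept: it is a keyword and keeps the weight.
            weights[concept] += w
            return
        for kid in kids:
            # Multiply by the edge weight 1/G on the way down.
            descend(kid, w / len(kids))

    for concept, v in links.items():
        descend(concept, v)
    return dict(weights)


# Links established by the analyst in the example.
links = {"TransferOrder": 0.3, "OtherOrders": 0.1, "Staffing": 0.6}
w = keyword_weights(links, CHILDREN)
```

Keywords that do not appear in the returned dictionary have zero weight. Computing such a dictionary for every case and aligning the results on the full list of terminals would give the rows of the data table described above.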
One can further apply data mining methods to the obtained data table to extract knowledge from the data. For example, it is possible to conduct cluster analysis to divide the set of cases into classes of semantically close cases.

Once classes of semantically close cases have been obtained by clustering, the procedure for retrieving cases relevant to the current problem is as follows. When describing the current situation, the analyst should relate the current case to the ontology concepts, as was done with the cases in the knowledge base. As a result, we obtain a vector of keyword weights corresponding to the problem being solved. The task of extracting cases relevant to the current problem can then be reduced to classification of the current problem. Here, the most suitable method is the classification of cases based on building the system of fuzzy linguistic rules proposed in [8,9]. In these papers, the high accuracy of case classification has been confirmed on test data. In addition, this method yields rules that represent knowledge of a higher organizational level than cases in the form of a data table. The rules can be analyzed by experts and added to the knowledge base for direct use in explicit form.

===5 Conclusion===

The paper proposed an approach to creating an information technology for decision-making in the field of IT consulting based on the integration of a case base with the domain ontology. As a result of applying the approach, we obtain a data matrix characterizing the semantic closeness of cases through their closeness to the domain ontology concepts. One can then apply machine learning methods to the data matrix, first, to derive classes of semantically close cases and, second, to retrieve relevant decision-making cases using classification algorithms.

===Acknowledgments===

The reported study was funded by the Russian Ministry of Education and Science, according to the research project No. 2.2327.2017/4.6.

===References===

1. Schank, R., Abelson, R.: Scripts, Plans, Goals and Understanding. Erlbaum (1977)

2. Schank, R.: Dynamic Memory: A theory of reminding and learning in computers and people. Cambridge University Press (1982)

3. Simpson, R.: A Computer Model of Case-Based Reasoning in Problem Solving: An Investigation in the Domain of Dispute Mediation. Technical Report GIT-ICS-85/18, Georgia Institute of Technology, School of Information and Computer Science (1985)

4. Hammond, K.: CHEF: A model of case-based planning. In: Proc. American Association for Artificial Intelligence, AAAI-86, Philadelphia, PA (1986)

5. Hinrichs, T.: Problem Solving in Open Worlds. Lawrence Erlbaum (1992)

6. Watson, I., Marir, F.: Case-based reasoning: A review. The Knowledge Engineering Review. 9(4), 327–354 (1994)

7. Shanavas, N., Asokan, S.: Ontology-Based Document Mining System for IT Support Service. Procedia Computer Science. International Conference on Information and Communication Technologies (ICICT 2014), 329–336 (2015)

8. Avdeenko, T., Makarova, E.: Integration of case-based and rule-based reasoning through fuzzy inference in decision support systems. Procedia Computer Science. 103, 447–453 (2017)

9. Avdeenko, T., Makarova, E.: The case-based decision support system in the field of IT-consulting. Journal of Physics: Conference Series. 803, 6 (2017)

10. Gruber, T.: Ontolingua: A Mechanism to Support Portable Ontologies. Technical Report KSL 91-66, Knowledge Systems Laboratory, Stanford University (1992)