Integration of Ontological Case-Based Reasoning with Principal Component Analysis: Application to the IT Support Service

© Tatiana Avdeenko © Anastasiia Timofeeva © Ekaterina Makarova
Novosibirsk State Technical University, Novosibirsk, Russia
avdeenko@corp.nstu.ru a.timofeeva@corp.nstu.ru katmc@yandex.ru

Proceedings of the XX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2018), Moscow, Russia, October 9-12, 2018

Abstract. In the present paper we propose an original approach to indexing cases by ontology concepts, which produces a special semantic data matrix. The elements of this matrix are semantic links between cases and terminal concepts of the ontology. The matrix contains knowledge about the most stable, non-trivial relationships between the ontology concepts that determine the most frequently used cases. To identify these groups of concepts, we propose and validate an approach based on a modification of principal component analysis that uses a combination of polychoric correlations and the correlation ratio. Interpretation of the matrix of loadings on the principal components allows us to identify groups of interrelated concepts from different hierarchical branches of the ontology. Thus, problems that lie at the junction of different concepts can be identified. The proposed method is implemented in the knowledge management system of an IT support service.

Keywords: case-based reasoning, ontology, principal component analysis, polychoric correlation.

1 Introduction

Maintenance (support) of software is the process of improving, optimizing and correcting software defects after the software has been put into operation. Software maintenance is one of the phases of the software life cycle. During maintenance, changes are made to the program in order to correct defects discovered during use, as well as to add new functionality that increases the usability and applicability of the software.

There are two different points of view on the terms "software maintenance" and "software support". The first considers the two terms synonyms. We hold the opposite view and distinguish between the two concepts. Software maintenance is performed by a maintainer, which can be either an external organization or the organization that uses the software (a department or a separate employee). Support is provided exclusively by employees of a department of the organization that uses the software; they are less qualified specialists than maintainers.

To implement software maintenance, organizations set up IT departments with staff of analysts, programmers and consultants, most of whose work consists of consulting support for users. Typically, several support lines are distinguished, differing on the one hand in the experience and qualifications of IT support specialists and, on the other hand, in the workload on consultants. On the zero line (call center, information center, hotline), consultants do not have much experience but face a very large flow of telephone calls from customers. When handling user requests, the IT consultant has to determine the scope of the problem, analyze the primary information and, using personal experience and/or reference materials, formulate the answer to the question. Our analysis shows that the average time a novice consultant and an experienced specialist take to make a decision differs by a factor of 2-4 for problems of the same complexity. At the same time, even very simple means of recording and retrieving knowledge about solving similar problems in the past (handwritten notes, a text editor, a spreadsheet, etc.) bring the effectiveness of a novice consultant closer to that of an experienced analyst. Thus, it seems promising to build a knowledge management system that helps to accumulate, systematize, integrate and effectively use the experience of analysts to solve the IT problems of the organization's employees.

The most important components of a knowledge management system are the knowledge representation model and the mechanism that allows this knowledge to be extracted and adapted to the problem at hand. In our view, Case-Based Reasoning (CBR) is better suited for solving the problems of IT users than Rule-Based Reasoning (RBR) [1]. First, cases are the most natural way to record the experience of decisions already made, so implementing the system reduces to identifying the essential features that describe a case. Second, identical or nearly identical user problems are very common, especially if the organization has many branches. Third, it is almost impossible to build a static rule-based model in the extremely rapidly changing IT field, where new products and releases come out very often and interfaces and functionality change. Finally, and most importantly for the dynamic IT field, CBR systems can be self-learning, so it is possible to obtain new cases and even rules from the case base.

At the same time, traditional CBR has essential shortcomings. The major one reveals itself when the number of cases accumulated in the knowledge base becomes large: a large case base reduces system performance, and it is difficult to determine good criteria for indexing and comparing cases.

To overcome the disadvantages of traditional CBR, it has been widely integrated with other methods in various application domains [2,3]. Some systems (ADIOP, CADRE, CADSYN, CHARADE, COMPOSER, IDIOM, JULIA) integrated CBR with constraint satisfaction problem (CSP) algorithms. Other systems (ANAPRON, AUGUSTE, CAMPER, CABARET, GREBE, GYMEL and SAXEX) combined CBR with the rule-based reasoning (RBR) approach. It is worth noting that the first prototype system integrating CBR with RBR was CABARET [4]. In [5], a possible connection of CBR with RBR is proposed and applied to the financial domain in the prototype system MARS. Various types of coupling models combining CBR and RBR, such as sequential processing, co-processing and embedded processing, are described in [6]. CBR can also be fruitfully combined with fuzzy logic in order to handle imprecision; a usual approach is to incorporate fuzzy logic into a CBR system to improve particular aspects of CBR [7-10]. Combinations of CBR with other intelligent methods are considered in [11].

Ontologies facilitate knowledge sharing and reuse. They can provide an explicit conceptualization that describes data semantics and ensures a common understanding of domain knowledge. In [12], to enhance case retrieval and case adaptation, a domain ontology in the field of railroad accidents was created, from which cases are instantiated in the case base, together with an operational ontology in the form of decision rules. In [13], integration of CBR with a domain ontology is applied to fault diagnosis of a steam turbine. In [14], jCOLIBRI (Cases and Ontology Libraries Integration for Building Reasoning Infrastructures) is proposed for creating a knowledge-intensive, domain-independent CBR architecture. In [15], an ontology-oriented CBR approach is presented for the adaptive delivery of training.

Despite the significant number of papers on the integration of CBR with other intelligent methods, and even with ontologies, very few papers consider its application to IT consultation. For example, in [16] a representation of the IT application domain in the form of an ontology was used to improve semantic document search, indexing documents by ontology concepts instead of the usual indexing by keywords. However, that paper does not use the possibilities of CBR to apply past information to solving current problems.

In this paper we propose an original approach to the organization of knowledge based on the integration of the case base with the domain ontology. As a result of such integration we obtain a semantic matrix; applying data analysis methods to this matrix allows us to improve the procedure for retrieving relevant cases for solving IT users' problems.

The paper is organized as follows. In Section 2, we describe the most important features determining the structure of the case base for the IT support field. We accumulated cases of IT problems arising from users working in the personnel and accounting departments of a commercial company, although similar problems can also be experienced by IT users of non-profit companies, universities, etc. In Section 3, we describe the ontology of concepts to which the cases in the IT support field can be referred. In Section 4, we present the proposed mechanism for integrating cases with the ontology concepts and obtaining the "case–terminal" semantic matrix. In Section 5, we give the modification of principal component analysis and its application to the semantic matrix, which allows us to identify groups of interrelated concepts and interpret them. Section 6 concludes.

2 The structure of case base

CBR is an approach that solves a new problem by using or adapting a solution previously applied in a similar situation. In the CBR method, the knowledge base consists of cases that form a case base. A case is a description of a problem or situation together with a detailed enumeration of the actions taken in this situation to solve the problem. When a new situation is considered, the system finds a similar case in the knowledge base as an analog of the problem being solved and tries to use the solution of the found case. If necessary, the close case is adapted to the current situation. After the solution obtained from CBR is applied to the current problem, the results are analyzed, and a new case is added to the case base for future use. Thus, the CBR method includes four stages that form the so-called CBR cycle, or 4R cycle (Retrieve, Reuse, Revise, Retain) [17].

The CBR literature describes building a case base as a hard and time-consuming task. In [18], methods are presented for building the initial case base, including the steps taken to ensure that the quality of the initial case set is appropriate. A case should include the following elements: a description of the situation by means of attributes; the decision that was made in this situation; and the result of applying the solution.

When developing a case structure for describing the problems of IT users, the description of the situation should contain, if possible, all the information necessary to achieve the goal, i.e. to choose the most appropriate solution. The more detailed the expert's description of the current problem, the faster the answer will be found. Quite often, users formulate a request very briefly, for example: "There was a problem in the personnel order." Here it is not clear in which order the error occurred, because the order number is not specified, nor is it specified what kind of problem arose. Time is wasted clarifying the issue, and the solution is given to the user not immediately but after a while.

The decision that was made contains the set of operations that must be performed to obtain a successful result, i.e. to resolve the user's question. The description of the solution may include links to other cases, text information, an attached document with instructions, and so on. The result of applying the solution is the feedback obtained when the solution is applied to the current situation.

Cases can be represented in various ways, and the case representation model should be chosen based on the overall objectives of the system. The main problems in representing a case are: choosing the information that should be included in the case description, finding a convenient case structure, and organizing the knowledge base for optimal and efficient search.

We propose a hierarchical case structure for the field of IT support, specified by the Precedent class. The purpose of this class is to create the most complete structure for information about cases for counseling (solving the user's problem), and also to establish a connection with the domain ontology. The class includes three groups of properties, Main, Changes and Files, whose purpose is to divide the information included in the case description structurally and by meaning (see Figure 1).

Figure 1. The structure of the Precedent class

The Main property has the following subordinate properties:
– Decision - a complete description of the sequence of actions (technology) to solve the problem;
– DescriptonUser - information about the problem that the user gives the consultant when formulating the request;
– Error - a technical error that can be solved only by reprogramming (filled or not);
– Keyword 1 ... 3 - one or more attributes for the domain concepts that characterize the problem; through these attributes the case is related to the ontology;
– SoftwareProduct - the software product in which the user's error occurred, selected from a list (1C, Axapta, etc.);
– UserRole - the user can be a human resources officer, an accountant, a timekeeper, a chief accountant, a deputy chief accountant, an auditor, etc. The functionality that can be used to solve the problem depends on the user's role;
– VersionProgram - the release or version of the software product. Software products are constantly updated and developers fix bugs; therefore, before answering the user's question, it is necessary to understand which release the user is working with.

The Changes property of the Precedent class is useful when several consultants work with the case base: one can always see who changed the case and when. This attribute has the following subordinate properties:
– Period - the date and time when the case was created or changed;
– User - the name of the user who made the change.

The Files property has the following subordinate properties:
– FilesDescription - a brief description of the file;
– FileName - the path to the file attached to the case. This can be a file with the error that occurs in this request, or a file with a troubleshooting guide.

The case structure described above has the necessary completeness and non-redundancy, since it specifies the main characteristics of the user's request: user description, error, set of keywords, software product, software version, user role and, finally, the solution of the user's problem. The consultant gives a professional description that characterizes the user's problem. The case also records the changes made to it (when and by whom), so that these changes can be analyzed. A file can be attached to the case, containing either instructions for solving the problem or the user's error. This information is sufficient to solve the user's problem and to quickly find a suitable precedent. The set of Keyword 1..3 properties is reserved for establishing relations from the case to the concepts of the domain ontology described in the following section. These relationships make it possible to organize efficient retrieval of cases relevant to the current problem.

3 Domain Ontology in the IT support field

The concepts of the IT support field (relevant to the problems of users in the personnel and accounting departments) are organized in the form of an ontology. An ontology is a formal explicit description of concepts and the relations between them. The ontology can be represented by the tuple

$$O = \langle C, R, S, T \rangle,$$

where $C = \{c_i \mid i = 1, \dots, n\}$ is a set of classes (concepts) describing the basic notions of the domain; $R = \{r_i \mid i = 1, \dots, m\}$ is a set of binary relations between the classes, $R \subseteq C \times C$, $R = \{R_{ISA}\} \cup \{R_{ASS}\}$, where $R_{ISA}$ is an antisymmetric, transitive, irreflexive hierarchy relation and $R_{ASS}$ is an associative relation used to establish a link from a case to the ontology; $S = \{s_i \mid i = 1, \dots, k\}$ is a set of class properties; $T$ is a set that determines the vocabulary of the domain concepts, built on a set of basic terms (a set of ontology classes) $B = \{b_i \mid i = 1, \dots, n\}$. The structure of a class is defined as

$$c = \langle Name_c, (\text{is-a } c_{parent}), (s_1, \dots, s_{n(c)}) \rangle,$$

where $c, c_{parent} \in C$ are ontology classes connected by the hierarchy relation $R_{ISA}$, $s_i \in S$ are the class slots, and $Name_c \in B$ is the class name, which is a base term of the vocabulary $T$. The taxonomy of classes is formed by indicating the "is-a" relation and the name of the parent $c_{parent}$ in the descendant class. Concepts that have no descendants will be called terminal concepts, or terminals.

The ontology was created in the free ontology editor Protégé 4.2. Ontologies built in this editor can be exported to many formats, and the software has an open and easily extensible architecture. A fragment of the hierarchy of top-level concepts (direct descendants) is shown in Figure 2.

Figure 2. Ontological graph of top-level concepts

The main classes of the top-level ontology are Precedent (the class for case instances), Accounting, Payroll and ContractUnit. The Accounting concept describes the main subsections of accounting. Accounting is an orderly system for collecting, recording and summarizing information in monetary terms about the property and liabilities of organizations and their movement, through comprehensive, continuous and documented accounting of all business transactions. The Accounting taxonomy comprises twelve subordinate concepts. The Bank and Cash concepts reflect transactions with funds. The Sale concept covers the registration of sales of goods and services to customers; it is one of the core concepts for the enterprise's operations. The Purchase concept accounts for transactions for the purchase of goods and services from suppliers. The Warehouse concept reflects the accounting of the movement of materials in the warehouse, etc. These concepts help to express the meaning of the questions that come from users. For example, the user's question "In the receipt of goods, the rate in the nomenclature is shown without VAT, why?" should be related to the Purchase concept.

The Payroll concept describes the basic subsections of the taxonomy "Calculations with the staff". This taxonomy covers the automation of the work of both managers who make decisions on staff salaries and payroll accountants. Users can have various questions related to these concepts. For example, a human resources officer may ask: "When creating an employee, there is an error that the individual already exists, what should I do?" or "How do I register a sick leave?" These questions can be related to the PersonnelRecords concept.

The ContractUnit concept describes the main subsections of the subject area "Contractual Block". This block is intended to automate the registration and management of contracts with counterparties. For example, a contractor may ask: "How do I put the contract into effect?" or "Why is there no accrual under the contract?"; such questions are related to the ContractCounterparties concept.

The hierarchy contains 71 concepts of the Payroll taxonomy, 82 concepts of the Accounting taxonomy and 11 concepts of the ContractUnit taxonomy.

4 Integration of Case Base with the Domain Ontology

In the conventional CBR method, cases are retrieved using a measure of closeness (distance) in the multidimensional space defined by the case features. However, the closest case is not necessarily the most relevant one in semantic terms. Therefore, it seems promising to compare the current situation with cases by assessing the degree of their connection with the ontology concepts. The closeness of cases to each other is then estimated by the degree of semantic closeness of the concepts associated with these cases. To achieve this, the semantic links of newly introduced cases with the ontology concepts must be determined at the stage of creating the initial case base.

The link between instances of the Precedent class and the ontology concepts is established by setting the associative relation $R_{ASS}$ for the Keywords property group of the Precedent class, which has type $D_{class}$. Specifying the type $D_{class}$ for each of the $I$ Keywords properties involves specifying an additional argument, the associated ontology concept. If, for example, the $i$-th slot of the Keywords group has type $D_{class}$ with the associated class $C_i$, then, when creating instances of the Precedent class, we can use as slot values the classes of the transitive closure $Tr(C_i)$ of the concept $C_i$, which includes $C_i^{(0)} = C_i$ and all its subclasses below it in the hierarchy:

$$Tr(C_i) = \{C_i = C_i^{(0)}\} \cup ISA(C_i^{(0)}),$$

where
$$ISA(C^{(0)}) = \bigcup_{l=1}^{L} \{C^{(l)} \in C \mid \exists R_{ISA}(C^{(l-1)}, C^{(l)})\},$$

$L$ being the maximum depth of the descendants of the class $C_i$. Here the classes Precedent and $C_i$ are connected by the associative relation $R_{ASS}(Precedent, C_i)$.

When establishing the connection of a specific case with the ontology, the analyst chooses the concepts that are semantically closest to the case. These can be either terminal (the most specific) concepts or non-terminal (intermediate) concepts with a more general meaning. It should be emphasized that our approach allows setting several links for one case with different ontology concepts. This expands the expressive possibilities of the approach and can be used when the problem arises at the junction of several concepts and its adequate description requires taking this interdisciplinary character into account.

Suppose that, in addition to the concept name $C_i$, a weight $v_i$, $0 \le v_i \le 1$, $\sum_{i=1}^{I} v_i = 1$, is given as an attribute of the $i$-th slot of the Keywords group, establishing the strength of the relation between the case and the ontology concept. The larger the weight $v_i$, the closer in meaning the case is to the corresponding concept of the application domain. Let the ontology contain $J$ terminals, and let each terminal $kw_j$, $j = 1, \dots, J$, correspond to a weight $w_j$, $\sum_{j=1}^{J} w_j = 1$, which can be computed from the weights $v_i$ of the case and the weights of the hierarchy relations in the ontology. The procedure for forming the vector of weight coefficients $w_j$, $j = 1, \dots, J$, for the terminals $kw_j$ is as follows. Suppose the case under consideration is related to the concepts $C_1, C_2, \dots, C_I$.

1. First, set $w_j = 0$ for all $j = 1, \dots, J$.
2. Second, loop over all concepts $C_i$, $i = 1, \dots, I$, connected with the case:
- if $C_i$ is a terminal concept ($kw_j = C_i = C_i^{(0)}$), then $w_j \leftarrow w_j + v_i$;
- if $C_i$ is not a terminal concept, then for each terminal $kw_j$ that is an $L$-level descendant of the intermediate concept $C_i$, $kw_j = C_i^{(L)}$, set $w_j \leftarrow w_j + v_i \cdot \prod_{l=1}^{L} v_i^{(l)}$, where $v_i^{(l)}$ is the weight of the hierarchical relation $R_{ISA}(C_i^{(l-1)}, C_i^{(l)})$ from the parent concept $C_i^{(l-1)}$ to the child concept $C_i^{(l)}$ on the path from the concept $C_i$ connected with the case instance to the terminal concept $kw_j$.

The weights of concepts that are descendants of one parent in the ontology are considered to be the same. In principle, if the descendants of a certain parent have unequal influence on the parent concept, weight coefficients can be introduced into the taxonomy: each concept that has a parent is given an attribute, the weight of the concept. In the present version of the ontology, all children of the same parent are assumed to have the same weight, equal to $1/G$, where $G$ is the number of children of the given parent.

Thus, all cases stored in the case base are connected with the ontology concepts. Each concept is included into the case representation with a weight calculated from the associative relationships between the case and the ontology concepts. As a result, we obtain a semantic matrix with the weights $w_j$, $j = 1, \dots, J$, for each case in the case base (the instances of the Precedent class). The number of rows of the semantic matrix equals the number of cases, and the number of columns equals $J$, the number of terminal concepts of the ontology. Data mining and machine learning methods can then be applied to the semantic matrix to extract knowledge from the data. In the next section we propose applying principal component analysis to this data.

5 Modification of Principal Component Analysis for Grouping Ontology Concepts

Although the concepts are carefully organized into the ontology by a domain specialist, users' IT problems often arise at the junction of various concepts. Therefore, cases often refer to different hierarchical branches of the ontology. Applying methods for grouping concepts could identify the most frequent combinations of concepts describing users' problems.

To group similar concepts, we apply principal component analysis. However, the weight coefficients that express the semantic connection between concepts and cases take a limited number of rational values, as a result of multiplying simple fractions. Thus, the original data are discrete. Standard principal component analysis uses a correlation matrix of Pearson correlation coefficients, which rely on the assumption of a multivariate normal distribution of the variables. In our case this assumption is violated. It is more appropriate to use special correlation measures for discrete variables, in particular polychoric correlations. They have several advantages over the standard Pearson correlation coefficient. First, they allow better recovery of the theoretical model by means of factor analysis [19]. Second, they measure monotonic dependence, that is, they can reveal nonlinear relationships. Third, since only the order of the values is taken into account, not the intervals between them, polychoric correlations are more robust to outliers.

However, polychoric correlations have a number of drawbacks. First, estimation of the polychoric correlation is based on an optimization procedure and uses values of the bivariate normal distribution function, so the calculation is rather slow for a large number of categories. To solve this problem, we developed the algorithm described in [20].
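The discreteness of the semantic matrix can be made concrete with a small sketch. The Python code below is a minimal illustration, not the authors' implementation: it assumes a toy taxonomy with hypothetical concept names and the equal-weight rule $1/G$ from Section 4. It descends the link weights $v_i$ of each case to the terminal concepts, builds the corresponding rows of the semantic matrix with exact fractions, and lists the distinct values (ordinal categories) that each column takes. The helper names (`build_children`, `descend`, `case_row`, `column_categories`) are likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical toy taxonomy (child -> parent); the names are illustrative
# and do not reproduce the authors' Payroll ontology.
PARENT = {
    "Payroll": None,
    "PersonnelRecords": "Payroll",
    "Calculation": "Payroll",
    "OrderOnAdmission": "PersonnelRecords",
    "OrderOfDismissal": "PersonnelRecords",
    "Vacation": "Calculation",
    "SickLeave": "Calculation",
    "TimeKeeping": "Calculation",
}


def build_children(parent_map):
    """Invert the child -> parent map into parent -> list of children."""
    children = {}
    for child, parent in parent_map.items():
        children.setdefault(child, [])
        if parent is not None:
            children.setdefault(parent, []).append(child)
    return children


def terminals(parent_map):
    """Concepts without descendants, sorted to give a stable column order."""
    children = build_children(parent_map)
    return sorted(c for c, kids in children.items() if not kids)


def descend(concept, weight, children, acc):
    """Push `weight` down to all terminal descendants of `concept`.

    Assumes equal child weights: each of the G children of a parent gets
    the factor 1/G, so a terminal receives v_i times the product of 1/G
    along the path from the linked concept.
    """
    kids = children[concept]
    if not kids:
        acc[concept] = acc.get(concept, Fraction(0)) + weight
        return
    share = weight / len(kids)
    for kid in kids:
        descend(kid, share, children, acc)


def case_row(links, parent_map):
    """One row of the semantic matrix; `links` holds (concept, v_i) pairs."""
    if sum(v for _, v in links) != 1:
        raise ValueError("link weights v_i must sum to 1")
    children = build_children(parent_map)
    acc = {}
    for concept, v in links:
        descend(concept, Fraction(v), children, acc)
    return [acc.get(t, Fraction(0)) for t in terminals(parent_map)]


def column_categories(matrix):
    """Distinct values per column, i.e. the categories a polychoric
    correlation would have to work with."""
    return [sorted(set(col)) for col in zip(*matrix)]


if __name__ == "__main__":
    cases = [
        [("PersonnelRecords", Fraction(1, 2)), ("Vacation", Fraction(1, 2))],
        [("Payroll", Fraction(1))],
        [("SickLeave", Fraction(1))],
    ]
    matrix = [case_row(links, PARENT) for links in cases]
    print(terminals(PARENT))
    for row in matrix:
        print([str(w) for w in row])
    print([[str(v) for v in cats] for cats in column_categories(matrix)])
```

In this toy example every row sums to 1 and every entry is built from simple fractions such as 1/2, 1/4 and 1/6, so each column takes only a handful of ordered values. Exact `Fraction` arithmetic is used deliberately: floating-point sums could split one category into several nearly equal values.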
A second drawback is that the definition of the polychoric correlation is based on the assumption of a joint normal distribution of the latent variables [21]. To overcome this limitation, skewed and heavy-tailed distributions can be used [22]. In particular, [23] proposed generalizations of the polychoric correlation to improve flexibility, using bivariate Student and generalized lambda distributions, which increases the number of cases in which the data are consistent with the distributional assumptions.

Finally, a third problem is that for a certain structure of the contingency table the polychoric correlation erroneously indicates a strong relationship. This is a particular problem for sparse frequency tables with a large number of zero values, and it arose in our analysis of the semantic connection between concepts and precedents. For a number of concept pairs, the contingency table of frequencies $d_{ij}$ (the number of cases for which the semantic connection with the first concept falls into the $i$-th category and with the second concept into the $j$-th category) reduced to the form presented in Table 1, where $v_1$, $v_2$ are the weights for the first and second concepts.

Table 1 Two-way contingency table

| Semantic correspondence | $v_2 = 0$ (no correspondence) | $v_2 > 0$ (some correspondence) |
|---|---|---|
| $v_1 = 0$ (no correspondence) | $d_{11} \ne 0$ | $d_{1k} \ne 0$ |
| $v_1 > 0$ (some correspondence) | $d_{k1} \ne 0$ | $d_{kl} = 0$, $\forall k, l \ne 1$ |

In this case the polychoric correlation equals –1, which indicates a strong negative relationship. Table 1 shows that this situation corresponds to having no cases associated with both selected concepts at once, while many cases are related to only one of them or to neither. Logically, this correlation should be zero. Thus, the polychoric correlation is erroneous.

To avoid such problem situations when calculating the correlation matrix, we propose replacing the polychoric coefficient by the correlation ratio, which is actively used in factor analysis of mixed data (FAMD) [24]. Thus, for grouping similar ontology concepts we propose a method consisting of the following steps.

Step 1. Calculate the polychoric correlations $\rho$.
Step 2. Identify problem situations from the frequency tables, as well as from values of the polychoric correlations close to –1.
Step 3. In the problem situations identified at Step 2, replace the polychoric correlations by the correlation ratios $\eta$, calculated as the mean of $\eta_{Y|X}$ and $\eta_{X|Y}$ taken with the sign of $\rho$: $\eta = \operatorname{sign}(\rho)\,(\eta_{Y|X} + \eta_{X|Y})/2$.
Step 4. Based on the resulting correlation matrix, consisting of polychoric correlations and correlation ratios, perform principal component analysis, calculate the loadings on the principal components and extract the groups of interrelated concepts.

Using this method, five principal components were extracted, which allows the concepts of the domain ontology to be represented in a low-dimensional space. The loadings on the principal components are presented in Table 2; their absolute values reflect the closeness of the relationship between the concepts and the principal components. The advantage of the proposed approach over the standard one (based on Pearson correlations) should appear as an increase in the percentage of concept variance explained by the extracted components. Indeed, with the standard approach the five extracted components account for only 38.9% of the initial variation of the concepts, whereas the proposed approach explains 55.1% of the variance. As a result, the concepts can be broken down into a smaller number of groups with closer interconnections within each group.

The obtained results can be interpreted from the point of view of IT consulting practice. The concepts combined in the first principal component reflect the most common user errors in calculations. If a calculation is incorrect, the error as a rule arises from incorrect registration of vacation or sick leave and from problems with time-keeping. At the same time, problems with vacation and sick leave can lead to errors in tax reporting (2-NDFL and/or 6-NDFL). The personal income tax reports are also interrelated: if there is an error or a question on one report, the other one will most likely also have an error.

The second group of concepts deals with problems in personnel reporting. If there is a question about admission or dismissal orders, there will also be a problem with personnel reporting; conversely, if there is an error in the report, the personnel orders (admission, dismissal) are worth checking.

The concept Recalculation is associated with the third principal component. When recalculating, users as a rule forget to redo the taxes, which leads to errors in taxes, insurance payments and, as a consequence, wirings (accounting entries).

Wirings also fall into the fourth group: problems with wirings also arise when the calculation is incorrect. These are interdisciplinary issues. Calculation and Payment at the average wage are mutually exclusive types: a sick leave (payment at the average wage) and a calculation (salary payment) cannot coincide in time, so their co-occurrence is an error that the user needs to correct.

The fifth principal component is associated with Calculation prepayment, Calculation of deductions and Salary. In payment documents it is always necessary to check the calculation of deductions so that everything is reflected correctly in the 6-NDFL statements. A prepayment is also formed through salary payment documents. The prepayment is usually a fixed amount, sometimes half of the salary, and deductions are then reflected in the payment document. Such questions, however, are rare.

Thus, concepts are combined into groups according to how often errors occur when working with the software products. The first group of concepts is associated with the most frequently encountered user problems, since calculation errors are usually the most frequent. The second most common are problems with personnel documents (errors of the second group). Taxes and the average wage are not very frequent operations, and this part is fairly well implemented in the programs, so there are fewer questions connected with this group of concepts. Prepayment, deductions and salary are, as a rule, the final operations in the general list of all operations, and if everything was done correctly in the previous steps, there are very few errors associated with this group.

As a result, concepts from different hierarchical branches of the ontology were grouped.

Table 2 Loadings on principal components and cumulative percentage of explained variance

| Concepts | PC 1 | PC 2 | PC 3 | PC 4 | PC 5 |
|---|---|---|---|---|---|
| Order on admission | | 0.664 | | | |
| Order of dismissal | | 0.573 | | | |
| Vacation | -0.629 | | | | |
| Sick leave | -0.514 | | | | |
| Time-keeping | -0.549 | | | | |
| Reporting | | 0.806 | | | |
| Calculation prepayment | | | | | 0.573 |
| Calculation | | | | 0.748 | |
| Payment at the average wage | | | | -0.543 | |
| Calculation of deductions | | | | | 0.476 |
| Salary | | | | | 0.576 |
| Recalculation | | | -0.607 | | |
| 2-NDFL | 0.838 | | | | |
| 6-NDFL | 0.727 | | | | |
| Insurance payment | | | -0.677 | | |
| Other taxes | | | -0.491 | | |
| Wirings | | | -0.461 | 0.415 | |
| Cumulative explained variance, % | 14.7 | 27.0 | 37.7 | 46.6 | 55.1 |

6 Conclusion

We have proposed an original approach to indexing cases through integration with ontology concepts, which generates the "case–terminal" semantic matrix. The elements of this matrix are calculated from the initial assignment of weights to the relationships between cases and ontology concepts, followed by "descending" of the weights to the lowest level of the hierarchy (terminal concepts). This numerical matrix contains knowledge about the most stable, non-trivial relationships between the ontology concepts that determine the semantics of frequently used cases.

To identify groups of interrelated concepts, we proposed a modification of principal component analysis. Its main difference from the standard method is that a combination of polychoric correlations and the correlation ratio is used instead of Pearson correlation coefficients. This increases the percentage of concept variance explained by the principal components. Interpretation of the loadings on the principal components allows us to identify groups of interrelated concepts from different hierarchical branches of the ontology; thus, problems that lie at the junction of different concepts can be identified. This can be used for intelligent assistance, suggesting to the user which additional concepts (besides the one already selected) to choose for linking the current case (user problem).

According to the users of the IT support department, user satisfaction increased by 15% on average after the introduction of the knowledge management system. User satisfaction was measured as an integrated indicator that includes both the quality of problem solving and the time within which the user received a response from the support service.

One direction for further research is to introduce new knowledge domains by extending the ontology to the users of other departments of the organization and, accordingly, to accumulate cases in these domains. Another direction involves using principal components to structure the case base with clustering methods. Standard clustering algorithms are sensitive to noise in the data [25], so dimensionality reduction is often used for preliminary data processing; according to the results of [26], this increases classification accuracy. In addition, the
statistical efficiency of using the principal component Interpretation of the matrix of loadings on the principal analysis should consist in increasing the stability of components allows us to identify groups of interrelated clustering results. concepts from different hierarchical branches of the ontology. Thus, problems that are at the junction of Acknowledgments. The reported study was funded by 55 Russian Ministry of Education and Science, according to and Innovations in Intelligent Systems XIV, pp. the research project No. 2.2327.2017/4.6. 149-162. Springer (2007) [15] Mansouri, D., Hamdi-Cherif, A.: Ontology- References oriented case-based reasoning (CBR) approach for trainings adaptive delivery. Recent [1] Watson, I., and Marir, F.: Case-based Researches in Computer Science. In: Proc. of reasoning: A review. The Knowledge the 15th WSEAS International Conference on Engineering Review 9(4), 327-354 (1994) Computers, Corfu Island, Greece, July 15-17, [2] Marling, C., Sqalli, M., Rissland, E., Hector, pp. 328-333 (2011) M.A. and Aha, D.: Case-based reasoning [16] Shanavas N, Asokan S.: Ontology-Based integrations. AI Magazine 23(1), 69-86 (2002) Document Mining System for IT Support [3] Yang, H.L. and Wang, C.S.: Two stages of Service. In: Procedia Computer Science. case-based reasoning - Integrating genetic International Conference on Information and algorithm with data mining mechanism. Expert Communication Technologies (ICICT 2014), Systems with Applications 35, 262–272 (2008) 46, 329-336 (2015) [4] Rissland, E.L. and Skala, D.B.: Combining [17] Aamodt, A., Plaza, E.: Case–Based Reasoning: case-based and rule-based reasoning: A foundational issues, methodological variations, heuristic approach. In: Eleventh International and system approaches. AI Communications Joint Conference on Artificial Intelligence, pp. 7(1), 39-59 (1994) 524-530. 
IJCAI-89, Detroit (1989) [18] Wienhofen, L.W.M., Mathisen, B.M.: Defining [5] Dutta, S., Bonissone P.P.: Integrating case- and the Initial Case-Base for a CBR Operator rule-based reasoning. International Journal of Support System in Digital Finishing: A Approximate Reasoning 8(3), 163-203 (1993) Methodological Knowledge Acquisition [6] Prentzas, J., Hatzilygeroudis, I.: Categorizing Approach. Lecture Notes in Computer Science Approaches Combining Rule-Based and Case- 9969 (1), 430-444 (2016) Based Reasoning. Expert Systems 24, 97-122 [19] Holgado–Tello, F.P., Chacón–Moscoso, S., (2007) Barbero–García, I., Vila–Abad, E.: Polychoric [7] Cheetam, W., Shiu, S.C.K., Weber, R.O.: Soft versus Pearson correlations in exploratory and Case-Based Reasoning. Knowledge confirmatory factor analysis of ordinal Engineering Review 20, 267-269 (2006) variables. Quality & Quantity 44, 153-166 [8] Pal, S.K., Shiu, S.C.K.: Foundations of Soft (2010) Case-Based Reasoning. John Wiley (2004) [20] Timofeeva, A. Y.: Data type detection for [9] Avdeenko, T., Makarova, E.: Integration of choosing an appropriate correlation coefficient case-based and rule-based reasoningthrough in the bivariate case. CEUR Workshop fuzzy inference in decision support systems. Proceedings 1837, 188-194 (2017) Procedia Computer Science 103, 447-453 [21] Olsson, U.: Maximum Likelihood Estimation of (2017) the Polychoric Correlation Coefficient. [10] Avdeenko, T.V., Makarova, E.S.: Acquisition Psychometrica 44, 443-460 (1979) of knowledge in the form of fuzzy rules for [22] Uebersax, J.S., Grove,W.M.:A Latent Trait cases classification. Lecture Notes in Computer Finite Mixture Model for the Analysis of Rating Science 10387, 536-544 (2017) Agreement. Biometrics 49, 823-835 (1993) [11] Prentzas, J. and Hatzilygeroudis, I.: [23] Timofeeva, A. Y., Khailenko, E.A.: Combinations of Case-Based Reasoning with Generalizations of the polychoric correlation Other Intelligent Methods. 
CEUR Workshop approach for analyzing survey data. In: Proc. Proceedings 375, 55-58 (2008) 2016 11th International Forum on Strategic [12] Maalel, A., Mejri, L., Hadj-Mabrouk, H., Technology, IFOST 2016, pp. 254-258 (2016) Ben,Ghézela H.: Towards a Case-Based [24] Pagès, J.: Multiple Factor Analysis by Example Reasoning Approach Based on Ontologies Using R. London, Chapman & Hall/CRC The R Application to Railroad Accidents, In: Series (2014) Proceeding of Third International Conference of [25] Balcan, M.F., Liang, Y.,Gupta P.: Robust Data and Knowledge Engineering (ICDKE), hierarchical clustering. The Journal of Machine 48-55 (2012) Learning Research 15, 3831-71 (2014) [13] Dendani-Hadiby, N., Khadir, M.T.: A Case [26] Ding, C., Xiaofeng H.: K-means clustering via based Reasoning System based on Domain principal component analysis. In: Proc. of the Ontology for Fault Diagnosis of Steam twenty-first international conference on Turbines. International Journal of Hybrid Machine learning. ACM (2004) Information Technology 5(3), 89-103 (2012) [14] Recio-Garía, J., and Díaz-Agudo, B.: Ontology based CBR with jCOLIBR. In: Applications 56
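The four-step concept-grouping procedure (Steps 1-4 of the method) can be illustrated with a short sketch. The Python code below is a minimal, dependency-free illustration under stated assumptions, not the implementation used in the IT support system. Each concept is assumed to be a column of ordinal link weights over the cases (0 meaning no correspondence). The polychoric correlations of Step 1 are assumed to be computed elsewhere (for example, by maximum likelihood as in [21]) and are passed in as `rho`. A pair of concepts is treated as a problem situation when its zero/non-zero frequency table has the Table 1 pattern or when its polychoric correlation lies within a hypothetical tolerance `tol` of -1. All function names are illustrative, and the eigendecomposition uses the classical cyclic Jacobi method so that no external library is needed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_ratio(y: Sequence[float], x: Sequence[int]) -> float:
    """eta_{Y|X} = sqrt(between-group sum of squares / total sum of squares),
    where the groups are formed by the categories of x."""
    mean_y = sum(y) / len(y)
    total_ss = sum((v - mean_y) ** 2 for v in y)
    if total_ss == 0:
        return 0.0
    groups: Dict[int, List[float]] = {}
    for xv, yv in zip(x, y):
        groups.setdefault(xv, []).append(yv)
    between_ss = sum(len(g) * (sum(g) / len(g) - mean_y) ** 2 for g in groups.values())
    return math.sqrt(between_ss / total_ss)


def is_problem_pair(v1: Sequence[int], v2: Sequence[int]) -> bool:
    """Table 1 pattern: no case is linked to both concepts, while cases linked
    to only the first, only the second, and neither concept all occur."""
    pairs = list(zip(v1, v2))
    both = sum(1 for a, b in pairs if a > 0 and b > 0)
    only_first = sum(1 for a, b in pairs if a > 0 and b == 0)
    only_second = sum(1 for a, b in pairs if a == 0 and b > 0)
    neither = sum(1 for a, b in pairs if a == 0 and b == 0)
    return both == 0 and min(only_first, only_second, neither) > 0


def hybrid_correlation_matrix(weights: List[List[int]], rho: Matrix, tol: float = 1e-3) -> Matrix:
    """Steps 2-3: in problem situations, replace rho by sign(rho) * mean(eta_{Y|X}, eta_{X|Y})."""
    r = [row[:] for row in rho]
    p = len(weights)
    for i in range(p):
        for j in range(i + 1, p):
            if is_problem_pair(weights[i], weights[j]) or rho[i][j] <= -1 + tol:
                eta = 0.5 * (correlation_ratio(weights[j], weights[i])
                             + correlation_ratio(weights[i], weights[j]))
                r[i][j] = r[j][i] = math.copysign(eta, rho[i][j])
    return r


def jacobi_eigen(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigendecomposition of a symmetric matrix.
    Returns the eigenvalues and a matrix whose columns are the eigenvectors."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def principal_components(r: Matrix, n_components: int) -> Tuple[Matrix, List[float]]:
    """Step 4: loadings (rows = concepts, columns = components) and
    cumulative explained variance in percent."""
    eigvals, vecs = jacobi_eigen(r)
    order = sorted(range(len(r)), key=lambda i: eigvals[i], reverse=True)[:n_components]
    # Negative eigenvalues are clamped to zero: the hybrid matrix need not be PSD.
    loadings = [[vecs[k][i] * math.sqrt(max(eigvals[i], 0.0)) for i in order]
                for k in range(len(r))]
    cumulative, acc = [], 0.0
    for i in order:
        acc += eigvals[i] / len(r)  # the trace of a correlation matrix equals the number of concepts
        cumulative.append(100.0 * acc)
    return loadings, cumulative


if __name__ == "__main__":
    # Hypothetical link weights of three concepts over six cases (0 = no link).
    weights = [[0, 0, 0, 2, 1, 0], [0, 1, 2, 0, 0, 0], [1, 1, 0, 2, 0, 0]]
    # Hypothetical polychoric correlations (Step 1 is assumed to be done elsewhere).
    rho = [[1.0, -1.0, 0.3], [-1.0, 1.0, 0.1], [0.3, 0.1, 1.0]]
    r = hybrid_correlation_matrix(weights, rho)
    loadings, cumulative = principal_components(r, 2)
    print([round(x, 3) for x in r[0]], [round(c, 1) for c in cumulative])
```

In the hypothetical example, the first two concepts never share a case, so their polychoric value of -1 is replaced by -(ηY|X + ηX|Y)/2, which is about -0.463. The other two pairs keep their polychoric values. Because the combined matrix is not guaranteed to be positive semidefinite, negative eigenvalues are clamped to zero when the loadings are computed.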