Integration of Ontological Case-Based Reasoning with
         Principal Component Analysis: Application to the IT
                           Support Service
          © Tatiana Avdeenko       © Anastasiia Timofeeva         © Ekaterina Makarova
                             Novosibirsk State Technical University,
                                      Novosibirsk, Russia
             avdeenko@corp.nstu.ru    a.timofeeva@corp.nstu.ru       katmc@yandex.ru
            Abstract. In the present paper we propose an original approach to indexing cases by ontology
     concepts, as a result of which a special semantic data matrix is generated. The elements of this matrix are
     semantic links between cases and terminal concepts of the ontology. This matrix contains knowledge about
     the most stable, non-trivial relationships between the ontology concepts that determine the most frequently
     used cases. To identify these groups of concepts we propose and test an approach based on a modification
     of principal component analysis that uses a combination of polychoric correlations and the correlation ratio.
     Interpretation of the loadings matrix on the principal components allows us to identify groups of interrelated
     concepts from different hierarchical branches of the ontology. Thus, problems that lie at the junction of
     different concepts can be identified. The proposed method is implemented in a knowledge management
     system for an IT support service.
            Keywords: case-based reasoning, ontology, principal component analysis, polychoric correlation.

                                                                     much experience, but a very large flow of telephone calls
 1 Introduction                                                      from customers.
                                                                         users, the IT consultant has to determine the scope of
 Maintenance (support) of the software is the process of             the problem, to analyze the primary information and,
 improving, optimizing and correcting software defects               using personal experience and (or) reference materials,
 after putting it into operation. Software maintenance is            to formulate the answer to the question. Our analysis
 one of the phases of the software life cycle. In the course         shows that the average time taken to make a decision by
 of maintenance, changes are made to the program in order            a novice consultant and an experienced specialist differs
 to correct the defects discovered during use, as well               by a factor of 2-4 for problems of the same complexity. At
 as to add new functionality increasing the usability and            the same time, the use of even very simple means of
 applicability of the software.                                      recording and extracting knowledge about solving
     There are two different points of view on the terms             similar problems in the past (handwritten, text editor,
 "software maintenance" and "software support". The first            spreadsheet editor, etc.) makes it possible to bring the
 one considers these two terms as synonyms. We hold the              effectiveness of a novice consultant closer to the
 opposite view, namely that there is a difference                    effectiveness of an experienced analyst. Thus, it seems
 between these concepts. Maintenance of the software is              promising to build a knowledge management system that
 performed by a maintainer, which can be either an external          helps to accumulate, systematize, integrate and
 organization or the organization that uses the software             effectively use the experience of analysts to solve IT
 (a department or a separate employee). Support is provided          problems of employees of the organization.
 exclusively by employees of the department of the                       The most important component of the knowledge
 organization that uses the software. They are less                  management system is the knowledge representation
 qualified specialists than maintainers.                             model, as well as the mechanism that allows this
     To carry out software maintenance, organizations set            knowledge to be extracted and adapted to the solution of
 up IT departments staffed with                                      the required problem. It seems to us that Case-Based
 analysts, programmers and consultants, most of                      Reasoning (CBR) is better suited for solving the problems
 whose work consists of consulting support of the users.             of IT users than Rule-Based Reasoning (RBR) [1]. First,
 Typically, several maintenance lines are distinguished,             cases are the most natural way to write down the
 differing, on the one hand, in the experience and                   experience of decisions already made, and implementation of
 qualifications of IT support specialists and, on the other, in      the system is reduced to the identification of essential
 the burden on consultants. On the zero-line (call-center,           features describing the case. Second, identical or nearly
 information center, hotline) consultants have not very              identical user's problems are very common, especially if
                                                                     the organization has many branches. Third, it is almost
Proceedings of the XX International Conference                       impossible to build a static rule-based model in an
“Data Analytics and Management in Data Intensive                     extremely rapidly changing IT field, when very often
Domains” (DAMDID/RCDL’2018), Moscow, Russia,
October 9-12, 2018


new products and releases come out, interfaces and                 organization of the knowledge based on the integration
functionality change. And, finally, what is the most               of the case base with the domain ontology. As a result of
important for the dynamic IT field, CBR-systems can be             such integration we obtain a semantic matrix; applying
self-learning; thus, it is possible to obtain new cases and        data analysis methods to this matrix allows us
even rules from the case base.                                     to improve the procedure for retrieving relevant cases for
    At the same time there are essential shortcomings of           solving IT users' problems.
traditional CBR. The major one reveals itself when the                 The paper is organized as follows. In Section 2, we
number of cases accumulated in the knowledge base                  describe the most important features determining the
becomes large. A large case base results in reduced                structure of the case base for the IT support field. We
system performance. It is difficult to determine good              accumulated cases of IT problems
criteria for indexing and comparison of cases.                     encountered by users working in the personnel
    To overcome the disadvantages of traditional CBR, it           and accounting departments of a commercial company,
has been widely integrated with other methods in various           although similar problems can also be experienced by IT
application domains [2,3]. Some systems (ADIOP,                    users of non-profit companies, universities, etc. In
CADRE, CADSYN, CHARADE, COMPOSER,                                  section 3 we describe the ontology of concepts to which
IDIOM, JULIA) integrated CBR with constraint                       the cases in the IT support field can be related. In
satisfaction problem (CSP) algorithms. Some systems                Section 4 the proposed mechanism for integrating
(ANAPRON, AUGUSTE, CAMPER, CABARET,                                cases with the ontology concepts and obtaining the
GREBE, GYMEL and SAXEX) combined CBR with                          semantic matrix "case–terminal" is presented. In
the rule-based reasoning (RBR) approach. It is worth               Section 5 a modification of principal component
noting that the first prototype system integrating                 analysis is given, together with its application to the semantic
CBR with RBR was the CABARET system [4]. In [5] a                  matrix, which allows us to identify groups of interrelated
possible connection of CBR with RBR is proposed, with its          concepts and to interpret them. In Section 6 we give
application to the financial domain implemented in the             the conclusion.
prototype system MARS. Various types of coupling
models involving combinations of CBR and RBR such                  2 The structure of case base
as sequential processing, co-processing and embedded
processing are described in [6]. CBR can be combined               CBR is an approach that allows a new problem to be solved
with fuzzy logic in fruitful ways in order to handle               by using or adapting a solution previously applied in a
imprecision. A usual approach is the incorporation of              similar situation. In the CBR method the knowledge base
fuzzy logic into a CBR system in order to improve CBR              consists of cases forming a case base. A case is a
aspects [7-10]. In [11] combinations of CBR with other             description of a problem or situation in conjunction with
intelligent methods are considered.                                a detailed enumeration of actions taken in this situation
    Ontologies facilitate knowledge sharing and reuse.             to solve the problem. When a new situation is considered,
They can provide an explicit conceptualization                     the system finds a similar case in the knowledge base as
describing data semantics and ensuring common                      an analog of the problem being solved and tries to use the
understanding of the domain knowledge. To enhance the              solution of the found case. If necessary, a close case is
case retrieval and case adaptation, the authors of [12] created    adapted to the current situation. After applying the
a domain ontology in the field of railroad accidents,              solution obtained from CBR to the current problem, the
from which cases are instantiated in the case base, and            results are analyzed, then a new case is added to the case
an operational ontology in the form of decision rules. In [13]     base for future use. Thus, the CBR method includes
integration of CBR with a domain ontology is applied to            four stages that form the so-called CBR cycle, or the 4R
fault diagnosis of a steam turbine. In [14] jCOLIBRI               cycle (Retrieve, Reuse, Revise, Retain) [17].
(Cases and Ontology Libraries Integration for Building                 The case-based reasoning (CBR) literature describes the
Reasoning Infrastructures) is proposed to create a                 process of building a case base as a hard and time-consuming
knowledge-intensive and domain-independent CBR                     task. In [18] methods are presented that can
architecture. In [15] an ontology-oriented CBR approach is         be used to build the initial case base, including the steps
presented for adaptive delivery of training.                       taken in order to make sure that the quality of the initial
    Despite the fact that there is a significant number of         case set is appropriate. The case should include the
papers concerning integration of CBR with other                    following elements: description of the situation with the
intelligent methods, and even with the ontologies, only            help of attributes; the decision that was made in this
very few papers consider its application for the IT                situation; the result of applying the solution.
consultation problem. For example, in paper [16], the                  When developing a case structure for describing the
representation of the IT application domain in the form            problems of IT users, the description of the situation
of ontology was used to improve the semantic search for            should contain, if possible, all the information that is
documents based on the indexing of documents by the                necessary to achieve the goal, i.e. choosing the most
ontology concepts in comparison with the usual indexing            appropriate solution. The more detail the expert gives when
by keywords. However, this paper does not use the                  describing the current problem, the faster the answer will
possibilities of CBR to apply past information for                 be found. Quite often, users formulate a request very briefly,
solving current problems.                                          for example: "There was a problem in the personnel
    In this paper we propose an original approach to the           order." Here it is not clear in which order an error


occurred, because the order number is not specified, and            Axapta, etc.);
it is not specified which kind of problem arose. Clarifying             – UserRole - user can be a human resources officer,
the issue takes time, so the solution is                            an accountant, a timekeeper, a chief accountant, a deputy
given to the user not immediately, but after a while.               chief accountant, an auditor, etc. The functionality that
     The decision that was made contains a set of                   can be used to solve the problem depends on the user's
operations that must be performed to obtain a successful            role;
result, i.e. to resolve the user's question. The                        – VersionProgram - release or version of the software
description of the solution may include links to other              product. Software products are constantly updated, the
cases, text information, an attached document with an               developers fix bugs, therefore, before answering the
instruction, and so on. The result of applying the solution         user's question, it is necessary to understand which
is the feedback that occurs when the solution is applied            release the user is working on.
to the current situation.                                               The Changes property of the Precedent class is useful
     The cases can be represented in various ways. It is            for the case where several consultants work with the case
necessary to choose a case representation model based               base. You can always understand who changed the case
on the overall objectives of the system. The main                   and when. This attribute has the following subordinate
problems when presenting a case are: the choice of                  properties:
information that should be included in the description of               – Period - the date and time when the case was
the case, the search for a convenient case structure and            created, or changes were made;
the organization of a knowledge base for optimal and                    – User - the name of the user who has made the
efficient search.                                                   change.
     We propose a hierarchical structure of the case in the             The Files property has the following subordinate
field of IT support, which is specified using the                   properties:
Precedent class. The purpose of this class is to create the             – FilesDescription - a brief description of the file;
most complete structure for the information about the                   – FileName - the path to the file attached to the case.
cases for counseling (solving the user problem), and also           This can be a file with the error that occurs in this request,
to establish a connection with the domain ontology. This            or a file with a troubleshooting guide.
class includes three groups of properties - Main, Changes               The proposed structure of the case, which was
and Files, whose purpose is structurally and                        described above, has necessary completeness and non-
meaningfully to divide the information included in the              redundancy, since it specifies the main characteristics of
description of the case (see Figure 1).                             the user's request: user description, error, a set of
                                                                    keywords, software product, software version, user role
                                                                    and, finally, the decision of the user problem. The
                                                                    consultant gives a professional description that
                                                                    characterizes the user's problem. The case also contains
                                                                    information about making changes to the case: the date
                                                                    when the changes were made, by whom they were made,
                                                                    so that it is possible to analyze the changes made. One
                                                                    can attach to the case a file that contains instructions
                                                                    for solving the problem or a record of the user's error.
                                                                    This information is sufficient to
                                                                    solve the user's problem and quickly find a suitable
                                                                    precedent.
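As an illustration, the Precedent structure described in this section (the Main, Changes and Files property groups) could be sketched in Python roughly as follows. This is a minimal sketch and not the authors' implementation: the dataclass layout, the snake_case field names, the `record_change` helper, the three-item limit check and the sample values are assumptions made for illustration; only the property names given in the comments come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Main:
    """Main property group of a case."""
    decision: str            # Decision: sequence of actions that solves the problem
    description_user: str    # DescriptonUser: the problem as stated by the user
    software_product: str    # SoftwareProduct: chosen from a list (1C, Axapta, etc.)
    user_role: str           # UserRole
    version_program: str     # VersionProgram
    error: Optional[str] = None                        # Error: filled or not
    keywords: List[str] = field(default_factory=list)  # Keyword 1..3 (concept names)

    def __post_init__(self) -> None:
        # Assumption: the Keyword 1..3 slots are modelled as a list of at most 3 names.
        if len(self.keywords) > 3:
            raise ValueError("a case has at most three Keyword slots")


@dataclass
class Change:
    """One entry of the Changes property group."""
    period: datetime   # Period: when the case was created or changed
    user: str          # User: who made the change


@dataclass
class AttachedFile:
    """One entry of the Files property group."""
    files_description: str   # FilesDescription
    file_name: str           # FileName: path to the attached file


@dataclass
class Precedent:
    main: Main
    changes: List[Change] = field(default_factory=list)
    files: List[AttachedFile] = field(default_factory=list)

    def record_change(self, user: str, when: Optional[datetime] = None) -> None:
        """Hypothetical helper: log who changed the case and when."""
        self.changes.append(Change(period=when or datetime.now(), user=user))


# Illustrative instance; the placeholder strings are not real case data.
case = Precedent(
    main=Main(
        decision="(sequence of actions entered by the consultant)",
        description_user="There was a problem in the personnel order.",
        software_product="1C",
        user_role="human resources officer",
        version_program="(release number)",
        keywords=["PersonnelRecords"],
    )
)
case.record_change(user="consultant_1", when=datetime(2018, 10, 9, 12, 0))
```

Keeping the three groups as separate types mirrors the intent stated in the text: the description of the problem, the audit trail of changes and the attachments can be filled in and queried independently.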
                                                                        A set of Keyword 1..3 properties is reserved to
                                                                    establish relations from the Case to the concepts of the
Figure 1 The structure of the Precedent class                       Domain Ontology described in the following section.
                                                                    These relationships make it possible to organize efficient retrieval
    The Main property has the following subordinate                 of cases relevant to the current problems.
properties:
    – Decision - a complete description of the sequence             3 Domain Ontology in the IT support field
of actions (technology) to solve the problem;
    – DescriptonUser - information about the problem                The concepts of the IT support field (relevant to the
that the user gives the consultant when formulating the             problems of users in personnel and accounting departments)
request;                                                            are organized in the form of an ontology. An ontology is a
    – Error - technical error that can be solved only by            formal explicit description of concepts and the
reprogramming (filled or not);                                      relations between them. The ontology can be represented
    – Keyword 1 ... 3 - one or more attributes for the              by the following tuple:
concepts of the domain that characterize the problem.                          $O = \langle C, R, S, T \rangle$,
With these attributes the case is related to the
ontology;                                                           where $C = \{c_i \mid i = 1, \dots, n\}$ is a set of classes (concepts)
    – SoftwareProduct - the software product in which the user      describing the basic notions of the domain;
error occurred, selected from a list (1C,


 $R = \{r_i \mid i = 1, \dots, m\}$ is a set of binary relations between the     question of the user "In the receipt of goods, the rate in
                                                                      the nomenclature is shown without VAT, why?". It is
classes, $R \subseteq C \times C$, $R = \{R_{ISA}\} \cup \{R_{ASS}\}$; $R_{ISA}$ is an       advisable to relate this case to the Purchase concept.
antisymmetric, transitive, non-reflexive hierarchy                        The concept Payroll describes the basic subsections
relation; $R_{ASS}$ is an associative relation used to                of the taxonomy "Calculations with the staff". This
establish a link from the case to the ontology;                       taxonomy covers the automation of the activities of both
                                                                      managers who make decisions on staff salaries and
$S = \{s_i \mid i = 1, \dots, k\}$ is a set of class properties; $T$ is a set
                                                                      payroll accountants. Users can have
that determines the vocabulary of the domain                          various questions related to these concepts. For example,
concepts, built on a set of basic terms (a set of ontology            a human resources officer may have the following
classes) $B = \{b_i \mid i = 1, \dots, n\}$. The structure of a class is        questions: "When creating an employee, there is a
defined as                                                            mistake that an individual already exists, what should I
                                                                      do?", "How do I make a sick list?" These questions can
    $c = \langle Name_c, (\text{is-a } c_{parent}), (s_1, \dots, s_{n(c)}) \rangle$,     be related to the PersonnelRecords concept.
where $c, c_{parent} \in C$ are ontology classes connected                The ContractUnit concept describes the main
                                                                      subsections of the subject area "Contractual Block". This
by the hierarchy relation $R_{ISA}$, $s_i \in S$ are the class slots, and  block is intended to automate work in the sphere of
$Name_c \in B$ is the class name, a base term of the                  registering and managing contracts with counterparties.
vocabulary $T$. The taxonomy of classes is formed by                  For example, a contractor may have the following
indicating the relation «is-a» and the name of the                    questions: "How to put the contract into effect?", "Why
parent $c_{parent}$ in the descendant class. Concepts                 is there no accrual under the contract?", which relate to the
that have no descendants will be called terminals.                    ContractCounterparties concept.
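The definitions of classes, the «is-a» relation and terminals can be illustrated with a small Python sketch. It is not the authors' implementation (the ontology itself was built in Protégé); the `Concept` and `Ontology` classes and their methods are hypothetical names introduced here. Each concept stores its name, its is-a parent and its slots, and terminals are found as the concepts that never occur as a parent. The fragment uses the Accounting concept with five of its subordinate concepts named in the text; placing them as direct children is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A class c = <Name_c, (is-a c_parent), (s_1, ..., s_n(c))>."""
    name: str
    parent: Optional[str] = None   # name of c_parent; None for a top-level class
    slots: Tuple[str, ...] = ()    # names of the class slots


class Ontology:
    """Hypothetical container for the classes C and the hierarchy relation R_ISA."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add(self, name: str, parent: Optional[str] = None,
            slots: Tuple[str, ...] = ()) -> None:
        if name in self.concepts:
            raise ValueError(f"concept {name!r} already exists")
        if parent is not None and parent not in self.concepts:
            raise KeyError(f"unknown parent {parent!r}")
        # A parent must exist before its child, so is-a can never be reflexive.
        self.concepts[name] = Concept(name, parent, tuple(slots))

    def children(self, name: str) -> List[str]:
        """Direct descendants of a concept, in insertion order."""
        return [c.name for c in self.concepts.values() if c.parent == name]

    def terminals(self) -> List[str]:
        """Concepts that have no descendants."""
        parents = {c.parent for c in self.concepts.values() if c.parent is not None}
        return [name for name in self.concepts if name not in parents]


# Illustrative fragment of the Accounting taxonomy.
onto = Ontology()
onto.add("Accounting")
for child in ("Bank", "Cash", "Sale", "Purchase", "Warehouse"):
    onto.add(child, parent="Accounting")
```

Storing only the parent name in each descendant follows the way the taxonomy is described in the text; the list of children is derived on demand rather than stored.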
    The ontology was created in the ontology editor Protégé               The hierarchy of concepts contains 71 concepts of the
4.2, which is free software. Ontologies built in this                 Payroll taxonomy, 82 concepts of the Accounting
editor can be exported to many formats, and the software has an       taxonomy and 11 concepts of the ContractUnit
open and easily extensible architecture. A fragment of                taxonomy.
the hierarchy of the top-level concepts, which are direct
descendants, is shown in Figure 2.                                    4 Integration of Case Base with the Domain
                                                                      Ontology
                                                                      In the conventional CBR method, the measure of
                                                                      closeness (distance) in a multidimensional space defined
                                                                      by the case features is used to retrieve cases. However,
                                                                      not necessarily the closest case is the most relevant in the
                                                                      semantic terms. Therefore it seems promising to make a
                                                                      comparison between the current situation and cases,
                                                                      assessing the degree of their connection with the
                                                                      concepts of ontology. Thus, closeness of cases to each
Figure 2 Ontological graph of top-level concepts                      other is estimated by the degree of semantic closeness of
                                                                      the concepts associated with these cases. To achieve that
    The main classes of the top-level ontology are                    it is necessary to determine the semantic links of the
Precedent (class for cases instances), Accounting,                    newly introduced cases with the ontology concepts at the
Payroll and ContractUnit. The Accounting concept                      stage of creating the initial case base.
describes the main subsections of accounting.                             The link of the instances of the Precedent class with
Accounting is an orderly system for collecting, recording             the ontology concepts is established by setting the
and summarizing information in monetary terms about
                                                                      associative relation $R_{ASS}$ for the Keywords property
property, liabilities of organizations and their movement
through complete, continuous and documented                           group of the Precedent class; this group has the type Dclass.
accounting of all business transactions.                              Specifying the type Dclass for each of the $I$ properties
    Accounting forms a taxonomy consisting of                         Keywords involves specifying an additional argument –
twelve subordinate concepts. The Bank and Cash
concepts reflect the conduct of transactions with cash.               the associated ontology concept. If, for example, the
The Sale concept reflects the processing of operations for the        $i$-th slot of the group Keywords has the type Dclass with
sale of goods and services to customers; this concept is              the associated class $C_i$, then as slot values when creating
one of the main ones for running the enterprise. The                  instances of the class Precedent we can use the classes of
Purchase concept is designed to account for the
conduct of transactions for the purchase of goods and                 the transitive closure $Tr(C_i)$ of the concept $C_i$, including
services from suppliers. The Warehouse concept reflects               $C_i = C_i^{(0)}$ and all its subclasses below in the hierarchy:
the accounting of the movement of materials in the
warehouse, etc. These concepts help to express the                    $Tr(C_i) = \{C_i = C_i^{(0)}\} \cup ISA(C_i^{(0)})$, where
meaning of questions that come from users. For example, the


$ISA(C_i^{(0)}) = \bigcup_{l=1}^{L} \{C_i^{(l)} \in C \mid \exists R_{ISA}(C_i^{(l-1)}, C_i^{(l)})\}$, $L$ being the     taxonomy. To do this, each concept with a parent is
maximum depth of the descendants of the class $C_i$. Here the                 given an attribute – the weight of the concept. In the present
classes Precedent and $C_i$ are connected by the associative                  version of the ontology it is assumed that all the children
relation $R_{ASS}(Precedent, C_i)$.                                           of the same parent have the same weight, equal to $1/G$,
                                                                              where $G$ is the number of children of the given parent.
                                                                                   Thus, all the cases stored in the case base are
    When establishing the connection of a specific case with the              connected with the ontology concepts. Each concept is
ontology, the analyst chooses the concepts that are                           included in the case representation with a weight
semantically closest to the case. These can be either terminal                calculated on the basis of the associative relationships
(most specific) concepts or non-terminal                                      between the case and the ontology concepts. As a result
(intermediate) concepts that have a more general                              we obtain a semantic matrix with the values of the weights
meaning. It should be emphasized that in our approach                         $w_j$, $j = 1, \dots, J$, for each case in the case base, the cases being
we allow setting several links for one case with different                    instances of the Precedent class. The number of rows of
ontology concepts. This expands the expressive                                the semantic matrix is equal to the number of cases, and
possibilities of the approach and can be used when the                        the number of columns is equal to $J$, the number of
problem arises at the junction of several concepts and its                    terminal concepts of the ontology. One can further apply
adequate description requires consideration of this                           data mining and machine learning methods to the
interdisciplinary character.
    In addition to the concept name $C_i$, let a weight                       semantic matrix, extracting knowledge from the data. In the
                                                                              next section we propose applying principal
value $v_i$, $0 \le v_i \le 1$, $\sum_{i=1}^{I} v_i = 1$, be given as an attribute        component analysis to these data.
for the $i$-th slot of the group Keywords, establishing the
strength of the relation between the case and the                             5 Modification of Principal Component
ontology concept. The greater the weight $v_i$, the closer                    Analysis for Grouping Ontology Concepts
in meaning the case is to the corresponding concept                           Despite the fact that the concepts are carefully organized
of the application domain. Let there be $J$ terminals in the                  into the ontology by a domain specialist, the IT problems
ontology, and let each terminal $kw_j$, $j = 1, \dots, J$, correspond to      of the users often arise at the junction of various
                                                                              concepts. Therefore, the cases often refer to different
a weight $w_j$, $\sum_{j=1}^{J} w_j = 1$, that can be
                                                                              hierarchical branches of the ontology. The application of
computed from the weights $v_i$ for the cases and the                         methods for grouping the concepts could identify the
weights of the hierarchy relations in the ontology. The                       most frequent combinations of concepts describing the
procedure for forming the vector of weight coefficients                       users' problems.
$w_j$ for the terminals $kw_j$, $j = 1, \dots, J$, can be                          To group similar concepts, we apply
                                                                              principal component analysis. However, the values of the weight
presented as follows. Suppose that the considered case is                     coefficients, which show the semantic connection
related to the concepts $C_1, C_2, \dots, C_I$.                               between concepts and cases, take a limited number of
    1. First we assign w j = 0, ∀j = 1, J .                                   rational values as a result of multiplication of simple
                                                                              fractions. Thus, the original data are discrete. The
    2. Second, cycle for all concepts `C , i = 1, I connected                 standard principal component analysis uses a correlation
                                                  i
with the case:                                                                matrix consisting of Pearson's correlation coefficients,
                                                                              which are based on the assumption of a multidimensional
-   if Сi is a terminal concept ( kw j = Сi = Ci(0) ), then
                                                                              normal distribution of variables. In our case this
w j = w j + vi ;                                                              assumption is violated.
                                                                                  It is more correct to use special correlation measures
- if Сi is not a terminal concept, i.e. terminal concept                      for discrete variables, in particular, polychoric
kw j is the L - level descendant of the intermediate                          correlations. They have several advantages over the
                                                       L
                                                                              standard Pearson's correlation coefficient. First, they
concept Сi , kw j = Ci(L) , then w = w + vi ⋅ ∏ v(l ) , where                 allow a better recovering of the theoretical model by
                                        j     j              i
                                                      l =1                    means of factor analysis [19]. Secondly, they are a
 (l )   is the weight of the hierarchical relation
vi                                                                            measure of monotonous dependence, that is, they allow
                                                                              us to reveal nonlinear relationship. Third, due to the fact
R ISA (c i(l −1) , c i(l ) ) from the concept parent C i(l −1) to the         that only the order of the values is taken into account, not
child concept C i(l ) on the way from the concept Сi                          the interval between them, polychoric correlations are
                                                                              more robust to outliers.
connected with the case instance to the terminal concept
                                                                                  However, they have a number of drawbacks. First,
kw j . The weights of concepts being descendants to the                       estimation of polychoric correlation is based on the
one parent in the ontology are considered to be the                           optimization procedure and uses the values of bivariate
same.In principle, if the descendants of a certain parent                     normal distribution function, so the calculation is rather
have unequal influence on the parent concept, then it is                      slow with a large number of categories. To solve this
possible to introduce weight coefficients into the                            problem, we developed an algorithms described in [20].


                                                                         53
Second, the definition of polychoric correlation is based on the assumption of a joint normal
distribution of latent variables [21]. To overcome this limitation, one can use skewed
distributions and distributions with heavy "tails" [22]. In particular, generalizations of the
polychoric correlation were proposed in [23] to improve flexibility. For this purpose the
bivariate Student and generalized lambda distributions were used, which increases the number of
cases in which the data are consistent with the distributional assumptions.
    Finally, third, it was found that with a certain structure of the contingency table, the
polychoric correlation erroneously indicates a strong relationship. This is a particular
problem for sparse frequency tables with a large number of zero values. This problem arose in
the course of analysis of the semantic connection between concepts and cases. For a number of
concepts, the structure of the contingency tables containing the frequency $d_{ij}$ with which
the semantic connection for the first concept was assigned to the $i$-th category, and for the
second to the $j$-th category, was reduced to the form presented in Table 1, where $v_1$, $v_2$
are the weights for the first and second concepts.

Table 1 Two-way contingency table

  Semantic correspondence            $v_2 = 0$, no correspondence    $v_2 > 0$, some correspondence
  $v_1 = 0$, no correspondence       $d_{11} \ne 0$                  $d_{1k} \ne 0$
  $v_1 > 0$, some correspondence     $d_{k1} \ne 0$                  $d_{kl} = 0$, $\forall k, l \ne 1$

    In this case, the polychoric correlation is equal to -1, which indicates a strong negative
relationship. From Table 1 it can be clearly seen that this problem corresponds to the
situation where there are no cases associated with both selected concepts at once, but many
cases not related to either of them. Logically, this correlation should be zero. Thus, the
polychoric correlation is erroneous.
    In order to avoid such problem situations when calculating the correlation matrix, it is
proposed to replace the polychoric coefficient by the correlation ratio, which is actively used
in factor analysis of mixed data (FAMD) [24]. Thus, for grouping similar concepts of the
ontology a method is proposed, which consists of the following steps.
      Step 1. Calculation of the polychoric correlations $\rho$.
      Step 2. Identification of problem situations by the frequency tables, as well as by values
of the polychoric correlations close to -1.
      Step 3. Replacement of the polychoric correlations in the problem situations revealed at
Step 2 by the values of the correlation ratio $\eta$, calculated as the mean of $\eta_{Y|X}$ and
$\eta_{X|Y}$ and taken with the sign of $\rho$.
      Step 4. Based on the resulting correlation matrix consisting of polychoric correlations
and correlation ratios, principal component analysis, calculation of the loadings on the
principal components, and extraction of interrelated concepts.
    With the use of this method, five principal components were extracted. This makes it
possible to represent the concepts of the domain ontology in a space of small dimension. The
loadings on the principal components are presented in Table 2. Their absolute values reflect
the closeness of the relationship between the concepts and the principal components. The
advantage of the proposed approach over the standard one (based on Pearson's correlation) is
an increase in the percentage of variance of the concepts explained by the extracted
components. With the standard approach, the five extracted components account for only 38.9%
of the initial variation of the concepts, whereas the proposed approach explains 55.1% of the
variance. As a result, the concepts can be broken down into a smaller number of groups with
closer interconnections within each group.
    The obtained results can be interpreted from the point of view of IT consulting practice.
The concepts combined by the first principal component reflect the most common user errors in
the calculations. If a calculation is incorrect, then as a rule the error arises either from
an incorrectly registered vacation or sick leave, or from a problem with time-keeping. At the
same time, problems with vacation and sick leave can lead to errors in tax reporting (2-NDFL
and/or 6-NDFL). Reports on personal income tax are also interrelated: if there is an error or
a question on one report, then the second one will most likely also have an error. The second
group of concepts deals with problems in personnel reporting. If there is a question on the
admission or dismissal orders, there will be a problem with personnel reporting, and vice
versa: if there is an error in the report, then it is worth checking the personnel orders
(admission, dismissal).
    The concept Recalculation is connected with the third principal component. When
recalculating, users as a rule forget to recalculate taxes, so errors arise in taxes,
insurance payment and wirings as a consequence.
    Wirings also fell into the fourth group. The problem with wirings also arises when the
calculation is incorrect. These are interdisciplinary issues. Calculation and Payment at the
average wage are mutually exclusive types, that is, a sick leave (payment at the average wage)
and a calculation (salary payment) cannot occur at the same time; if they do, this is a
mistake, and the user needs to make changes.
    The fifth principal component is associated with Calculation prepayment, Calculation of
deductions and Salary. In the payment documents, it is always necessary to check the
calculation of deductions, so that everything is reflected correctly in the 6-NDFL statements.
A prepayment is also formed through the salary payment documents. The prepayment is usually a
fixed amount, sometimes half of the salary, and the deductions are then reflected in the
payment document. But such questions are rare.
    Thus, concepts are combined into groups according to how often errors occur when working
with the software products. The first group of concepts is associated with
the most frequently encountered user problems, since calculation errors are usually the most
frequent. The second most common are the problems with personnel documents (the errors of the
second group). Operations with taxes and the average wage are not very frequent, and this part
is fairly well implemented in the programs, so there are fewer questions connected with this
group of concepts. The prepayment, deductions and salary are, as a rule, the last operations
in the overall sequence of operations, and if everything was done correctly in the previous
steps, there are very few errors associated with this group.
    As a result, concepts from different hierarchical branches of the ontology were grouped.
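
Steps 2 and 3 of the method described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the detection rule (polychoric value at or below an assumed threshold of -0.99 combined with the empty "both positive" cell of Table 1) and the sample weights are assumptions made for illustration, and the polychoric correlation from Step 1 is taken as a given input rather than estimated.

```python
from collections import defaultdict
from math import copysign, sqrt


def correlation_ratio(groups, values):
    """Correlation ratio eta of `values` conditional on the categories in `groups`."""
    n = len(values)
    grand_mean = sum(values) / n
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant variable: no dependence can be measured
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return sqrt(ss_between / ss_total)


def is_problem_table(x, y):
    """Step 2 (assumed rule): Table 1 pattern, i.e. no case has both weights positive."""
    return not any(a > 0 and b > 0 for a, b in zip(x, y))


def patched_correlation(x, y, rho, near_minus_one=-0.99):
    """Step 3: replace a suspicious polychoric rho by the signed mean correlation ratio."""
    if rho <= near_minus_one and is_problem_table(x, y):
        # Mean of eta_{Y|X} and eta_{X|Y}, taken with the sign of rho.
        eta = 0.5 * (correlation_ratio(x, y) + correlation_ratio(y, x))
        return copysign(eta, rho)
    return rho


# Hypothetical weights of two concepts over ten cases; no case involves both concepts.
x = [0, 0, 0, 0, 0, 0, 0.5, 1.0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1.0]
rho_polychoric = -1.0  # assumed result of Step 1, not computed here
r = patched_correlation(x, y, rho_polychoric)
print(round(r, 3))
```

For this sample the spurious value of -1 is replaced by a weak negative value of about -0.23, while correlations that do not match the problem pattern are passed through unchanged.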

Table 2 Loadings on principal components and cumulative percentage of explained variance

Concepts                                    Principal components
                                        1        2        3        4        5
Order on admission                           0.664
Order of dismissal                           0.573
Vacation                           -0.629
Sick leave                         -0.514
Time-keeping                       -0.549
Reporting                                    0.806
Calculation prepayment                                                  0.573
Calculation                                                    0.748
Payment at the average wage                                   -0.543
Calculation of deductions                                               0.476
Salary                                                                  0.576
Recalculation                                        -0.607
2-NDFL                              0.838
6-NDFL                              0.727
Insurance payment                                    -0.677
Other taxes                                          -0.491
Wirings                                              -0.461    0.415
Cumulative explained variance, %     14.7     27.0     37.7     46.6     55.1
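
As a reading aid for Table 2, the following Python sketch inverts the table into one concept group per principal component and derives the share of variance explained by each component from the cumulative row. The loadings are transcribed from the table; the variable names and the output format are assumptions made for illustration.

```python
from collections import defaultdict

# Salient loadings transcribed from Table 2: concept -> {component: loading}.
loadings = {
    "Order on admission": {2: 0.664},
    "Order of dismissal": {2: 0.573},
    "Vacation": {1: -0.629},
    "Sick leave": {1: -0.514},
    "Time-keeping": {1: -0.549},
    "Reporting": {2: 0.806},
    "Calculation prepayment": {5: 0.573},
    "Calculation": {4: 0.748},
    "Payment at the average wage": {4: -0.543},
    "Calculation of deductions": {5: 0.476},
    "Salary": {5: 0.576},
    "Recalculation": {3: -0.607},
    "2-NDFL": {1: 0.838},
    "6-NDFL": {1: 0.727},
    "Insurance payment": {3: -0.677},
    "Other taxes": {3: -0.491},
    "Wirings": {3: -0.461, 4: 0.415},
}
cumulative = [14.7, 27.0, 37.7, 46.6, 55.1]  # cumulative explained variance, %

# Invert the table: component -> list of (concept, loading).
groups = defaultdict(list)
for concept, row in loadings.items():
    for component, value in row.items():
        groups[component].append((concept, value))

# Share of variance explained by each component on its own.
individual = [round(c - p, 1) for c, p in zip(cumulative, [0.0] + cumulative[:-1])]

for component in sorted(groups):
    # List the concepts of each group from the strongest to the weakest loading.
    members = sorted(groups[component], key=lambda cv: -abs(cv[1]))
    names = ", ".join(f"{c} ({v:+.3f})" for c, v in members)
    print(f"PC{component} [{individual[component - 1]}%]: {names}")
```

The concept Wirings appears in both the third and the fourth group, which matches the interpretation given in the text.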

6 Conclusion

Thus, we proposed an original approach to the indexing of cases through integration with the
ontology concepts, as a result of which the semantic matrix "case-terminal" is generated. The
elements of this matrix are calculated on the basis of the initial assignment of weights to
the relationships of cases with the ontology concepts, and the subsequent "descending" of the
weights to the lowest level (terminal concepts) of the hierarchy. This numerical matrix
contains the knowledge about the most stable, non-trivial relationships between the ontology
concepts that determine the semantics of the frequently used cases.
    To identify groups of interrelated concepts we proposed a modification of principal
component analysis. Its main difference from the standard method is that a combination of
polychoric correlations and the correlation ratio is used instead of Pearson correlation
coefficients. This increases the percentage of variance of the concepts explained by the
principal components. Interpretation of the matrix of loadings on the principal components
allows us to identify groups of interrelated concepts from different hierarchical branches of
the ontology. Thus, problems that are at the junction of different concepts can be identified.
The latter can be used to give the user intelligent hints about which additional concepts
(besides the one already selected) to link with the current case (user problem).
    According to the users of the IT support department, after the introduction of the
knowledge management system, user satisfaction increased by 15% on average. User satisfaction
was measured as an integrated indicator, which includes both the quality of problem solving
and the time within which the user received a response from the support service.
    One of the directions for further research involves introducing new knowledge domains by
extending the ontology for the users of other departments of the organization and,
accordingly, accumulating cases in these domains. Another direction involves using principal
components to structure the case base with clustering methods. Standard clustering algorithms
are sensitive to noise in the data [25], so dimensionality reduction is often used for
preliminary data processing. According to the results of [26], this increases the
classification accuracy. In addition, the use of principal component analysis is expected to
increase the stability of the clustering results.

Acknowledgments. The reported study was funded by
Russian Ministry of Education and Science, according to the research project No. 2.2327.2017/4.6.

References

   [1] Watson, I., Marir, F.: Case-based reasoning: A review. The Knowledge Engineering
        Review 9(4), 327-354 (1994)
   [2] Marling, C., Sqalli, M., Rissland, E., Hector, M.A., Aha, D.: Case-based reasoning
        integrations. AI Magazine 23(1), 69-86 (2002)
   [3] Yang, H.L., Wang, C.S.: Two stages of case-based reasoning - Integrating genetic
        algorithm with data mining mechanism. Expert Systems with Applications 35, 262-272 (2008)
   [4] Rissland, E.L., Skalak, D.B.: Combining case-based and rule-based reasoning: A heuristic
        approach. In: Eleventh International Joint Conference on Artificial Intelligence,
        pp. 524-530. IJCAI-89, Detroit (1989)
   [5] Dutta, S., Bonissone, P.P.: Integrating case- and rule-based reasoning. International
        Journal of Approximate Reasoning 8(3), 163-203 (1993)
   [6] Prentzas, J., Hatzilygeroudis, I.: Categorizing Approaches Combining Rule-Based and
        Case-Based Reasoning. Expert Systems 24, 97-122 (2007)
   [7] Cheetham, W., Shiu, S.C.K., Weber, R.O.: Soft Case-Based Reasoning. Knowledge
        Engineering Review 20, 267-269 (2006)
   [8] Pal, S.K., Shiu, S.C.K.: Foundations of Soft Case-Based Reasoning. John Wiley (2004)
   [9] Avdeenko, T., Makarova, E.: Integration of case-based and rule-based reasoning through
        fuzzy inference in decision support systems. Procedia Computer Science 103, 447-453 (2017)
   [10] Avdeenko, T.V., Makarova, E.S.: Acquisition of knowledge in the form of fuzzy rules for
        cases classification. Lecture Notes in Computer Science 10387, 536-544 (2017)
   [11] Prentzas, J., Hatzilygeroudis, I.: Combinations of Case-Based Reasoning with Other
        Intelligent Methods. CEUR Workshop Proceedings 375, 55-58 (2008)
   [12] Maalel, A., Mejri, L., Hadj-Mabrouk, H., Ben Ghézela, H.: Towards a Case-Based Reasoning
        Approach Based on Ontologies Application to Railroad Accidents. In: Proceedings of Third
        International Conference of Data and Knowledge Engineering (ICDKE), pp. 48-55 (2012)
   [13] Dendani-Hadiby, N., Khadir, M.T.: A Case based Reasoning System based on Domain Ontology
        for Fault Diagnosis of Steam Turbines. International Journal of Hybrid Information
        Technology 5(3), 89-103 (2012)
   [14] Recio-García, J., Díaz-Agudo, B.: Ontology based CBR with jCOLIBRI. In: Applications and
        Innovations in Intelligent Systems XIV, pp. 149-162. Springer (2007)
   [15] Mansouri, D., Hamdi-Cherif, A.: Ontology-oriented case-based reasoning (CBR) approach
        for trainings adaptive delivery. Recent Researches in Computer Science. In: Proc. of the
        15th WSEAS International Conference on Computers, Corfu Island, Greece, July 15-17,
        pp. 328-333 (2011)
   [16] Shanavas, N., Asokan, S.: Ontology-Based Document Mining System for IT Support Service.
        In: Procedia Computer Science. International Conference on Information and Communication
        Technologies (ICICT 2014), 46, 329-336 (2015)
   [17] Aamodt, A., Plaza, E.: Case-Based Reasoning: foundational issues, methodological
        variations, and system approaches. AI Communications 7(1), 39-59 (1994)
   [18] Wienhofen, L.W.M., Mathisen, B.M.: Defining the Initial Case-Base for a CBR Operator
        Support System in Digital Finishing: A Methodological Knowledge Acquisition Approach.
        Lecture Notes in Computer Science 9969 (1), 430-444 (2016)
   [19] Holgado-Tello, F.P., Chacón-Moscoso, S., Barbero-García, I., Vila-Abad, E.: Polychoric
        versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal
        variables. Quality & Quantity 44, 153-166 (2010)
   [20] Timofeeva, A.Y.: Data type detection for choosing an appropriate correlation coefficient
        in the bivariate case. CEUR Workshop Proceedings 1837, 188-194 (2017)
   [21] Olsson, U.: Maximum Likelihood Estimation of the Polychoric Correlation Coefficient.
        Psychometrika 44, 443-460 (1979)
   [22] Uebersax, J.S., Grove, W.M.: A Latent Trait Finite Mixture Model for the Analysis of
        Rating Agreement. Biometrics 49, 823-835 (1993)
   [23] Timofeeva, A.Y., Khailenko, E.A.: Generalizations of the polychoric correlation approach
        for analyzing survey data. In: Proc. 2016 11th International Forum on Strategic
        Technology, IFOST 2016, pp. 254-258 (2016)
   [24] Pagès, J.: Multiple Factor Analysis by Example Using R. London, Chapman & Hall/CRC
        The R Series (2014)
   [25] Balcan, M.F., Liang, Y., Gupta, P.: Robust hierarchical clustering. The Journal of
        Machine Learning Research 15, 3831-3871 (2014)
   [26] Ding, C., He, X.: K-means clustering via principal component analysis. In: Proc. of the
        Twenty-First International Conference on Machine Learning. ACM (2004)