<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combined use of correlation measures for selecting semantically close concepts of the ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A Yu Timofeeva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T V Avdeenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E S Makarova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M Sh Murtazina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Novosibirsk State Technical University</institution>
          ,
          <addr-line>K. Marks ave. 20, Novosibirsk, Russia, 630073</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>349</fpage>
      <lpage>358</lpage>
      <abstract>
<p>The paper suggests a new approach to the selection of correlated concepts for the ontology. It is based on the principal component analysis, but, unlike the standard approach, not Pearson correlation coefficients but other correlation measures are used. This is due to the fact that the selection of concepts is based on data on the semantic association between concepts and cases, which are represented in the form of weight coefficients that take discrete values and a significant number of zero values. For such cases, the most appropriate is the polychoric correlation coefficient. It allows one to detect a monotonic dependence from the contingency table. However, for a certain table structure, the coefficient erroneously indicates a close relationship. This problem has been analysed in detail, and it has been suggested to use the correlation ratio in the problem cases. Using the example of the problem of selecting concepts for the ontology in the IT consulting practice, the advantages of the proposed approach are shown. The first one is the increase in the percentage of variance of concepts explained by the principal components. The second one is that more concepts are selected based on unsupervised feature selection using weighted principal components.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>One of the key trends in the development of artificial intelligence is associated with the transition from
the storage and processing of data to the accumulation and processing of knowledge. In this process,
ontology, as a form of representation of knowledge, plays an important role. The main components of
ontology are concepts of the subject domain. It is important to select concepts in such a way as to
avoid redundancy; hence, semantically close concepts should be identified. This problem can be
considered as one of the tasks of machine learning, namely feature selection or feature extraction.</p>
      <p>There are several approaches to solving the problem of feature selection [1]: filter methods,
wrapper techniques, and embedded methods.</p>
      <p>Filter methods [2] are the simplest. They evaluate each variable according to individual criteria
(information gain, chi-square statistics, etc.). An example is the selection algorithm Relief [2, 3]. The
disadvantage of filter methods is that the correlation between the features is not taken into account;
therefore, redundant attributes can be selected.</p>
      <p>Embedded methods perform the feature selection as part of the model construction process. An
example is the LASSO regression [4], which constrains the weights of some features and shrinks
others to zero, thereby achieving a sparse solution that includes only the relevant features. Estimates of
such a regression, however, have no analytical expression, so numerical optimization algorithms are
required. In addition, the solution is very sensitive to the regularization parameter, which affects the
degree of sparseness of the solution.</p>
      <p>Wrapper methods attempt to eliminate this drawback. These are search procedures that include
learning and evaluating the model on a potential subset of features. However, such procedures
ideally require a search over all possible subsets of the feature set, so the algorithms have
exponential complexity in the number of features. This is, as a rule, unacceptable, and one must
resort to "greedy" search algorithms, which never revise an earlier choice; for example, forward
selection and backward elimination are used. However, these can yield only a locally
optimal solution.</p>
      <p>
        Typically, the described approaches are used in supervised learning, which requires a response
variable: the attributes are selected based on the quality of its prediction. For example, the selection
of ontology concepts could be carried out in order to improve the quality of classification of cases. However,
the cases do not always have a class label. In this situation, unsupervised feature selection is
performed, which is a more difficult problem [
        <xref ref-type="bibr" rid="ref1">5</xref>
        ]. The approaches used can be categorized into cluster
recognition and redundancy minimization.
      </p>
      <p>
        Methods that involve clustering [
        <xref ref-type="bibr" rid="ref2">6</xref>
        ] select attributes to group data points (in our example, cases) in
the best way. Other approaches are not restricted to clustering problems. Their goal is to select the
smallest subset of attributes while preserving the most relevant information about the data [
        <xref ref-type="bibr" rid="ref3">7</xref>
        ]. The
simplest criterion for selecting such subset can be the data variance. The explained variance can be a
criterion for both selection and extraction of variables. The most popular approach here is the principal
component analysis [
        <xref ref-type="bibr" rid="ref4">8</xref>
        ], which uses the decomposition of the covariance (correlation) matrix. Its
results are used in the feature selection on the basis of weighted principal components [
        <xref ref-type="bibr" rid="ref5">9</xref>
        ]. These
approaches are described in section 2.
      </p>
      <p>
        However, when using correlation coefficients, it is necessary to take into account that the data, as a
rule, are not continuous. Typically, the Pearson correlation coefficient is used, which can give biased
results in the case of discrete data. For example, it was shown in [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ] that when the validity of
constructs is analyzed from ordinal values measured in the Likert scale, the results of factor analysis
better reflect the theoretical model when factorization is performed using polychoric correlations
rather than Pearson correlation coefficients. Nevertheless, the polychoric correlation coefficient has a
number of drawbacks; in particular, with a certain structure of the contingency table, it erroneously
reveals the presence of a strong relationship. This is a particular problem for sparse tables with a large
number of zero values. Further, in Section 3, situations with poor behavior of the polychoric correlation
are analyzed and other appropriate correlation measures are considered. Section 4 compares the
various correlation measures and suggests ways to combine them to select concepts. Section 5 presents
the results of applying the proposed approach for the selection of ontology concepts in IT consulting
practice. Finally, Section 6 gives an interpretation of the results obtained and discusses the directions
for their further application.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Dimensionality reduction techniques</title>
      <p>
        Traditionally, dimensionality reduction techniques [
        <xref ref-type="bibr" rid="ref7">11</xref>
        ] were developed for the analysis of either
quantitative (principal component analysis) [
        <xref ref-type="bibr" rid="ref8">12</xref>
        ] or categorical data (correspondence analysis). Lately
a lot of attention has been paid to approaches to the analysis of discrete data. It is suggested in [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ] to
use the polychoric correlation coefficients to reduce the dimensionality of such data. In addition,
exploratory analysis methods for mixed data are being actively investigated. For example, the French
school Analyse des données, founded by Jean-Paul Benzécri, develops factor analysis of mixed data
[
        <xref ref-type="bibr" rid="ref9">13</xref>
        ]. These approaches differ in the way the correlation matrix is calculated. In general, the procedure
of the principal component analysis remains standard; it is described below.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Principal component analysis</title>
        <p>Let M be the correlation matrix of k features. On its basis one can obtain weights that express the
association between the variables and the components, the so-called loadings. The loadings vector for the j-th
principal component is calculated as</p>
        <p>a_j = v_j √λ_j, (1)
where v_j is the eigenvector of the matrix M corresponding to the eigenvalue λ_j, normalized so that the sum of
the squares of its values equals one. The matrix A of loadings for q principal
components contains the vectors a_1, …, a_q, q ≤ k, where k is the number of features. The matrix of values of the
principal components (factor scores) can therefore be given as</p>
        <p>F = XA(A′A)^{−1},
where X is the initial data matrix n × k, and n is the number of cases. Note that the columns of the matrix X are
standardized, i.e. the sample mean of each variable has been shifted to zero and the sample variance
has been scaled to unity. The choice of the number q is usually based on a scree plot, which shows the
proportion of variance explained by each component.</p>
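<p>The loadings and factor-score formulas above can be sketched in code as follows. This is an illustrative sketch in Python/NumPy (the paper's own computations were carried out in R), and the function names are ours:</p>

```python
import numpy as np

def pca_loadings(M, q):
    """Loadings a_j = v_j * sqrt(lambda_j) for the q largest
    eigenvalues of a k x k correlation matrix M."""
    lam, V = np.linalg.eigh(M)            # eigh returns ascending order
    order = np.argsort(lam)[::-1][:q]     # indices of the q largest
    lam, V = lam[order], V[:, order]
    A = V * np.sqrt(lam)                  # column j is a_j = v_j * sqrt(lambda_j)
    return A, lam

def factor_scores(X, A):
    """Factor scores F = X A (A'A)^{-1} for standardized data X (n x k)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs @ A @ np.linalg.inv(A.T @ A)
```

<p>Since the eigenvectors are unit-normalized, the squared loadings of the j-th component sum to λ_j, which gives a quick sanity check on the decomposition.</p>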
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Unsupervised feature selection</title>
        <p>
          For the feature selection, the results of the principal component analysis described in the previous
subsection are used. The approach is based on the calculation of the weighted sum of the loadings for
the i-th feature [
          <xref ref-type="bibr" rid="ref5">9</xref>
          ]:
ω_i = ∑_{j=1}^{q} a_{ij} s_j, (2)
where a_{ij} is the i-th element of the vector a_j, i.e. the loading of the i-th feature on the j-th principal
component, and s_j is the fraction of the explained variance calculated as
s_j = λ_j / ∑_{l=1}^{k} λ_l.
        </p>
      <p>
          Ordering the features by decreasing weights ω_i allows us to separate the essential
concepts from the irrelevant ones. It is proposed in [
          <xref ref-type="bibr" rid="ref5">9</xref>
          ] to determine the threshold of the weights on the
basis of the ideas of a moving average control chart that has been widely used in quality control [
          <xref ref-type="bibr" rid="ref10">14</xref>
          ].
The difference is that the weights are not ordered. Therefore, it is proposed to use their random
permutations and calculate the indicator
MR_i = (|ω_{i1} − ω_{i2}| + |ω_{i2} − ω_{i3}| + … + |ω_{i(k−1)} − ω_{ik}|) / k,
where ω_{i1}, ω_{i2}, …, ω_{ik} is the i-th random permutation of the weights. The number of permutations I should be taken
sufficiently large to obtain stable results, for example, 1000. Further, the results are averaged:
MR* = (1/I) ∑_{i=1}^{I} MR_i.
Finally, the threshold is calculated as follows:
γ = ω̄ + Φ^{−1}(1 − α) (√π / 2) MR*,
where ω̄ = (1/k) ∑_{j=1}^{k} ω_j, Φ^{−1}(1 − α) is the quantile of the standard normal distribution of order (1 − α), and α
is a given significance level, usually 0.05. Based on the threshold, the indicator of relevance of the i-th
feature can be constructed:
P(i) = 1 if ω_i ≥ γ, 0 if ω_i &lt; γ. (3)
      </p>
        <p>All the features for which P = 1 are recognized as relevant and selected.</p>
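<p>The selection procedure above can be sketched as follows, assuming the loadings matrix A and the eigenvalues λ from subsection 2.1. This is a Python sketch; the use of absolute loadings in the weighted sum (so that the arbitrary signs of the loadings do not cancel) and all names are our assumptions:</p>

```python
import numpy as np
from scipy.stats import norm

def select_features(A, lam, alpha=0.05, n_perm=1000, rng=None):
    """Unsupervised selection on weighted principal components:
    weights omega_i (formula (2)), a permutation-based moving-range
    threshold gamma, and the relevance indicator P (formula (3))."""
    rng = np.random.default_rng(rng)
    s = lam / lam.sum()                        # explained-variance fractions s_j
    omega = np.abs(A) @ s                      # omega_i = sum_j |a_ij| * s_j
    k = omega.size
    mr = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(omega)             # i-th random permutation
        mr[i] = np.abs(np.diff(p)).sum() / k   # moving range MR_i
    mr_star = mr.mean()                        # MR*
    # control-chart threshold: sigma estimated from the mean moving range
    gamma = omega.mean() + norm.ppf(1 - alpha) * (np.sqrt(np.pi) / 2) * mr_star
    return omega, gamma, (omega >= gamma).astype(int)
```

<p>The returned indicator vector plays the role of P: features with value 1 are kept.</p>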
        <p>
          In the article [
          <xref ref-type="bibr" rid="ref5">9</xref>
          ], which offers the described approach, it is not specified how the number q of
extracted principal components is chosen. For a different number of components, different weights
will be obtained, which will affect the ordering of the features and the threshold γ . Further, this
problem is investigated using the example of concept selection.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Correlation measures</title>
      <p>The polychoric correlation coefficient is intended for analyzing the association between attributes that are
difficult to quantify objectively and whose values are ordered categories. It can also be used
when count data are analyzed, that is, discrete data taking a limited number of numerical values. These
can also be rounded data, as well as data measured subjectively and inaccurately, for example, expert ratings.</p>
      <sec id="sec-3-1">
        <title>3.1. Polychoric correlation</title>
        <p>
          Polychoric correlation ρ measures the association between two hypothesized normally distributed
continuous latent variables underlying two observed ordinal variables. Its estimation is usually based on
maximum likelihood method [
          <xref ref-type="bibr" rid="ref11">15</xref>
          ]. Polychoric correlation has the following properties:
• −1 ≤ ρ ≤ 1.
• It is symmetrical.
• ρ = 0 in the case of independence.
• If |ρ| = 1 then there is a strict monotonic relation.
        </p>
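<p>A minimal two-step estimator of the polychoric correlation can be sketched in Python: thresholds of the latent normals are taken from the marginal proportions, and then the likelihood is maximized over ρ alone. This is a simplification of the full maximum likelihood procedure of [15], and all names are ours:</p>

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

def polychoric(table):
    """Two-step estimate of the polychoric correlation rho from a
    contingency table of two ordinal variables."""
    table = np.asarray(table, float)
    n = table.sum()
    # step 1: thresholds from the marginal cumulative proportions
    # (clipped so the bivariate CDF can be evaluated at the ends)
    a = np.clip(norm.ppf(np.r_[0.0, np.cumsum(table.sum(1)) / n]), -8, 8)
    b = np.clip(norm.ppf(np.r_[0.0, np.cumsum(table.sum(0)) / n]), -8, 8)

    def neg_loglik(rho):
        mvn = multivariate_normal([0, 0], [[1, rho], [rho, 1]])
        F = np.array([[mvn.cdf([x, y]) for y in b] for x in a])
        # cell probabilities by inclusion-exclusion over the thresholds
        P = F[1:, 1:] - F[:-1, 1:] - F[1:, :-1] + F[:-1, :-1]
        return -(table * np.log(np.clip(P, 1e-12, None))).sum()

    # step 2: maximize the likelihood over rho alone
    return minimize_scalar(neg_loglik, bounds=(-0.999, 0.999),
                           method="bounded").x
```

<p>On tables with the structure discussed below (a zero cell d_22 with non-zero first row and column), this estimator is driven toward the −1 bound, which illustrates the drawback analyzed in this section.</p>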
        <p>The latter property is an advantage over the Pearson correlation coefficient, which reveals only a
linear relationship. At the same time, this advantage of the polychoric correlation coefficient may turn
out to be a significant drawback. So let us consider a number of examples of tables of relative
frequencies (rows separated by semicolons):</p>
        <p>D1 = [ 0.5 0.25 ; 0.25 0 ], D2 = [ 0.74 0.01 ; 0.25 0 ], D3 = [ 0.74 0.25 ; 0.01 0 ].</p>
        <p>In all three cases, the value of the polychoric correlation is −1. Thus, the result does not depend on
the magnitudes of the non-zero frequencies, as long as d_22 = 0 and the remaining frequencies are non-zero.
But while in the first case it is still possible to presume the presence of some nonlinear dependence, in the
other cases the small relative frequency of 0.01 can simply be a consequence of the presence of outliers.</p>
        <p>The problem also remains for tables of higher dimension that satisfy the conditions
d_{1i} ≠ 0 ∀i, d_{j1} ≠ 0 ∀j, d_{kl} = 0 ∀k, l ≠ 1. (4)</p>
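<p>Condition (4) is easy to check mechanically before trusting a polychoric estimate. A small helper of our own (not part of the original method description), sketched in Python:</p>

```python
import numpy as np

def is_problem_table(d, tol=0.0):
    """True when a table of relative frequencies matches structure (4):
    non-zero first row and first column, (near-)zeros elsewhere, for
    which the polychoric correlation spuriously approaches -1."""
    d = np.asarray(d, float)
    return bool(np.all(d[0, :] > tol)
                and np.all(d[:, 0] > tol)
                and np.all(d[1:, 1:] <= tol))
```

<p>With a small positive tol the helper also flags tables that are merely close to structure (4), which, as noted, already drives the coefficient toward −1.</p>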
        <p>
          If the matrix is close to such a structure, the coefficient will be close to -1 and erroneously indicate
an association. A similar problem is characteristic of the Yule coefficient [
          <xref ref-type="bibr" rid="ref12">16</xref>
          ], which reveals the
relationship between binary variables and is noted to be unstable with respect to small frequencies. However, the
scientific literature does not offer approaches to solving this problem that could be directly applied
to the problem of feature selection.
        </p>
        <p>Obviously, if the contingency table has a structure described by relations (4), then the use of the
polychoric correlation coefficient leads to incorrect results. For this reason, it is necessary to involve
other correlation measures that make it possible to identify nonlinear relationships and are appropriate
for the analysis of discrete variables. In this case, they should be more sensitive to the non-zero
observed frequencies d_{1i} ≠ 0 ∀i, d_{j1} ≠ 0 ∀j.</p>
        <p>The simplest approach would be to replace the polychoric correlation coefficient by the Pearson
correlation coefficient in those cases where the former falsely indicates a strong relationship. Such a
trivial approach will also be analyzed, but it is better to choose a measure that is more suitable for
analyzing relationships on discrete data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Polyserial correlation</title>
        <p>One option is the polyserial correlation coefficient ρ_XY. It reveals a latent correlation between a
continuous variable X and an ordered categorical variable Y. It has the following properties:
• −1 ≤ ρ_XY ≤ 1.
• It is not symmetric, that is, ρ_XY ≠ ρ_YX.
• ρ_XY = 0 in the case of independence.
• If ρ_XY = 1 then there is a strong association between X and Y.</p>
        <p>
          Like polychoric correlation, an estimate of polyserial correlation is the result of maximizing the
likelihood function [
          <xref ref-type="bibr" rid="ref13">17</xref>
          ]. Since, according to the properties, the coefficient ρ XY is not symmetrical, it
is therefore important here which of the variables is assumed to be continuous and which is discrete.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Correlation ratio</title>
        <p>Presumably, the same drawbacks as the polychoric correlation may be inherent in the polyserial
correlation coefficient. Therefore, in addition, we consider the correlation ratio of a random variable
Y on a random variable X, defined as
η²_{Y|X} = D[E(Y | X)] / DY,
where E(Y | X) is the conditional expectation of Y given X, and DY is the unconditional variance of the
random variable Y. It is obvious from this definition that the correlation ratio is always nonnegative.
The correlation ratio is asymmetric, that is, η²_{Y|X} ≠ η²_{X|Y}. A zero value indicates that there is no
association. For comparison with the correlation coefficients, it is better to consider the value η_{Y|X} or η_{X|Y}.</p>
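<p>On a sample, the correlation ratio can be computed from the decomposition of the variance of y over the levels of x (a Python sketch; the names are ours):</p>

```python
import numpy as np

def correlation_ratio(x, y):
    """Sample eta_{Y|X}: square root of the share of the variance of y
    explained by the conditional means of y within the levels of x."""
    x, y = np.asarray(x), np.asarray(y, float)
    grand = y.mean()
    between = sum(y[x == lvl].size * (y[x == lvl].mean() - grand) ** 2
                  for lvl in np.unique(x))
    total = ((y - grand) ** 2).sum()
    return np.sqrt(between / total) if total > 0 else 0.0
```

<p>The value is 1 when y is a function of x and 0 when the conditional means coincide, in line with the properties listed above.</p>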
        <p>
          To analyze the possibilities of combined use of correlation measures, the polychoric, polyserial
correlation coefficients and the correlation ratio have been calculated using the free software for
statistical analysis R. For these purposes, a number of user functions have been implemented [
          <xref ref-type="bibr" rid="ref14">18</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Combined use of correlation measures</title>
      <p>
        The selection of the ontology concepts is based on their semantic relations with the cases. The
closeness of the semantic relation is determined by weights that take values from 0 to 1. As a
rule, the weights are assigned by experts and so take discrete values (for example, rounded). The values of the
weight coefficients can be calculated on the basis of associative relationships between the case and the
ontology concepts [
        <xref ref-type="bibr" rid="ref15">19</xref>
        ]. In this case they take a limited number of rational values as a result of
multiplication of simple fractions. Thus, the weights are discrete.
      </p>
      <p>
        The empirical study used data on the semantic association of cases and ontology concepts in the
practice of IT consulting [
        <xref ref-type="bibr" rid="ref16">20</xref>
        ]. The data contains 120 cases and 20 concepts. First, a matrix of
polychoric correlations between all the concepts was constructed. In total, the matrix (lower triangle)
contains 190 correlation coefficients. As a result, it was found that 99 coefficients (about half) are
close to −1. It should be noted that in the problem cases the numerical optimization of the maximum
likelihood function does not always give estimates exactly equal to −1, since values close to −1 give
approximately the same value of the objective function.
      </p>
      <p>For comparison, the Pearson correlation coefficients rxy are calculated. Figure 1 shows the results in
the form of a scatter plot. Here and below (figures 2-4), the line represents the equality of the
correlation coefficients, that is, for figure 1, it is a graph of equation rxy = ρ .</p>
      <p>It can be clearly seen from figure 1 that in most cases the polychoric correlation coefficient
indicates a closer association between the concepts than the Pearson correlation coefficient. However,
the problem points, with polychoric values close to −1, are also clearly visible. In these
cases, the values of the Pearson correlation coefficient range from 0 to −0.2, which indicates a rather
weak relationship. Nevertheless, the Pearson correlation coefficient does not reveal a non-linear
relationship, and therefore may underestimate the closeness of relation.</p>
      <p>Both the correlation ratio and the polyserial coefficient are asymmetric. So, next, the correlation
ratio η was calculated as the average value between η_{Y|X} and η_{X|Y}. In the same way, the polyserial
correlation ρ_s was calculated as the average between ρ_XY and ρ_YX.</p>
      <p>[Figure 1. Scatter plot of the Pearson correlation coefficients r_xy against the polychoric correlations ρ.]</p>
      <p>As noted above, the correlation ratio does not indicate the direction of the relationship, since it
takes only non-negative values. For this reason, it is more correct to compare it with the absolute
values of correlation coefficients. So figure 2 compares its values with the absolute values of the
polychoric correlation coefficient. It can be seen that in a number of cases the correlation ratio shows a
closer relationship, and in others the polychoric coefficient does. There are also problem situations; in these
cases the correlation ratio takes values close to the absolute values of the Pearson correlation
coefficient and indicates a weak relationship. But the values of the correlation ratio η are, according to its
properties, always greater than or equal to |r_xy|. So it is better to use the correlation ratio. In order to
take into account the direction of the relationship, one must take the sign of the polychoric correlation
coefficient.</p>
      <p>If we compare the polychoric and polyserial correlation coefficients (figure 3), in most cases (77
coefficients out of 91, not related to the problem ones), the polychoric coefficients indicate a closer
relationship than the polyserial ones. Thus, polyserial coefficients systematically underestimate the
closeness of the relation. The same cannot be said of the correlation ratio: out of 91 non-problematic
coefficients, only 44 polychoric correlation coefficients have the absolute value greater than the
correlation ratio.</p>
      <p>At the same time, in the problem cases, the polyserial coefficient shows a closer relation between
the concepts, since it takes values from −0.6 to −0.2. However, this rather indicates that this
coefficient also reacts adversely to a certain structure of the contingency tables. This is clearly seen in
figure 4, which compares the absolute values of the polyserial coefficient and the correlation
ratio. Situations in which the values of the polychoric coefficient are close to −1 are highlighted in
gray. They obviously stand out from the rest of the points on the graph.</p>
      <p>Thus, we propose an approach to ontology concepts selection consisting of the following steps.
Step 1. Calculation of polychoric correlations ρ .</p>
      <p>Step 2. Identification of problem situations by frequency tables satisfying (4), as well as by the
values of the polychoric correlations close to –1.</p>
      <p>Step 3. Replacement of the polychoric correlations in the problem situations revealed at step 2
by the values of the correlation ratios η, calculated as the mean between η_{Y|X} and η_{X|Y}, multiplied by
sign(ρ).</p>
      <p>Step 4. Based on the resulting correlation matrix M, consisting of polychoric correlations and
correlation ratios, calculation of the loadings vector for the j-th principal component by formula (1).
Interpretation of the results allows us to identify blocks of interrelated concepts.</p>
      <p>Step 5. For the concept selection, calculation of the weighted sum of the loadings by
formula (2). The weights ω_1, …, ω_k allow us to order the concepts by relevance. The calculation of the
indicator P by formula (3), and the selection of the most informative concepts.</p>
      <p>[Figures 2-4. Pairwise comparisons of the correlation measures: the correlation ratio η against |ρ|,
the polyserial correlation ρ_s against ρ, and |ρ_s| against η.]</p>
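<p>Steps 2 and 3 can be sketched as follows. This is a Python sketch; the inputs are assumed to be precomputed matrices of polychoric correlations and averaged correlation ratios, together with flags of tables matching structure (4), and the 0.95 cutoff for "close to −1" is our illustrative choice:</p>

```python
import numpy as np

def combined_correlation(C_poly, Eta, problem, cutoff=0.95):
    """Replace spurious polychoric correlations (structure (4), or
    values close to -1) by the correlation ratio with the sign of rho."""
    C_poly = np.asarray(C_poly, float)
    Eta = np.asarray(Eta, float)
    bad = np.asarray(problem, bool) | (C_poly < -cutoff)
    M = np.where(bad, np.sign(C_poly) * Eta, C_poly)
    np.fill_diagonal(M, 1.0)
    return M
```

<p>The resulting matrix M is then passed to the principal component analysis of steps 4 and 5.</p>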
      <p>The advantages of the proposed approach as compared to the standard one (calculation of the
Pearson correlation) should consist in increasing the percentage of variance of concepts explained by
the extracted components. As a result, this allows us to partition the concepts into a smaller number of
groups with closer interrelations within each group.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Application in the practice of IT consulting</title>
      <p>Despite the fact that the concepts are carefully organized into the ontology by a domain specialist, the
IT problem of the user being solved is often at the junction of various concepts. Therefore, the cases
often refer to different hierarchical branches. The application of methods of grouping concepts could
identify the most informative groups of concepts, as well as the most frequent combination of concepts
describing the user's problems. The latter can be used in the decision support system to intelligently
suggest to the user which additional concepts (beyond those already selected) to choose for linking
with the current case (user problem).</p>
      <p>In the above example of the ontology in the practice of IT consulting, we selected the most relevant
concepts using the proposed approach. For comparison, the standard method based on Pearson
correlation coefficients was used. The number q of extracted principal components was varied from 1
to 15. Table 1 shows the values of the indicator function P defined by (3). The results are presented
for a limited number of principal components in order to demonstrate the differences between the
approaches. It can be seen that the number and composition of the selected concepts vary depending on q.
Generally, the number of selected concepts is very small. It is difficult to detect any pattern of changes
in the subset of selected concepts as the number of extracted components increases. Using the
proposed approach, it is possible to increase the number of selected concepts. The maximum number
is achieved when five principal components are extracted. Based on this, the optimal number q
is chosen equal to five.</p>
      <p>The loadings matrices for five principal components are presented in table 2. They allow the
concepts of the ontology of IT consulting to be represented in a space of small dimension. For clarity,
only significant values of the loadings are given in the table. Their absolute values indicate the closeness
of the relationship between the concepts and the principal components. The weak relation of a concept (for
example, "Vacation" in the standard approach) with all five components indicates that such a concept
could not be included in the identified groups.</p>
      <p>[Table 2. Loadings of the concepts (Order on admission, Order of dismissal, Vacation, Sick leave,
Time-keeping, Reporting, Calculation of prepayment, Calculation, Payment at the average wage,
Calculation of deductions, Salary, Recalculation, 2-NDFL, 6-NDFL, Insurance payment, Other taxes,
Wirings) on the five principal components, with the cumulative explained variance, %.]</p>
      <p>The results of the feature selection (table 1) can be compared with the results of the principal
component analysis (table 2). For example, if one uses only the first component, only the attributes
that closely correlate with this component remain in the set.</p>
      <p>As can be seen from the results of table 2, the proposed approach significantly increases the
percentage of the explained variance. With the standard approach, the five extracted components
account for only 38.9% of the initial variation of the concepts, whereas the proposed approach allows
55.1% of the variance to be explained. In addition, the five identified groups were able to include more
concepts, additionally covering "Vacation", "Other taxes" and "Wirings". Thus, the desired effect is
achieved.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Results and discussion</title>
      <p>The obtained results can be interpreted from the point of view of IT consulting practice.</p>
      <p>The concepts combined by the first principal component reflect the most common user errors in the
calculations. If there is an incorrect calculation, then as a rule the error arises either in the incorrect
formulation of vacation or sick leave, or in a problem with the time-keeping. At the same time,
problems with vacation and sick leave can lead to errors in reporting on taxes (2-NDFL and/or
6-NDFL). The reports on personal income tax are also interrelated: if there is an error or a question on one
report, then the second one will most likely also have an error.</p>
      <p>The second group of concepts deals with problems in personnel reporting. If there is a question on
the admission / dismissal orders, there will be a problem with personnel reporting, and vice versa, if
there is an error in the report, then it is worth checking the personnel orders (admission, dismissal).
The concept "Recalculation" is connected with the third principal component. When recalculating, as a
rule, users forget to redo the taxes, so errors in taxes, insurance payments and wirings arise as a
consequence.</p>
      <p>Wirings also fell into the fourth group. A problem with wirings also arises when the calculation
is performed incorrectly; these are interdisciplinary issues. The calculation and the payment at the average wage
are mutually exclusive types, that is, a sick leave (payment at the average wage) and a
calculation (salary payment) cannot occur together at the same time; this is a mistake, so the user needs
to make changes.</p>
      <p>The fifth principal component is associated with the calculation of prepayment, the calculation of
deductions and salary. In the payment documents, it is always necessary to check the calculation of deductions, so
that everything is reflected correctly in the 6-NDFL statements. Also through salary payment
documents a prepayment is formed. The prepayment is usually a fixed amount, sometimes as half of
the salary, then in the payment document deductions are reflected. But such questions are rare.</p>
      <p>Thus, concepts are combined into the groups by how often the errors occur when working with the
software products. The first group of concepts is the most frequently encountered problematic
situation, since the calculation errors are usually more frequent. The second most popular are the
problems with personnel documents (the errors of the second group). The problems with taxes and the
average wage are not very frequent operations and this part is fairly well implemented in the
programs. Therefore, there are fewer questions on this part. The prepayment, deductions and salary
are, as a rule, the most recent operations in the general list of all operations, and if everything was
done correctly in the previous steps, then there are very few errors in this part.</p>
      <p>As a result, concepts of different hierarchical branches were combined, i.e. errors often arise at the
junction of the concepts that fall into groups. As a recommendation to improve the decision support
system, this can be used, for example, if the consultant chooses one concept for linking the case, the
system may recommend choosing other concepts from the group identified on the basis of the
principal components.</p>
      <p>Thus, when selecting ontology concepts on the basis of their semantic relationships with cases, the
use of principal component analysis requires the choice of appropriate correlation measures. Because
the data represent weight coefficients that take discrete values, it is
suggested to use the polychoric correlation. However, as it turned out, it gives incorrect results for a
certain structure of the contingency tables, and in the conducted empirical study of the
ontology of the IT consulting domain this structure occurs quite often (in half the cases). Therefore, it
is suggested in the problem situations to replace the polychoric correlation coefficient by the
correlation ratio, which reveals nonlinear relationships and is appropriate for discrete data. As a result,
such a combined use of correlation measures makes it possible to increase the percentage of the
explained variance of the principal components. In addition, it allows increasing the number of selected
concepts based on unsupervised feature selection using weighted principal components.</p>
    </sec>
    <sec id="sec-8">
      <title>7. References</title>
      <p>[1] Flach P 2012 Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Cambridge University Press)</p>
      <p>[2] Kira K and Rendell L A 1992 Proceedings of the Ninth International Workshop on Machine Learning 249-256</p>
      <p>[3] Robnik-Sikonja M and Kononenko I 2003 Machine Learning 53 23-69</p>
      <p>[4] Tibshirani R 1996 Journal of the Royal Statistical Society. Series B (Methodological) 58 267-288</p>
    </sec>
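    <sec id="sec-eta-sketch">
      <title>Illustrative sketch: correlation ratio</title>
      <p>The combined approach above replaces the polychoric coefficient by the correlation ratio in the problem cases. The following is a minimal sketch of the correlation ratio for a discrete grouping variable and numeric weights; it is not part of the original paper, and the function name and sample data are illustrative assumptions only:</p>

```python
import numpy as np

def correlation_ratio(categories, values):
    """Correlation ratio (eta): association between a discrete
    variable and a numeric one. Unlike the Pearson coefficient,
    it also captures nonlinear (non-monotone) dependence."""
    categories = np.asarray(categories)
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    if ss_total == 0.0:
        return 0.0  # constant values: no variation to explain
    # Between-group sum of squares over the levels of the
    # discrete variable.
    ss_between = sum(
        values[categories == c].size
        * (values[categories == c].mean() - grand_mean) ** 2
        for c in np.unique(categories)
    )
    return float(np.sqrt(ss_between / ss_total))
```

      <p>For example, for the deterministic nonlinear dependence values = level squared (0, 1, 4 at levels 0, 1, 2) the correlation ratio equals 1, whereas a linear correlation coefficient would be below 1.</p>
    </sec>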
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The reported study was funded by Russian Ministry of Education and Science, according to the
research project No. 2.2327.2017/4.6.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[5] <string-name><surname>Dy</surname> <given-names>J G</given-names></string-name> and <string-name><surname>Brodley</surname> <given-names>C E</given-names></string-name> <year>2004</year> <source>Journal of Machine Learning Research</source> <volume>5</volume> <fpage>845</fpage>-<lpage>889</lpage></mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[6] <string-name><surname>He</surname> <given-names>X</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>D</given-names></string-name> and <string-name><surname>Niyogi</surname> <given-names>P</given-names></string-name> <year>2006</year> <source>Advances in Neural Information Processing Systems</source> <volume>18</volume> <fpage>507</fpage>-<lpage>514</lpage></mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[7] <string-name><surname>Golay</surname> <given-names>J</given-names></string-name> and <string-name><surname>Kanevski</surname> <given-names>M</given-names></string-name> <year>2017</year> <source>Knowledge-Based Systems</source> <volume>135</volume> <fpage>125</fpage>-<lpage>134</lpage></mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[8] <string-name><surname>Parveen</surname> <given-names>A N</given-names></string-name>, <string-name><surname>Nisthana</surname> <given-names>H</given-names></string-name>, <string-name><surname>Inbarani</surname> <given-names>H H</given-names></string-name> and <string-name><surname>Kumar</surname> <given-names>E N S</given-names></string-name> <year>2012</year> <source>Proc. Int. Conf. on Computing, Communication and Applications (ICCCA)</source> <fpage>1</fpage>-<lpage>7</lpage></mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[9] <string-name><surname>Kim</surname> <given-names>S B</given-names></string-name> and <string-name><surname>Rattakorn</surname> <given-names>P</given-names></string-name> <year>2011</year> <source>Expert Systems with Applications</source> <volume>38</volume> <fpage>5704</fpage>-<lpage>5710</lpage></mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[10] <string-name><surname>Holgado-Tello</surname> <given-names>F P</given-names></string-name>, <string-name><surname>Chacón-Moscoso</surname> <given-names>S</given-names></string-name>, <string-name><surname>Barbero-García</surname> <given-names>I</given-names></string-name> and <string-name><surname>Vila-Abad</surname> <given-names>E</given-names></string-name> <year>2010</year> <source>Quality &amp; Quantity</source> <volume>44</volume> <fpage>153</fpage>-<lpage>166</lpage></mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[11] <string-name><surname>Myasnikov</surname> <given-names>E V</given-names></string-name> <year>2017</year> <source>Computer Optics</source> <volume>41</volume>(<issue>4</issue>) <fpage>564</fpage>-<lpage>572</lpage> DOI: 10.18287/2412-6179-2017-41-4-564-572</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[12] <string-name><surname>Spitsyn</surname> <given-names>V G</given-names></string-name>, <string-name><surname>Bolotova</surname> <given-names>Yu A</given-names></string-name>, <string-name><surname>Phan</surname> <given-names>N H</given-names></string-name> and <string-name><surname>Bui</surname> <given-names>T T T</given-names></string-name> <year>2016</year> <source>Computer Optics</source> <volume>40</volume>(<issue>2</issue>) <fpage>249</fpage>-<lpage>257</lpage> DOI: 10.18287/2412-6179-2016-40-2-249-257</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Pagès J 2014 Multiple Factor Analysis by Example Using R (London</surname>
          </string-name>
          , Chapman &amp; Hall/CRC The R Series)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Vermaat</surname>
            <given-names>M B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ion</surname>
            <given-names>R A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Does R J M M and Klaassen C A J 2003</surname>
          </string-name>
          <article-title>Quality</article-title>
          and Reliability Engineering International 19
          <fpage>337</fpage>
          -
          <lpage>353</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Olsson</surname>
            <given-names>U</given-names>
          </string-name>
          1979 Psychometrica 44
          <fpage>443</fpage>
          -
          <lpage>460</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kendall</surname>
            <given-names>M</given-names>
          </string-name>
          and
          <article-title>Stuart A 1961 The Advanced Theory of Statistics: Inference and relationship (London: Charles Griffin and Co</article-title>
          ., Ltd.) p
          <fpage>676</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17] <string-name><surname>Drasgow</surname> <given-names>F</given-names></string-name> <year>1988</year> <source>Encyclopedia of Statistical Sciences</source> (John Wiley &amp; Sons)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18] <string-name><surname>Timofeeva</surname> <given-names>A Y</given-names></string-name> <year>2017</year> <source>CEUR Workshop Proceedings</source> <volume>1837</volume> <fpage>188</fpage>-<lpage>194</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [19] <string-name><surname>Avdeenko</surname> <given-names>T V</given-names></string-name> and <string-name><surname>Makarova</surname> <given-names>E S</given-names></string-name> <year>2017</year> <source>CEUR Workshop Proceedings</source> <volume>2005</volume> <fpage>11</fpage>-<lpage>20</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [20] <string-name><surname>Avdeenko</surname> <given-names>T V</given-names></string-name> and <string-name><surname>Makarova</surname> <given-names>E S</given-names></string-name> <year>2017</year> <source>Journal of Physics: Conference Series</source> <volume>803</volume> 012008
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>