<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Upwards Excursion Algorithm Providing the Weight Rankings Coefficients of Universities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>O.A.Zyateva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Petrozavodsk State University</institution>
          ,
          <addr-line>Petrozavodsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article addresses cases of distortion and manual adjustment of objective ratings based on individual indicators. The aim of the study was to build an algorithm that detects places assigned unfairly in a ranking as a result of fraud, by identifying the weight coefficients of the partial ratings of universities when the functional dependence of the overall rating is known to be an additive convolution. To this end, the methodology of one popular Russian rating was analyzed, and mathematical models of the dependence of a university's overall rating on its partial ratings were obtained. The study revealed subjectivity in the construction of the rating, which manifests itself as nonrandom outliers ("emissions"). Standard methods neither reveal them nor give an objective picture. The proposed algorithm, first, finds the outliers, second, excludes them from the sample and, third, determines on the remaining set weight coefficients that are closer to the a priori values. The resulting model of an objective rating can be used to identify places obtained unjustly by the participants. Educational organizations are interested in holding good positions in the leading ratings, so understanding how ratings are formed and knowing the numerical values of the corresponding weight coefficients will allow management to assess the position of the institution realistically and to take measures to strengthen it in advance.</p>
      </abstract>
      <kwd-group>
        <kwd>Statistical Analysis</kwd>
        <kwd>Approximation</kwd>
        <kwd>Ordinary Least Squares</kwd>
        <kwd>Rankings</kwd>
        <kwd>Indicators</kwd>
        <kwd>Higher Education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nearly every sphere of activity, higher education in particular, is now subject to rating. The main goal of university rankings is to help applicants make the "right" choice of educational organization. The scientific community pays close attention to the existing Russian and international university rankings and to the way they are compiled. The methods of the three largest global rankings, ARWU, QS and THE, are analyzed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The article [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describes the problems that existed in the ranking of universities 7-10 years ago and the present situation. The main characteristics of the higher education system that are evaluated by performance monitoring in Russia, together with similar indicators used in international monitoring systems, are presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In recent years a number of educational rankings have appeared in Russia, such as the national university rating conducted by the "Interfax" agency, the rating by "Expert RA", and the rating of demand for Russian universities together with its individual rankings of faculties. They are intended to help high school students and graduates, as well as employers, make their choice. Open access to the rating methodology is a significant argument for these users: such a rating can be trusted [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>As a rule, the overall rating of a university is a linear combination of partial ratings taken with certain a priori weight coefficients. Under this assumption, if the results of the partial ratings and the weights are known, the overall rating can be calculated from the linear model. Often, however, the "rules of the game" are unknown to the uninitiated: the method of calculating the overall rating remains hidden and is not published anywhere. In many cases this is done so that the official rating cannot be recalculated independently and the correctness of the assigned places cannot be checked. Attempts to construct a linear approximation of the official rating from the partial ones by traditional methods (for example, ordinary least squares or its modification with iterative recalculation of weights) lead to disappointing results, since the places in the official rating have already been subjected to subjective intervention "by expert means". In this case the estimated weight coefficients are biased, because the values of the model rating fall between the initial and the adjusted values of the official rating. The problem is therefore to develop an algorithm for identifying the rating parameters that finds the adjusted rating places, excludes them from the sample and, as a result, yields parameter values of the rating model that are close to the a priori ones.</p>
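      <p>The bias described above can be illustrated with a minimal sketch. The data, the two-rating setup and the true weights (0.6, 0.4) are hypothetical and chosen only for illustration; the helper names <monospace>solve</monospace> and <monospace>ols_weights</monospace> are not taken from the study. The sketch fits the weights by ordinary least squares without an intercept, once on an undistorted overall rating and once after a single value has been raised by hand.</p>
      <preformat><![CDATA[
```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_weights(partials, overall):
    """Weights w minimising sum_j (overall_j - sum_k w_k * partials[j][k])^2."""
    k = len(partials[0])
    xtx = [[sum(row[p] * row[q] for row in partials) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(partials, overall)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical example: two partial ratings, assumed true weights 0.6 and 0.4.
TRUE_W = [0.6, 0.4]
partials = [[90, 70], [80, 85], [75, 60], [60, 72], [50, 40], [45, 55]]
honest = [sum(w * r for w, r in zip(TRUE_W, row)) for row in partials]

adjusted = honest[:]
adjusted[3] += 10.0  # one overall value raised "by expert means"

w_honest = ols_weights(partials, honest)      # recovers the true weights
w_adjusted = ols_weights(partials, adjusted)  # weights shifted by the adjustment

print(w_honest, w_adjusted)
```
]]></preformat>
      <p>On the undistorted data the fit returns the assumed weights, while a single adjusted value is enough to move both estimates away from them.</p>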
      <p><bold>Overall ranking simulation based on partial rankings</bold></p>
      <p>Let us consider in more detail the results of a rating of Russian universities that has been held for the eighth time. Its final overall rating consists of six partial ratings by area of activity. In total, more than 200 universities took part in the ranking, including classical, research, technical, agricultural, humanities and medical universities, as well as universities specializing in management.</p>
      <p>The task was to check the correctness of this rating: it had to be analyzed to determine whether it had been subjected to "expert" adjustments. The task was made easier by reliable "insider" information about the values of the weights of the partial ratings. It was therefore not difficult to build the true rating model and to find points of significant discrepancy between the official and model ratings that cannot be explained by chance.</p>
      <p>
        Initially, an attempt was made to use existing data mining packages to find nonrandom outliers. In particular, the Analysis Services package built into the MS SQL Server 2012 DBMS was used. We applied the Highlight Exceptions tool, which is based on the Microsoft clustering algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The clustering model defines groups of rows with similar characteristics, and the Highlight Exceptions tool highlights the suspicious cells in the original data table. The tool did find exceptions, but they did not quite coincide with the real exceptions, which were known in advance. The universal outlier detection methods used in data mining are not suitable here because of the specifics of their algorithms: they identify outliers by clustering and do not establish a functional relationship between the input partial ratings and the output final rating. The class of problems under study contains such a functional dependence, so the correct solution cannot be found by clustering. An attempt to use tools for approximating nonlinear dependences, such as neural networks, did not bring the expected result either, because the original data that could be used to train the network already contain distortions. It was therefore decided to examine other known algorithms in order to choose one suitable for the problem or, failing that, to develop a new algorithm.
      </p>
      <p>The well-known least squares method was used for the initial evaluation of the model rating. The official rating and the rating modeled from the partial ones turned out to fit well in general. At the same time, there are several unexpected outliers: deviations of the published rating values from the model values that exceed 5% in absolute value. About half of all outliers (7 out of 16) are in the first fifty positions, but there are none in the first fifteen and only 2 between the sixteenth and the thirtieth positions (see Fig. 1). Thus, subjectivity in the ranking becomes noticeable in the race for places from the second to the fifth ten.</p>
      <p>The statistical estimates were good: the normalized coefficient of determination was 0.99, the mean absolute error (MAE) was 4.4 and the mean absolute percentage error (MAPE) was 1%.</p>
      <p>Fig. 1. Official rating and model rating (vertical axis from 0 to 1200).</p>
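      <p>The three quality estimates quoted above can be computed as in the following sketch. It assumes that the "normalized coefficient of determination" is the adjusted R-squared; the function names and the four-element example are hypothetical and are not the data of the study.</p>
      <preformat><![CDATA[
```python
def mae(official, model):
    """Mean absolute error."""
    return sum(abs(o - m) for o, m in zip(official, model)) / len(official)


def mape(official, model):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(o - m) / abs(o) for o, m in zip(official, model))
    return 100.0 * total / len(official)


def adjusted_r2(official, model, n_predictors):
    """R-squared corrected for the number of predictors (assumed meaning of 'normalized')."""
    n = len(official)
    mean = sum(official) / n
    ss_res = sum((o - m) ** 2 for o, m in zip(official, model))
    ss_tot = sum((o - mean) ** 2 for o in official)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical values, for illustration only.
official = [100, 80, 60, 40]
model = [98, 82, 60, 41]

print(mae(official, model))             # 1.25
print(mape(official, model))            # approximately 1.75
print(adjusted_r2(official, model, 1))  # approximately 0.99325
```
]]></preformat>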
      <p>Despite these good estimates, the model is fitted to ratings that have already been adjusted, and the outliers cannot be explained by it. An algorithm is therefore needed that finds the outliers and determines coefficients that are closer to the a priori values.</p>
      <p>It is worth noting that similar ratings are used in a variety of areas: for example, in assessing the effectiveness of enterprises, in judges' evaluation of individual elements in sports competitions, and in assessing the financial position of an organization. Rating assessments of tender documentation help to identify a corruption component in determining the winner when applications for the execution of state contracts are considered under current legislation.</p>
      <p>The proposed algorithm makes it possible to identify violations when the overall rating is a linear weighted convolution of the partial ones.</p>
      <p><bold>Search algorithm for nonrandom outliers</bold></p>
      <p>At first glance, it is not obvious whether all participants hold their rightful places or whether some positions have been adjusted and calculated contrary to the rules of formation. This leads to the problem of parametric identification of the weight coefficients of the partial ratings when the form of the functional dependence is known (in this case, linear).</p>
      <p>The available standard methods, such as least squares, two-stage least squares, maximum likelihood, instrumental variables, least squares with iterative recalculation of weights and others, are either unsuitable for the problem, or diverge, or give estimates that differ from the true ones.</p>
      <p>For instance, an attempt was made to use least squares with iterative recalculation of weights. Each element suspected of being an outlier is assigned a reducing weight factor that is inversely proportional to the squared distance of the element from the approximating straight line. Over the first three iterations the method began to converge, but it then diverged (negative weights began to appear) and moved away from the true solution.</p>
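      <p>The reweighting scheme can be sketched as follows for a single predictor. This is a simplified illustration rather than the procedure used in the study: the distance to the line is taken as the vertical residual, a small constant <monospace>eps</monospace> is added to avoid division by zero, and the five data points are hypothetical. On such a small, clean example the iterations approach the undistorted line; the sketch does not reproduce the divergence observed on the real rating.</p>
      <preformat><![CDATA[
```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def reweight(x, y, a, b, eps=1e-6):
    """Weights inversely proportional to the squared residual (assumed distance measure)."""
    return [1.0 / (eps + (yi - (a + b * xi)) ** 2) for xi, yi in zip(x, y)]


def irls(x, y, iterations=3):
    """Run a fixed number of fit/reweight rounds; return the fit history and final weights."""
    w = [1.0] * len(x)
    history = []
    for _ in range(iterations):
        a, b = weighted_line_fit(x, y, w)
        history.append((a, b))
        w = reweight(x, y, a, b)
    return history, w


# Hypothetical data: y = 2x + 1 with the third value raised by 5.
x = [1, 2, 3, 4, 5]
y = [3, 5, 12, 9, 11]
history, weights = irls(x, y)
print(history[0], history[-1])  # intercept moves from about 2 towards 1
```
]]></preformat>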
      <p>In this regard, a new algorithm for finding the coefficients was proposed. It differs from the existing ones in that a logical selection rule is added to the approximation algorithm. The rule consists in calculating and comparing two auxiliary indicators. Outlier points are excluded from the initial set when a tuple of non-zero length, formed by the conjunctions of the auxiliary indicators, is found at the very beginning of the list of elements sorted in descending order. The algorithm is described below.</p>
    </sec>
    <sec id="sec-2">
      <title>Algorithm for estimating the parameters of a multiple linear regression model taking into account exceptions in the data</title>
      <preformat>
Set i = 0 - number of iterations of the algorithm
Set m = 6 - number of partial ratings
Set n_0 - initial length of the rating array
Set l_0 = 0 - length of the nonzero leading sequence at the first iteration
Get y(n_0) = {y_j}, j ∈ [1, n_0] - array of the integral official rating
For every k ∈ [1, m]:
    Get x_k(n_0) - arrays of partial ratings
Set Condition(i) = True
While Condition(i):
    Estimate the coefficients a_k, k ∈ [1, m], of the multiple linear regression
        for y = y(n_i) and x_k = x_k(n_i) by the Ordinary Least Squares method
    Calculate the estimate of the function \hat{y}(n_i):
    For every j ∈ [1, n_i]:
        \hat{y}_j = \sum_{k=1}^{m} a_k x_{kj}
    Calculate MAE, MAPE, R^2 statistics
    Calculate RMSE_1 statistics:
        RMSE_1 = \sqrt{ \sum_{j=1}^{n_i} (y_j - \hat{y}_j)^2 / (n_i - l_i) }
    Calculate RMSE_2 statistics:
        RMSE_2 = \sqrt{ \sum_{j=1}^{n_i} (y_j - \hat{y}_j)^2 / (n_i - 2 l_i) }
    For every j ∈ [1, n_i]:
        \delta_j = |y_j - \hat{y}_j| / y_j - calculate relative error
    For every j ∈ [1, n_i]:
        Calculate direction indicator D_j ∈ {0, 1}
        Calculate amplitude indicator A_j ∈ {0, 1}
        Calculate integral indicator I_j = D_j ∧ A_j
    Sort the arrays y(n_i), x_k(n_i), \delta, I in descending order of \delta
    i = i + 1
    Search for the leading sequence with nonzero integral indicator at the top of I:
    Set l_i = 0 - length of the nonzero leading sequence
    Set Condition(l_i) = True
    While Condition(l_i):
        If I_{l_i + 1} = 0 then Condition(l_i) = False
        otherwise l_i = l_i + 1
    If l_i = 0 then Condition(i) = False
    otherwise:
        n_i = n_{i-1} - l_i - reduce the length of the rating by l_i elements
        y(n_i), x_k(n_i) - write the reduced rating without exceptions
      </preformat>
      <p>The first step of the algorithm is as follows. Using OLS, we find the coefficients with which the partial ratings enter the overall official rating and calculate the model rating, and then the absolute value of the relative deviation of the model rating from the official one. Given that the rating values of universities should be arranged in descending order, the rating is then examined for points suspected of being outliers.</p>
      <p>The rating places are analyzed for outliers by means of special indicators. For each rating place an integral indicator is calculated as the product of two auxiliary indicators. The first, the direction indicator of the outlier, finds the places in the rating array that are knocked out of the overall decreasing sequence. It is equal to 1 if the value is higher than it should be at the corresponding place in the array, and to -1 if the value is lower. The second is the amplitude indicator of the outlier. It is equal to 1 if the absolute value of the relative deviation between the model and overall ratings of a particular university is more than twice the average of this deviation over all universities, and to 0 otherwise. Thus, if both auxiliary indicators are non-zero for a particular point, the point is suspected of being an outlier.</p>
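      <p>One possible reading of the two auxiliary indicators is sketched below. The exact formula of the direction indicator is not recoverable from the text, so the neighbour comparison used here is an assumption: a place is marked +1 if its model value exceeds that of the previous place, and -1 if it is below that of the next place. The function names and the five-element example are hypothetical.</p>
      <preformat><![CDATA[
```python
def direction_indicator(model_by_place):
    """Return +1, -1 or 0 for every place (places ordered by the official rating).

    Assumed rule: compare each model value with its neighbours in the
    sequence, which should be decreasing if no place was adjusted.
    """
    n = len(model_by_place)
    result = []
    for j, value in enumerate(model_by_place):
        if j > 0 and value > model_by_place[j - 1]:
            result.append(1)    # higher than the decreasing order allows
        elif j + 1 < n and value < model_by_place[j + 1]:
            result.append(-1)   # lower than the decreasing order allows
        else:
            result.append(0)
    return result


def amplitude_indicator(official, model):
    """1 where the relative deviation exceeds twice its mean over all places."""
    dev = [abs(o - m) / abs(o) for o, m in zip(official, model)]
    mean_dev = sum(dev) / len(dev)
    return [1 if d > 2 * mean_dev else 0 for d in dev]


def integral_indicator(official, model):
    """1 where both auxiliary indicators are non-zero, otherwise 0."""
    d = direction_indicator(model)
    a = amplitude_indicator(official, model)
    return [1 if dj != 0 and aj == 1 else 0 for dj, aj in zip(d, a)]


# Hypothetical example: the third place has an official value of 80
# although the model gives only 62.
official = [100, 90, 80, 70, 60]
model = [100, 90, 62, 70, 60]

print(direction_indicator(model))           # [0, 0, -1, 1, 0]
print(amplitude_indicator(official, model)) # [0, 0, 1, 0, 0]
print(integral_indicator(official, model))  # [0, 0, 1, 0, 0]
```
]]></preformat>
      <p>In this example the fourth place also breaks the decreasing order, but its deviation is small, so only the third place is flagged.</p>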
      <p>After that, the array of rating values, together with the integral indicator, is sorted in descending order of the absolute relative deviation of the model rating from the overall one. The group of universities with a non-zero indicator standing at the beginning of the sorted list, up to the first 0, is then discarded and the procedure is repeated. This step is repeated until the stopping condition of the algorithm is met: after the next sorting at the i-th step, the maximum absolute relative deviation corresponds to an indicator equal to 0. In other words, there is no group with a non-zero indicator value at the beginning of the list.</p>
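      <p>The sort-and-discard loop can be sketched on a toy problem. To keep the example short, several simplifications are assumed: there is a single partial rating, the regression step is a one-weight fit through the origin instead of the multiple OLS, and only the amplitude rule is used as the indicator. All names and data are hypothetical.</p>
      <preformat><![CDATA[
```python
def fit_through_origin(x, y):
    """Toy stand-in for the OLS step: the weight w minimising sum (y - w * x)^2."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def amplitude_flags(y, model):
    """Relative deviations and 0/1 flags (deviation above twice the mean)."""
    dev = [abs(a - m) / abs(a) for a, m in zip(y, model)]
    mean_dev = sum(dev) / len(dev)
    return dev, [1 if d > 2 * mean_dev else 0 for d in dev]


def exclude_outliers(x, y, max_iter=10):
    """Repeat: fit, sort by deviation, drop the leading run of flagged points."""
    keep = list(range(len(y)))  # indices still in the sample
    removed = []
    w = None
    for _ in range(max_iter):
        xs = [x[i] for i in keep]
        ys = [y[i] for i in keep]
        w = fit_through_origin(xs, ys)
        dev, flags = amplitude_flags(ys, [w * v for v in xs])
        order = sorted(range(len(keep)), key=lambda j: dev[j], reverse=True)
        lead = 0  # length of the leading run with a non-zero indicator
        while lead < len(order) and flags[order[lead]] == 1:
            lead += 1
        if lead == 0:
            break  # largest deviation is not flagged: stop
        dropped = {keep[j] for j in order[:lead]}
        removed.extend(sorted(dropped))
        keep = [i for i in keep if i not in dropped]
    return w, removed


# Hypothetical data: y = 2x, with the third value raised from 16 to 20.
x = [10, 9, 8, 7, 6, 5, 4, 3]
y = [20, 18, 20, 14, 12, 10, 8, 6]
weight, removed = exclude_outliers(x, y)
print(weight, removed)  # 2.0 [2]
```
]]></preformat>
      <p>The first pass gives a weight of about 2.08 and flags only the adjusted point; after it is dropped the fit is exact and the loop stops.</p>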
      <p>At each step of the algorithm a number of indicators are calculated: MAE (mean absolute error), MAPE (mean absolute percentage error), the normalized R^2 (R-squared), and the approximation quality estimates RMSE_1 and RMSE_2 (root mean square error), which include a penalty for excluding outlier values from the total sample. For RMSE_1 the penalty is equal to the number of exceptions excluded from the ranking at the current step. For RMSE_2 the penalty is doubled.</p>
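      <p>A minimal sketch of the penalized estimates follows. It assumes that the penalty is subtracted from the sample size in the denominator and that the square root is taken, as the name RMSE suggests; the function name and the numbers are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def penalised_rmse(official, model, excluded, factor=1):
    """Root mean square error with the sample size reduced by factor * excluded.

    factor=1 corresponds to RMSE_1 and factor=2 to RMSE_2 under the
    assumption stated in the lead-in.
    """
    denominator = len(official) - factor * excluded
    if denominator <= 0:
        raise ValueError("penalty is not smaller than the sample size")
    sse = sum((o - m) ** 2 for o, m in zip(official, model))
    return math.sqrt(sse / denominator)


# Hypothetical values: five places remain, one exception was excluded at this step.
official = [100, 80, 60, 40, 20]
model = [98, 82, 60, 41, 20]

rmse_1 = penalised_rmse(official, model, excluded=1, factor=1)  # sqrt(9 / 4) = 1.5
rmse_2 = penalised_rmse(official, model, excluded=1, factor=2)  # sqrt(9 / 3)
print(rmse_1, rmse_2)
```
]]></preformat>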
      <p>The proposed algorithm was implemented in MathCAD and tested on the data of one of the popular Russian ratings. This official rating consists of an overall rating and several partial ratings by area of activity, and the overall rating is declared to be a linear combination of the partial ones taken with certain a priori weight coefficients. The results of the algorithm are given in Table 1.</p>
    </sec>
    <sec id="sec-3">
      <title>Results of the algorithm</title>
      <p>Table 1. Results of the algorithm. Column headings: number of the i-th iteration; average absolute deviation of the weights from the true values; average deviation of the weights from those obtained in the previous step.</p>
      <p>Values listed in Table 1: 3; 8; 3; 1; 0.0917%; 0.0912%; 0.93; 0.93; 1.01; 1.02; 11.686%; 1.907%; 0.293%; 0.305%; 0.266%; 0.278%; 0.272%.</p>
      <p>The data presented show that the places of sixteen universities in the ranking do not follow the general calculation rules. This is visually confirmed by the fragment of the rating shown in Fig. 2 and by the corresponding data in Table 2.</p>
      <p>Fig. 2. Fragment of the official rating and the model rating (places 29 to 51).</p>
      <p>Today a large number of ratings are used to assess the educational activities of universities. Ratings are needed as a reasonably objective means of evaluation, although they are not without drawbacks. They can be used to improve the quality of education in a particular university, to compare universities with each other, and to assess the situation in the higher education system as a whole.</p>
      <p>As a result of this study, we obtained an algorithm that finds nonrandom outliers and determines the weight coefficients in university rankings. It is based on the step-by-step exclusion of groups of participants with the maximum values of the exclusion criterion.</p>
      <p>The algorithm was tested on a series of artificially modeled ratings for which the weights of the partial ratings were known in advance. The ratings were then adjusted artificially to model nonrandom outliers. The weight coefficients obtained by the algorithm differ from the true ones by no more than 0.73% on average. The algorithm converged quickly, in at most 10 steps.</p>
      <p>The results of this study make it possible to estimate the weight coefficients with which the partial ratings enter the overall official rating, to determine the places in the rating that have undergone manual adjustment, and to identify inflated and understated positions of universities. From a practical point of view, such knowledge can help university managers make decisions aimed at improving the position of their organization in the rankings.</p>
      <p><bold>Acknowledgements</bold></p>
      <p>The article presents the results of Federal State task 31.12656.2018/12.1 and of the Program of Strategic Development of PetrSU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kincharova</surname>
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Methodology of international university rankings: analysis and criticism</article-title>
          .
          <source>University Management: Practice and Analysis</source>
          . Vol.
          <volume>2</volume>
          (
          <issue>90</issue>
          ). pp.
          <fpage>70</fpage>
          -
          <lpage>80</lpage>
          .
          <publisher-loc>Ekaterinburg</publisher-loc>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Polozov</surname>
            <given-names>A. A.</given-names>
          </string-name>
          :
          <article-title>The rating of high school: evolution of the problem</article-title>
          .
          <source>University Management: Practice and Analysis</source>
          . Vol.
          <volume>2</volume>
          (
          <issue>72</issue>
          ). pp.
          <fpage>85</fpage>
          -
          <lpage>89</lpage>
          .
          <publisher-loc>Ekaterinburg</publisher-loc>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Melikian</surname>
            <given-names>A. V.</given-names>
          </string-name>
          :
          <article-title>Performance criteria in higher education monitoring systems in Russia and abroad</article-title>
          .
          <source>University Management: Practice and Analysis</source>
          . Vol.
          <volume>3</volume>
          (
          <issue>91</issue>
          ). pp.
          <fpage>58</fpage>
          -
          <lpage>66</lpage>
          .
          <publisher-loc>Ekaterinburg</publisher-loc>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Zyateva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitukhin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peshkova</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>University performance indicators impact on their ranking position</article-title>
          .
          <source>In: 8th International Conference on Education and New Learning Technologies</source>
          , pp.
          <fpage>8751</fpage>
          -
          <lpage>8759</lpage>
          . INTED,
          <publisher-loc>Spain</publisher-loc>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          Microsoft: Highlight Exceptions (Table Analysis Tools for Excel),
          <ext-link ext-link-type="uri" xlink:href="https://docs.microsoft.com/ru-ru/sql/analysis-services/highlightexceptions-table-analysis-tools-for-excel?view=sql-server-2014">https://docs.microsoft.com/ru-ru/sql/analysis-services/highlightexceptions-table-analysis-tools-for-excel?view=sql-server-2014</ext-link>,
          last accessed 2018/09/21.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>