
Upwards Excursion Algorithm Providing the Weight Rankings Coefficients of Universities

O.A.Zyateva

Petrozavodsk State University, Petrozavodsk, Russia

This article presents the results of a study of cases in which objective ratings based on individual indicators are distorted or adjusted manually. The aim of the study was to build an algorithm that detects places in a ranking assigned unfairly as a result of fraud, by identifying the weight coefficients of the partial ratings of universities when the overall rating has a known functional form, an additive convolution. To this end, the methodology of one popular Russian rating was analyzed and mathematical models of the dependence of a university's overall rating on its partial ratings were obtained. The study revealed subjectivity in the construction of the rating, which shows up as nonrandom outliers ("emissions"). Standard methods neither reveal them nor give an objective picture. The proposed algorithm first finds the outliers, then excludes them from the sample, and finally determines, on the remaining set, weight coefficients that are closer to the a priori values. The resulting model of an objective rating can be used to identify places that participants obtained unjustly. Educational organizations are interested in holding good positions in the leading ratings, so understanding the rules by which ratings are formed and knowing the numerical values of the corresponding weight coefficients lets management assess the institution's position realistically and take measures in advance to strengthen it.

Keywords: Statistical Analysis, Approximation, Ordinary Least Squares, Rankings, Indicators, Higher Education

Recently, almost all spheres of activity, higher education institutions in particular, have become subject to rating. The main goal of university ratings is to help applicants make the "right" choice of educational organization. The scientific community pays close attention to the existing Russian and international university rankings and to how they are formed. An analysis of the methods of the three biggest global rankings, ARWU, QS and THE, is presented in [1]. Article [2] describes the problems that existed in university rankings 7-10 years ago and the present situation. The main characteristics of the higher education system that are evaluated by performance monitoring in Russia, as well as similar indicators used in international monitoring systems, are presented in [3].

In recent years, a number of educational rankings have appeared in Russia, such as the national university ratings conducted by the "Interfax" agency and by "Expert RA", and ratings of demand for Russian universities and of individual faculties, intended to help high school students and graduates, as well as employers, make their choice. Open access to the rating methodology is a significant argument for them: such a rating can be trusted [4].

As a rule, the overall rating of a university is a linear combination of partial ratings taken with certain a priori weight coefficients. Under this assumption, if the weights and the partial ratings are known, the overall rating can be computed from the linear model. Often, however, the "rules of the game" are unknown to the uninitiated: the method of calculating the overall rating remains hidden and is not published anywhere. In many cases this is done so that the official rating cannot be recalculated independently and the correctness of the assigned places cannot be checked. Attempts to construct a linear approximation of the official rating from the partial ones by traditional methods (for example, ordinary least squares or its modification with iterative recalculation of weights) lead to disappointing results, because places in the official rating have already been subjected to subjective intervention "by expert means". When the parameters of the rating model are then identified, the estimated weight coefficients are biased, since the model rating values fall between the initial and the adjusted values of the official rating. The problem is therefore to develop an algorithm for identifying the rating parameters that finds the adjusted places, excludes them from the sample, and yields parameter estimates close to the a priori values.

2 Overall ranking simulation based on partial rankings

Let us consider in more detail the results of a rating of Russian universities that has been held for the eighth time. The final overall rating consists of six partial ratings by area of activity. In total, more than 200 universities took part, including classical, research, technical, agricultural, humanitarian and medical universities, as well as universities specializing in management.
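For reference, the linear form assumed for the overall rating can be written compactly. The symbols are chosen here for illustration and are not fixed by the text: $y_k$ is the overall rating of the k-th university, $x_{jk}$ its j-th partial rating, $w_j$ the a priori weight of the j-th partial rating, $N$ the number of universities, and $m = 6$ the number of partial ratings.

```latex
y_k = \sum_{j=1}^{m} w_j \, x_{jk}, \qquad k = 1, \dots, N, \qquad m = 6
```

Identifying the rating parameters then means estimating the $w_j$ from the published values of $y_k$ and $x_{jk}$.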

The task was to determine whether this rating was correct, which required analyzing it and establishing whether it had been subjected to "expert" adjustments. The task was made easier by reliable "insider" information about the values of the weights of the partial ratings. It was therefore not difficult to build the true rating model and to find points where the official and model ratings differ significantly, in ways that cannot be explained by chance.

Initially, an attempt was made to use existing data mining packages to find nonrandom outliers. In particular, the Analysis Services package built into the MS SQL Server 2012 DBMS was used. We applied the Highlight Exceptions tool, which uses the Microsoft clustering algorithm [5]. The clustering model defines groups of rows with similar characteristics, and the tool highlights suspicious cells in the original data table. The exceptions found in this way did not quite coincide with the real exceptions, which were known in advance. The universal outlier detection methods of data mining are unsuitable here because of how they work: they rely on clustering and do not establish a functional relationship between the input partial ratings and the output overall rating. The class of problems under study contains a functional dependence, so the correct solution cannot be found by clustering. An attempt to use tools for approximating nonlinear dependences, such as neural networks, did not bring the expected result either, because the data that could be used to train the network already contain distortions. It was therefore decided to examine other known algorithms and select one suitable for the problem or, failing that, to develop a new algorithm.

The well-known least squares method was used for the initial estimate of the model rating. The official rating and the rating modeled from the partial ones turned out to fit well overall. At the same time, there are several unexpected outliers: deviations of the published rating values from the model values that exceed 5% in absolute value. About half of all outliers (7 out of 16) fall within the first fifty positions; none occur in the first fifteen, and only 2 lie between the sixteenth and thirtieth positions (see Fig. 1). Thus, subjectivity in the ranking becomes noticeable in the race for places from the second to the fifth ten.
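The initial estimate described above can be sketched as follows. This is a minimal illustration, not the author's MathCAD implementation: the data, the two-rating setup, the "true" weights 0.6 and 0.4, and the function names are all hypothetical; only the 5% threshold comes from the text.

```python
import numpy as np

def fit_weights(X, y):
    """Ordinary least squares without intercept: minimise ||X @ w - y||."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def flag_outliers(X, y, threshold=0.05):
    """Fit the weights, then mark places whose relative deviation exceeds threshold."""
    w = fit_weights(X, y)
    rel_dev = np.abs(y - X @ w) / np.abs(y)
    return w, rel_dev > threshold

# Hypothetical data: 10 universities, 2 partial ratings, assumed true weights 0.6 and 0.4.
X = np.array([[100, 60], [60, 100], [90, 50], [50, 90], [80, 40],
              [40, 80], [70, 30], [30, 70], [60, 20], [20, 60]], dtype=float)
y = X @ np.array([0.6, 0.4])
y[4] *= 1.15                      # one place raised by 15% "by expert means"

w, mask = flag_outliers(X, y)
print(np.round(w, 3))             # fitted weights, pulled away from (0.6, 0.4)
print(np.flatnonzero(mask))       # indices of the flagged places
```

Even when the adjusted place is flagged correctly, the fitted weights are shifted by it, which is the effect the rest of the paper sets out to remove.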

The statistical estimates were good: the normalized coefficient of determination is 0.99, the mean absolute error (hereinafter MAE) is 4.4, and the mean absolute percentage error (hereinafter MAPE) is 1%.

[Fig. 1. Official rating and model rating; vertical axis from 0 to 1200.]
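The quality statistics quoted here can be computed as in the sketch below. The numbers are made up for illustration, and the plain (unadjusted) coefficient of determination is shown, since the text does not say how the "normalized" variant is defined.

```python
import numpy as np

def mae(y, y_model):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_model)))

def mape(y, y_model):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y - y_model) / y)) * 100.0)

def r_squared(y, y_model):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_model) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical official and model rating values for four universities.
y = np.array([100.0, 80.0, 50.0, 20.0])
y_model = np.array([98.0, 84.0, 50.0, 21.0])

print(mae(y, y_model), mape(y, y_model), r_squared(y, y_model))
```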

Despite these good estimates, a model fitted in this way is calibrated on rating values that include the adjusted ones, and the outliers remain unexplained. Hence the need for an algorithm that finds the outlying places and determines coefficients closer to the a priori values.

It is worth noting that similar ratings are used in a variety of areas: for example, in assessing the effectiveness of enterprises, when judges evaluate individual elements in sports competitions, and in assessing the financial position of an organization. Rating assessments of tender documentation help to identify a corruption component in selecting the winner when applications for State contracts are evaluated under current legislation.

The proposed algorithm makes it possible to identify violations when the overall rating is a linear weighted convolution of the partial ones.

3 Search algorithm for nonrandom "emissions"

At first glance, it is not obvious whether all universities occupy their rightful places or whether some positions were adjusted and calculated contrary to the rules of formation. This raises the problem of parametric identification of the weight coefficients of the partial ratings for a known, in this case linear, functional dependence.

The available standard methods (ordinary least squares, two-stage least squares, maximum likelihood, instrumental variables, least squares with iterative weight recalculation, etc.) are either unsuitable for the problem, or diverge, or give estimates that differ from the true ones.

For example, an attempt was made to use least squares with iterative weight recalculation. Elements suspected of being outliers are given a reduced weight, inversely proportional to the square of the element's distance from the approximating line. During the first three iterations the method began to converge, but then it began to diverge (negative weights started to appear) and to move away from the true solution.
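The reweighting mechanism can be sketched as follows. The exact rule used in the study is not given in the text, so the rule below is an assumption: a point whose relative deviation `rel` exceeds a hypothetical threshold gets observation weight `(threshold / rel)**2`, and all other points keep weight 1. The data and weights are illustrative, and the sketch does not attempt to reproduce the divergence reported above.

```python
import numpy as np

def irls(X, y, n_iter=3, threshold=0.05):
    """Weighted least squares with observation weights recomputed each pass."""
    obs_w = np.ones(len(y))
    history = []
    for _ in range(n_iter):
        s = np.sqrt(obs_w)                      # scale rows by sqrt of weight
        coef, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
        history.append(coef)
        rel = np.abs(y - X @ coef) / np.abs(y)
        # Assumed rule: suspected points are down-weighted by the squared distance.
        obs_w = np.where(rel > threshold,
                         (threshold / np.maximum(rel, 1e-12)) ** 2, 1.0)
    return coef, history

# Hypothetical data: assumed true weights 0.6 and 0.4, one adjusted place.
X = np.array([[100, 60], [60, 100], [90, 50], [50, 90], [80, 40],
              [40, 80], [70, 30], [30, 70], [60, 20], [20, 60]], dtype=float)
y = X @ np.array([0.6, 0.4])
y[4] *= 1.15

coef, history = irls(X, y)
for step, c in enumerate(history, start=1):
    print(step, np.round(c, 4))
```

In this toy case down-weighting reduces the bias but does not remove it, because the adjusted place still takes part in the fit.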

For this reason, a new algorithm for finding the coefficients was proposed. It differs from the existing ones in that a logical selection rule is added to the approximation step. The rule consists in calculating and comparing two auxiliary indicators. The final criterion for excluding outlying points from the initial set is a run of nonzero length, at the very beginning of the list sorted in descending order, of elements for which the conjunction of the auxiliary indicators is nonzero. The algorithm is described below.

===========================================================
Algorithm for estimation of parameters of a multiple linear regression model taking into account exceptions in data
===========================================================
Set $N_0$ – initial length of the rating's array
Set $L_0 = 0$ – length of first iteration nonzero leading sequence
Set $i = 0$ – number of iterations of algorithm
Set $m = 6$ – number of partial ratings
Get $Y(N_0) = \{y_k\}$, $k \in [1, N_0]$ – integral official rating's array
For every $j \in [1, m]$:
    Get $X_j(N_0)$ – arrays of partial ratings
Set Condition($i$) = true
While Condition($i$):
    Estimate coefficients of multiple linear regression $w_j$, $j \in [1, m]$, for $Y = Y(N_i)$ and $X_j = X_j(N_i)$ by Ordinary Least Squares method
    Calculate the estimation of the function $\hat{Y}(N_i)$:
        For every $k \in [1, N_i]$: $\hat{y}_k = \sum_{j=1}^{m} w_j x_{jk}$
    Calculate MAE, MAPE, $R^2$ statistics
    Calculate RMSE_1 statistics: $\mathrm{RMSE}_1 = \sqrt{\frac{\sum_{k=1}^{N_i} (y_k - \hat{y}_k)^2}{N_i - L_i}}$
    Calculate RMSE_2 statistics: $\mathrm{RMSE}_2 = \sqrt{\frac{\sum_{k=1}^{N_i} (y_k - \hat{y}_k)^2}{N_i - 2 L_i}}$
    For every $k \in [1, N_i]$:
        $\delta_k = |y_k - \hat{y}_k| / y_k$ – calculate relative error
    For every $k \in [1, N_i]$:
        Calculate direction indicator $D_k \in \{0, 1\}$
        Calculate amplitude indicator $A_k \in \{0, 1\}$
        Calculate integral indicator $I_k = D_k \wedge A_k$
    Sorting $Y(N_i)$, $X_j(N_i)$, $I$ arrays by descending of $\delta$
    Searching for leading sequence with nonzero integral indicator in the top of $Y(N_i)$:
        Set $L = 0$ – length of nonzero leading sequence
        Set Condition($L$) = true
        While Condition($L$):
            If $I_{L+1} = 0$ then Condition($L$) = false, otherwise $L = L + 1$
    If $L = 0$ then Condition($i$) = false
    otherwise:
        $N_{i+1} = N_i - L$ – reducing length of rating $Y(N_i)$ for $L$ elements
        $L_{i+1} = L$
        $i = i + 1$
        $Y(N_i) = \{y_k\}$, $k \in [1, N_i]$ – writing reduced rating without exceptions
===========================================================

The first step of the algorithm is as follows. Using the OLS method, we find the coefficients with which the partial ratings enter the overall official rating and calculate the model rating, and then the absolute value of the relative deviation of the model rating from the official one. Given that the rating values of universities should be arranged in descending order, the rating is then examined for points suspected of being outliers.

Rating places are analyzed for outliers by means of special indicators. For each rating place an integral indicator is calculated as the product of two auxiliary indicators. The first, the direction indicator, finds places in the rating array that fall out of the overall decreasing sequence; it equals 1 if the value is higher than it should be at that place in the array and -1 if it is lower. The second, the amplitude indicator, measures the strength of the outlier: it equals 1 if the absolute relative deviation between the model rating and the overall rating of a given university is more than twice the average of that deviation over all universities, and 0 otherwise. Thus, if both auxiliary indicators are nonzero for a particular point, the point is suspected of being an outlier.
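The two auxiliary indicators might be computed as in the sketch below. The amplitude rule follows the text (more than twice the mean relative deviation). The direction rule is an assumption, since the text does not define it precisely: here a university gets +1 if its official place is better than its place by model rating, -1 if worse, and 0 if the two places coincide. Function names and data are hypothetical.

```python
import numpy as np

def ranks_desc(v):
    """Place of each element when v is sorted in descending order (0 = top)."""
    return np.argsort(np.argsort(-v, kind="stable"), kind="stable")

def emission_indicators(y, y_model):
    """Return relative deviation, direction, amplitude and their product."""
    rel_dev = np.abs(y - y_model) / np.abs(y)
    # Assumed direction rule: compare official place with model place.
    direction = np.sign(ranks_desc(y_model) - ranks_desc(y))
    # Amplitude rule from the text: more than twice the mean relative deviation.
    amplitude = (rel_dev > 2.0 * rel_dev.mean()).astype(int)
    return rel_dev, direction, amplitude, direction * amplitude

# Hypothetical official and model ratings for five universities.
y = np.array([100.0, 90.0, 80.0, 70.0, 60.0])
y_model = np.array([99.0, 91.0, 80.0, 56.0, 60.6])

rel_dev, direction, amplitude, integral = emission_indicators(y, y_model)
print(direction, amplitude, integral)
```

In this example the fourth university sits one place higher than its model rating warrants and deviates strongly, so it is the only one with a nonzero integral indicator.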

After that, the array of rating values, together with the integral indicator, is sorted in descending order of the absolute relative deviation of the model rating from the overall one. The group of universities with a nonzero indicator at the beginning of the sorted list, up to the first 0, is then discarded and the procedure is repeated. This step is repeated until the stopping condition is met: after the sorting at the i-th step, the maximum absolute relative deviation corresponds to an indicator equal to 0. In other words, no group with a nonzero indicator remains at the beginning of the list.
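The iteration as a whole can be sketched like this. It is a reading of the description under stated assumptions, not the original MathCAD code: the direction indicator is taken as the sign of the difference between model place and official place, the guard against removing too many rows is added here, and the data and the weights 0.6 and 0.4 are hypothetical.

```python
import numpy as np

def ranks_desc(v):
    """Place of each element when v is sorted in descending order (0 = top)."""
    return np.argsort(np.argsort(-v, kind="stable"), kind="stable")

def fit_excluding_emissions(X, y, max_iter=50):
    """Repeat OLS, dropping the leading run of flagged places each pass."""
    keep = np.arange(len(y))                    # places still in the sample
    for _ in range(max_iter):
        Xi, yi = X[keep], y[keep]
        w, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        y_model = Xi @ w
        rel_dev = np.abs(yi - y_model) / np.abs(yi)
        direction = np.sign(ranks_desc(y_model) - ranks_desc(yi))  # assumed rule
        amplitude = rel_dev > 2.0 * rel_dev.mean()
        indicator = direction * amplitude
        order = np.argsort(-rel_dev, kind="stable")  # descending deviation
        lead = 0
        while lead < len(order) and indicator[order[lead]] != 0:
            lead += 1
        # Stop when the top of the list is not flagged (or too few rows remain).
        if lead == 0 or len(keep) - lead <= X.shape[1]:
            break
        keep = np.delete(keep, order[:lead])
    w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # final fit on kept rows
    removed = np.setdiff1d(np.arange(len(y)), keep)
    return w, removed

# Hypothetical data: assumed true weights 0.6 and 0.4, one adjusted place.
X = np.array([[100, 60], [60, 100], [90, 50], [50, 90], [80, 40],
              [40, 80], [70, 30], [30, 70], [60, 20], [20, 60]], dtype=float)
y = X @ np.array([0.6, 0.4])
y[4] *= 1.15

w, removed = fit_excluding_emissions(X, y)
print(np.round(w, 4), removed)
```

Unlike down-weighting, excluding the flagged place lets the final fit run on unadjusted rows only, so in this noise-free toy case the assumed weights are recovered.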

At each step of the algorithm a number of statistics are calculated: MAE (mean absolute error), MAPE (mean absolute percentage error), normalized R^2 (R-squared), and two estimates of approximation quality, RMSE_1 and RMSE_2 (root mean square error), which include a penalty for excluding outlying values from the sample. For RMSE_1 the penalty equals the number of exceptions excluded from the ranking at the current step; for RMSE_2 the penalty is doubled.
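One way to read the penalized error estimates is sketched below, assuming the penalty is subtracted from the sample size in the denominator. This form is inferred from the description rather than confirmed by it; the function name and the numbers are illustrative.

```python
import math
import numpy as np

def rmse_penalised(y, y_model, n_excluded, factor=1):
    """Root mean square error with denominator N - factor * n_excluded.

    factor=1 corresponds to RMSE_1 and factor=2 to RMSE_2 under the
    assumption stated above.
    """
    dof = len(y) - factor * n_excluded
    if dof <= 0:
        raise ValueError("penalty is not smaller than the sample size")
    return math.sqrt(float(np.sum((y - y_model) ** 2)) / dof)

# Hypothetical values: four remaining places, one exception excluded at this step.
y = np.array([10.0, 8.0, 6.0, 4.0])
y_model = np.array([9.0, 8.0, 7.0, 4.0])

rmse_1 = rmse_penalised(y, y_model, n_excluded=1, factor=1)
rmse_2 = rmse_penalised(y, y_model, n_excluded=1, factor=2)
print(rmse_1, rmse_2)
```

With the same residuals, the doubled penalty makes RMSE_2 grow faster than RMSE_1 as more places are excluded, which discourages discarding too many points.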

The proposed algorithm was implemented in MathCAD and tested on the data of one of the popular Russian ratings. This official rating consists of an overall rating and several partial ratings by area, and it is declared that the overall rating is constructed as a linear combination of the partial ones taken with certain a priori weight coefficients. The results of the algorithm are given in Table 1.

[Table 1. Column headers: number of the i-th iteration; average absolute deviation of the weights from the true values; average deviation of the weights from those obtained in the previous step. Values in source order, row and column assignment not recoverable: 3, 8, 3, 1; 0.0917%, 0.0912%; 0.93, 0.93, 1.01, 1.02; 11.686%, 1.907%, 0.293%, 0.305%, 0.266%, 0.278%, 0.272%.]

From the presented data it follows that the places of sixteen universities in the ranking do not obey the general calculation rules. This is confirmed visually by the fragment of the rating shown in Fig. 2 and by the corresponding data in Table 2.

[Fig. 2. Fragment of the rating, positions 29 to 51: official rating and model rating.]

Today, there are a large number of ratings that assess the educational activities of universities. It can be argued that we need ratings as a reasonably objective means of evaluation, though not one without drawbacks. They can be used to improve the quality of education in a particular university, to compare universities with each other, and to assess the situation in the higher education system as a whole.

As a result of this study we obtained an algorithm that finds nonrandom outliers ("emissions") and determines the weight coefficients in university rankings by step-by-step exclusion of groups of participants with the maximum values of the exclusion criterion.

The algorithm was tested on a series of artificially modeled ratings constructed with weights of the partial ratings known in advance. The ratings were then adjusted artificially to model nonrandom outliers. The weight coefficients obtained by the algorithm differ from the true ones by no more than 0.73% on average. The algorithm converged quickly (within 10 steps).

The results of this study make it possible to estimate the weight coefficients with which the partial ratings enter the overall official rating, to determine which places in the rating have undergone manual adjustment, and to identify inflated and understated positions of universities. From a practical point of view, such knowledge can help university managers make decisions aimed at improving the position of their organization in the rankings.

ACKNOWLEDGEMENTS

The article presents the results of Federal State task 31.12656.2018/12.1 and of the Program of Strategic Development of PetrSU.

1. Kincharova, A.V.: Methodology of international university rankings: analysis and criticism. University Management: Practice and Analysis. Vol. 2 (90), pp. 70-80. Ekaterinburg (2014).

2. Polozov, A.A.: The rating of high school: evolution of the problem. University Management: Practice and Analysis. Vol. 2 (72), pp. 85-89. Ekaterinburg (2011).

3. Melikian, A.V.: Performance criteria in higher education monitoring systems in Russia and abroad. University Management: Practice and Analysis. Vol. 3 (91), pp. 58-66. Ekaterinburg (2014).

4. Zyateva, O., Pitukhin, E., Peshkova, I.: University performance indicators impact on their ranking position. In: 8th International Conference on Education and New Learning Technologies, pp. 8751-8759. INTED, Spain (2016).

5. Microsoft Docs: Highlight Exceptions (Table Analysis Tools for Excel), https://docs.microsoft.com/ru-ru/sql/analysis-services/highlightexceptions-table-analysis-tools-for-excel?view=sql-server-2014, last accessed 2018/09/21.