=Paper=
{{Paper
|id=Vol-2067/paper19
|storemode=property
|title=Considering Importance of Information Sources during Aggregation of Alternative Rankings
|pdfUrl=https://ceur-ws.org/Vol-2067/paper19.pdf
|volume=Vol-2067
|authors=Vitaliy V. Tsyganok,Sergii V. Kadenko,Oleh V. Andriichuk
|dblpUrl=https://dblp.org/rec/conf/its2/TsyganokKA17
}}
==Considering Importance of Information Sources during Aggregation of Alternative Rankings==
Vitaliy V. Tsyganok, Sergii V. Kadenko, Oleh V. Andriichuk

Institute for Information Recording of the National Academy of Sciences of Ukraine, Kyiv, Ukraine

tsyganok@ipri.kiev.ua, seriga2009@gmail.com, oleh.andriichuk@i.ua

Abstract

The paper outlines several approaches to the aggregation of rankings of alternatives provided by a set of individual information sources (IS). It is shown that, depending on the dimensionality of the set of alternatives, different aggregation methods can be applied. Particular attention is given to weighted aggregation of alternative rankings provided by different IS. Several intuitive assumptions concerning IS weight calculation are set forth. It is assumed that an IS weight should reflect both the quantity and the quality of the information it provides, as well as the expert estimate of IS credibility based on previous experience of IS usage. Based on these assumptions, expressions for IS weight calculation are suggested. The approaches suggested in the paper can be applied to information provided by sources of different nature: experts, search engines, paper and online documents, etc.

Keywords: alternative ranking, information source, rank ordering, expert competence, relative weight, aggregate ranking.

1 Introduction

In the process of informational and analytical research, a common problem often faced by analysts is compiling a ranking of a set of objects or alternatives (products, electoral candidates, political parties, etc.) according to some criterion. This criterion may be explicitly formulated by a decision-maker or presented in the form of a particular topical query (such as “top comic actors”, “best online footwear stores”, or “downtown restaurants”).
Even these randomly selected formulations demonstrate that it is problematic to select the alternatives satisfying the respective queries based solely on numeric data, as there is no quantitative criterion according to which the alternatives could be measured. A common approach, summarized most explicitly, perhaps, by Thomas Saaty ([1]), is as follows: if you cannot measure the alternatives, the best way to describe them numerically is to compare them among themselves.

Conceptually, comparisons of alternatives according to some pre-defined criterion can be classified as either cardinal or ordinal estimates. Cardinal pair-wise comparisons of alternatives bear information about the numeric relation between them according to the given criterion, while ordinal comparisons indicate only the ordering of these alternatives. Cardinal pair-wise comparisons are suitable for relatively small numbers of alternatives lying within the same order of magnitude. Ordinal pair-wise comparisons are easier to obtain and process, and they can be used to compile rankings of large numbers of alternatives. The amount of data used in informational and analytical research is usually large, which is why ordering of alternatives is used more frequently. For example, this is one of the possible reasons why web search engines operate mostly with rankings (orderings) of references and not with numerically expressed ratings of some kind.

In order to compile a ranking of alternatives based on all available information, rankings coming from several information sources need to be taken into consideration and aggregated in some way. The authors of [2] show that the information space is influenced by multiple factors of a qualitative nature, which are difficult to formalize and describe in numeric terms. Based on these characteristics, it was shown in [3] that the information space is a typical example of a weakly structured subject domain.
In order to provide at least some analytical description of a weakly structured domain, data coming from any available sources needs to be taken into consideration. In this paper we try to address the most general case, when information sources can include paper and online documentation, web search results, and expert judgments. The problem with aggregating data coming from different information sources is that, in the general case, these sources have different weights, depending, again, on several factors. For example, it is reasonable to assume that a more credible information source, providing a larger amount of information, should be assigned a greater weight prior to data aggregation. While there are roughly a dozen rank aggregation methods (proposed by Borda, Condorcet, Kemeny, and others) that are commonly known and described in a multitude of publications, the problem of defining the weights of data sources (voters, judges, experts, search engines, etc.) during data aggregation is still relevant and open to discussion. So, in the following sections of our paper, we set forth a method that allows the analyst to aggregate data presented in the form of alternative rankings coming from different information sources, focusing particularly on different aspects of the information source weight definition problem.

2 Literature review

The problem of rank aggregation became relevant back in ancient times, when democratic societies and voting systems started to emerge. An extensive analysis of voting rules and their evolution is provided, for example, in [4]. The first rank aggregation methods that emerged in the process of democratic voting evolution, and that are still relevant today, were suggested in the late 18th century by Borda ([5]) and Condorcet ([6]). Since then, multiple approaches to the aggregation of individual preferences using different social welfare functions have been devised.
We find it most appropriate to mention the 1950s and 1960s, the time when Arrow’s impossibility theorem was formulated (see [7]) and attempts were made to bypass its constraints (particularly by Kemeny [8] and Copeland [9]). F. Aleskerov summarizes Arrovian aggregation rules in his book [10]. In the 2000s, with the growing popularity of online search engines, the problem of rank aggregation became even more relevant, as it became evident that, besides data obtained from voters, experts, and analysts, it was necessary to take online information sources into account (sometimes with minimal human involvement). In this context, we should mention several papers from the 2000s, particularly [11-13], in which the authors suggest and compare several approaches targeted at the aggregation of data from online information sources.

The definition of information source weights and the weighted aggregation of rankings is a separate matter. The methods of Borda and Condorcet can easily be extrapolated to the case when information sources have different weights, as shown by Totsenko in [14]. Kemeny’s median is a more problematic case when information sources have different weights. Ordinal factorial analysis methods, described in [15, 16], allow analysts to calculate weights based on available sets of alternative rankings previously provided by experts. This approach can be extrapolated to the calculation of weights of online information sources as well, if evaluators provide some global alternative ranking a priori. Dwork et al. in [11] stress the importance of involving human evaluators in the process of preliminary information source weight definition. However, their paper focuses mostly on comparative analysis of rank aggregation methods and does not address the problem of IS weight calculation directly.
As for IS weights, we can again mention Thomas Saaty ([1]) and his academic school (including Ramanathan & Ganesh [17], Forman & Peniwati [18], as well as Yang et al. [19]). However, their methods are primarily targeted at the aggregation of expert estimates represented in the form of pair-wise comparison matrices (PCM) provided in ratio scales (and not rankings). In the current paper we suggest information source weight calculation methods based on both preliminary human evaluation of information source credibility and the amount and quality of information provided by a given information source.

3 The problem statement

What is given: n information sources (IS1 – ISn). Each IS provides a ranking Ri that includes mi alternatives (i = 1..n). Information sources can be experts, search engines, analysts who monitor online content, paper or online documents, etc. We should stress that the numbers of alternatives provided by different information sources are, in the general case, different (mi ≠ mj; i, j = 1..n). We should find an aggregated (global) ranking of alternatives.

4 Solution ideas for the case of equal weights of information sources

We suggest building a solution algorithm based on available methods of aggregation of individual rankings (ordinal estimates). The most common ranking aggregation methods were proposed by Borda, Condorcet and, perhaps, Kemeny. We should stress that if we are dealing with experts, the number of alternatives an expert can analyze “in one go” is no more than 7±2. However, search engines can return rankings of hundreds of links. That is why the usage of domination-matrix-based methods is not recommended for such dimensionalities. For example, if we are dealing with 10 IS providing rankings of 1000 alternatives each, then we have to analyze and process 10 matrices with 1000×1000 cells. In such cases the Borda method seems to be the most appropriate option, as it operates with ranking vectors and not domination matrices.
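The cell-count argument can be checked with a short sketch (illustrative code, not from the paper; the function name and example data are ours) that builds the ordinal domination matrix induced by a single ranking and compares its size with the ranking vector:

```python
def domination_matrix(ranking):
    """Ordinal domination matrix of one ranking (best alternative first).

    Cell [i][j] is 1 if the i-th listed alternative ranks above the j-th.
    """
    pos = {alt: p for p, alt in enumerate(ranking)}
    return [[1 if pos[a] < pos[b] else 0 for b in ranking] for a in ranking]

ranking = ["A", "B", "C", "D"]
matrix = domination_matrix(ranking)

# A ranking vector of M alternatives stores M entries,
# while its domination matrix stores M * M cells:
M = len(ranking)
print(M, M * M)  # 4 16

# For the example of 10 IS ranking 1000 alternatives each, matrix-based
# methods have to process 10 * 1000 ** 2 = 10,000,000 cells, versus
# 10 * 1000 = 10,000 entries for Borda-style ranking vectors.
```

This quadratic growth in M is exactly why vector-based aggregation is preferable at search-engine scale.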
Besides, it is easier to extrapolate the Borda method to the case when IS have different weights (while, for instance, the extrapolation of Markov chain-based methods looks more problematic). If all IS have the same weight, we can use the following algorithm.

1) Find the number M of unique alternatives featured across the individual rankings. Denoting by A(i) = { Aj(i); j = 1..mi } the set of alternatives in ranking Ri, we have M = card( A(1) ∪ A(2) ∪ … ∪ A(n) ).

2) Form a unified set of alternatives. For this purpose, to each ranking Ri of mi alternatives we should add all the remaining alternatives ( A(1) ∪ … ∪ A(n) ) \ A(i), assigning each of them the rank mi + 1. During the unification of alternatives provided by different IS, we should check whether the alternatives are conceptually the same. If the IS are experts, we can use for this purpose the methods of semantic similarity determination suggested in [20].

3) As a result, each ranking Ri will include all M alternatives: mi alternatives with different ranks and (M − mi) alternatives with the same rank (mi + 1). (A similar approach was mentioned by Dwork et al. in [11].)

4) Add up the ranks of each alternative across all IS.

5) Sort (order) the alternatives by increasing individual rank sums. This sorting yields the rank order we are looking for:

Σj=1..n rkj ≤ Σj=1..n rlj ⇒ rk ≤ rl, k, l = 1..M, (1)

where rkj and rlj are the ranks of alternatives Ak and Al in the ranking provided by IS number j.

5 Usage of different methods under smaller cardinality of the set of alternatives

If the number of alternatives amounts to one or two dozen, or if the decision-maker (analyst) is interested only in the rank order of the first few alternatives provided by the IS, then it makes sense to use other ranking methods (besides or instead of Borda) to aggregate the individual ranking results. For example, let us assume we have 10-15 IS, and each of them provides a ranking of alternatives, from which we are particularly interested in the top 10-20 items.
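Returning to the equal-weights algorithm of the previous section, a minimal Python sketch of its five steps might look as follows (an illustration, not the authors' code; it assumes the conceptual matching of alternatives from step 2 has already been done, so identical strings denote identical alternatives):

```python
from collections import defaultdict

def aggregate_rankings(rankings):
    """Equal-weights Borda-style aggregation of rankings of different lengths.

    rankings: list of rankings, each a list of alternatives, best first.
    Follows the five steps in the text: unify the sets of alternatives,
    assign rank m_i + 1 to every alternative missing from ranking R_i,
    add up the ranks per alternative, and sort by increasing rank sum.
    """
    # Steps 1-2: the unified set of M unique alternatives.
    universe = set().union(*[set(r) for r in rankings])
    rank_sums = defaultdict(int)
    for ranking in rankings:
        m_i = len(ranking)
        ranks = {alt: r for r, alt in enumerate(ranking, start=1)}
        for alt in universe:
            # Step 3: alternatives absent from R_i all get rank m_i + 1.
            rank_sums[alt] += ranks.get(alt, m_i + 1)
    # Steps 4-5: order by increasing sum of ranks, as in expression (1).
    return sorted(universe, key=lambda alt: rank_sums[alt])

# Three sources providing rankings of different lengths:
print(aggregate_rankings([["a", "b", "c"],
                          ["b", "a"],
                          ["a", "c", "d"]]))  # ['a', 'b', 'c', 'd']
```

Note that expression (1) only constrains the relative order of alternatives with different rank sums; ties are left in arbitrary relative order here.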
If all alternatives (items) are different, then we have at most 300 (15·20) different items. In fact (especially if we are talking about search engines), many items featured in individual rankings across different IS are the same. So, if we have 15 IS and each of them provides a ranking of 20 alternatives, then the cardinality of the set of unique items will amount to approximately 100 alternatives. Consequently, if we use aggregation methods operating with ordinal pair-wise comparison matrices (PCM) (such as the Condorcet rule or Kemeny’s median) and not with ranking vectors (like Borda or Markov chain-based methods), we will have to analyze 15 square matrices containing 100×100 cells. As the strict rank order relation is mutually symmetric (if alternative A1 ranks higher than A2, then A2 ranks lower than A1), during aggregation of PCMs we will have to analyze just the matrix elements above the principal diagonal {dik, i