=Paper= {{Paper |id=Vol-1745/paper3 |storemode=property |title=Computing the Shapley Value in Allocation Problems: Approximations and Bounds, with an Application to the Italian VQR Research Assessment Program |pdfUrl=https://ceur-ws.org/Vol-1745/paper3.pdf |volume=Vol-1745 |authors=Francesco Lupia,Angelo Mendicelli,Andrea Ribichini,Francesco Scarcello,Marco Schaerf |dblpUrl=https://dblp.org/rec/conf/aiia/LupiaMRSS16 }} ==Computing the Shapley Value in Allocation Problems: Approximations and Bounds, with an Application to the Italian VQR Research Assessment Program== https://ceur-ws.org/Vol-1745/paper3.pdf

Computing the Shapley Value in Allocation Problems:
Approximations and Bounds, with an Application to the
Italian VQR Research Assessment Program

Francesco Lupia1 , Angelo Mendicelli1 , Andrea Ribichini2 , Francesco Scarcello1 , and
Marco Schaerf3
1
Dept. of Computer Science, Modeling, Electronics and Systems Engineering, University of
Calabria, 87036, Rende, Italy
{lupia,a.mendicelli,scarcello}@dimes.unical.it
2
Dept. of Physics, Sapienza University, 00185, Rome, Italy
ribichini@dis.uniroma1.it
3
Dept. of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza
University, 00185, Rome, Italy
marco.schaerf@uniroma1.it

Abstract. In allocation problems, a given set of goods are assigned to agents in
such a way that the social welfare is maximized, that is, the largest possible global
worth is achieved. When goods are indivisible, it is possible to use money com-
pensation to perform a fair allocation taking into account the actual contribution
of all agents to the social welfare. Coalitional games provide a formal mathe-
matical framework to model such problems, in particular the Shapley value is a
solution concept widely used for assigning worths to agents in a fair way. Unfor-
tunately, computing this value is a #P-hard problem, so that applying this good
theoretical notion is often quite difficult in real-world problems.
In this paper, we first review the application of the Shapley value to an alloca-
tion problem that models the evaluation of the Italian research structures with a
procedure known as VQR. For large universities, the problem involves thousands
of agents and goods (here, researchers and their research products). We then de-
scribe some useful properties that allow us to greatly simplify many such large
instances. Moreover, we propose new algorithms for computing lower bounds
and upper bounds of the Shapley value, which in some cases provide the exact
result and that can be combined with approximation algorithms. The proposed
techniques have been tested on large real-world instances of the VQR research
evaluation problem.

1 Introduction
1.1 Coalitional Game Theory
Coalitional games provide a rich mathematical framework to analyze interactions be-
tween intelligent agents. We consider coalitional games of the form G = hN, vi, con-
sisting of a set N of n agents and a characteristic function v. The latter maps each
coalition C ⊆ N to the worth that agents in C can obtain by collaborating with each
other. In this context, the crucial problem is to find a mechanism to allocate the worth
v(N ), i.e., the value of the grand-coalition N , in a way that is fair for all players and that

27
F.Lupia et al. Computing the Shapley value in allocation problems

additionally satisfies some further important properties such as efficiency: we distribute
precisely the available budget v(N ) to players (not more and not less). Moreover, for
stability reasons, it is usually required that every group of agents C gets at least the
worth v(C) that it can guarantee to the game.
Several solution concepts have been considered in the literature as “fair allocation”
schemes and, among them, a prominent one is the Shapley value [16]. According to
this notion, the worth of any agent i is determined by considering its actual contribution
to all the possible coalitions of agents. More precisely, it is considered the so-called
marginal contribution to any coalition C, that is, the difference between what can be
obtained when i collaborates with the agents in C and what can be obtained without the
contribution of i. More formally, the Shapley value of a player i ∈ N is defined by the
following weighted average of all such marginal contributions:

X |C|!(n − |C| − 1)!
φi (G) = v(C ∪ {i}) − v(C) .
n!
C⊆N \{i}

1.2 Allocation Games

Among the various classes of coalitional games, we focus in this paper on allocation
games, which is a setting for analyzing fair division problems where monetary compen-
sations are allowed and utilities are quasi-linear [11]. Allocation games naturally arise
in various application domains, ranging from house allocation to room assignment-
rent division, to (cooperative) scheduling and task allocation, to protocols for wireless
communication networks, and to queuing problems (see, e.g., [4,9,10,11,13] and the
references therein).
Computing the Shapley value of such games is a difficult problem, indeed it is #P-
hard even if goods can only have two different possible values [1]. In this paper we
focus on large instances of this problem, involving thousands of agents and goods,
for which no algorithm described in the literature is able to provide an exact solution.
There are however some promising recent advances that identify islands of tractability
for the allocation problems where at most one good is allocated to each agent: it has
been recently shown that those instances where the treewidth of the agents’ interaction-
graph is bounded by some constant (i.e., have a low degree of cyclicity) can be solved
in polynomial-time [1]. The result is based on recent advances on counting solutions of
conjunctive queries with existential variables [3]. Unfortunately, if the structure is quite
cyclic this technique cannot be applied to large instances, because its computational
complexity has an exponential dependency on the treewidth.
In some applications, one can be satisfied with some approximations of the Shapley
value. With this respect, things are quite good in principle, since we know there exists
a fully polynomial-time randomised approximation scheme to compute the Shapley
value in supermodular games [8]. The algorithm can thus be tuned to obtain the desired
maximum expected error. However, not very surprisingly, for very large instances one
has to consider a huge number of samples, in order to stay below a reasonable expected
error.

28
F.Lupia et al. Computing the Shapley value in allocation problems

1.3 Contribution

In order to attack large instances of allocation problems, we start by proving some
useful properties of these problems that allow us to decompose instances into smaller
pieces, which can be solved independently. Moreover, some of these properties identify
cases where the computation of the worth function can be obtained in a very efficient
way.
With these properties, we are able to use the randomised approximation algorithm
even on instances that (when not decomposed) are very large.
Furthermore, we note that in some applications one may prefer to determine a guar-
anteed interval for the Shapley value, rather than one probably good point. Therefore,
we propose algorithms for computing a lower bound and an upper bound of the Shap-
ley value for allocation problems. In most cases the distance between the two bounds
is quite small, and often they coincide, which means that we actually computed the ex-
act value. We also used these algorithms together with the approximation algorithm, to
provide a more accurate evaluation of the maximum error of the randomised solution,
for the considered instances.

1.4 The Case Study

The proposed techniques have been tested on large real-world instances of the VQR2011-
2014 Italian research assessment exercise: Every Italian research structure R has to se-
lect some research products, and submit them to an evaluation agency called ANVUR.
While doing so, the structure R is in competition with all other Italian research struc-
tures, as the outcome of the evaluation will be used to proportionally transfer the funds
allocated by the Ministry to support research activities in the next years (until the sub-
sequent evaluation process). Every structure R is therefore interested in selecting and
submitting its best research products. For the sake of simplicity, we next simply speak
of publications instead of research products (which can also be patents, books, etc.),
and of universities and departments instead of structures and substructures (which can
be other research subjects). The programme is articulated in two phases: (1) Based on
the self-evaluations4 of the authors, R selects and submits to ANVUR (at most) two
publications for each one of its authors5 , in such a way that any product is formally
associated with at most one author. (2) ANVUR formulates its independent quality
judgment about the submitted publications (the score assigned to each publication is
made known only to its authors), and the sum of their “true” scores (i.e., those resulting
by ANVUR actual evaluation) is then the VQR score of R. Eventually, R will receive
funds in subsequent years proportional to this score. Furthermore, for the VQR2004-
2010 research assessment exercise (data for VQR2011-2014 have not been released
yet) ANVUR also published an evaluation of all departments, based on the product
scores (the score of each department was computed as the sum of the scores of the
products formally assigned to the authors in that department). Finally, the scores were
4
To be more precise, ANVUR provided reference tables to assist in the evaluation of products,
at least for some research areas. To our ends, this detail is immaterial.
5
There are exceptions to this rule: in specific circumstances, fewer than two publications are
expected for some authors. To our ends, this detail is immaterial.

29
F.Lupia et al. Computing the Shapley value in allocation problems

also used for evaluating individual researchers that had been recently hired by R (this
also greatly influenced R’s funds in subsequent years), as well as those researchers that
were members of PhD committees. Scores for recently hired researchers were com-
puted as the sum of the scores of the products formally assigned to them; data in this
respect were published by ANVUR in aggregated form only, for each department and
for each scientific disciplinary sector. Evaluations for researchers that were members of
PhD committees were computed as the sum of the scores of the best publications each
one of them had coauthored, among all the publications submitted for the VQR (for this
evaluation, the formal assignment of publications to authors was irrelevant); data in this
respect were published by ANVUR in aggregated form only, for each PhD committee.
The way ANVUR currently uses product scores, for the purposes described above,
yields evaluations that do not satisfy the desirable properties outlined in Section 4. In
order to deal with this issue, we have modeled the problem as an allocation game [2],
with a fair way to divide the total score of the university among researchers, groups, de-
partments based on the Shapley value. The proposed rule enjoys many desirable prop-
erties, such as the independence of the specific allocation of research products, the in-
dependence of the preliminary (optimal) products selection, the guarantee of the actual
(marginal) contribution, and so on. Unfortunately, due to the high computational com-
plexity of the approach we propose, it proved unfeasible to compute the desired alloca-
tion for our largest test case (namely, the researchers of Sapienza University of Rome
who participated in the research assessment exercise VQR2011-2014), even using the
approximation algorithms known in the literature. We are therefore proposing improve-
ments and modifications to the approximate algorithms, in an attempt to solve this prob-
lem. Our approach has yielded some very encouraging results (even though, as reported
in Section 7.2, we are still not able to compute, or even approximate within acceptable
bounds, an allocation for a non–negligible fraction of the agents in the Sapienza test
case), outlining a promising line of research.

2 Preliminaries

In the setting considered in this paper, a game is defined by an allocation scenario
A = hN, G, Ω, val, ki comprising a set of agents N and a set of goods G, whose
values are given by the real function val. The function Ω associates each agent with
the set of goods (s)he is interested in. Moreover, the natural number k provides the
maximum number of goods that can be assigned to each agent. Goods are indivisible
and unsharable.
For a coalition C ⊆ N , a (feasible) allocation πA [C] is a mapping from C to sets
of goods from G such that: each agent i ∈ C gets a set of goods πA (i) ⊆ Ω(i) with
|πA (i)| ≤ k, and πA (i) ∩ πA (j) = ∅, for any other agent j ∈ C (each good can be
assigned to one agent at most).
We denote by S img(πA [C]) the set of all goods in the image of πA [C], that is,
img(πA [C]) = i∈C πA [C](i). With a slight abuse of notation, we denote by val(S)
the sum of all the values of a set of goods S ⊆ G, and by val(πA [C]) the value
val(img(πA [C])). An allocation πA [C] is optimal if there exists no allocation πA
0
[C]
with val(πA [C]) > val(πA [C]). The total value of such an optimal allocation for
0

30
F.Lupia et al. Computing the Shapley value in allocation problems

the coalition C is denoted by optA (C). The budget available for A, also called the
(maximum) social welfare is optA (N ), that is, the value of any optimal allocation for
the whole set of agents N (the grand-coalition). The coalitional game defined by the
scenario A is the pair hN, optA i, that is, the game where the worth of any coalition
is given by the value of any of its optimal allocations. Note that optA (C) ≥ 0 holds,
for each C ⊆ N , since the allocation where no agent receives any goods is a feasible
one (the value of an empty set of goods is 0). The definition trivializes for C = ∅, with
optA (∅) = 0.

3 2 1 1 3 2 1 1

g1 g2 g3 g4 a1 a2 a1
3 2 1 1 3 2 1 1 3 2 1 1

a1 a3 a2
3 2 1 1 3 2 1 1
a1 a2 a3
a2 a3 a3

Fig. 1: Allocation scenario A0 in Example 1.

Example 1. Consider the allocation scenario A0 , depicted in a graphical way in Fig-
ure 1. We have a set {g1 , g2 , g3 , g4 } of goods that have to be allocated to three agents.
For simplicity assume that it is possible to allocate just one good to each agent, there-
fore each edge connects an agent to a good she is interested in. Edges in bold identify
an optimal allocation, i.e., a feasible allocation whose sum of values of the allocated
goods is the maximum possible one. The value of this allocation is val(g1 )+val(g2 )+
val(g3 ) = 3 + 2 + 1 = 6.
For each C ⊂ {1, 2, 3} with C 6= ∅, an optimal allocation restricted to the agents
in C is also reported. Then, the associated coalitional game is GA0 = h{1, 2, 3}, vA0 i,
where vA0 ({1, 2, 3}) = 6, vA0 ({1, 2}) = 5, vA0 ({1, 3}) = vA0 ({2, 3}) = 4, vA0 ({1}) =
vA0 ({2}) = 3, and vA0 ({3}) = 1.
For any allocation scenario A = hN, G, Ω, val, ki, we define the agents graph as
the undirected graph G(A) = (N, E) such that {i, j} ∈ E if there is a good g ∈
Ω(i) ∩ Ω(j).
Notice that, in this specific case, the agents graph contains an edge between two
agents (authors) if they are coauthors of at least one publication (we could call it the
collaboration graph).

3 The VQR Allocation Game
Note that the VQR research assessment exercise can be naturally modeled as an alloca-
tion scenario A = hR, P, products, val, 2i where R is the set of researchers affiliated
with a certain university R, P is the set of publications selected by R for the assess-
ment exercise, products maps authors to the set of publications they have written, and

31
F.Lupia et al. Computing the Shapley value in allocation problems

val assigns a value to each publication. In the current VQR programme (covering years
2011-2014), the range of val is {0, 0.1, 0.4, 0.7, 1}, with the latter value reserved to the
excellent products.
In the submission phase, the values are estimated by the universities according to
criteria published by ANVUR. The final result may sometimes be different, in particular
for those publications that undergo a peer-review process by experts selected by AN-
VUR, which clearly introduces a subjective factor in the evaluation. This is however
immaterial for the purpose of this paper, so that we assume that the values in the pre-
liminary phase do coincide with the final ANVUR evaluation for those products. At the
end of the program, R will receive an amount of funds proportional to VR = val(P),
that is, to the considered measure of the quality of the research produced by the univer-
sity R. The first combinatorial problem, which is easily seen to be a weighted matching
problem, is to identify the best allocation scenario for the university. That is, to select a
set of publications P to be submitted, having the maximum possible total value among
all those authored by R in the considered period.

p1 p2 p3 p4 p5 p6 p7 p8

7 10 6 7 6 8 7 7

r1 r2 r3

Fig. 2: Authors and products in Example 2.

Example 2. Let us consider the weighted bipartite graph in Figure 2, whose vertices
are the researchers R = {r1 , r2 , r3 } of a university R and all the publications they
have written. Edges encode the authorship relation, and weights encode the publica-
tions values. Consider the allocation ψ such that ψ(r1 ) = {p1 , p3 }, ψ(r2 ) = {p2 , p4 },
and ψ(r3 ) = {p6 , p7 }, encoded by the solid lines in the figure. Then, the set of pub-
lications submitted for the evaluation is Pψ = {p1 , p2 , p3 , p4 , p6 , p7 }. Note that p2 is
co-authored by r1 , r2 , and r3 , while p3 is co-authored by r1 and r2 . According to the
values of each product , we get opt(R) = 45.
The problem that we face is how to compute, from the total value obtained by R, a
fair score for individual researchers, or groups, or departments, and so on. As mentioned
above, product scores are currently used for evaluating the hiring policy of universities
and the PhD committees, and it seems that such scores will contribute to evaluate the
quality of teaching, too. Unfortunately, as we will argue, this is currently done in a
way that fails to satisfy the properties that we outline below. Instead, following [2], we
propose to use the Shapley value of the allocation game defined by the scenario selected
by the given structure R as the division rule to distribute the available total value (or
budget) to all the participating agents. For the allocation scenario in Example 2, we get
φr1 = 29 2 , φr2 = 2 , and φr3 = 16. Notice that the Shapley value is not a percentage
29

assignment of publications to authors, but takes into account all possible coalitions of
agents.

32
F.Lupia et al. Computing the Shapley value in allocation problems

We will now recall the main desirable properties enjoyed by this solution. We re-
fer the interested reader to [2] for a more detailed description and discussion of these
properties.
Budget-balance.P The division rule precisely distributes the VQR score of R over
all its members, i.e., r∈R φr = VR .
Fairness. The division rule is indifferent w.r.t. the specific optimal allocation used
to submit the products to ANVUR. In particular, the score of each researcher is inde-
pendent of the particular products assigned to him in the submission phase; moreover,
it is independent of the specific set of products P selected by the university, as long as
the choice is optimal (i.e., with the same maximum value VR ).
Marginality.
P For any group of researchers S ⊆ R, φS ≥ marg(S, R), where
φS = i∈φS φi and marg(S, R) = opt(R) − opt(R \ S). That is, every group is
granted at least its marginal contribution to the performance of the grand-coalition R.
We remark the importance of the fairness property, as the choice of a specific opti-
mal set of products is immaterial for R, but it may lead to quite different scores for indi-
viduals (and for their aggregations). Indeed, this happens for the division rules adopted
by ANVUR for the evaluation of both departments and newly hired researchers (see
Section 1.4), which are not based on the Shapley value. Budget-balance, on the other
hand, is violated by the division rule for evaluating researchers who are members of
PhD committees.

4 Useful properties to deal with large real-world problems

Recall that computing the Shapley value is #P-hard for many classes of games (see,
e.g., [6,5,7,12]), including the allocation games, even if goods may have only two pos-
sible values [4]. In our case study, where agents are researchers and goods are research
products, for large universities we may have to deal with thousands of agents and goods.
The brute-force approach would need to compute, for each agent, 2|N | optimization
problems, which is clearly infeasible for our application.
In this section we describe some properties of the Shapley value, in particular for
allocation problems, which allow us to solve some large real-world instances or at least
to obtain good approximations for them. Let us consider in this section an allocation
game G = hN, vi, whose agents graph is G = (N, E).

Fact 1 (Modularity) Let C1 and C2 be two sets of agents that induce two (disjoint)
connected components of the agents graph, and denote by G1 = hC1 , v1 i (resp., G2 =
hC2 , v2 i) the coalitional game restricted to agents in C1 (resp., C2 ). Then, for each
agent i ∈ N , φi (G) = φi (G1 ) + φi (G2 ).

Proof. It follows by the additivity property of the Shapley value: indeed, for each coali-
tion C ⊆ N , we have v(C) = v1 (C ∩ C1 ) + v2 (C ∩ C2 ).

Modularity is an important properties when dealing with large instances, because it
states that we can solve in a separate way the problems associated with each component.

33
F.Lupia et al. Computing the Shapley value in allocation problems

The next properties that we point out below try to leverage such property by removing
agents interactions having no actual impact on the game output.
Recall that we assumed that all goods are non-negative. Then the following property,
which trivially holds, says that we can always assume that any good with value 0 is not
shared by agents.

Fact 2 (No shared null goods) Let B the set of goods having value 0. By removing
all such goods from G we get an equivalent allocation game. Also, if it is useful in
algorithms, we can add any number of goods having no value to each agent, with all
such goods being of interest to one agent only.

Fact 3 (Disconnected agent) Consider an agent i and a coalition C that is discon-
nected from i, that is, such that Neigh(i) ∩ C = ∅. Then, opt({i} ∪ C) = opt({i}) +
opt(C).

Theorem 4 (Separability). Let C be any coalition such that opt(C) + opt(N \ C) ≤
opt(N ). Then, we can define two allocation scenarios over agents C and N \ C that
can be solved separately, that is, such that, for each player i ∈ N , its Shapley value on
the restricted allocation scenario where i occurs is the same as in the original game.

Proof. (Sketch.) Denote N \ C by C̄, and consider the games G1 = hC, v1 i and G2 =
hC̄, v2 i restricted to agents in C and C̄, respectively. Since the allocation games are
supermodular [4], we have opt(C) + opt(C̄) ≥ opt(N ), which entails that, for the
considered coalition, opt(C) + opt(C̄) is actually equal to opt(N ). This means that
the values of the goods not used in any optimal allocation for C̄ is equal to the sum of
the values of the best goods for the agents in C.
We claim that in this case, for each optimal allocation π for N , a set of goods
S ⊆ Ω(C) with val(S) = opt(C) is assigned to players in C, so that both C and C̄
get their best goods according to π. Assume that C gets a value v = val(S) ≤ opt(C),
while C̄ gets a value v̄ ≤ opt(C̄). We know that opt(C) + opt(N \ C) = opt(N )
and, by the optimality of π, v + v̄ = opt(N ), from which the claim follows.
Consider now any coalition C 0 ⊆ N , and let Ca = C 0 ∩ C and Cb = C 0 ∩ C̄.
Let π 0 be an optimal allocation for C 0 . We claim that there is an optimal allocation πa
mapping goods from S to C with valπa (Ca ) = valπ0 (Ca ), and an optimal allocation
πb mapping goods not in S to C̄ with valπb (Cb ) = valπ0 (Cb ). Assume this is not
the case. Then at least one of those allocations lead to values smaller than those in π 0
(note that π 0 cannot be worse, because the union of the two restricted allocations is
a valid candidate mapping for C 0 ). Assume Ca gets smaller values (the other case is
symmetrical), that is, valπa (Ca ) < valπ0 (Ca ). Then, there exists some agent i and a
good p ∈ / S so that p ∈ π 0 (i). By using Theorem 4.4 in [4], we can show that this would
contradict the fact that val(S) = opt(C). Intuitively, goods such as p that are shared
with agents outside C and that allows us to get a better value for the agents in Ca ⊆ C,
could be used to improve the choice of the available goods S for the full set C.
The statement then follows easily by the additivity of the Shapley value, by con-
sidering two allocation scenarios restricted to C with the goods in S and to C̄ with the
goods outside S, respectively.

34
F.Lupia et al. Computing the Shapley value in allocation problems

As a special case, when C is a singleton {i}, we can immediately drop from the
game such an agent i for which the value of her best goods is equal to her marginal
contribution to the grand-coalition. Note that we also immediately get φ(i) = opt({i}).

5 Approximating the Shapley Value

In order to approximate the Shapley value, we use the Fully Polynomial-time Random-
ized Approximation Scheme (FPRAS) proposed in [8]: for any > 0 and δ > 0,
it is possible to compute in polynomial-time an −approximation of the Shapley value
with probability of failure at most δ. The technique works for supermodular and mono-
tone coalitional games, and it can be shown that our allocation games indeed meet these
properties [4].
The method is based on generating a certain number of permutations (of all agents)
and computing the marginal contribution of each agent to the coalition of agents occur-
ring before her (him) in the considered permutation. Then the Shapley value of each
player is computed as the average of all such marginal contributions. The above proce-
dure is repeated O(log(1/δ)) times, in indepedent runs, with the result for each agent
consisting of the median of all computed values for her (him). Finally, the obtained val-
ues are scaled (i.e., they are all multiplied by a common numerical factor) to ensure that
the budget-balance property is not violated.
Clearly enough, the more permutations are considered, the closer to the Shapley
value the result will be. We next report a slightly modified version of the basic procedure
of this algorithm, where we avoid the computation of some marginal contributions, if
we can obtain the result by using Fact 3.
As a preliminary, we compute the required number of permutations m to meet the
required error guarantee. In each of the m iterations, the algorithm generates a random
permutation from the set of agents N . We then iterate through this permutation and
compute the marginal contribution of each agent j to the set of agents C occurring
before j in the permutation at hand. If some neighbour of j in the agents graph occurs
in C, the algorithm proceeds as usual by resolving the optimisation problem to compute
vA (C ∪ {j}) (i.e., to get the value opt(C ∪ {j}) of an optimal allocation for C ∪
{j}). Note that to get the desired marginal contribution we need to solve one allocation
problem only, because the value opt(C) for all the preceding agents is known from the
previous step. Moreover, by Fact 3, we know that for those permutations in which all
the players in Neigh(j) follow j, the marginal contribution of j is just opt({j}) (see
step 10). Finally, at steps 16–18 for each agent the algorithm divides the sum of her
contributions by the number of performed iterations m. The correctness of the whole
algorithm follows from Theorem 4 in [8].
Computation Time Analysis. Let n = |N | be the number of agents, and let m be the
required number of iterations. The cost of the algorithm is O(m × n × margBlock),
where margBlock denotes the cost of computing each marginal contribution (steps
7–11). This requires the computation of an optimal weighted matching in a bipartite
graph, which is feasible in O(n3 ), via the classical Hungarian algorithm. However, if
the current agent is disconnected from the rest of the coalition, the cost is simply given
by the lookup in the cache where the best allocation for each single agent is stored.

35
F.Lupia et al. Computing the Shapley value in allocation problems

Algorithm 1 Shapley value approximation in allocation games
Input: An allocation game GA = hN, vA i;
Parameters: Real numbers 1 > > 0 and 1 > δ > 0;
Output: A vector φ̃ that is an -approximation of the Shapley value of GA , with probability 1−δ;
1: m = |N |·(|N
δ·2
|−1)
;
2: i = 0;
3: while i < m do
4: shuffle(N );
5: C := {∅};
6: for all j ∈ N do
7: if Neigh(j) ∩ C 6= ∅ then
8: φ̃j += vA (C ∪ {j}) − vA (C);
9: else
10: φ̃j += vA ({j});
11: end if
12: C := C ∪ {j};
13: i = i + 1;
14: end for
15: end while
16: for all j ∈ N do
φ̃
17: φ̃j = mj ;
18: end for
19: return φ̃;

6 Lower and Upper Bounds for the Shapley value

In this section we describe the computation of exact (not probabilistic) bounds for the
Shapley value of any given allocation game GA = hN, vA i. Preliminarily recall that
two very simple bounds are easily obtained by looking at the definition of the Shapley
value: for each player i, marg({i}, N ) ≤ φi ≤ opt({i}).
To obtain tighter bounds we observe that, intuitively, the neighbours of i in a coali-
tion C are the agents having the higher influence on the marginal contribution of i to C.
Indeed, they are precisely those agents interested in using the goods of i when (s)he does
not belong to the coalition. We already observed that in the extreme case that no neigh-
bours are present then i contributes with all her (his) best goods. Therefore, the first
idea is to consider the power-set of Neigh(i) to see what happens for every combina-
tion of these agents. The second observation is that, for each pair of coalitions C1 ⊆ C2 ,
marg({i}, C2 ) ≤ marg({i}, C1 ). Putting it all together, we proceed as follows. Let P 0
be a set of neighbors of i, let Z = Neigh(i) \ P 0 , let C = N \ (Neigh(i) ∪ {i}) and
let l = |C|. We compute the marginal contribution of i to C ∪ P 0 , but use this same
value for the marginal contributions of i to each coalition of the form C 0 ∪ X, for
any C 0 ⊆ C. This is performed by multiplying such contribution by a factor y com-
Pl 0
puted as k=0 (l−k+|P|N|)!·(|Z|+k)!
|! · kl . Thus, for each element P 0 of the power set of
the neighbours, we consider the worst situation for i where all other agents in C are
present.

36
F.Lupia et al. Computing the Shapley value in allocation problems

The case of the upper bound is obtained in the dual way, by using instead the most
favorable case where only the neighbours in P 0 belong to the considered
coalitions
(i.e.,
C = ∅). We also use the same factor y, by using the property kl = l−k l
.
Then, we can show the next algorithm compute, for each agent i, a lower bound
LBi and an upper bound U Bi for her/his Shapley value.

Algorithm 2 Computing Bounds for the Shapley Value in Allocation Games
Input: An allocation game GA = hN, vA i;
Output: A pair of vectors (LB, U B) encoding, respectively, a lower bound and an upper bound
of the Shapley value of GA ;
1: for all i ∈ N do
2: P := P owerset(Neigh(i));
3: C := N \ (Neigh(i) ∪ {i});
4: l = |C|;
5: for all P 0 ∈ P do
6: Z := Neigh(i) \ P 0 ;
P 0
7: y = lk=0 (l−k+|P|N|)!·(|Z|+k)!
|!
· kl ;
8: LBi += y · (vA (C ∪ P 0 ∪ {i}) − vA (C ∪ P 0 ));
9: U Bi += y · (vA (P 0 ∪ {i}) − vA (P 0 ));
10: end for
11: end for
12: return (LB, U B);

Computation Time Analysis. The for loops in steps 5–10 show that both algorithms are
exponential in the maximum degree of the agents graphs, that is, maxi∈N |Neigh(i)|.
In practice, after the simplifications that we perform on the instance, these sets are not
large and the bounds can be computed in a reasonable time (see the next section).

7 Implementation Details and Experimental Evaluation

7.1 Parallel Implementation of the Approximate Algorithm

We found the approximate Shapley value algorithm of Niben-Nowell et al. [8] (we
briefly discussed their approach in Section 5) to be amenable to parallel implemta-
tion. We engineered our parallel implementation as follows. Besides the input alloca-
tion game, and the two parameters δ and , we added a third parameter, the thread pool
size. During the execution of the algorithm, each thread (there are as many threads as the
thread pool size dictates) is responsible for generating a certain number of permutations
according to the requested approximation factor and, for each permutation, it computes
the marginal contributions of all authors to that permutation, and saves them in a local
cache. When a thread has generated its assigned number of permutations, it delivers its
local cache of computed scores to a synchronized output acceptor (which increments
the overall score of each author accordingly), and then shuts itself down, as its work is

37
F.Lupia et al. Computing the Shapley value in allocation problems

completed. When all threads have shut down, each entry of the acceptor’s output vec-
tor is averaged over the total number of permutations, yielding the final approximate
Shapley vector for that run. The above procedure is repeated for each independent run.
When all runs are done, the component-wise median of all final approximate Shapley
vectors is computed, and the resulting vector is scaled (i.e., all entries are multiplied by
a number such that the Shapley values budget balance property is enforced), yielding
the desired approximation with the desired probability.

7.2 Experimental Results

Hardware and software configuration. Experiments have been performed on two ded-
icated machines. In particular, sequential algorithms were run on a machine with an
Intel Core i7-3770k 3.5 GHz processor, 12 GB (DDR3 1600 MHz) of RAM, and oper-
ating system Linux Debian Jessie. We tested the parallel implementation on a machine
equipped with two Intel Xeon E5-4610 v2 @ 2.30GHz with 8 cores and 16 logical
processors each, for a total of 32 logical processors, 128 GB of RAM, and operating
system Linux Debian Wheezy. Algorithms were implemented in Java, and the code
was executed on the JDK 1.8.0 05-b13, for the Intel Core i7 machine, and on the Open-
JDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-1 deb7u1), for the Intel Xeon
machine.
Dataset description. We applied the algorithms to the computation of a fair division
of the scores for the researchers of Sapienza University of Rome who partecipated in
the research assessment exercise VQR2011-2014. Sapienza contributors to the exer-
cise were 3562 and almost all were required to submit 2 publications for review. We
computed the scores of each publication by applying, when available, the bibliographic
assessment tables provided by ANVUR.
Preprocessing. The analysis was carried out by preliminarily simplifying the input using
the properties discussed in Section 4, as explained next.
Starting with a setting with 3562 researchers and 5909 publications, first we re-
moved each researcher having no publications for review. After this step a total of
370 authors were removed. Then, by exploiting the simplification described in Fact 2,
we removed 2323 publications. By using Theorem 4, the graph was subsequently fil-
tered removing each author whose marginal contribution to the grand coalition co-
incides with the optimal allocation restricted to the author himself. After this step
2427 researchers out of 3562 were removed. Then we divided the resulting agents
graph into connected components obtaining a total number of 156 connected com-
ponents and we discovered only two components consisting of more than 10 agents.
The sizes of these components are 691 and 15. Eventually, the components were fur-
ther decomposed as follows: For any agent i, a good g is removed from its set Ω(i) if
val(g)+maxg0 ∈Ω(i)\{g} val(g 0 ) < v(N )−v(N \{i}). After the whole preprocessing
phase, we obtained a total of 159 connected components with the largest one having 685
nodes. The size of the second largest is just 15 while all the others remain very small
(less than 10 nodes). In the rest of the section, we shall illustrate results of experimental
activity conducted over the various methods. To this end, we fixed the value δ = 0.01.

38
F.Lupia et al. Computing the Shapley value in allocation problems

210
200
190
180
170
160
150
140
130
120
110
Values

100
90
80
70
60
50
40
LB
30
randomised
20 exact
10 UB
0
a10 a4 a3 a1 a11 a13 a14 a6 a7 a2 a8 a9 a12 a0 a5
Agents

Fig. 3: Methods comparison (n = 15)

The value was chosen heuristically based on a series of tests conducted on various CNR
Areas of Sapienza, where CNR Areas are (large) scientific disciplines such as Math and
Computer Science (Area 01) or Physics (Area 02).
Test with components of variable size. As we have already pointed out, after the prepro-
cessing step we obtained very small connected components (less than 10 nodes) except
for the first two (685 and 15 nodes, respectively). For all components with less than 10
nodes, the exact algorithm (of which we used the sequential implementation) tended to
perform very well (a few milliseconds), therefore we omit the analysis here. In order to
test all the other algorithms, besides the two largest components, we randomly extracted
a sample of (distinct) nodes out of the original graph, to produce different subgraphs
with size |N | ∈ {23, 26, 30, 40}.
For the the considered cases we do not find significant enough differences between
the randomized method and the exact one, see, e.g., figures 3 and 4. Notably, apart from
a small number of cases, our bounds (especially the lower bounds) are always very
close to the exact value. In particular, for n = 26 we were able to get immediately the
Shapley value for all agents, since upper and lower bounds coincide for all of them.
Moreover, it emerges that the randomized method is much better than the theoretical
guarantee on the maximum approximation error. In the considered instances, the values
obtained by using the randomized method are within the correct bounds.
We also evaluated how many computations of optimal allocations have been avoided
by considering whether or not an agent has neighbours before her in the permutation at

39
F.Lupia et al. Computing the Shapley value in allocation problems

210
200
190
180
170
160
150
140
130
120
110
Values

100
90
80
70
60
50
40
30
LB
20 randomised
10 UB
0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Agents

Fig. 4: Methods comparison (n = 40)

hands (and hence executing in the latter case Step 10 rather than Step 8 in Algorithm 1).
By fixing the approximation error at = 0.3, for each n ∈ {15, 23, 26, 30, 40} we get
the following savings: 9.65 · 105 out of 3.5 · 106 (i.e., 28%), 2.34 · 106 out of 1.29 · 107
(18%), 5.36·106 out of 1.87·107 (29%), 8.78·106 out of 2.9·107 (30%), and 1.46·107 out
of 6.93 · 107 (21%), respectively. Finally, we also report the real maximum and average
approximation errors (denoted with X and Y, respectively) of our implementation w.r.t.
the exact algorithm for each n ∈ {15, 23, 26}. In particular, for n = 15 we get X = 0.01
and Y = 3·10−3 , for n = 23 we get X = 1.5·10−3 and Y = 1.7·10−4 and, for n = 26 we
get X = 1.06 · 10−4 and Y = 1.59 · 10−5 . In all the cases, the maximum approximation
error was about 1% (or less) and therefore considerably below the theoretical guarantee
(30%).
We stress that we were not able to fully analyse the component with 685 nodes, as
it would simply have taken too long. However, we hope to be able to do that in a future
work, possibly developing more powerful preprocessing techniques. For the time being,
we report that the computation of our upper and lower bounds, when applied to this
component, allowed us to approximate the Shapley value of 581 agents within a 30%
error bound, of which 362 agents are within a 10% error bound, and 204 within 5%.
Moreover, we were able to compute the exact Shapley value for 53 agents.

Running Times. Figures 5 and 6 report the computation time of the various algorithms
In particular, Figure 5 focuses on the brute-force algorithm for computing the exact

40
F.Lupia et al. Computing the Shapley value in allocation problems

2.0 12
exact
LB
UB 10
1.5
8
Time (h)

Time (s)
1.0 6

4
0.5
2

0.0 0
15 23 26 30 40 52
Instance size

Fig. 5: Running times for the computation of the exact value by using the brute-force
algorithm (green) and of the upper and lower bounds (blue) vs instances size

values, and on the algorithms for the computation of the upper and lower bounds. For
the experiments, we computed separately the two bounds in order to point out that the
computation of the lower bound requires in general more time, because it considers
allocation over larger coalitions than those considered for the computation of the upper
bound. Moreover, as discussed in Section 6, the running times for computing the bounds
heavily depends on the cardinality of agents’ neighbours. This explains why the running
times for the case n = 52 are smaller than those for the case n = 40.
Figure 6 shows the running time of the randomized parallel algorithm using 24
threads, for different values of . In particular, for δ = 0.01 we performed five trials
over the different (sub)games described above, and averaged measures are reported. We
can see that for games of reasonable size we can achieve high theoretical approximation
error guarantee. For instance, for the biggest considered game (n = 52) we were able
to compute the approximate Shapley value with = 0.1 (and δ = 0.01) in less than
90 minutes. There is a big gap between the performances of the randomized algorithm
when using the extreme values for the approximation error that we considered. How-
ever, as already pointed out, even when we used a poor theoretical guarantee on the
approximation error, we still got a quite reasonable accuracy.
Notably, for the case n = 15 the exact algorithm is faster than the sequential ran-
domized algorithm, while it is definitely slower than the lower and upper bound algo-
rithms.

41
F.Lupia et al. Computing the Shapley value in allocation problems

89.12 min
n=52
n=40
80
n=30
n=26
n=23
n=15

60
Time (min)

40
37.82 min

1.08 min
1.64 min 8.53 min
0.41 min
0.25 min 4.60 min
0.21 min 3.74 min
0 0.06 min 0.41 min
90 80 70 60 50 40 30 20 10 5
Maximum allowable MC error (%)

Fig. 6: Parallel implementation: running times vs

8 Future work

We are working on further theoretical properties of the Shapley value in allocation
problems. The goal is to get some more tools for simplifying the VQR real-world large
instances.
Moreover, we would like to extend the structure-based technique described in [1]
to the more general class of games where more than one good can be allocated to each
agent (as it is the case in VQR allocations). This way, we could compute efficiently the
exact Shapley value for large games, provided that the treewidth of the agents graph
is small. With this respect, we note that, by applying the simplifications described in
this paper, all the connected components of the big VQR instance relative to Sapienza
University of Rome have a small treewidth (at most 8) except for the largest one (com-
prising 685 agents), which has treewidth at most 64. For instance, the component having
52 agents has treewidth 5.
Finally, we would like to obtain tighter lower and upper bounds, with a computa-
tional effort that can be tuned to meet given time constraints.

42
F.Lupia et al. Computing the Shapley value in allocation problems

References
1. G. Greco, F. Lupia, and F. Scarcello. Structural tractability of shapley and banzhaf values in
allocation games. In Proc. of IJCAI’15, pages 547–553, 2015.
2. G. Greco and F. Scarcello. Fair division rules for funds distribution: The case of the italian
research assessment program (vqr 2004-2010). Intelligenza Artificiale, 7(1):45–56, 2013.
3. G. Greco and F. Scarcello. Counting solutions to conjunctive queries: structural and hybrid
tractability. In Proc. of PODS’14, pages 132–143, 2014.
4. G. Greco and F. Scarcello. Mechanisms for fair allocation problems: No-punishment pay-
ment rules in verifiable settings. J. Artif. Intell. Res. (JAIR), 49:403–449, 2014.
5. H. Aziz and B. de Keijzer. Shapley meets shapley. In Proc. of STACS’14, pp. 99–111.
6. Y. Bachrach and Y.S. Rosenschein. Power in threshold network flow games, pp. 106–132,
2009.
7. X. Deng and C.H. Papadimitriou. On the complexity of cooperative solution concepts. Math-
ematics of Operations Research, 19:257–266, May 1994.
8. D. Liben-Nowell, A. Sharp, T. Wexler, and K. Woods. Computing Shapley value in super-
modular coalitional games. In Proc. of COCOON’12, pages 568–579. 2012.
9. F. Maniquet. A characterization of the Shapley value in queueing problems. Journal of
Economic Theory, 109(1):90–103, 2003.
10. D. Mishra and B. Rangarajan. Cost sharing in a job scheduling problem. Social Choice and
Welfare, 29(3):369–382, October 2007.
11. H. Moulin. An application of the Shapley value to fair division with money. Econometrica,
60(6):1331–49, November 1992.
12. H. Nagamochi and Dao-Zhi Zeng and Naohisa Kabutoya and Toshihide Ibaraki. Complexity
of the Minimum Base Game on Matroids Math. Oper. Res., 22(1):146–164, 1997.
13. A. Iera and L. Militano and L. Romeo and F. Scarcello. Fair Cost Allocation in
Cellular-Bluetooth Cooperation Scenarios EEE Transactions on Wireless Communications.,
10(8):2566–2576, 2011.
14. R. Pichler and S. Skritek. Tractable counting of the answers to conjunctive queries. Journal
of Computer and System Sciences, 79(6):984–1001, 2013.
15. N. Robertson and P. Seymour. Graph minors iii: Planar tree-width. Journal of Combinatorial
Theory, Series B, 36(1):49–64, February 1984.
16. L. S. Shapley. A value for n-person games. Contributions to the theory of games, 2, 1953.