=Paper= {{Paper |id=Vol-1745/paper3 |storemode=property |title=Computing the Shapley Value in Allocation Problems: Approximations and Bounds, with an Application to the Italian VQR Research Assessment Program |pdfUrl=https://ceur-ws.org/Vol-1745/paper3.pdf |volume=Vol-1745 |authors=Francesco Lupia,Angelo Mendicelli,Andrea Ribichini,Francesco Scarcello,Marco Schaerf |dblpUrl=https://dblp.org/rec/conf/aiia/LupiaMRSS16 }} ==Computing the Shapley Value in Allocation Problems: Approximations and Bounds, with an Application to the Italian VQR Research Assessment Program== https://ceur-ws.org/Vol-1745/paper3.pdf
 Computing the Shapley Value in Allocation Problems:
Approximations and Bounds, with an Application to the
     Italian VQR Research Assessment Program

Francesco Lupia1 , Angelo Mendicelli1 , Andrea Ribichini2 , Francesco Scarcello1 , and
                                Marco Schaerf3
 1
      Dept. of Computer Science, Modeling, Electronics and Systems Engineering, University of
                                  Calabria, 87036, Rende, Italy
                {lupia,a.mendicelli,scarcello}@dimes.unical.it
                   2
                     Dept. of Physics, Sapienza University, 00185, Rome, Italy
                              ribichini@dis.uniroma1.it
      3
        Dept. of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza
                                 University, 00185, Rome, Italy
                              marco.schaerf@uniroma1.it

         Abstract. In allocation problems, a given set of goods are assigned to agents in
         such a way that the social welfare is maximized, that is, the largest possible global
         worth is achieved. When goods are indivisible, it is possible to use money com-
         pensation to perform a fair allocation taking into account the actual contribution
         of all agents to the social welfare. Coalitional games provide a formal mathe-
         matical framework to model such problems, in particular the Shapley value is a
         solution concept widely used for assigning worths to agents in a fair way. Unfor-
         tunately, computing this value is a #P-hard problem, so that applying this good
         theoretical notion is often quite difficult in real-world problems.
         In this paper, we first review the application of the Shapley value to an alloca-
         tion problem that models the evaluation of the Italian research structures with a
         procedure known as VQR. For large universities, the problem involves thousands
         of agents and goods (here, researchers and their research products). We then de-
         scribe some useful properties that allow us to greatly simplify many such large
         instances. Moreover, we propose new algorithms for computing lower bounds
         and upper bounds of the Shapley value, which in some cases provide the exact
         result and that can be combined with approximation algorithms. The proposed
         techniques have been tested on large real-world instances of the VQR research
         evaluation problem.


1      Introduction
1.1     Coalitional Game Theory
Coalitional games provide a rich mathematical framework to analyze interactions be-
tween intelligent agents. We consider coalitional games of the form G = hN, vi, con-
sisting of a set N of n agents and a characteristic function v. The latter maps each
coalition C ⊆ N to the worth that agents in C can obtain by collaborating with each
other. In this context, the crucial problem is to find a mechanism to allocate the worth
v(N ), i.e., the value of the grand-coalition N , in a way that is fair for all players and that


                                                27
F.Lupia et al. Computing the Shapley value in allocation problems

  additionally satisfies some further important properties such as efficiency: we distribute
  precisely the available budget v(N ) to players (not more and not less). Moreover, for
  stability reasons, it is usually required that every group of agents C gets at least the
  worth v(C) that it can guarantee to the game.
      Several solution concepts have been considered in the literature as “fair allocation”
  schemes and, among them, a prominent one is the Shapley value [16]. According to
  this notion, the worth of any agent i is determined by considering its actual contribution
  to all the possible coalitions of agents. More precisely, it is considered the so-called
  marginal contribution to any coalition C, that is, the difference between what can be
  obtained when i collaborates with the agents in C and what can be obtained without the
  contribution of i. More formally, the Shapley value of a player i ∈ N is defined by the
  following weighted average of all such marginal contributions:

                            X        |C|!(n − |C| − 1)!                   
               φi (G) =                                   v(C ∪ {i}) − v(C) .
                                             n!
                          C⊆N \{i}




  1.2   Allocation Games

  Among the various classes of coalitional games, we focus in this paper on allocation
  games, which is a setting for analyzing fair division problems where monetary compen-
  sations are allowed and utilities are quasi-linear [11]. Allocation games naturally arise
  in various application domains, ranging from house allocation to room assignment-
  rent division, to (cooperative) scheduling and task allocation, to protocols for wireless
  communication networks, and to queuing problems (see, e.g., [4,9,10,11,13] and the
  references therein).
      Computing the Shapley value of such games is a difficult problem, indeed it is #P-
  hard even if goods can only have two different possible values [1]. In this paper we
  focus on large instances of this problem, involving thousands of agents and goods,
  for which no algorithm described in the literature is able to provide an exact solution.
  There are however some promising recent advances that identify islands of tractability
  for the allocation problems where at most one good is allocated to each agent: it has
  been recently shown that those instances where the treewidth of the agents’ interaction-
  graph is bounded by some constant (i.e., have a low degree of cyclicity) can be solved
  in polynomial-time [1]. The result is based on recent advances on counting solutions of
  conjunctive queries with existential variables [3]. Unfortunately, if the structure is quite
  cyclic this technique cannot be applied to large instances, because its computational
  complexity has an exponential dependency on the treewidth.
      In some applications, one can be satisfied with some approximations of the Shapley
  value. With this respect, things are quite good in principle, since we know there exists
  a fully polynomial-time randomised approximation scheme to compute the Shapley
  value in supermodular games [8]. The algorithm can thus be tuned to obtain the desired
  maximum expected error. However, not very surprisingly, for very large instances one
  has to consider a huge number of samples, in order to stay below a reasonable expected
  error.


                                              28
F.Lupia et al. Computing the Shapley value in allocation problems

  1.3   Contribution

  In order to attack large instances of allocation problems, we start by proving some
  useful properties of these problems that allow us to decompose instances into smaller
  pieces, which can be solved independently. Moreover, some of these properties identify
  cases where the computation of the worth function can be obtained in a very efficient
  way.
      With these properties, we are able to use the randomised approximation algorithm
  even on instances that (when not decomposed) are very large.
      Furthermore, we note that in some applications one may prefer to determine a guar-
  anteed interval for the Shapley value, rather than one probably good point. Therefore,
  we propose algorithms for computing a lower bound and an upper bound of the Shap-
  ley value for allocation problems. In most cases the distance between the two bounds
  is quite small, and often they coincide, which means that we actually computed the ex-
  act value. We also used these algorithms together with the approximation algorithm, to
  provide a more accurate evaluation of the maximum error of the randomised solution,
  for the considered instances.

  1.4   The Case Study

  The proposed techniques have been tested on large real-world instances of the VQR2011-
  2014 Italian research assessment exercise: Every Italian research structure R has to se-
  lect some research products, and submit them to an evaluation agency called ANVUR.
  While doing so, the structure R is in competition with all other Italian research struc-
  tures, as the outcome of the evaluation will be used to proportionally transfer the funds
  allocated by the Ministry to support research activities in the next years (until the sub-
  sequent evaluation process). Every structure R is therefore interested in selecting and
  submitting its best research products. For the sake of simplicity, we next simply speak
  of publications instead of research products (which can also be patents, books, etc.),
  and of universities and departments instead of structures and substructures (which can
  be other research subjects). The programme is articulated in two phases: (1) Based on
  the self-evaluations4 of the authors, R selects and submits to ANVUR (at most) two
  publications for each one of its authors5 , in such a way that any product is formally
  associated with at most one author. (2) ANVUR formulates its independent quality
  judgment about the submitted publications (the score assigned to each publication is
  made known only to its authors), and the sum of their “true” scores (i.e., those resulting
  by ANVUR actual evaluation) is then the VQR score of R. Eventually, R will receive
  funds in subsequent years proportional to this score. Furthermore, for the VQR2004-
  2010 research assessment exercise (data for VQR2011-2014 have not been released
  yet) ANVUR also published an evaluation of all departments, based on the product
  scores (the score of each department was computed as the sum of the scores of the
  products formally assigned to the authors in that department). Finally, the scores were
   4
     To be more precise, ANVUR provided reference tables to assist in the evaluation of products,
     at least for some research areas. To our ends, this detail is immaterial.
   5
     There are exceptions to this rule: in specific circumstances, fewer than two publications are
     expected for some authors. To our ends, this detail is immaterial.



                                               29
F.Lupia et al. Computing the Shapley value in allocation problems

  also used for evaluating individual researchers that had been recently hired by R (this
  also greatly influenced R’s funds in subsequent years), as well as those researchers that
  were members of PhD committees. Scores for recently hired researchers were com-
  puted as the sum of the scores of the products formally assigned to them; data in this
  respect were published by ANVUR in aggregated form only, for each department and
  for each scientific disciplinary sector. Evaluations for researchers that were members of
  PhD committees were computed as the sum of the scores of the best publications each
  one of them had coauthored, among all the publications submitted for the VQR (for this
  evaluation, the formal assignment of publications to authors was irrelevant); data in this
  respect were published by ANVUR in aggregated form only, for each PhD committee.
      The way ANVUR currently uses product scores, for the purposes described above,
  yields evaluations that do not satisfy the desirable properties outlined in Section 4. In
  order to deal with this issue, we have modeled the problem as an allocation game [2],
  with a fair way to divide the total score of the university among researchers, groups, de-
  partments based on the Shapley value. The proposed rule enjoys many desirable prop-
  erties, such as the independence of the specific allocation of research products, the in-
  dependence of the preliminary (optimal) products selection, the guarantee of the actual
  (marginal) contribution, and so on. Unfortunately, due to the high computational com-
  plexity of the approach we propose, it proved unfeasible to compute the desired alloca-
  tion for our largest test case (namely, the researchers of Sapienza University of Rome
  who participated in the research assessment exercise VQR2011-2014), even using the
  approximation algorithms known in the literature. We are therefore proposing improve-
  ments and modifications to the approximate algorithms, in an attempt to solve this prob-
  lem. Our approach has yielded some very encouraging results (even though, as reported
  in Section 7.2, we are still not able to compute, or even approximate within acceptable
  bounds, an allocation for a non–negligible fraction of the agents in the Sapienza test
  case), outlining a promising line of research.


  2   Preliminaries

  In the setting considered in this paper, a game is defined by an allocation scenario
  A = hN, G, Ω, val, ki comprising a set of agents N and a set of goods G, whose
  values are given by the real function val. The function Ω associates each agent with
  the set of goods (s)he is interested in. Moreover, the natural number k provides the
  maximum number of goods that can be assigned to each agent. Goods are indivisible
  and unsharable.
      For a coalition C ⊆ N , a (feasible) allocation πA [C] is a mapping from C to sets
  of goods from G such that: each agent i ∈ C gets a set of goods πA (i) ⊆ Ω(i) with
  |πA (i)| ≤ k, and πA (i) ∩ πA (j) = ∅, for any other agent j ∈ C (each good can be
  assigned to one agent at most).
      We denote by S img(πA [C]) the set of all goods in the image of πA [C], that is,
  img(πA [C]) = i∈C πA [C](i). With a slight abuse of notation, we denote by val(S)
  the sum of all the values of a set of goods S ⊆ G, and by val(πA [C]) the value
  val(img(πA [C])). An allocation πA [C] is optimal if there exists no allocation πA
                                                                                   0
                                                                                     [C]
  with val(πA [C]) > val(πA [C]). The total value of such an optimal allocation for
              0




                                            30
F.Lupia et al. Computing the Shapley value in allocation problems

  the coalition C is denoted by optA (C). The budget available for A, also called the
  (maximum) social welfare is optA (N ), that is, the value of any optimal allocation for
  the whole set of agents N (the grand-coalition). The coalitional game defined by the
  scenario A is the pair hN, optA i, that is, the game where the worth of any coalition
  is given by the value of any of its optimal allocations. Note that optA (C) ≥ 0 holds,
  for each C ⊆ N , since the allocation where no agent receives any goods is a feasible
  one (the value of an empty set of goods is 0). The definition trivializes for C = ∅, with
  optA (∅) = 0.


                                                          3        2        1        1   3        2        1        1



                      g1        g2        g3        g4        a1       a2                    a1
                  3             2         1         1     3        2        1        1   3        2        1        1




                                                              a1                a3                    a2
                                                          3        2        1        1   3        2        1        1
                           a1        a2        a3
                                                                       a2       a3                             a3



                            Fig. 1: Allocation scenario A0 in Example 1.


  Example 1. Consider the allocation scenario A0 , depicted in a graphical way in Fig-
  ure 1. We have a set {g1 , g2 , g3 , g4 } of goods that have to be allocated to three agents.
  For simplicity assume that it is possible to allocate just one good to each agent, there-
  fore each edge connects an agent to a good she is interested in. Edges in bold identify
  an optimal allocation, i.e., a feasible allocation whose sum of values of the allocated
  goods is the maximum possible one. The value of this allocation is val(g1 )+val(g2 )+
  val(g3 ) = 3 + 2 + 1 = 6.
      For each C ⊂ {1, 2, 3} with C 6= ∅, an optimal allocation restricted to the agents
  in C is also reported. Then, the associated coalitional game is GA0 = h{1, 2, 3}, vA0 i,
  where vA0 ({1, 2, 3}) = 6, vA0 ({1, 2}) = 5, vA0 ({1, 3}) = vA0 ({2, 3}) = 4, vA0 ({1}) =
  vA0 ({2}) = 3, and vA0 ({3}) = 1.                                                          
      For any allocation scenario A = hN, G, Ω, val, ki, we define the agents graph as
  the undirected graph G(A) = (N, E) such that {i, j} ∈ E if there is a good g ∈
  Ω(i) ∩ Ω(j).
      Notice that, in this specific case, the agents graph contains an edge between two
  agents (authors) if they are coauthors of at least one publication (we could call it the
  collaboration graph).


  3   The VQR Allocation Game
  Note that the VQR research assessment exercise can be naturally modeled as an alloca-
  tion scenario A = hR, P, products, val, 2i where R is the set of researchers affiliated
  with a certain university R, P is the set of publications selected by R for the assess-
  ment exercise, products maps authors to the set of publications they have written, and


                                                         31
F.Lupia et al. Computing the Shapley value in allocation problems

  val assigns a value to each publication. In the current VQR programme (covering years
  2011-2014), the range of val is {0, 0.1, 0.4, 0.7, 1}, with the latter value reserved to the
  excellent products.
       In the submission phase, the values are estimated by the universities according to
  criteria published by ANVUR. The final result may sometimes be different, in particular
  for those publications that undergo a peer-review process by experts selected by AN-
  VUR, which clearly introduces a subjective factor in the evaluation. This is however
  immaterial for the purpose of this paper, so that we assume that the values in the pre-
  liminary phase do coincide with the final ANVUR evaluation for those products. At the
  end of the program, R will receive an amount of funds proportional to VR = val(P),
  that is, to the considered measure of the quality of the research produced by the univer-
  sity R. The first combinatorial problem, which is easily seen to be a weighted matching
  problem, is to identify the best allocation scenario for the university. That is, to select a
  set of publications P to be submitted, having the maximum possible total value among
  all those authored by R in the considered period.

                          p1         p2       p3       p4       p5       p6       p7       p8

                      7        10         6        7        6        8        7        7




                                    r1                 r2                         r3


                           Fig. 2: Authors and products in Example 2.

  Example 2. Let us consider the weighted bipartite graph in Figure 2, whose vertices
  are the researchers R = {r1 , r2 , r3 } of a university R and all the publications they
  have written. Edges encode the authorship relation, and weights encode the publica-
  tions values. Consider the allocation ψ such that ψ(r1 ) = {p1 , p3 }, ψ(r2 ) = {p2 , p4 },
  and ψ(r3 ) = {p6 , p7 }, encoded by the solid lines in the figure. Then, the set of pub-
  lications submitted for the evaluation is Pψ = {p1 , p2 , p3 , p4 , p6 , p7 }. Note that p2 is
  co-authored by r1 , r2 , and r3 , while p3 is co-authored by r1 and r2 . According to the
  values of each product , we get opt(R) = 45.                                                
       The problem that we face is how to compute, from the total value obtained by R, a
  fair score for individual researchers, or groups, or departments, and so on. As mentioned
  above, product scores are currently used for evaluating the hiring policy of universities
  and the PhD committees, and it seems that such scores will contribute to evaluate the
  quality of teaching, too. Unfortunately, as we will argue, this is currently done in a
  way that fails to satisfy the properties that we outline below. Instead, following [2], we
  propose to use the Shapley value of the allocation game defined by the scenario selected
  by the given structure R as the division rule to distribute the available total value (or
  budget) to all the participating agents. For the allocation scenario in Example 2, we get
  φr1 = 29 2 , φr2 = 2 , and φr3 = 16. Notice that the Shapley value is not a percentage
                      29

  assignment of publications to authors, but takes into account all possible coalitions of
  agents.


                                                       32
F.Lupia et al. Computing the Shapley value in allocation problems

      We will now recall the main desirable properties enjoyed by this solution. We re-
  fer the interested reader to [2] for a more detailed description and discussion of these
  properties.
       Budget-balance.P  The division rule precisely distributes the VQR score of R over
  all its members, i.e., r∈R φr = VR .
       Fairness. The division rule is indifferent w.r.t. the specific optimal allocation used
  to submit the products to ANVUR. In particular, the score of each researcher is inde-
  pendent of the particular products assigned to him in the submission phase; moreover,
  it is independent of the specific set of products P selected by the university, as long as
  the choice is optimal (i.e., with the same maximum value VR ).
     Marginality.
        P            For any group of researchers S ⊆ R, φS ≥ marg(S, R), where
  φS = i∈φS φi and marg(S, R) = opt(R) − opt(R \ S). That is, every group is
  granted at least its marginal contribution to the performance of the grand-coalition R.
     We remark the importance of the fairness property, as the choice of a specific opti-
  mal set of products is immaterial for R, but it may lead to quite different scores for indi-
  viduals (and for their aggregations). Indeed, this happens for the division rules adopted
  by ANVUR for the evaluation of both departments and newly hired researchers (see
  Section 1.4), which are not based on the Shapley value. Budget-balance, on the other
  hand, is violated by the division rule for evaluating researchers who are members of
  PhD committees.


  4   Useful properties to deal with large real-world problems

  Recall that computing the Shapley value is #P-hard for many classes of games (see,
  e.g., [6,5,7,12]), including the allocation games, even if goods may have only two pos-
  sible values [4]. In our case study, where agents are researchers and goods are research
  products, for large universities we may have to deal with thousands of agents and goods.
  The brute-force approach would need to compute, for each agent, 2|N | optimization
  problems, which is clearly infeasible for our application.
      In this section we describe some properties of the Shapley value, in particular for
  allocation problems, which allow us to solve some large real-world instances or at least
  to obtain good approximations for them. Let us consider in this section an allocation
  game G = hN, vi, whose agents graph is G = (N, E).

  Fact 1 (Modularity) Let C1 and C2 be two sets of agents that induce two (disjoint)
  connected components of the agents graph, and denote by G1 = hC1 , v1 i (resp., G2 =
  hC2 , v2 i) the coalitional game restricted to agents in C1 (resp., C2 ). Then, for each
  agent i ∈ N , φi (G) = φi (G1 ) + φi (G2 ).

  Proof. It follows by the additivity property of the Shapley value: indeed, for each coali-
  tion C ⊆ N , we have v(C) = v1 (C ∩ C1 ) + v2 (C ∩ C2 ).

      Modularity is an important properties when dealing with large instances, because it
  states that we can solve in a separate way the problems associated with each component.


                                             33
F.Lupia et al. Computing the Shapley value in allocation problems

  The next properties that we point out below try to leverage such property by removing
  agents interactions having no actual impact on the game output.
      Recall that we assumed that all goods are non-negative. Then the following property,
  which trivially holds, says that we can always assume that any good with value 0 is not
  shared by agents.

  Fact 2 (No shared null goods) Let B the set of goods having value 0. By removing
  all such goods from G we get an equivalent allocation game. Also, if it is useful in
  algorithms, we can add any number of goods having no value to each agent, with all
  such goods being of interest to one agent only.

  Fact 3 (Disconnected agent) Consider an agent i and a coalition C that is discon-
  nected from i, that is, such that Neigh(i) ∩ C = ∅. Then, opt({i} ∪ C) = opt({i}) +
  opt(C).

  Theorem 4 (Separability). Let C be any coalition such that opt(C) + opt(N \ C) ≤
  opt(N ). Then, we can define two allocation scenarios over agents C and N \ C that
  can be solved separately, that is, such that, for each player i ∈ N , its Shapley value on
  the restricted allocation scenario where i occurs is the same as in the original game.

  Proof. (Sketch.) Denote N \ C by C̄, and consider the games G1 = hC, v1 i and G2 =
  hC̄, v2 i restricted to agents in C and C̄, respectively. Since the allocation games are
  supermodular [4], we have opt(C) + opt(C̄) ≥ opt(N ), which entails that, for the
  considered coalition, opt(C) + opt(C̄) is actually equal to opt(N ). This means that
  the values of the goods not used in any optimal allocation for C̄ is equal to the sum of
  the values of the best goods for the agents in C.
      We claim that in this case, for each optimal allocation π for N , a set of goods
  S ⊆ Ω(C) with val(S) = opt(C) is assigned to players in C, so that both C and C̄
  get their best goods according to π. Assume that C gets a value v = val(S) ≤ opt(C),
  while C̄ gets a value v̄ ≤ opt(C̄). We know that opt(C) + opt(N \ C) = opt(N )
  and, by the optimality of π, v + v̄ = opt(N ), from which the claim follows.
      Consider now any coalition C 0 ⊆ N , and let Ca = C 0 ∩ C and Cb = C 0 ∩ C̄.
  Let π 0 be an optimal allocation for C 0 . We claim that there is an optimal allocation πa
  mapping goods from S to C with valπa (Ca ) = valπ0 (Ca ), and an optimal allocation
  πb mapping goods not in S to C̄ with valπb (Cb ) = valπ0 (Cb ). Assume this is not
  the case. Then at least one of those allocations lead to values smaller than those in π 0
  (note that π 0 cannot be worse, because the union of the two restricted allocations is
  a valid candidate mapping for C 0 ). Assume Ca gets smaller values (the other case is
  symmetrical), that is, valπa (Ca ) < valπ0 (Ca ). Then, there exists some agent i and a
  good p ∈ / S so that p ∈ π 0 (i). By using Theorem 4.4 in [4], we can show that this would
  contradict the fact that val(S) = opt(C). Intuitively, goods such as p that are shared
  with agents outside C and that allows us to get a better value for the agents in Ca ⊆ C,
  could be used to improve the choice of the available goods S for the full set C.
      The statement then follows easily by the additivity of the Shapley value, by con-
  sidering two allocation scenarios restricted to C with the goods in S and to C̄ with the
  goods outside S, respectively.


                                            34
F.Lupia et al. Computing the Shapley value in allocation problems

     As a special case, when C is a singleton {i}, we can immediately drop from the
  game such an agent i for which the value of her best goods is equal to her marginal
  contribution to the grand-coalition. Note that we also immediately get φ(i) = opt({i}).


  5   Approximating the Shapley Value

  In order to approximate the Shapley value, we use the Fully Polynomial-time Random-
  ized Approximation Scheme (FPRAS) proposed in [8]: for any  > 0 and δ > 0,
  it is possible to compute in polynomial-time an −approximation of the Shapley value
  with probability of failure at most δ. The technique works for supermodular and mono-
  tone coalitional games, and it can be shown that our allocation games indeed meet these
  properties [4].
       The method is based on generating a certain number of permutations (of all agents)
  and computing the marginal contribution of each agent to the coalition of agents occur-
  ring before her (him) in the considered permutation. Then the Shapley value of each
  player is computed as the average of all such marginal contributions. The above proce-
  dure is repeated O(log(1/δ)) times, in indepedent runs, with the result for each agent
  consisting of the median of all computed values for her (him). Finally, the obtained val-
  ues are scaled (i.e., they are all multiplied by a common numerical factor) to ensure that
  the budget-balance property is not violated.
       Clearly enough, the more permutations are considered, the closer to the Shapley
  value the result will be. We next report a slightly modified version of the basic procedure
  of this algorithm, where we avoid the computation of some marginal contributions, if
  we can obtain the result by using Fact 3.
       As a preliminary, we compute the required number of permutations m to meet the
  required error guarantee. In each of the m iterations, the algorithm generates a random
  permutation from the set of agents N . We then iterate through this permutation and
  compute the marginal contribution of each agent j to the set of agents C occurring
  before j in the permutation at hand. If some neighbour of j in the agents graph occurs
  in C, the algorithm proceeds as usual by resolving the optimisation problem to compute
  vA (C ∪ {j}) (i.e., to get the value opt(C ∪ {j}) of an optimal allocation for C ∪
  {j}). Note that to get the desired marginal contribution we need to solve one allocation
  problem only, because the value opt(C) for all the preceding agents is known from the
  previous step. Moreover, by Fact 3, we know that for those permutations in which all
  the players in Neigh(j) follow j, the marginal contribution of j is just opt({j}) (see
  step 10). Finally, at steps 16–18 for each agent the algorithm divides the sum of her
  contributions by the number of performed iterations m. The correctness of the whole
  algorithm follows from Theorem 4 in [8].
  Computation Time Analysis. Let n = |N | be the number of agents, and let m be the
  required number of iterations. The cost of the algorithm is O(m × n × margBlock),
  where margBlock denotes the cost of computing each marginal contribution (steps
  7–11). This requires the computation of an optimal weighted matching in a bipartite
  graph, which is feasible in O(n3 ), via the classical Hungarian algorithm. However, if
  the current agent is disconnected from the rest of the coalition, the cost is simply given
  by the lookup in the cache where the best allocation for each single agent is stored.


                                             35
F.Lupia et al. Computing the Shapley value in allocation problems

  Algorithm 1 Shapley value approximation in allocation games
  Input: An allocation game GA = hN, vA i;
  Parameters: Real numbers 1 >  > 0 and 1 > δ > 0;
  Output: A vector φ̃ that is an -approximation of the Shapley value of GA , with probability 1−δ;
   1: m = |N |·(|N
                 δ·2
                      |−1)
                           ;
   2: i = 0;
   3: while i < m do
   4:     shuffle(N );
   5:     C := {∅};
   6:     for all j ∈ N do
   7:         if Neigh(j) ∩ C 6= ∅ then
   8:               φ̃j += vA (C ∪ {j}) − vA (C);
   9:         else
  10:               φ̃j += vA ({j});
  11:         end if
  12:         C := C ∪ {j};
  13:         i = i + 1;
  14:     end for
  15: end while
  16: for all j ∈ N do
                  φ̃
  17:     φ̃j = mj ;
  18: end for
  19: return φ̃;



  6    Lower and Upper Bounds for the Shapley value

  In this section we describe the computation of exact (not probabilistic) bounds for the
  Shapley value of any given allocation game GA = hN, vA i. Preliminarily recall that
  two very simple bounds are easily obtained by looking at the definition of the Shapley
  value: for each player i, marg({i}, N ) ≤ φi ≤ opt({i}).
       To obtain tighter bounds we observe that, intuitively, the neighbours of i in a coali-
  tion C are the agents having the higher influence on the marginal contribution of i to C.
  Indeed, they are precisely those agents interested in using the goods of i when (s)he does
  not belong to the coalition. We already observed that in the extreme case that no neigh-
  bours are present then i contributes with all her (his) best goods. Therefore, the first
  idea is to consider the power-set of Neigh(i) to see what happens for every combina-
  tion of these agents. The second observation is that, for each pair of coalitions C1 ⊆ C2 ,
  marg({i}, C2 ) ≤ marg({i}, C1 ). Putting it all together, we proceed as follows. Let P 0
  be a set of neighbors of i, let Z = Neigh(i) \ P 0 , let C = N \ (Neigh(i) ∪ {i}) and
  let l = |C|. We compute the marginal contribution of i to C ∪ P 0 , but use this same
  value for the marginal contributions of i to each coalition of the form C 0 ∪ X, for
  any C 0 ⊆ C. This is performed by multiplying such contribution by a factor y com-
            Pl             0               
  puted as k=0 (l−k+|P|N|)!·(|Z|+k)!
                             |!       · kl . Thus, for each element P 0 of the power set of
  the neighbours, we consider the worst situation for i where all other agents in C are
  present.


                                               36
F.Lupia et al. Computing the Shapley value in allocation problems

      The case of the upper bound is obtained in the dual way, by using instead the most
  favorable case where only the neighbours in P 0 belong to the considered
                                                                          coalitions
                                                                                     (i.e.,
  C = ∅). We also use the same factor y, by using the property kl = l−k  l
                                                                              .
      Then, we can show the next algorithm compute, for each agent i, a lower bound
  LBi and an upper bound U Bi for her/his Shapley value.


  Algorithm 2 Computing Bounds for the Shapley Value in Allocation Games
  Input: An allocation game GA = hN, vA i;
  Output: A pair of vectors (LB, U B) encoding, respectively, a lower bound and an upper bound
  of the Shapley value of GA ;
   1: for all i ∈ N do
   2:     P := P owerset(Neigh(i));
   3:     C := N \ (Neigh(i) ∪ {i});
   4:     l = |C|;
   5:     for all P 0 ∈ P do
   6:          Z := Neigh(i) \ P 0 ;
                   P             0                
   7:          y = lk=0 (l−k+|P|N|)!·(|Z|+k)!
                                     |!
                                              · kl ;
   8:          LBi += y · (vA (C ∪ P 0 ∪ {i}) − vA (C ∪ P 0 ));
   9:          U Bi += y · (vA (P 0 ∪ {i}) − vA (P 0 ));
  10:     end for
  11: end for
  12: return (LB, U B);




  Computation Time Analysis. The for loops in steps 5–10 show that both algorithms are
  exponential in the maximum degree of the agents graphs, that is, maxi∈N |Neigh(i)|.
  In practice, after the simplifications that we perform on the instance, these sets are not
  large and the bounds can be computed in a reasonable time (see the next section).


  7     Implementation Details and Experimental Evaluation

  7.1   Parallel Implementation of the Approximate Algorithm

  We found the approximate Shapley value algorithm of Niben-Nowell et al. [8] (we
  briefly discussed their approach in Section 5) to be amenable to parallel implemta-
  tion. We engineered our parallel implementation as follows. Besides the input alloca-
  tion game, and the two parameters δ and , we added a third parameter, the thread pool
  size. During the execution of the algorithm, each thread (there are as many threads as the
  thread pool size dictates) is responsible for generating a certain number of permutations
  according to the requested approximation factor and, for each permutation, it computes
  the marginal contributions of all authors to that permutation, and saves them in a local
  cache. When a thread has generated its assigned number of permutations, it delivers its
  local cache of computed scores to a synchronized output acceptor (which increments
  the overall score of each author accordingly), and then shuts itself down, as its work is


                                             37
F.Lupia et al. Computing the Shapley value in allocation problems

  completed. When all threads have shut down, each entry of the acceptor’s output vec-
  tor is averaged over the total number of permutations, yielding the final approximate
  Shapley vector for that run. The above procedure is repeated for each independent run.
  When all runs are done, the component-wise median of all final approximate Shapley
  vectors is computed, and the resulting vector is scaled (i.e., all entries are multiplied by
  a number such that the Shapley values budget balance property is enforced), yielding
  the desired approximation with the desired probability.


  7.2   Experimental Results

  Hardware and software configuration. Experiments have been performed on two ded-
  icated machines. In particular, sequential algorithms were run on a machine with an
  Intel Core i7-3770k 3.5 GHz processor, 12 GB (DDR3 1600 MHz) of RAM, and oper-
  ating system Linux Debian Jessie. We tested the parallel implementation on a machine
  equipped with two Intel Xeon E5-4610 v2 @ 2.30GHz with 8 cores and 16 logical
  processors each, for a total of 32 logical processors, 128 GB of RAM, and operating
  system Linux Debian Wheezy. Algorithms were implemented in Java, and the code
  was executed on the JDK 1.8.0 05-b13, for the Intel Core i7 machine, and on the Open-
  JDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-1 deb7u1), for the Intel Xeon
  machine.
  Dataset description. We applied the algorithms to the computation of a fair division
  of the scores for the researchers of Sapienza University of Rome who partecipated in
  the research assessment exercise VQR2011-2014. Sapienza contributors to the exer-
  cise were 3562 and almost all were required to submit 2 publications for review. We
  computed the scores of each publication by applying, when available, the bibliographic
  assessment tables provided by ANVUR.
  Preprocessing. The analysis was carried out by preliminarily simplifying the input using
  the properties discussed in Section 4, as explained next.
      Starting with a setting with 3562 researchers and 5909 publications, first we re-
  moved each researcher having no publications for review. After this step a total of
  370 authors were removed. Then, by exploiting the simplification described in Fact 2,
  we removed 2323 publications. By using Theorem 4, the graph was subsequently fil-
  tered removing each author whose marginal contribution to the grand coalition co-
  incides with the optimal allocation restricted to the author himself. After this step
  2427 researchers out of 3562 were removed. Then we divided the resulting agents
  graph into connected components obtaining a total number of 156 connected com-
  ponents and we discovered only two components consisting of more than 10 agents.
  The sizes of these components are 691 and 15. Eventually, the components were fur-
  ther decomposed as follows: For any agent i, a good g is removed from its set Ω(i) if
  val(g)+maxg0 ∈Ω(i)\{g} val(g 0 ) < v(N )−v(N \{i}). After the whole preprocessing
  phase, we obtained a total of 159 connected components with the largest one having 685
  nodes. The size of the second largest is just 15 while all the others remain very small
  (less than 10 nodes). In the rest of the section, we shall illustrate results of experimental
  activity conducted over the various methods. To this end, we fixed the value δ = 0.01.


                                              38
F.Lupia et al. Computing the Shapley value in allocation problems

                210
                200
                190
                180
                170
                160
                150
                140
                130
                120
                110
       Values




                100
                 90
                 80
                 70
                 60
                 50
                 40
                                                                                                  LB
                 30
                                                                                                  randomised
                 20                                                                               exact
                 10                                                                               UB
                  0
                      a10   a4   a3     a1   a11   a13   a14     a6     a7   a2   a8   a9   a12   a0   a5
                                                               Agents



                                      Fig. 3: Methods comparison (n = 15)


  The value was chosen heuristically based on a series of tests conducted on various CNR
  Areas of Sapienza, where CNR Areas are (large) scientific disciplines such as Math and
  Computer Science (Area 01) or Physics (Area 02).
  Test with components of variable size. As we have already pointed out, after the prepro-
  cessing step we obtained very small connected components (less than 10 nodes) except
  for the first two (685 and 15 nodes, respectively). For all components with less than 10
  nodes, the exact algorithm (of which we used the sequential implementation) tended to
  perform very well (a few milliseconds), therefore we omit the analysis here. In order to
  test all the other algorithms, besides the two largest components, we randomly extracted
  a sample of (distinct) nodes out of the original graph, to produce different subgraphs
  with size |N | ∈ {23, 26, 30, 40}.
       For the the considered cases we do not find significant enough differences between
  the randomized method and the exact one, see, e.g., figures 3 and 4. Notably, apart from
  a small number of cases, our bounds (especially the lower bounds) are always very
  close to the exact value. In particular, for n = 26 we were able to get immediately the
  Shapley value for all agents, since upper and lower bounds coincide for all of them.
  Moreover, it emerges that the randomized method is much better than the theoretical
  guarantee on the maximum approximation error. In the considered instances, the values
  obtained by using the randomized method are within the correct bounds.
       We also evaluated how many computations of optimal allocations have been avoided
  by considering whether or not an agent has neighbours before her in the permutation at


                                                          39
F.Lupia et al. Computing the Shapley value in allocation problems

               210
               200
               190
               180
               170
               160
               150
               140
               130
               120
               110
      Values




               100
                90
                80
                70
                60
                50
                40
                30
                                                                                                                    LB
                20                                                                                                  randomised
                10                                                                                                  UB
                 0
                     0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
                                                                    Agents



                                            Fig. 4: Methods comparison (n = 40)



  hands (and hence executing in the latter case Step 10 rather than Step 8 in Algorithm 1).
  By fixing the approximation error at  = 0.3, for each n ∈ {15, 23, 26, 30, 40} we get
  the following savings: 9.65 · 105 out of 3.5 · 106 (i.e., 28%), 2.34 · 106 out of 1.29 · 107
  (18%), 5.36·106 out of 1.87·107 (29%), 8.78·106 out of 2.9·107 (30%), and 1.46·107 out
  of 6.93 · 107 (21%), respectively. Finally, we also report the real maximum and average
  approximation errors (denoted with X and Y, respectively) of our implementation w.r.t.
  the exact algorithm for each n ∈ {15, 23, 26}. In particular, for n = 15 we get X = 0.01
  and Y = 3·10−3 , for n = 23 we get X = 1.5·10−3 and Y = 1.7·10−4 and, for n = 26 we
  get X = 1.06 · 10−4 and Y = 1.59 · 10−5 . In all the cases, the maximum approximation
  error was about 1% (or less) and therefore considerably below the theoretical guarantee
  (30%).
      We stress that we were not able to fully analyse the component with 685 nodes, as
  it would simply have taken too long. However, we hope to be able to do that in a future
  work, possibly developing more powerful preprocessing techniques. For the time being,
  we report that the computation of our upper and lower bounds, when applied to this
  component, allowed us to approximate the Shapley value of 581 agents within a 30%
  error bound, of which 362 agents are within a 10% error bound, and 204 within 5%.
  Moreover, we were able to compute the exact Shapley value for 53 agents.

  Running Times. Figures 5 and 6 report the computation time of the various algorithms
  In particular, Figure 5 focuses on the brute-force algorithm for computing the exact


                                                                     40
F.Lupia et al. Computing the Shapley value in allocation problems

                2.0                                                             12
                        exact
                        LB
                        UB                                                      10
                1.5
                                                                                8
     Time (h)




                                                                                    Time (s)
                1.0                                                             6

                                                                                4
                0.5
                                                                                2

                0.0                                                             0
                   15      23         26             30          40           52
                                        Instance size

  Fig. 5: Running times for the computation of the exact value by using the brute-force
  algorithm (green) and of the upper and lower bounds (blue) vs instances size



  values, and on the algorithms for the computation of the upper and lower bounds. For
  the experiments, we computed separately the two bounds in order to point out that the
  computation of the lower bound requires in general more time, because it considers
  allocation over larger coalitions than those considered for the computation of the upper
  bound. Moreover, as discussed in Section 6, the running times for computing the bounds
  heavily depends on the cardinality of agents’ neighbours. This explains why the running
  times for the case n = 52 are smaller than those for the case n = 40.
      Figure 6 shows the running time of the randomized parallel algorithm using 24
  threads, for different values of . In particular, for δ = 0.01 we performed five trials
  over the different (sub)games described above, and averaged measures are reported. We
  can see that for games of reasonable size we can achieve high theoretical approximation
  error guarantee. For instance, for the biggest considered game (n = 52) we were able
  to compute the approximate Shapley value with  = 0.1 (and δ = 0.01) in less than
  90 minutes. There is a big gap between the performances of the randomized algorithm
  when using the extreme values for the approximation error that we considered. How-
  ever, as already pointed out, even when we used a poor theoretical guarantee on the
  approximation error, we still got a quite reasonable accuracy.
      Notably, for the case n = 15 the exact algorithm is faster than the sequential ran-
  domized algorithm, while it is definitely slower than the lower and upper bound algo-
  rithms.


                                           41
F.Lupia et al. Computing the Shapley value in allocation problems

                                                                                                        89.12 min
                              n=52
                              n=40
                   80
                              n=30
                              n=26
                              n=23
                              n=15

                   60
      Time (min)




                   40
                                                                                                        37.82 min




                   20


                       1.08 min
                       1.64 min                                                                         8.53 min
                       0.41 min
                       0.25 min                                                                         4.60 min
                       0.21 min                                                                         3.74 min
                    0 0.06 min                                                                          0.41 min
                     90         80         70      60           50         40            30   20   10               5
                                                        Maximum allowable MC error (%)



                                     Fig. 6: Parallel implementation: running times vs 


  8    Future work

  We are working on further theoretical properties of the Shapley value in allocation
  problems. The goal is to get some more tools for simplifying the VQR real-world large
  instances.
      Moreover, we would like to extend the structure-based technique described in [1]
  to the more general class of games where more than one good can be allocated to each
  agent (as it is the case in VQR allocations). This way, we could compute efficiently the
  exact Shapley value for large games, provided that the treewidth of the agents graph
  is small. With this respect, we note that, by applying the simplifications described in
  this paper, all the connected components of the big VQR instance relative to Sapienza
  University of Rome have a small treewidth (at most 8) except for the largest one (com-
  prising 685 agents), which has treewidth at most 64. For instance, the component having
  52 agents has treewidth 5.
      Finally, we would like to obtain tighter lower and upper bounds, with a computa-
  tional effort that can be tuned to meet given time constraints.




                                                                 42
F.Lupia et al. Computing the Shapley value in allocation problems

  References
   1. G. Greco, F. Lupia, and F. Scarcello. Structural tractability of shapley and banzhaf values in
      allocation games. In Proc. of IJCAI’15, pages 547–553, 2015.
   2. G. Greco and F. Scarcello. Fair division rules for funds distribution: The case of the italian
      research assessment program (vqr 2004-2010). Intelligenza Artificiale, 7(1):45–56, 2013.
   3. G. Greco and F. Scarcello. Counting solutions to conjunctive queries: structural and hybrid
      tractability. In Proc. of PODS’14, pages 132–143, 2014.
   4. G. Greco and F. Scarcello. Mechanisms for fair allocation problems: No-punishment pay-
      ment rules in verifiable settings. J. Artif. Intell. Res. (JAIR), 49:403–449, 2014.
   5. H. Aziz and B. de Keijzer. Shapley meets shapley. In Proc. of STACS’14, pp. 99–111.
   6. Y. Bachrach and Y.S. Rosenschein. Power in threshold network flow games, pp. 106–132,
      2009.
   7. X. Deng and C.H. Papadimitriou. On the complexity of cooperative solution concepts. Math-
      ematics of Operations Research, 19:257–266, May 1994.
   8. D. Liben-Nowell, A. Sharp, T. Wexler, and K. Woods. Computing Shapley value in super-
      modular coalitional games. In Proc. of COCOON’12, pages 568–579. 2012.
   9. F. Maniquet. A characterization of the Shapley value in queueing problems. Journal of
      Economic Theory, 109(1):90–103, 2003.
  10. D. Mishra and B. Rangarajan. Cost sharing in a job scheduling problem. Social Choice and
      Welfare, 29(3):369–382, October 2007.
  11. H. Moulin. An application of the Shapley value to fair division with money. Econometrica,
      60(6):1331–49, November 1992.
  12. H. Nagamochi and Dao-Zhi Zeng and Naohisa Kabutoya and Toshihide Ibaraki. Complexity
      of the Minimum Base Game on Matroids Math. Oper. Res., 22(1):146–164, 1997.
  13. A. Iera and L. Militano and L. Romeo and F. Scarcello. Fair Cost Allocation in
      Cellular-Bluetooth Cooperation Scenarios EEE Transactions on Wireless Communications.,
      10(8):2566–2576, 2011.
  14. R. Pichler and S. Skritek. Tractable counting of the answers to conjunctive queries. Journal
      of Computer and System Sciences, 79(6):984–1001, 2013.
  15. N. Robertson and P. Seymour. Graph minors iii: Planar tree-width. Journal of Combinatorial
      Theory, Series B, 36(1):49–64, February 1984.
  16. L. S. Shapley. A value for n-person games. Contributions to the theory of games, 2, 1953.




                                                43