<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>E ective Team Formation in Expert Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Morteza Zihayat</string-name>
          <email>mzihayat@ryerson.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aijun An$</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukasz Golaby</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehdi Kargar</string-name>
          <email>kargar@ryerson.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslaw Szlichta</string-name>
          <email>jaroslaw.szlichta@uoit.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ryerson University</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Ontario Institute of Technology</institution>
          ,
          <addr-line>Oshawa</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given a project whose completion requires a set of skills, team formation is the problem of nding a set of experts who collectively cover all the required skills in a way that optimizes one or more business objectives. In this paper, we present a new framework for nding an e ective team from a network of experts. The proposed framework considers di erent business objectives to nd the best team to perform the given tasks. Experimental results on a real dataset show the e ectiveness of the proposed framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the exponential growth of the Internet and Web 2.0 services, there are many
expert network providers (e.g., LinkedIn, DBLP and GitHub) that connect professionals
having specialized skills and experience. Such networks are one of the most popular
tools used by businesses seeking subject matter experts to complete a project.</p>
      <p>
        In recent years, there has been interest in nding teams of experts from such
networks [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ]. Given a project, the goal is to nd teams of experts who cover all the
required skills and also to optimize the communication cost among the team members
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The expert network is modeled as a graph where nodes represent experts and
each edge indicates prior collaboration between two experts. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], two functions are
proposed to compute communication costs. The rst function is the sum of the shortest
paths among experts in a team while the second function de nes the communication
cost as the diameter of the subgraph (team), where the diameter of a graph is the
largest shortest path between any two nodes in the network. Then, two algorithms are
proposed to discover teams minimizing the communication cost functions. In recent
years, several methods have been proposed to nd expert teams e ciently. However,
existing approaches do not consider other business objectives such as personnel cost,
experts' authority, etc. Therefore, they fail to discover e ective teams when there are
other important requirements.
      </p>
      <p>To motivate our approach and illustrate the shortcomings of existing methods,
assume that all the feasible teams of experts for a project are presented in Figure 1.
Each team is represented as a subgraph whose nodes are either skill holders (team
members who have the desired skills) or connectors. Existing methods discover teams
with minimal communication costs. Thus, among the four feasible teams presented in
Figure 1, Ta is selected since its communication cost is the lowest (5). However, other
teams can be more desirable if we consider other objective functions. For example, if
we want to nd a team with the minimum personnel cost, Tb is best since the budget
required to hire the team members is $62. Furthermore, experts may be associated
with authority metric such as the h-index or the number of publications. In this case,
we may want to discover a team with the maximum authority. Even if all the skill</p>
      <p>hR‐ainted:e $x8: 09 hR‐ainted:e $x1: 55 hR‐iantdee: x$:3 152 hR‐iantdee: x$:1 192
hR‐ainted:e $x8: 57 2 A 1 hR‐ainted:e $x9: 06 hR‐ainted:e $x2: 47 22 A 1 hR‐ainted:e $x2: 35 hR‐ainted:e $x2: 17 19 A 4 hR‐ainted:e $x1: 95 hR‐ainted:e $x3: 17 22h‐indAex: 551 hR‐ainted:e $x2: 95</p>
      <p>B 2 C B C B C B 3 2 C
12
( )
8
( )
( )
( D )
holders have the same authority (e.g., Tc and Td), Td may be preferable because its
connector (e.g., expert D) has a higher authority. More importantly, if we want a team
in which more than one of these objectives are optimized at the same time, there is
not an obvious best choice.</p>
      <p>
        We propose an e ective framework to solve the problem of team formation from
an expert network. Our framework considering objective functions which have not
been considered in previous studies. Particularly, we consider the personnel cost of
team members, the authority of skill holders and the authority of connectors. Then, we
discover teams of experts optimizing the above objective functions. We note that this
short paper is a summary of our published results in [
        <xref ref-type="bibr" rid="ref2 ref3 ref5 ref6">2, 3, 5, 6</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Team Formation Framework</title>
      <p>Let C = fc1; c2; : : : ; cmg be a set of m experts, and S = fs1; s2; : : : ; srg be a set of r
skills. An expert ci has a set of skills, denoted as S(ci), where S(ci) S. If sj 2 S(ci),
expert ci has skill sj . Furthermore, a subset of experts C0 C have skill sj if at least
one of them has sj . For each skill sj , the set of all experts having skill sj is denoted
as C(sj ) = fcijsj 2 S(ci)g. A project P S is a set of required skills. A subset of
experts C0 C covers a project P if 8sj 2 P 9 ci 2 C0; sj 2 S(ci).</p>
      <p>
        Given an expert network G and a project P that requires the set of skills fs1; s2; : : : ;
sng, a feasible expert team (FET) T is a connected subgraph of G whose nodes
cover P . With each team, we associate a set of n skill-expert pairs: fhs1; cs1 i; hs2; cs2 i;
: : : ; hsn; csn ig, where csj is an expert in T that has skill sj for j = 1; : : : ; n. Since
there may be many teams covering the required skills and some teams may not be
interesting, teams are ranked by their communication cost [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Suppose the edges
of a team T are denoted as fe1; e2; : : : ; etg. The communication cost of T is de ned as
CC(T ) = Pt
      </p>
      <p>i=1 w(ei), where w(ei) is the weight of edge ei. We proved that minimizing
the communication cost is an NP-hard problem by a reduction from 3-SAT.</p>
      <p>Search Algorithm. Given a project P and an expert network G, our
framework returns a subtree of G corresponding to a team with the lowest sum of edge
weights. It rst considers each expert cr as a potential root node for the subtree.
Then, to build a tree around cr, for each required skill si, the nearest skill holder is
selected (i.e., nearestExpert), that contains si. The nearestExpert is connected to
the current team, meaning that any additional nodes along the path from the root to
nearestExpert are also added. The tree with the lowest sum of edge weights is the
best team. Note that, when nding a team with the minimum communication cost,
edge weights in the input graph represent the shortest path among experts.</p>
      <p>Objective Functions. We want to nd a team whose members collaborate e
ectively and where another objective (e.g., the personnel cost of the team) is optimized.
In this situation, there is not an obvious best choice since there is a trade-o between
objectives (e.g., the personnel cost and the communication cost). Moreover, it is
possible that an objective function is de ned based on node weights (e.g., experts' cost).
To nd the best team, we transform the expert network G to G0 by moving all values
associated with experts (node weights) onto the edge weights and then running the
aforementioned method on the transformed graph to nd the best team of experts.
Below, we introduce new objective functions and we discuss how we build G0 based
on these objective functions.</p>
      <p>
        a) Experts' Authority. Suppose that the connectors of a team T (all nodes
excluding skill holders) are denoted as fc1; c2; : : : ; cqg. The connector authority of
T is de ned as CA(T ) = Piq=1 a0(ci) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We are also interested in optimizing the
authority of skill holders. Suppose that the skill holders of a team T are denoted as
fc1; c2; : : : ; cng. The skill holder authority of T is de ned as SA(T ) = Pn
i=1 a0(ci). To
build G0, we use a hybrid function as follows: CA-CC(T ) = CA(T )+(1 ) CC(T ).
In order to consider the authority of skill holders, we use the following hybrid function,
SA-CA-CC(T ) = SA(T ) + (1 ) CA-CC(T ) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        b) Personnel Cost. Let the set of experts in a team T be fc1; c2; : : : ; cqg. The
personnel cost of T is de ned as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: P C(T ) = Pq
i=1 t(ci) where t(ci) is the required
budget to hire expert ci. Given a team T of experts from graph G for a project and a
trade-o between the communication and personnel costs, the combined cost of T
with respect to G is de ned as [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: PC-CC(T ) = (p 1)(1 ) P C(T ) + 2 CC(T ),
where p is the number of required skills.
      </p>
      <p>
        We also propose two other approaches to take personnel cost into account [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
rst approach is to consider a limit on one of the objectives and then nd the best
team based on the other objective. The second approach is to discover a set of teams
that are not worse than any other teams based on the objectives. These teams are
called Pareto-optimal teams. All these solutions are approximation algorithms and
have provable bounds (recall that the problem we solve is NP-hard).
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Evaluation</title>
      <p>
        In this section, we use the proposed algorithm and various objective functions
explained above to implement our framework for team discovery which optimizes CC,
CA-CC, SA-CA-CC and PC-CC. We use various datasets including the DBLP XML
dataset1 to build an expert graph [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The algorithms are implemented in Java and
the experiments are conducted on an Intel(R) Core(TM) i7 2.80 GHz computer with
4 GB of RAM.
      </p>
      <p>For comparison, we also implemented Random, which randomly builds 10,000
teams and selects the one with the lowest value of SA-CA-CC (in Figure 3 (a)) or
PC-CC (in Figure 3 (b)), and Exact which performs an exhaustive search to nd an
(SA-CA-CC)-optimal or (PC-CC)-optimal solution. Figure 3 (a) illustrates the
SACA-CC scores of di erent objective functions for di erent values of (lower is better).
The projects used in the experiments are generated as follows. We set the number
of skills in a project to 4, 6, 8 or 10. For each number of skills, 50 sets of skills are
generated randomly, corresponding to 50 random projects. The average scores over
the 50 projects are computed for each objective function. According to Figure 3,
SACA-CC produces results that are close to those of Exact. Since all the functions use
the same algorithm, CC, CA-CC and SA-CA-CC have similar runtime. Figure 3 (b)
1 http://dblp.uni-trier.de/xml/
8
C
6C
‐
4
A
C
‐
2A
S
0</p>
      <p>SA‐CA‐CC</p>
      <p>Exact
6
4
C
C
‐
2
C
P
0
0.2
0.4
0.6
0.8
0.2
0.4
a
shows the average PC-CC cost values of teams for di erent functions for the DBLP
dataset. The results show that all of the algorithms outperform the Random method.
The results also suggest that the PC-CC method has the lowest cost values among
non-exact methods.</p>
      <p>We check if the top-5 teams returned by CC and SA-CA-CC were successful in
real life. To do so, we examined the rankings of the publication venues of these teams
according to the Microsoft Academic conference ranking. We used the DBLP dataset
up to 2015 for team discovery, and only consider papers published in 2016. We set
and to 0.6 and generate 5 di erent projects with four di erent skills. From the teams
that co-authored papers in 2016, we found that 78% of the time the teams found by
SA-CA-CC published in more highly-rated venues than those found by CC.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We studied the problem of nding teams of experts from an expert network in a way
that optimizes di erent objectives: the communication cost among team members, the
authority of skill holders and connectors, and the personnel cost. We proposed a series
of objective functions and methods to nd the best team of experts. In future work,
in order to nd the distance between two experts, we plan to investigate the use of
statistical methods (e.g., random walk with restart) instead of the shortest path.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Basu</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Lakshmanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.V.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          : From group recommendations to group formation.
          <source>In: Proceedings of SIGMOD '15</source>
          , pp.
          <volume>1603</volume>
          {
          <issue>1616</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kargar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zihayat</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>E cient bi-objective team formation in social networks</article-title>
          .
          <source>In: Proceedings of ECML/PKDD'12</source>
          , pp.
          <volume>483</volume>
          {
          <fpage>498</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kargar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zihayat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Finding a ordable and collaborative teams from a network of experts</article-title>
          .
          <source>In: Proceedings of SDM'13</source>
          , pp.
          <volume>587</volume>
          {
          <fpage>595</fpage>
          .
          <string-name>
            <surname>SIAM</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lappas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terzi</surname>
          </string-name>
          , E.:
          <article-title>Finding a team of experts in social networks</article-title>
          .
          <source>In: Proceedings of KDD'09, KDD '09</source>
          , pp.
          <volume>467</volume>
          {
          <fpage>476</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Zihayat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kargar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szlichta</surname>
          </string-name>
          , J.:
          <article-title>Authority-based team discovery in social networks</article-title>
          .
          <source>In: Proceedings of EDBT'17</source>
          , pp.
          <volume>498</volume>
          {
          <issue>501</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Zihayat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kargar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Two-phase pareto set discovery for team formation in social networks</article-title>
          .
          <source>In: In Proceedings of WI/IAT</source>
          , vol.
          <volume>2</volume>
          , pp.
          <volume>304</volume>
          {
          <fpage>311</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>