<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>uQantifying and Reducing Imbalance in Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yoosof Mashayekhi</string-name>
          <email>yoosof.mashayekhi@ugent.be</email>
          <email>yoosof.mashayekhi@ugent.be bo.kang@ugent.be Ghent University Ghent, Belgium</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jefrey Lijfijt</string-name>
          <email>jefrey.lijfijt@ugent.be</email>
          <email>jefrey.lijfijt@ugent.be tijl.debie@ugent.be Ghent University Ghent, Belgium</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bo Kang</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tijl De Bie</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <issue>2021</issue>
      <abstract>
        <p>Real-world data can often be represented as a heterogeneous network relating nodes of diferent types. For example, envision a network of the job market, where nodes may be job seekers, skills, and jobs, and where links to skill nodes could indicate having that skill (if linked to a job seeker) or having the skill as a requirement (for jobs). It can be relevant to consider the imbalance in such a network between the nodes of diferent types. In the example, this imbalance could correspond to the mismatch between supply and demand of jobs due to a mismatch in skills, an efect known as 'friction'. Identifying and reducing such friction is a problem of great economic and societal significance. We introduce a quantification of the imbalance in a network between two sets of nodes (nodes of diferent types, attributes, etc.) based on the embedding of a network, i.e., a real-valued vector space representation of the network nodes. Moreover, we introduce an algorithm named GraB (Graph Balancing) which ranks unconnected node pairs according to how well they would reduce the imbalance in a network if an edge were added between them. E.g., in the job network, GraB could be used to rank skills that job seekers do not yet have but could strive to acquire, to move them closer in the embedding towards an area where there is an abundance of jobs, and hence to reduce job market imbalance. We evaluated GraB on several datasets, including a job market network, and find that GraB outperforms baselines in reducing network imbalance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Graphs (or networks) are natural models for a wide range of
realworld structures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], arising from e.g., social networks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], biology
(e.g., Protein-Protein interaction networks) [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and linguistics (e.g.,
word co-occurrence networks) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Network embedding provides
an eficient way to solve graph analytics problems by mapping
nodes into a real-valued space, which can later be used as an input
feature vector to a machine learning model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Using these vector
representations, machine learning methods can be applied on graph
datasets to perform graph analysis tasks such as link prediction
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], information difusion [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and node classification [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>An imbalance between two sets of nodes is an undesirable
phenomenon in some networks. This paper studies how to quantify
network imbalance using the embedding of a network and proposes
a method to reduce network imbalance by adding a limited number
of links to the network.</p>
      <p>Motivation: There are many networks for which it is be
desirable to minimize the imbalance between specific sets of nodes. Let
us consider an example of a job market network with three sets of
nodes job vacancies, job seekers, and skills, and where job
vacancies are connected to the skills they require, and job seekers are
connected to the skills they have and possibly to the job vacancies
they have shown an interest in.</p>
      <p>Imagine there are many Python developers seeking a job, and few
vacancies requiring Python programming, while there are many
vacancies requiring Java programming but few Java developers. As
a result, Java jobs would remain unfilled and many Python
programmers would remain unemployed. With an ever faster evolving
job market, such imbalances are increasingly common and serious,
harming job market eficiency and ultimately the economy. Thus,
quantifying such imbalances would provide policy makers with an
objective picture of the current state of afairs.</p>
      <p>Moreover, the ability to quantify imbalance also opens up the
possibility of trying to reduce it through targeted interventions
and incentives. While policy makers may not be able to influence
employers to shift their requirements, they can provide courses and
training material for specific worker profiles lacking sought-after
skills, to shift their area of expertise and better meet the demand of
the job market. In network terms, it is equivalent to adding a certain
number of links connecting job seekers (let us call this the set of
source nodes) to skills (auxiliary nodes), to reduce the imbalance
between job seekers (source nodes) and job vacancies (target nodes).</p>
      <p>Our approach: In a job market network, a matching with lowest
cost—where cost could be defined as the required training time
of employees in the company, or the efort a job seeker has to
make to be suited for a job—between job seekers and job vacancies
appears to be the ideal situation. However, matching job seekers
to connected jobs only is not always desirable, because a link may
only indicate a job seeker’s expressed interest in a job. Yet, we could
let job seekers be matched with jobs even if they are not connected.</p>
      <p>Formally. we denote an undirected network by  = ( , ), where
 and  are the sets of nodes and links respectively. Moreover, we
define three sets of nodes, namely source nodes  (e.g. job seekers
in the job market network), target nodes  (e.g. job vacancies), and
auxiliary nodes  (e.g. skills). Sets ,  , and  are three disjoint
subsets of  . Given such a network with sets  and  , we let every
node in  be matched with every node in  .</p>
      <p>
        Hence, in efect we model a complete bipartite network  ′ from
node sets  and  . We define the imbalance between  and  in
 as the cost of a so-called optimal fractional perfect b-matching
(see, e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) in  ′. Specifying the fractional perfect b-matching
requires the definition of the cost of matching any pair of nodes
from  and  in  ′. As the (inverse of the) distance between a pair
of nodes’ embeddings in a network embedding usually represents
some type of afinity between these nodes, we propose to let this
cost be based on the embedding distance of nodes in  . More details
are presented in Section 2.2.
      </p>
      <p>Next, we define the problem of adding a limited number of links
between sets  and  —in the original graph  —, to reduce the
imbalance between sets  and  . Adding links will change the
embedding of the network  and thus modify the cost of matching node
pairs in  ′. We propose a method called Graph Balancing (GraB) to
tackle this problem, which is based on a local approximation of the
imbalance that we may compute analytically, thus providing the
necessary scalability.</p>
      <p>Example: To understand the relevance of the embedding,
consider the sample network shown in Figure 1a. The figure shows
three clusters of nodes. In the top-left cluster, source and target
nodes are mixed, while the bottom-left cluster only contains nodes
in  while the top-right cluster only contains nodes in  . Our goal
is to quantify which links between  and  (the green nodes here)
would reduce the imbalance between  and  . Figure 1b shows the
network embedding after adding the top 200 links from GraB to the
network. Now,  and  are well-mixed in the network.</p>
      <p>
        Related work: Previous studies on imbalance in supply and
demand on the job market mostly focus on the factors influencing the
imbalance (such as retirement, salary, etc.) and do not consider the
imbalance between nodes individually (diferent jobs require people
with specific skills) [
        <xref ref-type="bibr" rid="ref27 ref29">27, 29</xref>
        ]. The literature on graph matching is
related to our work, as we also define the imbalance in a network
between two sets of nodes using the cost of a matching between
them. However, the focus in this paper is not on the computational
problem of identifying this matching, we simply use the cost of the
optimal fractional perfect b-matching as the imbalance measure.
These two relations are further discussed in Section 5.
      </p>
      <p>
        The main contributions of this paper are:
• We define the imbalance between two sets of nodes,  and  ,
in a network and propose the measure  (, ,  ) for
quantifying it, where we compute the cost of matching node pairs
using the embedding distances .
• We introduce the novel problem of reducing the imbalance
in a network by adding a limited number of links.
• We also propose a novel generic method, Graph Balancing
( GraB), to optimally select those links. Because this is
computationally challenging, we introduce a link utility that uses
a local imbalance measure as a proxy and employ a greedy
algorithm to select the links.
• To better understand the merits of GraB, we propose two
baselines (a naive and a more intelligent baseline) for the
novel problem of reducing the imbalance in a network.
• GraB is proposed as a generic method, applicable to a wide
range of network embedding methods. We also develop a
concrete instantiation using conditional network embedding
(CNE) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], a state-of-the-art network embedding method.
• We perform several experiments to compare the performance
of GraB and baselines in reducing the imbalance in a network
embedding. The experiments show that GraB outperforms
the baselines in this task.
      </p>
      <p>Outline. In Section 2, we define and quantify imbalance and
formalize the problem. In Section 3, we introduce our method GraB
to reduce the imbalance in a network using its embedding. In
Section 4, we provide an experimental evaluation of GraB. In Section 5,
we discuss the related work. In Section 6, we conclude and outline
avenues for future work.
2</p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM DEFINITION</title>
      <p>In this section, we first provide the relevant background and
notation. Next, we provide a definition and quantification of imbalance
in networks. Finally, we formulate the problem of reducing
imbalance by adding links to a network.
2.1</p>
    </sec>
    <sec id="sec-4">
      <title>Background</title>
      <p>An undirected network is denoted by  = ( , ), where  is the set
of | | =  nodes and  ⊆ 2 is the set of links between nodes. For
convenience, we will index the set of nodes by natural numbers, i.e.
 = {1, . . . , }. Let  denote the adjacency matrix, and   is the
element of the adjacency matrix corresponding to the link between
node  and node  , i.e.   = 1 if {,  } ∈ . Network embedding
methods find a mapping each node  ∈  to a -dimensional real
vector  ∈ R . For convenience, these may be aggregated in a
matrix  = (1, ..., )′ ∈ R× . In this paper, we assume there is
a given network embedding method to find  .
2.2</p>
    </sec>
    <sec id="sec-5">
      <title>Network Imbalance</title>
      <p>To generically define our proposed notion of imbalance, we develop
it first for the concrete example of the job market. There, we define
the imbalance as the cost of matching all job seekers to job vacancies.
Here, we allow a job seeker and a job to be matched even if they
are not connected by a link. Indeed, a link between a job seeker
and a job vacancy might mean that the job seeker has applied for
the job or has otherwise expressed an interest, not necessarily that
they were employed for that job. Moreover, the absence of a link
does not imply that the job seeker would not be a good candidate
for the job. It is this property that distinguishes our work from the
literature on combinatorial matching problems in graphs.</p>
      <p>However, the skills to which the job seeker and job vacancy are
linked, and jobs vacancies they are linked to, provide information
on whether the job seeker is suited for the job; and the more suited,
the smaller the cost of a match between them should be. Hence, the
cost of matching a job seeker and a job vacancy should be a function
of the network structure, and adding or removing skills to that job
seeker or job vacancy should influence the cost of matching them.
Hence, the cost could be defined using any model that provides
link costs or link probabilities (so the cost of node pairs could
be computed based on them) based on the network structure. In
this paper, we investigate using the embedding of the network for
this, as it is a state-of-the-art approach to summarize the network
structure, where proximity between node embeddings reflects the
probability that both nodes should be linked.
2.2.1 Formal definition of network imbalance. The imbalance can
be formalized as follows. We create a complete bipartite network
(in which all node pairs between the two sets are linked), and assign
equal weights  to each node  in a set, such that the sum of weights
in two sets are equal. Given a cost defined for each link  = [  ]
(e.g., based on the link probability or distance in the embedding
space), we define the imbalance as the cost of the fractional perfect
b-matching with minimum cost in the bipartite network.</p>
      <p>
        A fractional perfect b-matching  = [  ] is defined as an
assignment of nonnegative real numbers   to the links of a network
such that the sum of those numbers over links incident to any node
 is equal to a specified weight  of that node [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The cost of  in
a undirected network with  nodes is then defined as Õ Õ     .
=1 =+1
We thus define the imbalance in a network as follows:
      </p>
      <p>Definition 1 (Imbalance). Consider a network  = ( , ), two
disjoint non-trivial subsets of its nodes referred to as the source nodes
∅ ⊂  ⊂  and the target nodes ∅ ⊂  ⊂  with  ∩  = ∅, and
the costs   of matching the pairs of nodes  and  for all  ∈ 
and  ∈  , arranged in a matrix  = [  ]. Moreover, consider
the complete edge-weighted and node-weighted bipartite network
 ′ = ( ′ =  ∪  ,  ′ =  ×  ), with weight of edge {,  } with  ∈ 
and  ∈  equal to   , and weight of node  ∈  equal to  = |1 |
and weight of node  ∈  equal to   = |1 | . Then the imbalance
between nodes in sets  and  , denoted as  (, ,  ), is defined as the
cost of the minimum cost fractional perfect b-matching in  ′.</p>
      <p>Computing the imbalance defined in this way can be done by
solving a linear programming problem for finding the matching
 = [  ] in  ′ that minimizes the overall cost:
 = Õ</p>
      <p>Õ     ,
 ∈  ∈
s.t.   ≥ 0
Õ   =  ∀ ∈ ,
 ∈</p>
      <p>∈
∀(,  ) ∈  ×  ,
Õ
  =  
∀ ∈  .</p>
      <p>The imbalance measure  is defined as the minimum cost of the
optimal matching. (The matching itself is not of interest to us for
the purposes of this paper.)
2.2.2 The matching cost, and a relation to the earth mover’s distance.
While the cost for each edge could be defined in several ways,
network embeddings arguably ofer a natural way to define them: our
approach is to use the distance of nodes in the network
embedding of  as the matching cost   of each node pair in  ′. For
embedding methods modeling first-order similarities in networks,
these distances are directly related to the link probability between
nodes. Moreover, the embedding of a node aggregates all relevant
information about the network structure in relation to the node.</p>
      <p>
        Interestingly, with this matching cost, the proposed definition
of the imbalance is equivalent with the Earth Mover’s Distance
(EMD) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] between the source and target sets in the embedding
space. The EMD computes the minimum cost to transform one
distribution into another.
      </p>
      <p>Proposition 1 (Network Imbalance based on fractional b–
matching and EMD are eqivalent). Consider the two empirical
distributions of the node embeddings for  and  in the embedding
space. The EMD between these two distributions is equal to the network
imbalance measure  .</p>
      <p>We refer to Appendix A1 for a more precise formulation of this
proposition and a proof.
2.3</p>
    </sec>
    <sec id="sec-6">
      <title>Formulating the problem of reducing network imbalance</title>
      <p>As the cost of matching node pairs depends on the structure of
the network, it will change by modifying the network. Motivated
by the job market example, we propose the operation of adding
links as the kind of modification that can be made. We further
propose to restrict the problem by allowing links to be added only
between nodes from the source set and nodes from an auxiliary
set of nodes. This is again motivated by the job market, where we
can realistically add new links between skills and job seekers (by
training job seekers), but not between job vacancies and skills. More
formally, we introduce the following problem:</p>
      <p>Problem 1 (Imbalance Reduction). Given a network  = ( , )
and three mutually disjoint sets of nodes source nodes ∅ ⊂  ⊂  ,
target nodes ∅ ⊂ ⊂  and auxiliary nodes ∅ ⊂ ⊂  , the cost of
matching each pair of nodes  = [  ] (that depends on the network
structure  ), and imbalance measure  (, ,  ), find the optimal 
links E connecting nodes from set  with nodes from set  , that reduce
the imbalance between the nodes in sets  and  . Formally,
argmin  ( E, ,  ),</p>
      <p>E</p>
      <p>E⊆× ,E∩=∅
where  E is computed based on  E = ( ,  ∪ E).</p>
      <p>In the next section, we introduce GraB to solve this problem.</p>
      <sec id="sec-6-1">
        <title>1https://github.com/aida-ugent/GraB/blob/master/GraB_appendix.pdf</title>
        <p>3</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>REDUCING NETWORK IMBALANCE: GRAB</title>
      <p>In this section, we introduce GraB, a generic method to solve the
imbalance reduction problem, i.e., how to add  links to a network
in order to maximally reduce the imbalance in the network, as
defined in Definition 1. We first provide a sketch of the solution
and then provide more details in the respective sections below.</p>
      <p>The exact minimization problem amounts to finding a set of links
that jointly minimize the imbalance. This exact approach may be
computationally intractable when the number of candidate links is
large, since it may require computing the imbalance reduction for
every possible set of  links. Besides the vast number of possible
sets, to compute the reduction in imbalance we need to re-embed
the network and compute the imbalance again. Recomputing the
embedding is computationally demanding, and is practically
impossible to do even for a moderate number of candidate link sets.
Motivated by these two problems, our approach is as follows.</p>
      <p>Firstly, rather than re-embedding the network and observing the
change in imbalance, we introduce a proxy measure for the change,
based on an infinitesimal change to any link {,  }. We refer to this
proxy measure as the link utility. The link utility is based on three
elements. (1) Since the formulation of  is a linear programming
problem, the derivative of  w.r.t. links cannot be directly computed.
Hence, we introduce a local imbalance measure to be used as a
proxy for  , and quantify the infinitesimal changes in links for that
measure. (2) This local imbalance measure is then used to derive
a measure of the utility of links. (3) The local imbalance measure
relies on the estimation of the density of nodes at any point in
the embedding. For this, we employ multivariate Gaussian kernel
density estimation. These elements are presented in Section 3.1.</p>
      <p>Secondly, the problem that we cannot test the imbalance
reduction for all possible sets of links remains. Link utility does not
account for any interactions between links on the amount of
imbalance in the network. To find a good balance between accuracy
and computational tractability, GraB employs a greedy selection
strategy using link utility in combination with re-embedding the
network every  steps (the batch size). The GraB algorithm
implementing this strategy is introduced in Section 3.2.</p>
      <p>Finally, we need to select a suitable embedding method. We
briefly discuss suitable methods in Section 3.3.
3.1</p>
    </sec>
    <sec id="sec-8">
      <title>The Link Utility Measure</title>
      <p>In the embedding of a network, there are areas where the set  (or
 ) is denser than the other one. Given a network with  nodes and
their corresponding embeddings {1, ...,  }, the idea behind the
proxy measure is that to reduce the imbalance, we should add links
connecting source nodes with auxiliary nodes, such that the source
nodes are moved to areas with a higher density of target nodes and
fewer source nodes. We thus introduce a local imbalance measure
to quantify the density imbalance between source and target node
sets at any point in the embedding. Moving source nodes to areas
with higher local imbalance (more target nodes than source nodes)
would reduce the local imbalance in those areas. We define the
utility of adding a link using the first order of approximation of this
local imbalance measure.
3.1.1 Local Imbalance Quantification. Skipping for a moment how
to quantify the density of a set of nodes at a specific point in the
embedding, we define the local imbalance measure which evaluates
the imbalance between the embedding of two sets of nodes  and
 locally at any point  in the embedding space as follows:</p>
      <p>Definition 2 (Local Imbalance Measure , ). Given a
network  = ( , ) with the embedding  and two disjoint sets of nodes,
source nodes  with ∅ ⊂ ⊂  and target nodes  with ∅ ⊂ ⊂  ,
denote the density function of the target nodes as estimated based
on the embedding  and evaluated at point  by  (;  ), and let
 (;  ) denote the estimated density function for the source nodes
evaluated at . We use the log ratio of the density of the two sets of
nodes as local imbalance measure , evaluated at point :
, (;  ) = ln
 (;  ) .</p>
      <p>(;  )</p>
      <p>If the densities are diferentiable and non-zero everywhere, also
, becomes diferentiable and suitable for optimization.</p>
      <p>Example: Let us illustrate the idea behind GraB and the local
imbalance as a proxy to optimize the imbalance  . Figure 2a shows
a 2D embedding of a toy network with equal number of source
and target nodes. Hence, each source node should be matched with
exactly one target node to compute the imbalance  . Visually, 1
should be matched with 3, since they are close to each other and
the cost of matching them is low. Thus, 1 and 2 should be matched
with 2 and 3, although the matching cost (their distance) would
be high. GraB can then be used to move source nodes 1 and 2 to
areas with higher local imbalance , which is the area where 2
and 3 reside. If we move 1 and 2 closer to 2 and 3, the matching
costs between them would be reduced and the imbalance  would
be reduced as well. GraB’s goal is to find links between source
and auxiliary nodes such that adding them would move the source
nodes to areas with higher local imbalance . GraB uses a link utility
measure for this purpose, which will be discussed in the next section.
Figure 2b shows the 2D embedding of the network after adding two
links {1, 3} and {2, 3} suggested by GraB, demonstrating that
adding the links indeed reduces  , from 3.44 to 2.71.
3.1.2</p>
      <sec id="sec-8-1">
        <title>Link Utility. We can now define the utility of a link:</title>
        <p>Definition 3 (Link Utility). The utility of adding a link {,  }
for reducing the local imbalance at the embedding   of source node
 ∈  is defined as the rate at which the local imbalance evaluated at
  changes when increasing   , or mathematically: , (; ) .</p>
        <p>Note that two efects of adding the link {,  } are accounted for in
this definition: the fact that the embedding   of node  will move,
possibly to a denser or sparser region, and the fact that the density
functions themselves may change. Both efects can be separated by
computing the total derivative:
(1)
(2)
, (;  ) = ∇, (;  )  ()</p>
        <p>+
Õ ∇ , (;  ) ( ) +
Õ ∇ , (;  )  () ,
 
 ∈  ∈
where the first term accounts for the change in position  where
the densities are measured, and the second and third terms account
fo the changes in both estimated densities.</p>
        <p>
          Evaluating all these terms is costly. However, since changing  
has a direct efect only on the embeddings of nodes  and   [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
we argue that the summation terms over target nodes and over
source nodes where  ≠  can be neglected. With  set equal to ,
we can thus approximate the utility as follows:
, ( ;  )
 

≈ ∇, (;  ) + ∇ , (;  ) |= .
(3)
        </p>
        <p>
          Computing this approximation is far more eficient than a
bruteforce computation of the utility. It is scalable especially if the
derivatives  () of the embeddings can be computed analytically. An

embedding method for which that is the case is presented in Sec. 3.3.
3.1.3 KDE as Density Estimation Method. Kernel density
estimation (KDE), also known as Parzen window estimation [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], is a
nonparametric density estimator. The flexibility arising from KDE’s
non-parametric nature makes it a very popular approach for data
drawn from a complicated distribution [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We use KDE as the
density estimator in the local imbalance measure.
 () .
        </p>
        <p />
        <p>
          Definition 4 (Kernel Density Estimation [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]). Given a set of
-dimensional points {1, ...,  } forming the rows of data matrix  ,
and a kernel function , the KDE for an arbitrary point  is defined
as:
 (;  ) =
1 Õ
 =1
 ( −  ).
        </p>
        <p>For the kernel, we use multivariate Gaussian KDE:</p>
        <p>
          () = (2 )−/2 | |−1/2 − 21   −1,
where the so-called bandwidth matrix  is computed using Scott’s
rule [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Thus, we can quantify the utility (Definition 3) of adding
link {,  } at node  by rewriting Eq. (3) with  as the density
estimator. Doing this, observe that ∇  ( − )|= = 0, such
that also ∇ , (;  )|= = 0. Hence, with this KDE as density
estimator, Eq. (3) simplifies to:
, ( ;  )
        </p>
        <p>().
≈ ∇, (;  )|= .  
(4)</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>3.2 The GraB Algorithm</title>
      <p>Having defined the network imbalance measure and the link utility
measure to be used as a proxy, we can now craft a scalable algorithm
to optimize the imbalance. We designed GraB using three concepts.</p>
      <p>1. Greedy link selection: As discussed, solving problem 1
exactly requires computing the imbalance reduction for every possible
set of  links between set  and  , which is computationally
infeasible. Instead, GraB picks links greedily based on the link utility
measure. Although the link selection step aims to provide the most
beneficial links to add to the network, the network embedding
after adding the links might be diferent than expected, due to the
following two issues.</p>
      <p>2. Include links in batches: The first issue is that we are using
the utility of adding a link, assuming that the rest of the network
embedding remains the same. However, the efect of adding a link may
change after another link is included (especially if they are close to
each other in the network). A solution would be to re-embed the
network after the inclusion of each link, but this is computationally
costly. Thus, as a trade-of between cost and accuracy, we add links
to the network in several batches: in each iteration, we select a
batch of  links from the top candidates ranked by link utility, add
them to the network, and re-embed the network. Additionally, since
the embedding of a node is afected more by its direct links, we add
at most one link per source node in a batch.</p>
      <p>
        3. Post-hoc filtering : The second issue is that adding a link
may actually not reduce the imbalance, for example because the
source node moves a lot in the embedding and ‘jumps over’ the area
of higher local imbalance , . This is the case when the derivative
from Definition 3 is not a good approximation to the finite diference
of the local imbalance measure. An example is shown in Figure 3,
which shows the heatmap of , on a 2-dimensional embedding
of the Weibo network [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] using conditional network embedding.
In this example, node  jumps over the area with a high , and
ends up in a worse position in terms of , .
      </p>
      <p>Hence, to find the most beneficial  links to add to the network
in each batch, we instead select the  · top candidate links using
the link utility measure ( ≥ 1 controls how many more links than 
has to be added to the network for re-embedding). After adding  ·
candidate links, we re-embed the network and select the first  links
that caused their connected source nodes be moved to positions
with higher , . The parameter  has to be large enough so that at
least  links would end up in a better position after re-embedding.</p>
      <p>GraB: In summary, we select  links to add to the network to
reduce the imbalance in several batches, each of size . For each
batch, we select the  · top candidate links (at most one link per
source node) using the link utility measure, add them to the network,
re-embed the network, and select the top  links that caused their
connected source nodes to move to positions with higher , .</p>
      <p>We call this method Graph Balancing ( GraB). Full pseudocode
for the algorithm is given in Appendix B2.
3.3</p>
    </sec>
    <sec id="sec-10">
      <title>Choice of Embedding Method</title>
      <p>
        The generic method GraB applies to network embedding methods
where the optimal embeddings are diferentiable w.r.t. changes in
link strength, i.e. where  () for  ∈  and  ∈  and   the
embedding of node , can be evaluated. To the best of our knowledge,
two methods satisfy this requirement: LINE [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and Conditional
network embedding (CNE) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. We chose to use CNE for three
reasons. First, LINE uses the inner product as the similarity measure
between node embeddings, whereas KDE (and CNE) are based
on the Euclidean distance. This mismatch would make the local
imbalance proxy less efective. Second, re-embedding the network
starting from an initial embedding is easily done with CNE, greatly
speeding up GraB. And third, CNE was shown to outperform LINE.
      </p>
      <p>Let   denote the (analytically computable) Hessian of the
objective function of CNE w.r.t.  . Then, with  a hyper-parameter</p>
      <sec id="sec-10-1">
        <title>2https://github.com/aida-ugent/GraB/blob/master/GraB_appendix.pdf</title>
        <p>
          of CNE, Kang et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] showed that  () is given by:
 
 () = −  −1 (  −  ).
        </p>
        <p>4</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>In this section, we describe the experimental evaluation of GraB. In
a qualitative experiment, we investigate question Q1: Does GraB
work as expected in moving source nodes towards areas with more
target nodes compared to source nodes? In the quantitative
experiments, we investigate two questions Q2: How does GraB perform
in reducing the imbalance  compared to the baselines? Q3: Does
GraB constantly reduce imbalance by increasing the number of
links added to the networks? In a hyper-parameter sensitivity
experiment, we investigate question Q4: Is GraB sensitive to the batch
size ? Finally, we investigate question Q5: How does GraB perform
in terms of execution time compared to the baselines?</p>
      <p>We first discuss the datasets, baselines, and settings. Next, we
present the result of each experiment. The source code for repeating
the experiments are available here 3.
4.1</p>
    </sec>
    <sec id="sec-12">
      <title>Datasets</title>
      <p>We evaluated the methods using three datasets described below:</p>
      <p>VDAB 4: VDAB is the employment service of Flanders in
Belgium. It provides a platform for job seekers to find jobs. The dataset
contains a sample of the applications made by job seekers to
available job vacancies from January 2018 until October 2020. We
construct the job market network with three sets of nodes: job seekers,
job vacancies, and skills. Job vacancies are connected to job seekers
that have applied for them and to the skills they require. Our goal
is to reduce the imbalance between job seekers (source nodes) and
job vacancies (target nodes) by adding links connecting job seekers
with skills. This could be seen as teaching a group of job seekers
some skills, in a way that balances the job market network.</p>
      <p>
        Weibo [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]: Weibo is the most popular Chinese microblogging
service. The dataset contains tweets of the users and the topic
distribution of each tweet. We construct the network with three sets
of nodes: male users, female users, and topics. Users are connected
to their friends (reciprocal follow relationships) and the top topics
of their tweets. To find the top topics for each tweet, we first sort
the topic probabilities (relevance of topic for the tweet) in
descending order. Next, we select the top topics until the diference of the
probabilities of two consecutive topics is greater than a very small
threshold (1e-6). We select a sample from the dataset by only
considering tweets for the first year of the data. Our goal is to reduce
the imbalance between males (source nodes or target nodes) and
females (target nodes or source nodes) by adding new links
connecting males/females with topics (auxiliary nodes). It is more like
recommending tweets with specific topics to the users to increase
their interest in that topic. In the experiments, we call this dataset
Weibo-mf if males are the source nodes and females are the target
nodes. Otherwise, we call the dataset Weibo-fm.
      </p>
      <p>
        Movielens [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]: MovieLens is a web-based movie recommender
system. The dataset contains 100000 user ratings on movies. We
construct the network with three sets of nodes: users, movies, and
      </p>
      <sec id="sec-12-1">
        <title>3https://github.com/aida-ugent/GraB 4https://www.vdab.be/</title>
        <p>movie genres. There is a link between each user and movie for
each rating. We also connect each movie to its genres. Our goal is
to reduce the imbalance between movies (source nodes or target
nodes) and users (target nodes or source nodes) by adding new
links connecting movies with genres (auxiliary nodes).</p>
        <p>We only used the largest connected component in each network.
Table 1 shows the main statistics of each of the networks.
4.2</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>GraB-variants and baselines evaluated</title>
      <p>
        As mentioned earlier, we are the first to introduce the concept of
imbalance in a network in the way described in Section 2.2 and to
reduce it by adding links to the network. However, there exist other
methods that try to add links to the networks to make them more
cohesive. We consider two of those methods [
        <xref ref-type="bibr" rid="ref10 ref20">10, 20</xref>
        ] for comparison.
      </p>
      <p>
        Parotsidis et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] minimize the average shortest path in a
network by adding links. Garimella et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] compute controversy
between two sets of nodes using the random walks starting from
one set, and ending in the same or the other set. The main
diference between the imbalance and controversy is that the amount
of links between nodes in the same set has a major efect on the
controversy, which is not necessarily the case in computing the
imbalance. Moreover, we use the distance in the embedding space
as the cost of matching node pairs, while they consider the actual
links in the network to compute the controversy.
      </p>
      <p>We also designed a random method combined with our proposed
greedy algorithm, and a pure random method for comparison.</p>
      <p>In summary, the following methods will be evaluated:
GraB: The main method proposed in this paper.</p>
      <p>S-GraB: ‘Simple Graph Balancing’ is the same as GraB without
comparing the value of , of the previous and the new embedding
of the source node (i.e. without post-hoc filtering). I.e., S-GraB
selects  links connecting source nodes with auxiliary nodes for
each batch from the link selection step. To select  links to add
to the network, S-GraB runs  iterations. S-GraB is evaluated to
assess the importance of the post-hoc filtering.</p>
      <p>Since the problem of graph balancing has not been studied before,
we also propose two simple baselines for comparison:</p>
      <p>
        ROV [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: ’Recommend opposing view’ adds links to the
network to reduce controversy. In this work,  links between high
degree nodes in sets  and  that reduce controversy the most,
are added using a greedy algorithm. We adopt this method for our
problem setting by adding links between sets  and  using the
same method.
      </p>
      <p>
        SSW [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]: ’Shortcuts for a smaller world’ adds links to the
network to reduce the average shortest path length. In this work, 
links are added to the network using diferent strategies. We employ
the greedy strategy, since it has the best performance [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>S-Random: The ‘Simple random’ baseline selects  random links
connecting source nodes with auxiliary nodes.</p>
      <p>I-Random: The ‘Intelligent random’ baseline is the same as
GraB, except that we select random links connecting source nodes
with auxiliary nodes in the link selection step. Since the link
selection step is a random algorithm, adding links in several batches
probably does not help the performance. Hence, I-Random adds
links in one batch and re-embeds the network to select links that
moved their source nodes to positions with higher , (post-hoc
ifltering). We expect that adding random links results in making
the graph denser and nodes tend to be closer to each other.</p>
      <p>In summary, I-Random selects  ·  random links connecting
source nodes with auxiliary nodes, re-embeds the network, and the
same as GraB, it only selects links that helped their source nodes
end up in a position with a higher , (post-hoc filtering).
4.3</p>
    </sec>
    <sec id="sec-14">
      <title>Experimental Settings</title>
      <p>
        In the quantitative experiments, we compare methods in terms of 
(Definition 1). We conduct the experiments on CNE with dimensions
2 and 4 with a combination of block and degree prior (see [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). We
average the results for I-Random and S-Random over 3 repetitions
to smoothen out random fluctuations.
      </p>
      <p>In the qualitative experiment of GraB, we set  = 10 and  = 5.
We run the experiment for 2-dimensional embedding.</p>
      <p>In the experiment in Sec. 4.5.1, the methods are evaluated on
several datasets. We set  = 100 for this experiment. We tune 
from values {25, 100} for GraB and S-GraB. We also tune  from
values {3, 5} for I-Random and GraB (S-GraB does not have
hyperparameter , since it does not need to compare the value of ,
of the previous and the new embedding of the source node). We
use the author’s implementation of computing the controversy in a
network between two sets and used their default hyper-parameters,
which is used in ROV. We used 10% of the high degree source nodes
and 20 high degree auxiliary nodes for the candidate selection in
ROV. We also use the author’s implementation of SSW. SSW does
not require to set any particular hyper-parameter.</p>
      <p>In the experiment in Sec. 4.5.2, we analyze the behavior of GraB
with diferent values of . We increase  from 25 to 1000 by 25 and
report  values. We set  = 25 and  = 5.</p>
      <p>In the experiment in Sec. 4.6, we analyze the behavior of GraB
with diferent values of batch size . We set  = 100 and  = 5 for
this experiment and vary  from 1 to 100.
4.4</p>
    </sec>
    <sec id="sec-15">
      <title>Qualitative Evaluation</title>
      <p>In this section, we show that GraB moves source nodes to the areas
in the embedding space with having more target node percentage
in their neighborhood than before (Q1). We first add 10 links to
each dataset. Next, we analyze two sample source nodes that we
added links to and compare the percentage of target nodes in their
neighborhood. The percentage of target nodes in 25 nearest
neighbors (only neighbors from source nodes and target nodes) of two
sample source nodes for each network is presented in Table 2.</p>
      <p>The number of target nodes in 25 nearest neighbors of the two
source nodes is increased for each network. It means that GraB
moved source nodes to areas where they have more access to target
nodes, and also fewer source nodes are competing with them to
access the same target nodes.
4.5</p>
    </sec>
    <sec id="sec-16">
      <title>Quantitative Evaluation</title>
      <p>4.5.1 Baseline Comparison . Here we compare the methods in
terms of  on all datasets. We report  on the main graph as well
(Q2). Table 3 shows the result of this experiment.</p>
      <p>GraB reduces  over the main graph and outperforms the other
methods in all datasets. S-GraB also outperforms other baselines
in most datasets. This shows that the link selection based on the
link utility is choosing proper links to add to the network, and also
post-hoc filtering of links in GraB improves the results.</p>
      <p>For the baselines, I-Random improves  over the main graph
and S-Random. S-Random however, is not efective and gains the
same results as the main graph. Hence, the greedy algorithm with
post-hoc filtering proposed in this paper (applied in I-Random) is
helpful even with a random link selection.</p>
      <p>The other methods SSW and ROV do not perform particularly
well (in some datasets even increases the imbalance, and the amount
of imbalance reduction in other datasets is less than GraB), since
they are designed for a diferent purpose and objective function.</p>
      <p>As a result, we can conclude that both the link selection step
based on link utility and the greedy algorithm of GraB with
posthoc filtering are crucial to solve the problem and they are both
playing an important role in reducing  of the main graph.
4.5.2 Efect of adding a diferent number of links on GraB. In this
experiment, we evaluate the performance of GraB for diferent
values of , i.e., number of links added to the network (Q3). Figure 4
shows the result of this experiment for all datasets.</p>
      <p>In most datasets,  decreases by adding more links to the
network. The amount of reduction in  mostly decreases, or even in
some datasets  increases as we add more links to the network. The
reason is that in the beginning, there are more links available to
add to the network. As we gradually add the most beneficial links
to the network, the remaining candidate links might have smaller
utilities and less certainty. Hence, GraB selects links from those
candidates which results in worsening the performance globally.
So GraB does not constantly reduce imbalance by increasing the
number of links added to the networks.
In this section, we evaluate the sensitivity of GraB w.r.t. the batch
size  (Q4). Figure 5 shows the result of this experiment.</p>
      <p>Smaller  means fewer links are added to the network and hence,
the network embedding is more stable. As a result, the algorithm
ifnds links with the highest utilities more accurately. The issue of
small values of  is that the efect of adding links on each other is
neglected. For example, consider adding link {,  } and {, } to a
network one by one ( = 1). Since GraB adds {,  } at the first step, it
means that node  has moved to an area with higher local imbalance
, . Yet, it is possible that after adding {, } the local imbalance
, at the position of node  becomes less than its original position
(since moving node  changes the density of the source nodes in
the embedding space). On the other hand, by adding the two links
at the same time ( = 2), {,  } will be filtered in post-hoc filtering
step and will not be selected and there will be the opportunity to
select other links to reduce the imbalance.</p>
      <p>On the other hand, having a large  also is not always the best
option because by adding  · links at the same time, the density
of source nodes and target nodes are not stable and varies a lot
(since by filtering some links and selecting only  links, the actual
densities after re-embedding will be diferent). Moreover, by adding
so many links at the same time, the behavior of the embeddings
is less predictable due to the impact of links on each other. Hence,
the method will not be accurate in selecting and adding links to the
network, to reduce the imbalance.</p>
      <p>Middle values of  show more stable results in most datasets.</p>
      <p>There is a blank spot in Weibo-fm-CNE4 dataset for  = 100
because GraB could not find 100 links to add to the network (because
after re-embedding, not 100 links moved their source nodes to a
position with a higher , ). Hence, we did not report  .
4.7</p>
    </sec>
    <sec id="sec-17">
      <title>Execution Time</title>
      <p>In this experiment, we compare the methods in terms of the
execution time (Q5) with the same settings as experiment 4.5.1. Figure 6
shows the execution time in seconds (log-scale) for all methods,
including the time for hyper-parameter tuning.</p>
      <p>GraB and ROV have the highest execution time among all
methods. GraB has a high execution time due to the number of
hyperparameters to be tuned, the link selection step, and also re-embedding
after adding each batch. ROV has a high execution time due to the
large number of candidates and the time needed for computing the
controversy after adding each candidate link to the graph.</p>
      <p>S-GraB has a greater execution time than I-Random. For more
analysis, we first count the number of times they need to re-embed
the networks. I-Random re-embeds the networks once for each
hyper-parameter selection because it does not add links in batches.
Since we tune  from 2 values, I-Random selects links and re-embeds
the networks 2 times in total. S-GraB re-embeds the networks  − 1
times for each value of  . Since  = 100 and we tune  from values
{25, 100}, S-GraB re-embeds the networks 3 times in total. Besides,
S-GraB performs the link selection step  times. Thus, it performs
the link selection step 5 times in total (once when  = 100 and 4
times when  = 25). Moreover, the link selection step is more time
consuming in S-GraB than I-Random which selects links randomly.
Hence, S-GraB is slower than I-Random.</p>
      <p>Moreover, S-Random is a simple random method and the
execution time is almost zero for all datasets.
5</p>
    </sec>
    <sec id="sec-18">
      <title>RELATED WORK</title>
      <p>
        Imbalance in the workforce has been studied in various systems [
        <xref ref-type="bibr" rid="ref27 ref29">27,
29</xref>
        ]. However, our work difers from these studies. Previous studies
are mostly domain-specific and they analyze the supply and demand
based on domain-specific features such as educational training
program length, retirement, and salary. Previous studies in this
area also lack a global measure to quantify the imbalance between
two diferent entities. In contrast to this, we tackle the problem
from a graph analysis approach. We propose a quantification of
the imbalance in the network using its embedding and a method to
reduce the imbalance by adding links to the network.
      </p>
      <p>
        Another line of research focuses on matching problems [
        <xref ref-type="bibr" rid="ref18 ref22">18, 22</xref>
        ].
Finding the cost of the fractional perfect b-matching [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with
minimum cost in bipartite networks is related to our work. In this
problem, the goal is to find a matching between two sets of nodes
in a network with minimum total cost. We define the imbalance
as the cost of the minimum cost fractional perfect b-matching on
a new bipartite network created from the original network. Our
work difers from the studies focusing on matching, since we do not
address the computational problem of how to find the matching, we
only use the cost of the matching to quantify the imbalance.
Moreover, we add links to the network (not directly between the two
sets of nodes of interest), to change the cost of links between the
two sets of nodes, and hence, reduce the imbalance in the network.
      </p>
      <p>
        Fairness in node embeddings is studied in various research
papers [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. These studies try to learn an unbiased network
embedding. The similarity between learning an unbiased embedding
and reducing the imbalance in a network embedding is that they
both try to have a mix between nodes with diferent attributes or
diferent types (job seeker and job vacancy, female and male, etc.).
The main diference is that debiasing methods learn an unbiased
embedding based on the original network, while we modify the
network in order to make it more balanced. A secondary diference
is in the quantification of the imbalance or unfairness. In our setting,
we intend to bring two sets of nodes closer, to reduce the imbalance,
while the goal to reduce unfairness is that the two sets of nodes
cannot be separated, which is not important in our case.
      </p>
      <p>
        There are also several papers aiming to add links to a network
to modify the network structure. Some of the researches focus on
adding links to a network to make it more cohesive, where
cohesiveness is quantified using network properties such as shortest
paths [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ], diameter [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], information unfairness [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
controversy [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and structural bias [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The paper by Garimella et al.
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is most related to our work, since they add links to the network
to reduce the controversy between two sets of nodes. However,
the diference between our works is that we consider a diferent
measure to compute the imbalance and use a diferent approach to
optimize it.
6
      </p>
    </sec>
    <sec id="sec-19">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>We defined and quantified the concept of imbalance between two
sets of nodes in a network, and introduced the novel problem of
reducing that imbalance by introducing new links. We proposed
GraB, a scalable algorithm to tackle this problem, leveraging a
number of well-motivated heuristics to trade-of speed with accuracy.
We presented experiments applying GraB together with CNE as the
network embedding method to various networks. The
experimental results indicate that GraB outperforms (also newly proposed)
baselines for reducing imbalance in a network embedding.</p>
      <p>In future work, we plan to investigate the benefits of a new link
for individual nodes (e.g., improving the access to a target set of
nodes) instead of just for the global balance of the network, as
well as other problem settings such as reducing the imbalance in a
network by removing a specific number of links from the network
(e.g., changing job contents and required skills, making jobs more
accessible), or both adding and removing links at the same time.
Moreover, a more detailed investigation of the impact of
hyperparameters such as the KDE bandwidth would be useful.</p>
    </sec>
    <sec id="sec-20">
      <title>ACKNOWLEDGMENTS</title>
      <p>The research leading to these results has received funding from the
European Research Council under the European Union’s Seventh
Framework Programme (FP7/2007-2013) / ERC Grant Agreement
no. 615517, from the Flemish Government under the
“Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme,
and from the FWO (project no. G091017N, G0F9816N, 3G042220).
Part of the experiments were conducted on pseudonimized HR data
generously provided by VDAB.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Roger</surname>
            <given-names>E</given-names>
          </string-name>
          <string-name>
            <surname>Behrend</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Fractional perfect b-matching polytopes I: General theory</article-title>
          .
          <source>Linear Algebra Appl</source>
          .
          <volume>439</volume>
          ,
          <issue>12</issue>
          (
          <year>2013</year>
          ),
          <fpage>3822</fpage>
          -
          <lpage>3858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Avishek</given-names>
            <surname>Joey Bose and William L Hamilton</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Compositional fairness constraints for graph embeddings</article-title>
          . arXiv preprint arXiv:
          <year>1905</year>
          .
          <volume>10674</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Maarten</given-names>
            <surname>Buyl and Tijl De Bie</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>DeBayes: a Bayesian method for debiasing network embeddings</article-title>
          . arXiv preprint arXiv:
          <year>2002</year>
          .
          <volume>11442</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Hongyun</given-names>
            <surname>Cai</surname>
          </string-name>
          , Vincent W Zheng, and
          <string-name>
            <surname>Kevin</surname>
            <given-names>Chen-Chuan</given-names>
          </string-name>
          <string-name>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A comprehensive survey of graph embedding: Problems, techniques, and applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>30</volume>
          ,
          <issue>9</issue>
          (
          <year>2018</year>
          ),
          <fpage>1616</fpage>
          -
          <lpage>1637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ramon</given-names>
            <surname>Ferrer I Cancho</surname>
          </string-name>
          and Richard V Solé.
          <year>2001</year>
          .
          <article-title>The small world of human language</article-title>
          .
          <source>Proceedings of the Royal Society of London. Series B: Biological Sciences</source>
          <volume>268</volume>
          ,
          <issue>1482</issue>
          (
          <year>2001</year>
          ),
          <fpage>2261</fpage>
          -
          <lpage>2265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Yen-Chi Chen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A tutorial on kernel density estimation and recent advances</article-title>
          .
          <source>Biostatistics &amp; Epidemiology</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>2017</year>
          ),
          <fpage>161</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Erik</surname>
            <given-names>D</given-names>
          </string-name>
          <string-name>
            <surname>Demaine</surname>
            and
            <given-names>Morteza</given-names>
          </string-name>
          <string-name>
            <surname>Zadimoghaddam</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Minimizing the diameter of a network using shortcut edges</article-title>
          .
          <source>In Scandinavian Workshop on Algorithm Theory</source>
          . Springer,
          <fpage>420</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Linton</surname>
            <given-names>C</given-names>
          </string-name>
          <string-name>
            <surname>Freeman</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Visualizing social networks</article-title>
          .
          <source>Journal of social structure 1</source>
          ,
          <issue>1</issue>
          (
          <year>2000</year>
          ),
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Sheng</given-names>
            <surname>Gao</surname>
          </string-name>
          , Huacan Pang, Patrick Gallinari, Jun Guo, and
          <string-name>
            <given-names>Nei</given-names>
            <surname>Kato</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A novel embedding method for information difusion prediction in social network big data</article-title>
          .
          <source>IEEE Transactions on Industrial Informatics</source>
          <volume>13</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>2097</fpage>
          -
          <lpage>2105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kiran</surname>
            <given-names>Garimella</given-names>
          </string-name>
          , Gianmarco De Francisci Morales, Aristides Gionis, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Reducing controversy by connecting opposing views</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining</source>
          .
          <fpage>81</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Palash</given-names>
            <surname>Goyal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Emilio</given-names>
            <surname>Ferrara</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Graph embedding techniques, applications, and performance: A survey</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>151</volume>
          (
          <year>2018</year>
          ),
          <fpage>78</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          . node2vec:
          <article-title>Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          .
          <volume>855</volume>
          -
          <fpage>864</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Shahrzad</surname>
            <given-names>Haddadan</given-names>
          </string-name>
          , Cristina Menghini, Matteo Riondato, and
          <string-name>
            <given-names>Eli</given-names>
            <surname>Upfal</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions</article-title>
          .
          <source>arXiv preprint arXiv:2101.04751</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F Maxwell</given-names>
            <surname>Harper and Joseph A Konstan</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The movielens datasets: History and context</article-title>
          .
          <source>Acm transactions on interactive intelligent systems (tiis) 5</source>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Zeinab</surname>
            <given-names>S Jalali</given-names>
          </string-name>
          , Weixiang Wang, Myunghwan Kim, Hema Raghavan, and
          <string-name>
            <given-names>Sucheta</given-names>
            <surname>Soundarajan</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>On the Information Unfairness of Social Networks</article-title>
          .
          <source>In Proceedings of the 2020 SIAM International Conference on Data Mining. SIAM</source>
          ,
          <fpage>613</fpage>
          -
          <lpage>521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Bo</surname>
            <given-names>Kang</given-names>
          </string-name>
          , Jefrey Lijfijt, and Tijl De Bie.
          <year>2018</year>
          .
          <article-title>Conditional network embeddings</article-title>
          . arXiv preprint arXiv:
          <year>1805</year>
          .
          <volume>07544</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Bo</surname>
            <given-names>Kang</given-names>
          </string-name>
          , Jefrey Lijfijt, and Tijl De Bie.
          <year>2019</year>
          .
          <article-title>Explaine: An approach for explaining network embedding-based link predictions</article-title>
          . preprint arXiv:
          <year>1904</year>
          .
          <volume>12694</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Kolmogorov</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Blossom V: a new implementation of a minimum cost perfect matching algorithm</article-title>
          .
          <source>Mathematical Programming Computation</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>2009</year>
          ),
          <fpage>43</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Manos</surname>
            <given-names>Papagelis</given-names>
          </string-name>
          , Francesco Bonchi, and
          <string-name>
            <given-names>Aristides</given-names>
            <surname>Gionis</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Suggesting ghost edges for a smaller world</article-title>
          .
          <source>In Proceedings of the 20th ACM international conference on Information and knowledge management</source>
          .
          <volume>2305</volume>
          -
          <fpage>2308</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Nikos</surname>
            <given-names>Parotsidis</given-names>
          </string-name>
          , Evaggelia Pitoura, and
          <string-name>
            <given-names>Panayiotis</given-names>
            <surname>Tsaparas</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Selecting shortcuts for a smaller world</article-title>
          .
          <source>In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM</source>
          ,
          <fpage>28</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Emanuel</given-names>
            <surname>Parzen</surname>
          </string-name>
          .
          <year>1962</year>
          .
          <article-title>On estimation of a probability density function and mode</article-title>
          .
          <source>The annals of mathematical statistics 33</source>
          ,
          <issue>3</issue>
          (
          <year>1962</year>
          ),
          <fpage>1065</fpage>
          -
          <lpage>1076</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Lyle</given-names>
            <surname>Ramshaw</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert E</given-names>
            <surname>Tarjan</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>On minimum-cost assignments in unbalanced bipartite graphs</article-title>
          .
          <source>HP Labs</source>
          , Palo Alto, CA, USA,
          <source>Tech. Rep. HPL-2012- 40R1</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Yossi</surname>
            <given-names>Rubner</given-names>
          </string-name>
          , Carlo Tomasi, and
          <string-name>
            <surname>Leonidas</surname>
          </string-name>
          J Guibas.
          <year>1998</year>
          .
          <article-title>A metric for distributions with applications to image databases</article-title>
          .
          <source>In Sixth International Conference on Computer Vision</source>
          (IEEE Cat.
          <source>No. 98CH36271)</source>
          . IEEE,
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>David</surname>
            <given-names>W Scott.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Multivariate density estimation: theory, practice, and visualization</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Jian</surname>
            <given-names>Tang</given-names>
          </string-name>
          , Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and
          <string-name>
            <given-names>Qiaozhu</given-names>
            <surname>Mei</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Line: Large-scale information network embedding</article-title>
          .
          <source>In Proceedings of the 24th international conference on world wide web. 1067-1077.</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Athanasios</surname>
            <given-names>Theocharidis</given-names>
          </string-name>
          , Stjin Van Dongen,
          <string-name>
            <surname>Anton J Enright</surname>
          </string-name>
          , and Tom C Freeman.
          <year>2009</year>
          .
          <article-title>Network visualization and analysis of gene expression data using BioLayout Express 3D</article-title>
          .
          <source>Nature protocols 4</source>
          ,
          <issue>10</issue>
          (
          <year>2009</year>
          ),
          <fpage>1535</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Graham</surname>
            <given-names>Willis</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Woodward</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Siôn</given-names>
            <surname>Cave</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Robust workforce planning for the English medical workforce</article-title>
          .
          <source>In Conference Proceedings, The 31st International Conference of the System Dynamics Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Jing</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Biao Liu, Jie Tang, Ting Chen, and
          <string-name>
            <given-names>Juanzi</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Social influence locality for modeling retweeting behaviors</article-title>
          .
          <source>In Twenty-Third International Joint Conference on Artificial Intelligence .</source>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Pascal</surname>
            <given-names>Zurn</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mario R Dal Poz</surname>
            , Barbara Stilwell, and
            <given-names>Orvill</given-names>
          </string-name>
          <string-name>
            <surname>Adams</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Imbalance in the health workforce</article-title>
          .
          <source>Human resources for health 2</source>
          ,
          <issue>1</issue>
          (
          <year>2004</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>