<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IEEE Journal of Selected Topics in Signal Processing 8 (2014) 514-523.
[15] M. De Domenico</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j</article-id>
      <title-group>
        <article-title>The Challenge of Finding Degree Centrality Nodes in Heterogeneous Multilayer Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kiran Mukunda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhishek Santra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharma Chakravarthy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IT Lab and CSE Department, University of Texas at Arlington</institution>
          ,
          <addr-line>Arlington, Texas</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2004</year>
      </pub-date>
      <volume>108</volume>
      <fpage>514</fpage>
      <lpage>523</lpage>
      <abstract>
        <p>Complex data sets with diferent types of entities and relationships can be elegantly modeled using Heterogeneous Multilayer Networks (HeMLNs), where diferent sets of nodes are connected within and across layers. To identify highly influential nodes in these networks, it is imperative that we are able to define and compute centrality metrics directly on MLNs. Currently, MLNs are converted into a single graph using aggregate and projection alternatives, and centrality and other metrics are computed. However, this approach has been shown to lose information, and structure, and makes result interpretation dificult. In this paper, we extend the simple graph degree centrality definition to HeMLNs and use the novel decoupling approach. For this, we propose heuristics to develop algorithms for identifying degree centrality nodes in heterogeneous MLNs. The proposed heuristics improve the accuracy with respect to ground truth when additional information from each layer is used to improve accuracy with respect to ground truth. However, identifying that additional minimal information is the challenge. We provide intuition behind the heuristics proposed and provide extensive experimental results using large and diverse synthetic and real-world data sets to demonstrate improved accuracy, precision, and eficiency across graph characteristics. We have also shown that the decoupling approach is significantly more eficient than the computation of ground truth.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>Graphs represent relationships between entities in a system using nodes and edges. This
representation allows us to model and perform various types of analysis depending on the
relationships in the data. For example, the individuals in a friendship network can be related
to their residential cities in a transport network; authors can be related if they publish at the
same conferences, etc. As graph data sets are becoming larger and more complicated in the
real world, we need to expand not only the analysis methodologies but also representations in
appropriate ways.</p>
      <p>
        One way to handle both the size and complexity of relationships in a data set is to use
alternative modeling approaches, such as multilayer networks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. Instead of a single
graph trying to capture all the relationships, a separate layer is used for each relationship
making the representation or the model easy to understand. Even the relationships between the
nodes in diferent layers can be captured in this model. Hence, MLNs are becoming popular for
big data analysis. However, the downside is that there are not many algorithms for computing
the analysis objectives (e.g., community, centrality, substructure, etc.) on MLNs directly. They
are typically converted into a single graph representation to use existing algorithms. Research
in this direction is becoming important due to the benefits of MLNs for modeling, eficient
analysis, and the ability to handle large data sets in a flexible manner. All these are illustrated
in this paper for degree centrality computation of heterogeneous MLNs.
      </p>
      <p>There are two distinct types of multilayer
networks: Homogeneous and Heterogeneous.</p>
      <p>If each layer of a MLN has a common set of
entities, they are homogeneous MLN (HoMLNs).</p>
      <p>For example, the US Airline data set can be
modeled using a HoMLN, where nodes in
each layer represent the cities and edges
correspond to the flights between cities as shown
in Figure 1 (a). The other type of multilayer
network is the heterogeneous MLN (HeMLN),
where the sets of entities are diferent across
layers. IMDb data set requires HeMLNs to
model actors, directors, and movies as
difer</p>
      <p>Figure 1: MLN Types ent layers along with their inter-layer edges
as shown in Figure 1 (b) to capture relationships, such as directs-an-actor, directs-a-movie, etc.</p>
      <p>
        Current approaches to centrality detection in MLNs, such as type-independent [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
projection-based [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], do not support structure and semantics preservation without elaborate
mappings as they aggregate (or collapse) layers into a simple graph in diferent ways. As
observed in the literature, without additional mappings, currently-used aggregation approaches
are likely to result in some information loss [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], distortion of properties [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or hide the efect
of diferent entity types and/or diferent intra- or inter-layer relationships as elaborated in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Furthermore, structure and semantics preservation is critical for understanding the layer to
which the nodes of interest belong during drill-down analysis of results. From an analysis
perspective, lack of structure and semantics makes the drill-down and visualization of results
extremely dificult (or even impossible) and hence their understanding. In our approach, analysis
results clearly show the structure and ease of drill-down to see patterns in terms of original
layers, labels, and relationships.
      </p>
      <p>Centrality nodes are the most influential nodes in a network or graph. Diferent centrality
metrics, such as degree, betweenness (multiple types), closeness, eigenvector, katz centrality,
page rank, and percolation centrality are defined for single graphs for various purposes. Some
of them are local (e.g., degree) and some are global (e.g., betweenness, closeness) properties of
the network. Typically, it takes more efort to compute global measures as compared to local
ones. Main memory algorithms exist for computing the above metrics which we leverage in
our decoupling-based approach. In this paper, we focus on degree centrality metric. In general,
degree centrality defines the relative importance of a node within the given network based on
the number of edges incident on it, that is its immediate or 1-hop neighborhood. It can be used
to infer the hubs of an airline network. Although there are algorithms for computing it for a
graph, to the best of our knowledge, there is no proper definition and algorithm that can be
applied directly to MLNs. Using aggregation or projection has a number of disadvantages as
discussed earlier. Hence, the focus of this paper is to develop such algorithms using a novel
technique termed the decoupling approach.</p>
      <p>Contributions of this paper are:
• Degree Centrality definition for Heterogeneous MLNs
• Two heuristics to improve accuracy, precision, and eficiency of computed results based
on decoupling-based approach
• Algorithms for computing degree centrality nodes directly on HeMLNs
• Experimental analysis on a number of large (200K vertices and 12 Million edges)
synthetic, and real-world graphs with diverse characteristics
• Accuracy, Precision, and Eficiency comparisons with ground truth and naive
approach</p>
      <p>The rest of the paper is organized as follows: Section 2 discusses related work. Section 3
introduces the decoupling approach used for MLN analysis and discusses its advantages and
challenges. Section 4 gives the degree centrality definition for HeMLNs. Section 4.1 discusses
ground truth, naive approach, and the challenges involved in developing techniques for centrality
detection. Section 4.2 and 4.3 give the details of the proposed heuristics. Section 5 describes the
experimental setup, data sets, and result analysis. Conclusions are in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The concept of centrality of a graph was first proposed by Bavelas in 1948 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and it has
been researched extensively thereafter. Traditionally, these centrality measures have been
implemented as main memory algorithms. With the advent of social media and web 2.0, the
amount of data being used for analysis has exploded in volume which has challenged the
computation of these on large data sets. Here we summarize the computation of centrality on
simple graphs. Our focus, however, is to develop algorithms for centrality on heterogeneous
multilayer networks using the decoupling-based framework proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Degree of a node is defined as the total number of edges that are incident on it. It was used as
a centrality measure by Shah in 1954. If the network is directed, then the total number of edges
that are directed towards the node is called indegree and that are directed outwards from the
node is called outdegree [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. To normalize, the degree of a node is divided by the maximum
number of nodes it can have edges with, that is | | − 1 for an undirected graph.
      </p>
      <p>
        While the above algorithms are focused on a single graph, there is a need to extend them
to MLN. There has been some work to detect degree centrality in homogeneous multilayer
networks (also called multiplexes), where the same set of nodes are connected in multiple
layers/networks [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. In this paper, we focus on computing degree centrality directly on
heterogeneous MLNs where each layer has a diferent set of nodes that are connected by
intraand inter-layer edges. Further, we use the decoupling approach due to its advantages.
3. Decoupling Approach For Multilayer Networks (or MLNs)
Multilayer networks consist of multiple layers of simple graphs where each layer represents
a feature of its entities and their relationships in the graph. MLNs are being used to model
and analyze large and complex data sets from diverse domains, such as social networks for
community detection [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], anomaly detection [14], biological networks to analyze the patterns
of human brains [15], finding solutions to oil leakages [ 16], and so on. However, most
algorithms convert the MLN (or a subset of it) into a simple graph using aggregation or projection
techniques, leading to a loss of structure, semantics, and information from the final analysis
results. Moreover, the existing single graph algorithms cannot be applied directly on the MLNs.
      </p>
      <p>In this paper, we use a novel
decoupling-based framework
adopted by a few of the
recent works, which preserves
the structure and semantics of
HeMLNs [17, 18] while
performing analysis on complex
data sets without losing any
information, unlike the
tradi</p>
      <p>
        Figure 2: Decoupling Approach for HeMLN Degree Hubs tional approach. The network
decoupling approach has been illustrated with respect to the degree centrality computation
in Figure 2. It consists of two functions: analysis (Ψ) and composition (Θ). Using the analysis
function, each layer in the network is independently analyzed. Then, the partial results from any
two layers are combined with the inter-layer edges and processed by the composition function
to produce the results for the two layers. This binary composition can be easily extended to n
layers by applying it repeatedly on previous results. It is also possible to analyze each layer in
parallel to improve the eficiency [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Further, due to the layer-wise analysis, each graph is small, which requires less memory
for computing layer-wise results. Each layer is analyzed once, and the existing single graph
algorithms can be used for analyzing individual layers. The results obtained are then used by
the composition function. This approach is also application-independent.</p>
      <p>
        In this approach, the major challenge is to develop the composition algorithm. For accuracy
guarantees, it becomes critical to determine the minimal additional information required from
each layer during analysis phase to be used for composition, without efecting overall eficiency.
4. Degree Centrality: HeMLN Definition and Heuristics
Degree centrality tells us about the relative importance of a node within the network based on
the number of edges it has. While degree centrality is defined for a single graph, there are no
definitions/algorithms for HeMLNs that we are aware of 1. We first define the degree centrality
measure for HeMLNs. Note that there are multiple ways to define the degree centrality for a
1On the other hand, degree centrality for HoMLNs is defined in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as cross-layer degree centrality (or CLDC)
which corresponds to degree centrality definition using the Boolean OR operation in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>HeMLN. We take the traditional aggregation approach to define them. This definition extends
the definition for single graphs to MLNs using union (or Boolean OR) aggregation.
Definition 1. A heterogeneous multilayer network   (, ), is defined by two sets of
graphs. The set  = {1, 2, . . . , } contains simple graphs, where (, ) is defined
by a set of vertices  and a set of edges . An edge (, ) ∈ , connects vertices  and ,
where ,  ∈ . The set  = {1,2, 1,3, . . . , − 1,} consists of bipartite graphs. Each graph
, (,  , , ) is defined by two sets of vertices  and  and a set of edges (or links) , , such
that for every link (, ) ∈ , ,  ∈  and  ∈  , where  ( ) is the vertex set of graph 
( ). For a HeMLN, the set  is defined only for those layers that have inter-layer edges. For
HeMLNs, ’s are disjoint and some ’s may be empty.</p>
      <p>Definition 2. Degree centrality of a node, z, in a   (, ) with n layers where |V| =
Σ=1|| is defined as,
()
  () = (1)
| | − 1
where, () denotes the number of 1-hop neighbors of node  in the MLN. The
highdegree centrality nodes (also called degree hubs) are the ones that have more than or equal to the
average value.</p>
      <p>The above definition of degree centrality covers both single graphs and HeMLNs. In this
paper, we focus on detecting the degree centrality hubs using the decoupling-based approach
for undirected graphs using the above definition.
4.1. Ground Truth, Naive Approach for Baseline Accuracy, and Challenges
The ground truth for the HeMLN is
computed on the ground truth graph as per
centrality definition given above. This graph is
the union (or Boolean OR on the edges of
the layers and includes inter-layer edges) of
HeMLN layers resulting in a single graph.
Existing algorithms are used for computing the
degree centrality nodes of the ground truth
graph. This can be done for any k layers of the
HeMLN. The composition step, being binary,
uses two layers. Results of the heuristic-based
algorithms using the decoupling approach are Figure 3: Degree Hubs (marked in yellow) in the
validated against this ground truth. individual layers vs. entire HeMLN</p>
      <p>The naive approach for computing degree centrality hubs of the HeMLN using the
decoupling approach is used as a baseline for comparing and improving the accuracy and precision
with proposed heuristic-based approaches. Naive composition takes hubs (computed
independently) from each layer, performs union of those hubs (due to OR aggregation), and considers
them to be the hubs for the corresponding two-layer HeMLN. This baseline approach does not
use any additional information from the layers. However, based on observations 1 and 2 below,
results from the naive approach are not likely to match the ground truth results due to the
presence of false positives and false negatives, respectively. This is considered a minimum/baseline
accuracy that we can further improve by using selective additional information from each layer
and inter-layer edges during composition.</p>
      <p>Observation 1. A node that is a hub in either layer  or , may not be a hub in the HeMLN
of , , and ,
Example: It can be clearly observed from Figure 3 that node  despite being a hub in layer ,
does not have enough inter-layer edge connectivity with the nodes in layer , and thus ceases
to be a hub in the HeMLN of , , and ,.</p>
      <p>Observation 2. A node that is not a hub in either layer  or , may become be a hub in the
HeMLN of , , and ,
Example: Again, from Figure 3, it can be observed that node  despite being a node that has
low 1-hop connectivity with other nodes in layer  (that is a non-hub), has many inter-layer
edges with the nodes in layer , and thus becomes a hub in the HeMLN of , , and ,.</p>
      <p>The above observations clearly indicate why false positives and false negatives can be
generated by the naive composition depending on layer characteristics and inter-layer edges.
Therefore, in order to obtain, in general, accuracy closer to the ground truth (or always
matching the ground truth), the challenge is to identify the additional information that needs to
be maintained from the layers. Our goal is to develop heuristics for composition functions
that maximize accuracy (expressed as Jaccard coeficient with respect to ground truth) and the
computation cost is still significantly below the ground truth computation cost . We test results
from heuristic-based algorithms with the ground truth and naive approach. Our heuristics can
be applied repeatedly to the results and composed with other layers. This will involve ordering
of computation to maximize accuracy. Due to space constraints, in this paper, we validate
results on two layers. In general, heuristics can be applied as a binary function for any number
of layers.</p>
      <sec id="sec-2-1">
        <title>4.2. Heuristic HeMLN-PG for Precision Guarantee</title>
        <p>The question is what additional information from layers can we use that can guarantee reduction
or elimination false positives and negatives. What can we gain if we keep the degree value of
all the hubs from each layer as well as the information to compute the average degree? It turns
out that this can totally eliminate false positives as indicated by the following lemma.
Lemma 1. By keeping the number of 1-hop neighbors (or only degree information) for all hubs
along with the number of vertices and edges in layers  and , false positives can be completely
eliminated in the computation of degree hubs in the HeMLN generated by , , and , using
the decoupling approach.</p>
        <p>Proof. Based on observation 2, a non-hub from a layer may become a hub in the HeMLN and is
not detected as such. This results in a false negative. With the information kept only for hubs,
this can happen. However, a false positive is one where a layer hub becomes a non-hub in the
HeMLN, but is detected as a hub. This cannot happen if degree information for all hubs (from
each layer) is kept and their degrees are updated correctly using inter-layer edges. Also, the
average degree of the HeMLN can be correctly computed using node, edge information from
each layer, and inter-layer edges. Hence, whether a previous layer hub can still be a HeMLN
hub or not can be correctly determined without generating any false positives.</p>
        <p>Based on lemma 1, heuristic HeMLN-PG is proposed. Algorithm 1 presents the composition
function algorithm.</p>
        <p>Algorithm 1 Composition Algorithm Θ for Heuristic HeMLN-PG
INPUT:
, ||, ||, , ||, ||
 = {1 : 1, 2 : 2, ...,  : }
 = {1 : 1, 2 : 2, ...,  : }
, = {(1, 1), (2, 2), ...}
ALGORITHM:
1: , ← ∅
2: for (, ) ∈ , do
3: if u ∈  || [] == 1 then
4: [] = [] + 1
5: else
6: [] = 1
7: end if
8: if v ∈  || [] == 1 then
9: [] = [] + 1
10: else
11: [] = 1
12: end if
13: end for
14: , = 2* (||+||+|,|)</p>
        <p>||+||
15: for [] ∈  ∪  do
16: if [] ≥ , then
17: , ← , ∪ 
18: end if
19: end for</p>
        <p>During analysis, for each layer , in addition to degree hubs (), we collect their degree
values ([]), total number of nodes (||), and edges (||) as additional information to be
used during composition.</p>
        <p>In the composition function, for each inter-layer edge ((, ) ∈ , ), the degree value of
both the nodes ([],  []) on which the edge is incident upon is incremented by 1. This
step will be able to calculate the correct degree of nodes that are hubs in the individual layers
and approximate the degree of other nodes as their original layer degree are unavailable. The
average degree of HeMLN (, ) is calculated as two times the number of total edges in
the HeMLN (|| + | | + |, |) divided by the total number of nodes in the HeMLN (|| + | |).
The estimated degree hubs for HeMLN are the ones whose final degree value after scanning all
inter-layer edges is greater than or equal to the average degree of the entire network. In order
to increase the degree for each inter-layer edge, a hash lookup is carried out on both nodes.</p>
        <p>The proposed heuristic has been illustrated in figure 4 (top half). Here, the hubs from layer
 are nodes C (deg = 3) and E (deg = 4), and the hub from layer  is node Q (deg = 3). The
degree of these nodes gets increased in the composition function for each inter-layer edge they
are a part of. The 6 inter-layer edges are marked (F, Q), (F, R), (F, S), (E, P), (E, S), (E, R), and (E, Q).
Therefore, we add to the degrees of nodes E, Q, F, P, R, and S. Since F, P, R, and S are not degree
hubs in the layers and we did not record their layer degree information, thus, their degrees
are initialized to 1 in the composition function, when the first corresponding inter-layer edge
is encountered. Heuristic HeMLN-PG identifies E, and Q as the degree hubs for the HeMLN
as these nodes have degrees more than or equal to the average degree (≥ 3.4). In this case, a
false negative was generated in the form of node F, which also came out as a hub as part of the
ground truth (E, F, Q) in Figure 3.</p>
        <p>HeMLN-PG will never generate false positives, as it is able to correctly calculate the degree
information of hubs which moreover gets compared against the correct value of the average
HeMLN degree. That is, the precision for this heuristic is always accurate.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.3. Heuristic HeMLN-AG for Accuracy Guarantee</title>
        <p>The major drawback of HeMLN-PG is that it may generate some false negatives as it does not
calculate the correct degree value of the nodes that are not layer-wise hubs, thus leading to
inconsistent recall values. However, if we keep degree information for all nodes in each layer,
this can be avoided. This leads to the following lemma.</p>
        <p>Lemma 2. It is suficient to maintain the number of 1-hop neighbors of each node (not just the
hubs) along with the number of nodes and edges in layers  and  to compute degree hubs in
the HeMLN generated by , , and , accurately (i.e., to match the ground truth) using the
decoupling approach.</p>
        <p>Proof. Using the inter-layer edges, the revised degree of each node in each layer can be correctly
computed. The average HeMLN degree can also be computed using the number of layer nodes,
intra-, and inter-layer edges. Hence, degree hubs for the HeMLN can be correctly computed.</p>
        <p>Note that anything less than the above is not suficient (although may not be necessary in
many cases) to compute degree hubs correctly. Any node can become a hub in the HeMLN
depending upon the number of inter-layer edges connected to it which is not available prior to
composition step. In order to understand this better, we conducted experiments using a
percentage of high degree nodes (instead of all nodes) and it turns out that a smaller
percentage – 25% to 50% – can give very high accuracy (shown in Fig. 9 of Section 5.)</p>
        <p>The two heuristics also clearly indicate the relevance and amount of additional information
needed for the decoupling-based approach for improving accuracy. If only precision is required,
less additional information can be used. We will also show that even with the additional
information used for obtaining ground truth accuracy, the cost incurred by the
decouplingbased approach is significantly less than that of ground truth computation cost.</p>
        <p>On the basis of lemma 2, the second heuristic has been proposed to eliminate both false
negatives and false positives to provide accuracy guarantee. In this case, from the analysis
phase along with the degree hubs information, we also retain the degree values of the remaining
nodes from each layer. We do not add any changes to the composition function, so it remains
the same as HeMLN-PG, where the degree of a node is updated if there is an inter-layer edge
incident to that node. Thus, we are able to correctly compute the degree value of every node using
this heuristic. This is also illustrated in the bottom half of Figure 4, where all the hub nodes (E,
F, Q) are correctly generated.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Experimental Analysis</title>
      <sec id="sec-3-1">
        <title>5.1. Data Sets</title>
        <p>Both synthetic and real-world data sets have been used for validating accuracy and performance
gain. Subgen [19] and Recursive-Matrix (R-MAT) ([20]) are used to create synthetic data sets. A
parallel version of R-MAT termed Parallel R-MAT (PaRMAT) has been used to create large data
sets.</p>
        <p>Synthetic Graphs: Table 2 provides a list of synthetic data sets we used in our experiments
along with their characteristics. Graph sizes vary from 25K vertices and 100K edges to 200K
vertices and 10 Million edges. Their characteristics vary as well in terms of node distribution in
layers, sparsity, number of connected components, and average degree.</p>
        <p>Layer #Nodes #Intra-Layer Edges Inter-Layer
Actor 9486 996527 Actor-Director
Director 4511 250845 Actor-Movie
Movie 7952 8777618 Director-Movie
(a) IMDb Data Set
#Inter-Layer Edges
32033
31422
8581</p>
        <p>Layer #Nodes #Intra-Layer Edges Inter-Layer #Inter-Layer Edges
Coauthor 16918 2483 Author-Paper 37142
Papers 10326 12044080 Author-Year 29984
Year 18 18 Paper-Year 10326
(b) DBLP Data Set</p>
        <p>Real-world Data Sets: In addition to the above, experiments have been performed on the
International Movie Database (IMDb), the DBLP Computer Science Bibliography, and data sets
from The Laboratory for Web Algorithmics [21, 22]. Only IMDb (Table 1a) and DBLP (Table 1b)
details have been provided due to space constraints.</p>
        <p>Node distri- Average de- Max de- Min de- Dangling
bution (%) gree gree gree nodes
#connected
components</p>
        <p>Largest com- Sparsity (%)
ponent
25KV100KE
25KV200KE
25KV300KE
25KV400KE
35KV400KE
45KV400KE
55KV400KE
100KV1ME
100KV2ME
100KV3ME
100KV4ME
200KV1ME
200KV5ME
200KV10ME
5.2. Experimental Results of HeMLN Degree Centrality
posed heuristics give consistently higher
accuracy as compared to the baseline naive
approach with respect to the ground truth.</p>
        <p>Finally, the eficiency evaluation of heuristic HeMLN-AG has been presented in Figure 8 as
compared to the ground truth evaluation. The time taken for HeMLN-AG is calculated as the
sum of the maximum time taken for the layers (as layer analysis is done in parallel) and the
composition time. The experiments validate that heuristic HeMLN-AG generated 100% accuracy
with savings in computation time ranging between 15% to 47% for synthetic data sets,
and between 3% to 68% for real-world data sets.</p>
        <p>We have applied HeMLN-AG as a binary function to compute the results of HeMLNs with
more than 2 layers. We have analyzed our results of HeMLN-AG up to three layers for real-world
data sets IMDb and DBLP and found out the accuracy of data sets remain 100% with significant
savings in computational times.
Efect of Additional
Information: We performed
experiments to understand
the efect of additional
layerwise information. For
heuristic HeMLN-PG (only hub
degrees maintained - one end of
the spectrum) and heuristic
HeMLN-AG (all node degrees
maintained - the other end of Figure 9: Impact of Additional Information on Accuracy
the spectrum), it can be clearly observed that the increase in accuracy in HeMLN-AG comes at
the cost of an increase in the amount of information retained from each layer. Moreover, we
modified HeMLN-AG, to include from the layers the degree information of the top x% nodes
(sorted in the decreasing order of degree values) + the hub nodes. Figure 9 demonstrates for few
of the larger synthetic data sets (100KV1ME to 100KV4ME) how maintaining such additional
information gradually increases the accuracy until it reaches a saturation point. However,
more layer information increases composition phase cost as well. This validates the trade-of
between the savings in computational cost and the accuracy of the results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>In this paper, we defined degree centrality for heterogeneous multilayer networks. Based
on intuition of degree hubs, we proposed two heuristics for the decoupling approach - one
that provides precision guarantee and the other that provides accuracy guarantee. We
were able to prove the need for minimal additional information to obtain the guarantees, Each
may be useful for diferent applications. A large number of experiments were performed on
diverse synthetic and real-world data sets to validate accuracy, precision, and eficiency of our
algorithms. Information retained and accuracy versus eficiency trade-of was also demonstrated.
Acknowledgments: For this work, Drs. Sharma Chakravarthy and Abhishek Santra were partly
supported by NSF Grant CCF-1955798. Dr. Sharma Chakravarthy was also partly supported by
NSF Grant CNS-2120393.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fortunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castellano</surname>
          </string-name>
          ,
          <article-title>Community structure in graphs</article-title>
          ,
          <source>in: Ency. of Complexity and Systems Science</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kivelä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barthelemy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Gleeson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Porter</surname>
          </string-name>
          ,
          <article-title>Multilayer networks</article-title>
          ,
          <source>CoRR abs/1309</source>
          .7233 (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Santra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <article-title>Holistic analysis of multi-source, multi-feature data: Modeling and computation challenges</article-title>
          , in: P.
          <string-name>
            <surname>K. Reddy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Sureka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Chakravarthy</surname>
          </string-name>
          , S. Bhalla (Eds.),
          <source>Big Data Analytics - 5th International Conference, BDA</source>
          <year>2017</year>
          , Hyderabad, India,
          <source>December 12-15</source>
          ,
          <year>2017</year>
          , Proceedings, volume
          <volume>10721</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2017</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -72413-
          <issue>3</issue>
          _4. doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>319</fpage>
          -72413-3\_4.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Santra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Komar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakravarthy</surname>
          </string-name>
          ,
          <article-title>From base data to knowledge discovery - A life cycle approach - using multilayer networks</article-title>
          ,
          <source>Data Knowl. Eng</source>
          .
          <volume>141</volume>
          (
          <year>2022</year>
          )
          <article-title>102058</article-title>
          . URL: https://doi.org/10.1016/j.datak.
          <year>2022</year>
          .
          <volume>102058</volume>
          . doi:
          <volume>10</volume>
          .1016/j.datak.
          <year>2022</year>
          .
          <volume>102058</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Domenico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nicosia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Latora</surname>
          </string-name>
          ,
          <article-title>Layer aggregation and reducibility of multilayer interconnected networks</article-title>
          ,
          <source>CoRR abs/1405</source>
          .0425 (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Magarinos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chernomoretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aguero</surname>
          </string-name>
          ,
          <article-title>A multilayer network approach for guiding drug repositioning in neglected diseases</article-title>
          ,
          <source>PLOS</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>M. De Domenico</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Solé-Ribalta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gómez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
          </string-name>
          ,
          <article-title>Navigability of interconnected networks under random failures</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          (
          <year>2014</year>
          ). doi:
          <volume>10</volume>
          .1073/pnas.1318469111. arXiv:https://www.pnas.org/content/early/2014/05/21/1318469111.full.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Delling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pajor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Werneck</surname>
          </string-name>
          ,
          <article-title>Computing classic closeness centrality, at scale</article-title>
          ,
          <source>in: Proceedings of the Second ACM Conference on Online Social Networks, COSN '14</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>37</fpage>
          -
          <lpage>50</lpage>
          . URL: https://doi.org/10.1145/2660460.2660465. doi:
          <volume>10</volume>
          .1145/2660460.2660465.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Santra</surname>
          </string-name>
          ,
          <article-title>Analysis of Complex Data Sets Using Multilayer Networks: A Decoupling-based Framework</article-title>
          ,
          <source>Ph.D. thesis</source>
          , The University of Texas at Arlington,
          <year>2020</year>
          . https://itlab.uta.edu/ students/alumni/PhD/Abhishek_Santra/ASantra_PhD2020.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Surolia</surname>
          </string-name>
          , Degree Centrality, Springer New York, New York, NY,
          <year>2013</year>
          , pp.
          <fpage>558</fpage>
          -
          <lpage>558</lpage>
          . URL: https://doi.org/10.1007/978-1-
          <fpage>4419</fpage>
          -9863-7_
          <fpage>935</fpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-1-
          <fpage>4419</fpage>
          -9863-7_
          <fpage>935</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <article-title>Centrality in social networks conceptual clarification</article-title>
          ,
          <source>Social networks 1</source>
          (
          <year>1978</year>
          )
          <fpage>215</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Santra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakravarthy</surname>
          </string-name>
          , Hubify:
          <article-title>Eficient estimation of central entities across multiplex layer compositions</article-title>
          , in: R.
          <string-name>
            <surname>Gottumukkala</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Aluru</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Miele</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          Wu (Eds.),
          <source>2017 IEEE International Conference on Data Mining Workshops, ICDM Workshops</source>
          <year>2017</year>
          , New Orleans, LA, USA, November
          <volume>18</volume>
          -
          <issue>21</issue>
          ,
          <year>2017</year>
          , IEEE Computer Society,
          <year>2017</year>
          , pp.
          <fpage>142</fpage>
          -
          <lpage>149</lpage>
          . URL: https://doi.org/10.1109/ICDMW.
          <year>2017</year>
          .
          <volume>24</volume>
          . doi:
          <volume>10</volume>
          .1109/ICDMW.
          <year>2017</year>
          .
          <volume>24</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bródka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Skibicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kazienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Musial</surname>
          </string-name>
          ,
          <article-title>A degree centrality in multi-layered social network</article-title>
          ,
          <year>2011</year>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>242</lpage>
          . doi:
          <volume>10</volume>
          .1109/CASON.
          <year>2011</year>
          .
          <volume>6085951</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>