A Human Communication Network Model

Oksana Pichugina¹, Babak Farzad²

¹ Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
pichugina os@mail.ru
² Brock University, St. Catharines, Canada
bfarzad@brocku.ca


       Abstract. A number of attributed network formation models based on
       the Erdős-Rényi and Barabási-Albert random graph models are presented.
       One of them is a Human Communication Network (HCN) model based on time
       restrictions on face-to-face communication. Constructing this weighted
       network requires only a few numerical parameters and allows any
       unweighted node-attributed network to be transformed into a weighted
       one. This transformation helps in solving numerous problems in Network
       Analysis, such as community detection and network topology inference.
       Understanding the nature of human communication networks makes it
       possible to solve many practical problems, from the fast spreading of
       information and innovation through the networks to detecting key
       people, collaboration with whom helps to achieve various goals.


Keywords. Social Networks, Community Detection, Attributed Networks, Node
Partition, Random Graphs

Key Terms. Network Decoration, Community Detection, Random Graphs


1    Introduction
Network Analysis is an area of research that has been studied intensively lately.
Researchers investigate structural characteristics of different networks, network
formation models, and many other related questions. Among the variety of networks,
social networks, which reflect the diversity of relationships between people, are a
priority [5], [7]. The study of social networks is important since it helps us
understand how our world is organized, what place each of us takes in it, how this
situation affects us and how this knowledge can be used to achieve our goals.
Social networks are characterized by heterogeneity of nodes and edges, sparsity,
a high average clustering coefficient, a small average shortest path length,
a power-law degree distribution, and the existence of observable, tightly bound
groups of elements called communities [5]. Most of these properties are united in
the concepts of "small-world networks" and "scale-free networks" [1],[4]. Many
attempts have been made to construct social networks, but no satisfactory way to
simulate all the listed properties has yet been found [1],[4],[5]. It is also important to find
efficient methods for detecting these communities (community detection, CD) [6],[8].
We believe that the key to high-quality CD in social networks lies in exploiting
their heterogeneity (treating them as multi-layer networks) and in studying how
such networks are formed.


ICTERI 2016, Kyiv, Ukraine, June 21-24, 2016
Copyright © 2016 by the paper authors


2    Definitions and Notations
Definition 1. [6] A social network is a hybrid graph represented in the form
$$G = (V, E, \Lambda, \Lambda'), \tag{1}$$
where $V$ is the set of nodes (the social network's users), $E$ is the set of edges
(the relationships between these users), and $\Lambda$ and $\Lambda'$ contain information
about the attributes of each node $v \in V$ and each edge $\{u, v\} \in E$, respectively.
The network represented in the form (1) is an attributed network if $\Lambda \cup \Lambda' \neq \emptyset$.
So, any attributed network representing individuals' relationships is social.
Let $\Lambda \in \mathbb{R}^{n \times K}$, $\Lambda' \in \mathbb{R}^{m \times K'}$, where $|V| = n$, $|E| = m$. Initially,
we consider an unweighted node-attributed network $G = \{V, E, \Lambda\}$, $K \ge 1$. Then
we assign weights to its edges (decorate the edges) and obtain a weighted network
$G^w = \{V, E, \Lambda, \Lambda'\}$, with $\Lambda'$ being a column matrix of edge
weights ($K' = 1$). $G^w$ is node-edge-attributed and is then used for CD.
We introduce some notation: $G^{[\cdot]} = \{V, E^{[\cdot]}, \Lambda\}$ is an unweighted node-attributed
network with adjacency matrix $B^{[\cdot]} = (b^{[\cdot]}_{ij}) \in \mathbb{R}^{n \times n}$. After decorating $E^{[\cdot]}$ with
weights, the new weighted network is denoted by $G^{w[\cdot]} = \{V, E^{[\cdot]}, \Lambda, \Lambda'\}$ and its
weighted adjacency matrix (WAM) by $A^{[\cdot]} = (a^{[\cdot]}_{ij}) \in \mathbb{R}^{n \times n}$.
The node degree $d^{[\cdot]}_i$ of a node $v_i \in V$ in $G^{[\cdot]}$ is the number of its incident edges:
$d^{[\cdot]}_i = |N^{[\cdot]}_i|$, where $N^{[\cdot]}_i = \{u \in V : u \leftrightarrow v_i, \{u, v_i\} \in E^{[\cdot]}\}$. The node strength $s^{[\cdot]}_i$
of $v_i \in V$ is the sum of the weights of its incident edges in $G^{[\cdot]}$. In terms of the adjacency
and weighted adjacency matrices, these values are $d^{[\cdot]}_i = \sum_j b^{[\cdot]}_{ij}$, $s^{[\cdot]}_i = \sum_j a^{[\cdot]}_{ij}$.
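The row-sum formulas for degree and strength can be illustrated with a minimal Python sketch. The function name `degrees_and_strengths` and the three-node toy network are assumptions made only for illustration; matrices are stored as plain nested lists.

```python
def degrees_and_strengths(B, A):
    """Return (d, s): row sums of the adjacency matrix B and of the WAM A."""
    d = [sum(row) for row in B]   # d_i = sum_j b_ij
    s = [sum(row) for row in A]   # s_i = sum_j a_ij
    return d, s

# Hypothetical toy network: v0 is linked to v1 (weight 2.0) and to v2 (weight 0.5).
B = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
A = [[0.0, 2.0, 0.5],
     [2.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]

d, s = degrees_and_strengths(B, A)
print(d)  # [2, 1, 1]
print(s)  # [2.5, 2.0, 0.5]
```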
A network cover is a division of the network nodes $C = \{C_l\}$ satisfying $\bigcup_{l=1}^{L} C_l = V$.
If the node clusters $C_l$, $l \in J_L = \{1, ..., L\}$, of the division are pairwise disjoint,
then it is called a network partition.
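The difference between a cover and a partition can be checked mechanically. The sketch below is only an illustration: the helper names `is_cover` and `is_partition` are hypothetical, and clusters are assumed to be given as Python sets of node indices.

```python
from itertools import combinations

def is_cover(clusters, nodes):
    """True if the union of the clusters equals the node set V."""
    return set().union(*clusters) == set(nodes)

def is_partition(clusters, nodes):
    """True if the clusters form a cover and are pairwise disjoint."""
    disjoint = all(not (a & b) for a, b in combinations(clusters, 2))
    return is_cover(clusters, nodes) and disjoint

V = range(5)
overlapping = [{0, 1, 2}, {2, 3, 4}]   # node 2 belongs to two clusters
disjoint = [{0, 1}, {2, 3, 4}]

print(is_cover(overlapping, V), is_partition(overlapping, V))  # True False
print(is_cover(disjoint, V), is_partition(disjoint, V))        # True True
```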
Assume that the nodes are decorated by $K$ discrete attributes $\{AT^k\}$, $\Lambda = (at^k_i)$,
where $at^k_i$ is the value of $AT^k$ for a node $v_i$, and there are $L_k$ different values
of $AT^k$. Let $AC^k_l \subseteq V$ be the set of nodes with the $l$-th value of $AT^k$. We call it
a node attribute cluster (AC) and denote the $G$-partition into ACs related to
different values of $AT^k$ by $AC^k = \{AC^k_l\}_{l \in J_{L_k}}$ ($n^k_l = |AC^k_l|$). Let $G^{[\cdot]k}$ be the
$G^{[\cdot]}$-subnetwork related to $AT^k$. A sum of unweighted networks $\{G^k\}_k$ on the same
node set $V$ is the unweighted network $G = \{V, \bigcup_k E^k, \Lambda\}$. A linear combination
of weighted networks $\{G^{wk}\}_k$ on a node set $V$ is a weighted network
$G^w = \{V, \bigcup_k E^k, \Lambda, \Lambda'\}$ with WAM $A = \sum_k \alpha_k A^k$, where $\{\alpha_k\} \subset \mathbb{R}$ are the coefficients
of this linear combination. Let $\omega(G^{[\cdot]})$ denote the weight of a network $G^{[\cdot]}$
($\omega(G^{[\cdot]}) = \sum_{ij} a^{[\cdot]}_{ij}$). A network of weight one is a normalized network.
The networks' linear combination is a weighted network sum if
$$G^w = \sum_k W^k \cdot G^{wk}, \quad \text{where } \{W^k\} > 0, \ \sum_k W^k = 1. \tag{2}$$
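At the level of WAMs, the weighted network sum (2) is an entry-wise convex combination of the layer matrices. The following minimal sketch assumes that all layers share the same node ordering and are stored as nested lists; the function name `weighted_network_sum` is hypothetical.

```python
def weighted_network_sum(wams, W, tol=1e-9):
    """Combine layer WAMs A^k into A = sum_k W^k * A^k, as in (2)."""
    # (2) requires positive coefficients that sum to one.
    if any(w <= 0 for w in W) or abs(sum(W) - 1.0) > tol:
        raise ValueError("W must be positive and sum to 1")
    n = len(wams[0])
    return [[sum(w * Ak[i][j] for w, Ak in zip(W, wams)) for j in range(n)]
            for i in range(n)]

# Two hypothetical single-edge layers on the same pair of nodes.
A1 = [[0.0, 1.0], [1.0, 0.0]]
A2 = [[0.0, 3.0], [3.0, 0.0]]
A = weighted_network_sum([A1, A2], [0.25, 0.75])
print(A[0][1])  # 2.5
```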


3     Motivation
Let us consider a social network. Suppose that, in addition to basic information
about the node and edge sets, some extra information about the features of the
nodes and edges is available (social semantic networks are highly helpful here
[2]). These additional characteristics are called attributes, and the procedure of
adding them is called decoration of the network [3], [5]; it results in an
attributed network [8]. Applying CD to such a network, we typically get
communities closely related to one node attribute, and this dominant attribute
prevents us from observing communities in other layers related to the remaining,
less important, node attributes. For instance, in the network of all humankind the
dominant attribute would be family membership. If we are interested in studying
communities in, say, the workplace, then the division into families is an obstacle.
However, if it is possible to transform the network into a weighted one and,
moreover, to assign edge weights within each layer subnetwork of the multi-layer
network, the problem of multi-layer CD (MLCD) can be solved. To do so, we detect
the dominant attribute, remove the corresponding subnetwork from consideration,
and then repeat the procedure on the remaining network.
The crucial part of the approach is constructing the edge sets of the one-layer
subnetworks and distributing weights within them. The first is the problem of
attributed network formation considered in Sect. 4.1 (the edge inference problem
[5]); the second is the edge attribute inference problem [5], which we solve in
Sect. 4.2 for a social network of face-to-face communication between people.


4     Human Communication Network Models
In this section we address the formation of attributed networks. We ask how an
attributed network (1) is formed if the information about the nodes $V$ and
their attributes $\Lambda$ is known. In other words, we consider the formation of an
edge set $E$ and of its attributes $\Lambda'$, and refer to these as Problems 1 and 2,
respectively.

4.1   Attributed Network Formation
We consider a number of ways to solve Problem 1. For convenience, we interpret
the presented network formation models in terms of communication between people
who spend time together on common activities/interests (AIs). Here nodes
are people and their AIs are the nodes' attributes.
Model 1 - an association network model. An association network $G^a$ [5]
is an example of an attributed network where links exist between any nodes
with common attributes. It can be interpreted as a network of virtual contacts
between people with common interests, where maintaining such contacts costs
nothing.
An auxiliary network $G^{wk}$ corresponds to each activity/interest (AI) $AT^k$; $G^w$
is representable as a weighted network sum (2) of $K$ networks, which are
collections of complete graphs: $G^{wk} = \bigcup_{l \in J_{L_k}} K_{n^k_l}$. Thus the network $G^a$ is a cover of
$K$ overlapping $V$-partitions by disjoint unions of complete graphs.
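One layer of Model 1 can be sketched in a few lines of Python. The function name `association_layer` and the representation of an attribute as a list of values indexed by node are assumptions made for illustration; the edge set returned is the disjoint union of complete graphs on the attribute clusters.

```python
from itertools import combinations

def association_layer(values):
    """Edge set of one Model 1 layer: link every pair of nodes that
    share the same value of the attribute (values[i] is the value for v_i)."""
    n = len(values)
    return {(i, j) for i, j in combinations(range(n), 2)
            if values[i] == values[j]}

# Hypothetical attribute: nodes 0, 1, 3 share value "a"; node 2 has value "b".
edges = association_layer(["a", "a", "b", "a"])
print(sorted(edges))  # [(0, 1), (0, 3), (1, 3)]
```

A cluster of size $n^k_l$ contributes $n^k_l (n^k_l - 1)/2$ edges, so the singleton cluster of node 2 contributes none.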
Model 2 - an attributed network model based on the Erdős-Rényi Model.
Suppose that similarity of node attributes is necessary for an edge to exist, but
not sufficient because of randomness. Similarly to Model 1, we represent the network
$G^w$ by (2). Edges in $G^{wk}$ are created randomly with probability $p^k_l$ between two
nodes $v_i$, $v_j$ sharing the $l$-th value of the attribute $AT^k$. Hence $G^k$ is a node
partition by Erdős-Rényi Random Graphs (ERRGs) [4]: $G^{wk} = \bigcup_{l \in J_{L_k}} ERRG(p^k_l, n^k_l)$,
and the resulting network $G^w$ is an overlapping of $K$ partitions by ERRGs.
In terms of human communication, Model 2 simulates a real situation where a
group of people is formed simultaneously. Each person's contacts occur randomly,
without analysing any prior information, because it is inaccessible. Communication
can be established on a regular basis only if these people actually have
common interests. Different types of contacts are formed independently.
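A single layer of Model 2 can be sampled as follows. This is a minimal sketch under stated assumptions: the function name `er_layer`, the dictionary `p` mapping each attribute value to its edge probability $p^k_l$, and the fixed seed are all illustrative choices, and clusters are assigned contiguously here rather than at random.

```python
import random
from itertools import combinations

def er_layer(values, p, seed=0):
    """Edge set of one Model 2 layer: nodes sharing attribute value l
    are linked independently with probability p[l]."""
    rng = random.Random(seed)
    edges = set()
    for i, j in combinations(range(len(values)), 2):
        # An edge is possible only inside an attribute cluster.
        if values[i] == values[j] and rng.random() < p[values[i]]:
            edges.add((i, j))
    return edges

# Hypothetical layer: 60 nodes in 5 clusters of 12, edge probability 0.3.
values = [i // 12 for i in range(60)]
layer = er_layer(values, {l: 0.3 for l in range(5)}, seed=1)
print(len(layer) <= 5 * 66)  # True: at most 66 edges per cluster of 12
```

Setting every probability to 1 recovers the Model 1 layer, which gives a quick sanity check.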
Model 3 - an attributed network model based on the Barabási-Albert
Model. In contrast to Model 2, here we consider a situation where a group
of people is formed gradually. First of all, group members seek contacts
with popular and authoritative colleagues in each area of expertise. These
contacts are formed first for the most important AIs, then for the less
significant ones. The chance of discovering common interests is higher if a
contact already exists.
As before, $G^w$ is a weighted network sum (2). The networks $\{G^{wk}\}$ are formed
consecutively in $k$, in order of decreasing priority of the node attributes. For
each $k$, an edge set $E^k$ is formed between nodes with the same value of $AT^k$,
consecutively in $i$, with probabilities depending on the degrees of all preceding
nodes $\{d^k_{i'}\}_{i' < i}$ and on the parameters $p_k$, $p'_k$ ($p_k \le p'_k$) for new and
previously established contacts, respectively.
There are many ways to generalise the Barabási-Albert Preferential Attachment
Model [1] to attributed networks. For instance, each auxiliary network
$G^k$ can be formed as follows: disjoint subsets of nodes of different ACs are
connected by preferential attachment, and then the isolated subnetworks are
connected, forming the whole node partition $AC^k$. All these partitions are united
into a cover with respect to the node attribute priorities, $\Lambda$, and a pre-assigned
order in which the nodes arise. The network layers are dependent regardless of
whether we consider the case $p_k = p'_k$, $\forall k$ (Model 3.1) or the other one
($\exists k : p_k < p'_k$). Model 3.1 simulates a node partition by Barabási-Albert
Graphs (BAGs). Each $G^{wk}$ can be represented in the manner of Models 1, 2:
$G^{wk} = \bigcup_{l \in J_{L_k}} BAG(n^k_l, \alpha^k_l)$, where $\alpha^k_l$ is the
power of preferential attachment in $AC^k_l$.
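The text leaves the exact attachment rule open, so the following Python sketch is only one possible reading of a single Model 3.1 layer, not the construction used in the paper. It assumes that each arriving node creates exactly one link, that an existing node is chosen with probability proportional to (degree + 1) raised to the power `alpha`, and that nodes arrive in the listed order; the function name `ba_layer` and the +1 smoothing are hypothetical.

```python
import random

def ba_layer(clusters, alpha=1.0, seed=0):
    """One layer built by preferential attachment inside each attribute
    cluster. clusters is a list of node lists given in arrival order."""
    rng = random.Random(seed)
    edges = set()
    for members in clusters:
        degree = {}
        for v in members:
            if not degree:
                degree[v] = 0          # first node: nothing to attach to yet
                continue
            existing = list(degree)
            # Assumed rule: weight ~ (degree + 1) ** alpha, so that the
            # initial degree-0 node can still be selected.
            weights = [(degree[u] + 1) ** alpha for u in existing]
            u = rng.choices(existing, weights=weights, k=1)[0]
            edges.add((min(u, v), max(u, v)))
            degree[u] += 1
            degree[v] = 1
    return edges

clusters = [[0, 1, 2, 3], [4, 5, 6]]
layer_edges = ba_layer(clusters, alpha=1.0, seed=3)
print(len(layer_edges))  # 5: one new edge per node after the first in each cluster
```

With one link per arriving node, every cluster becomes a tree; attaching to several existing nodes per step would be needed to obtain denser clusters.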

4.2   The Human Communication Model
The models presented in Sect. 4.1, Model 2 and Model 3, are able to simulate
networks of real, face-to-face contacts, which require spending time to
keep in touch. Suppose an edge set $E$ was formed according to Model 2 or 3.
To finish the formation of the network $G^w$, Problem 2 has to be solved and the
matrix $\Lambda'$ formed. Here we present a way to distribute edge weights according to
assumptions that are typical, in our opinion, of real human interaction. We will
refer to the obtained network as a Human Communication Network (HCN).


The HCN model assumptions. Condition 1. People's AIs have already been formed;
Condition 2. Connections between people are possible only if they have common AIs;
Condition 3. Each person distributes the time $t^k$, allotted for supporting
a contact related to the AI $AT^k$, uniformly among friends sharing this interest;
Condition 4. For everyone, the possibility of communication is restricted by time
$T$. If a person does not have enough time to support his/her contacts, then
the time allotted for supporting contacts related to $AT^k$ and $AT^{k'}$ is
distributed proportionally to $t^k$ and $t^{k'}$, respectively;
Condition 5. If two persons with the same interest are ready to devote time to
each other, then, if necessary, they come to a compromise following certain rules.
We formalise Conditions 1-5 in terms of the WAM $A$. We rewrite (2) in the form:
$$G^w = \sum_k G^{w'k}, \quad \text{where } G^{w'k} = W^k \cdot G^{wk}. \tag{3}$$
In addition to $G^w$ satisfying Conditions 1-5, we introduce networks $G^{w*}$, $G^{w'*}$
satisfying Conditions 1-3 and 1-4, respectively.
Similarly to (3), $G^{w*}$ and $G^{w'*}$ are representable as network sums:
$G^{w*} = \sum_k G^{*k}$, $G^{w'*} = \sum_k G'^{*k}$, where $G^{*k}$, $G'^{*k}$ are subnetworks of $G^{w*}$ and $G^{w'*}$
related to $AT^k$. Respectively, the following holds for the corresponding WAMs:
$$A = \sum_k A'^k, \quad A^* = \sum_k A^{*k}, \quad A'^* = \sum_k A'^{*k}. \tag{4}$$

Let the set of attributes on which $v_i$ and $v_j$ share a value be $E_{ij} = \{k : at^k_i = at^k_j\} \subseteq J_K$.
Then $N^{[\cdot]k}_i = \{v_j \in N^{[\cdot]}_i : k \in E_{ij}\}$ is the set of $v_i$-neighbours with the
same $AT^k$-value as $v_i$ in $G^{[\cdot]}$. We extend the notions of node degree and
strength from the set $N^{[\cdot]}_i$ to the sets $\{N^{[\cdot]k}_i\}_k$: a) $d^{[\cdot]k}_i = |N^{[\cdot]k}_i|$ is the node
attribute $AT^k$-degree of $v_i \in V$ in $G^{[\cdot]}$; b) $s^{[\cdot]k}_i = \sum_{v_j \in N^{[\cdot]k}_i} a^{[\cdot]k}_{ij}$ is the $AT^k$-strength
of $v_i$ in $G^{[\cdot]}$. Respectively, the node strength in $G^{[\cdot]}$ is $s^{[\cdot]}_i = \sum_k s^{[\cdot]k}_i$.
 1. We start with assigning edge weights in $G^{w*}$:
    (a) Condition 1 says that the network $G^{w*}$ is decorated by discrete attributes
        $AT^k$ and the matrix $\Lambda$ is known;
    (b) Condition 2 means that links are formed only by similarity of the
        nodes' attributes, hence for $i, j$: $E_{ij} = \emptyset \Rightarrow \{v_i, v_j\} \notin E$;
    (c) Condition 3 allows us to determine the ratio of $A^{*k}$-elements: if $i, j, j', k, k'$
        are such that $at^k_i = at^k_j$, $at^{k'}_i = at^{k'}_{j'}$, then
        $$\frac{a^{*k}_{ij}}{a^{*k'}_{ij'}} = \frac{t^k}{t^{k'}}. \tag{5}$$
    Since there are no restrictions on the communication time in $G^{w*}$, all of
    the contacts are supported at the appropriate level. So, the weights
    in $G^{w*}$, $G^{w*k}$ can be assigned with respect to the maximal needed time $t^k$:
    $$\forall i, j : k \in E_{ij} \quad a^{*k}_{ij} = t^k; \quad a^*_{ij} = \sum_{k \in E_{ij}} t^k. \tag{6}$$


   Notice that $A^*$ is symmetric, thus $G^{w*}$ is undirected. The communication
   time of each person depends on the number of contacts of each type,
   therefore the node strengths in $G^{w*k}$, $G^{w*}$ are defined as follows:
   $s^{*k}_i = d^k_i t^k$, $s^*_i = \sum_{k,j} a^{*k}_{ij} = \sum_k s^{*k}_i = \sum_k d^k_i t^k$. In terms of the HCN model, the
   values $s^*_i$, $s^{*k}_i$ can be interpreted as the time that a person $i$ could devote
   to communication overall and to the particular AI, respectively.
2. Moving on to the network $G^{w'*}$, we add Condition 4, the time restriction,
   to the network $G^{w*}$. This condition determines how much time a person $i$ is
   ready to spend on supporting each AI-contact depending on his/her priorities
   and the number of these contacts. It can be expressed as a restriction
   on node strengths by the value $T$: $\forall i \ s'^*_i \le T$. If $s^*_i \le T$ already holds for
   every $i$, then the above restriction holds and $G^{w'*} = G^{w*}$; otherwise the
   weights $a^{*k}_{ij}$ are scaled to meet the time restriction:
   $$a'^{*k}_{ij} = \nu^*_i a^{*k}_{ij}, \tag{7}$$
   where the scaling parameter $\nu^*_i$ depends on the strength of node $i$:
   $\nu^*_i = \min\{1, T / s^*_i\}$. Substituting (7) into (5) yields:
   $\frac{a'^{*k}_{ij}}{a'^{*k'}_{ij'}} = \frac{t^k \cdot \nu^*_i}{t^{k'} \cdot \nu^*_i} = \frac{t^k}{t^{k'}}$. It means
   that each person distributes his/her own time independently of the others,
   guided by the common priorities $W$ accumulated in $t = (t^k)$: $W = t/|t|$. We find
   the WAM of $G^{w'*}$ by (4), (7):
   $$a'^*_{ij} = \sum_k a'^{*k}_{ij} = \nu^*_i \sum_k a^{*k}_{ij} = \nu^*_i \cdot a^*_{ij}. \tag{8}$$
   The weights $a'^*_{ij}$, $a'^*_{ji}$ determine how much time a person $i$ is ready to devote
   to communication with a person $j$ and vice versa. Normally these values
   differ, $a'^*_{ij} \neq a'^*_{ji}$. Thus the network $G^{w'*}$ is directed and so
   does not represent face-to-face communication.
3. To describe the real situation, we consider constructing the final network
   $G^w$ from $G^{w'*}$. By adding Condition 5, the abstract directed network $G^{w'*}$
   is transformed into the undirected $G^w$ with weights equal to the time that both
   persons, $i$ and $j$, actually devote to each other. The weights are obtained
   as a result of a compromise between these persons, who may be ready to spend
   different amounts of time together.
   Let persons $i$ and $j$ have a real contact ($E_{ij} \neq \emptyset$) and be looking for a
   compromise ($a'^*_{ij} \neq a'^*_{ji}$). The result of their common decision can be expressed as
   a function of these weights, $a_{ij} = f(a'^*_{ij}, a'^*_{ji})$. The function $f(\cdot)$ can be chosen in
   different ways. For instance, we choose simple averaging: $a_{ij} = \frac{1}{2}(a'^*_{ij} + a'^*_{ji})$.
   Then, by (8) and due to the symmetry of $A^*$, we have:
   $$a_{ij} = 0.5(\nu^*_i a^*_{ij} + \nu^*_j a^*_{ji}) = 0.5 \cdot a^*_{ij}(\nu^*_i + \nu^*_j). \tag{9}$$
   The distribution of weights within $\{G^{w'k}\}$ is obtained from (4), (6), (9):
   $a_{ij} = \sum_k a'^k_{ij} = \frac{\nu^*_i + \nu^*_j}{2} \sum_k a^{*k}_{ij} = \frac{\nu^*_i + \nu^*_j}{2} \sum_{k \in E_{ij}} t^k = \frac{\nu^*_i + \nu^*_j}{2} \sum_k t^k b^k_{ij}$, wherefrom
   $$a'^k_{ij} = 0.5(\nu^*_i + \nu^*_j) t^k b^k_{ij}, \quad a^k_{ij} = a'^k_{ij} / W^k = 0.5 |t| (\nu^*_i + \nu^*_j) b^k_{ij}. \tag{10}$$
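Steps 1-3 can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function name `hcn_weights` is hypothetical, each layer is given as a symmetric 0/1 matrix $b^k_{ij}$ in nested lists, the compromise function is the simple averaging chosen above, and a node without contacts is given $\nu^*_i = 1$.

```python
def hcn_weights(layers, t, T):
    """Edge weights of the HCN from layer adjacency matrices b^k,
    per-contact times t^k and the time budget T."""
    K, n = len(layers), len(layers[0])
    # (6): unrestricted weights a*_ij = sum_k t^k * b^k_ij.
    a_star = [[sum(t[k] * layers[k][i][j] for k in range(K))
               for j in range(n)] for i in range(n)]
    # Unrestricted node strengths s*_i.
    s_star = [sum(row) for row in a_star]
    # Scaling nu_i = min(1, T / s*_i); isolated nodes need no scaling.
    nu = [min(1.0, T / s) if s > 0 else 1.0 for s in s_star]
    # (9): compromise by averaging the two directed weights.
    a = [[0.5 * (nu[i] + nu[j]) * a_star[i][j] for j in range(n)]
         for i in range(n)]
    # (10): contribution a'^k_ij of every layer to a_ij.
    a_layers = [[[0.5 * (nu[i] + nu[j]) * t[k] * layers[k][i][j]
                  for j in range(n)] for i in range(n)] for k in range(K)]
    return a, a_layers, nu

# Hypothetical example: 3 people, 2 interests, t = (4, 2), budget T = 5.
b1 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # interest 1: edges {0,1}, {0,2}
b2 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # interest 2: edge {0,1}
a, a_layers, nu = hcn_weights([b1, b2], t=(4, 2), T=5)
print(nu)        # person 0 halves the time, person 2 needs no scaling
print(a[0][2])   # 3.0
```

In this example the unrestricted strengths are $s^* = (10, 6, 4)$, so $\nu^* = (1/2, 5/6, 1)$, and (9) gives $a_{01} = 0.5 \cdot 6 \cdot (1/2 + 5/6) = 4$ and $a_{02} = 0.5 \cdot 4 \cdot (1/2 + 1) = 3$.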


4.3   Human Communication Network Simulation
Example 1 - Model 2 simulation. First, we demonstrate a solution of Problem 1
for Model 2 (see Sect. 4.1). The parameters of the simulated node-attributed
network $G$ are: the order $n = 60$ and the number of node attributes $K = 3$; the
nodes are divided randomly into $\{L_k\}_k = \{5, 4, 6\}$ attribute clusters of
equal sizes, $(n^k_l) = (12_5, 15_4, 10_6)$, i.e. five clusters of 12 nodes, four of 15
and six of 10. The result of the simulation with parameters
$(p^k_l) = (0.3_5, 0.3_4, 0.5_6)$ is shown in Figure 1.


      Fig. 1. Model 2 - the weighted network $G^w$ and its subnetworks $G^1$ - $G^3$

      Fig. 2. The HCN $G^{wI}$

      Fig. 3. The HCN $G^{wII}$


Example 2 - HCN Model 2 simulation. We took the unweighted network $G$
from Example 1 and converted it into the HCN-Model 2 network (see Sect. 4.2),
decorating the edges with weights according to (9), (10). Two values of the time
resource $T = (T^I, T^{II})$ and the vector $t = (t^1, t^2, t^3) = (4, 3, 2)$ are used. The
vector of priorities of AIs is $W = \frac{(4,3,2)}{|(4,3,2)|} = (4/9, 3/9, 2/9) \approx (0.44, 0.33, 0.22)$. We constructed
two networks $G^{wI}$, $G^{wII}$ corresponding to $T^I$, $T^{II}$. The time restrictions are


chosen in the following way: a) in the network $G^{wI}$, for the majority (80%) of
people the time $T^I$ is sufficient to support their contacts completely; b) for the
network $G^{wII}$ the situation is the opposite: most (80%) of people have to
redistribute their time resource $T^{II}$. For the network simulated in Example 1 these
parameters are $T = (56, 40)$. In Figures 2-3 we can see the resulting HCNs and observe
that the edge weights in $G^{wI}$ are more heterogeneous than those in $G^{wII}$. Most likely,
the reason is that in $G^{wI}$, in most cases, there is no need to redistribute the
time resource. After normalizing $G^w$, the weights of the subnetworks $\{G^{wk}\}$ are
$(\omega(G^{wk})) = (0.772, 1.330, 0.962)$, hence none of them is normalised and $G^{w2}$ is
the "heaviest".
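The reported subnetwork weights can be cross-checked against (2): since $G^w = \sum_k W^k \cdot G^{wk}$, the weight of the normalized network should satisfy $\sum_k W^k \omega(G^{wk}) = 1$. The short sketch below performs this check, assuming that $|t|$ denotes the sum of the components of $t$ (which is what makes $W$ sum to one).

```python
t = (4, 3, 2)
# Priorities W = t / |t|, with |t| taken as the sum of components (assumption).
W = [tk / sum(t) for tk in t]
# Subnetwork weights reported in Example 2.
omega_layers = (0.772, 1.330, 0.962)
# Weight of the combined network according to (2).
omega_total = sum(w * o for w, o in zip(W, omega_layers))
print(round(omega_total, 3))  # 1.0
```

The result is 1 up to the rounding of the reported values, which is consistent with $G^w$ being normalized while the individual layers are not.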
Results of community detection. CD on $G$ does not reveal community structure
in the network, whilst CD on $G^w$ quite accurately yields the partition $AC^2$
into ACs related to $AT^2$; namely, in 80% of cases two ACs of $AC^2$ were detected
correctly and the remaining two with one error each in $G^{wI}$.

5      Conclusions and Future Work
The presented Human Communication Network (HCN) model demonstrates an
approach to reconstructing missing network information about edges and edge
weights based on node attributes and on assumptions about the nature of
interaction in the networks. To the edge inference problem we apply an extension
of the Erdős-Rényi and Barabási-Albert random graph models to multi-layer
node-attributed networks. It is shown that, in spite of the interconnection of HCN
layers, CD performs better on these networks once they are decorated with weights.
We plan to extend the results to other kinds of networks and to use them for
designing new MLCD algorithms and solving node attribute inference problems.

References
    1. Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod.
       Phys. 74, 47-97 (2002)
    2. Breslin, J., Passant, A., Decker, S.: The Social Semantic Web. Springer,
       Heidelberg; New York (2009)
    3. Brüning, J., Geiler, V.A., Lobanov, I.S.: Spectral Properties of Schrödinger
       Operators on Decorated Graphs. Mathematical Notes 77, 858-861 (2005)
    4. Erdős, P., Rényi, A.: On random graphs, I. Publicationes Mathematicae (Debrecen)
       6, 290-297 (1959)
    5. Kolaczyk, E.D.: Statistical Analysis of Network Data. Springer Series in Statistics.
       Springer, New York (2009)
    6. Lin, Z., Zheng, X., Xin, N., Chen, D.: CK-LPA: Efficient community detection
       algorithm based on label propagation with community kernel. Physica A: Statistical
       Mechanics and its Applications 416, 386-399 (2014)
    7. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications,
       1st edn. Cambridge University Press, Cambridge; New York (1994)
    8. Zhou, Y., Cheng, H., Yu, J.X.: Clustering Large Attributed Graphs: An Efficient
       Incremental Approach. In: 2010 IEEE 10th International Conference on Data
       Mining (ICDM), pp. 689-698 (2010)