Generalized Preferential Attachment: Towards
    Realistic Socio-Semantic Network Models

                                     Camille Roth

            CREA (Center for Research in Applied Epistemology),
         CNRS/Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France
                        roth@shs.polytechnique.fr


      Abstract. The mechanism of preferential attachment underpins most
      recent social network formation models. Yet few authors attempt to check
      or quantify assumptions on this mechanism. We call generalized preferential
      attachment any kind of preference to interact with other agents with
      respect to any node property. We introduce tools for empirically measuring
      and comprehensively characterizing such phenomena, suggest their significant
      implications for model design, and apply these tools to a socio-semantic
      network of scientific collaborations, investigating in particular
      homophilic behavior. This opens the way to a whole class of realistic and
      credible social network morphogenesis models.

      Keywords: Morphogenesis models, Preferential attachment, Social Net-
      works, Dynamic Networks, Complex systems.


      ACM: G.2.2; MSC: 68R10; PACS: 89.65.-s, 87.23.Ge, 89.75.-k


Introduction

A recent challenge in structural research in social science consists of modeling
social network formation. Social networks are usually interaction networks
(nodes are agents and links between nodes represent interactions between agents),
and in this respect, modeling these networks involves graph theory (computer
science and statistical physics) as well as mathematical sociology and economics
[1, 9, 28]. Most of the interest in this topic stems from the empirical observation
that real social networks strongly differ from uniform random graphs with respect
to several statistical parameters, foremost the node connectivity distribution,
or degree distribution. Indeed, in Erdős-Rényi random graphs [12], links between
agents are present with a constant probability p and degree distributions follow
a Poisson law, whereas empirical social networks exhibit power-law, or scale-free,
degree distributions [1]. This phenomenon suggested that link formation does not
occur randomly but rather depends on node and network properties; that is, agents do not interact at
random but instead according to heterogeneous preferences for other nodes.
    Hence, early social network models endeavored to describe non-uniform in-
teraction and growth mechanisms yielding the famous “scale-free” degree distri-
bution [2]. Subsequently, much work has been focused on determining processes
explaining and rebuilding more complex network structures consistent with those
observed in the real world — a consistency validated through a rich set of statis-
tical parameters measured on empirical networks, not limited to degree distribu-
tion but including as well clustering coefficient, average distance, assortativity,
etc. [6, 7, 15, 22, 32].
    However, even when cognitively, sociologically or anthropologically credible,
most of the hypotheses driving these models are mathematical abstractions
whose empirical measurement and justification, when they exist at all, are dubious.
In this paper, we call preferential attachment (PA) any kind of non-uniform
interaction behavior and introduce tools for experimentally measuring PA with
respect to any node property. We then suggest significant implications for model
design, and finally apply these measures to an empirical socio-semantic network.
In particular, we question degree-related PA and estimate homophily.


1     A brief survey of social network models
Barabasi & Albert [2] pioneered the use of preferential linking in social network
formation models to successfully rebuild a particular statistical parameter, the
scale-free degree distribution. In their model, new nodes arrive at a constant rate
and attach to already-existing nodes with a probability linearly proportional to
their degree. This model has been widely disseminated and reused, and the term
“preferential attachment” has consequently often been understood as degree-related
preferential attachment only. Since then, many authors have introduced diverse modes
of preferential link creation depending either on various node properties (hidden
variables and “types” [5, 30], fitness [7], centrality, Euclidean distance [19, 13],
common friends [17], bipartite structure [14], alleged underlying group structure
[32], etc.) or on various linking mechanisms (competitive trade-off and optimization
heuristics [10, 13], two-step node choice [31], group formation [15, 24], to
cite a few).
    However, even in recent papers, hypotheses on PA are often arbitrary
and at best supported by qualitative intuitions. Existing quantitative estimations
of PA, and the consequent validations of modeling assumptions, are quite rare,
and either (i) deal with the classical degree-related PA [3, 11, 16, 25], sometimes
extended to a selected network property such as common acquaintances [21]; or
(ii) reduce PA to a single parameter, for instance using direct mean estimation
[15], econometric approaches [23] or Markovian models [29].1 In addition,
the way distinct properties jointly influence PA is largely ignored.
1
    Let us also mention link prediction from similarity features based on various
    strictly structural properties [18], which is obviously somewhat related to PA.
    Thus, while of great interest for approaching the underlying behavioral reality
of social networks, these works may not provide a sufficient empirical basis
for designing trustworthy PA mechanisms, and accordingly for proposing credible
social network morphogenesis models. In this view, we argue that the following
points are key:
1. Node degree is not everything, and even the popular degree-related PA
   (a linear “rich-get-richer” heuristic) seems to be inaccurate for some types
   of real networks [3], and possibly based on flawed behavioral foundations, as
   we will suggest below in Sec. 4.2.
2. Strict social network topology and derived properties may not be sufficient
   to account for complex social phenomena: as several above-cited models
   suggest, “external” properties (such as node types) may influence
   interaction; explaining homophily-related PA [20], for instance, requires at
   least characterizing nodes with non-structural data.
3. Single parameters cannot express the rich heterogeneity of interaction
   behavior: for instance, when assigning a unique parameter to preferential
   interaction with close nodes, one misses the fact that such interaction could
   be significantly more frequent for very close nodes than for loosely close
   nodes, or that it might be quadratic instead of linear with respect to the
   distance, etc.
4. Models often assume properties to be uncorrelated, which, when this is not
   the case, amounts to counting the same effect twice;2 knowing correlations
   between distinct properties is necessary to correctly determine their proper
   influence on PA.
    To summarize, it is crucial to conceive PA in such a way that (i) it is a
flexible and general mechanism, depending on relevant parameters based on
both topological and non-topological properties; (ii) it is an empirically valid
function describing the whole scope of possible interactions; and (iii) it takes
into account overlapping influences of different properties.


2   Measuring preferential attachment
PA is the likelihood for a node to be involved in an interaction with another node
with respect to node properties. In order to measure it, we first have to distinguish
between (i) single node properties, or monadic properties (such as degree,
age, etc.) and (ii) node dyad properties, or dyadic properties (social distance,
dissimilarity, etc.). When dealing with monadic properties, we seek to
know the propension of some kinds of nodes to be involved in an interaction.
Conversely, when dealing with dyads, we seek to know the propension for an
interaction to occur preferentially with some kinds of couples.3
2
  As, for instance, in [17], where effects related to degree and common acquaintances
  are combined independently.
3
  Note that a couple of monadic properties can be considered dyadic; for instance, a
  couple of nodes of degrees k_1 and k_2 can be considered as a dyad (k_1, k_2). This
  makes the former case a refinement, not always possible, of the latter case.


2.1     Monadic PA
Suppose we want to measure the influence on PA of a given monadic property m
taking values in M = {m_1, ..., m_n}. We assume this influence can be described by
a function f of m, independent of the distribution of agents of kind m. Denoting
by “L” the event “attachment of a new link”, f(m) is simply the conditional
probability P(L|m) that an agent of kind m is involved in an interaction.
    Thus, the likelihood that an agent of kind m receives a link scales with f(m).
We call f the interaction propension with respect to m. For instance, the
classical degree-based PA used in Barabasi-Albert and subsequent models (links
attach proportionally to node degrees [2, 3, 8]) is an assumption on f
equivalent to f(k) ∝ k.
    P(m) denotes the distribution of nodes of kind m. The probability
P(m|L) for a new link extremity to be attached to an agent of kind m is therefore
proportional to f(m)P(m), or P(L|m)P(m). Indeed, Bayes' formula yields:

        P(m|L) = \frac{f(m)\,P(m)}{P(L)}                                        (1)

with P(L) = \sum_{m' \in M} f(m')\,P(m').
    Empirically, suppose that ν new interactions occur during a given period, so
that 2ν new link extremities appear. Note that a repeated interaction between two
already-linked nodes is not considered a new link, since it would introduce an
acquaintance bias. The expected number of new link extremities attached to nodes
of property m over the period is thus ν(m) = P(m|L) · 2ν. As 2ν/P(L) does not
depend on m, we may estimate f through f̂ such that:

        \hat{f}(m) = \begin{cases} \nu(m)/P(m) & \text{if } P(m) > 0 \\ 0 & \text{if } P(m) = 0 \end{cases}      (2)

    Thus \mathbb{1}_{P(m)>0}\, f(m) \propto \hat{f}(m), where the indicator equals 1 when P(m) > 0 and 0 otherwise.
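The estimator in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes node properties are measured at the start of the observation period and that the new links of the period are supplied as both of their extremities; the function and parameter names are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Mapping


def monadic_propension(
    node_property: Mapping[Hashable, Hashable],
    new_link_endpoints: Iterable[Hashable],
) -> dict:
    """Estimate f_hat(m) = nu(m) / P(m) (Eq. 2), defined up to a constant factor.

    node_property: value m of the property for each node at the start of the period.
    new_link_endpoints: both extremities of every new (non-repeated) link of the
        period, i.e. the 2*nu link extremities.
    """
    n_nodes = len(node_property)
    # P(m): share of nodes of kind m at the start of the period
    p = {m: c / n_nodes for m, c in Counter(node_property.values()).items()}
    # nu(m): observed number of new link extremities on nodes of kind m
    nu = Counter(node_property[v] for v in new_link_endpoints)
    # Only values with P(m) > 0 are returned; any other m has f_hat(m) = 0 (Eq. 2)
    return {m: nu[m] / p_m for m, p_m in p.items()}
```

For instance, with degrees {a: 1, b: 1, c: 2, d: 3} and new links (a, c) and (c, d), the endpoint list is [a, c, c, d] and the estimate is {1: 2.0, 2: 8.0, 3: 4.0}; only ratios between these values are meaningful.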

2.2     Dyadic PA
Adopting a dyadic viewpoint is required whenever a property has no meaning
for a single node, which is mostly the case for properties such as proximity or
similarity, and for distances in general. We therefore intend to measure the
interaction propension of a dyad of agents fulfilling a given property d taking
values in D = {d_1, d_2, ..., d_n}. Similarly, we assume the existence of an essential
dyadic interaction behavior embodied by g, a strictly positive function of d
corresponding to the conditional probability P(L|d). Again, the likelihood that a
dyad satisfying property d interacts scales with g(d). In this respect, the
probability for a new link to appear between two such agents is:

        P(d|L) = \frac{g(d)\,P(d)}{P(L)}                                        (3)

with P(L) = \sum_{d' \in D} g(d')\,P(d').
    Here, the expected number of new links between dyads of kind d is
ν(d) = P(d|L) ν. Since ν/P(L) does not depend on d, we may estimate g with ĝ:

        \hat{g}(d) = \begin{cases} \nu(d)/P(d) & \text{if } P(d) > 0 \\ 0 & \text{if } P(d) = 0 \end{cases}      (4)

Likewise, we have \mathbb{1}_{P(d)>0}\, g(d) \propto \hat{g}(d).
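The dyadic estimator of Eq. (4) follows the same pattern, with dyads in place of nodes. The sketch below is illustrative only; it assumes that the dyad population is every unordered pair of agents present at the start of the period and that the (hypothetical) `dyad_property` function is symmetric.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Iterable
from itertools import combinations


def dyadic_propension(
    agents: Iterable[Hashable],
    dyad_property: Callable[[Hashable, Hashable], Hashable],
    new_links: Iterable[tuple[Hashable, Hashable]],
) -> dict:
    """Estimate g_hat(d) = nu(d) / P(d) (Eq. 4), defined up to a constant factor.

    Assumptions: the dyad population is every unordered pair of agents present
    at the start of the period, and dyad_property is symmetric in its arguments.
    Enumerating all pairs is quadratic in the number of agents.
    """
    pairs = list(combinations(list(agents), 2))
    # P(d): share of dyads of kind d
    counts = Counter(dyad_property(a, b) for a, b in pairs)
    p = {d: c / len(pairs) for d, c in counts.items()}
    # nu(d): observed number of new links on dyads of kind d
    nu = Counter(dyad_property(a, b) for a, b in new_links)
    # Only classes with P(d) > 0 are returned; others have g_hat(d) = 0 (Eq. 4)
    return {d: nu[d] / p_d for d, p_d in p.items()}
```

As a toy check, with two hypothetical groups {a, b} and {c, d}, d = "same group", and new links (a, b), (c, d), (a, c): same-group dyads are 1/3 of all pairs yet receive 2 of the 3 new links, giving ĝ(True) = 6 and ĝ(False) = 1.5, a 4:1 preference.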

3     Interpreting interaction propensions
3.1    Shaping hypotheses
The PA behavior embodied in f̂ (or ĝ) for a given monadic (or dyadic) property
can be reintroduced as such in modeling assumptions, either (i) by reusing the
exact empirically calculated function, or (ii) by stylizing the trend of f̂ (or ĝ)
and approximating f (or g) by more regular functions, thus making analytic
solutions possible.
    Still, great precision at this step is often critical, for a slight
modification in the hypotheses (e.g. non-linearity instead of linearity) makes
some models unsolvable or strongly alters their conclusions. For this reason,
when considering a property with an underlying natural order, it may also be
useful to examine the cumulative propension

        \hat{F}(m_i) = \sum_{m'=m_1}^{m_i} \hat{f}(m')

as an estimate of the integral of f, especially when the data are noisy (the
same goes for Ĝ and ĝ).

3.2    Correlations between properties
Besides, if modelers want to consider PA with respect to a collection of properties,
they have to make sure that the properties are uncorrelated, or else take into
account the correlation between them: there is evidence, for instance, that node
degrees depend on age. If two distinct properties p and p' are independent, the
distribution of nodes of kind p in the subset of nodes of kind p' does not depend
on p', i.e. the quantity P(p|p')/P(p) must theoretically equal 1 for all p and p'.
Empirically, it can be estimated through:4

        \hat{c}_{p'}(p) = \begin{cases} P(p|p')/P(p) & \text{if } P(p) > 0 \\ 0 & \text{if } P(p) = 0 \end{cases}      (5)

in the same manner as previously.
4
    For computing the correlation between a monadic and a dyadic property, one may
    interpret P(p|d) as the distribution of p-nodes among the nodes of dyads of kind d.
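The ratio of Eq. (5) can be estimated from a list of observed (p, p') pairs, one per unit. This is a sketch under the assumption that each node (or dyad) contributes exactly one observation; the function name is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def correlation_ratio(observations: Iterable[tuple[Hashable, Hashable]]) -> dict:
    """Estimate c_hat_{p'}(p) = P(p | p') / P(p) (Eq. 5), keyed by (p, p').

    Each observation is the pair (p, p') of property values of one unit.
    Values close to 1 for every (p, p') are consistent with independence.
    Only values of p actually observed (P(p) > 0) appear in the result.
    """
    obs = list(observations)
    n = len(obs)
    n_p = Counter(p for p, _ in obs)
    n_q = Counter(q for _, q in obs)
    n_pq = Counter(obs)
    return {
        # P(p | p') = n(p, p') / n(p'),  P(p) = n(p) / n
        (p, q): (n_pq[(p, q)] / n_q[q]) / (n_p[p] / n)
        for p in n_p
        for q in n_q
    }
```

Following footnote 4, a monadic/dyadic pair such as degree and semantic distance could be handled by letting each dyad of kind d contribute two observations, (k(a), d) and (k(a'), d); this is one possible reading, not necessarily the exact procedure behind Fig. 5.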


3.3   Essential behavior

By construction, calculated propensions do not depend on the distribution of nodes
of a given type at a given time. In other words, if, for example, physicists prefer
to interact twice as much with physicists as with sociologists but there are three
times as many sociologists around, physicists may well appear to interact more
with sociologists. Nevertheless, f̂ remains free of such biases and yields the
“baseline” preferential interaction behavior of physicists.
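The arithmetic of this example can be checked directly with Eqs. (1) and (2). The numbers below are hypothetical: a 2:1 preference of physicists for physicists and a population with one physicist for every three sociologists.

```python
# Hypothetical numbers from the example: physicists prefer physicists 2:1,
# but there are three times as many sociologists around.
share = {"physicist": 0.25, "sociologist": 0.75}     # P(m)
preference = {"physicist": 2.0, "sociologist": 1.0}  # f(m), up to a constant

# Eq. 1: P(m|L) = f(m) P(m) / sum over m' of f(m') P(m')
p_link = sum(preference[m] * share[m] for m in share)
observed = {m: preference[m] * share[m] / p_link for m in share}

# Eq. 2: f_hat(m) = nu(m) / P(m), with nu(m) proportional to P(m|L)
f_hat = {m: observed[m] / share[m] for m in share}

print(observed)  # physicists receive 40% of the links, sociologists 60%
print(f_hat["physicist"] / f_hat["sociologist"])  # approximately 2: the preference is recovered
```

Raw link shares favor sociologists (60% against 40%), while the ratio of propensions still recovers the underlying 2:1 preference.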
    However, f̂ could still depend on global network properties, e.g. its size or its
average shortest path length. Validating the assumption that f̂ is independent
of any global property of the network (i.e., that it is an entirely essential
property of nodes of kind p) would require comparing values of f̂ across
various periods and network configurations. Put differently, this entails checking
whether the shape of f̂ itself is a function of global network parameters.


3.4   Activity

Additionally, f̂ equally represents an attractiveness or an activity: if interactions
occur preferentially with some kinds of agents, it could mean either that these
agents are more attractive or that they are more active. If more attractive, an
agent will interact more, thus appearing more active. To distinguish between the
two effects, it is sometimes possible to measure agent activity independently,
notably when interactions occur during events, or when interaction initiatives
are traceable (e.g. in a directed network).
    In such cases, the distinction is far from neutral for modeling. Indeed, when
considering evolution mechanisms focused not on agents creating links, but
on events gathering agents (as in [15, 24]), modelers have to be careful
when reintroducing the observed PA into models as a behavioral hypothesis.
Some categories of agents might in fact be more active, and accordingly involved
in more events, rather than more attractive. This would ultimately lead the
modeler to refine the characterization of agent behavior by including both
participation in events and the number of interactions per event, rather than
just preferential interactions.


4     An application to socio-semantic networks

4.1   Definitions

We now apply the above tools to a socio-semantic network, that is, a social
network where agents are also linked to semantic items. We examine therein
two particular kinds of PA: (i) PA related to a monadic property: the node
degree; and (ii) PA linked to a dyadic property: homophily, i.e. the propension
of individuals to interact more with similar agents.


[Figure 1: panels labeled A and S; agents a, a', a''; concepts c, c', c''.]
Fig. 1. Sample socio-semantic network (3 agents a, a', a'' and 3 concepts c, c', c'').


Networks. The social network A is the network of agents, where links correspond
to interactions: A = (A, E_A), with A denoting the agent set and E_A the
(undirected) set of links between agents. Interactions occur through events, and
each event is associated with a semantic content, made of semantic markers (e.g.
keywords), or concepts drawn from a concept set C. Similarly, agents are linked to
the concepts associated with the events they are involved in, forming a second
network, S = (A ∪ C, E_AC). Thus we deal with two kinds of links: (i) between
pairs of agents, and (ii) between concepts and agents. Since we measure agent
behavior through network dynamics, we also consider the temporal series of
networks A(t) and S(t), with t ∈ ℕ, which together make up a dynamic
socio-semantic network (see Fig. 1).
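A minimal sketch of how A(t) and S(t) might be assembled from a list of events. It assumes, as an illustration only, that the networks are cumulative (built from every event up to period t) and that each event is a hypothetical record (period, agents, concepts).

```python
from itertools import combinations

# Hypothetical event record: (period t, agents involved, concepts attached).
Event = tuple[int, list[str], list[str]]


def build_socio_semantic(events: list[Event], t: int):
    """Return (E_A, E_AC), the link sets of A(t) and S(t).

    Assumption: networks are cumulative, i.e. built from all events whose
    period is <= t. Agent-agent links are stored as sorted pairs (undirected);
    every agent is linked to every concept of every event it took part in.
    """
    e_a: set[tuple[str, str]] = set()
    e_ac: set[tuple[str, str]] = set()
    for period, agents, concepts in events:
        if period > t:
            continue
        # every pair of co-participants interacts through this event
        e_a.update(combinations(sorted(set(agents)), 2))
        e_ac.update((a, c) for a in agents for c in concepts)
    return e_a, e_ac
```

Storing undirected links as sorted pairs makes repeated interactions between the same two agents collapse onto a single link, which matches the treatment of repeated interactions described in Sec. 2.1.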

Empirical protocol. Empirical data come from the bibliographical database Medline,
which contains dated abstracts of published articles in biology and/or medicine.
We focused on a portion concerning a well-defined community of embryologists
working on the zebrafish, during the period 1997-2004. Translated into the above
framework, articles are events, their authors are the agents, and semantic markers
are expert-selected abstract words. In order to have a non-empty and statistically
significant network for computing propensions, we first build the network on an
initialization period of 7 years (from 1997 to the end of 2003), then carry out the
calculation on the new links appearing during the last year. The dataset contains
around 10,000 authors, 5,000 articles and 70 concepts; about 10,000 new links
appear during the last year.
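The split between an initialization period and a measurement period can be sketched as follows: the network built on the initialization period fixes which pairs are already linked, and only pairs absent from it count as new links. Counting a pair once even if it interacts several times in the measurement year is an assumption of this sketch, and the function name is hypothetical.

```python
from collections.abc import Iterable
from itertools import combinations


def extract_new_links(
    existing: set[tuple[str, str]], period_events: Iterable[list[str]]
) -> set[tuple[str, str]]:
    """Agent pairs that interact during the test period but were not linked before.

    existing: sorted agent pairs of the network built on the initialization period.
    period_events: co-author lists, one per article of the test period.
    Interactions between already-linked agents are discarded (no acquaintance
    bias); a pair interacting several times in the period counts once, which
    is an assumption of this sketch.
    """
    fresh = set()
    for agents in period_events:
        for pair in combinations(sorted(set(agents)), 2):
            if pair not in existing:
                fresh.add(pair)
    return fresh
```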

4.2   Degree-related PA
We use Eq. 2 with the node degree k as property m (thus M = ℕ): in this
manner, we intend to compute the actual trend of f̂(k) for degree-related PA and
compare it with the assumption “f(k) ∝ k”. This hypothesis classically relates
to the preferential linking of new nodes to old nodes. To ease the comparison,
we considered the subset of interactions between a new and an old node.
    Empirical results are shown in Fig. 2. At first sight, the best linear fit
corroborates the data and tends to confirm that f(k) ∝ k. The best non-linear fit,
however, deviates from this hypothesis, suggesting that f(k) ∝ k^{0.97}. Yet the
confidence interval on this exponent, [0.6, 1.34], is far too wide to determine
the precise exponent, which may be critical. When the data are noisy, as in the
present situation, and since there is a natural order on k, it is very instructive
to plot the cumulated propension F̂(k) = Σ_{k'=1}^{k} f̂(k') (Fig. 2, right).
In this case, the best non-linear fit for F̂ is F̂(k) ∝ k^{1.83 ± 0.05}, confirming the
slight deviation from a strictly linear preference, which would yield k^2.

[Figure 2 panels: left, f̂(k) versus k; right, F̂(k) versus k.]
Fig. 2. Left: Degree-related interaction propension f̂, computed on a one-year period,
for k < 25 (confidence intervals are given for p < .05); the solid line represents the
best linear fit. Right: Cumulated propension F̂. Dots represent empirical values, the
solid colored line is the best non-linear fit F̂ ∼ k^{1.83}, and the gray area is the
confidence interval.
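An exponent β in F̂(k) ∼ C·k^β can be estimated, for illustration, by ordinary least squares on log F̂ against log k. This is a swapped-in technique: the text reports a non-linear fit, which weights points differently and may give a different value. Under the continuum approximation, f(k) ∝ k^α gives F(k) ≈ k^{α+1}/(α+1), so β ≈ α + 1; a strictly linear f corresponds to β = 2, and β = 1.83 reads as α ≈ 0.83. For small k the discrete sum departs from this law, so such a reading is only indicative.

```python
import math
from collections.abc import Sequence


def loglog_slope(k: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of log(y) against log(k).

    A simple stand-in for a non-linear fit of y ~ C * k**beta; both estimate
    beta, but they weight the data points differently. Requires k, y > 0.
    """
    xs = [math.log(v) for v in k]
    ys = [math.log(v) for v in y]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var
```

On an exact power law the slope recovers the exponent; on empirical F̂ values, the slope minus one gives only a rough estimate of the exponent of f.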

Rich-work-harder. This precise result is not new and agrees with existing studies
of degree-related PA (e.g. [16, 21]). Nevertheless, we wish to stress a more
fundamental point concerning this kind of PA. Indeed, considerations on agent
activity lead us to question the usual underpinnings and justifications of PA
related to a monadic property. Regarding degree-related PA in particular, we
question the “rich-get-richer” metaphor describing rich, or well-connected, agents
as more attractive than poorly connected agents, thus receiving more connections
and becoming even more connected.5
    When considering the activity of agents with respect to k, that is, the number
of events in which they participate (here, the number of articles they co-author),
“rich” agents are proportionally more active than “poor” agents (see Fig. 3),
and thus naturally take part in more interactions. It may thus simply be that
richer agents work harder rather than being more attractive, the underlying
behavior linked to preferential interaction being simply “proportional activity”.6
5
  “(...) the probability that a new actor will be cast with an established one is much
  higher than that the new actor will be cast with other less-known actors” [2].
6
  We obviously make the assumption that k accurately reflects author activity, i.e.
  a behavioral feature. Thus k is a proxy for agent activity and, if the number of
  coauthors does not depend on k (which is actually roughly the case in this data),
  then observing a quasi-linear degree-related PA is not surprising.


[Figure 3 panels: left, a(k) in events per period versus k; right, A(k) in cumulated events versus k.]
Fig. 3. Left: Activity a(k) during the same period, in terms of articles per period
(events per period), with respect to agent degree; solid line: best linear fit. Right:
Cumulated activity A(k) = Σ_{k'=1}^{k} a(k'); the best non-linear fit is k^{1.88 ± 0.09}.


    While formally equivalent from the viewpoint of PA measurement, the
“rich-get-richer” and “rich-work-harder” metaphors are not behaviorally equivalent.
One could choose to ignore this phenomenon and keep an interaction propension
proportional to node degree. On the other hand, one could also prefer to
consider higher-degree nodes as more active, assuming instead that the number
of links per event is degree-independent and that agents neither prefer nor
decide to interact with famous, highly connected nodes, a hypothesis supported
by the present empirical results. These two viewpoints, while both consistent
with the observed PA, bear distinct implications for modeling, as underlined in
Sec. 3.4.
    More generally, this feature supports the idea that events, not links, are the
right level of modeling for social networks, with events reducing in some cases
to a dyadic interaction. Modelers would then have to break down interaction
propensions into (i) activities (number of events) and (ii) interactivities (number
of interactions per event).
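Such a decomposition can be sketched per degree class: a(k) counts events per agent of degree k over the period, as in Fig. 3, while interactivity is measured here, as an illustrative assumption, by the mean number of co-participants per event. The function name is hypothetical, and agents missing from the degree map are ignored.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def activity_interactivity(
    degree: dict[Hashable, int], period_events: Iterable[list[Hashable]]
) -> dict:
    """Per degree class k, return (a(k), i(k)).

    a(k): mean number of events per agent of degree k during the period.
    i(k): mean number of co-participants per event for those agents, used
          here as an illustrative 'interactivity' measure.
    degree: degree of each agent at the start of the period.
    """
    n_agents = defaultdict(int)
    for k in degree.values():
        n_agents[k] += 1
    events_k = defaultdict(int)
    partners_k = defaultdict(int)
    for agents in period_events:
        members = set(agents)
        for a in members:
            if a in degree:  # agents unknown at the start of the period are skipped
                k = degree[a]
                events_k[k] += 1
                partners_k[k] += len(members) - 1
    return {
        k: (events_k[k] / n, partners_k[k] / events_k[k] if events_k[k] else 0.0)
        for k, n in n_agents.items()
    }
```

By construction, a(k)·i(k) is the mean number of co-participant contacts per degree-k agent, so a degree-dependent a(k) combined with a flat i(k) would correspond to the "rich-work-harder" reading.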


4.3   Homophilic PA


Homophily denotes the fact that agents prefer to interact with similar agents.
Here, we assess the extent to which agents are “homophilic” by introducing an
inter-agent semantic distance. By semantic distance we mean a function of a dyad
of nodes with the following properties: (i) decreasing with the number of concepts
shared by the two nodes, (ii) increasing with the number of concepts they do not
share, (iii) equal to 1 when the agents have no concept in common, and to 0 when
they are linked to identical concepts.


    Given (a, a') ∈ A² and denoting by a^∧ the set of concepts a is linked to, we
introduce a semantic distance δ(a, a') ∈ [0, 1] satisfying the previous properties:7

        \delta(a, a') = \frac{|(a^\wedge \setminus a'^\wedge) \cup (a'^\wedge \setminus a^\wedge)|}{|a^\wedge \cup a'^\wedge|}
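With concept sets stored as Python sets, δ is the size of the symmetric difference divided by the size of the union. A minimal sketch; the case of two agents with no concepts at all is not covered by the definition, and raising an error there is an assumption of this example.

```python
def semantic_distance(concepts_a: set, concepts_b: set) -> float:
    """delta(a, a') = |symmetric difference| / |union| (a Jaccard distance).

    Equals 1 for disjoint concept sets and 0 for identical ones. Undefined
    when both sets are empty; raising in that case is an assumption here.
    """
    union = concepts_a | concepts_b
    if not union:
        raise ValueError("delta is undefined for two empty concept sets")
    return len(concepts_a ^ concepts_b) / len(union)
```

For instance, {x, y} and {y, z} share one concept and each has one of its own, giving δ = 2/3; {x} and {x, y} give δ = 1/2.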

    As δ takes real values in [0, 1], we need to discretize it. To this end, we use
a uniform partition of [0, 1[ into I intervals, to which we add the singleton {1}.
We thus define a new discrete property d taking values in D = {d_0, d_1, ..., d_I},
consisting of I+1 classes:

        D = \{ [0, \tfrac{1}{I}[,\ [\tfrac{1}{I}, \tfrac{2}{I}[,\ \ldots,\ [\tfrac{I-1}{I}, 1[,\ \{1\} \}

Finally, we obtain an empirical estimation of homophily with respect to this
distance by applying Eq. 4 to d, with I = 15.
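Mapping a distance to its class index d_i is a floor operation, with the value 1 sent to its own class. A minimal sketch with a hypothetical function name; values lying exactly on an interval boundary may be shifted by floating-point rounding.

```python
import math


def discretize(delta: float, n_intervals: int = 15) -> int:
    """Index i of the class d_i containing delta (n_intervals plays the role of I).

    d_i = [i/I, (i+1)/I[ for i < I, and d_I = {1}.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if delta == 1.0:
        return n_intervals
    # rounding could push delta * I up to I for delta just below 1; clamp to I - 1
    return min(math.floor(delta * n_intervals), n_intervals - 1)
```

With I = 15, δ = 0.5 falls in d_7 and δ = 0.99 in d_14, while agents with no concept in common land in d_15.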


[Figure 4: ĝ(d) versus the class index of d, log-scale y-axis.]
Fig. 4. Thick solid line: Homophilic interaction propension ĝ with respect to d ∈
D = {d_0, ..., d_15}. Thin lines: confidence interval, for p < .05. Because several
fitting functions are conceivable, no particular fit has been carried out on this graph.
The y-axis is in log-scale.


    The results are gathered in Fig. 4 and show that while agents favor interactions
with slightly different agents (as the initial increase suggests), they still very
strongly prefer similar agents, as the clearly decreasing trend indicates (sharp
decrease from d_4 to d_13, with ĝ(d_4) one order of magnitude larger than ĝ(d_13);
note also that ĝ(d_0) = ĝ(d_1) = 0 because no new link appears for these distance
values). Agents thus display semantic homophily, a fact that strongly argues for
taking semantic content into account when modeling such social networks.
7
    Note that this kind of distance, based on the Jaccard coefficient [4], has been
    extensively used in Information Retrieval, as well as recently for link formation
    prediction in [18]. The point here is, however, not to focus on this particular
    similarity measure, but to show that simple non-topological properties may also
    strongly influence interaction behavior.
       Written more explicitly, with a^∧ = {c_1, ..., c_n, c_{n+1}, ..., c_{n+p}} and
    a'^∧ = {c_1, ..., c_n, c'_{n+1}, ..., c'_{n+q}}, we have δ(a, a') = (p+q)/(p+q+n),
    n and p, q representing respectively the number of elements a^∧ and a'^∧ have in
    common and have of their own. We also verify that if n = 0 (disjoint sets),
    δ(a, a') = 1; if n ≠ 0 and p = q = 0 (same sets), δ(a, a) = 0; and if a^∧ ⊂ a'^∧
    (included sets), δ(a, a') = q/(q+n). It is moreover easy, though cumbersome, to
    show that δ(·, ·) is also a metric distance.

4.4    Correlation between degree and semantic distance
In other words, the exponential trend of ĝ suggests that scientists seem to choose
collaborators mostly because they share interests, and less because they are
attracted to well-connected colleagues, an attraction which in fact seems to reflect
agent activity. As underlined in Sec. 3.2, when building a model of such a network
based on degree-related and homophilic PA, one has to check whether the two
properties are independent, i.e. whether or not a node of low degree is more or
less likely to be at a large semantic distance from other nodes. It appears here
that there is no correlation between degree and semantic distance: for a given
semantic distance d, the probability that a dyad includes a node of degree k is
the same as for any other value of d (see Fig. 5).
    To go further, we might suggest that socio-semantic networks are structured
in communities because agents group according to similar interests, in epistemic
communities [27], through a mechanism involving events in which agents are more
or less active and gather preferentially with respect to their interests, the former
being entirely independent of the latter.


                       P Hk È dL  P HkL
                           1.2
                           1.1
                             1
                           0.9
                           0.8
                           0.7


                                                                      k
                                        5      10      15       20


                                                                               P (k|d)
Fig. 5. Degree and semantic distance correlation estimated through cbd (k) =           ,
                                                                                P (k)
plotted here for three different values of d: d ∈ {d5 , d8 , d11 }.


Conclusion
Preferential attachment is the cornerstone of growth mechanisms in most recent
social network formation models. This notion was established by the success
of a pioneering model [2] rebuilding a major stylized fact of empirical networks,
the scale-free degree distribution. While PA has subsequently been widely used,
few authors have tried to check or quantify the rather arbitrary assumptions on
PA; when such attempts exist, they mostly deal with degree-related PA or estimate
PA phenomena as single parameters. Models should go further towards empirical
investigation when designing hypotheses. This would be particularly appealing to
social scientists, who are usually not seeking normative models. We are confident
that the present reluctance to measure interaction behaviors and processes is due
to the lack of a clean general framework for this purpose, which this paper aims
to provide.
    We thus introduced the notion of “interaction propension”, whereby we
assume that agents have an essential preferential interaction behavior. Using this
concept, we designed measurement tools for quantifying, in a dynamic network,
any kind of PA with respect to any property of a single node or of a dyad of
nodes: a generalized preferential attachment. The result is a function yielding
a comprehensive description of interaction behavior related to a given property.
In addition to clarifying PA, three features are crucial: (i) properties not related
to the network structure (such as homophily), (ii) correlations between
properties, (iii) activity of agents and nature of interactions (i.e. modeling events,
not nodes attaching to each other). This kind of hindsight on the notion and status
of PA should be useful, even for normative models.
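To make the idea of measuring a generalized PA concrete, here is a minimal sketch of one simple estimator, loosely in the spirit of the degree-based measurement of Jeong et al. [16]: over one period, count how many endpoints of newly created links carry property value x, divide by the number of nodes that had value x at the start of the period, and normalize. This is an illustrative assumption, not necessarily the estimator used in this paper; the function name `interaction_propension` is hypothetical, and only single-node properties are handled here.

```python
from collections import Counter

def interaction_propension(node_property, new_links):
    """Estimate a relative interaction propension f(x) for a node property.

    node_property: dict node -> property value x at the start of the period
                   (e.g. its degree k).
    new_links: list of (u, v) links created during the period.
    Returns a dict x -> f(x), normalized so that the values sum to 1.
    """
    nodes_with_x = Counter(node_property.values())   # number of nodes having value x
    endpoints_with_x = Counter()
    for u, v in new_links:
        for node in (u, v):
            if node in node_property:                # ignore nodes that are new this period
                endpoints_with_x[node_property[node]] += 1

    # Observed activity per node having value x (uniform attachment gives a flat profile).
    raw = {x: endpoints_with_x[x] / nodes_with_x[x] for x in nodes_with_x}
    total = sum(raw.values())
    if total == 0:
        return {x: 0.0 for x in raw}
    return {x: r / total for x, r in raw.items()}
```

With degrees `{'a': 1, 'b': 1, 'c': 2, 'd': 2}` and new links `[('c', 'd'), ('c', 'a')]`, this yields `{1: 0.25, 2: 0.75}`: degree-2 nodes attract three times as many new links per node as degree-1 nodes.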
    We finally applied these tools to a particular case of socio-semantic network,
a scientific collaboration network with agents linked to semantic items. While
we restricted ourselves to a reduced example of two significant properties (node
degree and semantic distance), measuring PA relative to other parameters could
have been just as relevant, for instance PA based on social distance (the
shortest path length between two agents in the social network). Specifying the
list of properties is nevertheless a process driven by the real-world situation
and by the stylized facts that the modeler aims to reproduce and considers
relevant for morphogenesis.
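The social distance mentioned above (shortest path length between two agents in the social network) is a dyadic property that can be computed by breadth-first search. A minimal sketch, assuming an unweighted, undirected network stored as an adjacency dictionary; the function name `social_distance` is hypothetical.

```python
from collections import deque

def social_distance(adjacency, source, target):
    """Shortest path length (number of links) between two agents.

    adjacency: dict agent -> iterable of neighboring agents (undirected graph).
    Returns None if target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])       # breadth-first frontier of (agent, distance)
    while queue:
        node, dist = queue.popleft()
        for nb in adjacency.get(node, ()):
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None
```

For agents in different connected components the distance is undefined (`None` here), so such pairs would need a separate category when measuring PA with respect to this property.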
    More generally, this framework could be applied to any kind of network
(including semantic web formation modeling) and could also be adapted to
disconnection propensions. Likewise, once interaction propensions in the broad
sense are known, a whole class of morphogenesis models [5, 9, 26] can be designed,
with agents interacting on a growing network according to stylized interaction
heuristics based precisely on those measured empirically. Ultimately, introducing
more credible hypotheses based on real-case empirical measures would help attract
more social scientists to this promising field.
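As an illustration of how measured propensions could drive such a model, here is a toy sketch of a single interaction step. It assumes, hypothetically, that a degree propension f and a semantic-distance propension g are independent (as Fig. 5 suggests for this dataset) and combine multiplicatively into a pair weight f(k_i) f(k_j) g(d_ij). The function name, the multiplicative form, and the placeholder choices f(k) = k and g(d) = 1 - d are assumptions for illustration, not measured results; already-linked pairs are not excluded, since repeated collaborations are possible events.

```python
import itertools
import random

def sample_interaction(degree, distance, f, g, rng=random):
    """Draw one new interaction (i, j) in a toy morphogenesis step.

    degree: dict agent -> current degree k.
    distance: function (i, j) -> semantic distance d between two agents.
    f: degree propension, k -> nonnegative weight.
    g: semantic propension, d -> nonnegative weight.
    Assumes f and g are independent and combine multiplicatively.
    Returns a pair (i, j), or None if no pair has positive weight.
    """
    pairs = list(itertools.combinations(sorted(degree), 2))
    weights = [f(degree[i]) * f(degree[j]) * g(distance(i, j)) for i, j in pairs]
    if not pairs or sum(weights) <= 0:
        return None
    # Sample one pair with probability proportional to its weight.
    return rng.choices(pairs, weights=weights, k=1)[0]
```

Usage: `sample_interaction({'a': 1, 'b': 2, 'c': 3}, lambda i, j: 0.5, lambda k: k, lambda d: 1 - d, random.Random(0))` returns one of the three pairs, with ('b', 'c') the most likely. Repeating the step while updating degrees grows the network.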

Acknowledgements. The author wishes to thank Clémence Magnien, Matthieu Lat-
apy and Paul Bourgine for very fruitful discussions, and also acknowledges interesting
remarks from David Chavalarias and three anonymous reviewers. This work has been
partially funded by the CNRS.


References
 1. R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews
    of Modern Physics, 74:47–97, 2002.
 2. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science,
    286:509–512, 1999.
 3. A.-L. Barabási, H. Jeong, R. Ravasz, Z. Neda, T. Vicsek, and T. Schubert. Evo-
    lution of the social network of scientific collaborations. Physica A, 311:590–614,
    2002.
 4. V. Batagelj and M. Bren. Comparing resemblance measures. Journal of Classifi-
    cation, 12(1):73–90, 1995.
 5. M. Boguna and R. Pastor-Satorras. Class of correlated random networks with
    hidden variables. Physical Review E, 68:036112, 2003.
 6. M. Boguna, R. Pastor-Satorras, A. Diaz-Guilera, and A. Arenas. Models of social
    networks based on social distance attachment. Physical Review E, 70:056122, 2004.
 7. G. Caldarelli, A. Capocci, P. D. L. Rios, and M. A. Munoz. Scale-free networks
    from varying vertex intrinsic fitness. Physical Review Letters, 89(25):258702, 2002.
 8. M. Catanzaro, G. Caldarelli, and L. Pietronero. Assortative model for social net-
    works. Physical Review E, 70:037101, 2004.
 9. P. Cohendet, A. Kirman, and J.-B. Zimmermann. Emergence, formation et dy-
    namique des réseaux – modèles de la morphogenèse. Revue d’Economie Indus-
    trielle, 103(2-3):15–42, 2003.
10. V. Colizza, J. R. Banavar, A. Maritan, and A. Rinaldo. Network structures from
    selection principles. Physical Review Letters, 92(19):198701, 2004.
11. E. Eisenberg and E. Y. Levanon. Preferential attachment in the protein network
    evolution. Physical Review Letters, 91(13):138701, 2003.
12. P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290–
    297, 1959.
13. A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou. Heuristically optimized
    trade-offs: A new paradigm for power laws in the internet. In ICALP ’02: Proceed-
    ings of the 29th International Colloquium on Automata, Languages and Program-
    ming, pages 110–122, London, UK, 2002. Springer-Verlag.
14. J.-L. Guillaume and M. Latapy. Bipartite structure of all complex networks. In-
    formation Processing Letters, 90(5):215–221, 2004.
15. R. Guimera, B. Uzzi, J. Spiro, and L. A. N. Amaral. Team assembly mecha-
    nisms determine collaboration network structure and team performance. Science,
    308:697–702, 2005.
16. H. Jeong, Z. Néda, and A.-L. Barabási. Measuring preferential attachment for
    evolving networks. Europhysics Letters, 61(4):567–572, 2003.
17. E. M. Jin, M. Girvan, and M. E. J. Newman. The structure of growing social
    networks. Physical Review E, 64(4):046132, 2001.
18. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks.
    In CIKM ’03: Proceedings of the twelfth international conference on Information
    and knowledge management, pages 556–559, New York, NY, USA, 2003. ACM
    Press.
19. S. S. Manna and P. Sen. Modulated scale-free network in Euclidean space. Physical
    Review E, 66:066114, 2002.
20. M. McPherson and L. Smith-Lovin. Birds of a feather: Homophily in social net-
    works. Annual Review of Sociology, 27:415–440, 2001.
21. M. E. J. Newman. Clustering and preferential attachment in growing networks.
    Physical Review E, 64:025102, 2001.
22. M. E. J. Newman. The structure of scientific collaboration networks. PNAS,
    98(2):404–409, 2001.
23. W. W. Powell, D. R. White, K. W. Koput, and J. Owen-Smith. Network dynamics
    and field evolution: The growth of interorganizational collaboration in the life
    sciences. American Journal of Sociology, 110(4):1132–1205, 2005.
24. J. J. Ramasco, S. N. Dorogovtsev, and R. Pastor-Satorras. Self-organization of
    collaboration networks. Physical Review E, 70:036106, 2004.
25. S. Redner. Citation statistics from 110 years of Physical Review. Physics Today,
    58:49–54, 2005.
26. C. Roth and P. Bourgine. Binding social and cultural networks: a model. arXiv.org
    e-print archive, nlin.AO/0309035, 2003.
27. C. Roth and P. Bourgine. Epistemic communities: Description and hierarchic
    categorization. Mathematical Population Studies, 12(2):107–130, 2005.
28. B. Skyrms and R. Pemantle. A dynamic model of social network formation. PNAS,
    97(16):9340–9346, 2000.
29. T. A. Snijders. The statistical evaluation of social networks dynamics. Sociological
    Methodology, 31:361–395, 2001.
30. B. Söderberg. A general formalism for inhomogeneous random graphs. Physical
    Review E, 68:026107, 2003.
31. H. Stefancic and V. Zlatic. Preferential attachment with information filtering–node
    degree probability distribution properties. Physica A, 350(2-4):657–670, 2005.
32. D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social
    networks. Science, 296:1302–1305, 2002.