=Paper= {{Paper |id=Vol-3135/dataplat_paper2 |storemode=property |title=RTGEN : A Relative Temporal Graph GENerator |pdfUrl=https://ceur-ws.org/Vol-3135/dataplat_paper2.pdf |volume=Vol-3135 |authors=Maria Massri,Zoltan Miklos,Philippe Raipin,Pierre Meye |dblpUrl=https://dblp.org/rec/conf/edbt/Massri0RM22 }} ==RTGEN : A Relative Temporal Graph GENerator== https://ceur-ws.org/Vol-3135/dataplat_paper2.pdf
RTGEN : A Relative Temporal Graph GENerator
Maria Massri1 , Zoltan Miklos2 , Philippe Raipin1 and Pierre Meye1
1
    Orange Labs, Cesson-SΓ©vignΓ©, France
2
    University of Rennes CNRS IRISA, Rennes, France


                       Abstract
                       Graph management systems are emerging as an efficient solution to store and query graph-oriented data. To assess the
                       performance and compare such systems, practitioners often design benchmarks in which they use large scale graphs. However,
                       such graphs either do not fit the scale requirements or are not publicly available. This has been the incentive of a number of
                       graph generators which produce synthetic graphs whose characteristics mimic those of real-world graphs (degree distribution,
                       community structure, diameter, etc.). Applications, however, require to deal with temporal graphs whose topology is in
                       constant change. Although generating static graphs has been extensively studied in the literature, generating temporal
                       graphs has received much less attention. In this work, we propose RTGEN a relative temporal graph generator that allows the
                       generation of temporal graphs by controlling the evolution of the degree distribution. In particular, we propose to generate
                       new graphs with a desired degree distribution out of existing ones while minimizing the efforts to transform our source graph
                       to target. Our proposed relative graph generation method relies on optimal transport methods. We extend our method to also
                       deal with the community structure of the generated graphs that is prevalent in a number of applications. Our generation
                       model extends the concepts proposed in the Chung-Lu model with a temporal and community-aware support. We validate
                       our generation procedure through experiments that prove the reliability of the generated graphs with the ground-truth
                       parameters.

                       Keywords
                       Temporal graphs, Graph generation, Optimal transport



1. Introduction                                                                                class citizen in graph management systems [13, 14, 15,
                                                                                               16]. Most of these systems rely on real-world temporal
Graphs are the most natural model to describe real world                                       graphs to evaluate their proposed methods. Real-world
interactions and are currently used in a myriad of appli-                                      graphs, however, do not often fit the scale requirements.
cation domains such as citation [1], transportation [? ],                                      Therefore, practitioners must rely on a temporal graph
and sensor networks [2] to cite just a few. These graphs                                       generator that is able to produce large scale graphs whose
are managed by a graph management system whose per-                                            evolution correlates with that of real world temporal
formance is usually evaluated through graph-centered                                           graphs. To tackle this challenge, we proposed RTGEN:
benchmarks that address different performance metrics                                          a relative temporal graph generator that produces large
such as ingestion throughput, space usage and query                                            scale temporal graphs by controlling a number of key
execution time. In this context, practitioners refer to real-                                  features that characterises the evolution of real-world
world and synthetic graphs to use in the benchmarks.                                           graphs. That is, our generation procedure, controls the
Indeed, available graph generation techniques fill the                                         evolution of the degree distribution by extending a very
gap between real and synthetically generated graphs by                                         common generation technique [17] referred to as the
trying to mimic the characteristics of real graphs such as                                     Chung-Lu model with temporal and community-aware
controlling the degree distribution [3, 4, 5, 6, 7, 8]. Besides,                               support.
a number of existing graph generators are community-                                              We model a temporal graph by a sequence of snapshots
aware in the sense that they group vertices that are more                                      𝑆 = {𝐺0 , . . . , 𝐺𝑁 } where 𝐺𝑖 is the graph snapshot at
densely connected between each other than they are                                             timestamp 𝑑𝑖 and characterized by a degree distribution
with the rest of the graph, in separate or overlapping                                         that is generated from sampling user-defined temporal
sub-graphs called communities [9, 10, 11].                                                     parameters. Having this, our relative graph generation
   Real graphs, however, are dynamic [12] such that their                                      procedure consists of transforming πΊπ‘–βˆ’1 into 𝐺𝑖 by ap-
topology is subject to continuous changes. In this context,                                    plying a stream of atomic graph operations with respect
a new emphasis is being placed to support time as a first                                      to the desired degree distribution at time instants π‘‘π‘–βˆ’1
                                                                                               and 𝑑𝑖 . Based on the fact that a strong correlation exists
Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint                              between successive snapshots [18, 19, 20], we propose
Conference (March 29-April 1, 2022), Edinburgh, UK
                                                                                               to minimize the number of graph operations that have
" maria.massri@orange.com (M. Massri); zoltan.miklos@irisa.fr
(Z. Miklos); philippe.raipin@orange.com (P. Raipin);                                           to be applied in order to transform a graph snapshot
pierre.meye@orange.com (P. Meye)                                                               into its successor. The main idea consists of minimizing
    Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons Li-
    cense Attribution 4.0 International (CC BY 4.0).
                                                                                               the distance between degree distributions of successive
    CEUR Workshop Proceedings (CEUR-WS.org)
snapshots. We achieve this goal by relying on an optimal           Given the number of vertices in each graph snap-
transport solver which provides a transportation plan ca-       shot π‘˜π‘– ∈ {π‘˜1 , . . . , π‘˜π‘› }, a stochastic community
pable of transforming a "mass" from source distribution         matrix 𝑀 and a sequence of degree distributions
to target distribution with a minimum of work. In order         {πœ‘1 , . . . , πœ‘π‘› }, we generate a sequence of graph snap-
to apply the obtained transportation plan, we proposed a        shots {𝐺1 , . . . , 𝐺𝑛 } such that each snapshot 𝐺𝑖 is
straightforward generalization of the well-known Chung-         relatively generated by transforming πΊπ‘–βˆ’1 . This
Lu’s model, also known as the CL model, that was first          transformation is based on morphing the πœ‘πΊπ‘–βˆ’1 =
discussed in [21] and formalized in [17, 22]. We choose to          𝐺          𝐺                 𝐺      𝐺
                                                                {(π‘₯1 π‘–βˆ’1 , πœ”1 π‘–βˆ’1 ), . . . , (π‘₯𝑛 π‘–βˆ’1 , πœ”π‘› π‘–βˆ’1 )} into πœ‘π‘– =
extend this model for the reasons of simplicity and scal-       {(π‘₯1 , πœ”1 ), . . . , (π‘₯π‘˜ , πœ”π‘˜ )} and preserving the community
                                                                    𝑖     𝑖             𝑖   𝑖

ability. We also extended the CL model to partition the         structure that is represented by the stochastic commu-
graph into ground-truth communities that coexist with           nity matrix 𝑀 such that π‘€πΊπ‘–βˆ’1 = 𝑀𝐺𝑖 = 𝑀 . Note
the aforementioned time-dependent degree distribution.          that each element π‘šπ‘’π‘£ of M is equal to the probability
Our contributions are validated through experimental            of edge creation between the source and target commu-
results showing the evolution of the degree distribution        nities 𝑐𝑒 and 𝑐𝑣 . Figure 1 illustrates the relative graph
and community structure with respect to ground-truth            generation procedure. Each graph snapshot 𝐺𝑖 is rela-
input parameters.                                               tively generated by transforming its ancestor πΊπ‘–βˆ’1 . This
   The rest of the paper is organized as follows, Section       transformation is based on computing a transportation
2 provides an overview of the generation procedure. Sec-        matrix 𝑇 that minimizes the cost of morphing πœ‘πΊπ‘–βˆ’1
tion 3 introduces the baseline generation procedure of the      into πœ‘π‘– . The computation of the transportation matrix
CL model. Section 4 describes the proposed community-           reduces to an optimal transport problem. Based on the
aware extension of the CL model. Section 5 presents             computed transportation matrix, each vertex belonging
a detailed description of the proposed generation pro-          to the graph πΊπ‘–βˆ’1 is assigned with a linkage or break-
cedure. Section 6 provides an experimental evaluation           age probability to indicate the probability of adding or
of the synthetically generated temporal graphs. Section         removing an edge. This phase is followed by creating or
7 describes the related work. Section 8 concludes the           removing edges to or from the graph πΊπ‘–βˆ’1 to produce
work.                                                           the graph 𝐺𝑖 . These graph updates follows the linkage
                                                                or breakage probabilities assigned for each of the ver-
                                                                tices. Finally, the graph 𝐺𝑖 is computed by applying
2. Overview                                                     the generated updates on πΊπ‘–βˆ’1 . Note that, the genera-
In this section, we describe the overall generation pro- tion procedure depicted in this Figure shows a simplified
cedure. Given the characteristics of a series of graph scenario where the number of vertices does not change.
snapshots, our relative generation procedure produces However, if that number changes, a phase consisting of
the series of graph snapshots {𝐺1 , . . . , 𝐺𝑛 } whose char- the addition or deletion of vertices should precede the
acteristics approximate the given ones. These graph snap- computation of the transportation matrix to assure the
shots are relatively computed by applying a number of following constraint:
graph updates on each snapshot in order to produce its                          π‘˜              π‘š
successor snapshot. To clarify, we apply a number of
                                                                               βˆ‘οΈ             βˆ‘οΈ
                                                                                     πœ”π‘ πΊπ‘– =       πœ”π‘‘π‘– , βˆ€1 ≀ 𝑖 ≀ 𝑛
graph updates on a graph snapshot πΊπ‘–βˆ’1 to produce an-                          𝑠=1            𝑑=1
other graph snapshot 𝐺𝑖 whose characteristics approx-
imate the given parameters assigned for the 𝑖th graph This constraint implies that the sum of weights of distri-
snapshot.                                                       butions πœ‘πΊπ‘– and πœ‘π‘– should be equal.
   Formally, we define a graph snapshot 𝐺𝑖 valid at a
time instant 𝑑𝑖 as the tuple {𝑉𝐺𝑖 , 𝐸𝐺𝑖 , πœ‘πΊπ‘– , 𝑀𝐺𝑖 } where 3. Graph generation with given
𝑉𝐺𝑖 is the set of vertices, 𝐸𝐺𝑖 is the set of edges, πœ‘πΊπ‘–
is a degree distribution and 𝑀𝐺𝑖 is the density commu-                expected degree distribution
nity matrix. For instance, we consider πœ‘πΊπ‘– of the form
                                                                In this section, we describe the generation procedure of
                           𝑛 , πœ”π‘› )} as a discrete distribution
{(π‘₯1𝐺𝑖 , πœ”1𝐺𝑖 ), . . . , (π‘₯𝐺𝑖   𝐺𝑖

over N where π‘₯𝑗 refers to the degree of a node and πœ”π‘—
                    𝐺𝑖                                       𝐺𝑖 random static graphs with a given degree distribution.
refers to the total number of vertices in the graph whose          Random graphs were introduced by ErdΕ‘s and RΓ©nyi
total number of edges is equal to π‘₯𝐺      𝑖
                                            . A density commu-  [23]. The popularity of this model, also known as the 𝐸𝑅
                                        𝑗
nity matrix 𝑀𝐺𝑖 defines the community structure of the model, stems from its simple generation procedure that
generated graphs, each element π‘šπ‘’π‘£ of which is equal consists of generating a number of vertices and connect-
to the density of edges between the source community ing them by an edge after picking each endpoint with a
𝑐𝑒 and the target community 𝑐𝑣 .                                fixed probability 𝑝. However, this model produces graphs
                                                                whose degree distribution follows a binomial distribution
                                                            probability 𝑝𝑣𝑖 in the following Equation:

                                                                                            𝑑𝑣𝑖
                                                                                    𝑝𝑣𝑖 =                             (1)
                                                                                            𝐷
                                                            Subsequently, a linkage phase consists of picking |𝐸| =
                                                            𝐷
                                                             2
                                                                pairs of vertices to connect such that for a sufficiently
                                                            large |𝐸| the random variable denoting the degree of
                                                            vertex 𝑣𝑖 is Poisson distributed with a mean equals to
                                                            𝑑𝑣𝑖 . Iterating the linkage phase |𝐸| times where an edge
                                                            is equally likely to be chosen in both directions for undi-
                                                            rected graphs, the insertion probability of an edge con-
                                                            necting vertex 𝑣𝑖 and vertex 𝑣𝑗 is 𝑝𝑣𝑖 𝑣𝑗 = 2𝑝𝑣𝑖 𝑝𝑣𝑗 𝐷     2
                                                                                                                         .
                                                            The edge insertion probability can be rewritten in the
                                                            more convenient form:
                                                                                            𝑑𝑣𝑖 𝑑𝑣𝑗
                                                                                 𝑝𝑣𝑖 𝑣𝑗 =
                                                                                              𝐷
                                                            For optimisation sake, we gather all vertices shar-
Figure 1: Relative graph generation procedure.              ing the same degree together in a pool 𝛾𝑑 =
                                                            {𝑣𝑖 |𝑣𝑖 ∈ 𝑉 ∧ 𝑑𝑣𝑖 = 𝑑} that we use as a subsidiary gen-
                                                            eration component. Each vertex in a pool is equally likely
                                                            to be chosen assuring that the aforementioned linkage
with a mean degree equals to (𝑁 βˆ’1)𝑝 where 𝑁 is the to- probability 𝑝 is not affected for a sufficiently large num-
                                                                         𝑣𝑖
tal number of vertices. Hence, it fails to mimic real-world ber of vertices. After the degree assignment phase, ver-
graphs that usually follow a power-law degree distribu- tices are distributed throughout the pools having each
tion. To tackle this limitation, the edge configuration the following linkage probability:
model [24] consists of generating a random graph whose
degree distribution matches, approximately, a given de-                                   𝑑|𝛾𝑑 |
gree distribution. That is, each vertex is assigned with a                         𝑝𝛾𝑑 =
                                                                                           𝐷
number of stubs equal to its desired degree that is drawn
independently from the given degree distribution. Hav- Now, instead of picking vertices a pool is first picked
ing this, pairs of stubs are linked randomly forming edges It should be highlighted that self-loops or multi-edges
between their endpoints. Although this technique ap- can be created since each endpoint of an edge is picked
proximately matches any given degree distribution, a independently. The number of these edges, however, is
relaxed version known as the Chung-Lu model was in- independent of the number of vertices and thus can be
troduced in [21]. This model consists of generating a ran- neglected for large scale graphs.
dom graph that approximately matches a given degree
distribution relying on a simple generation procedure 4. Community-aware graph
that can be considered as a variant of the 𝐸𝑅 model. For
simplicity, we will refer to this model as the CL model in       generation with given expected
the following description.                                       degree distribution
   Consider the degree distribution πœ‘ as the input param-
eter to the CL model and the undirected, unweighted and Although the CL model produces graphs with respect
unlabeled graph 𝐺 = {𝑉, 𝐸, πœ‘πΊ } as the output where to a given degree distribution, it is not aware of the
πœ‘πΊ denotes the degree distribution of 𝐺, 𝑉 and 𝐸 de- community structure existing in most real-world graphs.
note the set of vertices and edges, respectively. Having Hence, we propose a community-aware extension of the
this, the CL model produces a graph 𝐺 such that πœ‘πΊ is CL model based on the stochastic block model (SBM).
an approximation of πœ‘. The main idea is to pick each Since a community is not quantitatively well defined,
endpoint of an edge with a certain probability such that, many definitions where provided in literature. Intuitively,
at the end of the generation procedure, the total number one can consider a community as a subgraph which ver-
of incident edges to each vertex is close to its assigned tices are more densely connected between each other
degree. Hence, the starting phase consists of assigning than they are with the rest of the graph. Let’s consider
each vertex 𝑣𝑖 ∈ 𝑉 with a degree 𝑑𝑣𝑖 and a linkage the set of communities 𝐢 = {𝑐𝑖 } and suppose that a ver-
probability 𝑝𝑣𝑖 ∝ 𝑑𝑣𝑖 . Considering that 𝐷 is the sum of tex should belong to one community and edges should
the degrees extracted from πœ‘, we define the CL linkage be differentiated into within and between edges:
     β€’ Given a community 𝑐𝑖 , an edge 𝑒 is called a within    define πœ”π‘–π‘— = πœ”π‘—π‘– = 2π‘šπ‘–π‘— and πœ”π‘–π‘– = π‘šπ‘–π‘– . Furthermore,
       edge if the source vertex ∈ 𝑐𝑖 and the target vertex   we assign each community 𝑐𝑖 with a within edge creation
       ∈ 𝑐𝑖 .                                                 probability 𝑝𝑖𝑛
                                                                           𝑐𝑖 , a between edge creation equal to 𝑝𝑐𝑖
                                                                                                                  π‘œπ‘’π‘‘

     β€’ Given two communities 𝑐𝑖 and 𝑐𝑗 , an edge 𝑒 is         and a probability of edge creation 𝑝𝑐𝑖 such that:
       called a between edge if the source vertex ∈ 𝑐𝑖                                                  |𝐢|
       and target vertex ∈ 𝑐𝑗 or vice versa.                         𝑝𝑐𝑖 = 𝑝𝑖𝑛    π‘œπ‘’π‘‘
                                                                                                        βˆ‘οΈ
                                                                                                                      (2)
                                                                            𝑐𝑖 + 𝑝𝑐𝑖 = πœ”π‘–π‘– + 0.5              πœ”π‘–π‘—
To insure that vertices belonging to a community are                                           𝑗=1,𝑗̸=𝑖

more densely connected to each other than they are with We define the linkage probability 𝑝𝑣𝑖 of choosing a vertex
the rest of the graph, the within and between edge cre- 𝑣𝑖 belonging to community π‘π‘š as follows:
ation probabilities 𝑝𝑖𝑛𝑐𝑖 and 𝑝𝑐𝑖 of 𝑐𝑖 must satisfy the
                                 π‘œπ‘’π‘‘

condition 𝑝𝑖𝑛       π‘œπ‘’π‘‘                                                        𝑑 𝑣𝑖
             𝑐𝑖 > 𝑝 𝑐𝑖  , βˆ€π‘ 𝑖 ∈ 𝐢.                                     𝑝 𝑣𝑖 =      𝑝𝑐 , 𝑣𝑖 ∈ π‘π‘š               (3)
                                                                               π·π‘π‘š π‘š
4.1. Stochastic block model                                   where π·π‘π‘š is the sum of the degrees of vertices belong-
                                                              ing to community π‘π‘š and π‘π‘π‘š is the probability of choos-
In this section, we formulate the SBM model [9] (also         ing π‘π‘š . The linkage probability of a vertex is the product
known as the planted partition model) which is com-           of the probability π‘π‘π‘š of choosing the community to
monly used for the generation of random graphs with a                                                              𝑑
                                                              which the vertex belongs and the probability 𝐷𝑐𝑣𝑖 of
given community structure. Hence, this generation pro-                                                               π‘š

cedure only considers controlling the community struc-        choosing the vertex 𝑣𝑖 in that community. Hence, Equa-
ture of the graph and overlooks the resulting degree dis-     tion 3 assures the approximation of the community ma-
                                                                                                        𝑑𝑣𝑖
tribution. The input of the generation procedure is a         trix. However, 𝑝𝑣𝑖 should be equal to 𝐷       (Equation 1)
stochastic community matrix 𝑀 , each element π‘šπ‘–π‘— of           to assure the approximation of the degree distribution.
which defines the probability of edge creation between        Therefore, we define the following condition in order to
the source community 𝑐𝑖 and the target community 𝑐𝑗 .         reduce Equation (3) to Equation (1):
The output is a graph 𝐺 = {𝑉, 𝐸, 𝑀𝐺 } where 𝑀𝐺 is                                  π·π‘π‘š = π·π‘π‘π‘š
the obtained density community matrix, each element
                                                                                      𝐷
π‘šπΊ 𝑖𝑗 of which defines the relative edge density between      Now, replacing 𝐷 by π·π‘π‘π‘π‘š in the original CL linkage
                                                                                          π‘š
the source community 𝑐𝑖 and the target community 𝑐𝑗 .         probability (Equation 1) which assures the control of the
The generation procedure starts with the distribution of      degree distribution, we obtain Equation 3 which assures
vertices between the planted communities such that each       the control of the community structure. Having this, the
vertex belongs to a single community. Now, the linkage        duality of the linkage probability given in Equations (1)
probability between a vertex belonging to community 𝑐𝑖        and (3) insures that both requirements are satisfied by
and another vertex belonging to community 𝑐𝑗 is equal to      our generation procedure.
π‘šπ‘–π‘— . However, the extracted community density matrix            For performance amelioration, we consider the selec-
𝑀𝐺 from the resulting graph 𝐺 is an approximation of          tion of pools rather than vertices such that a pool is local
𝑀 . That is, each element π‘šπΊ 𝑖𝑗 is binomially distributed     to one community. That is vertices having the same de-
with mean equals to π‘šπ‘–π‘— and Poisson distributed with          gree variation and belonging to the same community π‘π‘š
the same mean for a sufficiently large number of edges.       are grouped in a pool π›Ύπ‘‘π‘π‘š = {𝑣𝑖 |𝑣𝑖 ∈ π‘π‘š ∧ 𝑑𝑣 𝑖 = 𝑑}
                                                              such that the probability of a pool selection for edge
4.2. Stochastic block model with given                        insertion is:
                                                                                            𝑑|π›Ύπ‘‘π‘π‘š |
     degree distribution                                                           𝑝 𝛾 π‘π‘š =
                                                                                      𝑑        𝐷
In this section, we propose a static graph generation
procedure which controls both the community structure         4.3. Hierarchical community structure
and degree distribution. Given a degree distribution πœ‘        The specification of the stochastic matrix is not straight-
and a stochastic community matrix 𝑀 , our proposed            forward and imposes an exhaustive number of user-
model generates a graph 𝐺 which degree distribution πœ‘πΊ        defined parameters. Hence, we define an auto-generative
is an approximation of πœ‘ and density community matrix         procedure that fills the matrix with no exogenous effort.
𝑀𝑔 is an approximation of 𝑀 . In the following, we            Considering a static graph, we construct a stochastic ma-
provide a description of our generation mechanism that        trix that reflects a hierarchical community structure with
extends the stochastic block model depicted in Section        only two given parameters. In a hierarchical community
4.1.                                                          matrix, communities recursively embed subsequent com-
   Since the generated graph 𝐺 is undirected, the matrix      munities in a self-similar fashion such that the commu-
𝑀 is symmetric such that π‘šπ‘–π‘— = π‘šπ‘—π‘– . Having this, we          nity structure is represented by a hierarchical tree where
each node represents a community. Each non-leaf node             cost:
                                                                                             𝑛 βˆ‘οΈ
                                                                                                π‘š
is expanded into 𝑏 other nodes until reaching a desired                                     βˆ‘οΈ
                                                                                      min             𝑑𝑖𝑗 𝑑𝑖𝑗
tree height β„Ž (Figure 2). The ending recursion results in                              𝑇
                                                                                            𝑖=1 𝑗=1
𝑛𝑐 = π‘β„Ž leaf-nodes referencing the finest scale communi-
ties having a linkage probability πœ”π‘–π‘— proportional to the        where 𝑑𝑖𝑗 = 𝑑(π‘₯𝑖 , 𝑦𝑗 ) is a measure of distance between
distance between 𝑐𝑖 and 𝑐𝑗 . The distance between two            π‘₯𝑖 and 𝑦𝑗 . The following constraints must hold for the
communities, 𝑑(𝑐𝑖 , 𝑐𝑗 ), is equal to the number of hops         optimal flow 𝑇 :
traversed in order to reach the least common ancestor of
these communities. In order to satisfy the condition stat-                     𝑑𝑖𝑗 β‰₯ 0, 1 β‰₯ 𝑖 β‰₯ 𝑛, 1 β‰₯ 𝑗 β‰₯ π‘š
ing that within edge linkage probability must be higher            π‘š                              𝑛
than between linkage probability (𝑝𝑖𝑛𝑐𝑖 > 𝑝𝑐𝑖 ), we define
                                           π‘œπ‘’π‘‘
                                                                  βˆ‘οΈ                             βˆ‘οΈ
                                                                         𝑑𝑖𝑗 ≀ 𝑝𝑖 , 1 β‰₯ 𝑖 β‰₯ 𝑛,          𝑑𝑖𝑗 ≀ π‘žπ‘— , 1 β‰₯ 𝑗 β‰₯ π‘š
πœ”π‘–π‘– as follows:                                                   𝑗=1                            𝑖=1

                              𝑐 βˆ’1
                             π‘›βˆ‘οΈ                                   Once the optimal flow 𝑇 is found, the EMD between
                πœ”π‘–π‘– = 0.5              πœ”π‘–π‘— + π‘˜                   𝑃 and 𝑄 is computed as follows:
                            𝑗=0,𝑗̸=𝑖                                                      βˆ‘οΈ€π‘› βˆ‘οΈ€π‘š
                                                                                             𝑖=1   𝑗=1 𝑑𝑖𝑗 𝑑𝑖𝑗
where π‘˜ is a tunable parameter which calibration steers                  𝐸𝑀 𝐷(𝑃, 𝑄) = βˆ‘οΈ€π‘› βˆ‘οΈ€π‘š
                                                                                                     𝑗=1 𝑑𝑖𝑗
the difference between within and between edge densi-                                          𝑖=1

ties. The effect of varying π‘˜ is further highlighted in the The EMD is fundamental in our generation procedure
Section 6.                                                  since it is used to compute the distance between two
                                                            degree distributions as described in the following Section.

                                                                 5.2. Baseline relative graph generation
                                                                 In this section, we provide the baseline procedure of
                                                                 transforming a graph 𝐺 with degree distribution πœ‘ into
                                                                 𝐺′ with degree distribution πœ‘β€² which we refer to as the
                                                                 Baseline relative graph generation. Note that, we use this
                                                                 technique for generating temporal graphs such that 𝐺
Figure 2: Hierarchical community tree with height β„Ž and          and 𝐺′ corresponds to successive graph snapshots. For
branching factor 𝑏.                                              generalisation purposes, however, we remove the notion
                                                                 of time in this section. This transformation is enabled by a
                                                                 set of atomic graph operations including the addition and
                                                                 deletion of a vertex or an edge. Following the assumption
                                                                 that temporal graphs gradually evolve, this number of
5. Relative graph generation                                     graph operations between successive snapshots should
                                                                 be minimized which is assured in our model by applying
In order to control the evolution of the degree distribu-        an optimal transport method.
tion of the generated temporal graphs, we propose in this           Consider the input graph 𝐺 = {𝑉, 𝐸, πœ‘} and de-
section an extension of the CL model that is based on the        gree distribution πœ‘β€² , the generated output graph 𝐺′ =
optimal transport to compute the minimal distance be-            {𝑉 β€² , 𝐸 β€² , πœ‘πΊβ€² } such that πœ‘πΊβ€² is an approximation of πœ‘β€² .
tween the degree distributions of each pair of successive        We define the distance between two degree distributions
graph snapshots.                                                 πœ‘ and πœ‘β€² as the earth mover’s distance 𝐸𝑀 𝐷(πœ‘, πœ‘β€² ).
                                                                    Consider 𝛿𝑛 = |𝑉 β€² | βˆ’ |𝑉 | as the total number of ver-
                                                                 tices to be added to or removed from the graph based
5.1. Earth mover’s distance
                                                                 on whether 𝛿𝑛 is a positive or negative number, respec-
The Earth mover’s distance can be defined as a mea-              tively. When adding a new vertex, this vertex is assigned
sure of distance over a domain 𝐷 between two dis-                with a degree equals to 0 and deleting a vertex consists
tributions of the form {(π‘₯1 , πœ”1 ), ..., (π‘₯𝑛 , πœ”π‘› )} where       of removing the vertex with its corresponding incident
π‘₯𝑖 ∈ 𝐷 and πœ”π‘– is the density of π‘₯𝑖 . Having this,                edges. This transformation phase assures that 𝐺 and 𝐺′
the problem reduces to the computation of an optimal             share the same number of vertices, hence, enables the
flow (transportation matrix) 𝑇 = [𝑑𝑖𝑗 ] between two              transformation of πœ‘ into πœ‘β€² . In order to morph πœ‘ into
distributions 𝑃 = {(π‘₯1 , 𝑝1 ), ..., (π‘₯𝑛 , 𝑝𝑛 )} and 𝑄 =          πœ‘β€² , a transportation matrix 𝑇 is computed, where each
{(𝑦1 , π‘ž1 ), ..., (𝑦𝑛 , π‘žπ‘› )} such that 𝑑𝑖𝑗 is the mass trans-   row corresponds to a degree 𝑑 in the set of degrees in the
ported between 𝑝𝑖 and π‘žπ‘— which minimizes the overall             source distribution πœ‘ and each column corresponds to a
degree 𝑑′ in the set of degrees in the target distribution πœ‘β€² .
Now, each cell consists of the portion of vertices having
a degree 𝑑 for which links are to be inserted or removed
in order to be assigned a total number of edges equals to
degree 𝑑′ . That is, a vertex 𝑣𝑖 , with a degree 𝑑𝑣𝑖 = 𝑑, will
be assigned a degree variation of 𝛿𝑑𝑣𝑖 = 𝑑′ βˆ’ 𝑑 resulting
in a total number of edge insertions and deletions defined
as 𝐷+ and π·βˆ’ , respectively.
We assign, for each vertex 𝑣𝑖 , a linkage probability 𝑝+ 𝑣𝑖 or         Figure 3: Graph representing the case of a non-possible edge
a breakage probability π‘βˆ’   𝑣𝑖 defined as extensions of the
                                                                       breakage.
CL linkage probability (1):
                          𝛿𝑑𝑣𝑖                                    desired degree distribution πœ‘ and the stochastic block
                  𝑝+
                   𝑣𝑖 =        , 𝛿𝑑𝑣𝑖 > 0                  (4)
                          𝐷+                                      matrix 𝑀 . However, the output consists of a graph
                        βˆ’π›Ώπ‘‘π‘£π‘–                                     𝐺′ = {𝑉 β€² , 𝐸 β€² , πœ‘πΊβ€² , 𝑀𝐺′ } where πœ‘πΊβ€² is an approxima-
                 π‘βˆ’
                  𝑣𝑖 =        , 𝛿𝑑𝑣𝑖 < 0               (5)        tion of πœ‘ and 𝑀𝐺′ is an approximation of 𝑀 . Recall that
                         π·βˆ’
We collect vertices sharing the same degree variation             the generation procedure depicted in section 4.2 produces
𝛿𝑑 = 𝑑′ βˆ’ 𝑑 into a linkage pool if 𝛿𝑑 > 0 and in a                a graph with a given expected degree distribution and
breakage pool if 𝛿𝑑 < 0. Consider 𝛾𝑑→𝑑′ = {𝑣𝑖 |𝑣𝑖 ∈               stochastic community matrix based on the proposed link-
𝑉 ∧ 𝛿𝑣𝑖 = 𝑑′ βˆ’ 𝑑} to be the pool containing vertices              age probability duality presented in Equations (1) and (3).
having a degree 𝑑 that should be transformed into 𝑑′ . We         Indeed, a relative community-aware graph generation is
compute the probability of picking a linkage or breakage          based on an extension of the aforementioned duality by
pool 𝑝+                                                           taking into consideration the degree variation of a vertex
      𝛾𝑑→𝑑′ and 𝑝𝛾𝑑→𝑑′ as follows:
                   βˆ’
                                                                  instead of the its degree. That is, the following linkage
                          𝛿𝑑|𝛾𝑑→𝑑′ |                              and breakage probabilities present a straightforward ex-
              𝑝+
               𝛾𝑑→𝑑′ =               , 𝛿𝑑 > 0                     tension of Equations (4) and (5):
                             𝐷+
                                                                                         𝛿𝑑𝑣𝑖
                        βˆ’π›Ώπ‘‘|𝛾𝑑→𝑑′ |                                              𝑝+
                                                                                  𝑣𝑖 =        π‘π‘π‘š , 𝑣𝑖 ∈ π‘π‘š
             π‘βˆ’
              𝛾𝑑→𝑑′ =                , 𝛿𝑑 < 0                                            𝐷𝑐+π‘š
                             π·βˆ’
However, breaking an edge might be impossible in situ-                                   𝛿𝑑𝑣𝑖
                                                                                 π‘βˆ’
                                                                                  𝑣𝑖 =        π‘π‘π‘š , 𝑣𝑖 ∈ π‘π‘š
ations where the source degree variation 𝛿𝑑 is negative                                  π·π‘βˆ’π‘š
and the sum of the negative degree variations of its neigh-       Where 𝐷𝑐+π‘š and π·π‘βˆ’π‘š are the total number of edge
bors is higher than 𝛿𝑑. For the sake of illustration, we          insertions and deletions in π‘π‘š , respectively. From the
present in Figure 3 a graph in which the number of edges          transportation matrix defined in section 5.2, we find
to remove from a node is higher than the sum of the               𝑛𝑖𝑗 as the portion of vertices with degree variation
number of edges to remove from its neighboring vertices.          𝛿𝑑 = 𝑑𝑗 βˆ’ 𝑑𝑖 . However, finding the portion π‘›π‘π‘–π‘—π‘š of
That is, the transformation of this graph implies remov-          vertices in community π‘π‘š should satisfy three conditions
ing 2 edges from vertex 𝑣1 since 𝛿𝑣1 = βˆ’2. However,               detailed bellow. Each condition 𝑖 results in a system of
the number of the edges that have to be removed from              linear equations of the form 𝐴𝑖 𝑋 = 𝐡𝑖 where 𝑋 is a
the neighboring vertices of 𝑣1 is equal to 𝛿𝑣2 = βˆ’1 since                                                    𝑐
                                                                  vector composed of π‘›π‘π‘–π‘—π‘š such that 𝑋 = {π‘›π‘–π‘—π‘˜ | βˆ€1 ≀ 𝑖 ≀
𝛿𝑣3 = 0 and 𝛿𝑣4 = 1. To overcome this, we repeat the              |πœ‘πΊ | ∧ βˆ€1 ≀ 𝑗 ≀ |πœ‘| ∧ 0 ≀ π‘˜ ≀ |𝐢|} where 𝑛𝑐 is the
morphing procedure until EMD(πœ‘, πœ‘β€² ) reaches a desired            total number of communities.
threshold. Our simulations have proved that the value
of EMD(πœ‘, πœ‘β€² ) converges rapidly towards the minimum              Condition 1: For each community π‘π‘š ∈ 𝐢, conditions
threshold after a tolerable number of iterations. This            stating that 𝐷𝑐+π‘š = 𝐷+ π‘π‘π‘š and π·π‘βˆ’π‘š = π·βˆ’ π‘π‘π‘š
statement will be further highlighted in Section 6.               must hold, where 𝐷+ and π·βˆ’ are the total number
                                                                  of edge insertions and deletions in all communities
5.3. Relative community-aware graph                               of 𝐢, respectively. Incorporating π‘›π‘π‘–π‘—π‘š in the previous
     generation                                                   condition translates to the following equality:
                                                                          β€²                           β€²
                                                          |πœ‘πΊ | |πœ‘ |                    |πœ‘πΊ | |πœ‘ |
A more complex version of the previously described        βˆ‘οΈ βˆ‘οΈ                  π‘π‘š
                                                                                        βˆ‘οΈ βˆ‘οΈ
                                                                     (𝑑 𝑗 βˆ’ 𝑑𝑖 )𝑛𝑖𝑗 = (            (𝑑𝑗 βˆ’ 𝑑𝑖 )𝑛𝑖𝑗 )π‘π‘π‘š
relative graph generation, consists of preserving the
                                                           𝑖=0 𝑗=0                       𝑖=0 𝑗=0
graph community structure in the transformation proce-
dure. That is, the input of our community-aware relative where πœ‘πΊ and πœ‘β€² are the source and target degree distri-
graph generator is the graph 𝐺 = {𝑉, 𝐸, πœ‘πΊ , 𝑀𝐺 }, the butions.
Condition 2: This condition states that the sum of            of adding and removing edges is repeated 𝐷+ and π·βˆ’
all portions of vertices with degree variation 𝑑𝑗 βˆ’ 𝑑𝑖        times, respectively. In each iteration, communities 𝑐𝑛 and
βˆ€π‘‘π‘— ∈ πœ‘π‘‘ in π‘π‘š should be equal to the portion 𝑛𝑐𝑖 π‘š of        π‘π‘š are picked based on 𝑐𝑑𝑓 πΆπ‘œπ‘šπ‘  and vertices 𝑛𝑖 and
vertices in π‘π‘š having a degree 𝑑𝑖 resulting in the follow-    𝑛𝑗 are picked using 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + [𝑛] and 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ [π‘š].
ing equality:                                                 Now, an addition or deletion graph update between the
                     π‘š
                    βˆ‘οΈ                                        chosen vertices is added to the list of logs using functions
                        π‘›π‘π‘–π‘—π‘š = 𝑛𝑐𝑖 π‘š                         addEdge and removeEdge whether the vertices where
                       𝑗=0
                                                              chosen from the linkage or breakage pools. However,
                                                              breaking an edge might be impossible in some situations
Condition 3: This condition states that the por- as shown in Figure 3. In such a use case, no graph update
tion 𝑛𝑖𝑗 of vertices with degree variation 𝑑𝑗 βˆ’ 𝑑𝑖 in is added to the list of logs 𝐿. Finally, the EMD distance
the graph must be equal to the sum of all portions π‘›π‘π‘–π‘—π‘š πœ– β€²is computed between the obtained               degree distribution
βˆ€π‘π‘š ∈ 𝐢.                                                      πœ‘  𝐺  and    the  desired  one  πœ‘. If πœ– β€²
                                                                                                        is higher than πœ– and
                        𝑛𝑐
                        βˆ‘οΈ                                    the number of repetitions π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ has not yet reached
                             π‘›π‘π‘–π‘—π‘š = 𝑛𝑖𝑗                      π‘šπ‘Žπ‘₯_π‘–π‘‘π‘’π‘Ÿ, the same algorithm is repeated on the newly
                       π‘π‘š =0                                  computed graph snapshot 𝐺′ . The computation stops
By solving the concatenated system of equations obtained     when       πœ–β€² is lower than or equal to πœ– or the number of
from the previous conditions π‘π‘œπ‘›π‘π‘Žπ‘‘(𝐴1 , 𝐴2 , 𝐴3 )𝑋 = repetitions has already been reached.
π‘π‘œπ‘›π‘π‘Žπ‘‘(𝐡1 , 𝐡2 , 𝐡3 ), we find the vector 𝑋, hence the          Algorithm 1: CRGG
values of π‘›π‘π‘–π‘—π‘š . Pools are created on a local basis in each
                                                                  Input: 𝐺 = {𝑉, 𝐸, πœ‘πΊ , 𝑀𝐺 }, πœ‘, 𝑀 , πœ–,
community such that vertices with the same degree vari-
                                                                            π‘šπ‘Žπ‘₯_π‘–π‘‘π‘’π‘Ÿ, π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ
ation 𝛿𝑑 = 𝑑 βˆ’ 𝑑 and belonging to the same community
               β€²
                                                                  Output: 𝐺′ = {𝑉 β€² , 𝐸 β€² , πœ‘πΊβ€² , 𝑀𝐺′ }
π‘π‘š are collected in a single pool 𝛾𝑑→𝑑′ . We compute
                                          π‘π‘š
                                                               1 𝑇 ← getTransportMatrix(πœ‘πΊ , πœ‘) ;
the probability of picking a linkage or breakage pool
                                                               2 X ← getVector(πœ‘πΊ , πœ‘, 𝑇 , 𝑀 ) ;
𝑝𝛾𝑑→𝑑′ ,π‘π‘š and 𝑝𝛾𝑑→𝑑′ ,π‘π‘š as follows:
  +                βˆ’
                                                               3 (𝐷 , 𝐷 ) ← getNumberOfEdges(T) ;
                                                                       +     βˆ’

                                   π‘π‘š                          4 𝑐𝑑𝑓 πΆπ‘œπ‘š ← getCDFComs(𝑀 ) ;
                             𝛿𝑑|𝛾𝑑→𝑑   β€²|
              𝑝+
               𝛾𝑑→𝑑′ ,π‘π‘š =          +
                                          , 𝛿𝑑 > 0             5  (𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + , 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ ) ← initCDFPools ;
                                  𝐷
                                                               6 𝐿 ← π‘–π‘›π‘–π‘‘πΏπ‘œπ‘”π‘ ()
                                    π‘π‘š
                            βˆ’π›Ώπ‘‘|𝛾𝑑→𝑑    β€²|                     7 for π‘π‘š ∈ 𝐢 do
             π‘βˆ’
              𝛾𝑑→𝑑′ ,π‘π‘š  =                  , 𝛿𝑑 < 0                                                βˆ’
                                  π·βˆ’                           8        (𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ +π‘π‘š , 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ π‘π‘š ) ←
Algorithm CRGG depicts the relative community aware                      getCDFPools(𝑋, π‘π‘š ) ;
graph generation procedure. The input parameters are 9                  𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + [π‘š] ← 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ +    π‘π‘š ;
the graph snapshot 𝐺, desired degree distribution πœ‘, den- 10            𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ [π‘š] ← 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’    π‘π‘š ;
sity community matrix 𝑀 , threshold of the EMD dis-                                  +
                                                             11 for 𝑖 ← 0 π‘‘π‘œ 𝐷 do
tance between πœ‘πΊ and πœ‘, maximum number of repeti-
                                                             12         (𝑐𝑛 , π‘π‘š ) ← chooseComs(𝑐𝑑𝑓 πΆπ‘œπ‘š) ;
tions π‘šπ‘Žπ‘₯_π‘–π‘‘π‘’π‘Ÿ and the current number of repetitions
                                                             13         (𝑛𝑖 , 𝑛𝑗 ) ← chooseVertices(𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + [𝑛],
π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ. Whereas, the output is a new graph snapshot
𝐺′ . Note that, the value of π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ is equal to 0 in the               𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + [π‘š]) ;
first iteration. The transportation matrix 𝑇 is computed 14             𝐿.addEdge (𝑛𝑖 , 𝑛𝑗 ) ;
using the function getTransportMatrix by taking the 15 for 𝑖 ← 0 π‘‘π‘œ π·βˆ’ do
degree distributions πœ‘πΊ and πœ‘ as input. The function 16                 (𝑐𝑛 , π‘π‘š ) ← chooseComs(𝑐𝑑𝑓 πΆπ‘œπ‘š) ;
getVector, computes 𝐴 and 𝐡 based on the Conditions 17                  (𝑛𝑖 , 𝑛𝑗 ) ← chooseVertices(𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ [𝑛],
1, 2 and 3 and solves the system of equations defined by                 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ [π‘š]) ;
𝐴𝑋 = 𝐡 to find the vector 𝑋. The total number of edges 18               𝐿.removeEdge (𝑛𝑖 , 𝑛𝑗 ) ;
to add (𝐷+ ) and delete (π·βˆ’ ) are then computed based on
                                                             19 𝐺 ← applyLogs(𝐺, 𝐿) ;
                                                                     β€²
the transporation matrix 𝑇 . The function getCDFComs
                                                             20 πœ– ← getEMD(πœ‘, πœ‘πΊ ) ;
                                                                   β€²                     β€²
computes the cumulative distribution function 𝑐𝑑𝑓 πΆπ‘œπ‘š                  β€²
                                                             21 if πœ– β‰₯ πœ– ∧ π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ < π‘šπ‘Žπ‘₯_π‘–π‘‘π‘’π‘Ÿ then
based on the density community matrix 𝑀 . Then, vectors
                                                             22         π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ ← π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ + 1 ;
𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + and 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ representing the cumulative
                                                             23         𝐺′ ←
density functions of the linkage and breakage pools and a
                                                                         𝐢𝑅𝐺𝐺(𝐺′ , πœ‘, 𝑀, π‘π‘’π‘Ÿ_π‘–π‘‘π‘’π‘Ÿ, π‘šπ‘Žπ‘₯_π‘–π‘‘π‘’π‘Ÿ) ;
list of logs (graph updates) 𝐿 are initialized. The function
getCDFPools is used to compute the cumulative distri- 24 else
bution functions 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ + and 𝑐𝑑𝑓 𝑃 π‘œπ‘œπ‘™π‘ βˆ’ based on 25                return 𝐺′ ;
the probabilities 𝑝𝛾𝑑→𝑑′ ,π‘π‘š and 𝑝𝛾𝑑→𝑑′ ,π‘π‘š . The process
                     +                βˆ’
                                                                                          Γ—10 4                                                                               Γ—10 4
                         15000                                                                10                                                                                                       6
 Number of occurrences




                                                                                          Number of occurrences




                                                                                                                                                                               Number of occurrences
                                                                                                                   8                                                                                   5

                         10000                                                                                                                                                                         4
                                                                                                                   6
                                                                                                                                                                                                       3
                                                                                                                   4
                          5000                                                                                                                                                                         2
                                                                                                                   2                                                                                   1

                            0                                                                                      0                                                                                   0
                           20                                                                                     20                                                                                   0
                                 40                                                 10                                 40                                               10                                                                             10
                                                                                8                                                                                   8                                       20                                     8
                                        60                             6                                                      60                              6                                                                              6
                                 Deg                            4                                                      Deg                             4                                                   Deg      40                4
                                      ree    80                                                                             ree    80                                                                         ree
                                                            2
                                                                         st   amp                                                                  2
                                                                                                                                                               st amp                                                             2
                                                                                                                                                                                                                                              st amp
                                                  100   0           Time                                                                100   0            Time                                                          60   0           Time




 Figure 4: Gaussian degree distribution                                                  Figure 5: Gaussian degree distribution                                              Figure 6: Zipfian degree distribution
 of a growth only graph                                                                  of a graph with edge deletions                                                      of a growth only graph



5.4. Accuracy of the generation                                                                                                                   characteristics of the generated temporal graphs. Note
     procedure                                                                                                                                    that the source code of RTGEN is publicly available1 .
                                                                                                                                                  Besides the source code, we also provide the instruc-
In order to measure how far the characteristics of the                                                                                            tions describing how to use the tool to generate temporal
generated graphs are from the ground truth parameters,                                                                                            graphs. For instance, users can pass the input parame-
we define two distance metrics πœ€π‘‘ and πœ€π‘ .                                                                                                        ters to describe the desired sequence of degree distribu-
   The first metric πœ€π‘‘ measures the inaccuracy of approx-                                                                                         tions or stochastic community matrix and the format of
imating the degree distributions of the generated graphs                                                                                          the generated output files to RTGEN using a terminal
with the given sequence of degree distributions. That                                                                                             command. RTGEN proposes two output types: snapshot-
is, it measures the root mean square of the EMD dis-                                                                                              based and event-based. The snapshot based type consists
tances between each degree distribution πœ‘π‘– in the given                                                                                           a sequence of graph snapshots represented each in a sep-
sequence {πœ‘1 , . . . , πœ‘π‘› } and its corresponding degree dis-                                                                                     arate file. Whereas, the event-based type, consists of
tribution πœ‘πΊπ‘– in the sequence {πœ‘πΊ1 , . . . πœ‘πΊπ‘› } extracted                                                                                        generating the sequence of graph updates (events) that
from the generated graphs. Having this, πœ€π‘‘ is computed                                                                                            we applied between successive snapshots to transform
as follows:                                                                                                                                       one snapshot into the next one.
                   βˆšοΈ€βˆ‘οΈ€π‘›
                                                2
                          𝑖=1 (𝐸𝑀 𝐷(πœ‘π‘– , πœ‘πΊπ‘– ))
            πœ€π‘‘ =                                              6.0.1. Experimental setup
                                   𝑛
Whereas, the second metric πœ€π‘ measures the inaccuracy The experiments were conducted on a single machine
of approximating the community density matrix of the equipped with Intel(R) Core(TM) i5-8350U CPU @
generated graphs with a given stochastic matrix. That 1.70GHz 1.90 GHz, 16 GB memory and 500 GB SSD.
is, it measure the root mean square of the difference be- We used Go 1.17.5 and Python 3.8.0. Besides, we referred
tween the Frobenius norms of the given stochastic matrix to the optimal transport solver proposed in [25]. The
𝑀 and the stochastic matrix 𝑀𝐺𝑖 extracted from every graphs shown in this section are visualized using Gephi
generated graph snapshot. Having this, πœ€π‘ is computed tool [26] which offers network visualization facilities and
as follows:                                                   community detection algorithms [27].
                 βˆšοΈ€βˆ‘οΈ€π‘›
                        𝑖=1 (𝐹 (𝑀 ) βˆ’ 𝐹 (𝑀𝐺𝑖 ))
                                                  2
           πœ€π‘ =                                               6.0.2. Preliminaries
                                   𝑛
                                                              In the following experiments, we refer to two types of
where 𝐹 (𝑀 ) is the Frobenius norm of the stochastic
                                                              common degree distributions: Gaussian 𝑓𝐺 and Zipfian
community matrix 𝑀 . We recall that the Frobenius norm
                                                              𝑓𝑍 that are defined as follows:
of a matrix 𝐴 of dimensions (𝑛,π‘š) is defined as follows:
                                                                                         1     1 π‘₯βˆ’πœ‡ 2
                                                                             𝑓𝐺 (π‘₯) = √ π‘’βˆ’ 2 ( 𝜎 )
                            ⎯
                            ⎸ 𝑛 βˆ‘οΈ   π‘š
                            βŽΈβˆ‘οΈ
                                                                                       𝜎   πœ‹
                𝐹 (𝐴) = ⎷               |π‘Žπ‘–π‘— |2
                               𝑖=1 𝑗=1                                                 1
                                                                         𝑓𝑍 (π‘₯) =             π‘₯ ∈ [0, π‘‘π‘šπ‘Žπ‘₯ ]
                                                                                  (π‘₯ + 𝑣)𝑠
6. Experimental evaluation                                                                                                                        We consider a special case where the value of a parameter
                                                                                                                                                  π‘₯ ∈ N in iteration 𝑖 depends on the its value in the
We conducted a number of experiments to validate the                                                                                              previous iteration 𝑖 βˆ’ 1 such as π‘₯𝑖 = π‘₯π‘–βˆ’1 + 𝛿π‘₯ such
efficiency of our generator RTGEN. We also provide an in-
sight on how changing the input parameters can steer the                                                                                               1
                                                                                                                                                           https://github.com/MariaMassri/RTGEN
that πœ‡π‘– = πœ‡π‘–βˆ’1 +π›Ώπœ‡. This is applied on the parameters of      graphs where new nodes join the graph and new connec-
the degree distributions πœ‡, 𝜎, π‘‘π‘šπ‘Žπ‘₯ , 𝑠, 𝑣 and 𝑛 denoting     tions are created as the time elapses.
the total number of vertices. That is, 𝛿𝑛 denotes the
number of vertices to be added or removed from the graph      6.2. Controlling the community structure
in the relative generation process. Note that, RTGEN
also generates the first snapshot which implies that the
                                                                   of the generated graphs
parameters of the degree distribution of the first snapshot   In this experiment, we show the generated community
should be given.                                              structure with different parameters of the stochastic com-
                                                              munity matrix and the effect of varying parameter π‘˜ of
6.1. Controlling the evolution of the                         the hierarchical tree. As described in Section 4.3, RTGEN
                                                              is capable of auto-generating the stochastic community
     degree distribution
                                                              matrix representing a hierarchical community structure.
In this experiment, we show the evolution of the degree       Consider a stochastic community matrix generated by set-
distribution of a sequence of graph snapshots generated       ting 𝑏 = 4 and β„Ž = 2. As depicted in Equation 2, one can
with the relative generation procedure given a set of input   tune the parameter π‘˜ in order to control the within and
parameters. Hence, we consider Gaussian and Zipfian           between edge densities. Hence, we select three different
degree distributions with different parameters and plot-      values of π‘˜ in {2, 4, 8}. Furthermore consider, 𝑛 = 1000
ted the obtained degree distributions in Figures 4, 5 and     to be the total number of vertices and parameters πœ‡ = 30
6. Figure 4 shows the evolution of the degree distribution    and 𝜎 = 2 to be the parameters of a Gaussian distribu-
of a generated sequence of 10 graph snapshots given the       tion. Note that, in this experiment, we generate a single
following parameters: {𝑛0 = 10𝐾, πœ‡0 = 30, 𝜎 0 =               graph snapshot relying on the generation procedure pro-
2, 𝛿𝑛 = 10π‘˜, π›Ώπœ‡ = 5, π›ΏπœŽ = 0.1}. By setting π›Ώπœ‡                 posed in Section 4.2. The generated graphs are shown in
to 5, we increase the average degree by 5 between each        Figures 7a, 7b and 7c using the Gephi tool. It can be no-
pair of snapshots. This indeed, can model a growth-only       ticed that the difference between the within and between
graph where the average edge degree tend to regularly         edge densities is proportional to π‘˜ since π‘˜ ∝ 𝑝𝑖𝑛      π‘œπ‘’π‘‘
                                                                                                               𝑐𝑖 βˆ’ 𝑝𝑐𝑖
increase as the time elapses.                                 where 𝑝𝑐𝑖 and 𝑝𝑐𝑖 are the within and between linkage
                                                                        𝑖𝑛       π‘œπ‘’π‘‘

However, some real-world graphs are not growth-only           probabilities of a community 𝑐𝑖 . Furthermore, Figure 8
in the sense that they are subject to edge deletions. This    presents the modularity in function of parameter π‘˜ which
is indeed the case of human-proximity or transportation       we vary from 0 to 32. The modularity is a measure to
graphs where an important number of short-term con-           quantify the goodness of community structure. Its for-
nections is only valid during peak hours. To model this       mula compares, for all the communities, the fraction of
characteristic, RTGEN also supports edge deletions. The       edges that falls within the given community with the
evolution of the degree distribution with edge deletions      expected fraction if edges were distributed at random.
is presented in Figure 5. Let the following parameters        It is clear from the results that the modularity increases
define the evolution of degree distribution for 𝑖 ∈ [0, 4]:   with the increase of π‘˜. This is justified by the fact that
{𝑛0 = 1𝑀, πœ‡0 = 60, 𝜎 0 = 4, 𝛿𝑛 = 0, π›Ώπœ‡ =                      π‘˜ is proportional to the difference between within and
5, π›ΏπœŽ = 0} Whereas the following parameters de-               between edge linkage probabilities 𝑝𝑖𝑛 𝑐𝑖 βˆ’ 𝑝𝑐𝑖 .
                                                                                                            π‘œπ‘’π‘‘

fine its evolution for 𝑖 ∈ [5, 9]: {𝑛0 = 10𝐾, πœ‡0 =
80, 𝜎 0 = 2, 𝛿𝑛 = 0, π›Ώπœ‡ = βˆ’5, π›ΏπœŽ = 0}. In-                    6.3. Generating graphs with deletions
deed, setting π›Ώπœ‡ to βˆ’5 indicates that the average degree
decreases by a value of 5 between each pair of successive
                                                                   between snapshots
graph snapshots.                                              As mentioned in Section 5, the relative graph genera-
Since real-world temporal graphs usually exhibit a power      tion procedure may incur a number of edge deletions.
law degree distribution, we also generated graphs with        This can be cumbersome when the number of edges to
an evolutionary Zipfian degree distribution composed          delete for a given vertex is higher than the total sum of
of 10 graph snapshots as shown in Figure 6. For this          edges to delete from its neighboring vertices. We solve
generated temporal graph, we set the following param-         this problem by repeating the generation process until
eters {𝑛0 = 50π‘˜, 𝑠0 = 2.5, 𝑣 0 = 10, 𝑑0π‘šπ‘Žπ‘₯ =                  reaching an acceptable error threshold that is defined by
10, 𝛿𝑛 = 50π‘˜, 𝑠 = 0, 𝛿𝑣 = 0, π›Ώπ‘‘π‘šπ‘Žπ‘₯ = 5}.                      the EMD between the obtained and desired degree dis-
By setting parameter π›Ώπ‘‘π‘šπ‘Žπ‘₯ to 5, we consider that the         tributions. Figure 12 shows the variation of the number
maximum degree of nodes increases by a value of 5 be-         of iterations and the execution time of the generation
tween each pair of successive snapshots. Whereas, the         process in function of the threshold error defined by the
value of 𝛿𝑛 indicates that 50π‘˜ new nodes join the graph       EMD. The obtained results show that our generation pro-
between successive snapshots. These parameters reflect        cedure converges rapidly to a tolerable threshold. That
the growth of a large number of real-world temporal           is, a threshold equals to 0.001 can be reached with only 7
            (a) k=2                (b) k=4                 (c) k=8

   Figure 7: A visualization of the generated graphs with a hierarchical       Figure 8: Modularity value in function
   community structure with parameters: 𝑏 = 4, β„Ž = 2, 𝑐 = 4 and a varying π‘˜.   of parameter π‘˜ ranging from 0 to 32.




Figure 9: Execution time in              Figure 10: πœ€π‘‘ in function of the         Figure 11: πœ€π‘ in function of the
function of the number of edges.         number of edges.                         number of edges.




                                                             6.4. Accuracy of the generation
                                                                  procedure
                                                             We quantify the accuracy of the generated graphs with
                                                             the given parameters by computing the distance met-
                                                             rics πœ€π‘‘ and πœ€π‘ defined in Section 5.4. We generated
                                                             a sequence of 𝑛 = 5 snapshots with the following
                                                             parameters of Gaussian degree distribution: {𝑛0 ∈
                                                             {10π‘˜, 100π‘˜, 500π‘˜, 1𝑀 }, πœ‡0 = 30, 𝜎0 =
Figure 12: The variation of the number of iterations and 2, 𝛿𝑛 = 0, π›Ώπœ‡ = 10, π›ΏπœŽ = 0}. Besides, we con-
execution time in function of the EMD.                       trolled the community structure by fixing the following
                                                             parameters of a hierarchical tree: β„Ž = 2, 𝑏 = 2, 𝑐 =
                                                             4, π‘˜ = 0.
iterations. By comparing the execution time of 1 iteration      Figures 9, 10 and 11 plot the execution time, value of
and 7 iterations, we can notice that the difference is lower πœ€π‘‘ and πœ€π‘ in function of the total number of created edges
than the execution time of a single iteration. Indeed, the from applying the Gaussian distribution whose parame-
execution time resulting from repeating the generation ters are given above. It is clear that the execution time
is lower than the first iteration since the majority of mod- increases with the number of the generated edges. The
ifications are added in the first iteration and only the distance metric, however, decreases implying that RT-
remaining vertices whose linkage probability does not GEN approximates more accurately the given sequence
satisfy the sum of the linkage probabilities of its neigh- of degree distribution and community structure as the
boring vertices are considered in the next iteration. Note total number of edges grows.
that these results are obtained from the generation of two
successive snapshots with the following input parameters
of a Gaussian degree distribution: {𝑛0 = 500π‘˜, πœ‡0 =
                                                             7. Related work
60, 𝜎0 = 2, 𝛿𝑛 = 0, π›Ώπ‘šπ‘’ = βˆ’30, π›ΏπœŽ = 0}.                      Synthetic graphs are important for developing bench-
                                                             marks for assessing the performance of graph-oriented
                                                             data platforms, when real graphs are not publicly avail-
able or expensive to obtain. This has been the incentive     updates. DSNG-M (dynamic social network generator
to design models and generators, which are very use-         based on modularity) [35] is a graph generator that is
ful for evaluating the efficiency of graph management        capable of generating temporal graphs by flipping the
techniques as storage, query evaluation, indexing, parti-    direction of edges of a given graph in order to satisfy a
tioning, etc.                                                randomly chosen modularity value assigned to a single
   An extensive work has been posited for the genera-        graph snapshot.
tion of static graphs. For instance, a special emphasis         Some of the aforementioned graph generators produce
has been placed to control the degree distribution of the    temporal graphs with properties on nodes or vertices,
generated graphs. In this context, many graph genera-        which we do not address in this paper. None of them,
tors were designed such as RTG [3], RMAT [4] and its         however, allows the control of the evolution of the degree
generalisation Kronecker [5] producing only Power-Law        distribution given ground truth parameters that describe
distributions. Since real-world graphs are not limited       this evolution. This challenge lead to the elaboration of
to power-law distributions, BTER [6] and its extension       the RTGEN tool that allows the approximation of any
Darwini [7] and GMark [8] produce graphs with any user       given sequence of degree distributions that describes the
defined distribution.                                        evolution of the graph. We firmly believe, that the degree
   Another graph generation model producing a given          distribution is a key feature that characterizes graphs,
degree distribution is the CL model, forming the basis       hence, it should not be disregarded in graph generation
of the RTGEN tool. This model can be regarded as a           tools.
successor of the Erdos-RΓ©nyi model [23] that is designed
for the generation of random graphs and a variant of
the edge configuration model of Newman et al. [24].          8. Conclusion
It was extensively discussed and reused [17, 28, 29, 30].
                                                             In this paper, we addressed the generation of temporal
We choose to extend this model for its simplicity and
                                                             graphs that represents a critical challenge in the design
scalability.
                                                             of benchmarks specific for evaluating temporal graph
   Besides, a number of existing graph generators are
                                                             management systems. That is, we proposed RTGEN, a
community-aware in the sense that they collect vertices
                                                             temporal graph generator that produces a sequence of
that are more densely connected between each other
                                                             graph snapshots whose community structure and evolu-
than they are with the rest of the graph, in separate or
                                                             tion of the degree distribution results from approximating
overlapping subgraphs called communities [9, 10, 11].
                                                             user defined parameters. This generation procedure con-
Although these generators preserve a given community
                                                             sists of relatively generating a graph snapshot from a
structure, they fail to produce a graph with respect to a
                                                             previous one by applying a number of atomic graph op-
given degree distribution. In this paper, we overcome this
                                                             erations. Our generation technique relies on an Optimal
limitation by allowing not only the generation of a given
                                                             transport solver to approximate a user-defined sequence
community structure but also a given degree distribution.
                                                             of degree distributions while minimizing the number of
   Despite the extensive work posited on the genera-
                                                             operations needed to transform one snapshot into its
tion of non-temporal graphs, the generation of tempo-
                                                             successor. We conducted a number of experiments that
ral graphs has received much less attention. For in-
                                                             validated the efficiency and accuracy of our generation
stance, DANCer [31] is capable of generating temporal,
                                                             procedure. In the future, we are planning to include a
community-aware property graphs. It separates opera-
                                                             dynamic community structure to RTGEN. Indeed, the
tions performed on communities (macro operations) from
                                                             communities found in real-world graphs are subject to
operations performed on vertices and edges (micro opera-
                                                             splits, merges, shrinks or expansions which should also
tions). ComAwareNetGrowth [32] is a community-aware
                                                             be modelled in synthetic graphs.
graph generator that is capable of creating growth only
graphs. APA (Attribute-Aware Preferential Attachment)
[33] is a graph generator capable of creating growth-only    References
property graphs based on a non-conventional triangle
closing. Instead of closing a triangle based on a uniform     [1] J. R. Clough, T. S. Evans, Time and citation net-
probability given as an input parameter, their proposed           works, arXiv preprint arXiv:1507.01388 (2015).
model consists of closing a triangle based on the simi-       [2] M. D. Mueller, D. Hasenfratz, O. Saukh, M. Fierz,
larity between the candidate edge’s endpoints. While              C. Hueglin, Statistical modelling of particle number
GMark [8] generates static graphs, EGG (Evolving Graph            concentration in zurich at high spatio-temporal res-
Generator) [34] proposes an extension including evolv-            olution utilizing data from a mobile sensor network,
ing properties attached to each vertex. EGG, however,             Atmospheric Environment 126 (2016) 171–181.
disregard the topological changes to the network and          [3] L. Akoglu, C. Faloutsos, Rtg: a recursive realistic
narrow the temporal evolution of the graph to property            graph generator using random typing, in: Joint
     European Conference on Machine Learning and                   15882.
     Knowledge Discovery in Databases, Springer, 2009,        [18] A. G. Labouseur, J. Birnbaum, P. W. Olsen, S. R.
     pp. 13–28.                                                    Spillane, J. Vijayan, J.-H. Hwang, W.-S. Han, The
 [4] D. Chakrabarti, Y. Zhan, C. Faloutsos, R-mat: A               g* graph database: efficiently managing large dis-
     recursive model for graph mining, in: Proceedings             tributed dynamic graphs, Distributed and Parallel
     of the 2004 SIAM International Conference on Data             Databases 33 (2015) 479–514.
     Mining, SIAM, 2004, pp. 442–446.                         [19] M. Then, T. Kersten, S. GΓΌnnemann, A. Kemper,
 [5] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Falout-         T. Neumann, Automatic algorithm transformation
     sos, Z. Ghahramani, Kronecker graphs: An ap-                  for efficient multi-snapshot analytics on temporal
     proach to modeling networks, Journal of Machine               graphs, Proceedings of the VLDB Endowment 10
     Learning Research 11 (2010) 985–1042.                         (2017) 877–888.
 [6] T. G. Kolda, A. Pinar, T. Plantenga, C. Seshadhri, A     [20] C. Ren, E. Lo, B. Kao, X. Zhu, R. Cheng, On querying
     scalable generative graph model with community                historical evolving graph sequences, Proceedings
     structure, SIAM Journal on Scientific Computing               of the VLDB Endowment 4 (2011) 726–737.
     36 (2014) C424–C452.                                     [21] W. Aiello, F. Chung, L. Lu, A random graph model
 [7] S. Edunov, D. Logothetis, C. Wang, A. Ching, M. Ka-           for power law graphs, Experimental Mathematics
     biljo, Darwini: Generating realistic large-scale so-          10 (2001) 53–66.
     cial graphs, arXiv preprint arXiv:1610.00664 (2016).     [22] F. Chung, L. Lu, Connected components in ran-
 [8] G. Bagan, A. Bonifati, R. Ciucanu, G. H. Fletcher,            dom graphs with given expected degree sequences,
     A. Lemay, N. Advokaat, gmark: Schema-driven                   Annals of combinatorics 6 (2002) 125–145.
     generation of graphs and queries, IEEE Transac-          [23] P. Erdos, A. rΓ©nyi on random graphs i, Publ. Math.
     tions on Knowledge and Data Engineering 29 (2016)             Debrecen 6 (1959) 290–297.
     856–869.                                                 [24] M. E. Newman, D. J. Watts, S. H. Strogatz, Random
 [9] P. W. Holland, K. B. Laskey, S. Leinhardt, Stochastic         graph models of social networks, Proceedings of the
     blockmodels: First steps, Social networks 5 (1983)            national academy of sciences 99 (2002) 2566–2572.
     109–137.                                                 [25] R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya,
[10] B. Karrer, M. E. Newman, Stochastic blockmodels               A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos,
     and community structure in networks, Physical                 K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud,
     review E 83 (2011) 016107.                                    H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet,
[11] B. KamiΕ„ski, P. PraΕ‚at, F. ThΓ©berge, Artificial bench-        A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard,
     mark for community detection (abcd): Fast ran-                A. Tong, T. Vayer, Pot: Python optimal transport,
     dom graph model with community structure, arXiv               Journal of Machine Learning Research 22 (2021)
     preprint arXiv:2002.00843 (2020).                             1–8. URL: http://jmlr.org/papers/v22/20-451.html.
[12] P. Holme, Modern temporal network theory: a              [26] M. Bastian, S. Heymann, M. Jacomy, Gephi: an open
     colloquium, The European Physical Journal B 88                source software for exploring and manipulating
     (2015) 234.                                                   networks, in: Third international AAAI conference
[13] E. Pitoura, Historical graphs: models, storage, pro-          on weblogs and social media, 2009.
     cessing, in: European Business Intelligence and Big      [27] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefeb-
     Data Summer School, Springer, 2017, pp. 84–111.               vre, Fast unfolding of communities in large net-
[14] Y. Miao, W. Han, K. Li, M. Wu, F. Yang, L. Zhou,              works, Journal of statistical mechanics: theory and
     V. Prabhakaran, E. Chen, W. Chen, Immortal-                   experiment 2008 (2008) P10008.
     graph: A system for storage and analysis of tempo-       [28] F. Chung, F. R. Chung, F. C. Graham, L. Lu, K. F.
     ral graphs, ACM Transactions on Storage (TOS) 11              Chung, et al., Complex graphs and networks, 107,
     (2015) 1–34.                                                  American Mathematical Soc., 2006.
[15] U. Khurana, A. Deshpande, Storing and analyz-            [29] A. Pinar, C. Seshadhri, T. G. Kolda, The similarity
     ing historical graph data at scale, arXiv preprint            between stochastic kronecker and chung-lu graph
     arXiv:1509.08960 (2015).                                      models, in: Proceedings of the 2012 SIAM Interna-
[16] M. Haeusler, T. Trojer, J. Kessler, M. Farwick,               tional Conference on Data Mining, SIAM, 2012, pp.
     E. Nowakowski, R. Breu, Chronograph: A ver-                   1071–1082.
     sioned tinkerpop graph database, in: International       [30] M. Winlaw, H. DeSterck, G. Sanders, An in-depth
     Conference on Data Management Technologies and                analysis of the chung-lu model, Technical Report,
     Applications, Springer, 2017, pp. 237–260.                    Lawrence Livermore National Lab.(LLNL), Liver-
[17] F. Chung, L. Lu, The average distances in random              more, CA (United States), 2015.
     graphs with given expected degrees, Proceedings of       [31] O. Benyahia, C. Largeron, B. Jeudy, O. R. ZaΓ―ane,
     the National Academy of Sciences 99 (2002) 15879–             Dancer: Dynamic attributed network with com-
     munity structure generator, in: Joint European
     Conference on Machine Learning and Knowledge
     Discovery in Databases, Springer, 2016, pp. 41–44.
[32] F. Gursoy, B. Badur, A community-aware network
     growth model for synthetic social network genera-
     tion, arXiv preprint arXiv:1901.03629 (2019).
[33] A. Aghasadeghi, J. Stoyanovich, Generating evolv-
     ing property graphs with attribute-aware preferen-
     tial attachment, in: Proceedings of the Workshop
     on Testing Database Systems, 2018, pp. 1–6.
[34] K. Alami, R. Ciucanu, E. M. Nguifo, Synthetic graph
     generation from finely-tuned temporal constraints.,
     in: TD-LSG@ PKDD/ECML, 2017, pp. 44–47.
[35] B. Duan, W. Luo, H. Jiang, L. Ni, Dynamic social
     networks generator based on modularity: Dsng-
     m, in: 2019 2nd International Conference on Data
     Intelligence and Security (ICDIS), IEEE, 2019, pp.
     167–173.