=Paper= {{Paper |id=Vol-2841/BigVis_5 |storemode=property |title= Graph Cities: Their Buildings, Waves, and Fragments |pdfUrl=https://ceur-ws.org/Vol-2841/BigVis_5.pdf |volume=Vol-2841 |authors=James Abello,Daniel Nakhimovich,Chengguizi Han,Mridul Aanjaneya |dblpUrl=https://dblp.org/rec/conf/edbt/AbelloNHA21 }} == Graph Cities: Their Buildings, Waves, and Fragments== https://ceur-ws.org/Vol-2841/BigVis_5.pdf
             Graph Cities: Their Buildings, Waves, and Fragments
                   James Abello1                 Daniel Nakhimovich                    Chengguizi Han           Mridul Aanjaneya
                                            Department of Computer Science, Rutgers University, USA
                                                      1 DIMACS, Rutgers University, USA




Figure 1: When a direct graph layout is unavailable or impractical due to memory or screen size constraints, our method
can render humanly interpreted visualizations of massive graphs on the order of 10 seconds (after a few minutes of pre-
processing) by leveraging the Graph Waves [3] decomposition. (Left) A Graph City for the Friendster social network with
1.8 billion edges. (Right) The internal structure of a building from the city with peel value 13 showing finer scale connec-
tivity. By leveraging hierarchical decompositions, our framework allows for interactive visualization of massive graphs.
ABSTRACT                                                                               all the elements of their corresponding Graph Cities are built in
“Graph Cities” are 3D representations of maximal edge graph                            a few minutes (excluding I/O time) and storage proportional to
partitions. Each connected equivalence class corresponds to a                          the number of edges and vertices of a graph. Our ultimate goal is
“Building” that is formed by stacking graph “Edge Fragments”.                          to obtain humanly-interpretable hierarchical descriptions of any
The number of such graph edge fragments determines the height                          graph that are accessible via a Unified Web Interface for Graph
of the building. The overall number of buildings is the number                         Analytics, without being constrained by the graph size.
of equivalence classes in the edge partition. A poly-log buck-
etization of the size distribution of the equivalence classes is
used to generate a 2D position for each bucket. For the buckets                        1   INTRODUCTION
containing more than one equivalence class, we also generate                           Techniques for analyzing massive data sets are becoming central
a visual “Bush” representation. The Delaunay triangulation of                          to several communities, with the need for interactive and scalable
these building locations determines the “street network” of the                        visualizations being one of the most pressing issues [27]. Since
Graph City. The weight of a connection between two buildings                           a large variety of massive data sets can be abstracted as hav-
on this street network is proportional to the intersection of the                      ing an underlying graph topology, our interest is in computing
subgraph vertex sets represented by the two buildings. To handle                       graph decompositions that are useful for making sense of massive
equivalence classes (i.e., buildings) consisting of a large num-                       graphs, for which a direct layout is unavailable due to memory
ber of fragments, we use the notion of “Graph Waves” from                              constraints or impractical due to screen size constraints. We fo-
[3]. Graph Waves are intervals of graph edge fragments with a                          cus on the identification of “global data patterns” that emerge
“well-defined” beginning and end fragment. For computational                           due to the co-occurrence of pairs of well-defined data entities.
purposes, the beginning and end fragments should satisfy a com-                           Motivated by [12] who proved that the graph degree sequence
putationally “easy to verify” property. We illustrate Graph Cities                     solely determines the expected Hopfield network pattern stabil-
obtained with the maximal edge partitions defined by the iter-                         ity, we wanted to find an “efficient” visual representation of any
ative edge core decomposition introduced in [4]. The graphs                            graph that is driven by the dismantling of its degree distribution.
used include the Friendster social network (1.8 billion edges),                        We introduce Graph Cities as a visual representation of such
a co-occurrence keywords network derived from the internet                             dismantling. It has the potential of bringing together streaming
movie database (115 million edges), and a patents citation net-                        computations and visualization to offer “large scale” structural
work (16.5 million edges). For graphs with up to 2 billion edges,                      graph information without losing the ability to interactively ex-
                                                                                       tract finer scale connectivity. All this is possible by streaming
© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed-   Graph Waves [3], which are in turn streams of graph fragments.
ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus)       The obtained visual representations can be rendered in a few
on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0)                                                              minutes and offer humanly interpretable large scale features of
                                                                                       graphs with billions of edges at different levels of granularity.
Figure 2: Overview of our proposed framework. (Left) The individual buildings in the graph city, which comprise of cylin-
drical edge fragments. (Middle) The entire graph city. (Right) Bushes representing buckets with more than one building.

   The coarsest views are offered by Graph City buildings with          • A 2D spiral layout is used to fix building locations, and also
floors and frustums representing Waves and their interconnecting          for visualizing the street network, whose edge widths are de-
Edge Fragments. Each building internal structure is represented           termined by the intersection graph of the building vertex sets.
by a Directed Acyclic Graph (DAG) whose macro-vertices can be           • Smaller buildings are bucketed into “green” areas of bushes
expanded into their internal fragments, that can be navigated by          that are generated by special natural-looking L-systems.
usual node link diagrams at different levels of detail depending        • Graph Cities can be navigated at different levels of granularity.
on their size. In summary, our approach follows a hierarchical          • The largest buildings within graphs with over a billion edges
edge decomposition with five levels: Edge Graph Decomposition,            are rendered in a few seconds. The algorithms computing the
Buildings, Waves, Edge Fragments, with the bottom level consist-          data required to specify the rendering of a graph city from a
ing of “structured” node-link subgraphs of reasonable size that           given wave decomposition are linear both in time and storage.
make them amenable to human descriptions. To our knowledge,
                                                                           As far as we know, Graph Cities constitute the first abstract vi-
this is the first time that such a large scale representation has
                                                                        sual representation of node link representations of graphs that are
been introduced for visualizing graph data sets, that is humanly
                                                                        linearly computable, and that are amenable to interactive topo-
interpretable at “natural” topological levels of granularity.
                                                                        logical exploration at different levels of granularity for massive
                                                                        graph data sets. We plan to provide browser access to our tools
1.1    Summary of Our Overall Approach                                  for the exploration of graphs with up to a few billion edges. In
An overview of our proposed framework is illustrated in Figure 2.       this paper, we focus on the computational techniques to provide
Our work builds upon the Graph Wave decomposition recently              scalable visible abstractions for large graphs. In the future, we
introduced in [3], which is a refinement of the iterative degree        will couple the graph city abstraction with semantic information.
edge partition [4]. Each connected equivalence class is visually           The paper layout is as follows: Section 2 summarizes relevant
represented as a “building”. The size distribution of the equiva-       work. In Section 3, we introduce the main definitions and illus-
lence classes in the given edge partition is bucketed via a poly-log    trate the main conceptual tools borrowed from a previous work:
function of the total number of edges in the entire data set. Buck-     graph Edge Fragments and Graph Waves [3]. Section 4 introduces
ets containing more than one building are visually represented by       Graph Cities, their buildings, bushes, waves and fragments, and
“bushes” that are generated via special L-systems (see green areas      Section 5 describes what the drawn street network represents.
in Figure 2(right)). A spiral embedding of this poly-log expression     Section 6 discusses 3 data sets used, and statistics of their size and
provides a layout for all the equivalence classes in a graph city.      various decompositions. Sections 7 and 8 discuss the rendering
    Each graph city has a “street network” that is obtained by com-     of graph cities and their interactive navigation. Section 9 briefly
puting the intersection graph of the collection of vertex sets of       mentions end-user goals. Section 10 discusses avenues for future
the buildings. The 2D positions of the centers of the bottom floors     research and Section 11 closes with concluding remarks.
of the buildings are used to indicate the location of the building
vertex sets. The weight of the geometric edge representing the          2    RELATED WORK
distance between two buildings is obtained as a function of the
                                                                        Efforts to deal with large graphs have mainly employed compu-
size of the intersection of the corresponding buildings vertex sets
                                                                        tational approaches to handle scale. Generally, computation and
(i.e., a Jaccard-type coefficient).
                                                                        visualization appear to be treated as independent tasks. One ap-
    All the elements of a graph city including its buildings, bushes,
                                                                        proach is to develop scalable algorithmic tools that amplify users’
and street network are built in a few minutes (excluding I/O time)
                                                                        understanding of the underlying data topologies at different lev-
and storage proportional to the number of edges and vertices of
                                                                        els of granularity. In general, macro graph views can in principle
the graph. We exemplify our results on a variety of large graph
                                                                        be obtained by some form of vertex or edge aggregation that is
data sets ranging in size from 1 million to over 2 billion edges.
                                                                        conceptually represented as hierarchy trees [5, 8, 13, 26, 32]. The
They include the Friendster social network (1.8 billion edges), a co-
                                                                        choice of representations that facilitate smooth visual interaction
occurrence keywords network derived from the internet movie
                                                                        is a subject of active research [19, 22, 23, 25].All these previ-
database (115 million edges), and a patents citation network (16.5
                                                                        ous techniques have algorithms with running time substantially
million edges). To summarize, our main contributions are:
                                                                        greater than linear on the number of graph elements, making
• Graphs become represented as a collection of buildings with           them not suitable for massive graph visualization. Sampling, as a
  floors, with inter-floor volumes encoding subgraph sizes.             mechanism to shed light on graph structure, has been explored in
[24]. Graph Thumbnails, as a mechanism to identify and compare
multiple graphs, are alone the subject of [31]. Uses of the Core
decomposition to grasp some coarse graph topological views
are discussed in [15]. Generation of graphs with a predefined
core structure is the focus of [10]. Computational aspects of
core related graph decompositions and graph sparsification are
studied in [7, 9, 18, 20, 28, 30].Some algorithmic principles for
graph reachability in large graphs are proposed in [17]. Machine
learning approaches, such as those described in [16], have been
proposed to learn low-level embeddings of graphs, however, such
approaches are not yet scalable to billion edge graphs.
   Our approach differs from prior work in the sense that we look
for efficient data traversal algorithms via exploration primitives
that lend themselves to visual representations that amplify users’
understanding of the internal graph structure for graphs that
are too large to be visualized directly. Buildings, Waves, and
Fragments are examples of such primitives.
                                                                         Figure 3: Graph cities for (top left) the Friendster network
                                                                         (1.8 billion edges), (top right) movie phrase co-occurrence
3     PROBLEM FORMULATION                                                network (115 million edges), and (bottom left) the patent
We aim to provide visual abstractions for representing edge par-         citation network (16.5 million edges). The Delaunay tri-
titions of graphs to highlight connectivity and size distribution        angulation (bottom right) of the spiral building layout
of subgraphs with special degree distributions - fixed points in         (green). The red street network corresponds to the white
our case (see Def. 3). We achieve this by first computing a maxi-        street network in the city on the bottom left.
mal edge partition of the graph as in [4]. We then represent the
                                                                            Definition 1. (Peel Value) The peel value of a vertex 𝑢 ∈
equivalence classes of the given edge partition as “buildings”.
                                                                         𝑉 (𝐺), denoted 𝑝𝑒𝑒𝑙𝐺 (𝑢), is the largest 𝑖 ∈ [1, 𝑑𝑒𝑔(𝑢)] such that 𝑢
Each building consists of a collection of “Waves”, each of which
                                                                         belongs to a subgraph of 𝐺 of minimum degree 𝑖.
is formed by a stack of “Edge Fragments” [3], i.e., Waves are
ordered maximal sequences of graph Edge Fragments (see Def. 5               Definition 2. (Graph Core) The core of 𝐺, denoted 𝑐𝑜𝑟𝑒 (𝐺),
and 6). This section will first describe what the graph structures       sometimes also called the k-core of 𝐺, is the subgraph induced by
called fixed points, waves, and fragments are and then the visual        the maximal subset of vertices of 𝐺 whose peel value is maximum.
representations we use to display them will be described in Sec. 4.          Definition 3. A graph 𝐹𝑘 is a fixed point of degree peeling 𝑘,
                                                                         if 𝑐𝑜𝑟𝑒 (𝐹𝑘 ) = 𝑉 (𝐹𝑘 ) and the peel value of each vertex in 𝐹𝑘 is 𝑘.
3.1    Graph Edge Fragments and Waves
                                                                            Figure 4 shows vertices in a small graph colored by their peel
One approach to getting a sense of the topology of a graph be-           value. It also shows edges colored according to the iterative edge
gins with a choice of a starting set of vertices 𝑆 0 satisfying a        decomposition from [4] which obtains an edge partition by re-
property of interest 𝑃 and marking them as “explored”. All the           moving the highest peel value edges (i.e., initially the red edges),
edges with at least one endpoint in 𝑆 0 are then traversed, deleting     updating the vertex peeling values, recoloring the affected edges,
them from the graph and adding them to the “Edge Fragment”               and identifying the next highest peel value edges and repeating
associated with 𝑆 0 . These edges then become the beginning part         until the whole graph is processed. A graph city (Sec. 4) can be
of the “Graph Wave” generated by 𝑆 0 . If there are any edges with       derived from this edge partition by mapping each connected edge
one endpoint not in 𝑆 0 , we check whether those vertices not in 𝑆 0     color class into a building.
still satisfy a property of interest 𝑄 (usually, a relaxation of 𝑃) in
the remaining graph. If there are any such vertices we continue
exploring in parallel only from such vertices, adding incremen-
tally the new found edges into the current Wave started by 𝑆 0 ,
and deleting them from the existing graph. This process ends
when all edges with exactly one endpoint in the current Wave
lead to vertices that do not satisfy 𝑃. This means that if there
still remains “unexplored” edges, a new Wave can be initiated
by selecting another starting set of vertices satisfying a property
of interest 𝑅 (usually, stricter than 𝑃). We formalize the process       Figure 4: Left: The vertices are colored according to their
above with the following definitions.                                    peel value: 1-grey, 2-blue, 3-orange, 4-red. Right: The
                                                                         edges are colored according to the iterative edge decom-
3.2    Definitions                                                       position using the same color scale.
We consider undirected graphs 𝐺 = (𝑉 , 𝐸) with vertex set
𝑉 (𝐺) and edge set 𝐸 (𝐺). We denote by 𝑛 the number of vertices
                                                                         3.3    Waves and their Edge Fragments
in 𝑉 , 𝑚 the number of edges in 𝐸, and the degree of a vertex            Next, we present some new definitions that generalize the notion
𝑢 ∈ 𝑉 by deg(𝑢). For any 𝑈 ⊂ 𝑉 , let the notation 𝐺 (𝑈 ) denote          of graph Edge Fragments and Waves introduced in [3].
the subgraph of 𝐺 induced by 𝑈 . For the sake of completeness,
we restate some definitions from [4].                                       Definition 4. The boundary of a vertex set 𝑆 ⊂ 𝑉 is defined
                                                                         as 𝜕𝑆 = {𝑣 ∈ 𝑉 : ∃ (𝑢, 𝑣) ∈ 𝐸, 𝑢 ∈ 𝑆, 𝑣 ∉ 𝑆 }. The proper
boundary of S, denoted 𝜕𝑃 𝑆, is the set of vertices in 𝜕𝑆 which
satisfy a desired property P restricted to the graph induced by 𝑉 \ 𝑆.
These definitions are extended, in a straightforward fashion, to a
collection of disjoint sets by taking their union.
   Definition 5. Given a vertex subset 𝑆 ⊂ 𝑉 , the edge fragment
frag(S) generated by S is the set of edges (𝑢, 𝑣), such that 𝑢 ∈ 𝑆.
   Definition 6. A graph wave 𝑊 (𝑆 0, 𝑃) is the union of frag-
ments in the sequence (frag(𝑆 𝑗 ))𝑚𝑗=0 , with 𝑆 0 being the source set
                                                           Ð𝑗
of vertices satisfying 𝑃 and each subsequent 𝑆 𝑗+1 = 𝜕𝑃 ( 𝑖=0 𝑆𝑖 ).
   3.3.1 Why are Graph Waves useful abstractions? Waves were
inspired by previous graph exploration approaches based on
Sparse Nets [2] and Boruvka MST type contraction algorithms
[1]. Graph Waves generated by minimum degree source sets were
introduced in [3]. In this work, we generalize this notion and use
Edge Fragments to describe and visually represent the Waves’
                                                                           Figure 5: An “interesting” portion of the spanning Meta-
internal structure (see subsection 3.3.2 below).
                                                                           DAG of a fixed point of peel value 101 from the movie
   Graph Waves provide different “lenses” into the structure of
                                                                           phrase co-occurrence network. The whole fixed point con-
a graph according to a particular property or substructure. For
                                                                           tains 17,311 vertices and 182,085 edges while the spanning
example, if 𝑆 0 is non-empty and consists of the set of vertices not
                                                                           Meta-DAG only has 4181 vertices and 10,016 edges.
in a given maximal edge-matching 𝑀, then the Wave generated
by such an initial set 𝑆 0 provides a layered view of G as a sequence      case we already have a simple description of the Wave: it is a tree
of independent sets, and the last Wave Edge Fragment is a perfect          with the number of fragments equal to the radius of the tree.
matching. This is because, for this example, 𝑆 0 is the complement
of a maximally matched set of vertices. We refer to this example
                                                                           4     WHAT IS A GRAPH CITY?
as the Maximal Matching Wave. Graph Waves derived from a
graph’s degree distribution are essential for discovering the non-         We assume that the input is some ordered partition (𝐸 0, 𝐸 1, ..., 𝐸𝑧 )
regular macro structure of very large graphs, and at the same time         of the edges of a graph 𝐺 = (𝑉 , 𝐸), where each 𝐸𝑖 is edge-maximal
help isolate Edge Fragments with peculiar levels of regularity.            with respect to a predefined property. An efficient representation
For example, if 𝑘 is the minimum degree of a graph 𝐺 and if 𝑆 0            for such partitions is stored as a set of triples (source, target,
consists of all the vertices of degree 𝑘, and subsequent boundary          label𝑖 ). This assumption is justified since the edges of any graph
sets are restricted to have degree strictly less than this minimum         𝐺 can be efficiently partitioned into edge-maximal subgraphs 𝐺𝑘 ,
𝑘, the corresponding Wave is called a Fixed Point of Degree 𝑘              each of minimum degree at least 𝑘 and average degree not more
in [4] with a vertex source set of minimum degree. We refer to             than 2𝑘 [4]. Furthermore, we assume that for each subgraph in
                                                                                                           𝑧
                                                                           the sequence (𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ))𝑘=0    the Wave and Edge Fragment
these subgraphs as Minimum Degree Waves.
   Graph Waves can be adapted to the particular properties of              decompositions from [3] has already been computed. This decom-
the boundary vertex sets and Edge Fragments being discovered               position is stored in three parts: triples of (source,target,unique
during the algorithmic exploration of a large unknown graph                fragment id), a mapping from unique fragment ids to wave num-
topology. Efficiently and automatically determining the most               bers, and a mapping from wave numbers to edge label𝑖 s. The
appropriate properties to generate the Waves of a particular               vertex set 𝑉𝑘 of each such subgraph 𝐺𝑘 can be partitioned into
graph is an interesting direction for future work. In this work,           ordered sets 𝑉𝑘,𝑗 that correspond precisely to Minimum Degree
we only use Waves generated based on vertex degree thresholds.             Waves(𝑆 0, 𝑃) (see Section 3.3), where 𝑆 0 consists of the vertices
   3.3.2 Meta-DAG Internal Structure of a Wave(𝑆 0 ,𝑃). Since              of minimum degree, and the property 𝑃 corresponds to boundary
the edges of a Wave(𝑆 0, 𝑃) are obtained by taking the union of            vertices of degree less than the degree of vertices in 𝑆 0 [3].
Edge Fragments in the sequences (𝑓 𝑟𝑎𝑔(𝑆 𝑗 ))𝑚                                A Graph City is a 3D representation of a given maximal edge
                                                     𝑗=0 , where 𝑆 𝑗+1 =
    Ð𝑗                                                                     graph partition, as shown in Figure 3. It consists of a floor plan
𝜕𝑃 ( 𝑖=0 𝑆𝑖 ), the Wave vertex set is an ordered partition of sub-         of buildings (one per equivalence class), bushes that represent
sets (𝑆 0, 𝑆 1, ...𝑆𝑚 ). We use the connected components of the sub-       clusters of small buildings, and a weighted street network.
graphs induced by each subset to define a Meta-DAG, where each
macro-vertex represents such connected components. Weighted
directed meta-edges ((𝐶 𝑗,𝑢 , 𝐶𝑙,𝑣 ),𝑊𝑢,𝑣 ) encode nonempty set of
                                                                           4.1    Graph City Buildings
edges {(𝑥, 𝑦) : 𝑥, 𝑦 ∈ 𝐶 𝑗,𝑢 ∪ 𝐶𝑙,𝑣 }, where 𝐶 𝑗,𝑢 and 𝐶𝑙,𝑣 are con-       For each edge-maximal subgraph 𝐺𝑘 , we use the ordered se-
nected components of 𝑆 𝑗 and 𝑆𝑙 . Finally, the weight 𝑊𝑢,𝑣 encodes         quence of vertex subsets (𝑉𝑘,𝑗 )ℎ𝑗=0 , their “internal” edges, and
the density of edges running between 𝐶 𝑗,𝑢 and 𝐶𝑙,𝑣 .                      those edges running between consecutive levels to create a vi-
   The spanning subgraph of this metagraph that consists of                sual representation for each such fixed point 𝐺𝑘 , that resembles
only those meta-edges that connect components present in con-              a building in a city with ℎ floors (see Figure 2).
secutive vertex sets, 𝑆 𝑗 , 𝑆 𝑗+1 , is called the Spanning Meta-DAG            This building representation provides an alternative view of
of the Wave (shown in Figure 5). It describes some of the most             a fixed point 𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ) that can be computed in time and
fundamental directed internal macro-connectivity of a Wave and             space linearly dependent on the size of 𝐺𝑘 , plus the complexity
is expected to be substantially smaller that the Wave itself. An           of finding and/or “describing” the initial subset 𝑉𝑘,0 of 𝑉𝑘 . That
extreme case in which this is not the case is when the Wave is a           is, the macro-structure of any fixed point can be described as a
tree with the source set being the tree leaves. However, in this           “building” whose internal structure is a Meta-DAG (see Section
                                                                          condition, i.e., the number of connections to non-Wave vertices
                                                                          is larger than the degree of vertices in the Wave source set.

                                                                          4.2    Summary Graph City Sculpture
                                                                          To provide an overview of the size distribution of fixed points
                                                                          in the set of buildings in a graph city we use a summary sculp-
                                                                          ture (see Figure 7). The aspect ratio of the sculpture encodes
                                                                          the average gap between consecutive fixed point values from
Figure 6: A flag on a building from the (left) Friendster
                                                                          the iterative edge core decomposition. The taller it is, the larger
City, and (right) from the patent citation network city.
                                                                          the average gap. This sculpture is obtained by considering each
                                                                          building as its set of connected components. All these edge maxi-
3.3.2), with macro-vertices representing connected components
                                                                          mal connected subgraphs with the same peel value and the same
of seed sets within each building floor (Figure 5).
                                                                          size are represented by cylinders encoding their frequency. Each
   A graph building with ℎ floors representing a fixed point
                                                                          cylinder of a particular peel value appears at a unique height
𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ) is completely determined by the “Disjoint Union
                                                                          in the sculpture. Larger cylinder radii correspond to higher fre-
of ordered Edge Fragments” specified by the partition of 𝑉𝑘 into
                                                                          quency of a particular size. All the disks representing connected
level sets (𝑉𝑘,𝑗 )ℎ𝑗=0 . Our representation requires only 5ℎ numbers.
                                                                          components with the same peel value (i.e., associated with the
For each wave we specify 2 disk radii, the starting height, the           same building) are stacked vertically on top of each other sorted
color of a frustum, and a light intensity for night view.                 by size. The fixed point value is encoded by rainbow coloring the
   The “𝑗-th floor” of a building representing 𝐺𝑘 corresponds to          corresponding cylinders (increasing from blue to red). Our inter-
the subgraph induced by a subset of vertices 𝑉𝑘,𝑗 . Each floor is         face provides access to the location of all buildings in the graph
represented by two concentric disks, one above the other. The             city that correspond to a fixed point selected in the sculpture.
radius of the bottom disk encodes the number of vertices in
the seed set of the starting fragment of the corresponding floor.
The radius of the top disk encodes the total number of vertices
besides the seed set vertices. The top disk is placed at the same
height as the bottom disk of the next floor. A frustum between the
bottom disks of adjacent floors is set to have volume encoding
the number of edges running from one floor to the next. Since
the radii of the two disks is already determined, the height of
the frustum is calculated from the desired volume. For the last
floor the height of the top disk is determined from the volume
corresponding to the total number of edges on the floor.
   A special fixed color map across the entire graph is used to
encode the density of a variety of induced subgraphs. For exam-
ple, the color of a frustum represents the density of the set of          Figure 7: Graph City Sculptures for (left) the Friendster
edges running between the corresponding two floors, and the               network, (middle) the movie co-occurence phrase net-
same color map is used to highlight the density of the connected          work, and (right) the patent citation network. Note that
components within a floor, as shown in Figure 2.                          the largest data set need not have the tallest city sculpture.
   A flag on top of a building displays summary information
that includes the distribution of fixed point values of that bucket.         4.2.1 Vertex Diversity and Light Intensities. Graph cities can
Recall that “fixed point” refers to the edge-maximal subgraph 𝐺𝑘 .        be seen as 3D representations of a coloring of the edges, where
For buildings that are alone in a bucket the flag shows the number        colors encode the edge peel values. This coloring partitions the
of floors, the peel value, and the number of vertices and edges of        edges adjacent to any particular vertex by their assigned color.
the corresponding fixed point. Figure 6 shows both these cases.           The frequencies of these local colors for a vertex defines a profile
The height of the flag cloth encodes the total number of edges            vector for that vertex. Following [4] we compute this profile vec-
in 𝐺𝑘 , and its width encodes the total number of vertices. The           tor’s Shannon Entropy and use it as a measure of the “diversity”
length of the flag pole is the overall edge density of 𝐺𝑘 . If the user   of the color pattern of the local edge coloring around each vertex.
toggles the night view mode, a checkerboard of lights is applied          Higher “diversity” of a vertex is an indicator of a higher weighted
to the frustum between all floors, as shown in Figure 9(right).           level of participation of that vertex in a given edge partition. It is
The light intensity at each floor is proportional to the number           worth noting that diversity is a more expressive measure than
of vertices shared between the set of edges represented by that           degree. Specifically, very high degree vertices can have very low
floor and its complement with respect to the whole graph.                 diversity. We add “diversity” light intensities to the disks in the
   Handling buildings with lots of fragments does not present             City Sculpture to encode the average diversity of the vertices in
an issue because an intermediate structural level of granularity          the corresponding connected fixed point.
between Fixed Point Buildings and Fragments is provided by
Waves (Section 3.3). The floors of a building represent contigu-          4.3    Graph City Interpretation
ous segments of Fragments that satisfy some initial condition.            “Graph Cities” provide visual representations of the overall macro-
They are characterized by their source layer of vertices and an           structure of graphs with few billion edges, i.e., GigaGraphs, which
ending layer of vertices whose unexplored neighbors violate a pre-        are derived by mapping each connected equivalence class, of a
specified expansion condition. The beginning and end fragments            special edge partition, into a “city building”. Below we address
of Minimum Degree Waves specifically satisfy a bounded-degree             some common questions regarding their interpretation:
 (1) What do “floors” tell us about a “building” in a Graph
     City? The number ℎ of floors in a building (i.e. the number
     of waves) indicates a fixed point whose full exploration
     requires the sequential activation of ℎ disjoint seed sets.
     In cases where a “building” is used to represent an edge
     equivalence class with several connected components then
     the number of floors in the “building” corresponds to the
     maximum number of waves in any of its components.
 (2) What does a “building” volume represent? It encodes
     the number of edges of the represented edge equivalence
     class, i.e., a fixed point of degree peeling. A building with
     no enclosure represents a more localized topology, i.e., is a
     “tree like” fixed point with only consecutive edges.               Figure 8: (Left) The bush skeleton for a fixed point of peel
 (3) How is the internal detailed structure of a “building”             value 1 from the Friendster network [21]. (Right) The com-
     made accessible for user exploration? It is represented            plete bush after applying 3 iterations of an L-system.
     by a Directed Acyclic Macro Graph obtained by contracting
     the connected components of each wave seed set. This DAG           one represented by the building) we sort the fixed points in a
     represents the connectivity between the connected compo-           bucket by size and uniformly select a few fixed points to draw as
     nents of all seed sets appearing in the waves. Our interface       bushes. These bushes visually show some of the same properties
     provides on-demand access to this DAG internal structure           as a building but are a much “rougher” view of a fixed point.
     for user navigation and exploration on a per building basis.          For generating a bush, a skeleton is first created based on the
A video illustrating our current interface can be found here1 .         parameters of a given fixed point (see Figure 8(left)). Then, a
                                                                        few iterations of an L-System are executed with axiom points
5 GRAPH CITIES STREET NETWORK                                           distributed along the skeleton drawing (see Figure 8(right)), to
                                                                        create natural-looking bushes. The skeleton consists of a central
5.1 Graph City Layout and Street Network                                stem with branch segments coming out at points in between.
The size distribution of the equivalence classes is used to generate    These junction points are spaced the same as the disks in the
a 2D position for each building. This is done by bucketing the          building representation of the fixed point; thus, the length of
fixed points of a graph by size and then mapping each such              stem segments correspond to the height of the respective frus-
bucket to a 2D location by following an Archimedean spiral. We          tums. The inclination of the stem segments ranges from 0 to 45
create buckets containing connected fixed points the same way           degrees off the vertical axis based on the density of edges in the
as [3]. These connected fixed points are grouped together into          frustum (0 degrees corresponding to 0 density and 45 degrees to
buckets according to the number of edges. Bucket 𝑖 has fixed            a density of 1). Two sets of branches are produced at each stem
points of size 𝑠, such that 𝑙𝑜𝑔𝑖−1 (𝑚) < 𝑠 ≤ 𝑙𝑜𝑔𝑖 (𝑚). We create        junction corresponding to the inner and outer disks in the build-
a building for the largest fixed point in each bucket 𝐵, and if         ing. The number and length of branches encodes the number of
|𝐵| > 1, we create “bushes” (see Section 5.1.1) for a representative    vertices represented at that level. The inclination of the branches
selection of 𝑙𝑜𝑔(8 ∗ (|𝐵| − 1)) fixed points from 𝐵. Additionally,      is proportional to the density of internal edges for that level. All
for such buckets we also draw a grass patch as a green polygon          branches are equally spaced at the junction and phase angles at
with 𝑙𝑜𝑔(8 ∗ (|𝐵| − 1)) + 2 sides (see Figure 2(right)). From each      successive junctions are randomized.
bucket, we display the tallest building and a flag with a histogram        We extended an implementation of a turtle graphics based
with peel values on the 𝑋 -axis and the number of buildings with        L-system interpreter2 to draw the underlying skeleton structure
that peel value and the maximum size of all these buildings. For        and then applied 2-3 iterations of a natural looking L-system,
buckets with one fixed point the flag just shows the peel value.        starting at evenly spaced lengths along the stem and branches.
   The Graph City street network is determined by the Delau-            The bushes and the green polygon together give the appearance
nay triangulation of the building locations in the spiral layout        of a “garden” for some of the buildings in the Graph City, that
(see Fig. 3). The weight of a connection between two buildings          are primarily centered around the periphery of the spiral layout
is proportional to the intersection of the subgraph vertex sets         on the ground, as shown in Figures 3 and 2(right). Although
represented by the two buildings. Graph-theoretically, the street       these bushes don’t reflect as much detail of the underlying fixed
network is determined by the intersection graph of the collection       point they represent as opposed to the building metaphor, they
                                                  𝑧 , which in turn
of vertex sets of the subgraphs (𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ))𝑘=0                    are useful for quickly getting a sense of scale for the size and
are determined by the given edge partition (𝐸 0, 𝐸 1, ..., 𝐸𝑧 ). When   distribution of other fixed points in a bucket while being more
a particular building is selected by the user, a Euclidean spanning     efficient to render than a whole building.
tree rooted at that building displays the corresponding street net-     6    DESCRIPTION OF OUR DATA SETS
work, which is obtained by a Breadth First Search. If the building
street network is disconnected, then we show a spanning forest          We show examples of our system applied to 3 datasets. Our largest
instead. Note that the connectivity of the street network only          data set is the Friendster social network consisting of 65,608,366
depends on the connectivity of the subgraphs represented by the         nodes each representing a user and 1,806,067,135 edges represent-
buildings and not the graphs represented by the entire bucket.          ing “friendships” between them. This dataset was retrieved from
                                                                        the Stanford Large Dataset Collection (SNAP) [21]. Our next data
   5.1.1 Bushes and L-Systems. To provide a visual indication           set is a graph of phrases used in movie reviews. There are 218,052
for the properties of fixed points in a bucket (besides the largest     nodes each representing a phrase and 115,050,370 edges, where
1 https://rutgers.box.com/s/qeygblwr5udeti9vxcmr0nj6swuf179n            2 https://github.com/andonutts/donatello
Figure 9: Different views provided by our interactive control menu when rendering Graph Cities in day and night modes.
each edge connects two phrases both used to describe the same           8    NAVIGATING A GRAPH CITY
movie in a review. This data set was derived from the Internet          The interactivity provided by Three.js [6] allows the user to ex-
Movie Database 3 . Our last data set is a patents citation network,     plore different parts of a Graph City conveniently from a browser.
also from [21]. There are 3,774,768 nodes each representing a           An interactive control menu is provided on the top right (see
patent and 16,518,947 edges each linked to a cited patent.              Figure 9(left)) for toggling between various options, such as the
   Table 1 shows the number of connected components (CC),               current data set being viewed, displaying building information,
connected fixed points (FP), peel value of the core (CV), maximum       changing camera/environment controls and day/night modes, or
number of waves among its fixed points (MW), and maximum                selecting different street networks. The user can use the mouse
number of fragments among its fixed points (MF) for each data           to zoom in and out, as well as pan around the Graph City. Upon
set. The spiral length is related to the total number of graph edges.   entering a particular building, the view changes to highlight the
The number of fragments in a building, i.e., the building’s height,     finer scale internal connectivity for each edge-maximal subgraph
encodes the longest path length in the building’s Meta-DAG.             (or fixed point) that constitutes a building, as shown in Figure 10.
                                                                           Our interactive visualization ultimately owes its speed to the
      Dataset    CC       FP      CV MW MF            RT                hierarchical Graph Wave decomposition [3]. At the macroscopic
   Friendster     1     29,692 304       212 3,279 11.82                level, our abstraction has a much smaller spatial complexity than
      Movies      38     2,044 3114      37      282    3               the actual data set, which makes rendering cheap. The only com-
      Patents 3,627 6,469         64     47      996   3.3              putation that happens is whenever the root for the street network
Table 1: Statistics for all data sets. The last column shows            changes, but this is only linear in the number of buildings. When
the rendering times (RT) for the graph city in seconds.                 the user views inside a particular building, although we show
                                                                        finer scale connectivity, it is still conveniently partitioned into
                                                                        Edge Fragments. Thus, rendering is never the bottleneck, as the
7    RENDERING GRAPH CITIES
                                                                        user is only visualizing a subset of the data.
We use Three.js [6], a 3D Javascript library to create and display
Graph Cities interactively in the web browser using WebGL.
Each building in the Graph City consists of several floors (i.e.,       9    CATALOGUE OF SUBGRAPH PATTERNS
Edge Fragments). For each floor, we instantiate a cylinder shape        A useful outcome of user or computer explorations of any graph
geometry, where the top and bottom face radii, height, and color        city will be a summary and an extensive catalogue of the subgraph
are chosen appropriately from the data (see Section 4.1), the           patterns found. For this to be feasible, users must be provided
number of balcony segments defaults to 6, with 3 windows per            with annotation and summarization tools that can keep track
floor in the night view. All floors are generated in the material       of their exploration trails. These patterns should be classified at
space, centered at the origin, and subsequently translated in the Y     least by size, density, frequency of occurrence, rarity, interest
(up) direction to form a building in the world space. Each building     and usefulness. To assess the efficacy of these tools we are cur-
is also translated in the X and Z directions to form the spiral         rently identifying a list of basic tasks that a user can perform in
layout (see Section 5.1). Bushes are generated with an L-system.        order to conduct a substantial number of user experiments. For
   On top of each building, a flag displaying summary informa-          example: can graph cities be used effectively on shortest path
tion for that building (i.e., edge-maximal subgraph) is added. A        approximations queries?
box shape geometry is used for the flag, and a cylinder shape              One of the ultimate goals of this work will be to have a descrip-
geometry is used for the mast. The length of the mast, size of the      tive semantic summary of the expected patterns that can be fed
flag, and color of the flag are all determined from the data set.       into a deep learning engine to learn to discriminate and find new
   To highlight the street network representing data flow from          patterns according to certain specified criteria. In our current
one building to the remaining Graph City, we precompute the             experimentation with a variety of data sets, the most naturally
Delaunay triangulation for all the buildings in the Graph City          detected patterns include: tree forests, cliques, bi-cliques, and
at the beginning. Subsequently, when a user selects a particular        hierarchical compositions of these basic patterns.
building from the interactive control menu (see Figure 9(left)),
we perform a Breadth First Search (BFS) from the root building          10    FUTURE WORK
to the rest, and display only those edges that are encountered
                                                                        Streaming Graph Cities: An interesting question is the feasibil-
in the search. The width of the edges are determined by the
                                                                        ity of efficiently updating the Wave and Fragment decomposition
data. Rendering times for the Graph City for each data set are
                                                                        when the streaming input graph is known a priori to be a fixed
summarized in the last column in Table 1.
                                                                        point or a Wave of fixed degree peeling. This is the main bot-
3 https://www.imdb.com/interfaces/                                      tleneck for streaming a Graph City. If the wave decomposition
Figure 10: Snapshots of a user navigating the Graph City for the patents citation network with interactive camera control.
is provided then we can easily obtain web renderings of the                          [4] J. Abello and F. Queyroi. 2013. Fixed points of graph peeling. In Proc. of the
corresponding individual building(s) at interactive frame rates.                         2013 IEEE/ACM Int. Conf. on Adv. in Soc. Net. Anal. and Mining. 256–263.
                                                                                     [5] James Abello, Frank Van Ham, and Neeraj Krishnan. 2006. Ask-graphview: a
   Composing Global Solutions from Local Ones: The Wave                                  large scale graph visualization system. IEEE TVCG 12, 5 (2006), 669–676.
and Edge Fragments decomposition will be more advantageous                           [6] Ed Angel and Eric Haines. 2017. An Interactive Introduction to WEBGL and
                                                                                         Three.JS. In ACM SIGGRAPH 2017 Courses. Article 17, 95 pages.
in those cases where a given question or structure of interest on                    [7] A. Arleo, O.-H. Kwon, and K.-L. Ma. 2017. GraphRay: Distributed pathfinder
𝐺 = (𝑉 , 𝐸) can be obtained as a “composition” of the local Wave                         network scaling. In 2017 IEEE 7th Symp. on Large Data Anal. and Vis. 74–83.
and/or Fragment solutions. For example, is there any algebraic                       [8] B. Bach, N. H. Riche, C. Hurter, K. Marriott, and T. Dwyer. 2017. Towards
                                                                                         unambiguous edge bundling: Investigating confluent drawings for network
relation between the all-pairs shortest path distances in 𝐺 versus                       visualization. IEEE Trans. on Visualization & Computer Graphics (2017), 1–1.
their restrictions to the Waves and/or Fragments of 𝐺?                               [9] V. Batagelj and M. Zaveršnik. 2011. Fast algorithms for determining core
   How to pick interesting Waves? Automating the process                                 groups in social networks. Adv. in Data Anal. and Class. 5, 2 (2011), 129–145.
                                                                                    [10] M. Baur, M. Gaertler, R. Görke, M. Krug, and D. Wagner. 2007. Generating
of choosing the property of interest for the Wave decomposition                          graphs with predefined k-core structure. In Eur. Conf. of Compl. Sys. Citeseer.
would be a powerful addition to the pipeline. Can the type of wave                  [11] O. Ben-Eliezer, L. Gishboliner, D. Hefetz, and M. Krivelevich. 2020. Very fast
                                                                                         construction of bounded-degree spanning graphs via the semi-random graph
be chosen efficiently based on local or global graph structure?                          process. In Proc. of the Symposium on Discrete Algorithms. SIAM, 718–737.
   Semi-random Graph Processes: It will be interesting to un-                       [12] Daniel Berend, Shlomi Dolev, and Ariel Hanemann. 2014. Graph Degree
derstand the type of Graph Waves that can be built by these                              Sequence Solely Determines the Expected Hopfield Network Pattern Stability.
                                                                                         Neural computation 27 (11 2014), 1–9.
semi-random processes with high probability in 𝑂 (𝑛) rounds                         [13] Tim Dwyer, Nathalie Henry Riche, Kim Marriott, and Christopher Mears. 2013.
where 𝑛 is the number of vertices [11].                                                  Edge compression techniques for visualization of dense directed graphs. IEEE
   Hypergraph Cities: A central problem in simplicial finite                             transactions on visualization and computer graphics 19, 12 (2013), 2596–2605.
                                                                                    [14] Peter Frankl and Andrey Kupavskii. 2018. The Erdös Matching Conjecture
set systems is to identify a global structure whose intersection                         and concentration inequalities. arXiv preprint arXiv:1806.08855 (2018).
with a given finite set family is highly concentrated around its                    [15] P. Govindan, C. Wang, C. Xu, H. Duan, and S. Soundarajan. 2017. The k-peak
                                                                                         decomposition: Mapping the global structure of graphs. In Proceedings of the
expectation. Random matchings are an example [14].                                       26th International Conference on World Wide Web. 1441–1450.
11     CONCLUSIONS                                                                  [16] W. L. Hamilton, R. Ying, and J. Leskovec. 2017. Representation learning on
                                                                                         graphs: Methods and applications. IEEE Data Engineering Bulletin (2017).
The iterative edge decomposition partitions the edges of a graph                    [17] R. Jin, N. Ruan, S. Dey, and J. Y. Xu. 2012. SCARAB: scaling reachability
                                                                                         computation on large graphs. In Proc. of Int. Conf. on Man. of Data. 169–180.
into Fixed Points of degree peeling; they are in turn decomposed                    [18] H. Kabir and K. Madduri. 2017. Shared-memory graph truss decomposition.
into Graph Waves and Edge Fragments. They provide mecha-                                 In 2017 IEEE 24th Int. Conf. on High Perf. Comput. (HiPC). IEEE, 13–22.
nisms that may help assess the topological and statistical reasons                  [19] M. Krommyda, V. Kantere, and Y. Vassiliou. 2019. IVLG: Interactive Visualiza-
                                                                                         tion of Large Graphs. Int. Conf. on Data Engineering (2019), 1984–1987.
that explain the emergence of a large class of bipartite graph-like                 [20] R. Laishram, A. Erdem Sar, T. Eliassi-Rad, A. Pinar, and S. Soundarajan. 2020.
patterns in very large graph data sets.                                                  Residual Core Maximization: An Efficient Algorithm for Maximizing the Size
                                                                                         of the k-Core. In Proc. of Int. Conf. on Data Mining. SIAM, 325–333.
   We introduced 3D representations of Fixed Points based on                        [21] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network
their wave decomposition that resemble buildings in a city, hence                        Dataset Collection. http://snap.stanford.edu/data.
Graph Cities. A spiral arrangement of the buildings is obtained                     [22] Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, and U Kang. 2013. Interac-
                                                                                         tive multi-resolution exploration of million node graphs. In IEEE Conference
from the size distribution of the Fixed Points. The Delaunay                             on Visual Analytics Science and Technology, Poster.
triangulation of the building locations determines the graph city                   [23] Peng Mi, Maoyuan Sun, Moeti Masiane, Yong Cao, and Chris North. 2016.
street network. The size distribution of all the Fixed Points is                         Interactive graph layout of a million nodes. In Informatics, Vol. 3. Multidisci-
                                                                                         plinary Digital Publishing Institute, 23.
summarized by a graph city sculpture (Sec 4.2).                                     [24] Q. H. Nguyen, S.-H. Hong, P. Eades, and A. Meidiana. 2017. Proxy graph: visual
   Dense bipartite graph-like patterns have been proposed as an                          quality metrics of big graph sampling. IEEE TVCG 23, 6 (2017), 1600–1611.
                                                                                    [25] R. Pienta, F. Hohman, A. Endert, A. Tamersoy, K. Roundy, C. Gates, S. Navathe,
abstract formalization of “concepts” in [29]. Their identification                       and D. H. Chau. 2018. VIGOR: interactive visual exploration of graph query
in very large data sets has defied computation. Nevertheless,                            results. IEEE Trans. on Vis. and Comp. Graph. 24, 1 (2018), 215–225.
Graph Cities offer a promising approach to the efficient detection                  [26] L. Royer, M. Reimann, B. Andreopoulos, and M. Schroeder. 2008. Unraveling
                                                                                         protein networks with power graph analysis. PLoS comp. bio. 4, 7 (2008).
of a large sub-class of these patterns in attributed graphs.                        [27] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M Tamer
   Due to lack of space, some technical results were put in an                           Özsu. 2017. The ubiquity of large graphs and surprising challenges of graph
appendix here4 .                                                                         processing. Proceedings of the VLDB Endowment 11, 4 (2017).
                                                                                    [28] N. Wang, D. Yu, H. Jin, C. Qian, X. Xie, and Q.-S. Hua. 2017. Parallel algorithm
                                                                                         for core maintenance in dynamic graphs. In Prof. of ICDCS. 2366–2371.
REFERENCES                                                                          [29] Rudolf Wille. 1992. Concept lattices and conceptual knowledge systems.
 [1] J. Abello, A. L. Buchsbaum, and J. R. Westbrook. 1998. A functional approach        Computers & mathematics with applications 23, 6-9 (1992), 493–515.
     to external graph algorithms. In European Symp. on Alg. Springer, 332–343.     [30] H. Wu, J. Cheng, Y. Lu, Y. Ke, Y. Huang, D. Yan, and H. Wu. 2015. Core
 [2] James Abello, Daniel Mawhirter, and Kevin Sun. 2019. Taming a Graph                 decomposition in large temporal graphs. In Int. Conf. on Big Data. 649–658.
     Hairball: Local Exploration in a Global Context. In Business and Consumer      [31] V. Yoghourdjian, T. Dwyer, K. Klein, K. Marriott, and M. Wybrow. 2018. Graph
     Analytics: New Ideas. Springer, 467–490.                                            thumbnails: Identifying and comparing multiple graphs at a glance. IEEE
 [3] James Abello and Daniel Nakhimovich. 2020. Graph Waves. In The 3rd Interna-         Transactions on Visualization and Computer Graphics 24, 12 (2018), 3081–3095.
     tional Workshop on Big Data Visual Exploration and Analytics with EDBT/ICDT.   [32] H. Zhou, P. Xu, X. Yuan, and H. Qu. 2013. Edge bundling in information
                                                                                         visualization. Tsinghua Science and Technology 18, 2 (2013), 145–156.
4 https://rutgers.box.com/s/qeygblwr5udeti9vxcmr0nj6swuf179n