=Paper=
{{Paper
|id=Vol-2841/BigVis_5
|storemode=property
|title= Graph Cities: Their Buildings, Waves, and Fragments
|pdfUrl=https://ceur-ws.org/Vol-2841/BigVis_5.pdf
|volume=Vol-2841
|authors=James Abello,Daniel Nakhimovich,Chengguizi Han,Mridul Aanjaneya
|dblpUrl=https://dblp.org/rec/conf/edbt/AbelloNHA21
}}
== Graph Cities: Their Buildings, Waves, and Fragments==
Graph Cities: Their Buildings, Waves, and Fragments James Abello1 Daniel Nakhimovich Chengguizi Han Mridul Aanjaneya Department of Computer Science, Rutgers University, USA 1 DIMACS, Rutgers University, USA Figure 1: When a direct graph layout is unavailable or impractical due to memory or screen size constraints, our method can render humanly interpreted visualizations of massive graphs on the order of 10 seconds (after a few minutes of pre- processing) by leveraging the Graph Waves [3] decomposition. (Left) A Graph City for the Friendster social network with 1.8 billion edges. (Right) The internal structure of a building from the city with peel value 13 showing finer scale connec- tivity. By leveraging hierarchical decompositions, our framework allows for interactive visualization of massive graphs. ABSTRACT all the elements of their corresponding Graph Cities are built in “Graph Cities” are 3D representations of maximal edge graph a few minutes (excluding I/O time) and storage proportional to partitions. Each connected equivalence class corresponds to a the number of edges and vertices of a graph. Our ultimate goal is “Building” that is formed by stacking graph “Edge Fragments”. to obtain humanly-interpretable hierarchical descriptions of any The number of such graph edge fragments determines the height graph that are accessible via a Unified Web Interface for Graph of the building. The overall number of buildings is the number Analytics, without being constrained by the graph size. of equivalence classes in the edge partition. A poly-log buck- etization of the size distribution of the equivalence classes is used to generate a 2D position for each bucket. For the buckets 1 INTRODUCTION containing more than one equivalence class, we also generate Techniques for analyzing massive data sets are becoming central a visual “Bush” representation. The Delaunay triangulation of to several communities, with the need for interactive and scalable these building locations determines the “street network” of the visualizations being one of the most pressing issues [27]. Since Graph City. The weight of a connection between two buildings a large variety of massive data sets can be abstracted as hav- on this street network is proportional to the intersection of the ing an underlying graph topology, our interest is in computing subgraph vertex sets represented by the two buildings. To handle graph decompositions that are useful for making sense of massive equivalence classes (i.e., buildings) consisting of a large num- graphs, for which a direct layout is unavailable due to memory ber of fragments, we use the notion of “Graph Waves” from constraints or impractical due to screen size constraints. We fo- [3]. Graph Waves are intervals of graph edge fragments with a cus on the identification of “global data patterns” that emerge “well-defined” beginning and end fragment. For computational due to the co-occurrence of pairs of well-defined data entities. purposes, the beginning and end fragments should satisfy a com- Motivated by [12] who proved that the graph degree sequence putationally “easy to verify” property. We illustrate Graph Cities solely determines the expected Hopfield network pattern stabil- obtained with the maximal edge partitions defined by the iter- ity, we wanted to find an “efficient” visual representation of any ative edge core decomposition introduced in [4]. The graphs graph that is driven by the dismantling of its degree distribution. used include the Friendster social network (1.8 billion edges), We introduce Graph Cities as a visual representation of such a co-occurrence keywords network derived from the internet dismantling. It has the potential of bringing together streaming movie database (115 million edges), and a patents citation net- computations and visualization to offer “large scale” structural work (16.5 million edges). For graphs with up to 2 billion edges, graph information without losing the ability to interactively ex- tract finer scale connectivity. All this is possible by streaming © 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed- Graph Waves [3], which are in turn streams of graph fragments. ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) The obtained visual representations can be rendered in a few on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0) minutes and offer humanly interpretable large scale features of graphs with billions of edges at different levels of granularity. Figure 2: Overview of our proposed framework. (Left) The individual buildings in the graph city, which comprise of cylin- drical edge fragments. (Middle) The entire graph city. (Right) Bushes representing buckets with more than one building. The coarsest views are offered by Graph City buildings with • A 2D spiral layout is used to fix building locations, and also floors and frustums representing Waves and their interconnecting for visualizing the street network, whose edge widths are de- Edge Fragments. Each building internal structure is represented termined by the intersection graph of the building vertex sets. by a Directed Acyclic Graph (DAG) whose macro-vertices can be • Smaller buildings are bucketed into “green” areas of bushes expanded into their internal fragments, that can be navigated by that are generated by special natural-looking L-systems. usual node link diagrams at different levels of detail depending • Graph Cities can be navigated at different levels of granularity. on their size. In summary, our approach follows a hierarchical • The largest buildings within graphs with over a billion edges edge decomposition with five levels: Edge Graph Decomposition, are rendered in a few seconds. The algorithms computing the Buildings, Waves, Edge Fragments, with the bottom level consist- data required to specify the rendering of a graph city from a ing of “structured” node-link subgraphs of reasonable size that given wave decomposition are linear both in time and storage. make them amenable to human descriptions. To our knowledge, As far as we know, Graph Cities constitute the first abstract vi- this is the first time that such a large scale representation has sual representation of node link representations of graphs that are been introduced for visualizing graph data sets, that is humanly linearly computable, and that are amenable to interactive topo- interpretable at “natural” topological levels of granularity. logical exploration at different levels of granularity for massive graph data sets. We plan to provide browser access to our tools 1.1 Summary of Our Overall Approach for the exploration of graphs with up to a few billion edges. In An overview of our proposed framework is illustrated in Figure 2. this paper, we focus on the computational techniques to provide Our work builds upon the Graph Wave decomposition recently scalable visible abstractions for large graphs. In the future, we introduced in [3], which is a refinement of the iterative degree will couple the graph city abstraction with semantic information. edge partition [4]. Each connected equivalence class is visually The paper layout is as follows: Section 2 summarizes relevant represented as a “building”. The size distribution of the equiva- work. In Section 3, we introduce the main definitions and illus- lence classes in the given edge partition is bucketed via a poly-log trate the main conceptual tools borrowed from a previous work: function of the total number of edges in the entire data set. Buck- graph Edge Fragments and Graph Waves [3]. Section 4 introduces ets containing more than one building are visually represented by Graph Cities, their buildings, bushes, waves and fragments, and “bushes” that are generated via special L-systems (see green areas Section 5 describes what the drawn street network represents. in Figure 2(right)). A spiral embedding of this poly-log expression Section 6 discusses 3 data sets used, and statistics of their size and provides a layout for all the equivalence classes in a graph city. various decompositions. Sections 7 and 8 discuss the rendering Each graph city has a “street network” that is obtained by com- of graph cities and their interactive navigation. Section 9 briefly puting the intersection graph of the collection of vertex sets of mentions end-user goals. Section 10 discusses avenues for future the buildings. The 2D positions of the centers of the bottom floors research and Section 11 closes with concluding remarks. of the buildings are used to indicate the location of the building vertex sets. The weight of the geometric edge representing the 2 RELATED WORK distance between two buildings is obtained as a function of the Efforts to deal with large graphs have mainly employed compu- size of the intersection of the corresponding buildings vertex sets tational approaches to handle scale. Generally, computation and (i.e., a Jaccard-type coefficient). visualization appear to be treated as independent tasks. One ap- All the elements of a graph city including its buildings, bushes, proach is to develop scalable algorithmic tools that amplify users’ and street network are built in a few minutes (excluding I/O time) understanding of the underlying data topologies at different lev- and storage proportional to the number of edges and vertices of els of granularity. In general, macro graph views can in principle the graph. We exemplify our results on a variety of large graph be obtained by some form of vertex or edge aggregation that is data sets ranging in size from 1 million to over 2 billion edges. conceptually represented as hierarchy trees [5, 8, 13, 26, 32]. The They include the Friendster social network (1.8 billion edges), a co- choice of representations that facilitate smooth visual interaction occurrence keywords network derived from the internet movie is a subject of active research [19, 22, 23, 25].All these previ- database (115 million edges), and a patents citation network (16.5 ous techniques have algorithms with running time substantially million edges). To summarize, our main contributions are: greater than linear on the number of graph elements, making • Graphs become represented as a collection of buildings with them not suitable for massive graph visualization. Sampling, as a floors, with inter-floor volumes encoding subgraph sizes. mechanism to shed light on graph structure, has been explored in [24]. Graph Thumbnails, as a mechanism to identify and compare multiple graphs, are alone the subject of [31]. Uses of the Core decomposition to grasp some coarse graph topological views are discussed in [15]. Generation of graphs with a predefined core structure is the focus of [10]. Computational aspects of core related graph decompositions and graph sparsification are studied in [7, 9, 18, 20, 28, 30].Some algorithmic principles for graph reachability in large graphs are proposed in [17]. Machine learning approaches, such as those described in [16], have been proposed to learn low-level embeddings of graphs, however, such approaches are not yet scalable to billion edge graphs. Our approach differs from prior work in the sense that we look for efficient data traversal algorithms via exploration primitives that lend themselves to visual representations that amplify users’ understanding of the internal graph structure for graphs that are too large to be visualized directly. Buildings, Waves, and Fragments are examples of such primitives. Figure 3: Graph cities for (top left) the Friendster network (1.8 billion edges), (top right) movie phrase co-occurrence 3 PROBLEM FORMULATION network (115 million edges), and (bottom left) the patent We aim to provide visual abstractions for representing edge par- citation network (16.5 million edges). The Delaunay tri- titions of graphs to highlight connectivity and size distribution angulation (bottom right) of the spiral building layout of subgraphs with special degree distributions - fixed points in (green). The red street network corresponds to the white our case (see Def. 3). We achieve this by first computing a maxi- street network in the city on the bottom left. mal edge partition of the graph as in [4]. We then represent the Definition 1. (Peel Value) The peel value of a vertex 𝑢 ∈ equivalence classes of the given edge partition as “buildings”. 𝑉 (𝐺), denoted 𝑝𝑒𝑒𝑙𝐺 (𝑢), is the largest 𝑖 ∈ [1, 𝑑𝑒𝑔(𝑢)] such that 𝑢 Each building consists of a collection of “Waves”, each of which belongs to a subgraph of 𝐺 of minimum degree 𝑖. is formed by a stack of “Edge Fragments” [3], i.e., Waves are ordered maximal sequences of graph Edge Fragments (see Def. 5 Definition 2. (Graph Core) The core of 𝐺, denoted 𝑐𝑜𝑟𝑒 (𝐺), and 6). This section will first describe what the graph structures sometimes also called the k-core of 𝐺, is the subgraph induced by called fixed points, waves, and fragments are and then the visual the maximal subset of vertices of 𝐺 whose peel value is maximum. representations we use to display them will be described in Sec. 4. Definition 3. A graph 𝐹𝑘 is a fixed point of degree peeling 𝑘, if 𝑐𝑜𝑟𝑒 (𝐹𝑘 ) = 𝑉 (𝐹𝑘 ) and the peel value of each vertex in 𝐹𝑘 is 𝑘. 3.1 Graph Edge Fragments and Waves Figure 4 shows vertices in a small graph colored by their peel One approach to getting a sense of the topology of a graph be- value. It also shows edges colored according to the iterative edge gins with a choice of a starting set of vertices 𝑆 0 satisfying a decomposition from [4] which obtains an edge partition by re- property of interest 𝑃 and marking them as “explored”. All the moving the highest peel value edges (i.e., initially the red edges), edges with at least one endpoint in 𝑆 0 are then traversed, deleting updating the vertex peeling values, recoloring the affected edges, them from the graph and adding them to the “Edge Fragment” and identifying the next highest peel value edges and repeating associated with 𝑆 0 . These edges then become the beginning part until the whole graph is processed. A graph city (Sec. 4) can be of the “Graph Wave” generated by 𝑆 0 . If there are any edges with derived from this edge partition by mapping each connected edge one endpoint not in 𝑆 0 , we check whether those vertices not in 𝑆 0 color class into a building. still satisfy a property of interest 𝑄 (usually, a relaxation of 𝑃) in the remaining graph. If there are any such vertices we continue exploring in parallel only from such vertices, adding incremen- tally the new found edges into the current Wave started by 𝑆 0 , and deleting them from the existing graph. This process ends when all edges with exactly one endpoint in the current Wave lead to vertices that do not satisfy 𝑃. This means that if there still remains “unexplored” edges, a new Wave can be initiated by selecting another starting set of vertices satisfying a property of interest 𝑅 (usually, stricter than 𝑃). We formalize the process Figure 4: Left: The vertices are colored according to their above with the following definitions. peel value: 1-grey, 2-blue, 3-orange, 4-red. Right: The edges are colored according to the iterative edge decom- 3.2 Definitions position using the same color scale. We consider undirected graphs 𝐺 = (𝑉 , 𝐸) with vertex set 𝑉 (𝐺) and edge set 𝐸 (𝐺). We denote by 𝑛 the number of vertices 3.3 Waves and their Edge Fragments in 𝑉 , 𝑚 the number of edges in 𝐸, and the degree of a vertex Next, we present some new definitions that generalize the notion 𝑢 ∈ 𝑉 by deg(𝑢). For any 𝑈 ⊂ 𝑉 , let the notation 𝐺 (𝑈 ) denote of graph Edge Fragments and Waves introduced in [3]. the subgraph of 𝐺 induced by 𝑈 . For the sake of completeness, we restate some definitions from [4]. Definition 4. The boundary of a vertex set 𝑆 ⊂ 𝑉 is defined as 𝜕𝑆 = {𝑣 ∈ 𝑉 : ∃ (𝑢, 𝑣) ∈ 𝐸, 𝑢 ∈ 𝑆, 𝑣 ∉ 𝑆 }. The proper boundary of S, denoted 𝜕𝑃 𝑆, is the set of vertices in 𝜕𝑆 which satisfy a desired property P restricted to the graph induced by 𝑉 \ 𝑆. These definitions are extended, in a straightforward fashion, to a collection of disjoint sets by taking their union. Definition 5. Given a vertex subset 𝑆 ⊂ 𝑉 , the edge fragment frag(S) generated by S is the set of edges (𝑢, 𝑣), such that 𝑢 ∈ 𝑆. Definition 6. A graph wave 𝑊 (𝑆 0, 𝑃) is the union of frag- ments in the sequence (frag(𝑆 𝑗 ))𝑚𝑗=0 , with 𝑆 0 being the source set Ð𝑗 of vertices satisfying 𝑃 and each subsequent 𝑆 𝑗+1 = 𝜕𝑃 ( 𝑖=0 𝑆𝑖 ). 3.3.1 Why are Graph Waves useful abstractions? Waves were inspired by previous graph exploration approaches based on Sparse Nets [2] and Boruvka MST type contraction algorithms [1]. Graph Waves generated by minimum degree source sets were introduced in [3]. In this work, we generalize this notion and use Edge Fragments to describe and visually represent the Waves’ Figure 5: An “interesting” portion of the spanning Meta- internal structure (see subsection 3.3.2 below). DAG of a fixed point of peel value 101 from the movie Graph Waves provide different “lenses” into the structure of phrase co-occurrence network. The whole fixed point con- a graph according to a particular property or substructure. For tains 17,311 vertices and 182,085 edges while the spanning example, if 𝑆 0 is non-empty and consists of the set of vertices not Meta-DAG only has 4181 vertices and 10,016 edges. in a given maximal edge-matching 𝑀, then the Wave generated by such an initial set 𝑆 0 provides a layered view of G as a sequence case we already have a simple description of the Wave: it is a tree of independent sets, and the last Wave Edge Fragment is a perfect with the number of fragments equal to the radius of the tree. matching. This is because, for this example, 𝑆 0 is the complement of a maximally matched set of vertices. We refer to this example 4 WHAT IS A GRAPH CITY? as the Maximal Matching Wave. Graph Waves derived from a graph’s degree distribution are essential for discovering the non- We assume that the input is some ordered partition (𝐸 0, 𝐸 1, ..., 𝐸𝑧 ) regular macro structure of very large graphs, and at the same time of the edges of a graph 𝐺 = (𝑉 , 𝐸), where each 𝐸𝑖 is edge-maximal help isolate Edge Fragments with peculiar levels of regularity. with respect to a predefined property. An efficient representation For example, if 𝑘 is the minimum degree of a graph 𝐺 and if 𝑆 0 for such partitions is stored as a set of triples (source, target, consists of all the vertices of degree 𝑘, and subsequent boundary label𝑖 ). This assumption is justified since the edges of any graph sets are restricted to have degree strictly less than this minimum 𝐺 can be efficiently partitioned into edge-maximal subgraphs 𝐺𝑘 , 𝑘, the corresponding Wave is called a Fixed Point of Degree 𝑘 each of minimum degree at least 𝑘 and average degree not more in [4] with a vertex source set of minimum degree. We refer to than 2𝑘 [4]. Furthermore, we assume that for each subgraph in 𝑧 the sequence (𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ))𝑘=0 the Wave and Edge Fragment these subgraphs as Minimum Degree Waves. Graph Waves can be adapted to the particular properties of decompositions from [3] has already been computed. This decom- the boundary vertex sets and Edge Fragments being discovered position is stored in three parts: triples of (source,target,unique during the algorithmic exploration of a large unknown graph fragment id), a mapping from unique fragment ids to wave num- topology. Efficiently and automatically determining the most bers, and a mapping from wave numbers to edge label𝑖 s. The appropriate properties to generate the Waves of a particular vertex set 𝑉𝑘 of each such subgraph 𝐺𝑘 can be partitioned into graph is an interesting direction for future work. In this work, ordered sets 𝑉𝑘,𝑗 that correspond precisely to Minimum Degree we only use Waves generated based on vertex degree thresholds. Waves(𝑆 0, 𝑃) (see Section 3.3), where 𝑆 0 consists of the vertices 3.3.2 Meta-DAG Internal Structure of a Wave(𝑆 0 ,𝑃). Since of minimum degree, and the property 𝑃 corresponds to boundary the edges of a Wave(𝑆 0, 𝑃) are obtained by taking the union of vertices of degree less than the degree of vertices in 𝑆 0 [3]. Edge Fragments in the sequences (𝑓 𝑟𝑎𝑔(𝑆 𝑗 ))𝑚 A Graph City is a 3D representation of a given maximal edge 𝑗=0 , where 𝑆 𝑗+1 = Ð𝑗 graph partition, as shown in Figure 3. It consists of a floor plan 𝜕𝑃 ( 𝑖=0 𝑆𝑖 ), the Wave vertex set is an ordered partition of sub- of buildings (one per equivalence class), bushes that represent sets (𝑆 0, 𝑆 1, ...𝑆𝑚 ). We use the connected components of the sub- clusters of small buildings, and a weighted street network. graphs induced by each subset to define a Meta-DAG, where each macro-vertex represents such connected components. Weighted directed meta-edges ((𝐶 𝑗,𝑢 , 𝐶𝑙,𝑣 ),𝑊𝑢,𝑣 ) encode nonempty set of 4.1 Graph City Buildings edges {(𝑥, 𝑦) : 𝑥, 𝑦 ∈ 𝐶 𝑗,𝑢 ∪ 𝐶𝑙,𝑣 }, where 𝐶 𝑗,𝑢 and 𝐶𝑙,𝑣 are con- For each edge-maximal subgraph 𝐺𝑘 , we use the ordered se- nected components of 𝑆 𝑗 and 𝑆𝑙 . Finally, the weight 𝑊𝑢,𝑣 encodes quence of vertex subsets (𝑉𝑘,𝑗 )ℎ𝑗=0 , their “internal” edges, and the density of edges running between 𝐶 𝑗,𝑢 and 𝐶𝑙,𝑣 . those edges running between consecutive levels to create a vi- The spanning subgraph of this metagraph that consists of sual representation for each such fixed point 𝐺𝑘 , that resembles only those meta-edges that connect components present in con- a building in a city with ℎ floors (see Figure 2). secutive vertex sets, 𝑆 𝑗 , 𝑆 𝑗+1 , is called the Spanning Meta-DAG This building representation provides an alternative view of of the Wave (shown in Figure 5). It describes some of the most a fixed point 𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ) that can be computed in time and fundamental directed internal macro-connectivity of a Wave and space linearly dependent on the size of 𝐺𝑘 , plus the complexity is expected to be substantially smaller that the Wave itself. An of finding and/or “describing” the initial subset 𝑉𝑘,0 of 𝑉𝑘 . That extreme case in which this is not the case is when the Wave is a is, the macro-structure of any fixed point can be described as a tree with the source set being the tree leaves. However, in this “building” whose internal structure is a Meta-DAG (see Section condition, i.e., the number of connections to non-Wave vertices is larger than the degree of vertices in the Wave source set. 4.2 Summary Graph City Sculpture To provide an overview of the size distribution of fixed points in the set of buildings in a graph city we use a summary sculp- ture (see Figure 7). The aspect ratio of the sculpture encodes the average gap between consecutive fixed point values from Figure 6: A flag on a building from the (left) Friendster the iterative edge core decomposition. The taller it is, the larger City, and (right) from the patent citation network city. the average gap. This sculpture is obtained by considering each building as its set of connected components. All these edge maxi- 3.3.2), with macro-vertices representing connected components mal connected subgraphs with the same peel value and the same of seed sets within each building floor (Figure 5). size are represented by cylinders encoding their frequency. Each A graph building with ℎ floors representing a fixed point cylinder of a particular peel value appears at a unique height 𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ) is completely determined by the “Disjoint Union in the sculpture. Larger cylinder radii correspond to higher fre- of ordered Edge Fragments” specified by the partition of 𝑉𝑘 into quency of a particular size. All the disks representing connected level sets (𝑉𝑘,𝑗 )ℎ𝑗=0 . Our representation requires only 5ℎ numbers. components with the same peel value (i.e., associated with the For each wave we specify 2 disk radii, the starting height, the same building) are stacked vertically on top of each other sorted color of a frustum, and a light intensity for night view. by size. The fixed point value is encoded by rainbow coloring the The “𝑗-th floor” of a building representing 𝐺𝑘 corresponds to corresponding cylinders (increasing from blue to red). Our inter- the subgraph induced by a subset of vertices 𝑉𝑘,𝑗 . Each floor is face provides access to the location of all buildings in the graph represented by two concentric disks, one above the other. The city that correspond to a fixed point selected in the sculpture. radius of the bottom disk encodes the number of vertices in the seed set of the starting fragment of the corresponding floor. The radius of the top disk encodes the total number of vertices besides the seed set vertices. The top disk is placed at the same height as the bottom disk of the next floor. A frustum between the bottom disks of adjacent floors is set to have volume encoding the number of edges running from one floor to the next. Since the radii of the two disks is already determined, the height of the frustum is calculated from the desired volume. For the last floor the height of the top disk is determined from the volume corresponding to the total number of edges on the floor. A special fixed color map across the entire graph is used to encode the density of a variety of induced subgraphs. For exam- ple, the color of a frustum represents the density of the set of Figure 7: Graph City Sculptures for (left) the Friendster edges running between the corresponding two floors, and the network, (middle) the movie co-occurence phrase net- same color map is used to highlight the density of the connected work, and (right) the patent citation network. Note that components within a floor, as shown in Figure 2. the largest data set need not have the tallest city sculpture. A flag on top of a building displays summary information that includes the distribution of fixed point values of that bucket. 4.2.1 Vertex Diversity and Light Intensities. Graph cities can Recall that “fixed point” refers to the edge-maximal subgraph 𝐺𝑘 . be seen as 3D representations of a coloring of the edges, where For buildings that are alone in a bucket the flag shows the number colors encode the edge peel values. This coloring partitions the of floors, the peel value, and the number of vertices and edges of edges adjacent to any particular vertex by their assigned color. the corresponding fixed point. Figure 6 shows both these cases. The frequencies of these local colors for a vertex defines a profile The height of the flag cloth encodes the total number of edges vector for that vertex. Following [4] we compute this profile vec- in 𝐺𝑘 , and its width encodes the total number of vertices. The tor’s Shannon Entropy and use it as a measure of the “diversity” length of the flag pole is the overall edge density of 𝐺𝑘 . If the user of the color pattern of the local edge coloring around each vertex. toggles the night view mode, a checkerboard of lights is applied Higher “diversity” of a vertex is an indicator of a higher weighted to the frustum between all floors, as shown in Figure 9(right). level of participation of that vertex in a given edge partition. It is The light intensity at each floor is proportional to the number worth noting that diversity is a more expressive measure than of vertices shared between the set of edges represented by that degree. Specifically, very high degree vertices can have very low floor and its complement with respect to the whole graph. diversity. We add “diversity” light intensities to the disks in the Handling buildings with lots of fragments does not present City Sculpture to encode the average diversity of the vertices in an issue because an intermediate structural level of granularity the corresponding connected fixed point. between Fixed Point Buildings and Fragments is provided by Waves (Section 3.3). The floors of a building represent contigu- 4.3 Graph City Interpretation ous segments of Fragments that satisfy some initial condition. “Graph Cities” provide visual representations of the overall macro- They are characterized by their source layer of vertices and an structure of graphs with few billion edges, i.e., GigaGraphs, which ending layer of vertices whose unexplored neighbors violate a pre- are derived by mapping each connected equivalence class, of a specified expansion condition. The beginning and end fragments special edge partition, into a “city building”. Below we address of Minimum Degree Waves specifically satisfy a bounded-degree some common questions regarding their interpretation: (1) What do “floors” tell us about a “building” in a Graph City? The number ℎ of floors in a building (i.e. the number of waves) indicates a fixed point whose full exploration requires the sequential activation of ℎ disjoint seed sets. In cases where a “building” is used to represent an edge equivalence class with several connected components then the number of floors in the “building” corresponds to the maximum number of waves in any of its components. (2) What does a “building” volume represent? It encodes the number of edges of the represented edge equivalence class, i.e., a fixed point of degree peeling. A building with no enclosure represents a more localized topology, i.e., is a “tree like” fixed point with only consecutive edges. Figure 8: (Left) The bush skeleton for a fixed point of peel (3) How is the internal detailed structure of a “building” value 1 from the Friendster network [21]. (Right) The com- made accessible for user exploration? It is represented plete bush after applying 3 iterations of an L-system. by a Directed Acyclic Macro Graph obtained by contracting the connected components of each wave seed set. This DAG one represented by the building) we sort the fixed points in a represents the connectivity between the connected compo- bucket by size and uniformly select a few fixed points to draw as nents of all seed sets appearing in the waves. Our interface bushes. These bushes visually show some of the same properties provides on-demand access to this DAG internal structure as a building but are a much “rougher” view of a fixed point. for user navigation and exploration on a per building basis. For generating a bush, a skeleton is first created based on the A video illustrating our current interface can be found here1 . parameters of a given fixed point (see Figure 8(left)). Then, a few iterations of an L-System are executed with axiom points 5 GRAPH CITIES STREET NETWORK distributed along the skeleton drawing (see Figure 8(right)), to create natural-looking bushes. The skeleton consists of a central 5.1 Graph City Layout and Street Network stem with branch segments coming out at points in between. The size distribution of the equivalence classes is used to generate These junction points are spaced the same as the disks in the a 2D position for each building. This is done by bucketing the building representation of the fixed point; thus, the length of fixed points of a graph by size and then mapping each such stem segments correspond to the height of the respective frus- bucket to a 2D location by following an Archimedean spiral. We tums. The inclination of the stem segments ranges from 0 to 45 create buckets containing connected fixed points the same way degrees off the vertical axis based on the density of edges in the as [3]. These connected fixed points are grouped together into frustum (0 degrees corresponding to 0 density and 45 degrees to buckets according to the number of edges. Bucket 𝑖 has fixed a density of 1). Two sets of branches are produced at each stem points of size 𝑠, such that 𝑙𝑜𝑔𝑖−1 (𝑚) < 𝑠 ≤ 𝑙𝑜𝑔𝑖 (𝑚). We create junction corresponding to the inner and outer disks in the build- a building for the largest fixed point in each bucket 𝐵, and if ing. The number and length of branches encodes the number of |𝐵| > 1, we create “bushes” (see Section 5.1.1) for a representative vertices represented at that level. The inclination of the branches selection of 𝑙𝑜𝑔(8 ∗ (|𝐵| − 1)) fixed points from 𝐵. Additionally, is proportional to the density of internal edges for that level. All for such buckets we also draw a grass patch as a green polygon branches are equally spaced at the junction and phase angles at with 𝑙𝑜𝑔(8 ∗ (|𝐵| − 1)) + 2 sides (see Figure 2(right)). From each successive junctions are randomized. bucket, we display the tallest building and a flag with a histogram We extended an implementation of a turtle graphics based with peel values on the 𝑋 -axis and the number of buildings with L-system interpreter2 to draw the underlying skeleton structure that peel value and the maximum size of all these buildings. For and then applied 2-3 iterations of a natural looking L-system, buckets with one fixed point the flag just shows the peel value. starting at evenly spaced lengths along the stem and branches. The Graph City street network is determined by the Delau- The bushes and the green polygon together give the appearance nay triangulation of the building locations in the spiral layout of a “garden” for some of the buildings in the Graph City, that (see Fig. 3). The weight of a connection between two buildings are primarily centered around the periphery of the spiral layout is proportional to the intersection of the subgraph vertex sets on the ground, as shown in Figures 3 and 2(right). Although represented by the two buildings. Graph-theoretically, the street these bushes don’t reflect as much detail of the underlying fixed network is determined by the intersection graph of the collection point they represent as opposed to the building metaphor, they 𝑧 , which in turn of vertex sets of the subgraphs (𝐺𝑘 = (𝑉𝑘 , 𝐸𝑘 ))𝑘=0 are useful for quickly getting a sense of scale for the size and are determined by the given edge partition (𝐸 0, 𝐸 1, ..., 𝐸𝑧 ). When distribution of other fixed points in a bucket while being more a particular building is selected by the user, a Euclidean spanning efficient to render than a whole building. tree rooted at that building displays the corresponding street net- 6 DESCRIPTION OF OUR DATA SETS work, which is obtained by a Breadth First Search. If the building street network is disconnected, then we show a spanning forest We show examples of our system applied to 3 datasets. Our largest instead. Note that the connectivity of the street network only data set is the Friendster social network consisting of 65,608,366 depends on the connectivity of the subgraphs represented by the nodes each representing a user and 1,806,067,135 edges represent- buildings and not the graphs represented by the entire bucket. ing “friendships” between them. This dataset was retrieved from the Stanford Large Dataset Collection (SNAP) [21]. Our next data 5.1.1 Bushes and L-Systems. To provide a visual indication set is a graph of phrases used in movie reviews. There are 218,052 for the properties of fixed points in a bucket (besides the largest nodes each representing a phrase and 115,050,370 edges, where 1 https://rutgers.box.com/s/qeygblwr5udeti9vxcmr0nj6swuf179n 2 https://github.com/andonutts/donatello Figure 9: Different views provided by our interactive control menu when rendering Graph Cities in day and night modes. each edge connects two phrases both used to describe the same 8 NAVIGATING A GRAPH CITY movie in a review. This data set was derived from the Internet The interactivity provided by Three.js [6] allows the user to ex- Movie Database 3 . Our last data set is a patents citation network, plore different parts of a Graph City conveniently from a browser. also from [21]. There are 3,774,768 nodes each representing a An interactive control menu is provided on the top right (see patent and 16,518,947 edges each linked to a cited patent. Figure 9(left)) for toggling between various options, such as the Table 1 shows the number of connected components (CC), current data set being viewed, displaying building information, connected fixed points (FP), peel value of the core (CV), maximum changing camera/environment controls and day/night modes, or number of waves among its fixed points (MW), and maximum selecting different street networks. The user can use the mouse number of fragments among its fixed points (MF) for each data to zoom in and out, as well as pan around the Graph City. Upon set. The spiral length is related to the total number of graph edges. entering a particular building, the view changes to highlight the The number of fragments in a building, i.e., the building’s height, finer scale internal connectivity for each edge-maximal subgraph encodes the longest path length in the building’s Meta-DAG. (or fixed point) that constitutes a building, as shown in Figure 10. Our interactive visualization ultimately owes its speed to the Dataset CC FP CV MW MF RT hierarchical Graph Wave decomposition [3]. At the macroscopic Friendster 1 29,692 304 212 3,279 11.82 level, our abstraction has a much smaller spatial complexity than Movies 38 2,044 3114 37 282 3 the actual data set, which makes rendering cheap. The only com- Patents 3,627 6,469 64 47 996 3.3 putation that happens is whenever the root for the street network Table 1: Statistics for all data sets. The last column shows changes, but this is only linear in the number of buildings. When the rendering times (RT) for the graph city in seconds. the user views inside a particular building, although we show finer scale connectivity, it is still conveniently partitioned into Edge Fragments. Thus, rendering is never the bottleneck, as the 7 RENDERING GRAPH CITIES user is only visualizing a subset of the data. We use Three.js [6], a 3D Javascript library to create and display Graph Cities interactively in the web browser using WebGL. Each building in the Graph City consists of several floors (i.e., 9 CATALOGUE OF SUBGRAPH PATTERNS Edge Fragments). For each floor, we instantiate a cylinder shape A useful outcome of user or computer explorations of any graph geometry, where the top and bottom face radii, height, and color city will be a summary and an extensive catalogue of the subgraph are chosen appropriately from the data (see Section 4.1), the patterns found. For this to be feasible, users must be provided number of balcony segments defaults to 6, with 3 windows per with annotation and summarization tools that can keep track floor in the night view. All floors are generated in the material of their exploration trails. These patterns should be classified at space, centered at the origin, and subsequently translated in the Y least by size, density, frequency of occurrence, rarity, interest (up) direction to form a building in the world space. Each building and usefulness. To assess the efficacy of these tools we are cur- is also translated in the X and Z directions to form the spiral rently identifying a list of basic tasks that a user can perform in layout (see Section 5.1). Bushes are generated with an L-system. order to conduct a substantial number of user experiments. For On top of each building, a flag displaying summary informa- example: can graph cities be used effectively on shortest path tion for that building (i.e., edge-maximal subgraph) is added. A approximations queries? box shape geometry is used for the flag, and a cylinder shape One of the ultimate goals of this work will be to have a descrip- geometry is used for the mast. The length of the mast, size of the tive semantic summary of the expected patterns that can be fed flag, and color of the flag are all determined from the data set. into a deep learning engine to learn to discriminate and find new To highlight the street network representing data flow from patterns according to certain specified criteria. In our current one building to the remaining Graph City, we precompute the experimentation with a variety of data sets, the most naturally Delaunay triangulation for all the buildings in the Graph City detected patterns include: tree forests, cliques, bi-cliques, and at the beginning. Subsequently, when a user selects a particular hierarchical compositions of these basic patterns. building from the interactive control menu (see Figure 9(left)), we perform a Breadth First Search (BFS) from the root building 10 FUTURE WORK to the rest, and display only those edges that are encountered Streaming Graph Cities: An interesting question is the feasibil- in the search. The width of the edges are determined by the ity of efficiently updating the Wave and Fragment decomposition data. Rendering times for the Graph City for each data set are when the streaming input graph is known a priori to be a fixed summarized in the last column in Table 1. point or a Wave of fixed degree peeling. This is the main bot- 3 https://www.imdb.com/interfaces/ tleneck for streaming a Graph City. If the wave decomposition Figure 10: Snapshots of a user navigating the Graph City for the patents citation network with interactive camera control. is provided then we can easily obtain web renderings of the [4] J. Abello and F. Queyroi. 2013. Fixed points of graph peeling. In Proc. of the corresponding individual building(s) at interactive frame rates. 2013 IEEE/ACM Int. Conf. on Adv. in Soc. Net. Anal. and Mining. 256–263. [5] James Abello, Frank Van Ham, and Neeraj Krishnan. 2006. Ask-graphview: a Composing Global Solutions from Local Ones: The Wave large scale graph visualization system. IEEE TVCG 12, 5 (2006), 669–676. and Edge Fragments decomposition will be more advantageous [6] Ed Angel and Eric Haines. 2017. An Interactive Introduction to WEBGL and Three.JS. In ACM SIGGRAPH 2017 Courses. Article 17, 95 pages. in those cases where a given question or structure of interest on [7] A. Arleo, O.-H. Kwon, and K.-L. Ma. 2017. GraphRay: Distributed pathfinder 𝐺 = (𝑉 , 𝐸) can be obtained as a “composition” of the local Wave network scaling. In 2017 IEEE 7th Symp. on Large Data Anal. and Vis. 74–83. and/or Fragment solutions. For example, is there any algebraic [8] B. Bach, N. H. Riche, C. Hurter, K. Marriott, and T. Dwyer. 2017. Towards unambiguous edge bundling: Investigating confluent drawings for network relation between the all-pairs shortest path distances in 𝐺 versus visualization. IEEE Trans. on Visualization & Computer Graphics (2017), 1–1. their restrictions to the Waves and/or Fragments of 𝐺? [9] V. Batagelj and M. Zaveršnik. 2011. Fast algorithms for determining core How to pick interesting Waves? Automating the process groups in social networks. Adv. in Data Anal. and Class. 5, 2 (2011), 129–145. [10] M. Baur, M. Gaertler, R. Görke, M. Krug, and D. Wagner. 2007. Generating of choosing the property of interest for the Wave decomposition graphs with predefined k-core structure. In Eur. Conf. of Compl. Sys. Citeseer. would be a powerful addition to the pipeline. Can the type of wave [11] O. Ben-Eliezer, L. Gishboliner, D. Hefetz, and M. Krivelevich. 2020. Very fast construction of bounded-degree spanning graphs via the semi-random graph be chosen efficiently based on local or global graph structure? process. In Proc. of the Symposium on Discrete Algorithms. SIAM, 718–737. Semi-random Graph Processes: It will be interesting to un- [12] Daniel Berend, Shlomi Dolev, and Ariel Hanemann. 2014. Graph Degree derstand the type of Graph Waves that can be built by these Sequence Solely Determines the Expected Hopfield Network Pattern Stability. Neural computation 27 (11 2014), 1–9. semi-random processes with high probability in 𝑂 (𝑛) rounds [13] Tim Dwyer, Nathalie Henry Riche, Kim Marriott, and Christopher Mears. 2013. where 𝑛 is the number of vertices [11]. Edge compression techniques for visualization of dense directed graphs. IEEE Hypergraph Cities: A central problem in simplicial finite transactions on visualization and computer graphics 19, 12 (2013), 2596–2605. [14] Peter Frankl and Andrey Kupavskii. 2018. The Erdös Matching Conjecture set systems is to identify a global structure whose intersection and concentration inequalities. arXiv preprint arXiv:1806.08855 (2018). with a given finite set family is highly concentrated around its [15] P. Govindan, C. Wang, C. Xu, H. Duan, and S. Soundarajan. 2017. The k-peak decomposition: Mapping the global structure of graphs. In Proceedings of the expectation. Random matchings are an example [14]. 26th International Conference on World Wide Web. 1441–1450. 11 CONCLUSIONS [16] W. L. Hamilton, R. Ying, and J. Leskovec. 2017. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin (2017). The iterative edge decomposition partitions the edges of a graph [17] R. Jin, N. Ruan, S. Dey, and J. Y. Xu. 2012. SCARAB: scaling reachability computation on large graphs. In Proc. of Int. Conf. on Man. of Data. 169–180. into Fixed Points of degree peeling; they are in turn decomposed [18] H. Kabir and K. Madduri. 2017. Shared-memory graph truss decomposition. into Graph Waves and Edge Fragments. They provide mecha- In 2017 IEEE 24th Int. Conf. on High Perf. Comput. (HiPC). IEEE, 13–22. nisms that may help assess the topological and statistical reasons [19] M. Krommyda, V. Kantere, and Y. Vassiliou. 2019. IVLG: Interactive Visualiza- tion of Large Graphs. Int. Conf. on Data Engineering (2019), 1984–1987. that explain the emergence of a large class of bipartite graph-like [20] R. Laishram, A. Erdem Sar, T. Eliassi-Rad, A. Pinar, and S. Soundarajan. 2020. patterns in very large graph data sets. Residual Core Maximization: An Efficient Algorithm for Maximizing the Size of the k-Core. In Proc. of Int. Conf. on Data Mining. SIAM, 325–333. We introduced 3D representations of Fixed Points based on [21] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network their wave decomposition that resemble buildings in a city, hence Dataset Collection. http://snap.stanford.edu/data. Graph Cities. A spiral arrangement of the buildings is obtained [22] Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, and U Kang. 2013. Interac- tive multi-resolution exploration of million node graphs. In IEEE Conference from the size distribution of the Fixed Points. The Delaunay on Visual Analytics Science and Technology, Poster. triangulation of the building locations determines the graph city [23] Peng Mi, Maoyuan Sun, Moeti Masiane, Yong Cao, and Chris North. 2016. street network. The size distribution of all the Fixed Points is Interactive graph layout of a million nodes. In Informatics, Vol. 3. Multidisci- plinary Digital Publishing Institute, 23. summarized by a graph city sculpture (Sec 4.2). [24] Q. H. Nguyen, S.-H. Hong, P. Eades, and A. Meidiana. 2017. Proxy graph: visual Dense bipartite graph-like patterns have been proposed as an quality metrics of big graph sampling. IEEE TVCG 23, 6 (2017), 1600–1611. [25] R. Pienta, F. Hohman, A. Endert, A. Tamersoy, K. Roundy, C. Gates, S. Navathe, abstract formalization of “concepts” in [29]. Their identification and D. H. Chau. 2018. VIGOR: interactive visual exploration of graph query in very large data sets has defied computation. Nevertheless, results. IEEE Trans. on Vis. and Comp. Graph. 24, 1 (2018), 215–225. Graph Cities offer a promising approach to the efficient detection [26] L. Royer, M. Reimann, B. Andreopoulos, and M. Schroeder. 2008. Unraveling protein networks with power graph analysis. PLoS comp. bio. 4, 7 (2008). of a large sub-class of these patterns in attributed graphs. [27] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M Tamer Due to lack of space, some technical results were put in an Özsu. 2017. The ubiquity of large graphs and surprising challenges of graph appendix here4 . processing. Proceedings of the VLDB Endowment 11, 4 (2017). [28] N. Wang, D. Yu, H. Jin, C. Qian, X. Xie, and Q.-S. Hua. 2017. Parallel algorithm for core maintenance in dynamic graphs. In Prof. of ICDCS. 2366–2371. REFERENCES [29] Rudolf Wille. 1992. Concept lattices and conceptual knowledge systems. [1] J. Abello, A. L. Buchsbaum, and J. R. Westbrook. 1998. A functional approach Computers & mathematics with applications 23, 6-9 (1992), 493–515. to external graph algorithms. In European Symp. on Alg. Springer, 332–343. [30] H. Wu, J. Cheng, Y. Lu, Y. Ke, Y. Huang, D. Yan, and H. Wu. 2015. Core [2] James Abello, Daniel Mawhirter, and Kevin Sun. 2019. Taming a Graph decomposition in large temporal graphs. In Int. Conf. on Big Data. 649–658. Hairball: Local Exploration in a Global Context. In Business and Consumer [31] V. Yoghourdjian, T. Dwyer, K. Klein, K. Marriott, and M. Wybrow. 2018. Graph Analytics: New Ideas. Springer, 467–490. thumbnails: Identifying and comparing multiple graphs at a glance. IEEE [3] James Abello and Daniel Nakhimovich. 2020. Graph Waves. In The 3rd Interna- Transactions on Visualization and Computer Graphics 24, 12 (2018), 3081–3095. tional Workshop on Big Data Visual Exploration and Analytics with EDBT/ICDT. [32] H. Zhou, P. Xu, X. Yuan, and H. Qu. 2013. Edge bundling in information visualization. Tsinghua Science and Technology 18, 2 (2013), 145–156. 4 https://rutgers.box.com/s/qeygblwr5udeti9vxcmr0nj6swuf179n