<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>About Graph Index Compression Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giambattista Amati</string-name>
          <email>gba@fub.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paola Vocca</string-name>
          <email>vocca@unitus.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Cruciani</string-name>
          <email>antonio.cruciani@students.uniroma2.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Pasquini</string-name>
          <email>daniele.pasquini@uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria dell'Impresa, University of Rome “Tor Vergata”</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Humanities</institution>
          ,
          <addr-line>Communication and Tourism, (DISUCOM)</addr-line>
          ,
          <institution>University of Tuscia</institution>
          ,
          <addr-line>Viterbo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Fondazione Ugo Bordoni</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>We perform a preliminary study on efficient indexing of large graphs using gap-based compression techniques and different node labelling functions. As a baseline we use the WebGraph + LLP labelling function. To index the graph we use three labelling functions: PageRank, HITS, and PageRank with random walks choosing restart nodes by HITS authority scores. To compress the graphs we use Varint GB, with and without d-gaps, derived from the rank values of the labelling function. Overall, we compare 8 different methods on datasets composed of the web graphs eu-2005, uk-2007-05@100000, and cnr-2000, and the social networks enron and ljournal-2008, provided by the Laboratory for Web Algorithmics (LAW).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Information retrieval; Query
representation; • Theory of computation → Data compression;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        The analysis of the structure of real graphs involves many data
mining tasks (e.g., detecting specific nodes, identifying interest
groups, estimating centrality measures, evaluating distance-based
measures, etc. [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]). Often, the graph algorithms used to implement
these mining tasks assume that the graph is stored in main
memory. However, this assumption is far from trivial when dealing
with large graphs, and this is actually the case when social networks
are considered. In these cases, even the successor lists alone would require
hundreds of terabytes of memory.
      </p>
      <p>
        Storing and accessing large graphs is a central issue in the field of
information retrieval [
        <xref ref-type="bibr" rid="ref1 ref8 ref9">1, 8, 9</xref>
        ]; a good compression technique allows us
to efficiently manage large graphs, better understand the social
network structure, and derive and exploit possible regularities. In
this context, a compressed data structure for a graph, also known as a
succinct representation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], must both provide very fast amortized
access to an edge (link) and require an amount of space that is
"close" to the information-theoretic lower bound, as opposed to
other compressed representations whose only evaluation criterion is
the number of bits per link. While this definition is not formal, it
excludes methods in which the successors of a node are not directly
accessible unless, for instance, a large part of the graph is scanned.
      </p>
      <p>
        We use the Varint GB [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] compression schema. In order to take
into account the structure of the graph, we use three labelling
functions which label the nodes of the graph in non-increasing order
of: 1) PageRank [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; 2) HITS (Hyperlink-Induced Topic Search), also
known as hubs and authorities [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; and 3) PageRank seeded by
the HITS authority scores, that is, PageRank with random walks
choosing restart nodes by HITS authority score. As an additional
compression step, we use the d-gap compression of the adjacency
lists. We choose as a baseline the labelling function WebGraph+LLP
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Hence, we overall test 8 different compression techniques,
derived by combining 4 labelling functions (PageRank, HITS, mixed
PageRank-HITS, LLP), with or without d-gap compression of the
adjacency lists, and the Varint GB encoding. The experimentation is
performed using the datasets provided by the Laboratory for Web
Algorithmics (LAW) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. More precisely, we consider three graphs
derived from web crawling (eu-2005, uk-2007-05@100000,
cnr-2000) and two social networks (enron, ljournal-2008).
      </p>
      <p>We derive the following results:
• Without the d-gap representation of the adjacency lists of nodes,
the best labelling strategy is the greedy one, based on the
node degree ranking, the entropy being the lower
bound of the compression, that is, m · E bits, where E is the
entropy of the system and m is the number of edges.
• With d-gaps, different labelling strategies of the nodes are
required according to what the graph represents. For example,
in order to minimize gaps and thus improve compression, nodes in a
cluster need to be consecutive in a linearly ordered sequence.
Using the PageRank strategy to visit a graph, we first visit the nodes
in a cluster and then we teleport (with the damping factor
probability) to a new node that leads us to surf a
portion of the graph with a high hubness score, but in the meantime
containing highly authoritative nodes, that is, nodes where
many random walks end. This intuition leads
us to a labelling/ranking node strategy, which is
also a visiting/crawling strategy, that assigns lower values
to nodes with high PageRank scores (they appear before
others in many adjacency lists), and when we jump
to a new subgraph we restart with new nodes that have high
PageRank scores, that is, the ones that are also good hub
nodes. HITS authority nodes are indeed the authoritative
nodes that also contain good hub nodes in their random
walks. This remark suggests using PageRank with restart
nodes, where the restart nodes are those that have the highest
HITS authority scores.
• Experiments show that the best strategy for web collections
(uk-2007-05@100000 and cnr-2000) is the WebGraph+LLP
strategy. For other kinds of graphs the other labelling
functions produce better results, even if we still need to assess
which strategy yields an
optimal compression. In the specific case of social networks, for
example, we need to combine hub and authority scores to
linearly order the nodes of graphs and thus compress the
graph indexes by the d-gap. Using PageRank
scores with restart nodes chosen by HITS authority scores seems
a very intuitive and promising strategy to model minimal
d-gap distributions in the adjacency lists of the nodes.
The paper is organized as follows: in Section 2 we formally define
all the tools used; in Section 3 the development platform is presented,
while in Section 4 we describe the results of the experimentations.
Finally, in Section 5, we conclude and present further research
directions.</p>
    </sec>
    <sec id="sec-4">
      <title>BASIC DEFINITIONS AND FRAMEWORK DESCRIPTION</title>
      <p>
        In this section we describe both the compression schema and the
labelling functions used to exploit the graph structure and, hence,
to obtain a more efficient compression. For lack of space we do not
describe the WebGraph+LLP framework that we use as a benchmark
for our analysis. A detailed description can be found in [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Compression schema</title>
      <p>
        Varint GB. The VarInt GB, or Group VarInt, is a variant of the VarInt
compression scheme proposed by Jeffrey Dean at the 2009 WSDM
conference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The idea of this technique is to encode the integers
in groups of four at a time. Each group is preceded by a one-byte
prefix composed of four pairs of bits that specify the length
of each integer following the prefix. For example, the following
prefix byte:
      </p>
      <p>00, 00, 01, 10
indicates that the following four integers have length, respectively:
one byte, one byte, two bytes and three bytes. So each group of
four integers occupies a space ranging from 5 to 17 bytes. A simple
lookup table indexed by the prefix byte can therefore be used to
decode the encoding. The advantage of this coding schema is that
the values can be decoded with fewer branch mispredictions and fewer
bit-level operations.</p>
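      <p>As a concrete illustration, the grouping and prefix-byte logic
described above can be sketched in Python as follows (a minimal,
hypothetical codec, not the paper's implementation; it assumes node
identifiers fit in 32 bits):</p>
      <preformat>
```python
def varint_gb_encode(values):
    """Group Varint: encode integers four at a time; each group is one
    prefix byte (four 2-bit length fields) followed by the payload.
    Assumes every value fits in at most 4 bytes (32-bit ids)."""
    out = bytearray()
    # pad the last group with zeros so every group has exactly four values
    padded = list(values) + [0] * ((4 - len(values) % 4) % 4)
    for g in range(0, len(padded), 4):
        prefix = 0
        payload = bytearray()
        for v in padded[g:g + 4]:
            nbytes = max(1, (v.bit_length() + 7) // 8)   # 1..4 bytes
            prefix = prefix * 4 + (nbytes - 1)           # two bits per value
            payload += v.to_bytes(nbytes, "little")
        out.append(prefix)
        out += payload                                   # 5..17 bytes per group
    return bytes(out)


def varint_gb_decode(data, count):
    """Decode `count` integers; zeros padded into the last group are dropped."""
    values = []
    i = 0
    while i != len(data):
        prefix = data[i]
        i += 1
        # the first value's length sits in the two most significant bits
        for two_bits in [(prefix // 64) % 4, (prefix // 16) % 4,
                         (prefix // 4) % 4, prefix % 4]:
            n = two_bits + 1
            values.append(int.from_bytes(data[i:i + n], "little"))
            i += n
    return values[:count]
```
      </preformat>
      <p>For instance, encoding the four integers 1, 300, 70000, 5 produces the
prefix byte 00 01 10 00 followed by 1 + 2 + 3 + 1 payload bytes, 8 bytes
in total for the group.</p>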
      <p>
        Another important coding is the Elias γ-code, which we may
use in the experimentation, but it is not central to the final
aim of this preliminary work, because it is the d-gap generation
that matters in the comparison of labelling strategies. For the sake
of completeness we recall that the Elias γ-code is an efficient
encoding algorithm widely used in practice [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The idea of this
technique is as follows: given an integer x ≥ 1, it is "broken" into
two components: n = 1 + ⌊log2 x⌋, which encodes, in unary, the
length of x in binary, and the remainder r = x − 2^⌊log2 x⌋, which is
coded in binary. The unary component, n, specifies the number of bits
needed to encode x, and the binary component, r, is encoded using
⌊log2 x⌋ bits.
      </p>
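      <p>A minimal sketch of the γ-code just described (hypothetical helper
names, operating on bit strings for readability rather than speed):</p>
      <preformat>
```python
def elias_gamma_encode(x):
    """Elias gamma code of x >= 1: n - 1 zeros followed by the n-bit
    binary form of x, where n = 1 + floor(log2 x)."""
    assert x >= 1
    binary = bin(x)[2:]                    # n bits; the leading bit is always 1
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode one gamma-coded integer from a bit string."""
    n = 0
    while bits[n] == "0":                  # count the n - 1 leading zeros
        n += 1
    return int(bits[n:2 * n + 1], 2)       # read the (n + 1)-bit binary part
```
      </preformat>
      <p>For example, x = 9 has the 4-bit binary form 1001, so its γ-code is
0001001: three zeros for the unary length, then the binary part.</p>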
    </sec>
    <sec id="sec-6">
      <title>Labelling Functions</title>
      <p>The compression schemata described above are designed to compress
any type of information without taking into account the specificity
of the data set under examination. In the case of graphs represented
by adjacency lists it is possible to perform some optimizations.</p>
      <p>Generally, input graphs are represented by lists of arcs or edges,
that is, each node is represented by an integer and each row of the
input file represents an arc (clearly an arc is represented by a pair
of integers (u, v)). So first we need to convert the list of edges into
an adjacency list and then apply the encoding algorithms on the
adjacency list. Since we have to deal with large graphs it is very
likely that there are nodes with indexes represented by very large
integers.</p>
      <p>We now tackle the problem of how to encode the adjacency lists
of the nodes of a graph in order to obtain the optimal compression
scheme.</p>
      <p>
        We first try to label nodes in non-increasing order of
frequency (nodes with higher frequency are encoded with a lower label).
Unfortunately, frequency alone is not sufficient to capture the
structure of real large graphs, for which the degree distribution is not
enough to minimize the d-gap cost of labelling complex graphs.
Instead, we use three different labelling functions: 1) PageRank [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; 2)
HITS (Hyperlink-Induced Topic Search), also known as hubs and
authorities [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; and 3) PageRank seeded by the HITS authority scores.
PageRank Index. The PageRank index was introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and it
recursively quantifies a "value", the PageRank of a node, based
on: (i) the number of links it receives, (ii) the link propensity of
the linkers (that is, the number of outgoing links of each in-linking
node), and (iii) the centrality of the linkers, that is, their PageRank.
According to Google¹: PageRank works by counting the number
and quality of links to a page to determine a rough estimate of how
important the website is. The underlying assumption is that more
important websites are likely to receive more links from other websites.
It is mathematically defined as:
      </p>
      </p>
      <p>C_PR(v) = α · Σ_{u ∈ N⁻(v)} C_PR(u) / |N⁺(u)| + (1 − α)/n,    (1)</p>
      <p>
        where N⁺(u) (N⁻(u)) is the set of vertices v ∈ V linked to u
by outgoing (incoming) edges, that is, such that (u, v) ∈ E
((v, u) ∈ E, respectively), and n = |V |. Additionally, α is the damping
factor, which models the probability that, in a random walk on the
network, the walk will stop. The damping
factor is generally set around 0.85 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Hence, according to the PageRank
index, a node is important if it is linked from other nodes with high
PageRank index that link parsimoniously, or if it is itself highly linked.
      </p>
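      <p>A straightforward power-iteration sketch of Equation (1), in Python
(illustrative only; the restart parameter anticipates the HITS-seeded
variant discussed later, and all names are hypothetical):</p>
      <preformat>
```python
def pagerank(succ, alpha=0.85, iters=50, restart=None):
    """Iterate C_PR(v) = alpha * sum over u in N-(v) of C_PR(u) / |N+(u)|
    + (1 - alpha) * restart[v]; uniform restart 1/n by default, as in Eq. (1).
    succ[u] is the successor (out-neighbour) list of node u."""
    n = len(succ)
    if restart is None:
        restart = [1.0 / n] * n            # uniform teleport probability
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - alpha) * restart[v] for v in range(n)]
        for u in range(n):
            if succ[u]:                    # spread u's score over N+(u)
                share = alpha * pr[u] / len(succ[u])
                for v in succ[u]:
                    nxt[v] += share
            else:                          # dangling node: follow the restart
                for v in range(n):
                    nxt[v] += alpha * pr[u] * restart[v]
        pr = nxt
    return pr
```
      </preformat>
      <p>Relabelling nodes from 0 upward in non-increasing order of these
scores then gives the PageRank labelling function.</p>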
      <p>
        HITS. Hyperlink-Induced Topic Search (HITS) was developed by
Jon Kleinberg [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and it identifies good authorities and hubs for
a topic by assigning two numbers to a page: an authority and a
hub weight. An authority is a page that many hubs link to; a hub
is a page that links to many authorities. The weights are defined
recursively: a higher authority weight occurs if the page is pointed
to by pages with high hub weights, and a higher hub weight occurs if
the page points to many pages with high authority weights.
(¹ Facts about Google and Competition. https://web.archive.org/web/20111104131332/
https://www.google.com/competition/howgooglesearchworks.html
Archived from the original on 4 November 2011. Retrieved 12 July 2014.)
      </p>
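      <p>The recursive hub/authority definition can be sketched with a few
power-iteration steps (again a hypothetical illustration with L2
normalisation, not the implementation used in the experiments):</p>
      <preformat>
```python
def hits(succ, iters=50):
    """succ[u] is the out-link list of page u; returns (hub, auth) scores."""
    n = len(succ)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # authority(v) = sum of hub(u) over the pages u linking to v
        auth = [0.0] * n
        for u in range(n):
            for v in succ[u]:
                auth[v] += hub[u]
        # hub(u) = sum of authority(v) over the pages v that u links to
        hub = [sum(auth[v] for v in succ[u]) for u in range(n)]
        # normalise so the weights stay bounded
        na = sum(a * a for a in auth) ** 0.5
        nh = sum(h * h for h in hub) ** 0.5
        if na:
            auth = [a / na for a in auth]
        if nh:
            hub = [h / nh for h in hub]
    return hub, auth
```
      </preformat>
      <p>Sorting nodes by non-increasing authority (or hub) score then yields
a HITS-based labelling, and the top authorities can serve as the restart
seeds of the mixed PageRank-HITS strategy.</p>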
      <p>d-Gap.
To obtain a better compression, we can use a technique borrowed from
Information Retrieval, that is, to encode the differences between the ranks
of the elements of the adjacency list instead of the ranks themselves.
Since the list of each node is sorted in ascending order, the
differences (called d-gaps) are positive integers greater than zero.
We show an example of this technique: let {i1, i2, i3, i4, . . .} be the
adjacency list of a node x; then the d-gap encoding is {i1, i2 − i1, i3 −
i2, i4 − i3, . . .}. This technique yields smaller
values than the original list and therefore also smaller encodings
than those applied to the original adjacency list. It is clear that this
technique is lossless: there is no loss of information, because
the original integers can be easily reconstructed from the integers
encoded with d-gaps.</p>
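      <p>The d-gap transform and its inverse are one-liners; a sketch
(hypothetical names) of the lossless round trip:</p>
      <preformat>
```python
def dgap_encode(adj):
    """Sorted adjacency list: keep the first id, then successive differences."""
    return [adj[0]] + [adj[k] - adj[k - 1] for k in range(1, len(adj))]


def dgap_decode(gaps):
    """Invert the d-gap encoding with a running prefix sum."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out
```
      </preformat>
      <p>For example, the list {3, 7, 8, 20} becomes {3, 4, 1, 12}, and the
prefix sums reconstruct the original identifiers exactly.</p>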
    </sec>
    <sec id="sec-7">
      <title>IMPLEMENTATION</title>
    </sec>
    <sec id="sec-8">
      <title>Development environment</title>
      <p>The development environment is based on Apache Spark. As for
the data structures used, we used the Bytes structure for the VarInt
GB encoding. The compression of the instances and the gap
construction have been made in a distributed environment with Spark
Python on a cluster of 8 Fujitsu CX 2550 server machines (CX 400
M1), with 32 GB of memory each and a Xeon E5 1620 v4 CPU.
All the data are stored on a distributed file system (HDFS). To optimize
the encoding time of the posting lists we used bitwise operations.</p>
    </sec>
    <sec id="sec-9">
      <title>Apache Spark Implementation</title>
      <p>We used the Apache Spark framework to convert the original graph
data structure from the edge list to the posting list, to relabel, and
then to compress the graph. First we perform a map job that reads
the edge list and produces the RDD of the edge list, where each
element is a tuple ⟨source, target⟩. Then, for the
relabelling task, given the RDDs where vertices are ordered in
non-increasing order of HITS, PageRank or HITS+PageRank scores, we:
1) relabel each node in increasing order, starting from 0, of
HITS, PageRank or HITS+PageRank scores;
2) execute a map job on the RDD edge list assigning to each
vertex the new label obtained in 1).</p>
      <p>As a next step, we produce the adjacency-list data structure from
the renamed RDD edge list. We execute a map job that converts the
target vertex of each tuple into a one-element list, ⟨source, [target]⟩,
and then perform a reduceByKey on the RDD of
relabelled tuples, concatenating the target value lists, thus obtaining
the adjacency list. Then, for the d-gap compression, we use a map
job to apply the d-gap function to each list of the adjacency list.
As a last step, we apply the map function again to the new RDD
(the adjacency-list data structure) in order to execute the Varint GB
encoding function.</p>
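      <p>The whole pipeline (relabel by score, rebuild adjacency lists, apply
d-gaps) can be summarised in plain Python; in the actual system each
step corresponds to a Spark map or reduceByKey job, and the helper
names below are hypothetical:</p>
      <preformat>
```python
from collections import defaultdict


def relabel_and_pack(edges, score):
    """edges: iterable of (source, target); score: node -> ranking score.
    Returns a dict {new source label: d-gapped sorted successor list}."""
    # 1) relabel nodes from 0 upward in non-increasing score order
    order = sorted(score, key=score.get, reverse=True)
    label = {v: i for i, v in enumerate(order)}
    # 2-3) map edges to the new labels and reduce them into adjacency lists
    adj = defaultdict(list)
    for u, v in edges:
        adj[label[u]].append(label[v])
    # 4) sort each list and replace identifiers with d-gaps
    packed = {}
    for u, succ in adj.items():
        succ.sort()
        packed[u] = [succ[0]] + [succ[k] - succ[k - 1]
                                 for k in range(1, len(succ))]
    return packed
```
      </preformat>
      <p>The resulting per-node gap lists are then what the Varint GB encoder
consumes.</p>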
    </sec>
    <sec id="sec-10">
      <title>Datasets</title>
      <p>
        As a test set we used five graphs from the WebGraph datasets [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
Table 1 reports the graph dimensions.
      </p>
      <p>
        • eu-2005: A small crawl of the .eu domain, mainly useful for
debugging and testing purposes. This graph exhibits a very
low locality, probably because the crawl was quite shallow
(and the chosen domain is quite artificial anyway).
• uk-2007-05@100000: This graph has been artificially
generated from uk-2007-05. It is a ball (in the graph metric sense)
of 100000 nodes centered at a random node. It simulates the
result of a small breadth-first crawl around the node.
• enron: This dataset was made public by the Federal Energy
Regulatory Commission during its investigations: it is a
partially anonymised corpus of e-mail messages exchanged by
some Enron employees (mostly part of the senior
management). We turned this dataset into a directed graph, whose
nodes represent people and with an arc from x to y whenever
y was the recipient of (at least) a message sent by x .
• ljournal-2008: LiveJournal is a virtual-community social
site started in 1999: nodes are users and there is an arc from
x to y if x registered y among his friends (it is not necessary
to ask y's permission, so the graph is directed). This graph is
the snapshot used in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
• cnr-2000: A very small crawl of the Italian CNR domain,
mainly useful for debugging and testing purposes.
      </p>
      <p>[Table 1 (dimensions of the graphs eu-2005, uk-2007-05@100000,
enron, ljournal-2008 and cnr-2000) could not be recovered from the
extraction.]</p>
      <p>
Our main focus is to determine the best graph labelling
technique for compression. The behavior of a labelling technique, among
other features, mostly depends on the type or nature of the graph
that we want to compress. Indeed, our conjecture is that the LLP
strategy is not optimal when the graph is generated by a social
network, while it is optimal when the graph is a portion/instance of the
web graph. This is due to the topological differences between these classes
of graphs and to the crawling techniques used to discover them.
Considering web graphs, a typical crawling policy is to use a
BFS to visit, in lexicographical order, URLs pointing to other pages.
By doing this, we implicitly cluster nodes by their hosting domains,
with a higher probability that many links are internal to the domain,
and thus it can capture the scale-free property of the network. Our
experiments show (see Table 2) that LLP is optimal for this class of
graphs because, for each cluster, it labels the adjacent vertices with
close integer values, allowing us to get small gaps for each posting
list of each node of each cluster. However, from our experiments,
for social network graphs there is a better labelling technique.</p>
      <p>[Table 2: compressed sizes per graph and labelling function; the
table layout was lost in extraction. Surviving fragments: an enron row
with the values 22.94, 20.97, 26.86, 3.40, 25.66, and the header
fragments "Gap-LLP", "Labeling" (17.64, 11.8, 14.92) and "size of
the Graph".]</p>
      <p>This technique is based on the use of PageRank with initial
seeds (restart nodes) given by the highest HITS authority scores.
The interpretation is that, in a social network, the centroid of a
cluster (celebrities, politicians, etc.) can be seen as a vertex with
the highest authority score. These nodes are the endpoints of many
random walks and are good candidates to be PageRank restart nodes.
The intuition is that we can obtain a better compression schema if we
label vertices with higher PageRank with smaller integers, since
they appear more frequently in the posting lists of other vertices.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS AND FUTURE WORKS</title>
      <p>
        We have shown that for web collections like uk-2007-05@100000
and cnr-2000 the LLP strategy is optimal, but for other types of
graphs we still need to assess better strategies to obtain an optimal
compression of the graphs. For social networks, for example, we
need to combine hub and authority scores to linearly order the
nodes of graphs and thus compress the graph indexes by the d-gap.
Using PageRank scores with restart nodes chosen by HITS
authority scores seems a very intuitive and promising strategy to
model minimal d-gap distributions in the adjacency lists of the
nodes. There are three main future directions: 1) to consider real
large graphs different from social networks and web graphs, such
as, for example, biological networks, and of different dimensions;
2) to implement and test other compression schemes and labelling
functions, as for example the Elias γ-code for graph
compression and the labelling function presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], based on recursive
graph bisection; 3) besides space occupancy, it would be
important to evaluate encoding and decoding time, as well as query
(adjacency-list access) time.
      </p>
      </p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was conducted in the Laboratory of Big Data of
ISCOM-MISE (Institute of Communication of the Italian Ministry for
Economic Development).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Boldi</surname>
          </string-name>
          , Marco Rosa, Massimo Santini, and
          <string-name>
            <given-names>Sebastiano</given-names>
            <surname>Vigna</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Layered Label Propagation: A Multiresolution Coordinate-free Ordering for Compressing Social Networks</article-title>
          .
          <source>In Proceedings of the 20th International Conference on World Wide Web (WWW '11)</source>
          . ACM, New York, NY, USA,
          <fpage>587</fpage>
          -
          <lpage>596</lpage>
          . https://doi.org/10.1145/1963405.1963488
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Boldi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sebastiano</given-names>
            <surname>Vigna</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>The WebGraph Framework I: Compression Techniques</article-title>
          .
          <source>In Proc. of the Thirteenth International World Wide Web Conference (WWW</source>
          <year>2004</year>
          ). ACM Press, Manhattan, USA,
          <fpage>595</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Brin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>The Anatomy of a Large-scale Hypertextual Web Search Engine</article-title>
          .
          <source>Comput. Netw. ISDN Syst</source>
          .
          <volume>30</volume>
          ,
          <issue>1</issue>
          -
          <fpage>7</fpage>
          (
          <year>April 1998</year>
          ),
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          . https://doi.org/10.1016/S0169-7552(98)00110-X
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Flavio</given-names>
            <surname>Chierichetti</surname>
          </string-name>
          , Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and
          <string-name>
            <given-names>Prabhakar</given-names>
            <surname>Raghavan</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>On Compressing Social Networks</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09)</source>
          . ACM, New York, NY, USA,
          <fpage>219</fpage>
          -
          <lpage>228</lpage>
          . https://doi.org/10.1145/1557019.1557049
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Challenges in Building Large-scale Information Retrieval Systems: Invited Talk</article-title>
          .
          <source>In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09)</source>
          . ACM, New York, NY, USA,
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . https://doi.org/10.1145/1498759.1498761
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Laxman</given-names>
            <surname>Dhulipala</surname>
          </string-name>
          , Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and
          <string-name>
            <given-names>Alon</given-names>
            <surname>Shalita</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Compressing Graphs and Indexes with Recursive Graph Bisection</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , San Francisco, CA, USA,
          <year>August</year>
          13-
          <issue>17</issue>
          ,
          <year>2016</year>
          .
          <fpage>1535</fpage>
          -
          <lpage>1544</lpage>
          . https://doi.org/10.1145/2939672.2939862
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Elias</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>Universal Codeword Sets and Representations of the Integers</article-title>
          .
          <source>IEEE Trans. Inf. Theor</source>
          .
          <volume>21</volume>
          ,
          <issue>2</issue>
          (Sept.
          <year>1975</year>
          ),
          <fpage>194</fpage>
          -
          <lpage>203</lpage>
          . https://doi.org/10.1109/TIT.1975.1055349
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Ferragina</surname>
          </string-name>
          , Francesco Piccinno, and
          <string-name>
            <given-names>Rossano</given-names>
            <surname>Venturini</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Compressed Indexes for String Searching in Labeled Graphs</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web (WWW '15)</source>
          .
          <source>International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland</source>
          ,
          <fpage>322</fpage>
          -
          <lpage>332</lpage>
          . https://doi.org/10.1145/2736277.2741140
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Anna</given-names>
            <surname>Gilbert</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kirill</given-names>
            <surname>Levchenko</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Compressing network graphs</article-title>
          . In LinkKDD.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Guy Joseph</given-names>
            <surname>Jacobson</surname>
          </string-name>
          .
          <year>1988</year>
          .
          <article-title>Succinct static data structures</article-title>
          . (
          <year>1988</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jon M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Authoritative Sources in a Hyperlinked Environment</article-title>
          .
          <source>J. ACM</source>
          <volume>46</volume>
          (
          <year>1998</year>
          ),
          <fpage>668</fpage>
          -
          <lpage>677</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Knoke</surname>
          </string-name>
          and
          <string-name>
            <given-names>Song</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Social Network Analysis</article-title>
          . Vol.
          <volume>154</volume>
          . https://doi.org/10.1007/978-1-4614-6170-8_362
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Stanley</given-names>
            <surname>Wasserman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Katherine</given-names>
            <surname>Faust</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Social Network Analysis: Methods and Applications</article-title>
          . Cambridge University Press. https://doi.org/10.1017/CBO9780511815478
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>