<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Triangle Finding: How Graph Theory can Help the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edward Jimenez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric L. Goodman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sandia National Laboratories</institution>
          ,
          <addr-line>Albuquerque, NM</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>45</fpage>
      <lpage>58</lpage>
      <abstract>
        <p>RDF data can be thought of as a graph where the subject and objects are vertices and the predicates joining them are edge attributes. Despite decades of research in graph theory, very little of this work has been applied to RDF data sets and it has been largely ignored by the Semantic Web research community. We present a case study of triangle finding, where existing algorithms from graph theory provide excellent complexity bounds, growing at a significantly slower rate than algorithms used within existing RDF triple stores. In order to scale to large volumes of data, the Semantic Web community should look to the many existing graph algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Formalisms</title>
      <p>RDF data can be modeled as a graph, G = (V, E) where V is a set of vertices
and E is a set of edges connecting the vertices v ∈ V . We use n to refer to
the number of vertices, |V |, and m as the number of edges, |E|. For RDF, the
subjects and objects are vertices in V . The predicates can be thought of as
attributes associated with each edge. We enumerate the vertices so that each
has a unique id in [1, n]. Also, we enumerate the edges of G to be in [1, m]. The
notation ei refers to the ith edge under the enumeration. Since RDF is described
directionally, i.e. there is a subject uni-directionally related to an object, edges
in the graph are also directed. We will use source(ei) to refer to the source
or subject of the edge. target(ei) refers to the target or object of the edge.
We call two vertices v1, v2 ∈ V adjacent if there is an edge e ∈ E where
(source(e) = v1 ∧ target(e) = v2) ∨ (source(e) = v2 ∧ target(e) = v1). We use
δ(vi) to denote the total degree, both incoming and outgoing, of a vertex. Below
we formally define the notion of a triangle.</p>
      <p>Definition 1. A triangle in a graph G is a set of three edges, ei, ej , ek ∈ E such
that the set of vertices vˆ = { |</p>
      <p>v v = source(ei) ∨ target(ei) ∨ v = source(ej ) ∨ v =
target(ej ) ∨ v = source(ek) ∨ v = target(ek)} have the following properties:
1. |vˆ| = 3
2. All v ∈ vˆ are adjacent to one another.</p>
      <p>We also refer to triads, which we define as pair of edges with a common vertex.</p>
      <p>It is also useful to define the notion of triangle equality.</p>
      <p>Definition 2. Two triangles are considered equal if their edge sets are the same.</p>
      <p>While triangle equality may seem obvious, it is important point to consider
when generating result sets via SPARQL. Without care, duplicate triangles can
be generated that are not prunable with the DISTINCT keyword. Duplicate
triangles can be produced that differ in the order in which the nodes and edge labels
are presented (i.e. to which variables they are bound), and thus DISTINCT will
have no effect on these duplicates. More of this will be discussed in Section 4.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Applicability to Current Benchmarks</title>
      <p>
        An obvious question when optimizing for triangles in SPARQL queries is how
often triangles occur in practice. We examine two popular benchmarks, LUBM
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and SP2Bench [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Two out of the 14 queries in LUBM have triangles in
them, namely queries 2 and 9. For brevity and to emphasize the triangle portion
of the queries, we omit the prefixes and the type constraints on each of the
variables.
      </p>
      <sec id="sec-3-1">
        <title>Query 2</title>
        <sec id="sec-3-1-1">
          <title>SELECT ?X, ?Y, ?Z WHERE {?X ub:memberOf ?Z . ?Z ub:subOrganizationOf ?Y . ?X ub:undergraduateDegreeFrom ?Y}</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Query 9</title>
        <sec id="sec-3-2-1">
          <title>SELECT ?X, ?Y, ?Z WHERE {?X ub:advisor ?Y . ?Y ub:teacherOf ?Z . ?X ub:takesCourse ?Z}</title>
          <p>These two queries are also some of the more complicated and time
intensive ones as reported by various vendors and researchers (e.g. AllegroGraph3
OWLIM4), pointing to the need for efficient processing.</p>
          <p>
            SP2Bench does not have any queries with triangles in them, but there are
several queries that with a natual extension of one more constaint form a triangle.
For example, Query 4 requests all distinct pairs of article author names for
authors that have published in the same journal. A natural extension is to define
a constraint between the two authors, either directly, forming a triangle, or
to another common vertex. The latter case forms what is called a quadrangle,
and the algorithmic approach is similar and has the same complexity as finding
triangles [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. In conclusion, there appears to be a small but significant portion
of queries that contain triangle structures within them.
4
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Finding Triangles with SPARQL</title>
      <p>Expressing a SPARQL query to find all triangles in a graph is surprisingly
convoluted. Figure 1 is a query that finds all unique triangles in an RDF graph, but
perhaps more importantly it finds the triangles with no duplication, meaning
that no two solution triangles in the result set are equal. We will refer to this
query as the Triangle-finding SPARQL query. Note that this query only works on
data involving only IRIs as the subjects and objects. According to the standard
the STR function is not defined for blank nodes and the function also removes
typing and language modifiers on literals which may cause the comparison filter
to be incorrect.</p>
      <p>Before we discuss how this query finds the triangles without duplication, we
must first discuss two concepts, that of graph isomorphism and graph
automorphism. Two graphs G and H are isomorphic if there is a bijection, f , mapping
the vertex sets of G and H such that two vertices vi and vj are adjacent in G
if and only if f (vi) and f (vj) are also adjacent in H. A graph automorphism is
3 http://www.franz.com/agraph/allegrograph/agraph_bench_lubm.lhtml
4 http://www.ontotext.com/owlim/benchmark-results/lubm</p>
      <p>SELECT ?X ?Y ?Z
WHERE {
{ ?X ?a ?Y .</p>
      <p>?Y ?b ?Z .
?Z ?c ?X
FILTER (STR(?X) &lt; STR(?Y))</p>
      <p>FILTER (STR(?Y) &lt; STR(?Z))
}
UNION
{
?X ?a ?Y .
?Y ?b ?Z .
?Z ?c ?X
FILTER (STR(?Y) &gt; STR(?Z))</p>
      <p>FILTER (STR(?Z) &gt; STR(?X))
}
UNION
{
?X ?a ?Y .
?Y ?b ?Z .</p>
      <p>?X ?c ?Z
}</p>
      <p>}
when there is a isomorphism of a graph G onto itself, and generally we will be
concerned with non-identity mappings.</p>
      <p>The reason this is important can be understood from the query below:</p>
      <sec id="sec-4-1">
        <title>SELECT ?X ?Y ?Z</title>
      </sec>
      <sec id="sec-4-2">
        <title>WHERE { { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X }</title>
        <p>For this query, every triangle found will appear three times. For instance, a
triangle with solution s1, s2, s3 will also appear as s2, s3, s1 and s3, s1, s2 .
The reason for this is that each of the vertices s1, s2, and s3 each bind to each
of the three variables ?X, ?Y, and ?Z under different circumstances. All three
solutions satisfy the query, but each solution is the same triangle. DISTINCT
does not help because the bindings are different for each duplicate solution. The
problem arises because the query graph represented above is automorphic. Each
non-trivial automorphism is a mapping to translate from one solution to another
solution using the same set of edges.</p>
        <p>Thus we arrive at the convoluted nature of the query in Figure 1. When
constructing the SPARQL query, we need to account for all possible triangles
but not generate duplicates. There are eight types of triangles, shown in Figure
2. However, all eight of these types can be collapsed down to the three unioned
clauses in Figure 1. We present this formally as a proof.</p>
        <p>Theorem 1. The Triangle-finding SPARQL query finds all triangles in the
queried graph and each solution is unique.</p>
        <p>Proof. First consider that triangle type iii is isomorphic to types iv through
viii. The below table outlines the functions needed to show the isomorphisms.
As they are isomorphic, type iii is sufficient to represent all the other types iv
viii. Also, type iii is not automorphic, and thus will not produce any duplicate
triangles. Finally, the solutions resulting from iii are disjoint from i and ii as
there is not mapping from iii to either i or ii. Thus we can deal with the solution
sets separately.</p>
        <p>Mapping ?X ?Y ?Z
iii to ...</p>
        <p>iv ?X ?Z ?Y
v ?Y ?X ?Z
vi ?Z ?Y ?X
vii ?Z ?X ?Y
vii ?Y ?Z ?X</p>
        <p>Concerning types i and ii, they are isomorphic under the mapping f (?X) =
?Z, f (?Y ) =?Y , and f (?Z) =?X. Thus, we need only include one of the patterns
in the query. We arbitrarily select type i. However, type i is automorphic (as is
ii ), and we must concoct a way of avoiding duplicate triples with the available
SPARQL language features. We solve the issue by enforcing an ordering on the
bindings. The first clause of the Triangle-finding SPARQL query enforces that
the string representation of the bindings must obey ?X &lt;?Y &lt;?Z under an
alphanumeric ordering. The second clause enforces ?Y &gt;?Z &gt;?X. It remains to
show that these two orderings will find all triangles of type i without duplication.
The table below outlines all possible relations between the variables assuming
no self loops.
?X ? ?Y ?Y ? ?Z ?Z ? ?X
&lt; &lt; &lt;
&lt;
&lt;
&lt;
&gt;
&gt;
&gt;
&gt;
&lt;
&gt;
&gt;
&lt;
&lt;
&gt;
&gt;
&gt;
&lt;
&gt;
&lt;
&gt;
&lt;
&gt;</p>
        <p>Empty since ?X &lt;?Y ∧?Y &lt;?Z =⇒ ?X &lt;?Z,
contradicting the third constraint.</p>
        <p>Fulfilled directly by first clause
Use automorphism f (?X) =?Y, f (?Y ) =?Z, f (?Z) =?X.</p>
        <p>Fulfilled by first clause.</p>
        <p>Fulfilled directly by second clause.</p>
        <p>Use automorphism f (?X) =?Z, f (?Y ) =?X, f (?Z) =?Y .</p>
        <p>Fulfilled by first clause.</p>
        <p>Use automorphism f (?X?) =?Z, f (?Y ) =?X, f (?Z) =?Y ).</p>
        <p>Fulfilled by second clause.</p>
        <p>Use automorphism f (?X) =?Y, f (?Y ) =?Z, f (?Z) =?X.</p>
        <p>Fulfilled by second clause.</p>
        <p>Empty since ?X &gt;?Y ∧?Y &gt;?Z =⇒ ?X &gt;?Z,
contradicting the third constraint.</p>
        <p>Two of the possibilities are invalid because the constraints are contradictory.
Another two possibilities directly match the first two clauses of the
trianglefinding query. The remaining four cases match the two clauses through an
automorphism. Thus, we may conclude that the Triangle-finding SPARQL query
does in fact find all triangles in the graph with no duplicates.
5</p>
        <p>
          An O(m3/2) Triangle Finding Algorithm
There are many triangle finding algorithms that are O(m3/2). For the
experiments we employ an algorithm presented by Cohen [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This algorithm has the
benefit of already being described in a parallel fashion in terms of mappers and
reduces of the MapReduce paradigm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Also, there is a version implemented in
the MultiThreaded Graph Library5 (MTGL) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. However, a formal complexity
analysis was not outlined in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], so we perform that here.
5 https://software.sandia.gov/trac/mtgl
        </p>
        <p>Cohen’s algorithm operates on what he calls a simplified graph. Namely, a
graph in which self-loops are eliminated, directionality is ignored, and there are
no duplicate edges. In our later experiments we do not allow self-loops, but we
account for directionality and do allow duplicate edges. This is due to the fact
that RDF is directional and multiple edges can be defined between vertices with
different edge types (predicate types). We do not want to collapse all of these
edge types down into one edge. We do not formally account for these different
assumptions in our analysis.</p>
        <p>Theorem 2. Given a simplified graph G, Cohen’s triangle finding algorithm is
O(m3/2).</p>
        <p>Proof. We assume the worst case, that G is completely connected. Relating m
to n, we have
m =</p>
        <p>i =
n−1
i=1
(n − 1)(n)
2
(1)</p>
        <p>Cohen’s algorithm is composed of two MapReduce phases. The input to the
first map is a list of edges. Each edge has been previously been augmented with
the degree of each vertex. If one were to include this preprocessing step in the
overall complexity, it is O(m) which is also in O(m3/2).</p>
        <p>Map 1 Map each edge to its low-degree vertex. According to our assumptions,
δ(vx) = δ(vy) ∀x, y ≤ n. Cohen suggests a tie-breaker based on vertex ordering;
we’ll use v1 &lt; v2 &lt; · · · &lt; vn. Below is the composition of the bins. Since
directionality is ignored, we’ll use a canonical representation of each edge, vi, vj ,
such that i &lt; j.</p>
        <p>Bin 1</p>
        <p>Reduce 1 Emit a record for each pair of edges in a bin (one for every open
triad). For the graph G, the first Map phase created n−1 bins, and bin i contains
n − i edges. Therefore the number of triads created is
n−2
i=1
n − i =
2
n−2</p>
        <p>(n − i)!
i=1 2(n − (i + 2))!
1 n−2
=
&lt;
=
=
=
&lt;
2
Hence, the first MapReduce task has complexity O(m3/2).</p>
        <p>For the second MapReduce phase the input is the emitted records of the first
MapReduce phase (O(m3/2)) as well as the augmented edge list that was used
as the input for the first MapReduce phase (O(m)). For the second MapReduce
phase, let p be the size of the combined input, note:</p>
        <p>O(p) = O(m) + O(m3/2) = O(m3/2).</p>
        <p>Map 2 Combine degree-augmented file and output from Reduce 1. For the
augmented edge list we have the following mapping to remove the vertex valences:
key1 = [e1, δ(vx), δ(vy)]
key2 = [e2, δ(vw), δ(vz)]
.
.</p>
        <p>.
keyn = [en, δ(vu), δ(vt)]
⇒
keye1 = [e1]
keye2 = [e2]
.
.</p>
        <p>.
keyen = [en]
and for the records emitted by Reduce 2, we have the identity operation.
Therefore, this task is O(p).
Reduce 2 Each bin corresponds with a vertex pair. A bin will contain at most
one edge record and any number of triad records. With our assumptions of a
completely connected graph, bin i contains n−2i triad records and one edge
record, and the reducer will emit n−i triangles. From our previous analysis,
2
this is O(m3/2) and therefore the overall complexity is O(m3/2).
6</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>We experimentally compare open source RDF/SPARQL engines, Jena, version
2.7.2, and Sesame, version 2.6.8, with MTGL’s implementation of Cohen’s
algorithm. For both Jena and Sesame we use the in-memory backend version of
each. We did need to make some modifications to the MTGL version in order to
allow for directionality and duplicate edges. Namely, we created a multimap that
gives a the list of edges connecting any two vertices. This allowed us to create
multiple triangles from single instances of a triad in the last reduce phase. We
used a workstation with 8 GB of memory and a 2.2 GHz Intel Core i7 processor.
Detailed times for our experiments can be found in the Appendix.</p>
      <p>
        For our experiments we create R-MAT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] graphs to simulate real-world graph
properties such as power-law distributions on degree, small-world graphs, and
small diameter. We varied the size of the graph from between n = 25 to n = 219.
Also, we tried three different edge factors (average degree per vertex), namely
16, 32, and 64. R-MAT has four other parameters, a, b, c, and d. These four
parameters are probabilities used recursively to determine where edges exist
within the adjacency matrix. We set these to the values of the Graph5006 search
benchmark: a = 0.57, b = 0.19, c = 0.19, and d = 0.05. To enable the SPARQL
engines the ability to process the data, we created IRI’s of the form &lt;http://i &gt;
where i is the vertex id given by the R-MAT generator. Also we made all edges
of the same type.
      </p>
      <p>Figures 3(a), 3(b), and 3(c) show the individual performance of each of the
three platforms as a function of the number of edges. All of the plots have a
loglog scale. Figure 3(d) shows the three platforms side by side. For this Figure, we
exclude graphs below 8192 edges to give a better idea of the scaling behavior for
larger graphs. Both Jena and Sesame exhibit a fair amount of constant overhead
per query that dominates the times in the smaller graphs. When excluding this
data, the fit of trendlines using power regression is quite good, with R2 all
exceeding 0.98. As can be seen from Figure 3(d), Jena has a complexity of around
O(m1.83), Sesame has O(m1.58), and MTGL’s version of Cohen’s algorithm is
around O(m1.39). The best possible for this data would be around O(m1.12),
which is rate of growth of triangles for the data as determined experimentally
and shown in Figure 3(e).</p>
      <p>It is clear that both Jena and Sesame are not employing a triangle finding
algorithm as their rate of growth is significantly larger than O(m1.5). While
the differences in powers may seem small between the three, consider that
extrapolating out to a billion edges, the difference in times between an O(m1.39)
6 www.graph500.org
(a) Jena</p>
      <p>(b) Sesame
(c) MTGL
(d) All
(e) Triangle Growth
algorithm and one with O(m1.58) with the same constant is about a 50x
difference. And the difference between O(m1.39) and O(m1.83) is around 9000x. While
not the focus of this paper, it is interesting to note that the MTGL version
computed the triangles in about two-orders of magnitude less time than either Jena
or Sesame.</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        There is much work within the graph theory community where there is a
straightforward application to the Semantic Web. For the most part, interfacing with
the Semantic Web takes place through SPARQL. This largely confines
interactions to subgraph matching tasks. Under this limitation, we can find work on
clique-finding [
        <xref ref-type="bibr" rid="ref10 ref5">5,10</xref>
        ], or generalizations of cliques such as trusses [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. An excellent
overview of research in subgraph pattern matching over a thirty year timespan
can be found in Conte et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        There are also algorithms outside of subgraph matching that can be of help
in analyzing the Semantic Web. Probably one of the most fundamental
algorithms is breadth-first search. In recent years, the Graph500 list has
significantly increased the competitiveness and visibility of the task, and scalability
and performance have increased concomitantly with the greater attention. Other
algorithms include single source shortest path [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], betweeness centrality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
connected components [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and many others.
      </p>
      <p>
        Related to all of these efforts is the task of graph partitioning. Graphs have
historically been difficult to partition in a distributed memory setting, where
the interconnectedness of the graph make it difficult to divide the data in such
a way to minimize communication overhead. Notable efforts include Buluc¸ and
Madduri [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Devine et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Programming models have also been created to aid development of
graphcentric algorithms. Notable among them are Google’s Pregel [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Signal/Collect
by Stutz et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. These may prove to be valuable paradigms for implementing
efficient graph-oriented code that can scale on large distributed systems.
8
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>In order for Semantic Web applications to scale, the community needs to adopt
efficient algorithms to use as the computational kernels underlying analytics. In
this paper we’ve demonstrated how long existing triangle finding algorithms can
be employed to speed up SPARQL queries. We believe there are other algorithms
and lessons from graph theory that can utilized to speed up Semantic Web
applications and also open up other avenues for analysis.</p>
      <p>A</p>
    </sec>
    <sec id="sec-8">
      <title>Experimental Data</title>
      <p>Below is the data we collected running the triangle query on Jena, Sesame, and
MTGL. We divide the data into three tables, one for each edge factor. Times
are in seconds.
A.1</p>
      <p>Edge Factor = 16
|V | |E| Num Triangles Jena Sesame MTGL
25 512 7,645 0.66 0.39 0.0011
26 1,024 16,159 0.96 0.44 0.0024
27 2,048 33,263 1.23 0.57 0.0081
28 4,096 72,357 1.96 0.88 0.0132
29 8,192 156,716 2.96 1.78 0.0365
210 16,384 333,174 7.79 3.86 0.0721
211 32,768 739,951 26.38 10.67 0.1798
212 65,536 1,648,301 89.72 31.12 0.4696
213 131,072 3,450,520 291.65 92.78 1.1767
214 262,144 7,573,624 1263.91 317.21 3.3691
215 524,288 16,864,063 5550.93 1022.38 8.7734
216 1,048,576 35,286,039 3062.77 21.6454
217 2,097,152 74,837,468 57.1703
218 4,194,304 168,767,188 153.9390
219 8,388,608 357,383,850 409.6070
A.2</p>
      <p>Edge Factor = 32</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kintali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Madduri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mihail</surname>
          </string-name>
          .
          <article-title>Approximating betweenness centrality</article-title>
          .
          <source>In Proceedings of the 5th international conference on Algorithms and models for the web-graph, WAW'07</source>
          , pages
          <fpage>124</fpage>
          -
          <lpage>137</lpage>
          , Berlin, Heidelberg,
          <year>2007</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Berry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hendrickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Konecny</surname>
          </string-name>
          .
          <article-title>Software and algorithms for graph queries on multithreaded architectures</article-title>
          .
          <source>Parallel and Distributed Processing Symposium</source>
          , International,
          <volume>0</volume>
          :
          <fpage>495</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulu</surname>
          </string-name>
          <article-title>¸c and K. Madduri</article-title>
          .
          <article-title>Graph partitioning for scalable distributed graph computations</article-title>
          .
          <source>In Proc. 10th DIMACS Implementation Challenge Workshop - Graph Partitioning and Graph Clustering</source>
          , Feb.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos.</surname>
          </string-name>
          R-mat:
          <article-title>A recursive model for graph mining</article-title>
          . In M. W. Berry,
          <string-name>
            <given-names>U.</given-names>
            <surname>Dayal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kamath</surname>
          </string-name>
          , and D. B. Skillicorn, editors,
          <source>SDM. SIAM</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>N.</given-names>
            <surname>Chiba</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Nishizeki</surname>
          </string-name>
          .
          <article-title>Arboricity and subgraph listing algorithms</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>210</fpage>
          -
          <lpage>223</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Graph twiddling in a mapreduce world</article-title>
          .
          <source>Computing in Science Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <fpage>29</fpage>
          -
          <lpage>41</lpage>
          ,
          <fpage>july</fpage>
          -aug.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.</given-names>
            <surname>Conte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Foggia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sansone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vento</surname>
          </string-name>
          .
          <article-title>Thirty years of graph matching in pattern recognition</article-title>
          .
          <source>IJPRAI</source>
          , pages
          <fpage>265</fpage>
          -
          <lpage>298</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          .
          <article-title>Mapreduce: simplified data processing on large clusters</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>51</volume>
          :
          <fpage>107</fpage>
          -
          <lpage>113</lpage>
          ,
          <year>January 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>K. D. Devine</surname>
            ,
            <given-names>E. G.</given-names>
          </string-name>
          <string-name>
            <surname>Boman</surname>
            , R. T. Heaphy,
            <given-names>R. H.</given-names>
          </string-name>
          <string-name>
            <surname>Bisseling</surname>
            , and
            <given-names>U. V.</given-names>
          </string-name>
          <string-name>
            <surname>Catalyurek</surname>
          </string-name>
          .
          <article-title>Parallel hypergraph partitioning for scientific computing</article-title>
          .
          <source>In Proceedings of the 20th international conference on Parallel and distributed processing</source>
          ,
          <source>IPDPS'06</source>
          , pages
          <fpage>124</fpage>
          -
          <lpage>124</lpage>
          , Washington, DC, USA,
          <year>2006</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>F.</given-names>
            <surname>Eisenbrand</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Grandoni</surname>
          </string-name>
          .
          <article-title>On the complexity of fixed parameter clique and dominating set</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>326</volume>
          (
          <issue>1-3</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>67</lpage>
          , Oct.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heflin</surname>
          </string-name>
          .
          <article-title>Lubm: A benchmark for owl knowledge base systems</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>3</volume>
          (
          <issue>2</issue>
          -3):
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Lumetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Culler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Yelick</surname>
          </string-name>
          .
          <article-title>Connected components on distributed memory machines</article-title>
          .
          <source>In Parallel Algorithms: 3rd DIMACS Implementation Challenge October 17-19</source>
          ,
          <year>1994</year>
          , volume
          <volume>30</volume>
          <source>of DIMACS Series in Discrete Mathematics and Theoretical Computer Science</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . American Mathematical Society,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>K.</given-names>
            <surname>Madduri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Crobak</surname>
          </string-name>
          .
          <article-title>An experimental study of a parallel shortest path algorithm for solving large-scale graph instances</article-title>
          .
          <source>In ALENEX</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. G. Malewicz,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Austern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Bik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Dehnert</surname>
          </string-name>
          , I. Horn,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Czajkowski.</surname>
          </string-name>
          <article-title>Pregel: a system for large-scale graph processing</article-title>
          .
          <source>In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD '10</source>
          , pages
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>M. Schmidt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Hornung</surname>
            , G. Lausen, and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Pinkel</surname>
          </string-name>
          .
          <article-title>Sp2bench: A sparql performance benchmark</article-title>
          .
          <source>CoRR, abs/0806.4627</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>P.</given-names>
            <surname>Stutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Signal/collect: graph algorithms for the (semantic) web</article-title>
          .
          <source>In Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I, ISWC'10</source>
          , pages
          <fpage>764</fpage>
          -
          <lpage>780</lpage>
          , Berlin, Heidelberg,
          <year>2010</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Edge</given-names>
            <surname>Factor</surname>
          </string-name>
          =
          <volume>64</volume>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>