<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">SWARMGUIDE: Towards Multiple-Query Optimization in Graph Databases</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zahid</forename><surname>Abul-Basher</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<region>ON</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nikolay</forename><surname>Yakovets</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">York University</orgName>
								<address>
									<settlement>Toronto</settlement>
									<region>ON</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Parke</forename><surname>Godfrey</surname></persName>
							<email>godfrey@cse.yorku.ca</email>
							<affiliation key="aff1">
								<orgName type="institution">York University</orgName>
								<address>
									<settlement>Toronto</settlement>
									<region>ON</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mark</forename><surname>Chignell</surname></persName>
							<email>chignell@mie.utoronto.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<region>ON</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">SWARMGUIDE: Towards Multiple-Query Optimization in Graph Databases</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">184CBE99F8317EDADAB37343FF6BF21E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Analytic tasks over graph databases often require sequences of related queries to be evaluated. While there are promising query optimization methods for graph queries <ref type="bibr" target="#b5">[6]</ref>, these do not optimize globally for sets of queries. For relational, multiple query optimization (MQO) has been studied <ref type="bibr" target="#b4">[5]</ref>. We propose Swarmguide, a framework for MQO for graph databases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Graph Databases &amp; Queries</head><p>Preliminaries. A graph database G is a finite, directed, edge-labeled, multigraph defined by G = N, Σ, E , where N is a finite set of nodes (vertices), Σ is a set of labels, E is a set of directed, labeled edges, and E ⊆ N × Σ × N . A path p in G is defined as a sequence of n 0 a 0 n 1 • • • n k−1 a k−1 n k where n i ∈ N , a i ∈ Σ, and n i , a i , n i+1 ∈ E for 0 ≤ i ≤ k. We call the sequence of edge labels Σ * of a particular path p the word, ω(p) that p induces.</p><p>A regular path rp is a path in the graph where ω(rp) is a word in a given regular language L(reg) (e.g., ω(rp) ∈ L(reg)). A regular path query (RPQ) <ref type="bibr" target="#b1">[2]</ref> is a triple x, reg, y in which x and y are free variables over the domain of nodes, and reg is a regular expression. An answer of an RPQ is a node-pair s, t (s, t ∈ N ) such that there is a path p in G between s and t and ω(p) ∈ L(reg). The answer set of an RPQ is the set of all its answers. The RDF data model and SPARQL query language instantiate these concepts. With the introduction of property paths in SPARQL 1.1, the query language encompasses RPQs. For our examples in this paper, we adopt SPARQL syntax.</p><p>Evaluation of RPQs. Evaluating an RPQ is a two-step procedure. Path recognition determines whether the word of path p, ω(p), is recognized by the regular expression reg (i.e., ω(p) ∈ L(reg)); path search finds pairs of nodes, s, t ∈ N such that there exists such a path p from node s to node t. There has been recent interest in how to evaluate RPQs efficiently in SPARQL over RDF stores. Yakovets et al. <ref type="bibr" target="#b5">[6]</ref> demonstrate in their Waveguide framework that there is a rich plan space for such queries, and that different plans for a given query can Consider Q 1 = x, director of /actor/born in, y over the movie graph in Figure <ref type="figure" target="#fig_0">1</ref>. This query can be evaluated in different ways, with different edgewalk costs. We can evaluate it in forward-direction by retrieving all node-pairs connected by edge label director of then appending with those linked by actor and then by born in. We can also evaluate it in the backward direction by retrieving all node-pairs connected by edge label born in then prepending with those connected by actor and then by director of . A third way is from the middle by retrieving all node-pairs connected by edge label actor then appending with those connected by born in and then prepending by director of . Note that, the edge-walk cost for the first is 10, in the second is 9, and in the third is 7.</p><p>In Waveguide, path search is performed efficiently while simultaneously recognizing the path expressions. Waveguide's input is a graph database G and a waveplan (WP) P Q which guides a number of search wavefronts that explore the given graph. The term wavefront is introduced to refer to a part of the plan that evaluates breadth-first during the evaluation. This graph exploration, driven by an iterative search procedure, is inspired by the semi-naïve bottom-up strategy used in the evaluation of linear recursive expressions based on a fixpoint.</p><p>The key idea is to expand repeatedly the search wavefronts until no new answers are produced; i.e., we reach a fixpoint. Each search wavefront is guided by an automaton in the plan, a finite state machine based on an NFA.</p><p>Consider query plans for Q 1 = x, director of /actor/born in, y as in Figure <ref type="figure">2</ref>. Plan A employs a WP corresponding to a single FA that directly encodes a recognizer for the query's regular expression. This plan captures the notion of "pipelining". Whereas Plan B employs a WP that consists of two subplan automata, the first subplan (SP) is used as a view for transition over states in the second plan. This captures the notion of "materialization". Note that in the second subplan automaton, we used a prepend transition (•director of ) over the previous state. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">SWARMGUIDE: The Framework</head><p>Queries running in a batch may have common subexpressions among them. Instead of running each query individually, one could benefit from sharing the results of common subexpressions <ref type="bibr" target="#b4">[5]</ref>. Consider two queries, Q 1 = x, director of / actor/born in, y and Q 2 = x, actor/born in/located in, y as in Figure <ref type="figure">2</ref>. Clearly, the optimal automaton plans (Plan A) do not share any common subplan for Q 1 and Q 2 . However, if we chose suboptimal plans (Plan B) then we could have a common sub-plan automaton (actor/born in). This common subplan could be shared among Q 1 and Q 2 without re-evaluating it.</p><p>Evaluation of MRPQs. The evaluation of multiple regular path queries (MR-PQs) is two-step: identifying common subexpressions among queries; and searching for a global optimal plan such that its cost is less than the summed cost of the local optimal plans in the batch. We present our framework Swarmguide, a generic framework for optimizing multiple regular path queries (MRPQs) in graph databases. Figure <ref type="figure">3</ref> shows Swarmguide architecture.</p><p>Clustering Queries. Clustering queries is a preprocessing step to finding the commonalities among the RPQs in the batch. These can be detected by finding the isomorphic subgraphs of their corresponding finite automata (FAs). This process is known to be NP-hard, in general <ref type="bibr" target="#b0">[1]</ref>. Therefore, we use heuristics to group only queries that can have common sub-automata, and then do the hard task of identifying the common sub-automata within each group. Finding Common Sub-automata. Finding common sub-automaton resembles finding the maximum common subgraph among graphs. The problem becomes more difficult here as we need to find the largest common subgraphs (sub-automata) for multiple graphs (FAs). Most existing solutions only consider non-labeled edges and nodes in undirected graphs. We adopt the solution from the maximal common-edge subgraph (MCES) problem <ref type="bibr" target="#b2">[3]</ref> to detect common subautomata. This has three steps: transforming labeled-graphs into the equivalent line-graphs; producing a product graph from the line-graphs; and detecting the maximal cliques in the product graph, which corresponds to MCESs (therefore, common sub-automata). Query Rewriting. A regular expression reg can be recognized equivalently by many FAs. For instance, the two FAs for two regular expressions reg 1 = (ab) * ac and reg 2 = a(ba) * c are equivalent; however, the plan space may differ, depending on the chosen FA. In this step, we rewrite FAs according to their shared common sub-automata. Global Optimization. A local optimal plan for a given query may not be the best plan when optimizing a batch of queries. The goal of the global optimization is then to search local plans of queries to find a global plan by choosing one local plan per query in a cluster of RP Qs. The cost of this global plan should be less than, or equal to, the total cost of local optimal plans of queries in the batch. In <ref type="bibr" target="#b3">[4]</ref>, the authors propose cost-based heuristic algorithms for this; we likewise adapt their algorithms for here.</p><p>Other Considerations. A single FA plan is often decomposed into several subplans as views. When planning globally, one has to check if views from other queries' plans can be shared. Intermediate node-pairs-along with their states in FA-are cached to avoid unbounded computation over cyclic graphs. When evaluating a global plan in MRPQs, these local caches need to be maintained globally so subplans shared among queries can be leveraged. The global cache also can be used to obtain useful statistics about the graph to obtain a more accurate global optimal plan and choosing the initial nodes for search exploration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>Graph database analytic applications are rapidly on the rise. These raise new and arduous challenges. We can apply many of the lessons learned from the relational domain-e.g., pipelining, materialized views, cost-based optimization, query answering by views, and multiple query optimization-to these challenges to great effect. However, these techniques have to be carefully re-imagined to be effective for graph databases, their queries, and applications.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Example movie graph</figDesc><graphic coords="2,177.99,116.83,259.37,115.91" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 :Fig. 3 :</head><label>23</label><figDesc>Fig. 2: A batch of two queries</figDesc><graphic coords="3,177.99,252.53,259.36,129.68" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Scalable multi-query optimization for SPARQL</title>
		<author>
			<persName><forename type="first">W</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kementsietsidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Engineering (ICDE), 2012 IEEE 28th International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="666" to="677" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Finding regular simple paths in graph databases</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Mendelzon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T</forename><surname>Wood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Computing</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1235" to="1258" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Maximum common subgraph isomorphism algorithms for the matching of chemical structures</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Raymond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Willett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of computer-aided molecular design</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="521" to="533" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Efficient and extensible algorithms for multi query optimization</title>
		<author>
			<persName><forename type="first">P</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Seshadri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sudarshan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhobe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="249" to="260" />
			<date type="published" when="2000">2000</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Multiple-query optimization</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Sellis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Database Systems (TODS)</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="23" to="52" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Query planning for evaluating SPARQL property paths</title>
		<author>
			<persName><forename type="first">N</forename><surname>Yakovets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Godfrey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gryz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the ACM SIGMOD International Conference on Management of Data<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-06">June 2016</date>
			<biblScope unit="page" from="1" to="15" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
