CEUR Workshop Proceedings, Vol-2369 (short03): https://ceur-ws.org/Vol-2369/short03.pdf
                Parallel Computation of
         Generalized Hypertree Decompositions⋆

            Georg Gottlob,1,2 Cem Okulmus,1 and Reinhard Pichler1
                   1 TU Wien, Austria, 2 University of Oxford, UK



1     Introduction
Answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Prob-
lems (CSPs) are arguably among the most important tasks in Computer Science.
They are classical NP-complete problems. However, they have tractable fragments for instances where the underlying hypergraphs are acyclic. For several decades, active research has sought to generalize acyclicity through various notions of decompositions and width [4]. Here we focus on generalized hypertree decompositions (GHDs) and generalized hypertree width (ghw).
    Deciding if a given CQ or CSP (strictly speaking, the underlying hypergraph
H) has ghw ≤ k is NP-complete even for k = 2 [3]. However, it was also shown
in [3] that the problem of deciding if a given hypergraph has ghw (H) ≤ k
becomes tractable for fixed k under realistic restrictions. One such restriction
is the Bounded Intersection Property (BIP): a hypergraph H has intersection
width ≤ i (denoted as iwidth(H) ≤ i), if the intersection of any two edges in H
has at most i vertices. A class C of hypergraphs satisfies the BIP if there exists a
constant i such that every hypergraph H ∈ C has iwidth(H) ≤ i.
    In [2], three different algorithms for checking ghw (H) ≤ k (and, if so, com-
puting a concrete GHD of width ≤ k) were implemented and tested on the Hy-
perBench benchmark [2] comprising over 3,000 hypergraphs derived from CQs
and CSPs from various sources. All the algorithms thus implemented rely on
the observation that hypergraphs of CQs or CSPs stemming from applications
tend to have low iwidth. These GHD-computations were purely sequential, even
though one of the algorithms seems to lend itself naturally to parallel processing:
more precisely, this GHD-algorithm is based on so-called “balanced separators”;
at each step of the top-down construction of a GHD, this algorithm searches
for a set of at most k edges {e1 , . . . , ek } (= a “balanced separator”), such that
the vertex set e1 ∪ · · · ∪ ek splits the hypergraph into components whose size
(measured in terms of the edges that intersect with each component) is at most
half the size of the component processed by the parent node in the GHD.
    Given that many of the experiments reported in [2] had high run times or
were even stopped due to a time out (with default value 3,600 seconds), we
have to look for a different computation strategy. The goal of this work is to
provide a parallel computation of GHDs and to move the GHD-computation to
a powerful cluster. To this end, we adopt the aforementioned GHD-algorithm
based on balanced separators (referred to as "b-seps", for short, in the sequel)
and implement it in the Go programming language [1]. Developed at Google
in 2009, Go has already seen widespread use by a number of companies such as
Dropbox, Cloudflare and Netflix, and by Google itself.

⋆ This work was supported by the Austrian Science Fund (FWF): P30930-N35.
    Below, we describe the challenges that we have faced in designing a parallel
implementation of the b-seps approach:

 – Finding a good design of the main targets for parallelization, namely the
   search for the next balanced separator and the recursive calls of the GHD-
   computation for the resulting components;
 – Finding a way to partition the work space equally among CPUs;
 – Designing parallel search to support efficient backtracking;
 – Splitting resources equally among the search space during recursive calls;
 – Introducing subedges of the edges in the given hypergraph (which is needed to
   exploit the low iwidth(H)) while avoiding unneeded combinatorial explosion.

   Below, we summarize how we have dealt with the above challenges and we
report on first, promising experimental results. We will show that the parallel ap-
proach is indeed able to significantly speed up the expensive GHD-computations
and that it allowed us to solve some cases which were out of reach with the
previous, purely sequential GHD-implementations in [2].


2   Preliminaries

We assume the reader to be familiar with basic notions such as conjunctive
queries (CQs) and their corresponding hypergraphs. A GHD of a hypergraph
H = (V(H), E(H)) is a tuple ⟨T, χ, λ⟩, where T = (N, E(T)) is a tree, and χ and
λ are labelling functions, which map each node n ∈ N to two sets, χ(n) ⊆ V(H)
and λ(n) ⊆ E(H). We denote by B(λ(n)) the set {v ∈ V(H) | v ∈ e, e ∈ λ(n)}.
The functions χ and λ have to satisfy the following conditions:

 1. For each edge e ∈ E(H) there exists a node n ∈ N such that e ⊆ χ(n).
 2. For each vertex v ∈ V (H), {n ∈ N | v ∈ χ(n)} is a connected subtree of T .
 3. For each node n ∈ N , we have that χ(n) ⊆ B(λ(n)).
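These conditions can be verified mechanically for a candidate decomposition. As an illustration, the following Go sketch (with hypothetical minimal types, not taken from the implementation discussed below) checks conditions 1 and 3; condition 2, connectedness of the χ-labels, would additionally require a traversal of T:

```go
package main

import "fmt"

// Hypothetical minimal types for illustration only.
type Edge []int // an edge is a set of vertices, encoded as ints

type Node struct {
	Chi      map[int]bool // χ(n): a vertex set
	Lambda   []Edge       // λ(n): an edge set
	Children []*Node
}

// bag computes B(λ(n)): the union of the vertices of all edges in λ(n).
func bag(lambda []Edge) map[int]bool {
	b := map[int]bool{}
	for _, e := range lambda {
		for _, v := range e {
			b[v] = true
		}
	}
	return b
}

// coversAllEdges checks condition 1: every edge of H is contained
// in the χ-label of some node of the tree.
func coversAllEdges(root *Node, edges []Edge) bool {
	var nodes []*Node
	stack := []*Node{root}
	for len(stack) > 0 { // collect all nodes of T
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		nodes = append(nodes, n)
		stack = append(stack, n.Children...)
	}
	for _, e := range edges {
		covered := false
		for _, n := range nodes {
			all := true
			for _, v := range e {
				if !n.Chi[v] {
					all = false
					break
				}
			}
			if all {
				covered = true
				break
			}
		}
		if !covered {
			return false
		}
	}
	return true
}

// chiCovered checks condition 3: χ(n) ⊆ B(λ(n)) for every node.
func chiCovered(root *Node) bool {
	b := bag(root.Lambda)
	for v := range root.Chi {
		if !b[v] {
			return false
		}
	}
	for _, c := range root.Children {
		if !chiCovered(c) {
			return false
		}
	}
	return true
}

func main() {
	e1, e2 := Edge{1, 2}, Edge{2, 3}
	// Single-node GHD covering both edges of a tiny hypergraph.
	root := &Node{Chi: map[int]bool{1: true, 2: true, 3: true}, Lambda: []Edge{e1, e2}}
	fmt.Println(coversAllEdges(root, []Edge{e1, e2}), chiCovered(root))
}
```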

    The width of a GHD is the maximum size of λ(n) over all nodes n ∈ N; the generalized hypertree width ghw(H) is the minimum width over all GHDs of H. It was shown in [2] that for a class of hypergraphs enjoying the BIP, one can add polynomially many subedges of edges in E(H) to ensure B(λ(n)) = χ(n) for every n ∈ N.
    For a set of edges S ⊆ E(H), we say two vertices v1, v2 ∈ V(H) are [S]-connected if there is a path of edges e1, . . . , en ∈ E(H) \ S such that v1 ∈ e1, v2 ∈ en, and each pair of consecutive edges ei, ei+1 (i ≤ n − 1) shares a common vertex. We define an [S]-component to be a maximal set of [S]-connected vertices. The size of an [S]-component C is defined as the number of edges e ∈ E(H) such that e ∩ C ≠ ∅. For a hypergraph H and a set of edges S ⊆ E(H), we say that S is a balanced separator (b-sep) if all [S]-components of H have a size of |E(H)|/2 or less.
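As an illustration of this definition, the following Go sketch (hypothetical code, not the authors' implementation) computes the [S]-components via union-find over the vertices of the edges outside S, and then checks the size bound:

```go
package main

import "fmt"

// A hypergraph edge is a set of vertices, encoded as ints.
type Edge []int

// isBalancedSep reports whether the separator sep (given as a set of
// indices into edges) is a balanced separator: every [S]-component
// intersects at most half of the edges of H.
func isBalancedSep(edges []Edge, sep map[int]bool) bool {
	// Union-find over vertices, linking vertices that share an edge
	// outside the separator.
	parent := map[int]int{}
	var find func(int) int
	find = func(v int) int {
		if parent[v] != v {
			parent[v] = find(parent[v]) // path compression
		}
		return parent[v]
	}
	union := func(a, b int) { parent[find(a)] = find(b) }
	for i, e := range edges {
		if sep[i] {
			continue
		}
		for _, v := range e {
			if _, ok := parent[v]; !ok {
				parent[v] = v
			}
		}
		for _, v := range e[1:] {
			union(e[0], v)
		}
	}
	// For each component, count the edges of H that intersect it.
	count := map[int]int{}
	for _, e := range edges {
		seen := map[int]bool{}
		for _, v := range e {
			if _, ok := parent[v]; !ok {
				continue // vertex occurs only inside separator edges
			}
			seen[find(v)] = true
		}
		for root := range seen {
			count[root]++
		}
	}
	for _, c := range count {
		if 2*c > len(edges) { // component size must be ≤ |E(H)|/2
			return false
		}
	}
	return true
}

func main() {
	// A path-like hypergraph with four edges.
	edges := []Edge{{1, 2}, {2, 3}, {3, 4}, {4, 5}}
	fmt.Println(isBalancedSep(edges, map[int]bool{1: true, 2: true}))
	fmt.Println(isBalancedSep(edges, map[int]bool{0: true}))
}
```

Removing the two middle edges leaves two components that each touch only two of the four edges, so the first call succeeds; removing only the first edge leaves one large component touching all four edges, so the second call fails.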
       graph                           sequential       parallel   speedup
       Dubois-016.xml.hg              668.097 ms       45.39 ms       14.71
       rand-25-10-25-87-24.xml.hg     5936.83 ms      879.99 ms        7.84
       Nonogram-012-table.xml.hg     73976.08 ms    11057.68 ms        6.69
       Pi-20-10-20-30-17.xml.hg       1439.71 ms      200.54 ms        7.17

             Table 1. Running times of b-seps algorithm for width 3.



    It was shown in [2] that, for every GHD ⟨T, χ, λ⟩ of a hypergraph H, there
exists a node n ∈ N such that λ(n) is a balanced separator of H. This property
can be exploited when searching for a GHD of width k of a hypergraph H: if
no such separator exists, then clearly there can be no GHD of H of width k.


3   Results
The Go programming language [1] has a model of concurrency that is based on
Hoare’s Communicating Sequential Processes [5]. The smallest concurrent unit
of computation is called a “goroutine”, essentially a light-weight thread.
    For the b-seps algorithm, we looked at two key areas of parallelization: the
search for b-seps and the recursive calls. For the search, after testing a number
of configurations, we ultimately settled on two types of goroutines: a central
master goroutine and a number of workers. The master spawns the workers and
waits for whichever of two conditions occurs first: either one of the workers finds
a b-sep, or none of them does and they all terminate on their own. In the first
case, the master makes sure all workers terminate and then continues the main
computation. For the recursive calls, we so far only spawn a single goroutine
for each of them and wait for each call to return its result (a GHD of the
corresponding subhypergraph).
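The master/worker interplay described above can be sketched in Go roughly as follows. This is a minimal illustration of the pattern, not the actual implementation: candidate separators are abstracted as integers, and `check` stands in for the balancedness test.

```go
package main

import (
	"fmt"
	"sync"
)

// searchWorker scans its assigned slice of candidates and reports the
// first hit on found; it stops early once quit is closed.
func searchWorker(candidates []int, check func(int) bool,
	found chan<- int, quit <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, c := range candidates {
		select {
		case <-quit:
			return // another worker already succeeded
		default:
		}
		if check(c) {
			select {
			case found <- c: // first successful worker wins
			case <-quit:
			}
			return
		}
	}
}

// master starts W workers and waits for whichever happens first:
// some worker finds a solution, or all workers exhaust their candidates.
func master(candidates []int, W int, check func(int) bool) (int, bool) {
	found := make(chan int, 1)
	quit := make(chan struct{})
	var wg sync.WaitGroup

	chunk := (len(candidates) + W - 1) / W // split the space beforehand
	for i := 0; i < len(candidates); i += chunk {
		end := i + chunk
		if end > len(candidates) {
			end = len(candidates)
		}
		wg.Add(1)
		go searchWorker(candidates[i:end], check, found, quit, &wg)
	}

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case c := <-found:
		close(quit) // make sure all workers terminate
		<-done
		return c, true
	case <-done:
		return 0, false // every worker terminated without success
	}
}

func main() {
	isSep := func(c int) bool { return c == 42 }
	c, ok := master([]int{7, 13, 42, 99}, 2, isSep)
	fmt.Println(c, ok)
}
```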
    To split the work equally among the workers during the search, a simple
work balancer (each worker receiving jobs from the central master goroutine)
would address this nicely. However, we found that this introduces a considerable
synchronization delay when compared with splitting up the work beforehand. We
assume here that the choice of edges has only a small influence on the time needed
to check whether they form a b-sep. Thus we split the search space of size
M = C(N, k) evenly among the W workers (and if there is some remainder, we
simply increase the workload of the first (M mod W) workers by one). Each
worker has its own iterator, thus minimizing the need for communication during
the search.
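A minimal sketch of this splitting scheme, assuming candidate separators are identified with lexicographic ranks of k-element subsets of the N edges (the helper names below are hypothetical, not the actual code): each worker is handed a contiguous block of ranks and unranks its starting combination, from which it can step forward independently.

```go
package main

import "fmt"

// binomial computes C(n, k) (small inputs only; no overflow guard).
func binomial(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	r := 1
	for i := 0; i < k; i++ {
		r = r * (n - i) / (i + 1) // division is exact at each step
	}
	return r
}

// unrank returns the k-element subset of {0, …, n−1} with the given
// lexicographic rank.
func unrank(rank, n, k int) []int {
	comb := make([]int, 0, k)
	x := 0
	for k > 0 {
		c := binomial(n-x-1, k-1) // # combinations whose next element is x
		if rank < c {
			comb = append(comb, x)
			k--
		} else {
			rank -= c
		}
		x++
	}
	return comb
}

// next advances comb to its lexicographic successor among the k-subsets
// of {0, …, n−1}; it returns false when comb was already the last one.
func next(comb []int, n int) bool {
	k := len(comb)
	i := k - 1
	for i >= 0 && comb[i] == n-k+i {
		i--
	}
	if i < 0 {
		return false
	}
	comb[i]++
	for j := i + 1; j < k; j++ {
		comb[j] = comb[j-1] + 1
	}
	return true
}

func main() {
	n, k, W := 6, 3, 4
	M := binomial(n, k) // 20 candidate separators
	per, rem := M/W, M%W
	start := 0
	for w := 0; w < W; w++ {
		size := per
		if w < rem { // first (M mod W) workers get one extra candidate
			size++
		}
		fmt.Printf("worker %d: ranks [%d,%d) starting at %v\n",
			w, start, start+size, unrank(start, n, k))
		start += size
	}
	// Each worker then steps its own iterator locally:
	c := unrank(0, n, k)
	next(c, n)
	fmt.Println("after one step:", c)
}
```

Saving each worker's current combination (its iterator state) in the main thread is then enough to resume the search later, which is exactly what the backtracking scheme below requires.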
    Our algorithm must support backtracking, i.e., restarting the search and
finding another solution (if one exists). If we did not carefully keep track of where
each worker left off when the parallel search halted earlier, we could be forced
to repeat work. We address this by saving the above-mentioned iterators, used
by each worker, in the main thread so as to reuse them during backtracking.
    Another challenge lies in parallelizing the recursive calls in such a way that
the resources (CPU cores) are split among them, to reduce unnecessary synchronization delays and to avoid focusing too much on one branch of the search tree. We
have not yet found a satisfactory solution for this. One idea would be to go back
to load balancing, and see if it helps with parallelizing the recursive calls, at the
cost of taking away resources from the parallel search of b-seps.

[Figure 1: bar chart of median speedups (y-axis ranging from about 3.4 to 4.6) for width 2, width 3 and width 4.]

          Figure 1. Median speedups for CSP Application instances from [3]
    The algorithm needs to compute subedges and consider them as possible
choices for b-seps, as it would otherwise not be complete. The first and obvious
choice is to add all possible subedges in the beginning (adding them globally).
This leads to a combinatorial explosion, and more crucially also leads to many
useless combinations, such as considering multiple subedges from the same orig-
inal edge at once. We address this by adopting the local subedge variant from [3]
and making it more restricted by first finding a balanced separator among the
“original” edges of E(H), and if this leads to a reject case, considering subedge
variants of its edges. We also make sure not to repeat the same subedges by
caching the vertex sets that generate them.
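This caching of generating vertex sets can be sketched as follows (a hypothetical `SubedgeCache`, not the authors' data structure): a vertex set is canonicalized (sorted and deduplicated) before lookup, so the same subedge is never produced twice regardless of the order in which its vertices were collected.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SubedgeCache remembers which vertex sets have already been generated
// as subedges, so that no subedge is produced twice.
type SubedgeCache struct {
	seen map[string]bool
}

func NewSubedgeCache() *SubedgeCache {
	return &SubedgeCache{seen: map[string]bool{}}
}

// key canonicalizes a vertex set: sorted, deduplicated, joined.
func key(vertices []int) string {
	vs := append([]int(nil), vertices...) // copy before sorting
	sort.Ints(vs)
	var b strings.Builder
	prev := -1
	for i, v := range vs {
		if i > 0 && v == prev {
			continue // drop duplicates
		}
		fmt.Fprintf(&b, "%d,", v)
		prev = v
	}
	return b.String()
}

// Add records the vertex set and returns true iff it was not seen before.
func (c *SubedgeCache) Add(vertices []int) bool {
	k := key(vertices)
	if c.seen[k] {
		return false
	}
	c.seen[k] = true
	return true
}

func main() {
	cache := NewSubedgeCache()
	fmt.Println(cache.Add([]int{3, 1, 2})) // new subedge
	fmt.Println(cache.Add([]int{1, 2, 3})) // same vertex set, rejected
	fmt.Println(cache.Add([]int{1, 2}))    // a different subedge
}
```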
    We used our parallel implementation to look at further instances that can
be decided negatively (proving that no GHD of a certain width can exist) or
positively (actually producing a GHD of that width). Our test setup was a cluster
of 11 machines, each with two 12-core Intel Xeon E5-2650 v4 CPUs clocked
at 2.20 GHz. For the tests, each job ran exclusively on one machine, spawning up
to 24 goroutines.
    Speedups for four sample hypergraphs from HyperBench, compared to a
sequential execution1, are showcased in Table 1, where speedup is simply the
sequential time divided by the parallel one. Furthermore, when comparing the
execution times of the purely sequential version against the version that parallelizes
both the search for b-seps and the recursive calls, we observe larger speedups
at higher widths, as seen in Figure 1. Additionally, we could produce new results
for some previously unsolved instances of the HyperBench benchmark, thus
determining their exact ghw. We are positive that with further improvements,
we will be able to "fill out the blank spots" for many hypergraphs of the
HyperBench benchmark from [2] and, in the process, produce a more robust
tool to compute GHDs.

1
    'Sequential execution' here refers to running the same general algorithm, but rewritten
    so that it does not use any goroutines. This version therefore only uses a single
    CPU core.


4   Outlook
This paper reports on work in progress. The next step will be to incorporate
further optimizations into our Go implementation such as the following: (1)
Applying various heuristics for ordering the hyperedges (to find b-seps faster),
(2) caching of previous computations, and (3) looking at hybrid solutions which
first apply the recursive splitting into subproblems via the b-seps approach and
then (for sufficiently small subproblems) switch to one of the other (sequential)
GHD-algorithms in [2]. The ultimate goal is, at least for all hypergraphs in the
HyperBench benchmark where an upper bound on ghw of at most 6 is known
(i.e., slightly more than 1,500 instances), to be able to compute the precise value
of ghw . Currently, for about half of these instances, the exact ghw is still open.


References
1. Golang.org (Feb 2019), https://golang.org/
2. Fischl, W., Gottlob, G., Longo, D.M., Pichler, R.: HyperBench: A benchmark and
   tool for hypergraphs and empirical findings. In: Proc. PODS 2019 (to appear) (2019)
3. Fischl, W., Gottlob, G., Pichler, R.: General and fractional hypertree decomposi-
   tions: Hard and easy cases. In: Proc. PODS 2018. pp. 17–32 (2018)
4. Gottlob, G., Greco, G., Leone, N., Scarcello, F.: Hypertree decompositions: Ques-
   tions and answers. In: Proc. PODS 2016. pp. 57–74 (2016)
5. Hoare, C.A.R.: Communicating sequential processes. Commun. ACM 21(8), 666–
   677 (1978)