<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Advanced Heuristics for Parallel ASP Instantiation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simona Perri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Ricca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Sirianni</string-name>
          <email>sirianni@mat.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Matematica Università della Calabria 87030</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The evaluation of ASP programs is traditionally carried out in two steps. The first, called instantiation or grounding, consists in the computation of a ground program equivalent to the input one; the ground program is then evaluated by a backtracking search algorithm in the second phase. Instantiation is important for the efficiency of the whole evaluation, may become a bottleneck in common situations, and is particularly crucial when huge input data has to be dealt with. Notably, performance improvements can be obtained by developing parallel systems that exploit modern multi-core/multi-processor machines. In this paper, we describe a dynamic heuristic for load balancing and granularity control devised for improving parallel instantiation systems. We implemented the new technique in the parallel instantiator based on the DLV system, and conducted an experimental analysis that confirms its efficacy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the last few years, entry-level computer systems have started to implement
multicore/multi-processor SMP (Symmetric MultiProcessing) architectures. In a
modern SMP computer two or more identical processors can connect to a single shared
main memory, and the operating system supports multithreaded programs for
exploiting the available CPUs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, most available software, devised for single-processor
machines, is unable to exploit the power of SMP
architectures. Recently [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref7 ref8">2, 3, 4, 5, 6, 7, 8</xref>
        ], parallel evaluation technology has
been exploited to implement faster evaluation systems in the field of Answer
Set Programming (ASP). ASP [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref9">9, 10, 11, 12, 13, 14</xref>
        ] is a declarative programming paradigm proposed in the area of
nonmonotonic reasoning and logic programming,
which combines a fully declarative nature with relatively high
expressive power [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <p>
        Traditionally, the kernel modules of ASP systems work on a ground
instantiation of the input program. Therefore, an input program P first undergoes the
so-called instantiation process, which produces a program P′ semantically
equivalent to P, but not containing any variable. This phase is computationally
expensive (see [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ]); thus, having an efficient instantiation procedure is, in general,
crucial for the performance of the entire ASP system. Moreover, some recent
applications of ASP (see, e.g., [
        <xref ref-type="bibr" rid="ref17 ref18 ref19 ref20">17, 18, 19, 20</xref>
        ]) have highlighted the practical need for
faster instantiators. It is easy to see that the exploitation of SMP systems in the
grounding process can bring significant performance improvements. Indeed, an
effective technique for the parallel instantiation of ASP programs exploiting SMP
systems was proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
However, the efficacy of this method is limited to programs with many rules,
since, roughly, it allows independent rules to be instantiated in parallel. To
overcome this limitation, a rewriting technique was proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] that modifies the input program in such a
way that the technique of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] becomes applicable also in the case of programs with few
rules. The basic idea of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is to rewrite input rules at execution time in order to
induce a form of or-parallelism. This can be obtained, given a rule r, by “splitting”
the extension of a single body predicate p of r into several parts. Each part is
associated with a different temporary predicate; and, for each of those predicates,
say pi, a new rule, obtained by replacing p with pi, is produced. The so-created
rules are instantiated in parallel in place of r by exploiting the parallel algorithm
of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (a trivial realign step gets rid of the new names to obtain the intended output).
This rewriting technique can be exploited by any available parallel ASP
instantiator [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ], and it was successfully implemented [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in a parallel ASP instantiator
based on DLV [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Here the number of splits per rule was set to a global
user-defined value, and the same value was used for each rule in input. Clearly, this simple
strategy might not work well in several cases. Indeed, if each process receives
a “too small” amount of work, then the costs added by parallel execution may
become larger than the benefits (because of thread creation and scheduling overhead);
on the other hand, if the amount of work assigned to threads is “too large”, then
the resulting bad workload distribution will reduce the advantages of parallel
evaluation. In this paper, we propose an advanced heuristic that is able to improve
the efficiency of the parallel evaluation by automatically determining, rule by rule,
the amount of work that has to be assigned to each parallel processing unit. Moreover,
we implemented our heuristic in the parallel ASP instantiator of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and we
report here the results of an experimental analysis that confirms the efficacy of the
proposed method.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Answer Set Programming</title>
      <p>In this section, we briefly recall syntax and semantics of Answer Set Programming.
Syntax. A variable or a constant is a term. An atom is a(t1, ..., tn), where a is
a predicate of arity n and t1, ..., tn are terms. A literal is either a positive literal p
or a negative literal not p, where p is an atom. A disjunctive rule (rule, for short)
r is a formula a1 ∨ · · · ∨ an :– b1, · · · , bk, not bk+1, · · · , not bm, where
a1, · · · , an, b1, · · · , bm are atoms and n ≥ 0, m ≥ k ≥ 0. The disjunction a1 ∨
· · · ∨ an is the head of r, while the conjunction b1, ..., bk, not bk+1, ..., not bm
is the body of r. A rule without head atoms (i.e. n = 0) is usually referred to as
an integrity constraint. If the body is empty (i.e. k = m = 0), the rule is called a fact.
H(r) denotes the set {a1, ..., an} of head atoms, and B(r) denotes the set {b1, ..., bk,
not bk+1, . . . , not bm} of body literals. B+(r) (resp., B−(r)) denotes the set
of atoms occurring positively (resp., negatively) in B(r). A rule r is safe if each
variable appearing in r appears also in some positive body literal of r.</p>
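      <p>The safety condition is purely syntactic and can be checked mechanically. The following sketch illustrates it (the rule representation, as triples of atom lists, is our own assumption, not that of any ASP system):</p>

```python
# Sketch of the syntactic safety check. A rule is given as three lists of
# atoms: head, positive body, negative body. An atom is a (predicate, terms)
# pair; terms starting with an upper-case letter are variables.

def variables(atoms):
    """Collect every variable occurring in the given atoms."""
    return {t for _, terms in atoms for t in terms if t[:1].isupper()}

def is_safe(head, pos_body, neg_body):
    """r is safe if each variable of r occurs in some positive body literal."""
    return variables(head + pos_body + neg_body).issubset(variables(pos_body))

# col(X, red) v col(X, yellow) v col(X, green) :- node(X).  -- safe
head = [("col", ["X", "red"]), ("col", ["X", "yellow"]), ("col", ["X", "green"])]
print(is_safe(head, [("node", ["X"])], []))         # True
# p(X) :- not q(X).  -- unsafe: X never occurs positively
print(is_safe([("p", ["X"])], [], [("q", ["X"])]))  # False
```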
      <p>An ASP program P is a finite set of safe rules. An atom, a literal, a rule, or
a program is ground if no variables appear in it. In accordance with database
terminology, a predicate occurring only in facts is referred to as an EDB predicate,
all others as IDB predicates; the set of facts of P is denoted by EDB(P).
Semantics. Let P be a program. The Herbrand Universe and the Herbrand Base
of P are defined in the standard way and denoted by UP and BP , respectively.</p>
      <p>Given a rule r occurring in P, a ground instance of r is a rule obtained from r
by replacing every variable X in r by σ(X), where σ is a substitution mapping the
variables occurring in r to constants in UP ; ground(P) denotes the set of all the
ground instances of the rules occurring in P.</p>
      <p>An interpretation for P is a set of ground atoms, that is, an interpretation is a
subset I of BP . A ground positive literal A is true (resp., false) w.r.t. I if A ∈ I
(resp., A ∉ I). A ground negative literal not A is true w.r.t. I if A is false w.r.t.
I; otherwise not A is false w.r.t. I. Let r be a ground rule in ground(P). The
head of r is true w.r.t. I if H(r) ∩ I ≠ ∅. The body of r is true w.r.t. I if all body
literals of r are true w.r.t. I (i.e., B+(r) ⊆ I and B−(r) ∩ I = ∅) and is false w.r.t.
I otherwise. The rule r is satisfied (or true) w.r.t. I if its head is true w.r.t. I or its
body is false w.r.t. I. A model for P is an interpretation M for P such that every
rule r ∈ ground(P) is true w.r.t. M . A model M for P is minimal if no model N
for P exists such that N is a proper subset of M . The set of all minimal models
for P is denoted by MM(P).</p>
      <p>
        Given a ground program P and an interpretation I, the reduct of P w.r.t. I is
the subset PI of P, which is obtained from P by deleting rules in which a body
literal is false w.r.t. I. Note that the above definition of reduct, proposed in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
simplifies the original definition of Gelfond-Lifschitz (GL) transform [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but is
fully equivalent to the GL transform for the definition of answer sets [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        Let I be an interpretation for a program P. I is an answer set (or stable model)
for P if I ∈ MM(PI ) (i.e., I is a minimal model for the program PI ) [
        <xref ref-type="bibr" rid="ref22 ref9">22, 9</xref>
        ]. The
set of all answer sets for P is denoted by ANS(P).
      </p>
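      <p>The definitions above can be turned into a small executable check (a brute-force sketch suitable only for tiny ground programs, using the simplified reduct defined above; the rule representation as triples of atom sets is our own):</p>

```python
from itertools import combinations

# A ground rule is (head, pos, neg): three frozensets of ground atom names.

def body_true(pos, neg, i):
    return pos.issubset(i) and neg.isdisjoint(i)

def is_model(program, i):
    # r is satisfied if its head is true w.r.t. I or its body is false w.r.t. I.
    return all(not head.isdisjoint(i) or not body_true(pos, neg, i)
               for head, pos, neg in program)

def reduct(program, i):
    # Delete rules in which some body literal is false w.r.t. I (as in [21]).
    return [r for r in program if body_true(r[1], r[2], i)]

def is_answer_set(program, i):
    red = reduct(program, i)
    if not is_model(red, i):
        return False
    # I must be a minimal model of the reduct: no proper subset is a model.
    atoms = sorted(i)
    return not any(is_model(red, frozenset(sub))
                   for k in range(len(atoms))
                   for sub in combinations(atoms, k))

# P = { a v b :- . } has answer sets {a} and {b}; {a, b} is a model but not minimal.
p = [(frozenset({"a", "b"}), frozenset(), frozenset())]
print(is_answer_set(p, frozenset({"a"})))       # True
print(is_answer_set(p, frozenset({"a", "b"})))  # False
```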
    </sec>
    <sec id="sec-3">
      <title>Parallel Instantiation of ASP Programs</title>
      <p>
        In this Section we briefly recall some recently proposed techniques ([
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]) for the
parallel instantiation of ASP programs. In particular, we show that, according to
such techniques, three levels of parallelism can be exploited during the instantiation
process, namely, the components, rules, and single rule levels. The first level allows
subprograms of the input program to be instantiated in parallel and is especially
useful when handling programs containing parts that are largely independent.
The second one, the rules level, allows for the parallel evaluation of rules within a
given subprogram and is thus useful when the number of rules in the subprograms
is high. The third one, the single rule level, allows for the parallel evaluation of a
single rule and is thus crucial for the parallelization of programs with few rules,
where the first two levels are hardly applicable. A detailed description of these
techniques is beyond the scope of this paper. For further details, we refer the reader
to [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Components Level</title>
        <p>
          The first level of parallelism, called Components Level, has been described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
and, essentially, it consists in dividing the input program P into subprograms,
according to the dependencies among the IDB predicates of P, and by identifying
which of them can be evaluated in parallel. More in detail, each program P is
associated with a graph, called the Dependency Graph of P, which, intuitively,
describes how IDB predicates of P depend on each other. For a program P, the
Dependency Graph of P is a directed graph GP = ⟨N, E⟩, where N is a set of
nodes and E is a set of arcs. N contains a node for each IDB predicate of P, and
E contains an arc e = (p, q) if there is a rule r in P such that q occurs in the head
of r and p occurs in a positive literal of the body of r.
        </p>
        <p>The graph GP induces a subdivision of P into subprograms (also called
modules) allowing for a modular evaluation. We say that a rule r ∈ P defines a
predicate p if p appears in the head of r. For each strongly connected component (SCC)
C of GP , the set of rules defining all the predicates in C is called the module of C. A
rule r occurring in a module of a component C (i.e., defining some predicate in C)
is said to be recursive if there is a predicate p ∈ C occurring in the positive body
of r; otherwise, r is said to be an exit rule. Moreover, a partial ordering among
the SCCs is induced by GP , defined as follows: for any pair of SCCs A, B of GP ,
we say that B directly depends on A if there is an arc from a predicate of A to a
predicate of B; and, B depends on A if there is a path in GP from A to B.</p>
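        <p>The construction above can be sketched compactly as follows (our own illustration, not DLV's implementation; for brevity the graph is built over all predicates, without separating EDB from IDB ones, and negative dependencies are omitted):</p>

```python
from collections import defaultdict

# A rule is (head_predicates, positive_body_predicates).

def dependency_graph(rules):
    # Arc (p, q) whenever some rule has q in the head and p in the positive body.
    arcs = defaultdict(set)
    for heads, pos_body in rules:
        for q in heads:
            arcs.setdefault(q, set())
            for p in pos_body:
                arcs[p].add(q)
    return arcs

def sccs(graph):
    """Tarjan's algorithm; emits the SCCs in reverse topological order."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []
    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(frozenset(comp))
    for v in list(graph):
        if v not in index:
            visit(v)
    return out

def evaluation_order(rules):
    # Components later in the order depend (directly or transitively) on
    # earlier ones; incomparable components may be instantiated in parallel.
    return list(reversed(sccs(dependency_graph(rules))))

# p and q are mutually recursive; r depends on p.
rules = [({"p"}, {"q"}), ({"q"}, {"p"}), ({"r"}, {"p"})]
for comp in evaluation_order(rules):
    print(sorted(comp))
```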
        <p>According to such definitions, the instantiation of the input program P can
be carried out by separately evaluating its modules; if the evaluation order of the
modules respects the above mentioned partial ordering then a small ground
program is produced. Indeed, this gives the possibility to compute ground instances of
rules containing only atoms which can possibly be derived from P (thus, avoiding
the combinatorial explosion which can be obtained by naively considering all the
atoms in the Herbrand Base).</p>
        <p>
          Intuitively, this partial ordering guarantees that a component A precedes a
component B if the program module corresponding to A has to be evaluated before the
one of B (because the evaluation of A produces data which are needed for the
instantiation of B). Moreover, the partial ordering allows for determining which
modules can be evaluated in parallel. Indeed, if two components A and B, do not
depend on each other, then the instantiation of the corresponding program modules
can be performed simultaneously, because the instantiation of A does not require
the data produced by the instantiation of B and vice versa. (Recall that a strongly
connected component of a directed graph is a maximal subset of the vertices such
that every vertex is reachable from every other vertex.) The dependency among
components is thus the principle underlying the first level of parallelism. At this
level subprograms can be evaluated in parallel, but still the evaluation of each
subprogram is sequential. Note that, for the sake of clarity, a simplified version of the
technique presented in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] has been described. The original one is rather more
involved and also takes negative dependencies among predicates into account. Many
details have been omitted since they do not give additional insight for the
comprehension of the idea underlying the technique.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Rules Level</title>
        <p>
          Concerning the second level of parallelism, the Rules Level, in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] a technique has
been presented that allows the rules within each module to be evaluated concurrently.
According to this technique, rules are evaluated following a semi-naïve schema [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]
and the parallelism is exploited for the evaluation of both exit and recursive rules.
More in detail, for the instantiation of a module M , first all exit rules are processed
in parallel by exploiting the data (ground atoms) computed during the instantiation
of the modules which M depends on (according to the partial ordering induced by
the dependency graph). Only afterward are recursive rules processed in parallel,
over several iterations, by applying a semi-naïve evaluation technique. At each iteration n,
the instantiation of all the recursive rules is performed concurrently, exploiting
only the significant information derived during iteration n − 1. This is done
by partitioning the significant atoms into three sets: ΔS, S and NS. NS is filled
with atoms computed during the current iteration (say n); ΔS contains atoms
computed during the previous iteration (say n − 1); and S contains the ones previously
computed (up to iteration n − 2).
        </p>
        <p>Initially, ΔS and N S are empty; while S contains all the information
previously derived in the instantiation process. At the beginning of each new iteration,
N S is assigned to ΔS, i.e. the new information derived during iteration n is
considered as significant information for iteration n + 1. Then, the recursive rules are
processed simultaneously and each of them uses the information contained in the
set ΔS; at the end of the iteration, when the evaluation of all rules is terminated, the
set ΔS is added to the set S (since it has already been exploited). The evaluation
stops whenever no new information has been derived (i.e. N S = ∅).
</p>
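        <p>The ΔS/S/NS bookkeeping can be sketched as follows. This is a minimal illustration in our own notation: eval_rule is a hypothetical placeholder for the instantiation of a single recursive rule, and seeding ΔS with the output of the exit rules is our assumption, since the seeding step is not detailed above.</p>

```python
# eval_rule(rule, s, delta) stands for the instantiation of one recursive
# rule, required to use at least one atom from delta (the significant set);
# the per-iteration rule evaluations are the ones run in parallel in [7].

def instantiate_recursive(recursive_rules, exit_atoms, eval_rule):
    s = set()
    delta = set(exit_atoms)  # assumption: atoms from the exit rules seed DeltaS
    while delta:             # stop when no new information is derived (NS empty)
        ns = set()
        for rule in recursive_rules:
            ns |= eval_rule(rule, s, delta) - s - delta
        s |= delta           # DeltaS has been exploited: merge it into S
        delta = ns           # NS becomes the significant set of the next iteration
    return s

# Example: reach(X, Y) :- reach(X, Z), edge(Z, Y).  on a small chain graph,
# with reach seeded by the exit rule reach(X, Y) :- edge(X, Y).
edges = {(1, 2), (2, 3), (3, 4)}

def eval_reach(rule, s, delta):
    # Join the significant reach atoms (delta) with the edge facts.
    return {(x, w) for (x, y) in delta for (z, w) in edges if y == z}

print(sorted(instantiate_recursive(["reach rule"], edges, eval_reach)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```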
      </sec>
      <sec id="sec-3-3">
        <title>Single Rule Level</title>
        <p>
          The techniques described above, concerning the first two levels of parallelism, are
very effective when handling long programs, as also confirmed by the
experimental analysis conducted in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. However, when the input program consists of few
rules, their efficacy is drastically reduced, and there are cases where components
and rules parallelism is not exploitable at all.
        </p>
        <p>Consider for instance the following program P encoding the well-known
3-colorability problem:
(r) col(X, red) ∨ col(X, yellow) ∨ col(X, green) :– node(X).</p>
        <p>(c) :– edge(X, Y), col(X, C), col(Y, C).</p>
        <p>The two levels of parallelism described above have no effects on the evaluation of
P. Indeed, this encoding consists of only two rules which have to be evaluated
sequentially, since, intuitively, the instantiation of (r) produces the ground atoms
with predicate col which are necessary for the evaluation of (c).</p>
        <p>
          For the instantiation of this kind of programs a third level is necessary for the
parallel evaluation of each single rule, which is therefore called Single Rule Level.
To this aim, a strategy has been presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] which allows for parallelizing the
evaluation of a rule on the basis of a dynamic rewriting of the program.
Oversimplifying, the basic idea of [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] consists in rewriting the program rules into a number
of new rules whose evaluation can be performed simultaneously by applying the
techniques described above.
        </p>
        <p>
          For instance, rule (c) in the previous example can be rewritten as follows [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]:
(c1) :– edge1(X, Y), col(X, C), col(Y, C).
(c2) :– edge2(X, Y), col(X, C), col(Y, C).
. . .
        </p>
        <p>(cn) :– edgen(X, Y), col(X, C), col(Y, C).
by splitting the set of ground atoms with predicate edge (also called the extension
of edge), into a number of subsets. The obtained rules can be evaluated in parallel
and the instantiation produced is equivalent (modulo renaming) to the original one.
However, in general, many ways for rewriting a program may exist (for instance,
in the case of (c), col can be split up instead of edge) and the choice of the literal
to split has to be carefully made, since it may strongly affect the cost of the
instantiation of rules. Indeed, a “bad” split might reduce or neutralize the benefits of
parallelism, thus making the overall time consumed by the parallel evaluation not
optimal (and, in some corner case, even worse than the time required to instantiate
the original encoding). Moreover, if the predicate to split is an IDB predicate (as
in the case of col), a static rewriting would lead to quite complex encodings, possibly
requiring a slower instantiation; in this case a rewriting performed at execution time
is more suitable, since it can be applied once the extension of the IDB predicate
has already been computed.</p>
        <p>
          The technique in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] solves both these issues: rules are rewritten at
execution time, thus dynamically distributing the workload among processing units,
and a heuristic is used for determining the literal to split. More in detail, the
strategy works as follows: a rule r is rewritten at execution time by splitting the
extension of a single body predicate p of r (chosen according to a heuristic) into
several parts. Each part is associated with a different temporary predicate; and, for
each of those predicates, say pi, a new rule, called a split rule, obtained by replacing
p with pi, is produced. The so-created rules are instantiated in parallel in place
of r; when their evaluation is completed, a realign step gets rid of the new names
in order to obtain the same output of the original algorithm. Hereafter, we refer to
the number of split rules as split number, and to the size of the extensions of each
split predicate as split size.
        </p>
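        <p>The splitting step can be sketched as follows (the rule representation and the temporary-predicate naming scheme are our own illustrative assumptions, not those of [8]):</p>

```python
# Sketch of the dynamic rewriting: the extension of one body predicate is
# partitioned into n parts, each bound to a temporary predicate, and one
# split rule per part is produced; a realign step later renames the
# temporary predicates back to the original one.

def split_rule(rule, split_pred, extension, n):
    """Return n pairs (split rule, extension of its temporary predicate)."""
    ext = list(extension)
    parts = [ext[i::n] for i in range(n)]  # round-robin, near-equal split sizes
    out = []
    for i, part in enumerate(parts):
        tmp = "%s__%d" % (split_pred, i)   # temporary predicate name
        body = [tmp if p == split_pred else p for p in rule["body"]]
        out.append(({"head": rule["head"], "body": body}, part))
    return out

# (c) :- edge(X, Y), col(X, C), col(Y, C).  split on edge into two parts
c = {"head": [], "body": ["edge", "col", "col"]}
edge_ext = [(1, 2), (2, 3), (3, 1), (1, 3)]
for new_rule, part in split_rule(c, "edge", edge_ext, 2):
    print(new_rule["body"], part)
```

Each returned pair can then be handed to a separate thread, which instantiates the split rule against the sub-extension of its temporary predicate.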
        <p>Heuristics for load balancing and granularity control.
An advanced implementation of a parallel system has to deal with two important
issues that strongly affect performance: load balancing and granularity control.
Indeed, if the workload is not uniformly distributed to the available processors then
the benefits of parallelization are not fully obtained; moreover, if the amount of
work assigned to each parallel processing unit is too small then the (unavoidable)
overheads due to creation and scheduling of parallel tasks might overcome the
advantages of parallel evaluation (in a corner case, adopting a sequential evaluation
might be preferable).</p>
        <p>
          In this respect, the parallel grounder described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] implements a naive
strategy: each rule is rewritten into a globally fixed (user-specified) number of
splits. The number of splits allowed for each rule is (usually) the main source of
concurrently running threads (roughly, the number of running threads is bounded
by the number of generated split rules) and it directly determines the split size and,
thus, the “amount of work” assigned to threads. It is easy to see that this choice
might not be the best one in several cases. As an example, suppose we are running
the instantiation of a rule r on a two-processor machine and
that, by applying dynamic rewriting, r is rewritten into two split rules. Assume also
that the extension of the split predicate of r is divided into two subsets of,
approximately, the same size. Then, each split rule will be processed by a thread; and
the two threads will possibly run separately on the two available processors. For
limiting the inactivity time of the processors, it would be desirable that the threads
terminate their execution almost at the same time. Unfortunately, this is not always
the case, because subdividing the extension of the split predicate in equal parts
does not ensure that the workload is equally spread between threads. However, if
we consider a larger number of splits, the workload will be subdivided further,
and the inactivity time is more likely to be limited. Moreover, it is neither
possible nor desirable to let the user guess a split size yielding
a balanced workload distribution, especially considering that it strictly
depends on the rule at hand (and different rules in the same program may require
different split sizes); rather, a better policy for load balancing and granularity
control is necessary. Although crucial in distributed parallel architectures (e.g.,
clusters), in our setting (i.e., shared-memory processors), developing a
sophisticated granularity-control strategy is not essential, as also observed in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]; rather
it is sufficient to set the split size to an adequate value for each rule. Clearly, the
size of the split should be sufficiently large to avoid thread management overhead
(granularity control); and sufficiently small to exploit the preemptive multitasking
scheduler of the operating system for obtaining a good workload distribution (load
balancing). Importantly, the number of running threads has to be controlled in
order to save resources. In order to satisfy both requirements, (i) we modified the
implementation of [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] so that the user can set the number of concurrently running
threads; and, (ii) we devised and tuned a heuristic that allows an
appropriate split size to be selected for each rule. Note that the second task is not trivial, since the
time needed for evaluating each rule is not known a priori. In detail, our method
computes a heuristic value W(r) that acts as an indicator of the amount
of work required for evaluating each rule r of the program, and so of its “hardness”,
just before its instantiation; then, it exploits W(r) to select the most appropriate
split size among six settings: small, medium, large, extra-large, equally-sized split
(i.e. the old technique), and no split (i.e. sequential evaluation). The choice is
made by comparing W(r) with five empirically-determined thresholds (wseq, wes,
wel, wl, wm). Basically, the criterion is to evaluate “very easy” rules sequentially
(if W(r) &lt; wseq), since the overhead introduced by threads is higher than their
expected evaluation time (granularity control); “easy” rules, whose computation can
still exploit some parallelism, are evaluated using an equally-sized split (that is,
the technique of [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) for minimizing the overheads (if W(r) &lt; wes); whereas, for
harder and harder rules, smaller and smaller split sizes are employed for obtaining
a finer distribution of work. W(r) is obtained by combining (actually, summing)
two estimates: J(r) and C(r). First, note that computing all the possible
instantiations of a rule is equivalent to computing all the answers of a conjunctive query.
Thus, we considered J(r), an estimate of the size of the join
corresponding to the evaluation of the body of r. Moreover, since in the instantiation of rules
with several join variables the running time is mostly due to variable matching,
we considered C(r), an estimate of the number of comparisons made by
the instantiation algorithm (roughly, we considered C(r) because even producing
a small output might require a considerable amount of time due to many matching
failures). We now detail how the two components of W(r) are estimated.
Size of the join. The size of the join between two relations R and S with one or
more common variables can be estimated, according to [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] as follows:
T(R ⋈ S) = T(R) · T(S) / ∏_{X ∈ var(R) ∩ var(S)} max{V(X, R), V(X, S)},
        </p>
        <p>
where T(R) is the number of tuples in R, and V(X, R) (called the selectivity) is
the number of distinct values assumed by the variable X in R. For joins with more
relations, one can repeatedly apply this formula to pairs of body predicates according
to a given evaluation order for computing J (r). The interested reader can find a
more detailed discussion on this estimation in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
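        <p>As an executable sketch, the estimate can be computed directly from relation extensions (the relation representation, a variable schema plus a tuple list, is our own assumption; the formula is the one of [25]):</p>

```python
# Sketch of the join-size estimate T(R ⋈ S) = T(R)·T(S) / ∏ max(V(X,R), V(X,S)),
# the product ranging over the variables shared by R and S.

def selectivity(var, schema, tuples):
    """V(X, R): number of distinct values taken by variable X in R."""
    i = schema.index(var)
    return len({t[i] for t in tuples})

def join_size(r_schema, r_tuples, s_schema, s_tuples):
    est = len(r_tuples) * len(s_tuples)
    for x in set(r_schema).intersection(s_schema):
        est /= max(selectivity(x, r_schema, r_tuples),
                   selectivity(x, s_schema, s_tuples))
    return est

# col(X, C) joined with col(Y, C): 10 nodes, 3 colors, shared variable C.
col = [(n, c) for n in range(10) for c in ("red", "yellow", "green")]
print(join_size(["X", "C"], col, ["Y", "C"], col))  # 30 * 30 / 3 = 300.0
```

On this example the estimate is exact: for each of the 3 colors there are 10 × 10 matching pairs.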
        <p>Number of comparisons. An approximation of the number of comparisons done
for instantiating a rule r is:</p>
        <p>C(r) = ∑_{X ∈ X(r)} ∏_{L ∈ L(r,X)} V(X, L),</p>
        <p>where X(r) is the set of variables that appear in at least two literals in the body
of r, L(r, X) is the set of body literals in which X occurs, and V(X, L) is the
selectivity of X in the extension of L. Roughly, the number of comparisons is
approximated by summing, over the join variables of r, the product of the numbers
of distinct values assumed by each variable in the literals where it occurs.</p>
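        <p>A minimal sketch of the resulting selection step is shown below. The five threshold values are illustrative placeholders: the real (wseq, wes, wel, wl, wm) were determined empirically and are not reported here.</p>

```python
import bisect

# Split-size selection driven by W(r) = J(r) + C(r): harder rules get
# smaller splits, for a finer distribution of work among threads.

THRESHOLDS = [10**3, 10**4, 10**5, 10**6, 10**7]  # wseq, wes, wel, wl, wm
SETTINGS = ["no split (sequential)", "equally-sized split",
            "extra-large", "large", "medium", "small"]

def split_setting(j_r, c_r):
    """Map W(r) = J(r) + C(r) to one of the six split-size settings."""
    w = j_r + c_r
    return SETTINGS[bisect.bisect_right(THRESHOLDS, w)]

print(split_setting(200, 300))      # a "very easy" rule: evaluated sequentially
print(split_setting(5 * 10**7, 0))  # a hard rule: small splits, finer balancing
```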
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
In order to assess the impact of the proposed heuristic, we implemented it in the
system of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and carried out an experimental activity.
      </p>
      <p>
        The machine used for the experiments is a two-processor Intel Xeon
“Woodcrest” (quad core) 3GHz machine with 4MB of L2 Cache and 4GB of RAM,
running Debian GNU Linux 4.0. Experiments were performed on a collection
of benchmark programs already used for assessing ASP instantiator performance
([
        <xref ref-type="bibr" rid="ref15 ref26">15, 26</xref>
        ]). In particular, we considered the following well-known problems:
Ramsey Numbers, 3-Colorability, Hamiltonian Path and Reachability.
      </p>
      <p>In the following, we briefly describe both benchmark problems and data. To
meet space constraints, encodings are not presented, but they are available,
together with the employed instances and the binaries, at http://www.mat.
unical.it/ricca/downloads/heur09.zip. To help the
understanding of the results, some information is given on the number of rules of each
program.
5.1</p>
      <sec id="sec-4-1">
        <title>Benchmark Problems and Data</title>
        <p>
For the experiments, we considered encodings belonging to a particularly
difficult-to-parallelize class, i.e. ASP encodings with few rules.2 Note that such kinds of
programs are quite common given the declarative nature of the ASP language, which
allows even very hard problems to be encoded compactly. As for data, we considered
for each problem five instances of increasing size; and, to obtain more
significant results, we considered instances where the instantiation time is non-negligible.
Ramsey Numbers. The Ramsey number ramsey(k, m) is the least integer n
such that, no matter how the edges of the complete undirected graph (clique) with
n nodes are colored using two colors, say red and blue, there is a red clique with
k nodes (a red k-clique) or a blue clique with m nodes (a blue m-clique). The
encoding of this problem consists of one rule and two constraints. For the
experiments, we considered the problem of deciding whether, for k = 7, m = 7, and
n ∈ {31, 32, 33, 34, 35}, n is the Ramsey number ramsey(k, m).
3-Colorability. This well-known problem asks for an assignment of three colors
to the nodes of a graph, in such a way that adjacent nodes always have different
colors. The encoding of this problem consists of one rule and one constraint.
Simplex graphs were generated with the Stanford GraphBase library [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], by using
the function simplex(n, n, −2, 0, 0, 0, 0), (n ∈ {150, 170, 190, 210, 230}).
        </p>
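        <p>To illustrate the kind of workload such compact encodings generate, the following Python sketch grounds a 3-Colorability-style constraint over concrete input facts. It is illustrative only: the predicate names, facts, and function are our own, not taken from the benchmark encodings.</p>

```python
# Illustrative grounding of a 3-Colorability-style constraint
#   :- edge(X,Y), col(X,C), col(Y,C).
# over given edge facts and a fixed set of colors.
from itertools import product

def ground_constraint(edges, colors):
    # One ground instance per (edge, color) combination: the constraint
    # forbids both endpoints of an edge from sharing the same color.
    return [f":- edge({x},{y}), col({x},{c}), col({y},{c})."
            for (x, y), c in product(edges, colors)]

instances = ground_constraint([(1, 2), (2, 3)], ["red", "green", "blue"])
print(len(instances))  # 2 edges x 3 colors = 6 ground constraints
```

        <p>Even with a single rule, the number of ground instances grows with the input: this is why large instances make instantiation time non-negligible despite the small encodings.</p>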
        <p>
          2The good behavior of the system on easy-to-parallelize instances (where superlinear speedups
are to be expected) and on programs with many rules has already been reported in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
(Column 1 of Table 1 lists the benchmark instances: ramsey1-5, 3col1-5, hampath1-5 and reach1-5.)
Reachability. Given a finite directed graph G = (V, A), we want to compute
all pairs of nodes (a, b) ∈ V × V such that b is reachable from a through a
nonempty sequence of arcs in A. The encoding of this problem consists of one exit
rule and one recursive rule. Five trees were generated [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] having pairs (number of
levels, number of siblings): (9,3), (7,5), (14,2), (10,3) and (15,2), respectively.
Hamiltonian Path. A classical NP-complete problem in graph theory, which can
be expressed as follows: given a directed graph G = (V, E) and a node a ∈ V
of this graph, does there exist a path in G starting at a and passing through each
node in V exactly once? The encoding of this problem consists of several rules, one
of which is recursive. Instances were generated by using a tool by Patrik Simons
(cf. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]), having 5800, 6500, 7200, 8000 and 8800 nodes, respectively.
        </p>
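        <p>The exit-rule/recursive-rule structure of the Reachability encoding can be sketched with a semi-naïve fixpoint computation. The following Python fragment is a minimal illustration under our own simplifications; it is not the DLV implementation.</p>

```python
# Simplified semi-naive evaluation of the Reachability program:
#   reach(X,Y) :- arc(X,Y).                  (exit rule)
#   reach(X,Y) :- arc(X,Z), reach(Z,Y).      (recursive rule)
# Only facts derived in the previous iteration (`delta`) feed the join,
# so no rule instance is re-evaluated at later iterations.
def semi_naive_reach(arcs):
    reach = set(arcs)   # exit rule: every arc is a reachable pair
    delta = set(arcs)   # facts newly derived in the last iteration
    while delta:
        new = set()
        for (x, z) in arcs:
            for (z2, y) in delta:  # join arc(X,Z) with new reach(Z,Y)
                if z == z2 and (x, y) not in reach:
                    new.add((x, y))
        reach |= new
        delta = new
    return reach

print(sorted(semi_naive_reach({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```

        <p>The iterations of the recursive rule are exactly the points where, as discussed below, the heuristics may choose a different split size each time.</p>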
      </sec>
      <sec id="sec-4-2">
        <title>Impact of The New Heuristics</title>
        <p>In order to demonstrate the efficacy of the method that is the subject of this work, we
compared the performance of the instantiator equipped with the heuristics against
the previous version, which implements a simple dynamic rewriting. The results of
the experiments are summarized in Table 1, where Columns 2-4 report the times
obtained by the serial instantiator, the previous parallel instantiator, and the parallel
instantiator enhanced with the heuristics, respectively; in addition, Column 5 shows
the percentage gain obtained by the version with the heuristics w.r.t. the previous one
(the speedup of the version with the heuristics w.r.t. serial execution minus the
speedup of the plain parallel version w.r.t. serial execution), and Columns 6-7
show the speedup and the relative efficiency, respectively.3</p>
        <p>First of all, we notice that the version with the heuristics is consistently the best
performer and always improves on the results obtained by the previous one, with a
percentage gain ranging from 1% for Reachability up to 300% for Ramsey. These
good results are mainly due to the selection of different split sizes for different
rules in the same program.</p>
        <p>In more detail, in the case of the Reachability problem, the two parallel
instantiators show very similar behavior. Indeed, for this problem the heuristics suggests
using the biggest split size (equally-split size) for most of the rules, which
corresponds to the fixed setting imposed by the previous implementation and which
already allowed very good results, with a speedup of about 800% (and, thus, an
efficiency of about 1). However, the heuristics still gives a little benefit thanks to
the effects of the granularity control, which allows very easy rules to be computed
sequentially, thus avoiding some overhead for thread creation and scheduling.</p>
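        <p>The interplay of split size and granularity control can be sketched as follows. This Python fragment is a simplified model, not the actual DLV instantiator; the names and the threshold value are our own, chosen for illustration.</p>

```python
# Simplified model of split-size-based parallel instantiation with
# granularity control: workloads below a threshold are evaluated
# sequentially, larger ones are partitioned into chunks of the chosen
# split size and evaluated by a thread pool.
from concurrent.futures import ThreadPoolExecutor

GRANULARITY_THRESHOLD = 4  # illustrative value, not a tuned constant

def instantiate(tuples, rule, split_size):
    if len(tuples) <= GRANULARITY_THRESHOLD:
        # "very easy" rule: evaluate sequentially, no thread overhead
        return [rule(t) for t in tuples]
    # partition the workload into chunks of the chosen split size
    chunks = [tuples[i:i + split_size]
              for i in range(0, len(tuples), split_size)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda chunk: [rule(t) for t in chunk], chunks)
    return [ground for part in parts for ground in part]
```

        <p>A small split size yields many chunks and finer load balancing (the choice made for hard constraints), whereas the equally-split size yields roughly one chunk per processing unit.</p>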
        <p>Similar considerations hold for the Hamiltonian Path problem, even if, here, the
effects of the heuristics are more evident. In this case, the system benefits from the fact
that the heuristics may dynamically assign different split sizes to the same recursive
rule in different iterations. In particular, the heuristics suggests split sizes mainly
varying between large and equally-split size, and, again, the granularity control has
some positive effect when an iteration of a recursive rule has to compute very small
domains.</p>
        <p>The positive impact of the heuristics becomes very evident in the case of the
Ramsey Number problem. In fact, since the encoding is composed of a few “very
easy” rules and two “very hard” constraints, the heuristics selects a sequential
evaluation for the rules, and the smallest possible split size for the constraints. As a
result, the system produces a well-balanced work subdivision that allows it to
improve its overall performance, reaching a speedup of 730% in the best case, and thus
a percentage gain w.r.t. the previous system of about 300%.</p>
        <p>Similar considerations hold for 3-Colorability. As for Ramsey Number, the
encoding is composed of a few “easy” rules and a “hard” constraint; the heuristics
selects equally-split size for the rules and a small split size for the constraint, which
leads to a percentage gain of about 100% w.r.t. the previous instantiator and a
speedup of more than 800%.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>
        Several works about parallel techniques for the evaluation of ASP programs have
been proposed, focusing on both the propositional (model search) phase [
        <xref ref-type="bibr" rid="ref2 ref4 ref5 ref6">5, 6, 4, 2</xref>
        ]
and the instantiation phase [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ].
3We did not report here the size of the ground programs produced by the compared
implementations because we verified that they are basically the same (for both the parallel and serial
versions); thus, the good behavior (see [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]) of the grounding module of DLV (which is able to produce an output that
is sensibly smaller than the theoretical ground instantiation) is preserved in its parallel version.
Model generation is a distinct phase of ASP
computation, carried out after the instantiation, and thus, the first group of
proposals is not directly related to our setting. Concerning the parallelization of the
instantiation phase, some preliminary studies were carried out in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as part of a broader attempt to introduce parallelism into non-monotonic reasoning
systems. However, there are crucial differences with our system regarding both the
employed technology and the supported parallelization strategy. Indeed, our
system is implemented by using POSIX threads APIs, and works in a shared memory
architecture [
        <xref ref-type="bibr" rid="ref1 ref30">1, 30</xref>
        ], while the one described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is actually a Beowulf [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]
cluster working in local memory. Moreover, the parallel instantiation strategy of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
is applicable only to a subset of the program rules (those not defining domain
predicates), and is, in general, unable to fruitfully exploit parallelism in case of
programs with a small number of rules. Importantly, the parallelization strategy of
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] statically assigns a rule per processing unit; whereas, in our approach, both the
extension of predicates and “split sizes” are dynamically computed (and updated
at different iterations of the semi-naïve algorithm) while the instantiation process is running.
Note also that our parallelization techniques and heuristics could also be adapted
to improve the Lparse instantiator.
      </p>
      <p>
        Concerning other related work, it is worth noting that the dynamic
rewriting technique employed in our system is related to the copy-and-constrain
technique for parallelizing the evaluation of deductive databases [
        <xref ref-type="bibr" rid="ref32 ref33 ref34 ref35 ref36">32, 33, 34, 35, 36</xref>
        ]
(for a detailed comparison between the two approaches, see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Focusing on the
heuristics employed in parallel databases, we mention [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], a
heuristics is described for balancing the distribution of load in the parallel
evaluation of PARULEL, a language similar to Datalog. Here, load balancing is done by
a manager server that records the execution times at each site and exploits this
information for distributing the load. In [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], the proposed heuristics were devised
both to minimize communication costs and to choose an opportune site for
processing sub-queries among various network-connected database systems. In both cases,
the proposed heuristics were devised and tuned for dealing with data distributed over
several sites, and their application to architectures such as ours might be neither viable
nor straightforward.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        In this paper, an advanced heuristics for load balancing and granularity control in
the parallel instantiation of ASP programs has been proposed. The heuristics has
been implemented in the parallel instantiator of [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], based on the DLV system,
and an experimental analysis has been conducted on hard-to-parallelize problem
instances, which confirms the efficacy of the method for improving the performance
of the system. In particular, the parallel instantiator equipped with the new
heuristics always improves on the results obtained by the old version, and offers a
very relevant gain over the previous parallel method, especially in the case of programs
with hard-to-instantiate rules/constraints.
      </p>
      <p>As far as future work is concerned, we are experimenting with a finer
tuning of the heuristics, and we are working on a procedure for the automatic
calibration of the heuristics thresholds. Moreover, we are assessing the impact of
the heuristics on a larger set of benchmarks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Stallings</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Operating systems: internals and design principles</article-title>
          (3rd ed.)
          . Prentice-Hall, Inc., Upper Saddle River, NJ, USA (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Pontelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El-Khatib</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Exploiting Vertical Parallelism from Answer Set Programs</article-title>
          . In: Answer Set Programming,
          <source>Towards Efficient and Scalable Knowledge Representation and Reasoning, Proceedings of the 1st Intl. ASP'01 Workshop</source>
          , Stanford (March
          <year>2001</year>
          )
          <fpage>174</fpage>
          -
          <lpage>180</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Balduccini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pontelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkhatib</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
          </string-name>
          , H.:
          <article-title>Issues in parallel execution of non-monotonic reasoning systems</article-title>
          .
          <source>Parallel Computing</source>
          <volume>31</volume>
          (
          <issue>6</issue>
          ) (
          <year>2005</year>
          )
          <fpage>608</fpage>
          -
          <lpage>647</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gressmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janhunen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thiele</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tichy</surname>
          </string-name>
          , R.:
          <article-title>Platypus: A Platform for Distributed Answer Set Solving</article-title>
          .
          <source>In: Proceedings of Logic Programming and Nonmonotonic Reasoning</source>
          , 8th International Conference (LPNMR), Diamante, Italy (
          <year>September 2005</year>
          )
          <fpage>227</fpage>
          -
          <lpage>239</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marek</surname>
            ,
            <given-names>V.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Truszczynski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Computing stable models in parallel</article-title>
          .
          <source>In: Answer Set Programming</source>
          ,
          <source>Towards Efficient and Scalable Knowledge Representation and Reasoning, Proceedings of the 1st Intl. ASP'01 Workshop</source>
          , Stanford (March
          <year>2001</year>
          )
          <fpage>72</fpage>
          -
          <lpage>76</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ellguth</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gebser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gusowski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liske</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneidenbach</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schnor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A simple distributed conflictdriven answer set solver</article-title>
          .
          <source>In LPNMR. LNCS 5753</source>
          (
          <year>2009</year>
          )
          <fpage>490</fpage>
          -
          <lpage>495</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Calimeri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ricca</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Experimenting with Parallelism for the Instantiation of ASP Programs</article-title>
          .
          <source>Journal of Algorithms in Cognition, Informatics and Logics</source>
          <volume>63</volume>
          (
          <issue>1-3</issue>
          ) (
          <year>2008</year>
          )
          <fpage>34</fpage>
          -
          <lpage>54</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Perri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ricca</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vescio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Efficient Parallel ASP Instantiation via Dynamic Rewriting</article-title>
          .
          <source>In: Proceedings of the First Workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP</source>
          <year>2008</year>
          ), Udine, Italy (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Gelfond</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lifschitz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Classical Negation in Logic Programs and Disjunctive Databases</article-title>
          .
          <source>New Generation Computing</source>
          <volume>9</volume>
          (
          <year>1991</year>
          )
          <fpage>365</fpage>
          -
          <lpage>385</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Lifschitz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Answer Set Planning</article-title>
          . In: ICLP'99
          <fpage>23</fpage>
          -
          <lpage>37</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Marek</surname>
            ,
            <given-names>V.W.</given-names>
          </string-name>
          , Truszczyński, M.:
          <article-title>Stable Models and an Alternative Logic Programming Paradigm</article-title>
          .
          <source>In: The Logic Programming Paradigm - A 25-Year Perspective</source>
          . (
          <year>1999</year>
          )
          <fpage>375</fpage>
          -
          <lpage>398</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Baral</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Knowledge Representation, Reasoning and Declarative Problem Solving</article-title>
          . Cambridge University Press (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Gelfond</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leone</surname>
          </string-name>
          , N.:
          <article-title>Logic Programming and Knowledge Representation - the A-Prolog perspective</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>138</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>2002</year>
          )
          <fpage>3</fpage>
          -
          <lpage>38</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
          </string-name>
          , H.:
          <article-title>Disjunctive Datalog</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          <volume>22</volume>
          (
          <issue>3</issue>
          ) (
          <year>September 1997</year>
          )
          <fpage>364</fpage>
          -
          <lpage>418</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeifer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faber</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scarcello</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The DLV System for Knowledge Representation and Reasoning</article-title>
          .
          <source>ACM Transactions on Computational Logic</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          )
          (
          <year>July 2006</year>
          )
          <fpage>499</fpage>
          -
          <lpage>562</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Dantsin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voronkov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Complexity and Expressive Power of Logic Programming</article-title>
          .
          <source>ACM Computing Surveys</source>
          <volume>33</volume>
          (
          <issue>3</issue>
          ) (
          <year>2001</year>
          )
          <fpage>374</fpage>
          -
          <lpage>425</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faber</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fink</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ianni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kałka</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lio</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nowicki</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruzzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staniszkis</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terracina</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The INFOMIX System for Advanced Integration of Incomplete and Inconsistent Data</article-title>
          .
          <source>In: Proceedings of the 24th ACM SIGMOD International Conference on Management of Data (SIGMOD</source>
          <year>2005</year>
          ), Baltimore, Maryland, USA, ACM Press (
          <year>June 2005</year>
          )
          <fpage>915</fpage>
          -
          <lpage>917</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Curia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ettorre</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallucci</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iiritano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rullo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Textual Document Pre-Processing and Feature Extraction in OLEX</article-title>
          .
          <source>In: Proceedings of Data Mining</source>
          <year>2005</year>
          , Skiathos, Greece (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Massacci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Computer Aided Security Requirements Engineering with ASP</article-title>
          . Seminar N 05171, Dagstuhl Seminar on Nonmonotonic Reasoning, Answer Set Programming and Constraints (September
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ruffolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manna</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saccà</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavatto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Exploiting ASP for Semantic Information Extraction</article-title>
          . In de Vos, M., Provetti, A., eds.
          <source>: Proceedings ASP05 - Answer Set Programming: Advances in Theory and Implementation</source>
          , Bath, UK
          (
          <year>July 2005</year>
          )
          <fpage>248</fpage>
          -
          <lpage>262</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Faber</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeifer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Recursive aggregates in disjunctive logic programs: Semantics and complexity</article-title>
          .
          <source>In JELIA 2004. LNCS 3229</source>
          , (
          <year>2004</year>
          )
          <fpage>200</fpage>
          -
          <lpage>212</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Przymusinski</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          :
          <article-title>Stable Semantics for Disjunctive Programs</article-title>
          .
          <source>New Generation Computing</source>
          <volume>9</volume>
          (
          <year>1991</year>
          )
          <fpage>401</fpage>
          -
          <lpage>424</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Principles of Database and Knowledge-Base Systems</article-title>
          , Volume I. Computer Science Press (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hermenegildo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debray</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A methodology for granularity-based control of parallelism in logic programs</article-title>
          .
          <source>J. Symb. Comput</source>
          .
          <volume>21</volume>
          (
          <issue>4-6</issue>
          ) (
          <year>1996</year>
          )
          <fpage>715</fpage>
          -
          <lpage>734</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Principles of Database and Knowledge-Base Systems</article-title>
          . Computer Science Press (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Gebser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Namasivayam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Truszczyński</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The first answer set programming system competition</article-title>
          .
          <source>In: LPNMR'07. LNCS 4483</source>
          , (
          <year>2007</year>
          )
          <fpage>3</fpage>
          -
          <lpage>17</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          :
          <article-title>The Stanford GraphBase: A Platform for Combinatorial Computing</article-title>
          . ACM Press, New York (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Terracina</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lio</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panetta</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Experimenting with recursive queries in database and logic programming systems</article-title>
          .
          <source>TPLP 8</source>
          (
          <year>2008</year>
          )
          <fpage>129</fpage>
          -
          <lpage>165</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Simons</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Extending and Implementing the Stable Model Semantics</article-title>
          .
          <source>PhD thesis</source>
          , Helsinki University of Technology, Finland (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Tanenbaum</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodhull</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          :
          <article-title>Operating Systems Design and Implementation (3rd Edition)</article-title>
          . Prentice-Hall, Inc., Upper Saddle River, NJ, USA (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] The Beowulf Cluster Site. &lt;URL:http://www.beowulf.org&gt;.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Wolfson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silberschatz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Distributed Processing of Logic Programs</article-title>
          .
          <source>In: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data</source>
          , Chicago, Illinois, USA (
          <year>June 1988</year>
          )
          <fpage>329</fpage>
          -
          <lpage>336</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Wolfson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozeri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A new paradigm for parallel and distributed rule processing</article-title>
          .
          <source>In: SIGMOD Conference 1990</source>
          , New York, USA (
          <year>1990</year>
          )
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Ganguly</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silberschatz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A Framework for the Parallel Processing of Datalog Queries</article-title>
          .
          <source>In: SIGMOD Conference 1990</source>
          , Atlantic City, NJ, USA (
          <year>1990</year>
          )
          <fpage>143</fpage>
          -
          <lpage>152</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chau</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          :
          <article-title>Data Partition and Parallel Evaluation of Datalog Programs</article-title>
          .
          <source>IEEE TKDE 7</source>
          (
          <issue>1</issue>
          ) (
          <year>1995</year>
          )
          <fpage>163</fpage>
          -
          <lpage>176</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Dewan</surname>
            ,
            <given-names>H. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stolfo</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernández</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Predictive dynamic load balancing of parallel and distributed rule and query processing</article-title>
          .
          <source>In: Proc. of ACM SIGMOD'94</source>
          ,
          <fpage>277</fpage>
          -
          <lpage>288</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>Carey</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Load balancing in a locally distributed database system</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>15</volume>
          (
          <issue>2</issue>
          ) (
          <year>1986</year>
          )
          <fpage>108</fpage>
          -
          <lpage>119</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>