<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Constrained optimisation over massive databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Toni Mancini</string-name>
          <email>tmancini@di.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Flener</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amir Hossein Monshi</string-name>
          <email>AmirHossein.Monshi.1351@student.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Justin Pearson</string-name>
          <email>Justin.Pearson@it.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, Uppsala University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica, Sapienza Università di Roma</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Constrained optimisation is increasingly considered by industry as a major candidate for coping with hard problems of practical relevance. However, the approach of exploiting, in those scenarios, current Constraint or Mathematical Programming solvers has severe limitations, which clearly demand new methods: data is usually stored in potentially very large databases, and building a problem model in central memory suitable for current solvers can be very challenging or impossible. In this paper, we extend the approach followed in [3], by presenting a declarative language for constrained optimisation based on sql, and novel techniques for local-search algorithms explicitly designed to handle massive data-sets. We also discuss and experiment with a solver implementation that, working on top of any DBMS, exploits such algorithms in a way that is transparent to the user, allowing smooth integration of constrained optimisation into industrial environments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Constrained combinatorial optimisation is increasingly considered by
industry as a major candidate for coping with hard problems of practical relevance,
such as scheduling, rostering, resource allocation, security, and bio-informatics.
However, the approach of exploiting, in those scenarios, current Constraint
Programming (CP) or Mathematical Programming (MP) solvers has severe limitations,
which clearly demand new methods:</p>
      <p>Problems of industrial relevance may easily exhibit sizes that are way out
of the reach of current solvers: data usually resides in information systems,
in the form of potentially very large, possibly distributed relational databases
with complex integrity constraints, and cannot easily be transferred to
central memory for processing.</p>
      <p>Moving data into central memory also potentially affects data integrity.
Solving very large combinatorial problems is likely to need a huge amount
of computing time, during which the original data must be kept locked, thus
freezing the normal activities of the information system.</p>
      <p>
        In several scenarios, the structure of both the data and the problem is
highly articulated and complex. Although it is always possible to map them
into the constructs offered by current constraint modelling languages, this
further step must be carefully designed and implemented, as it amounts to a real
engineering activity. This is usually perceived as a major obstacle by industry
programmers, who typically lack CP/MP and abstraction skills.
Several attempts to join combinatorial optimisation and databases have
already been carried out in the literature, especially in the contexts of
Deductive Database Systems (see, e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) and Logic Programming (see, e.g., the
system DLVDB described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        As a different approach to cope with these issues and to provide effective
means to spread combinatorial optimisation into industrial environments, in
previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a non-deterministic extension of Relational Algebra (RA)
has been proposed. This language, called NP-Alg, can express
all problems in the complexity class NP, where (the decision versions of)
many problems of practical relevance lie. NP-Alg has been the formal basis
for deriving an extension of the well-known database querying language sql,
with similar characteristics, but also able to handle optimisation problems.
This second language, called ConSQL (i.e., sql with constraints), proposes
a new paradigm for modelling constrained optimisation problems, that of
modelling as querying: a problem specification is given as a set of views
devoted to encoding solutions, together with constraints that the contents of such
views must satisfy. Of course, since we deal with optimisation problems whose
decision versions belong to the complexity class NP, the “views” we are talking
about are not first-order definable, and hence not expressible in standard sql.
      </p>
      <p>The advantages of this approach wrt those above are two-fold: (i) the
modelling language is essentially sql, plus a few new and very intuitive
constructs, so industrial programmers do not have to master CP/MP to formalise
their problems; (ii) the paradigm allows solving approaches (like the one we
consider in this paper) that interact transparently with the DBMS and its
mechanisms for data transfers and access policies, and for maintaining data
integrity, since the solving process, which is seen as a normal transaction by
the DBMS, may easily recover from data updates.</p>
      <p>
        In this paper, after recalling NP-Alg (Section 2) we make the following
contributions:
1. We re-design our ConSQL language in order to obtain a language having
very few deviations from standard sql (Section 3).
2. We reconsider the local-search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] approach suggested in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and argue
that, to succeed in solving large problems, classical local-search has to be
re-thought, with an explicit focus on the handling of massive data. In
particular, we isolate two major issues in classical local-search and
propose two methods to boost scalability: dynamic neighbourhood reduction
(DNR), and incremental evaluation of constraints (IEC) (Section 4).
3. We provide a new implementation of the system, based on the ideas above
and on a much wider portfolio of local-search algorithms, with full support
for the user to declaratively choose the search strategy to follow; we
experimentally evaluate it against the original implementation in terms of
performance and scalability (Section 5).
      </p>
    </sec>
    <sec id="sec-2">
      <title>The language NP-Alg</title>
      <p>
        NP-Alg [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a non-deterministic extension of RA, suitable for modelling
combinatorial decision problems.
      </p>
      <p>Syntax. Let R be a relational schema, i.e., a set of relations of arbitrary
finite arities. An NP-Alg expression is a pair ⟨S, fail⟩ where:
1. S = {S₁(a₁), …, Sₙ(aₙ)} is a set of new relations of arbitrary finite arities
a₁, …, aₙ. These relations are called “guessed”. R and S are disjoint.
2. fail is an expression of plain RA on the new database vocabulary R ∪ S.
Semantics. Given a finite database D over schema R, let DOM_D be the
unary relation representing the set of all constants occurring in D. The
semantics of an NP-Alg expression ⟨S, fail⟩ is as follows:
1. For each possible extension of the relations in S with elements in DOM_D,
the expression fail is evaluated, using the ordinary rules of RA.
2. If there exists one such extension for which fail evaluates to the empty
relation ∅ (denoted as fail = ∅), the answer to the decision problem is “yes”.
Otherwise, the answer is “no”. When the answer is “yes”, such an extension
of the relations in S is a solution of the problem instance.</p>
      <p>Intuitively, fail encodes the constraints that extensions of S need to satisfy to
be considered solutions. These are represented in terms of violations, hence
the goal is to guess an extension of S that makes fail = ∅ (a dual approach,
where constraints are defined in a ‘positive’ way, can however be followed).</p>
      <p>
        NP-Alg has strong relationships with existential second-order logic
(ESO) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and first-order model-expansion (FO-MX) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], with guessed
relations playing the role of quantified predicates in ESO and expansion
vocabulary symbols in FO-MX. It can be shown [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that NP-Alg can express
all (and only) the queries that specify decision problems belonging to NP.
      </p>
      <p>
        In the following, we assume the standard version of RA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with the
usual operators {σ, π, ×, −, ∪, ∩}, and positional notation ($1, $2, etc.) for
the attributes of the various relations. Also, given a tuple τ with n
components, we denote its i-th component by τ[$i], i ∈ [1..n], and the sub-tuple
obtained from τ by considering only the components with indices from i to j
(i ≤ j) by τ[$i..$j]. Finally, to ease notation, especially when expressing
conditions over tuples defined over the Cartesian product of some relations,
we may denote tuple indices by the name of the original relations they come
from, followed by the relative column number. Hence, e.g., σ_{$1=$3}(R1 × R2)
is equivalent to σ_{R1.$1=R2.$1}(R1 × R2) if R1 is binary.
      </p>
      <p>
        Example 1 (Graph 3-colouring). Let D be a database with two relations N
and E, encoding a graph: N is unary and encodes the set of nodes, while
E is binary and encodes the set of edges of the graph, as pairs of nodes.
Hence DOM_D = N. The NP-complete Graph 3-colouring problem [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] can be
expressed in NP-Alg by the following expression:
1. S = {R, G, B}, all unary, encoding the sets of nodes to be coloured in red,
green, and blue, respectively;
2. fail = fail_cover ∪ fail_disj ∪ fail_col, where fail_cover = N − (R ∪ G ∪ B),
fail_disj = (R ∩ G) ∪ (R ∩ B) ∪ (G ∩ B), and fail_col = ⋃_{C ∈ {R,G,B}} ((C × C) ∩ E).
      </p>
      <p>For any guessed extension of R, G, B, fail_cover = ∅ iff every node is
assigned at least one colour; similarly, fail_disj = ∅ iff every node is assigned
at most one colour, and fail_col = ∅ iff no two adjacent nodes are assigned
the same colour. Hence, fail = ∅ iff there exist extensions for R, G, B
encoding an assignment of colours to nodes such that no two adjacent nodes have
the same colour. Such extensions are solutions to the problem instance.</p>
    </sec>
    <sec id="sec-3">
      <title>ConSQL: modelling as querying</title>
      <p>
        In this section we present a new and improved version of our ConSQL
language [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. ConSQL is a non-deterministic extension of sql, whose
optimisation-free subset has the same expressive power as NP-Alg.
ConSQL is a super-set of sql: users can hence safely exploit the rich set of
sql language features during problem modelling. The current version of
the language deviates only very little from standard sql, with the aim of
greatly easing the task of modelling constrained optimisation problems for
the average industry programmer.
      </p>
      <p>A detailed description of the syntax and semantics of ConSQL is given
in the appendix (available at http://www.dis.uniroma1.it/~tmancini).
Here, we introduce the language by an example.</p>
      <p>
        Example 2 (University timetabling). Assume a university wants to solve
the following course timetabling problem (a variant of the ones presented
in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], in order to show the flexibility of the modelling language): find a
schedule for the lectures of a set of courses, in a set of classrooms,
minimising the total number of students that are enrolled in courses
having overlapping lectures, while satisfying constraints about lecture allocation in
time and rooms (con1, con2, con6), room equipment (con3), teachers’
unavailability (con4) and conflicts (con7), and course lengths (con5).
      </p>
      <p>Data reside in a relational database with the tables listed in Table 1
(primary keys are underlined; foreign key constraints are omitted for brevity). A
ConSQL specification for this problem is as follows:
create SPECIFICATION Timetabling (
// A guessed view (see new construct CHOOSE), encoding a ‘guessed’ timetable, as an
// assignment of courses to room/period pairs (w. some possibly unassigned, i.e., null)
create view TT as select p.id as p, r.id as r, CHOOSE(select id as c from Course) CAN BE NULL
from Period p, Room r
// A view defined in terms of TT (dependent guessed view)
create view ConflCourses as select c1.id as c1, c2.id as c2 from Course c1, Course c2, TT t1, TT t2
where c1.id &lt; c2.id and t1.c = c1.id and t2.c = c2.id and t1.p = t2.p
// Objective function, relying on view ConflCourses
MINIMIZE select count(*) from Student s, ConflCourses cc, Enrolled e1, Enrolled e2
where e1.student = s and e1.course = cc.c1 and e2.student = s and e2.course = cc.c2
Student(id, name, …): set of students, with id and other attributes
Teacher(id, name, …): set of teachers
Course(id, teacher, num_lect): set of courses to be scheduled, with teacher and number of planned lectures
Period(id, day, week, start, end): set of time-periods, plus information on day, start and finish times
Room(id, capacity): available rooms with their capacity
Enrolled(student, course): info on enrolment of students in the various courses
Equip(id, name): equipment available for teaching (e.g., VGA projector)
EquipAvail(equip, room): presence of equipment in the various rooms
EquipNeeded(course, equip): equipment needed for the various courses
TeacherUnav(teacher, period): unavailability constraints for teachers: lectures of their courses cannot be scheduled in these periods</p>
      <p>
As can be seen, a problem specification is defined in terms of the new
construct create SPECIFICATION, embedding the definition of a set of views
(via the standard sql create view construct), an optional objective function
(via the new constructs MINIMIZE and MAXIMIZE) and a set of constraints
(via the standard sql check keyword). Some of the views are guessed, in
that they have special columns, called guessed column-sets. Such columns
are defined by the new construct CHOOSE, which is the main enabler of the
non-deterministic mechanism added to sql in order to increase its expressive
power. The semantics is as follows:</p>
      <p>The solver is asked to non-deterministically populate the guessed
column-sets of the views, choosing tuples from the query argument of CHOOSE.
This choice must be such that all constraints are satisfied and the objective
function, if present, takes an optimal value.</p>
      <p>In the example, our goal is to assign the lectures of some course to room/period
pairs (guessed view TT), with some pairs possibly being empty (CAN BE
NULL), in such a way that the number of students enrolled in courses with
conflicting lectures is minimised and that all the constraints con1–con7 are
satisfied. The ‘residual’ (i.e., the pure sql) part of view TT has schema (p, r),
storing all period/room pairs. This is extended ‘to the right’ with a guessed
column-set of width 1 having the schema (c) of the argument query. The
overall view has schema (p, r, c). While the values in the first two columns are
fixed (period/room pairs), those in the entries of the third column are
nondeterministically picked from the sets of values occurring as ids for courses.
Some of them can be assigned to null, allowing us to model situations, like
ours, where we need partial functions. Figure 1(a) shows guessed view TT at
the initialisation stage, when the database holds three periods with ids p1,
p2, p3, and two rooms with ids ra and rb. Figure 1(b) shows one possible
extension of the guessed view, when the database stores courses having ids c1,
c2, c3 (plus, possibly, others). Note that, in case the modifier distinct were
used, it would be impossible for two period/room pairs to be assigned to the
same course. This is analogous to the all-different constraint used in CP.</p>
      <p>Extensions of guessed views need to satisfy all constraints in order to
be considered solutions to the problem. These are expressed by using the
standard sql keyword check applied to an arbitrary sql Boolean expression
(the various forms allowed by standard sql to specify conditions, hence
constraints, are described in the appendix). Finally, an objective function can
be defined by using one of the two new keywords MINIMIZE and MAXIMIZE,
applied to an arbitrary aggregate query returning a single scalar, defined over
database tables, ordinary views, and guessed views.</p>
    </sec>
    <sec id="sec-4">
      <title>Solving NP-Alg expressions over massive data</title>
      <p>
        In principle, in order to solve a constrained optimisation problem, different
competing techniques could be adopted, e.g., CP, Propositional Satisfiability
(SAT), or MP. In the presence of massive data-sets, such complete approaches may
show severe limitations, since they require the systematic exploration of the
huge search space. In these cases, incomplete approaches like local-search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
may help. These algorithms sacrifice completeness for efficiency, and may
be advantageous especially on large instances, and in scenarios where
computing a solution of sufficiently high quality is satisfactory. Local-search
has been applied successfully to different formalisms, such as SAT (cf.
bgwalksat [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) and CP (cf., e.g., comet [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), and to the successful resolution of
large scheduling and timetabling problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and other large constrained
optimisation problems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In this section, after discussing some convenient syntactic extensions to
NP-Alg and recalling the main notions of classical local-search, we present
novel techniques to make local-search applicable to massive data-sets.
NP-Alg revisited. The language described in Section 2 can be extended,
without affecting its expressive power, with means to support bounded
integers and arithmetic, typed guessed relations, guessed functions, and guessed
permutations. As an example, although in the base version of the language
the entries of guessed relations take values ranging over DOM_D (hence extensions
of an r-ary relation range over subsets of DOM_D^r), in a (just syntactically)
enriched version we could easily force them to be typed and also to model
functions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this way we could easily model, e.g., the Graph k-colouring
problem, where the set of the k available colours is given in the additional
unary relation K, as follows:
      </p>
      <p>Example 3 (Graph k-colouring).
1. Guessed relation: Col : N → K, a total function.
2. fail = π_{$1,$3}(σ_{$1≠$3 ∧ $2=$4}(Col × Col)) ∩ E.</p>
      <p>The guessed function Col is represented as a relation having two typed columns,
$1 and $2, with entries in $1 ranging over N and those in $2 ranging over
K. Being also forced to be a total function, Col is such that any node in N
occurs exactly once in the first column of Col. The expression for fail
evaluates to ∅ iff the guessed extension for Col is such that no two different
nodes ($1 ≠ $3) assigned to the same colour ($2 = $4) share an edge.
It can be shown that the enriched language where guessed relations are
always forced to be typed total functions between two unary database relations
has the same expressive power as the base language. This is because: (i)
tuples of non-unary relations can always be denoted by new constants which
act as keys; and (ii) arbitrary guessed relations can always be encoded by
characteristic functions. Also, the expression fail can always be rewritten
in disjunctive form, as a union of sub-expressions. Since this variation is
syntactically much closer to our target practical language ConSQL, in the
remainder of this paper we will refer exclusively to it. Hence, from now on,
we rely on the following definition:
Definition 1 (NP-Alg expression). An NP-Alg expression ⟨S, fail⟩ over
a relational schema R is defined as:
1. A set S = {S₁, …, Sₙ} of new binary relations, with each Sᵢ : Dᵢ → Cᵢ
forced to be a total function with domain Dᵢ and co-domain Cᵢ, both of
these relations being unary and belonging to R.
2. An expression fail = ⋃ᵢ₌₁ᵏ failᵢ of plain RA on the vocabulary R ∪ S.
The expressions failᵢ encode the constraints to satisfy, with their tuples, for a
given extension of the guessed functions, representing constraint violations.</p>
      <p>The gap between NP-Alg and ConSQL is now much smaller (see
Example 3): guessed functions in NP-Alg neatly correspond to guessed
column-sets of ConSQL, and the expressions failᵢ correspond to check not exists
constraints. Extensions like having multiple guessed column-sets in the same
view, null values, and non-unary relations serving as domains and co-domains
are not substantial, and could be simulated by means of additional joins,
special values in the co-domain relations, and the introduction of unary key fields.
Classical local-search. The following notions play key roles in all
local-search algorithms. We recall them in terms of an NP-Alg expression ⟨S, fail⟩
evaluated over a finite database over schema R.</p>
      <p>Definition 2 (States, search space, costs, solutions). A state, or candidate
solution, is an extension Σ of the guessed functions S. The search space is the
set of all possible states. The cost of a state Σ is |fail₁| + ⋯ + |failₖ|, i.e., the total
number of tuples returned by the sub-expressions failᵢ when evaluated on Σ. A
solution is a state of cost 0.</p>
      <p>
        Most of the local-search algorithms described in the literature [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have, at
their core, the following greedy behaviour:
current-state ← Init-State()
while not Termination-Condition() do
    move ← Choose-Improving-Move(current-state)
    if move ≠ nil then current-state ← Make-Move(move) else break
The idea is, after a (usually random) initialisation, to iteratively evaluate
and perform small changes (moves) to the current state, in order to reduce
the cost of the current candidate solution, until some termination condition
(in terms of, e.g., the cost of current-state or the amount of computational
resources used) is met. The universe of possible moves is chosen in advance
by the programmer based on the structure of the problem. Although this choice
could in principle be arbitrary, in local-search moves are mostly defined to
involve a very small change to the current state.
      </p>
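      <p>The greedy scheme above can be sketched over a dictionary-based representation of guessed total functions (illustrative Python only; the system described in this paper evaluates fail through the DBMS instead):</p>

```python
import random

def greedy_descent(state, domains, codomains, constraints, max_iters=1000):
    """Gradient-descent sketch over NP-Alg-style states.

    state:       {name: {d: c}} -- one dict per guessed total function S : D -> C
    constraints: functions mapping a state to their set of violating tuples
                 (the tuples of fail_i); the cost of a state is the total count.
    """
    cost = lambda s: sum(len(f(s)) for f in constraints)
    current = cost(state)
    for _ in range(max_iters):
        if current == 0:
            break
        # Candidate moves <S, d, c>: reassign one domain value of one function.
        moves = [(S, d, c) for S in state
                 for d in domains[S] for c in codomains[S] if state[S][d] != c]
        random.shuffle(moves)            # gradient descent: first improving move
        improved = False
        for S, d, c in moves:
            old = state[S][d]
            state[S][d] = c              # make-move
            new = cost(state)
            if new < current:
                current, improved = new, True
                break
            state[S][d] = old            # roll back a non-improving move
        if not improved:
            break                        # local minimum reached
    return current

# Usage: 2-colouring of the path a - b - c (toy guessed function Col : N -> K).
N, K = ["a", "b", "c"], ["red", "green"]
E = {("a", "b"), ("b", "c")}
state = {"Col": {n: "red" for n in N}}
conflicts = lambda s: {(u, v) for (u, v) in E if s["Col"][u] == s["Col"][v]}
final_cost = greedy_descent(state, {"Col": N}, {"Col": K}, [conflicts])
print(final_cost)   # 0: a proper colouring was found
```

      <p>On this toy instance an improving move exists from every non-solution state, so the descent always reaches cost 0.</p>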
      <p>The simple structure of guessed functions in NP-Alg suggests a very
natural definition for atomic moves, while any other possible definition could be
given just by composition.</p>
      <p>Definition 3 (Move). A move is a triple μ = ⟨S, d, c⟩ with S ∈ S (S : D → C), d ∈ D, c ∈ C.
Hence, given any state Σ and the corresponding extension S of a guessed
function S, a move μ = ⟨S, d, c⟩ changes the extension S of S by modifying
the co-domain value assigned to domain value d to c. We denote the state
reached from Σ after performing μ by Σ_μ, and the new extension of S by
S_μ. A move μ executed on state Σ is improving, neutral or
worsening iff the cost of state Σ_μ is, respectively, less than, equal to, or
greater than the cost of state Σ. Also, given a constraint i, a move μ executed
on Σ is improving, neutral or worsening wrt constraint i iff the share of the cost
of state Σ_μ due to constraint i (i.e., |failᵢ|) is, respectively, less than, equal
to, or greater than that of state Σ.</p>
      <p>
        The choice of moves to perform usually varies, depending on the
particular algorithm considered. As an example, gradient descent chooses an
improving move randomly, while steepest descent chooses the move that, if
performed, maximally reduces the cost of the new current state. Since steepest
descent needs to consider all possible moves in order to choose the best one, it
can be very inefficient on large-scale problems. A third very successful
algorithm is min-conflicts which, once adapted to the structure of the search space
in the NP-Alg framework, randomly selects one guessed function S : D → C
and one of its domain values d ∈ D, and then chooses the best improving
move among all those regarding S and d. All these greedy algorithms return
nil if no moves lead to a cost reduction, and hence are easily trapped in the
so-called local minima, i.e., states that have cost lower than all their
neighbours (states that could be reached by executing a single move). To avoid
being stuck in local minima, and to be able to continue search towards better
states, usually these basic algorithms are enhanced with more sophisticated
(not purely greedy) techniques. As an example, simulated annealing also
executes worsening moves, with a probability that decreases over time, while
tabu search dynamically filters the set of available moves, by forbidding those that
would bring the search back to recently visited states. Also, to increase
robustness, such algorithms are enhanced with random restarts, joined together in
batches, and/or wrapped by even more sophisticated meta-heuristics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
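      <p>A single min-conflicts iteration can be sketched as follows (again on a toy colouring state with one guessed function; all names and data are illustrative):</p>

```python
import random

# Toy instance: colour the path a - b - c with two colours.
N, K = ["a", "b", "c"], ["red", "green"]
E = {("a", "b"), ("b", "c")}
cost = lambda s: sum(1 for (u, v) in E if s[u] == s[v])

def min_conflicts_step(state):
    """Randomly pick one domain value d, then take the best move among all
    reassignments of d (keeping the current value on ties)."""
    d = random.choice(N)
    best_c, best_cost = state[d], cost(state)
    for c in K:
        state[d] = c
        if cost(state) < best_cost:
            best_c, best_cost = c, cost(state)
    state[d] = best_c
    return best_cost

random.seed(0)
state = {n: "red" for n in N}
steps = 0
while cost(state) > 0 and steps < 200:
    min_conflicts_step(state)
    steps += 1
print(cost(state))
```

      <p>Unlike steepest descent, each iteration only evaluates the |K| reassignments of one domain value, which is what makes the algorithm attractive at scale.</p>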
      <p>Classical local-search algorithms typically use the problem constraints only
to evaluate the quality of a move, i.e., its impact on the cost of the search
state reached if that move is performed. In the context of constrained
optimisation over massive data, this approach quickly becomes impractical.
Example 4 (University timetabling, continued). Assume that the database
relations have sizes |Room| = r, |Period| = p, |Course| = c. By Definition 3, a
steepest descent algorithm needs, at any step, to evaluate a constant number
p·r·c of candidate moves (each period/room pair can be reassigned to one of the
other c − 1 courses, plus null) to find and choose the one that maximally reduces
the cost of the new current state. With a naive approach, for any such
move, 7 sql queries, one for each constraint, would have to be evaluated on the new
database obtained by changing the course assigned to one room/period pair.</p>
      <p>
Even in not-so-large scenarios, e.g., p = 960 (8 periods/day, 5 days/week
for 6 months; observe that the scheduling is not periodic, given, e.g., teachers’
unavailabilities), r = 30, c = 40, the number of candidate moves to
consider at each step is over 10⁶. Now, assuming that each query takes only
10⁻³ seconds to run, and even ignoring the time for modifying and rolling back
the database each time, a naive version of steepest descent would need about
10⁶ · 7 · 10⁻³ = 7 · 10³ seconds to execute one single move!
Constraint-directed dynamic neighbourhood reduction (DNR). To
overcome these difficulties, the concept of constraint-directed
neighbourhoods has recently been proposed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The idea is to use the problem
constraints also to isolate subsets of moves that possess some useful
property, such as being guaranteed to be improving wrt one or more
constraints. These ideas can be exploited and extended in our context in a
very elegant and natural way, thanks to the great simplicity of NP-Alg.
      </p>
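      <p>The arithmetic of Example 4 can be checked directly (the 7·10³ figure in the text rounds p·r·c down to 10⁶):</p>

```python
# Figures from the example: p periods, r rooms, c courses.
p, r, c = 960, 30, 40
moves_per_step = p * r * c             # each pair gets one of c-1 courses or null
print(moves_per_step)                  # 1152000, i.e. over 10**6 candidate moves

queries_per_move = 7                   # one sql query per constraint con1..con7
seconds_per_query = 1e-3
step_seconds = moves_per_step * queries_per_move * seconds_per_query
print(step_seconds)                    # 8064.0, of the order of the 7*10**3 estimate
```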
      <p>Consider an NP-Alg expression ⟨S, fail⟩, where fail = ⋃ᵢ₌₁ᵏ failᵢ, and
all failᵢ are conjunctive queries, i.e., of the form failᵢ = σ_{θᵢ}(Sᵢ₁ × ⋯ × Sᵢₛ ×
Rᵢ₁ × ⋯ × Rᵢᵣ), with Sᵢⱼ ∈ S (j ∈ [1..s]) and Rᵢₚ ∈ R (p ∈ [1..r]). In
ConSQL this is equivalent to saying that constraints can be expressed as not
exists select-from-where queries, as is often the case. Let us now focus on
a single constraint i. Assume that, in a given state Σ, failᵢ evaluates to a
non-empty set of tuples Vᵢ (which can be regarded as the violations of constraint
i). In order to find a solution we need to resolve all violations of all
constraints. Let us restrict our attention to the extension S ∈ Σ of a distinguished
guessed function S (a binary relation encoding a function S : D → C with D,
C unary relations in R) occurring m ≥ 1 times in the expression for failᵢ.
To emphasise the m occurrences of S, we can, wlog, rewrite failᵢ as follows
(after also changing the references to columns in θᵢ accordingly):</p>
      <p>failᵢ = σ_{θᵢ}(S⁽¹⁾ × ⋯ × S⁽ᵐ⁾ × T),    (1)
with S⁽¹⁾, …, S⁽ᵐ⁾ being the m occurrences of S, and T the Cartesian
product of all the relations other than S in the constraint expression.</p>
      <p>Since a move over S that is improving wrt constraint i must resolve at least one
violation in Vᵢ ⊆ S⁽¹⁾ × ⋯ × S⁽ᵐ⁾ × T, we give the following definition:
Definition 4 (Promising moves). The set of promising moves over guessed
function S wrt constraint i is defined by the following RA query:
Prom_i^S = π_{D,C}(σ_{γ ∧ ¬θ′ᵢ}(Vᵢ × D × C)),    (2)
where γ = ⋁ⱼ₌₁ᵐ (D.$1 = Vᵢ.$(2j−1)) and θ′ᵢ is obtained from θᵢ by replacing
any atom a in which S⁽ˣ⁾.$2 occurs (x ∈ [1..m]) with ((S⁽ˣ⁾.$1 = D.$1) →
a′) ∧ ((S⁽ˣ⁾.$1 ≠ D.$1) → a), where a′ is obtained from a by replacing any
occurrence of S⁽ˣ⁾.$2 with C.$1.</p>
      <p>Despite its apparent complexity, the meaning of formula (2) is quite
intuitive. A generic tuple ν in Vᵢ has the form ⟨d₁, c₁, …, dₘ, cₘ, t⟩, with
d₁, …, dₘ ∈ D and c₁, …, cₘ ∈ C, and t a sub-tuple over schema T. Prom_i^S
computes moves over S (hence pairs μ = ⟨d, c⟩ ∈ D × C, cf. the projection
operator) such that: (i) the domain value d of μ is equal to at least one
among d₁, …, dₘ of at least one violation ν ∈ Vᵢ (condition γ); and (ii) for
such a ν, the tuple ν′, built by replacing in ν any component cⱼ with the new
value c iff the corresponding dⱼ is equal to d, does not satisfy condition θᵢ.
Since in RA we cannot modify tuples when evaluating a condition, in (2)
we take the alternative approach of changing the condition itself into θ′ᵢ.
Requirement (i) guarantees that the synthesised moves involve only domain
values that appear in violations ν ∈ Vᵢ and hence, if performed, resolve at least
ν. Requirement (ii), instead, ensures that, after performing μ, the newly
introduced tuple in S, i.e., ⟨d, c⟩, does not lead to a new violation ν′, at least
with tuple t. The following result holds (proofs omitted for space reasons):
Theorem 1. In any state Σ, for any constraint i, and for any S ∈ S having
extension S in Σ, all moves over S that, if executed from Σ, are improving
wrt constraint i belong to Prom_i^S.</p>
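      <p>The effect of promising moves can be replayed on a small colouring-style constraint with m = 2 occurrences of the guessed function (an illustrative Python loop; in the system the single RA query (2) is evaluated by the DBMS instead):</p>

```python
from itertools import product

# One guessed function S : D -> C; the constraint's violations are
# edge-linked domain pairs mapped to the same codomain value (toy data).
D, C = ["a", "b", "c", "d"], ["red", "green", "blue"]
E = {("a", "b"), ("b", "c")}
S = {"a": "red", "b": "red", "c": "red", "d": "blue"}

def violations(S):
    # V_i, the tuples <d1, c1, d2, c2> returned by fail_i in the current state
    return {(u, S[u], v, S[v]) for (u, v) in E if S[u] == S[v]}

def promising(S):
    """Promising moves <d, c>: d occurs in some violation (requirement (i)),
    and substituting c for d's value falsifies the violating condition on
    that tuple (requirement (ii), the role of the rewritten condition)."""
    V = violations(S)
    prom = set()
    for d, c in product(D, C):
        for (d1, c1, d2, c2) in V:
            if d not in (d1, d2):
                continue                           # (i) fails for this tuple
            new1, new2 = (c if d1 == d else c1), (c if d2 == d else c2)
            if new1 != new2:                       # (ii): violation resolved
                prom.add((d, c))
    return prom

prom = promising(S)
print(sorted(prom))
```

      <p>Every move over d is filtered out, since d occurs in no violation, while every improving move, such as recolouring b, is retained, as Theorem 1 guarantees.</p>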
      <p>In other words, since a move is improving iff it is improving wrt at least
one constraint, by computing, in the current state, Prom_i^S for all guessed
functions S and all constraints i, we obtain a subset of the possible moves
that contains all the improving moves, filtering out many (hopefully most)
of the others. To better appreciate the advantages of this computation, let
us reconsider the University timetabling problem.</p>
      <p>Example 5 (University timetabling, continued). Assume that the problem instance refers to periods p1–p40, rooms r1, r2, r3 with capacities of 30, 40, 50 students, respectively, and courses c1–c7. The number of students enrolled in the various courses is given by view Audience in Figure 2(a). Suppose now that, in the current state, whose extension for TT is partially given in Figure 2(b), the expression for con2 evaluates to the set of violations Vcon2 of Figure 2(c). Tuples in Vcon2 give us precious information to drive the search. As an example, the first tuple shows that the current candidate timetabling has the problem that course c6 is given in period p2 in a room, r2, which is too small. Analogously for the other tuples. The query for Prom^TT_con2 (once generalised to handle domain and co-domain relations of arity greater than 1) returns the set of moves that resolve at least one violation in Vcon2 without producing a new one, at least with the same tuples of table Room and view Audience. Figure 2(d) shows the part of the result of Prom^TT_con2 containing the moves synthesised from the first violation. Note that, with the naive approach, p · r · c = 40 · 3 · 7 = 840 moves would need to be considered. With the presented dynamic reduction, instead, at most 7 moves for each violation of con2 need to be considered. Hence, the more progressed the greedy search, and the closer we are to a solution, the fewer the moves that need to be considered.</p>
      <p>DNR can be exploited also when we do not need knowledge of all improving moves. Consider, e.g., min-conflicts: at any iteration this algorithm randomly selects one guessed function S : D → C and one of its domain values d ∈ D, and then chooses the best improving move among all those regarding S and d. Given that, for a move to be improving, it needs to resolve at least one violation of one constraint, we can design a violation-directed (and generalised) version of min-conflicts in such a way that it first selects a random violation ν ∈ ⋃_{i=1}^{k} Vi, and then, for each pair of components ⟨dj, cj⟩ ∈ ν caused by an occurrence of any guessed function Sj : Dj → Cj in the definition of the corresponding constraint, it considers as promising all moves of the kind ⟨Sj, dj, c′⟩, c′ ∈ Cj, c′ ≠ cj, i.e., all moves that would make ν disappear in the target state. We omit a formal discussion of this variation for space reasons, but we will describe some preliminary experimental results in Section 5.
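      <p>The violation-directed variant can be sketched in a few lines. The following is a simplified in-memory version on graph colouring (not the paper's SQL implementation): the violated pairs ⟨d, c⟩ are read off the current violations, and d is reassigned to the cost-minimising co-domain value; all function names are illustrative.</p>
      <preformat>
```python
import random

def violated_pairs(edges, col):
    """Pairs (node, colour) occurring in some violation (monochromatic edge)."""
    return [(n, col[n]) for u, v in edges if col[u] == col[v] for n in (u, v)]

def cost(edges, col):
    """Number of violations of the colouring constraint."""
    return sum(1 for u, v in edges if col[u] == col[v])

def min_conflicts(edges, colours, col, max_iters=200, seed=0):
    """Violation-directed min-conflicts: pick a random violated pair (d, c),
    then apply the best of the promising moves (d, c2) with c2 != c."""
    rng = random.Random(seed)
    for _ in range(max_iters):
        pairs = violated_pairs(edges, col)
        if not pairs:
            return col                      # no violations left: solved
        d, c = rng.choice(pairs)            # random violation component
        col[d] = min((c2 for c2 in colours if c2 != c),
                     key=lambda c2: cost(edges, {**col, d: c2}))
    return None

# a triangle needs three colours; start from the all-red state
sol = min_conflicts([(1, 2), (2, 3), (1, 3)],
                    ["red", "blue", "green"],
                    {1: "red", 2: "red", 3: "red"})
```
      </preformat>
      <p>Every move considered resolves the randomly chosen violation, so only the co-domain values of one domain element are evaluated per iteration.</p>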
</p>
      <p>Incremental evaluation of constraints (IEC). Although DNR allows us to reduce the set of moves that need to be considered during a greedy search, it tells us nothing about the quantitative impact (in terms of cost change) of the surviving moves on constraint i and on the other constraints. To obtain this information, the naive algorithm would execute each such move, evaluate all problem constraints over the new database, and then roll back.</p>
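      <p>This naive evaluate-and-roll-back loop is straightforward to express on top of any transactional DBMS. The sketch below uses SQLite and the graph-colouring constraint as a stand-in; names are illustrative.</p>
      <preformat>
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None          # manage transactions explicitly
cur = con.cursor()
cur.executescript("""
CREATE TABLE edge(u, v);
CREATE TABLE col(node PRIMARY KEY, colour);
INSERT INTO edge VALUES (1,2),(2,3);
INSERT INTO col VALUES (1,'red'),(2,'red'),(3,'blue');
""")

# total cost: number of monochromatic edges
COST = """SELECT count(*) FROM edge e
          JOIN col cu ON cu.node = e.u
          JOIN col cv ON cv.node = e.v
          WHERE cu.colour = cv.colour"""

def naive_move_cost(d, c):
    """Execute move (d, c), re-evaluate every constraint, then roll back."""
    cur.execute("BEGIN")
    cur.execute("UPDATE col SET colour = ? WHERE node = ?", (c, d))
    (cost,) = cur.execute(COST).fetchone()
    cur.execute("ROLLBACK")
    return cost

costs = {(d, c): naive_move_cost(d, c)
         for d in (1, 2, 3) for c in ("red", "blue")}
```
      </preformat>
      <p>Every candidate move triggers a full re-evaluation of all constraints over the database, which is exactly the per-move cost that IEC is designed to avoid.</p>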
      <p>In the important and usual case of constraints of the form (1), the following results show that: (i) we can compute the exact cost variation upon all moves in a given set by running only 2 queries for each constraint; and (ii) after one such move has been chosen, we can update incrementally its impact in terms of violations added and resolved for all the constraints.</p>
      <p>Definition 5 (Resolved^S_i and Added^S_i). Given an NP-Alg expression ⟨S, fail⟩, one of the guessed functions S ∈ S, and one sub-expression fail_i having the form (1) and evaluating to Vi in the current state, let MS be a relation over the schema of S encoding an arbitrary set of moves over S. We define the queries:</p>
      <p>Resolved^S_i = σ_{θ′} (MS × Vi);</p>
      <p>Added^S_i = μ σ_{θ′′ ∧ φ′_i} (MS × (S(1) × ⋯ × S(m) × T));</p>
      <p>where θ′ = ⋁_{j=1}^{m} (MS[$1] = Vi[$(2j−1)]), θ′′ = ⋁_{j=1}^{m} (MS[$1] = S(j)[$1]), and φ′_i is as in Definition 4. The μ operator in Added^S_i is a non-standard replacement operator (that could easily be simulated in sql by means of functions and if statements in the target-list) that changes each tuple τ of the input expression into μ(τ). Here, μ(τ) produces tuple τ′ as follows: for all i ∈ [1..m], τ′[S(i).$1, S(i).$2] = τ[MS.$1, MS.$2] if τ[S(i).$1] = τ[MS.$1], and τ′[S(i).$1, S(i).$2] = τ[S(i).$1, S(i).$2] otherwise, while for any other component $j of τ we have that τ′[$j] = τ[$j].</p>
      <p>These queries play a key role in the incremental evaluation of constraints:</p>
      <p>Theorem 2 (Incremental evaluation of constraints). Given S, Vi and MS as in Definition 5, in the state reached after performing any move μ ∈ MS from the current state, the expression fail_i for each constraint i evaluates to:</p>
      <p>(Vi ∖ π_{−MS} σ_{MS = μ} (Resolved^S_i)) ∪ π_{−MS} σ_{MS = μ} (Added^S_i),</p>
      <p>with π_{−MS} denoting the projection operator that filters out the columns due to MS. Also, Resolved^S_i ∩ Added^S_i = ∅.</p>
      <p>Theorem 2 tells us that, given an arbitrary set of moves over a guessed function S that occurs in the expression fail_i, we can compute incrementally the result of fail_i in all the states reached by performing each of these moves, given its result in the current state. In particular, by taking MS = Prom^S_i, we can compute the exact impact, both in terms of cost variation and in terms of the extension of Vi, of all promising moves.</p>
      <p>Query Resolved^S_i returns a set of tuples, each representing a move μ = ⟨d, c⟩ in MS concatenated with a violation ν in Vi in which value d occurs in the first column of at least one S(i). Changing the co-domain value associated to d into c will make this violation disappear from Vi in the target state. As for Added^S_i, it computes the contributions, in terms of tuples added to the result of query fail_i, of the execution of each move μ ∈ MS. Such contributions are the tuples of the Cartesian product in which value d occurs at least once (condition θ′′), ‘modified’ by properly taking into account the new value c that the moves under consideration assign. Again, since in RA we cannot modify tuples under evaluation, we take the dual approach of changing the condition from φ_i to φ′_i. However, since we also must return the modified tuples, we need the non-standard μ operator, which changes the tuples of the input relations in the same way as done by φ′_i.</p>
      <p>Moreover, given that Resolved^S_i ∩ Added^S_i = ∅, just by grouping and counting the tuples in those results, we can compute with one additional query the exact cost of all moves in set MS over the constraints. Also, if all the Vi are materialised, we obtain incremental maintenance of the results of the corresponding fail_i by deleting and inserting the tuples returned by these queries.</p>
      <p>Example 6 (University timetabling, continued). Figure 3 shows the results of computing queries Resolved^TT_con2 (part (a)) and Added^TT_con2 (part (b)) for the subset of moves in Prom^TT_con2 explicitly shown in Figure 2(d). As an example, move (p2, r2, c1) will reduce the cost of con2 by 1, since it occurs once in Resolved^TT_con2 (see Figure 3(a)) and it does not occur in Added^TT_con2 (see Figure 3(b)). Analogously, move (p2, r2, c5) proves to be neutral wrt con2.</p>
      <p>Some brief comments follow on the complexity of the three queries. For each guessed function S and constraint i, and assuming that Vi is materialised and incrementally updated thanks to the results of Theorem 2, we have that: (i) Prom^S_i can be computed with two joins, the first of which, given the tight condition θ, does not lead to an explosion of the number of tuples (any tuple in Vi can match at most m tuples of D, with m being usually very small); (ii) Resolved^S_i involves a single join operation; (iii) Added^S_i involves one more join operation than the expression of constraint i. Also in this case, the presence of the tight condition θ′′ avoids in practice the combinatorial explosion of the number of tuples due to the added join.</p>
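      <p>The grouping-and-counting step can be sketched in sql. For the graph-colouring constraint, Resolved pairs each move with the current violations it touches (condition θ′), Added re-derives the touched product tuples under the rewritten condition (θ′′ and φ′, with CASE expressions standing in for the non-standard μ operator), and a GROUP BY yields the per-move counts whose difference is the exact cost variation. The sketch runs through SQLite; all names are illustrative.</p>
      <preformat>
```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE moves(d, c);           -- MS: candidate moves over Col
CREATE TABLE edge(u, v);
CREATE TABLE col(node PRIMARY KEY, colour);
INSERT INTO edge VALUES (1,2),(2,3);
INSERT INTO col VALUES (1,'red'),(2,'red'),(3,'blue');
INSERT INTO moves VALUES (1,'blue'),(2,'blue');
-- current violations: monochromatic edges
CREATE VIEW v AS
  SELECT e.u AS d1, cu.colour AS c1, e.v AS d2, cv.colour AS c2
  FROM edge e JOIN col cu ON cu.node = e.u
              JOIN col cv ON cv.node = e.v
  WHERE cu.colour = cv.colour;
""")

# Resolved: each move joined with the violations containing its domain value
resolved = """SELECT m.d, m.c, count(*) FROM moves m, v
              WHERE m.d IN (v.d1, v.d2)
              GROUP BY m.d, m.c"""
# Added: touched product tuples that still violate once the move is applied
added = """SELECT m.d, m.c, count(*)
           FROM moves m, edge e
                JOIN col cu ON cu.node = e.u
                JOIN col cv ON cv.node = e.v
           WHERE m.d IN (e.u, e.v)
             AND (CASE WHEN e.u = m.d THEN m.c ELSE cu.colour END) =
                 (CASE WHEN e.v = m.d THEN m.c ELSE cv.colour END)
           GROUP BY m.d, m.c"""

delta = {}                          # exact cost variation per move
for d, c, n in cur.execute(resolved):
    delta[(d, c)] = delta.get((d, c), 0) - n
for d, c, n in cur.execute(added):
    delta[(d, c)] = delta.get((d, c), 0) + n
```
      </preformat>
      <p>On this instance the move (1, blue) is improving (delta -1), while (2, blue) merely trades the violation on edge (1, 2) for one on edge (2, 3) (delta 0), mirroring the situation of Example 6.</p>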
    </sec>
    <sec id="sec-5">
      <title>Implementation, experiments, future work</title>
      <p>We implemented a new ConSQL solver based on the ideas presented in
Section 4, and built on top of a relational DBMS. The system, written in Java,
takes as input a problem specification in ConSQL and uses standard sql
commands for choosing, evaluating, and performing moves, hence being
independent of the DBMS used. The user can also annotate her specification with
a declarative description of the search strategy to be used, given in terms of
the local-search algorithms (and parameters) to run (gradient descent,
steepest descent, tabu search, min-conflicts, simulated annealing) and their
composition (via random restarts or batches). Details are given in the appendix.</p>
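      <p>As a sketch of how such compositions behave, the loop below shows the random-restart wrapper pattern in Python; search and initial are illustrative placeholders for a greedy run and a state initialiser, not the solver's actual API.</p>
      <preformat>
```python
import random

def with_restarts(search, initial, restarts=5, seed=0):
    """Compose a local-search routine via random restarts: run it from a
    fresh starting state each time and keep the best result found."""
    rng = random.Random(seed)
    best_state, best_cost = None, float("inf")
    for _ in range(restarts):
        state, cost = search(initial(rng), rng)
        if best_cost > cost:
            best_state, best_cost = state, cost
        if best_cost == 0:
            break                   # all constraints satisfied: stop early
    return best_state, best_cost

# toy demo: starting states 1, 2, 3, ...; "cost" is the distance from 3
starts = iter(range(1, 100))
best = with_restarts(lambda s, rng: (s, abs(s - 3)),
                     lambda rng: next(starts))
```
      </preformat>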
      <p>In order to test the effectiveness of the techniques above on the
scalability of the approach, we performed preliminary experiments, evaluating
the performance gain of DNR+IEC with respect to the naive evaluation of
moves and constraints. It is worth observing that our purpose was not to
beat state-of-the-art solvers (and in fact we do not even comment on
satisfaction), but rather to seek answers to the following questions: what is
the impact of DNR+IEC on: (i) the reduction of the number of moves to
evaluate; (ii) the overall performance of the greedy part of the search.</p>
      <p>We experimented with two problems, Graph colouring and University
timetabling, and two different greedy algorithms: steepest descent and the
violation-directed version of min-conflicts mentioned in Section 4. The
reasons behind our choices are as follows: steepest descent needs to evaluate,
at every iteration, all possible moves, hence it is a good indicator of the
impact of DNR+IEC in terms of effectiveness in neighbourhood reduction and
the consequent performance gain. On the other hand, min-conflicts needs
to evaluate a much smaller set of moves at any step: hence it can be much
more efficient than the former, especially on large scale instances. However,
we will see that also in this case DNR+IEC may play an important role.</p>
      <p>Experiments have been performed on an Athlon64 X2 Dual Core 3800+
PC with 4GB RAM, using MySQL DBMS v. 5.0.75, with no particular
optimisations. Instances of both problems have been solved running both
algorithms with and without DNR+IEC. In order to perform fair comparisons,
the two runs of the same algorithm on the same instance have been executed
by using the same random seed. As for steepest descent, this implies that
the two runs started from the same state, and executed the same sequence of
moves, terminating at the same local minimum. This does not hold for
min-conflicts, since when IEC is not activated, constraints are not materialised,
and the order with which tuples are retrieved cannot be controlled without
paying a very high price in terms of efficiency.</p>
      <p>
        As for the choice of problems (specifications are given in the appendix)
and instances: the specification for graph colouring is very simple and consists
of one guessed column-set and one constraint (see Example 3). Given its
compactness, it gives clean information about the impact of our techniques
on a per-constraint basis. We used some quite large benchmark instances
taken from http://mat.gsia.cmu.edu/COLOR/instances.html, looking for
minimal colourings, with number of needed colours known in advance. As
for University timetabling, in order to deal with real world instances, we
changed our formulation to that of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where schedulings are periodic, and
some data preprocessing is already done (e.g., constraint about equipment is
already reduced to a relation listing rooms suitable for the various courses).
Also, we had to ignore some constraints which would need non-conjunctive
queries, not yet supported by DNR+IEC. The final formulation resulted in
8 ConSQL constraints (taking into account course lengths, room
suitability, teachers’ periodic unavailabilities, conflicts among courses with common
students, and minimum course spread). As for the instances, we used some
of those of Track 3 of the 2007 Intl. Timetabling Competition (http://www.
cs.qub.ac.uk/itc2007), taken from http://tabu.diegm.uniud.it/ctt.
      </p>
      <p>Some typical-case results are reported in Table 2. Enabling DNR+IEC
considerably boosted performance: impressive speed-ups were experienced
on all instances, especially when all moves need to be evaluated (steepest
descent). Also in the cases where DNR+IEC runs did not terminate within the
1-hour timeout, the number of iterations executed is always much higher than
with DNR+IEC disabled. However, DNR+IEC brings advantages also when
complete knowledge of the possible moves is not needed (min-conflicts),
although speed-ups are unsurprisingly lower. Finally, we observed (see
Figure 4) the strong expected reductions of the size of the neighbourhood, thanks
to the action of DNR (steepest descent). In particular, DNR is able to filter
out often more than 30% of the moves at the beginning of the search, and constantly
more than 80% (with very few exceptions) when close to a local minimum.</p>
      <p>Given the promising, even if preliminary, results, we are working on extending DNR and IEC to constraints defined by queries with aggregates and groupings, on evaluating them with other heuristics, and on investigating database optimisations, e.g., automated index generation, in order to execute queries as efficiently as possible and tackle much larger scenarios, such as those mentioned in Section 1.</p>
      <p>Acknowledgements. This work has been partly carried out during visits of
T. Mancini to Uppsala University, supported by the Swedish Foundation for
Intl Cooperation in Research and Higher Education (STINT, grant
KG20034723) and Blanceflor Foundation. P. Flener and J. Pearson are supported
by grant 2007-6445 of the Swedish Research Council (VR).</p>
      <p>The authors thank the anonymous reviewers for their comments and suggestions.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of Databases. Addison Wesley</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ågren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flener</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearson</surname>
          </string-name>
          .
          <article-title>Revisiting constraint-directed search</article-title>
          .
          <source>Information and Computation</source>
          ,
          <volume>207</volume>
          (
          <issue>3</issue>
          ):
          <fpage>438</fpage>
          -
          <lpage>457</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cadoli</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mancini</surname>
          </string-name>
          .
          <article-title>Combining Relational Algebra, sql, Constraint Modelling, and local search</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          &amp;2):
          <fpage>37</fpage>
          -
          <lpage>65</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>De Cesco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Gaspero</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schaerf</surname>
          </string-name>
          .
          <article-title>Benchmarking curriculum-based course timetabling: Formulations, data formats, instances, validation, and results</article-title>
          .
          <source>In Proc. of PATAT'08</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Garey</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Computers and Intractability: A Guide to the Theory of NP-Completeness</article-title>
          .
          <string-name>
            <given-names>W.H.</given-names>
            <surname>Freeman</surname>
          </string-name>
          &amp; Co.,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Stützle</surname>
          </string-name>
          .
          <article-title>Stochastic Local Search, Foundations and Applications</article-title>
          . Elsevier/Morgan Kaufmann,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Ternovska</surname>
          </string-name>
          .
          <article-title>A framework for representing and solving NP search problems</article-title>
          .
          <source>In Proc. of AAAI</source>
          <year>2005</year>
          , pages
          <fpage>430</fpage>
          -
          <lpage>435</lpage>
          ,
          <year>2005</year>
          . AAAI Press/MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          . Computational Complexity.
          <source>Addison Wesley</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>A survey of deductive database systems</article-title>
          .
          <source>J. of Logic Programming</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>125</fpage>
          -
          <lpage>149</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schaerf</surname>
          </string-name>
          .
          <article-title>A survey of automated timetabling</article-title>
          .
          <source>Artif. Intell. Review</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ):
          <fpage>87</fpage>
          -
          <lpage>127</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Terracina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Panetta</surname>
          </string-name>
          .
          <article-title>Experimenting with recursive queries in database and logic programming systems</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>129</fpage>
          -
          <lpage>165</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Hentenryck</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Michel</surname>
          </string-name>
          .
          <article-title>Constraint-Based Local Search</article-title>
          . MIT Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rangan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Looks</surname>
          </string-name>
          .
          <article-title>Backbone guided local search for maximum satisfiability</article-title>
          .
          <source>In Proc. of IJCAI</source>
          <year>2003</year>
          , pages
          <fpage>1179</fpage>
          -
          <lpage>1186</lpage>
          ,
          <year>2003</year>
          . Morgan Kaufmann.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>