<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the International Workshop on Artificial Intelligence for Climate Change, the Italian Workshop on Planning
and Scheduling, the RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion,
and the SPIRIT Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy (AI4CC-IPS-RCRA-SPIRIT 2024)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Information Flows during Constraint Satisfaction Search with Dynamic Heuristics: Information from Preprocessing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Richard J. Wallace</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics, School of Computer Science and IT, University College Cork</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>5</fpage>
      <lpage>28</lpage>
      <abstract>
<p>We still do not have an adequate understanding of heuristic methods used for solving constraint satisfaction problems (CSPs). An example of this involves the effects of preprocessing, an essential means of improving CSP search. The traditional explanation for its beneficial effect is that the resulting “problem reduction” leaves fewer possibilities to explore. Recently, however, it was shown that when dynamic variable ordering heuristics are used, other factors related to domain size reduction are much more important. This paper extends this earlier work and clarifies what is involved. The key idea is that the pattern of domain reductions produced by preprocessing constitutes information that by itself enables heuristics to make better decisions. Treating domain reduction as a code and the set of reductions as a message of length n (the number of variables) transmitted to the search algorithm, we can distinguish two factors, message discriminability and code quality. We provide evidence that greater problem reduction enhances both.</p>
      </abstract>
      <kwd-group>
        <kwd>constraint satisfaction problems</kwd>
        <kwd>heuristics</kwd>
        <kwd>preprocessing</kwd>
        <kwd>search</kwd>
        <kwd>information theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A fundamental strategy for solving CSPs is to subject them to some form of preprocessing before
searching for a solution. For this purpose, one uses what are called local consistency techniques. These
are polynomial-time algorithms that test for limited forms of consistency within a problem [2].</p>
      <p>The best-known examples are based on “arc consistency” (AC) and “path consistency”. Arc consistency
algorithms test whether a given value or label for a given variable in the problem is supported by each
and every constraint that the variable is directly affected by. Path consistency algorithms test whether,
for every consistent pair of values for two variables, each path between those variables contains values
that support both. In both cases, the thinking is that if a value is not locally consistent, then it cannot
be part of any complete set of labelings that satisfy all the constraints in the problem, aka a solution.
Therefore, if we are searching for a solution, we don’t have to consider such values at all.</p>
      <p>In most accounts of constraint solving, it is assumed that this is the (sole) reason that preprocessing
is useful. That is, it reduces the number of alternatives that one must examine during search. In other
words, preprocessing reduces the search space.</p>
      <p>The benefits of such “problem reduction" are summarized by Edward Tsang in an eponymous chapter
of his textbook ([3], pp. 79-80):</p>
      <p>The following are possible gains from problem reduction when combined with searching:
(1) Reducing the search space. Since the size of the search space is measured by the grand
product of all the domain sizes in the problem, problem reduction can help to reduce the
search space by reducing the domain sizes. (2) Avoiding repeatedly searching futile subtrees.
... If redundant values and redundant compound labels can be removed through problem
reduction, then one can avoid repeatedly searching those futile subtrees. (3) Detecting
insoluble problems. (etc.)</p>
      <p>Because they do not bear on the issues dealt with in this paper, we will disregard points (2) and (3). This
general view is propounded in later texts [4] and other references [5].</p>
      <p>However, recent work has shown that for many problems other factors related to preprocessing and
domain reduction are sometimes more significant for search efficiency than reduction of the search
space [6, 7]. This is because domain reduction as such can guide dynamic variable ordering heuristics
in their decision making. This implicit guidance occurs because these heuristics take current domain
size into account, and this can have an enormous effect on search efficiency. In contrast, while search
space reduction is ubiquitous and important, often it gives only modest degrees of improvement.</p>
      <p>In the initial work [6], it was suggested that some form of information theoretic analysis might be
useful. This idea was elaborated in later work [7], where a simple framework was proposed in which
the set of domain reductions is treated as a code of length n, where n is the number of variables in the
problem. In addition, that work suggested two ways in which preprocessing can improve search: by
increasing the number of differences among variables with respect to the heuristic property, which was
called “discriminability”, and by enhancing the quality of information conveyed to the dynamic search
heuristic.</p>
      <p>The present paper extends this work in several respects. First, it presents a more systematic account
of both discriminability and code quality. Secondly, it demonstrates that stronger forms of consistency
preprocessing do enhance the discriminability of variables. It also shows that some older findings on
anomalous effects of preprocessing fall into the present framework.</p>
      <p>Considered more generally, this work bears on the question of why ordering heuristics in constraint
satisfaction search are effective. Unfortunately, despite the essential role of ordering heuristics in
constraint satisfaction search, there is still no systematic analysis that explains how they work and why
one heuristic performs better than another on a given problem. So when a new heuristic is proposed
(e.g. [8] [9] [10]) it is often difficult to determine the basis for its superiority or to predict the quality of
its performance in different contexts.</p>
      <p>In recent years, there has been some progress in developing a comprehensive framework based
on decision analysis [11]. However, as the present work shows, there are still important gaps in our
knowledge. More specifically, this work shows that a decision-theoretic approach must be complemented
by a deeper look at the information that is utilized in decision making in this context.</p>
      <p>The present paper is organized as follows. Section 2 provides basic concepts and definitions. Section
3 describes the experimental environment used in this work, including algorithms, problems, and
implementation. Section 4 describes the empirical findings that gave rise to the present thesis. Section 5
presents a resolution of a well-known ‘anomalous’ result. Section 6 clarifies the nature of the information
transmitted. Section 7 lays out a framework for characterizing information transmission in this context
and the ways that it affects search effort. Section 8 gives conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background Concepts</title>
      <p>Here, we review some of the basic concepts in the field that form the background of the present work.</p>
      <p>A constraint satisfaction problem (CSP) involves assigning values to a set of variables subject to
restrictions on the way that values can go together. More formally, a CSP can be defined as a tuple
(X, D, C) where:
X is a set of variables, x_1, . . . , x_n,
D is a set of domains, D_i, where each D_i is the set of values that may be assigned to variable x_i,
C is a set of constraints. Each c belonging to C consists of a relation R and a particular subset of the
variables in X, called the scope of the constraint. R is based on the Cartesian product of the
domains of the variables in the scope.</p>
      <p>A solution to a CSP is an assignment or mapping from variables to values, S =
{(x_1, v_1), (x_2, v_2), . . . , (x_n, v_n)}, that includes all variables (|S| = n) and satisfies all the constraint relations
in C.</p>
      <p>In the present paper we only consider backtracking methods for finding solutions to CSPs. These
methods build partial solutions that are consistent, extending them with further assignments until a
full solution is obtained. If at some point in this process the partial solution cannot be extended, then
search ‘backs up’ by retracting assignments and replacing them with new ones before attempting to
extend the partial solution.</p>
      <p>Backtrack search is typically enhanced by interleaving local consistency with search decisions, to
form “hybrid search algorithms". Typically, local consistency is done after each new value assignment
in order to simplify the problem further before proceeding to the next decision.</p>
      <p>The eficiency of backtrack search is further enhanced by using heuristics (rules of thumb based on
maximizing or minimizing some problem feature) to choose the next variable or value to add to the
partial solution. A well-known heuristic (mindom) chooses a variable with the smallest domain size;
another (domain over forward degree or d/fd) extends this idea by choosing the variable for which
domain size divided by the number of adjacent variables among the still uninstantiated set is minimized.</p>
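      <p>To make these selection rules concrete, the following is a minimal Python sketch of both heuristics
(not the implementation used in this work); the structures domains, neighbours, and unassigned are
illustrative assumptions.</p>
      <preformat>
def mindom(domains, unassigned):
    """Choose an unassigned variable with the smallest current domain."""
    return min(unassigned, key=lambda v: len(domains[v]))

def dom_over_fdeg(domains, neighbours, unassigned):
    """Choose the variable minimizing domain size / forward degree, where the
    forward degree counts only neighbours that are still unassigned."""
    def score(v):
        fdeg = sum(1 for u in neighbours[v] if u in unassigned)
        return len(domains[v]) / max(fdeg, 1)  # guard for isolated variables
    return min(unassigned, key=score)
</preformat>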
    </sec>
    <sec id="sec-3">
      <title>3. Algorithms and Testing</title>
      <sec id="sec-3-1">
        <title>3.1. Search algorithms</title>
        <p>The search algorithm used in this paper is called maintained arc consistency (MAC) [12]. In this version
of hybrid backtracking, following each new assignment, AC is re-established in the subproblem formed
by the variables not yet given an assignment. If this AC fails, then the last value assigned is retracted; if
there are no more values to test for the last variable chosen, then search backs up and tries another value
for the next-to-last variable chosen. This process continues until either a complete set of consistent
assignments is found (aka a solution) or there are no more values to test at the highest level, which
means there are no solutions.</p>
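        <p>The following Python sketch outlines this scheme for binary CSPs, under simplifying assumptions:
whole-domain copy-and-restore rather than the fine-grained restoration a real solver would use, an
assumed predicate check(x, a, y, b) that returns True if value a for x is compatible with value b for y,
and a variable ordering heuristic select such as mindom above. All names are illustrative.</p>
        <preformat>
from collections import deque

def revise(domains, check, x, y):
    """Remove values of x unsupported by y; return True if any were removed."""
    removed = False
    for a in list(domains[x]):
        if not any(check(x, a, y, b) for b in domains[y]):
            domains[x].remove(a)
            removed = True
    return removed

def ac3(domains, neighbours, check):
    """Re-establish arc consistency; return False on a domain wipeout."""
    queue = deque((x, y) for x in domains for y in neighbours[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, check, x, y):
            if not domains[x]:
                return False
            queue.extend((z, x) for z in neighbours[x] if z != y)
    return True

def mac(domains, neighbours, check, select, assignment=None):
    """Return a solution dict or None; d-way branching, lexical value order."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)
    unassigned = [v for v in domains if v not in assignment]
    var = select(domains, unassigned)
    for value in sorted(domains[var]):
        saved = {v: set(domains[v]) for v in domains}   # copy for restoration
        domains[var] = {value}
        assignment[var] = value
        if ac3(domains, neighbours, check):             # propagate after assignment
            result = mac(domains, neighbours, check, select, assignment)
            if result is not None:
                return result
        del assignment[var]
        for v in domains:                               # undo deletions on failure
            domains[v] = saved[v]
    return None
</preformat>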
        <p>Forward checking is a more circumscribed form of hybrid backtrack search [13]. In this case, after
each assignment, domains of variables sharing a constraint with the current variable are checked for
consistency with the value just assigned before proceeding to the next assignment.</p>
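        <p>A corresponding sketch of the forward checking step, under the same illustrative assumptions:</p>
        <preformat>
def forward_check(domains, neighbours, check, unassigned, var, value):
    """Prune values inconsistent with the new assignment (var = value) from
    the domains of unassigned neighbours; return False on a domain wipeout."""
    for y in neighbours[var]:
        if y in unassigned:
            domains[y] = {b for b in domains[y] if check(var, value, y, b)}
            if not domains[y]:
                return False
    return True
</preformat>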
        <p>The dynamic variable ordering heuristics used in these experiments were mindom and d/fd. By
dynamic, we mean that the heuristic takes account of the current state of the problem at each stage of
search. Mindom is a purely dynamic heuristic, in that it relies on a single property (domain size) that is
affected by preprocessing as well as local consistency performed during search. It is also a fairly weak
heuristic, which limited the difficulty of the problems that could be used in the experiments. d/fd was
also used to determine if the same results would be found for a compound heuristic where dynamic
domain size is one component.</p>
        <p>The static variable ordering heuristic used in these experiments is the maximum (static) degree
heuristic. This selects variables on the basis of their degree in the constraint graph (i.e. the number of
constraints a variable is involved in), choosing the unassigned variable with the highest degree. In this
case the feature used by the heuristic remains the same regardless of how much certain domains are
reduced during preprocessing.</p>
        <p>In these experiments, MAC was performed with d-way branching. Unless otherwise noted, domain
values were tested in lexical order.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing algorithms</title>
        <p>The experiments to be described involved comparisons of search effort after different amounts of
problem reduction. In addition to AC itself, this work made use of certain forms of singleton arc
consistency, or SAC [14]. SAC is a form of AC in which a value a is considered the sole representative
of the domain of its variable x. If AC cannot be established for the entire problem under this condition,
then there can be no solution with this value, so it can be discarded. If this condition can be established
for all values in problem P, then the problem is singleton arc consistent.</p>
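        <p>In outline, a deliberately naive, SAC-1-style singleton test can be sketched as follows, reusing the
ac3 routine from the earlier sketch; efficient SAC algorithms [14] are considerably more refined.</p>
        <preformat>
import copy

def sac_prune(domains, neighbours, check):
    """Delete every value a whose singleton assumption domains[x] = {a}
    makes AC fail; repeat until no further deletions occur."""
    changed = True
    while changed:
        changed = False
        for x in list(domains):
            for a in list(domains[x]):
                trial = copy.deepcopy(domains)
                trial[x] = {a}
                if not ac3(trial, neighbours, check):
                    domains[x].remove(a)
                    changed = True
    return all(domains.values())   # False if some domain was wiped out
</preformat>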
        <p>A related form of consistency is called neighbourhood singleton arc consistency, or NSAC [15]. NSAC
algorithms establish SAC with respect to the neighbourhood of the variable whose domain is a singleton
as opposed to the entire problem.</p>
        <p>For our purposes, the important fact about these algorithms is that with respect to values deleted,
they form a series, where AC is dominated by NSAC, which in turn is dominated by SAC.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Problems used</title>
        <p>The goal of these experiments was to examine the effects of problem reduction on search effort with
different heuristics. To this end, certain requirements had to be met regarding problem features and
algorithm performance: (i) problems had to have variables with different numbers of constraints
(different degrees in the constraint graph), (ii) more restrictive forms of preprocessing had to have
greater effect (more values removed), and for the same reason, (iii) there had to be differences in search
effort (number of search nodes) following different degrees of problem reduction. Care was also taken
to ensure that more stringent forms of preprocessing did not reduce problems so much that they were
trivially solvable; in such cases positive results would also be trivial. In addition, all problems had
solutions. This was done to avoid marked differences in number of values deleted and cases of zero
search nodes, which would render the present analysis meaningless or at least problematic.</p>
        <p>The following types of problems were used, which in each case met these requirements:
∙ “Geometric problems” whose values had varying degrees of support in different constraints (see
[6] for further details),
∙ Random “relop problems” whose constraints were relational operators,
∙ Radio Link Frequency Allocation Problems (RLFAPs) derived from a benchmark problem by
altering the tightness of a randomly chosen set of greater-than constraints and randomly swapping
adjacent variables within constraint pairs,
∙ Driverlog benchmark problems.</p>
        <p>Geometric problems are generated by selecting points at random within the unit square to represent
variables, and adding constraints between those whose Euclidean distance is less than some criterion,
called the ‘distance’ [16]. In the present work, in addition, if there is more than one connected component,
separate components are connected via pairs of variables (one in each component) having the smallest
Euclidean distance between their points in the unit square, in order to make a graph with a single
connected component.</p>
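        <p>A sketch of this generation scheme, omitting the component-joining step, might look as follows
(function and parameter names are illustrative):</p>
        <preformat>
import math
import random
from itertools import combinations

def geometric_csp_graph(n, distance, seed=0):
    """n points drawn uniformly in the unit square, with an edge between
    any pair whose Euclidean distance falls below `distance`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) &lt; distance]
    return points, edges
</preformat>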
        <p>Two sets of problems were tested having either 80 or 40 variables, with constant domain sizes 15
and 12, respectively. To limit variation in graph density (proportion of possible edges in the constraint
graph), only problems within a small range around a “target" were accepted; for the 80-variable problems
the target was 380 (± 5) constraints; for the 40-variable problems, 160 (± 3).</p>
        <p>Relop problems had 50 variables, domain size 15 and a graph density of 0.4. The binary constraints
were either ≠, &gt;, or ≥, chosen randomly according to the ratios 70:5:25.</p>
        <p>Radio Link Frequency Allocation Problems (RLFAPs) are also binary problems, with two kinds of
distance (difference) constraint: |f(x) − f(y)| = d and |f(x) − f(y)| &gt; d, where x and y are
variables, f is an operator that assigns a value to a variable, and d is an integer. These problems had
100 variables with domain sizes of about 40, and constraint graphs of low density.</p>
        <p>In order to obtain a larger sample of RLFAPs with the same basic characteristics, the following method
was used. A single benchmark problem was selected. From this, a set of problems was generated by
taking the original problem and choosing 60% of the |f(x) − f(y)| &gt; d constraints at random and
decrementing the value of d in those constraints by about 20. In addition, 25 pairs of variables were
randomly chosen from variables numbered 1 to 50 along with adjacent variables whose labels differed
by no more than 7, and the latter were swapped to create different constraints.</p>
        <p>Driverlog problems were benchmarks from the University of Artois website.</p>
        <p>For all problem sets except the driverlog series, a large number of problems were generated and then
filtered for solutions (in a generate-and-test fashion). From each set, the hardest problems were chosen
for use in the present experiments. This ensured that the effects of different forms of preprocessing on
search were more pronounced than they would have been if random samples of problems had been
used, due to simple ceiling effects. In other words, if a problem is easy to solve under all conditions,
then deleting more values during preprocessing has very little effect, so factors that affect the relation
between preprocessing and search are harder to detect.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Spearman rank correlations among relative search reductions (dom, d/fd, deg) and relative domain reductions (del), NSAC versus AC.</p>
          </caption>
          <table>
            <thead>
              <tr><th>probs</th><th>dom vs del</th><th>d/fd vs del</th><th>deg vs del</th><th>dom vs deg</th><th>d/fd vs deg</th></tr>
            </thead>
            <tbody>
              <tr><td>geo80</td><td>–</td><td>0.294</td><td>0.590+</td><td>–</td><td>0.171</td></tr>
              <tr><td>geo40</td><td>0.598+</td><td>0.375*</td><td>0.850+</td><td>0.635+</td><td>0.685+</td></tr>
              <tr><td>relop50</td><td>-0.186</td><td>0.032</td><td>0.361+</td><td>-0.160</td><td>0.200</td></tr>
              <tr><td>RLFAP100</td><td>-0.149</td><td>-0.102</td><td>0.242*</td><td>0.153</td><td>0.000</td></tr>
              <tr><td>driver</td><td>-0.371</td><td>0.686</td><td>0.943*</td><td>-0.086</td><td>0.686</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>Notes. “dom”, “d/fd”, “deg” and “del” refer to fractions based on search nodes
and deletions. * p &lt; .05, + p &lt; .01, two-tailed.</p>
          </table-wrap-foot>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Implementation details</title>
        <p>Algorithms were implemented in Common Lisp, specifically Macintosh Common Lisp (MCL) version 5.1
running on an iMac (Mac OS X version 10.2.8) with a PowerPC 800MHz CPU. In these experiments,
search was discontinued if it exceeded one million nodes. In these cases, the cutoff value was used as
the search node number.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Previous Results: More than Problem Reduction</title>
      <p>This section presents some of the basic evidence that more is involved in the effects of preprocessing
on search than simple problem reduction, material presented at greater length in [7]. It is included here
to provide context for subsequent results.</p>
      <p>First we note that conditions for demonstrating this were met in these experiments. Thus, for all
problem sets, NSAC deleted appreciably more values than AC, and SAC usually deleted appreciably
more values than NSAC. In addition, search effort was always less after greater problem reduction.
(Data demonstrating these effects can be found in [7], Table 1.)</p>
      <p>The main finding in these experiments was that search with dynamic heuristics is usually not strongly
related to problem reduction, in contrast to search with a static search heuristic. This was demonstrated
by comparing performance after AC and NSAC, specifically, by testing the correlation between relative
problem reduction and relative search reduction for each set of problems.</p>
      <p>Relative reductions were assessed with these fractions: (NSAC − AC)/NSAC was used for total deletions
per problem, while (AC − NSAC)/AC was used for search nodes. Given the violations of assumptions for
parametric statistics, statistical tests were done with the non-parametric Spearman Rank Correlation
Coefficient [17].</p>
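      <p>Given per-problem totals under the two preprocessing conditions, the fractions and the rank correlation
could be computed as in the following sketch; scipy’s spearmanr stands in for the tie-corrected
coefficient of [17], and the variable names are illustrative assumptions.</p>
      <preformat>
from scipy.stats import spearmanr

def reduction_correlation(ac_deleted, nsac_deleted, ac_nodes, nsac_nodes):
    """Relative deletions: (NSAC - AC)/NSAC; relative search reduction:
    (AC - NSAC)/AC. Returns the Spearman rho and its p-value."""
    del_frac = [(n - a) / n for a, n in zip(ac_deleted, nsac_deleted)]
    node_frac = [(a - n) / a for a, n in zip(ac_nodes, nsac_nodes)]
    return spearmanr(del_frac, node_frac)
</preformat>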
      <p>The results of these experiments are shown in Table 1. For all problem sets, correlations between
search reduction and domain reduction were always statistically significant for max degree (third data
column). For dynamic heuristics, the correlation between search and domain reduction was rarely
significantly different from zero.</p>
      <p>These differences are partly due to the fact that following greater problem reduction, search with
dynamic heuristics was not always improved. In fact, for some problems it was appreciably worse
despite an overall improvement in the mean. Such effects never occurred with the static degree heuristic.</p>
      <p>These results show that when search involves dynamic ordering heuristics, the effects of preprocessing
involve other factors than problem reduction, although that of course plays a part. Both the
order-of-magnitude improvements that were sometimes observed after greater problem reduction, and, even
more decisively, deterioration in performance, are inexplicable on this basis, and neither of these are
found with a non-dynamic heuristic like max degree. Moreover, the correlational analysis yielded
some results where a negative correlation was found between search with a dynamic heuristic and the
amount of domain reduction; yet NSAC still led to improvement in search on average in these cases.</p>
      <p>Since there is a close relation between the dynamic character of the heuristic and these ‘anomalous’
effects, this is consistent with the notion that domain reductions serve to guide these heuristics during
search toward making generally better decisions. Section 6 gives further evidence that this explanation
is the correct one. But first we will consider an old result that heretofore has not received a proper
explanation, and show that it fits within the present framework.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Resolving an old conundrum</title>
      <p>In some work done in the 1990s it was found that preprocessing before search sometimes resulted in an
excess of overall work, and in some cases search itself was negatively affected [12]. These results were
found using the forward checking algorithm, either carried out by itself or preceded by arc consistency.
Perhaps because MAC was so effective with these problems, a similar anomaly was not reported in this
case.</p>
      <p>The authors of this work did not try to analyze this effect. However, since they used a dynamic
variable ordering heuristic, viz. minimum domain size, one would expect that the effect demonstrated
in the previous section would be in operation.</p>
      <p>This expectation was verified by (1) using minimum domain size in combination with forward
checking with some of the problems used in the present work, and (2) replicating the original Sabin and
Freuder results and comparing results obtained with min domain size and max static degree.</p>
      <p>Sabin and Freuder used (homogeneous) random problems with 50 variables, domain size eight,
tightness = 0.5, and densities of 0.06, 0.07, 0.08 and 0.09, where “density” in this case refers to the
proportion of edges added to a complete spanning tree. For the present tests, 25 problems with these
parameter values were generated at each density. (Incidentally, this series passes through a critical
complexity region, so that while all of the 0.06 problems had solutions, none of the 0.08 and 0.09
problems did.) For these densities in ascending order, with min domain size, 11/25, 8/25, 9/25 and 5/25
problems gave poorer search results (for nodes and constraint checks) when search was preceded by
AC. In contrast, with max static degree the number of nodes was either the same or slightly less when
search was preceded by AC. A very similar pattern of results was found for MAC with or without AC
prior to search, although of course the differences were much smaller. (The latter and the small number
of problems (5) may account for the failure to report such differences in [12].) Clearly, the Sabin-Freuder
result is due to the same factors demonstrated in the present paper.</p>
    </sec>
    <sec id="sec-6">
      <title>6. What Kind of Information Does Domain Reduction Supply?</title>
      <p>The previous experiments showed that when dynamic heuristics are used to select variables,
preprocessing involves more than search space reduction. This implies that domain reductions affect heuristic
decisions so as to enhance search, at least on average. One possibility is that information of some sort
is gained from domain reductions that guides these heuristic decisions.</p>
      <sec id="sec-6-1">
        <title>6.1. Ruling out extraneous factors</title>
        <p>To establish that an information channel is indeed being set up between preprocessing and search with
dynamic ordering heuristics, it is necessary to rule out other factors. These include the branching
factor associated with variable selection, which when combined with propagation during search can
reduce the search space appreciably. In addition, there are ‘noise factors’ such as value selection and
propagation effects, which can affect search dramatically without being relevant to the information
flow we are concerned with.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Search nodes with fixed variable orderings based on domain reduction after (discarded) preprocessing, compared with lexical ordering.</p>
          </caption>
          <table>
            <thead>
              <tr><th>statistic</th><th>max domreduct</th><th>min domreduct</th><th>lexical</th></tr>
            </thead>
            <tbody>
              <tr><td>mean</td><td>7,207,246</td><td>6,796,259</td><td>25,495,721</td></tr>
              <tr><td>median</td><td>439,935</td><td>845,579</td><td>2,775,788</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>Notes. Entries are for search nodes, based on 50 problems.</p>
          </table-wrap-foot>
        </table-wrap>
        <p>In order to demonstrate that information coming from preprocessing can in and of itself affect search
results, the following procedure was used. First, a preprocessing algorithm was used that deleted a
sufficient number of values to potentially guide decisions during search. For this purpose, NSAC was
used, which eliminated 10-20% of the domain values. Following preprocessing, all the deleted values
were returned to their domains, which in this case were all of the same size originally. In addition,
simple backtracking was used, so there was no propagation during search. With these procedures, the
branching factor was held constant, and any confounding effects of propagation were avoided. In order
to remove the effect of value selection, each problem was solved 50 times with values chosen at random
from each domain.</p>
        <p>For these runs, a fixed variable ordering was used based on the size of domain reduction after
preprocessing, with larger reductions chosen over smaller (“max domreduct" in Table 2). Under these
conditions, if this ordering improves on a lexical ordering of the variables, we can conclude that
information was passed from preprocessing to the heuristic used to select variables during search. This
information should improve search overall, although as we know from the previous experiments and
from earlier work, this is not guaranteed in all cases. In addition, an anti-heuristic was used, where
smaller domain reductions were chosen over larger (“min domreduct" in Table 2).</p>
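        <p>A sketch of how the two fixed orderings can be derived from the (subsequently discarded)
preprocessing output, with illustrative names:</p>
        <preformat>
def domreduct_orderings(original_size, domains_after_nsac):
    """Fixed variable orderings based on per-variable domain reduction."""
    reduction = {v: original_size - len(d) for v, d in domains_after_nsac.items()}
    max_first = sorted(reduction, key=lambda v: -reduction[v])  # heuristic
    min_first = sorted(reduction, key=lambda v: reduction[v])   # anti-heuristic
    return max_first, min_first
</preformat>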
        <p>For this experiment, geovarsat problems were used, whose characteristics were described earlier.
Because basic backtracking is a weak method for solving such problems, small problems were used, with
20 variables and domain size 7. The target number of constraints was 50 ± 5. There were 50 problems.</p>
        <p>Comparisons between the variable ordering based on maximum domain reduction and a lexical
ordering were clearcut. With the heuristic, search was between three and four times better in terms of
search nodes (and of course runtimes). A paired-comparison t-test based on log-transformed values gave
significant results (t = 4.223, p &lt; 0.001), as did a sign test (35/50 in expected direction, p &lt; 0.01). From
this we can conclude that domain reduction in and of itself provides a signal that can guide variable
selection so as to improve search effort on average.</p>
        <p>At the same time, the anti-heuristic (minimum domain reduction) also performed better than the
lexical ordering (cf. Table 2). This was unexpected. In addition, the mean was somewhat lower than
with the heuristic. This was due to one problem for which the heuristic performed very badly; hence,
it is not reflected in the median (also shown in the table). In this case, the paired-comparison t-test
based on log-transformed values was significant (t = 2.282, p &lt; 0.01), while the sign test was almost
significant (32/50 in expected direction, p &lt; 0.10). Tentatively, I would ascribe these anomalies to
the relatively small sample size. This seems likely in view of the tremendous variance one obtains
with backtrack search, given that there is no amelioration of the effects of bad variable selections by
propagation.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Comparing domain reductions with constraint weights</title>
        <p>Further insight into what a dynamic heuristic is being informed of can be obtained by comparing
domain reductions with “constraint weights" obtained through random probing [18]. Random probing
is a variation of the weighted degree procedure [19]. In that procedure, each constraint has an associated
“weight", which is the number of times that propagation via that constraint led to a loss of all values
in some variable domain (“domain wipeout"). Sums of these weights are then used to give a weighted
degree of each variable, which is used to enhance the maxdeg or d/fd heuristic.</p>
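        <p>As a small illustration of the bookkeeping involved, a weighted degree can be computed from a
constraint-weight table as in the following sketch (names are illustrative; weights would be keyed by
constraint scope and incremented on each domain wipeout):</p>
        <preformat>
from collections import Counter

def weighted_degrees(weights):
    """Sum constraint weights per variable (wdeg). `weights` maps each binary
    constraint, represented as a variable pair (x, y), to the number of
    domain wipeouts attributed to it."""
    wdeg = Counter()
    for (x, y), w in weights.items():
        wdeg[x] += w
        wdeg[y] += w
    return wdeg
</preformat>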
        <p>Random probing is a preprocessing technique in which a search procedure is carried out by choosing
variables at random up to a limited cutoff in terms of either nodes or failures (i.e. domain wipeouts).
This is done repeatedly prior to full search. During these runs, weights (i.e. failures) are tallied just as
with the original weighted degree procedure. This procedure allows (full) search to begin with a set of
constraint weights (and therefore weighted degrees) rather than building these from scratch during the
search process.</p>
        <p>In the present work we are interested in examining the profile of weight-sums (aka weighted-degrees)
itself and comparing it with the profile of domain reductions for the same problem. For this purpose,
the two sets of geovarsat problems were used. For each problem, random probing was carried out using
a schedule of 50 probes, with each probe continuing until 40 failures had occurred. Then the profile
of weight-sums obtained for the entire set of variables in a problem was correlated with the domain
reductions for the same variables after AC, NSAC or SAC. This analysis employed the Spearman Rank
Correlation Coefficient, corrected for tied scores [17].</p>
        <p>Table 3 shows summary statistics based on correlations between summed weights and domain
reductions for the set of variables in each individual problem. For AC, correlations were small (for
80-variable problems, some were negative), and only about half were statistically significant. In contrast,
for both NSAC and SAC, correlations were much higher, and all were positive. For the 40-variable
problem set, correlations based on NSAC or SAC were statistically significant for all 25 problems; for
the 80-variable set, nearly all were. Correlations were usually slightly higher after SAC than NSAC (0.1
for 80-variable problems, 0.2 for 40-variable problems).</p>
        <p>The significance of these results stems from the fact that constraint weights are due to bottlenecks
encountered during search, so with random probing higher weights are associated with more basic,
possibly global bottlenecks. As is well known, search effort can be greatly reduced by choosing
bottleneck variables early in search (at the top of the search tree) [13]. So if these variables can be
identified during preprocessing and this information can be transmitted to the search process, then
obviously it will make search more efficient. Random probing is an effective procedure for doing this.</p>
        <p>If, in turn, domain reduction profiles are positively correlated with weight profiles, this indicates
that large domain reductions are often due to the same bottlenecks. Obviously, domain reductions are
not as discriminating, nor can they be since the number of possible values is much smaller than the
number of possible weight-sums. Moreover, many domains are unchanged (zero reduction). So the
fact that correlations are sometimes low to moderate in size is to be expected. However, it is clear that
SAC-based forms of consistency produce much better correlations, which in turn means that these
stronger consistencies result in significant enhancements of code quality.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Characterizing Information Flow with CSP Search Heuristics</title>
      <sec id="sec-7-1">
        <title>7.1. Information channels and CSP heuristics</title>
        <p>The diagram below shows the standard view of information transmission [20]. Messages from a source
are encoded and transmitted over a “channel”, after which they are decoded and passed to a recipient.</p>
        <preformat>source → (en)coder → channel → decoder → recipient</preformat>
        <p>But this process can be viewed somewhat differently:</p>
        <preformat>action → result → channel → extraction → decision</preformat>
        <p>In this case, the “channel” could be the storage of the result and its subsequent retrieval. Information
“extraction” can involve selection of certain portions of the stored results as well as any transformations
that might be made. This step may also involve combining information from different sources. The
“decision” usually involves comparisons of some sort and a selection among different actions on this
basis.</p>
        <p>It is obvious that most algorithms make use of information in carrying out their tasks. For example, an
internal comparison sort requires a list of entities with ordinal properties drawn from some specified set
that are presented in an unspecified order. Each entity carries information that guides the decisions made
in the series of comparisons and moves that constitutes a realization of the algorithm. So during the
operation of the algorithm, information is constantly being extracted from the environment. However,
since the channels in this case are noiseless, and there is a 1:1 mapping between transmissions and
operations, for deterministic algorithms like this, this feature of the algorithm is not of great theoretical
interest and can be ignored.</p>
        <p>This is not the case, however, for CSP algorithms, where choices are not deterministic. Here, the
information provides only a rough guide and can even be misleading. So in these cases, this aspect of
the algorithmic procedure becomes an important matter for analysis, as the results presented earlier
show.</p>
        <p>All CSP heuristics operate by extracting information from a problem and using that information
to guide decisions during search. As is well-known, there are many features that can be used. In the
present terms, each of these (and each feature combination) forms a different information channel with
respect to the extraction process.</p>
        <p>Given the results in this paper, we can distinguish two kinds of information channel:
∙ information is extracted from ‘nature’, i.e. from pre-existing features of the problem
∙ information is extracted from results produced by another algorithm</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Measuring the amount of information provided by preprocessing</title>
        <p>We can, in fact, characterize the messages sent across each kind of channel in information theoretic terms.
Consider, for example, the case where we use the domain reduction obtained during preprocessing as
a heuristic. (Note that for problems with constant domain size this is equivalent to mindom, but in
other cases it is not.) Here, domain size reduction can be viewed as a code whose letters are the possible
reductions, ranging from 0 to d_max − 1 (or d − 1 for problems with constant domain size d). The message
sent from the preprocessing to the search stage always contains n letters, one for each variable (fixed
length code). (We ignore messages containing the letter d, i.e. a domain wipeout, since in these cases
search would not be carried out.)</p>
        <p>For both kinds of channel, one result of information extraction is that it allows search to be more
discriminating, i.e. there are more distinctions among variables that allow choices to be made in order to
improve search efficiency. Thus, one of the results of preprocessing a problem with a local consistency
algorithm is to enhance discriminability, i.e. the number of discriminations that can be made (in this
case) among variables. This can be measured using the usual formula for the entropy of a system S,</p>
        <disp-formula><tex-math>H(S) = \sum_{i=1}^{k} p(i) \log_2 \frac{1}{p(i)}</tex-math></disp-formula>
        <p>where k is the number of different code values, in our case the number of different amounts of domain
reduction, and for p(i) we use the relative frequency with which a given domain reduction occurs
among these variables.</p>
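        <p>This measure is straightforward to compute; a minimal Python sketch:</p>
        <preformat>
from collections import Counter
from math import log2

def discriminability(message):
    """H(S) for a message of domain reductions, one letter per variable:
    the sum of p(i) * log2(1/p(i)) over the k distinct reduction amounts,
    with p(i) the relative frequency of reduction amount i."""
    n = len(message)
    return sum((c / n) * log2(n / c) for c in Counter(message).values())
</preformat>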
        <p>A few examples will give a sense of what this formula yields. Consider first, cases where preprocessing
yields no domain reductions. In these cases the message will contain n zeros and will therefore not
allow any discriminations between variables. In this case the amount of information transmitted is</p>
        <disp-formula><tex-math>p(0) \log_2 \frac{1}{p(0)} = 1 \cdot \log_2 1 = 0</tex-math></disp-formula>
        <p>Next, consider cases where there are domain reductions. For example, suppose n = 5, and we have
two messages, (0, 0, 0, 1, 2) and (0, 1, 2, 3, 4). In the first message there are three variables that cannot
be discriminated from each other, while in the second all of them can. (Note that in the latter case
k = n.) Using the same measure of information, we calculate the information given to the search
algorithm by the first message as</p>
        <disp-formula><tex-math>\frac{3}{5} \log_2 \frac{5}{3} + \frac{1}{5} \log_2 5 + \frac{1}{5} \log_2 5 \approx 1.371,</tex-math></disp-formula>
        <p>while the second message yields log_2 5 ≈ 2.322, the maximum possible for n = 5.</p>
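        <p>The two example calculations can be checked directly with the discriminability function sketched above:</p>
        <preformat>
&gt;&gt;&gt; round(discriminability([0, 0, 0, 1, 2]), 3)
1.371
&gt;&gt;&gt; round(discriminability([0, 1, 2, 3, 4]), 3)
2.322
</preformat>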
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Code quality</title>
        <p>In this situation, an ideal code will contain up to n distinct letters, for the case where all variables should
be distinguished. This means that, unless d is at least as large as n − 1, there may be some comparisons
of potential significance that cannot be encoded through domain reductions. In other words, some
(perhaps most) of the actual encodings will necessarily be deficient, since not all significant differences
can be encoded.</p>
        <p>This sort of code will also usually be degenerate; that is, two or more value-pairs can encode the
same difference between variables. In fact, if variable x should be chosen before variable y, there are
up to d(d − 1)/2 ways of signaling this via domain reduction. In addition, in some cases it will not matter
which of two variables is chosen. Fortunately, search is not affected by this aspect of the code.</p>
        <p>It might be questioned whether code quality is anything more than discriminability. However, the
following argument shows that the latter is only one component of code quality. Suppose we have a
message consisting of a set of domain reductions. In addition to informing a dynamic heuristic which
variable to select next for assignment, it could also be used to inform an “anti-heuristic", where the
opposite decisions are made (e.g. the anti-heuristic of mindom is maxdom). Now, typically, anti-heuristics
perform on average worse than selecting variables at random. But in this case, the discriminability is
the same. So, in addition to discriminability, we have to consider other aspects of what we are calling
“code quality".</p>
        <p>A code based on domain size is not perfect, since (1) there are known heuristics that give reliably
better performance, and (2) as we have seen in the experiments, there are cases where greater domain
reductions after more stringent preprocessing lead to reductions in search efficiency. Hence, in terms
of efficient search, this code can lead to non-optimal choices. Since the information channel is noiseless,
this has to be considered as a problem of interpretation, i.e. a semantic problem, one that can lead to
mistakes or decision-faults as opposed to transmission errors.</p>
        <p>Although at this time, it is entirely unclear how to rigorously characterize the situation, two
approaches seem possible. One is to consider the difference from a ‘perfect’ ordering. The other is to
consider the problem as one of risky choice, where the amount of risk might be characterized in terms
of loss functions.</p>
        <p>In the first case, we can characterize an ideal code as one that establishes a relational homomorphism
between the comparisons between code letters in the message and the ordering relation of variables
that yields the smallest search tree. In the case under consideration, where code letters are actually
numbers whose quantities can be compared, this means that if quantity (code letter) a is larger than
quantity b, then choosing the variable associated with a over the variable associated with b will result
in a smaller search tree. And if two or more code letters are the same, then the order in which their
associated variables are chosen will not matter.</p>
        <p>In this case, the difference from the ideal could be measured by the number of inversions of pairs of
variables with respect to a perfect ordering (e.g. if x precedes y in the perfect ordering, then y preceding
x in the code-based ordering would be an inversion). Unfortunately, in practice we do not know what
the best variable ordering is, so we can neither determine if there is a relational homomorphism in a
given case nor the degree to which the ordering suggested by domain reductions is discrepant.</p>
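        <p>If the perfect ordering were known, the discrepancy could be computed as a Kendall-tau-style
inversion count; a minimal sketch, with illustrative names:</p>
        <preformat>
def inversions(code_order, perfect_order):
    """Number of variable pairs that the code-based ordering places in the
    opposite relative order from the perfect ordering."""
    pos = {v: i for i, v in enumerate(perfect_order)}
    count = 0
    for i, v in enumerate(code_order):
        for u in code_order[i + 1:]:
            if pos[v] &gt; pos[u]:
                count += 1
    return count
</preformat>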
        <p>An approach through loss functions could potentially incorporate the impact of a given deviation
from a ‘perfect’ ordering in addition to the number of inversions. In addition, it might be possible to
estimate the loss function through some sort of sampling. However, it must be left for future work to
decide whether methods like these can be made useful in practice.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions and Prospects</title>
      <p>The present work takes a few more steps toward the goal of providing a theoretical (and experimental)
foundation for the employment of search heuristics. In particular, it is now clear why there are vagaries
in search effort after problem reduction through removal of locally inconsistent values.</p>
      <p>In the past, this problem was only addressed implicitly, either by introducing more powerful search
methods [12] or by ameliorating the situation by combining domain reduction with other heuristic
features such as the degree of a variable [21] (which as we have shown does not solve this particular
problem). Since the nature of the problem is now recognized, it may be possible to deal with it more
directly.</p>
      <p>Using an information-based framework, we were able to demonstrate that there are two factors
associated with information transmission between preprocessing and search, which we have termed
discriminability (among the letters of a message) and code quality. We have also shown that, at least
for AC-based algorithms, stronger forms of local consistency reliably enhance both properties. For
discriminability, this was demonstrated directly; for code quality, this was inferred from correlations
between domain reductions and failure-based constraint weights, which earlier work has shown can
identify bottleneck variables that should be instantiated early in search [22]. Although we were able
to propose a well-defined measure for discriminability, defining code quality precisely and devising
methods to enhance it reliably remain problems for further study.</p>
      <p>This analysis may have implications that go beyond the field of constraint satisfaction. Thus, any
problem solving method that includes some kind of problem analysis and/or alteration anterior to
heuristic choices may involve the kinds of phenomena described in this paper. And the general
phenomenon of information flow applies to a wide range of algorithms that involve non-deterministic
choices among alternatives at some steps of their execution.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Acknowledgments</title>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>This publication has emanated from research supported in part by a research grant from Science
Foundation Ireland under Grant No 12/RC/2289_P2. Table 1 and portions of Table 3 were first published
in Progress in Artificial Intelligence: 23rd EPIA Conference on Artificial Intelligence, Proceedings, Part
II, LNAI No. 14968, 2024, by Springer Nature.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Diego Aineto, Riccardo De Benedictis, Marco Maratea, Munyque Mittelmann, Gianpiero Monaco, Enrico Scala, Luciano Serafini, Ivan Serina, Francesco Spegni, Elisa Tosello, Alessandro Umbrico, Mauro Vallati (Eds.), Proceedings of the International Workshop on Artificial Intelligence for Climate Change, the Italian Workshop on Planning and Scheduling, the RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, and the SPIRIT Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy (AI4CC-IPS-RCRA-SPIRIT 2024), co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA <year>2024</year>), CEUR Workshop Proceedings, CEUR-WS.org, <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bessière</surname>
          </string-name>
          ,
          <article-title>Constraint propagation, in: Handbook of Constraint Programming</article-title>
          , Elsevier,
          <year>2006</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Tsang</surname>
          </string-name>
          , Foundations of Constraint Satisfaction, Academic Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Rina</given-names>
            <surname>Dechter</surname>
          </string-name>
          , Constraint Processing, Morgan Kaufmann,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Robert J.</given-names>
            <surname>Woodward</surname>
          </string-name>
          , Higher-level Consistencies: When, Where, and How Much,
          <source>Ph.D. thesis</source>
          , University of Nebraska,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Richard J.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <article-title>Explaining the effects of preprocessing on constraint satisfaction search</article-title>
          , in: Luca Longo, Ruairi OReilly (Eds.),
          <source>Thirtieth Irish Conference on Artificial Intelligence and Cognitive Science AICS 2022. Communications in Computer and Information Science No. 1662</source>
          , Springer,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Richard J.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <article-title>Domain reductions after preprocessing: Effects on dynamic variable ordering heuristics in constraint satisfaction search</article-title>
          , in: Manuel F. Santos, José Machado, Paulo Novais, Paulo Cortez, Pedro M. Moreira (Eds.),
          <source>Progress in Artificial Intelligence. 23rd EPIA Conference on Artificial Intelligence, EPIA 2024, Part II. LNAI No. 14968</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Djamal Habet, Cyril Terrioux, Conflict history based search for constraint satisfaction problem, in: Thirty-fourth ACM/SIGAPP Symposium on Applied Computing, ACM, 2019, pp. 1117–1122.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Hongbo Li, Minghao Yin, Zhanshan Li, Failure based variable ordering heuristics for solving CSPs, in: Laurent D. Michel (Ed.), Principles and Practice of Constraint Programming-CP 2021, Leibniz International Proceedings in Informatics, 2021, pp. 9:1–9.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Ruiwei Wang, Wei Xia, Roland H. C. Yap, Correlation heuristics for constraint programming, in: Twenty-Ninth International Conference on Tools with Artificial Intelligence - ICTAI'17, IEEE Press, 2017, pp. 1037–1041.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Richard J. Wallace, A normative-prescriptive-descriptive approach to analyzing CSP heuristics, in: Thirtieth International FLAIRS Conference, AAAI Press, 2017, pp. 158–163.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Daniel Sabin, Eugene Freuder, Contradicting conventional wisdom in constraint satisfaction, in: Eleventh European Conference on Artificial Intelligence-ECAI'94, Wiley, 1994, pp. 125–129.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Robert M. Haralick, Gordon L. Elliott, Increasing tree search efficiency for constraint satisfaction problems, Artificial Intelligence 14 (1980) 263–314.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Christian Bessière, Stéphane Cardon, Romuald Debruyne, Christophe Lecoutre, Efficient algorithms for singleton arc consistency, Constraints 16 (2011) 25–53.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Richard J. Wallace, SAC and neighbourhood SAC, AI Communications 28 (2015) 345–364.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] David S. Johnson, Cecilia R. Aragon, Lyle A. McGeoch, Catherine Schevon, Optimization by simulated annealing: An experimental evaluation. Part II, graph coloring and number partitioning, Operations Research 39 (1991) 378–406.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, 1956.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Diarmuid Grimes, Identifying Sources of Global Contention in Constraint Satisfaction Search, Ph.D. thesis, National University of Ireland, Cork, 2012.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Frédéric Boussemart, Fred Hemery, Christophe Lecoutre, Lakhdar Sais, Boosting systematic search by weighting constraints, in: Sixteenth European Conference on Artificial Intelligence-ECAI'04, IOS, 2004, pp. 146–150.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Douglas S. Jones, Elementary Information Theory, Oxford University Press, Oxford, 1979.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Christian Bessière, Jean-Charles Régin, MAC and combined heuristics: Two reasons to forsake FC (and CBJ?) on hard problems, in: Principles and Practice of Constraint Programming-CP'96, LNCS No. 1118, Springer, 1996, pp. 61–75.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Richard J. Wallace, Diarmuid Grimes, Experimental studies of variable selection strategies based on constraint weights, Journal of Algorithms: Algorithms in Cognition, Informatics and Logic 63 (2008) 114–129.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>