=Paper=
{{Paper
|id=Vol-487/paper-4
|storemode=property
|title=From Set Unification to Set Constraints
|pdfUrl=https://ceur-ws.org/Vol-487/paper4.pdf
|volume=Vol-487
|dblpUrl=https://dblp.org/rec/conf/birthday/Rossi08a
}}
==From Set Unification to Set Constraints==
Il Milione: A Journey in the Computational Logic in Italy
DALL’UNIFICAZIONE INSIEMISTICA AI VINCOLI SU
INSIEMI
FROM SET UNIFICATION TO SET CONSTRAINTS
Gianfranco Rossi
20
Il Milione: A Journey in the Computational Logic in Italy
SOMMARIO/ABSTRACT making set constraint solving more effective.
In questo articolo riassumiamo brevemente alcuni dei 2 Set Unification
problemi più interessanti che nascono quando si per-
mette di trattare gli insiemi come oggetti di “prima Intuitively, the set unification problem is the problem of
classe” in un linguaggio logico, dall’unificazione di in- computing (or simply testing the existence of) an assign-
siemi ben-fondati e non alla soluzioni di vincoli su insiemi. ment of values to the variables occurring in two set terms
which makes them denote the same set.
In this paper, we briefly summarize some of the most chal- Various forms of set unification have been used in
lenging issues that arise when allowing sets to be dealt various application areas, such as (see [13]): deductive
with as first-class objects in a logic language, ranging from databases, AI and its various sub-fields (e.g., Automated
set unification of well-founded and non-well-founded sets Deduction and Natural Language Processing), program
to set constraint solving. analysis and security, declarative programming languages
with sets.
Keywords: Set unification, hypersets, set constraints. Set unification can be thought of as an instance of E-
unification, i.e., unification modulo an equational theory
E, where the identities in E capture the properties of set
1 Introduction
terms—i.e., the fact that the ordering and repetitions of el-
Sets are familiar mathematical objects, and they are often ements in a set are immaterial.
used as a high-level abstraction to represent complex data The equational theory E is strongly related to the rep-
structures, whenever the order and repetitions of elements resentation adopted for set terms. Two main approaches
are immaterial. have been presented in the literature: the union-based rep-
In the last two decades, a number of proposals have resentation, and the list-like representation. The union-
emerged where sets are dealt with as primitive objects of based representation makes use of the union operator (∪)
a (first-order) logic language. In this context, sets are of- to construct sets. This representation has been often used
ten represented as first-order terms, called set terms, built when dealing with the problem of set unification on its
from symbols of a suitable alphabet, using selected func- own, where set unification is dealt with as an ACI uni-
tion symbols as set constructors. Furthermore, the lan- fication problem—i.e., unification in presence of operators
guage usually provides the typical set-theoretic operations satisfying the Associativity, Commutativity, and Idempo-
to manipulate set objects. tence properties (e.g., [6]).
These short notes summarize some of the most chal- The list-like representation builds sets using an element
lenging issues arising when manipulating finite sets in a insertion constructor (typically denoted by {· | ·}). With
logic language. In Section 2 we briefly review set unifi- this approach, the finite set {t0 , . . . , tn } is represented by
cation, i.e., the key problem of solving equations between a sequence of element insertions
set terms. In Section 3 the unification problem is extended {t0 | {· · · {tn | ∅} · · ·}},
to the case of non-well-founded sets. In Section 4 we in- where t0 , . . . , tn are either individuals or sets. While this
troduce set constraints as a way to allow set-theoretic op- representation restricts the number of set variables which
erations other than set equality to be taken into account. can occur in each set term to one, on the other hand it al-
Finally, in Section 5 we briefly review proposals aiming at lows sets to be viewed and manipulated in a fashion sim-
21
Il Milione: A Journey in the Computational Logic in Italy
ilar to lists. As a matter of fact, this representation has 3 Hypersets
been adopted in a number of logic and functional-logic
programming languages with sets (e.g., CLP (SET ) [10]). Sets considered so far are the so called hereditarily finite
sets, i.e. sets with a finite number of elements, all of which
Various authors have investigated the problem of set uni-
are themselves hereditarily finite. This definition leaves
fication using the list-like representation [3, 12, 23, 8].
still a further possibility for infinity. Let us consider the
In particular, the algorithm presented in [9] considers an
sets x and y that satisfy the equations x = {∅, y}, y =
equational theory E containing the two identities (Ab) and
{x}. They are hereditarily finite, but they hide an infinite
(C`) stating the fundamental properties of the set construc-
descending chain x 3 y 3 x 3 y 3 · · ·. These sets in
tor {· | ·}:
which, roughly speaking, membership can form cycles are
(Ab) {X | {X | Z}} ≈ {X | Z} called non-well-founded sets (or hypersets). Hypersets are
(C`) {X | {Y | Z}} ≈ {Y | {X | Z}}. very important in some areas, such as concurrency theory,
but hyperset theory has been applied in a number of areas
The core of the unification algorithm is very similar in
of logic, linguistics, and computer science, as well.
structure to the traditional unification algorithms for stan-
Introducing hypersets as a data structure in a logic pro-
dard Herbrand terms (e.g., [20]). The main difference
gramming language requires a unification algorithm that is
is represented by the reduction of equations between set
able to deal with objects denoting hypersets. All set uni-
terms, {Y1 | V1 } = {Y2 | V2 }. The algorithm allows also to
fication algorithms cited in the previous section, however,
account for equations of the form X = {t0 , . . . , tn | X},
consider well-founded sets only.
with X 6∈ vars(t0 , . . . , tn ), which turns out to be satisfi-
An hyperset unification algorithm is shown in [1]. The
able for any X containing t0 , . . . , tn thanks to (Ab) and
key idea underlying this algorithm is that of enlarging the
(C`)). As an example, given the set unification problem
domain of discourse from terms (i.e., finite trees) over the
{X | S} = {1, 2} signature Σ to directed labeled graphs over Σ, possibly
with cycles. This data structure, when involving the inter-
(where {1, 2} is a syntactic shorthand for {1 | {2 | ∅}}) preted function symbol {· | ·} used as the set constructor,
the algorithm non-deterministically computes the follow- can be regarded as a convenient way to denote hypersets.
ing (complete) set of solutions: X = 1 ∧ S = {2}, X = For instance, a solution to the equation X = {X} is a
1 ∧ S = {1, 2}, X = 2 ∧ S = {1}, X = 2 ∧ S = {1, 2}. cyclic graph which can be interpreted as an hyperset con-
taining itself as its only element. In addition, a notion of
A general survey of the problem of unification in pres- bisimulation which applies to this kind of graphs is defined
ence of sets, across different set representations and differ- and the interpretation domain is taken as the set of directed
ent admissible classes of set terms, can be found in [13]. labeled graphs over Σ modulo the equivalence relation in-
The computational complexity properties of the set duced by bisimulation.
unification have been investigated by Kapur and Naren- The algorithm in [1] can be seen as an adaptation of
dran [18], who established that these decision problems are the set unification algorithm of [9]. Many of the changes
NP-complete. Complexity of the set unification operation, required to move from set to hyperset unification are the
however, depends on which forms of set terms (e.g., flat same needed when moving from standard unification to
or nested sets, with zero, one, or more set variables) are al- unification over (uninterpreted) rational trees, for which
lowed. The form of set terms in turn is influenced by the set a number of efficient algorithms have been proposed in the
constructors used to build them. Thus, different complex- literature (e.g., [21]). In particular the hyperset unification
ity results can be obtained for different classes of set terms. algorithm in [1] works on Herbrand systems of equations,
For instance, while the set equivalence test of ground set avoiding full variable substitution and adding simple non-
terms denoting flat sets, such as {a, b, c} and {b, c, a}, is membership constraints to avoid the possibly endless re-
rather easy, when the decision problem deals with nested peated insertions of the same elements into hypersets.
set terms involving variables it becomes NP-complete.
Various authors have considered simplified versions of 4 Set Constraints
the (Ab)(C`) problem obtained by imposing restrictions
on the form of the set terms. In particular, various works The algorithms cited above focus only on equality between
have been proposed to study the simpler cases of match- set terms. Besides equality, however, other basic set opera-
ing1 and unification of Bound Simple set terms, i.e., bound tions, such as membership, inclusion, union, etc., are usu-
set terms of the form {s1 , . . . , sn }, where each si is either ally required for dealing with sets in a more general way.
a constant or a variable [4, 16]. A number of proposals have been put forward in the last
fifteen years in which general set-based formulae are con-
1 Set matching can be seen as a special case of set unification, where sidered and procedures to check their consistency are de-
variables are allowed to occur in only one of the two set terms which are veloped. Most of these proposals have emerged in the con-
compared. text of Constraint (Logic) Programming (see, e.g., [15]).
22
Il Milione: A Journey in the Computational Logic in Italy
In this context, set-theoretical operations are conveniently the so-called Computable Set Theory area (C.S.T.), a fruit-
dealt with as (set) constraints, that is arbitrary conjunc- ful research stream born in the 1970’s at the New York
tions of positive and negative atomic predicates built us- University thanks to the initial ideas and subsequent stim-
ing a fixed finite set of predicate symbols denoting set- ulus of J. T. Schwartz (see [7] for a general survey). Work
theoretical operations, whose variables can range over the in C.S.T. has identified increasingly larger classes of com-
domain of sets. Systems of (set) constraints are solved as a putable formulae of suitable sub-theories of the general
whole by suitable (set) constraint solvers, which are able to Zermelo-Fraenkel set-theory for which satisfiability is de-
reduce the given constraints either to false or to a simpli- cidable. Further extensions of these classes are still under
fied form from which it is easier to obtain a solution (i.e., investigation at present. Recent related work is described
a substitution for the variables in the given constraints that in [19]. However, efforts in this area are mainly concerned
make them satisfiable in the selected interpretation). For with decidability results, rather than computing solutions
example, like it is usually required in constraint programming.
X ∈ S ∧ T = S ∪ R ∧ X 6∈ T Other classes of aggregates (akin to sets) have also been
considered in the literature. In particular, various frame-
is a set constraint, where R, S, and T are set variables, that works have introduced the use of multisets where repeated
the set constraint solver can reduce to false. elements are allowed to appear in the collection. An anal-
Set based formalisms allow a natural formulation of a ysis of the problems concerned with the introduction of
number of problems, in quite different areas: combina- multisets—as well as sets and lists—is reported in [11, 12].
torial search problems, warehouse location problems, di-
agnostic related problems (e.g., VLSI circuit verification),
program analysis, network design problems (e.g., weight
5 Efficient Set Constraint Solving
setting). Dealing with such formulations as constraints al- The proposals for (general) set constraints cited above do
low, on the one hand, to solve these problems even if not not take into account efficiency adequately to allow them to
all sets are (fully) known a priori, and, on the other hand, to be effectively applied in many concrete applications. For
compute solutions efficiently, provided constraint reason- example the CLP(SET ) solvers often use a generate & test
ing enables the solver to prune the search space. approach, that non-deterministically assigns values to vari-
As an example, the following is a very compact formu- ables as soon as those values are available. For instance,
lation as a set constraint of the well-known map coloring given the constraint X ∈ {1, 2, 3, 4, 5} ∧ X 6= 10, the
problem for a map of three regions, R1, R2, R3 (where CLP(SET ) solver enumerates all possible values of X be-
R1 borders R2 and R2 borders R3), using two colors, c1 fore asserting that the constraint holds.
and c2: A number of proposals have been developed in the last
{R1, R2, R3} = {c1, c2} ∧ M = {{R1, R2}, {R2, R3}} fifteen years that consider more restricted forms of set con-
∧ {c1} 6∈ M ∧ {c2} 6∈ M. straints but equipped with constraint solving techniques
A complete set constraint solver, i.e., one which is al- that allow them to be processed in a quite more effective
ways able to decide if a given constraint is satisfiable way. Works along these lines include [5, 14, 17].
or not is presented in [10]. The constraint language is In these proposals constraint variables have a finite do-
based on constructed sets using the list-like representa- main attached to them. In the case of set constraints, the
tion and it provides the usual set-theoretic operations as domain is a collection of sets, usually specified as a set
primitive constraints. Sets are allowed to be nested and interval [l, u], where l and u are known sets (typically,
to contain unknown elements (i.e., uninstantiated logical of integers). [l, u] represents a lattice of sets induced by
variables). The constraint solver rewrites any given con- the subset partial ordering relation ⊆ having l and u as
straint C into an equi-satisfiable disjunction of constraints the greatest lower bound and the least upper bound, re-
in solved form—proved to be correct and terminating. In spectively. The constraint solver exploits the information
particular the solver uses the set unification algorithm de- that the domain of variables provides to efficiently com-
veloped in [9] to deal with (set) equalities. A constraint pute simplified forms of the original constraint or to detect
in solved form is guaranteed to be satisfiable in the corre- failures. In its simplest form, the solver uses a local prop-
sponding structure. Therefore the ability to obtain a solved agation algorithm that attempts to enforce consistency on
form guarantees that the original constraint is satisfiable. the values in the variable domains by removing values that
This constraint structure has been exploited to obtain a cannot form part of a solution to the system of constraints.
specific instance of the general Constraint Logic Program- For example, given the set constraint
ming scheme, called CLP(SET ) [10]. A Java implementa- S ∈ {1}..{1, 2, 3, 4} ∧ X ⊆ S ∧ Y ⊆ S
tion of (most of) CLP(SET ) facilities for set management ∧ #X = 2 ∧ Z = Y \X
has been recently developed and made available as part of a where S, X, Y , and Z are set variables and #X denotes
Java library, called JSetL [22], intended to support declar- the cardinality of the set X, the constraint solver in [5]
ative programming in an object-oriented language. is able to infer that the constraint is satisfiable provided
The study of set constraints is strongly related to work in #Z ≤ 2 holds.
23
Il Milione: A Journey in the Computational Logic in Italy
Most of these consistency algorithms are incomplete, so [9] A. Dovier, E. G. Omodeo, E. Pontelli, and G. Rossi.
they have to be combined with a backtracking search pro- {log}: A Language for Programming in Logic with
cedure to produce a complete constraint solver. For exam- Finite Sets. J. of Logic Programming, 28(1):1–44,
ple, in the example above, such a procedure allows to enu- 1996.
merate all possible solutions for Z: Z = {1}, Z = {1, 2}, [10] A. Dovier, C. Piazza, E. Pontelli, and G. Rossi. Sets
Z = {1, 3}, Z = {1, 4}. and Constraint Logic Programming. ACM TOPLAS,
While these constraint languages turn out to allow more 22(5):861–931, 2000.
efficient handling of set constraints with respect to the pro-
posals cited in the previous section (e.g., CLP(SET )), the [11] A. Dovier, C. Piazza, and G. Rossi. A uniform ap-
latter allows more general form of sets to be dealt with: ele- proach to constraint-solving for lists, multisets, com-
ments can be of any type, possibly other sets, and possibly pact lists, and sets. ACM Transactions on Computa-
unknown (e.g., {X, {a, 1}}). For example the set-based tional Logic, 9(3), 2008.
formulation of the map coloring problem shown above can [12] A. Dovier, A. Policriti, and G. Rossi. A uniform ax-
be written—and solved—using CLP(SET ) but not using iomatic view of lists, multisets, and sets, and the rel-
the constraint language in [14] and [5]. evant unification algorithms. Fundamenta Informati-
A current line of research (see [2]) is trying to combine cae, 36(2/3):201–234, 1998.
the general set representation and management of propos- [13] A. Dovier, E. Pontelli, and G. Rossi. Set unification.
als like CLP(SET ), with the efficient constraint solving Theory and Practice of Logic Programming, 6:645–
of “Finite Domain” solvers, in order to have the expres- 701, 2006.
sive power of the former while retaining the execution ef-
[14] C. Gervet. Interval Propagation to Reason about Sets:
ficiency of the latter.
Definition and Implementation of a Practical Lan-
guage. Constraints, 1(3):191–244, 1997.
REFERENCES [15] C. Gervet. Constraints over Structured Domains. In
F.Rossi, P. van Beek, and T. Walsh, ed’s, Handbook
[1] D. Aliffi, A. Dovier, and G. Rossi. From Set to Hy-
of Constraint Programming. Elsevier, 2006.
perset Unification. J. of Functional and Logic Pro-
gramming, 1999(10):1–48, 1999. [16] S. Greco. Optimal Unification of Bound Simple Set
Terms. In Proc. of Conf. on Information and Knowl-
[2] A. Dal Palù, A. Dovier, E. Pontelli, and G. Rossi. edge Management, 326–336, ACM Press, 1996.
Integrating Finite Domain Constraints and CLP with
[17] P. Hawkins, V. Lagoon, and P. J. Stuckey. Solving Set
Sets. In D. Miller, ed., Proc. of 5th ACM-SIGPLAN
Constraint Satisfaction Problems using ROBDDs. J.
Int’l Conf. on Principles and Practice of Declarative
of AI Research, 24: 109–156, 2005.
Programming, 219–229, ACM Press, 2003.
[18] D. Kapur and P. Narendran. Complexity of Unifica-
[3] P. Arenas-Sánchez and A. Dovier. A Minimality tion Problems with Associative-Commutative Opera-
Study for Set Unification. J. of Functional and Logic tors. J. of Automated Reasoning, 9:261–288, 1992.
Programming, 1997(7):1–49, 1997.
[19] V. Kuncak. Polynomial Constraints for Sets with Car-
[4] N. Arni, S. Greco, and D. Saccà. Matching of dinality Bounds. FoSSaCS’99, LNCS 4423, Springer-
Bounded Set Terms in the Logic Language LDL++. Verlag, 2007.
J. of Logic Programming, 27(1):73–87, 1996. [20] A. Martelli and U. Montanari. An Efficient Unifica-
[5] F. Azevedo. Cardinal: A Finite Sets Constraint tion Algorithm. ACM TOPLAS, 4:258–282, 1982.
Solver. Constraints, 12(37):93–129, 2007. [21] A. Martelli and G. Rossi. Efficient Unification with
[6] W. Büttner. Unification in the Data Structure Sets. Infinite Terms in Logic Programming. In Proc. of
In J. K. Siekmann, ed., Proc. of the 8th Int’l Conf. FGCS’84: Int’l Conf. on Fifth Generation Computer
on Automated Deduction, v. 230, 470–488, Springer- Systems, 1984.
Verlag, 1986. [22] G. Rossi, E. Panegai, and E. Poleo. JSetL: A Java
Library for Supporting Declarative Programming in
[7] D. Cantone, E. G. Omodeo, and A. Policriti. Set Java. Software-Practice & Experience, 37:115–149,
Theory for Computing. From Decision Procedures to 2007.
Declarative Programming with Sets. Monographs in
Computer Science, Springer-Verlag, 2001. [23] F. Stolzenburg. An Algorithm for General Set Unifi-
cation and Its Complexity. J. of Automated Reason-
[8] E. Dantsin and A. Voronkov. A Nondeterminis- ing, 22(1):45–63, 1999.
tic Polynomial-Time Unification Algorithm for Bags,
Sets, and Trees. In W. Thomas, ed., FoSSaCS’99,
LNCS 1578, 180–196, Springer-Verlag, 1999.
24
Il Milione: A Journey in the Computational Logic in Italy
6 Contacts
Gianfranco Rossi
Dipartimento di Matematica
Università di Parma
Viale G. P. Usberti 53/A
43100 Parma (Italy)
Phone: +39 (0521) 90.6909
E-mail: gianfranco.rossi@unipr.it
7 Biography
Gianfranco Rossi received a degree in Computer Science from the University of Pisa in 1979. Since November 2001 he
is a Full Professor of Computer Science at the University of Parma. His research activity has been mainly devoted to
Programming Languages, with special attention to Logic Programming languages. Since December 2006 he is President
of the Association of Logic Programming (GULP).
25