Technical Communications of ICLP 2015. Copyright with the Authors.




    Higher Order Support in Logic Specification
    Languages for Data Mining Applications
                                Matthias van der Hallen
                                        KU Leuven
                       (e-mail: firstname.lastname@cs.kuleuven.be)

                       submitted 29 April 2015; accepted 5 June 2015



                                       Abstract

In this paper, we introduce the work of our doctorate, titled “Higher Order Support in
Logic Specification Languages for Data Mining Applications”. Current logic specification
languages, such as FO(·), provide an intuitive way of defining the knowledge within a
problem domain.
   However, extended support for data representation is lacking, and we want to introduce
structured recursive types and generic types, together with a first class citizen approach
to predicates. These additions correspond to higher order concepts.
   We provide a background of the current techniques that may be of interest when
implementing these higher order abstractions, such as lazy grounding and oracles. We
sketch the eventual goal of our research, and give an overview of the current state and
the research questions under consideration.

KEYWORDS: Logic Specification, Higher Order, Grounding, Lifted Reasoning




                   1 Introduction and Problem Description
Current logic specification languages, like FO(·) (Denecker and Ternovska 2008),
offer many intuitive ways to specify the knowledge within a problem domain: aggregates,
inductive definitions, . . . However, they still lack abstractions that are commonplace
in other paradigms, for example structured recursive types and generic types.
Furthermore, some additional abstractions are possible in logic programming, such as
the treatment of relations as first class citizens. These abstractions are powerful and
well-recognized ways of representing data, and they all correspond to higher order
concepts.
   We believe that adding these abstractions will make life easier for programmers
and other people who write logic specifications, and will therefore open up additional
interest in logic specification languages. Additionally, it will allow one to write more
modular specifications and to reuse them in many different applications.
   One example of an often reusable specification is the transitive closure of a binary
relation, as shown below:
                     ∀a, b : Closure(a, b) ← P(a, b).
                     ∀a, b : Closure(a, b) ← ∃c : P(a, c) ∧ Closure(c, b).
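As a minimal illustration of the least-fixpoint semantics that such an inductive
definition carries, the following Python sketch (our own illustration, not part of any
IDP tooling) computes the closure of a finite relation P:

    # Minimal sketch: the least fixpoint denoted by the inductive
    # definition of Closure, computed over a finite relation P.
    def transitive_closure(p: set[tuple[str, str]]) -> set[tuple[str, str]]:
        closure = set(p)                      # Closure(a, b) <- P(a, b).
        while True:
            # Closure(a, b) <- exists c : P(a, c) & Closure(c, b).
            derived = {(a, b) for (a, c) in p for (c2, b) in closure if c == c2}
            if derived <= closure:            # least fixpoint reached
                return closure
            closure |= derived

    # Example: edges 1 -> 2 -> 3 additionally yield the pair (1, 3).
    print(sorted(transitive_closure({("1", "2"), ("2", "3")})))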
Another example, within the context of data mining, is the concept of a graph homomorphism.
A graph homomorphism is a constrained relation between two graphs; as such, it is
natural to represent it as a predicate. However, data mining applications frequently
use homomorphisms, isomorphisms or other ‘relationships’ as objects of their reasoning:
for example, we may want some property to hold for no, some, or all homomorphisms.
This treats these predicates as first class citizens, as it introduces quantification
over them.
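   For instance, asking for the existence of a homomorphism between graphs with node
types NodeA, NodeB and edge relations EdgeA, EdgeB could be sketched as the second
order sentence below; the names are illustrative rather than taken from an existing
specification, and the bracketed function notation is the one used in Section 2.1:

          ∃Hom[NodeA ↦ NodeB] : ∀x, y : EdgeA(x, y) ⇒ EdgeB(Hom(x), Hom(y)).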
   The idea that modularity benefits from an expressive higher order language can
be found in earlier work done in conjunction with I. Dasseville et al. (Dasseville et al.
2016). There, logic specification languages are extended with templates, a popular
way of defining reusable concepts in a modular fashion, using second order definitions.
   The main roadblock towards implementing the aforementioned abstractions is that
many logic specification languages use a ground-and-solve technique. The grounding
phase transforms a logic specification theory into an equivalent theory at the
propositional level, allowing SAT techniques to be used. It achieves this by enumerating
finite domains, instantiating the theory for every possible substitution of variables
by domain elements. However, we often do not want to fix our domain beforehand: we
do not want to fix the number of connections or elements in pattern mining, or the
names of items in item set mining. Furthermore, structured recursive types and generic
types introduce infinite domains of their own, for example the set of all possible
lists. As a result, it will be necessary to develop grounding techniques that can
handle infinite domains.


          2 Background and Overview of the Existing Literature
This research focuses on providing higher order support to ground-and-solve
systems such as IDP (Imperative Declarative Programming) (de Cat et al. 2014).
The IDP system accepts specifications written in the FO(·) language, which it
subsequently grounds, i.e. translates into an equivalent theory at the propositional
level. The resulting theory can then be solved using a SAT solver.
   Other systems, such as Flora-2 (Kifer 2005), take a different approach. Flora-2
can translate specifications directly to Prolog, for which it can use the XSB system
(Warren 1998) with tabling as an inference engine. The Flora-2 system is based on
F-Logic (Kifer et al. 1995) and HiLog (Chen et al. 1993). F-Logic extends classical
logic with the concepts of objects, classes, and types, and allows an object-oriented
syntax. One of HiLog's defining characteristics is that it combines a higher order
syntax with first order semantics so as to remain decidable. As any language with
inductive definitions under the well-founded or the stable semantics is undecidable,
this is less of an issue for a system with a language such as FO(·). Also, in recent
times various other, simpler inference tasks such as model checking, model expansion
and querying have gained importance relative to deduction.

                            2.1 Relevant Techniques

Currently, systems that depend on a grounding phase do not combine well with
specifications involving infinite domains, as grounding introduces an exponential
blowup when done in a naive way.
   However, there are several interesting problems for which it is not wise to restrict
ourselves to a finite domain. Moreover, as argued in Section 1, the additions and
abstractions we propose to allow more user-friendly data representations introduce
exactly these infinite domains.
   Because of the possible blowup when working with large (possibly infinite) domains,
the design and implementation of novel grounding and inference techniques is necessary.
These techniques must be capable of reasoning over an infinite domain in only finite
time. While this is undecidable in general, various fields of computational logic and
declarative problem solving have developed techniques that provide effective solutions
for a broad and practically important class of such problems. We envision combining
existing methods, generalizing them, and adding novel ones.
   Our main context for this work will be the field of lifted reasoning. Possible (and
interrelated) techniques in this field that aid with reasoning over infinite domains
are:

 • Lazy grounding (De Cat et al. 2015; De Cat et al. 2012; Palù et al.
   2009): The current state-of-the-art ground-and-solve systems keep a strict
   separation between the grounding and the solving phase. This means that a
   lot of computing time and storage space is spent on grounding the theory
   before the solver starts its search in solution space. For models with large
   or infinite domains, this grounding phase can be impossible, or at the very
   least terribly inefficient. Lazy grounding is a technique that remedies this
   by (operationally) weaving these two phases together. The solver can solve
   everything that is already grounded, i.e. make decisions and propagate them.
   These propagations and decisions then cause grounding to be generated for
   other sentences on demand, postponing as much of the grounding as is possible
   and beneficial to the system's performance; a toy sketch follows after this
   list. This grounding behavior is called lazy, as it only does the work (and
   incurs the costs) related to grounding when it proves to be necessary for
   the solver.
 • Quantifier elimination and rewriting: Various techniques exist for removing
   quantifiers, or for eliminating existential quantifiers in favour of function
   symbols (e.g. Skolemization): for instance, ∀x ∃y : R(x, y) becomes
   ∀x : R(x, f(x)) for a fresh function f. Skolemization thus introduces
   functions; few state-of-the-art solvers support functions, but the IDP system
   does, and it already makes use of Skolemization to reduce grounding size
   (De Cat et al. 2013). Other elimination and rewriting techniques are possible,
   for example quantifier elimination techniques inspired by the quantifier
   elimination used in Presburger arithmetic (Cooper 1972).
 • Bounds propagation for sets: Using several set axioms and number-theoretic
   properties, it might be possible to deduce bounds for sets that are not
   explicitly constrained to a finite domain by the model. This greatly reduces
   the cost of grounding, as it can cause large domains to be limited to much
   smaller, more tractable domains.
 • Propagation of homogeneity in subdomains for cofinite sets: Even if a
   set contains an infinite number of elements, its elements might deviate from
   a simple default rule in only a finite number of situations. This means that
   there is a certain homogeneity, or simple defining relation, for the remainder
   of the set's domain; given this prescription for the set, only a small set
   of additional values carries additional (meaningful) information. Because of
   this, for several inference tasks it can prove unnecessary to enumerate the
   entire set: instead, one can reason only on the combination of the prescription
   and the small complementary set to solve the task at hand (see the sketch
   after this list).
 • Specialized algorithms: Specialized and efficient algorithms for set operations
   (Dovier et al. 2006) and enumerations can be conceived, for example for the
   exhaustive enumeration of graphs that satisfy certain logical conditions.
 • Oracles: The use of subsolvers as oracles is a promising avenue for supporting
   higher order quantification over predicates. The necessary search is performed
   by a subsolver, which is given a theory describing all constraints on the
   quantified predicate. Using negation, universal quantification over a predicate
   is turned into existential quantification: ∀P : φ(P) holds exactly when
   ∃P : ¬φ(P) is unsatisfiable. As a result, it is possible to rewrite every
   quantification into an existential second order theory (or search problem)
   that a subsolver can tackle.
   The subsolver answers whether the new theory is satisfiable; if so, the correct
   conclusions about the super-theory are inferred and, where possible, propagated.
   The performance of these subsolvers, how much information can be shared between
   them, and the optimal level of granularity deciding when a subsolver must run
   are the subjects of ongoing research.
   As an example, consider a shipping dock with multiple separate docking
   segments and ships. We want to package a large payload into (as few as
   possible) smaller loads that we can later distribute over the ships. However,
   we do not know which ship will be placed in which dock, and every dock and
   ship has a maximal load capacity that it can bear. We can express this as
   follows:

              ∃Weight[Package ↦ Int] : ∀Place[Ship ↦ Dock] : ∃Distri[Package ↦ Ship] :
                        ∀ship : sum{pack[Package] : Distri(pack) = ship : Weight(pack)}
                                           < min(dockCap(Place(ship)), shipCap(ship)).

   Informally, this must be read as follows: there exists a function Weight that
   gives the weight of each package, such that for every placement of ships in
   docking segments there exists a distribution Distri of the packages over the
   ships such that the maximal capacity of neither ship nor dock is exceeded.
   Here, the universal quantification over the placement predicate Place can be
   transformed into existential quantification by negating the remainder of the
   logical sentence, which will be solved by a subsolver. We will require that
   this subsolver prove unsatisfiability, as the transformation from universal to
   existential quantification introduces a negation before and after the
   quantification. Note that this introduces a negation before the existential
   quantification of the Distri predicate, which in a naive scheme results in
   another subsolver.
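   To make the interleaving behind lazy grounding concrete, the following toy
Python sketch (our own schematic illustration, not the IDP implementation)
instantiates the constraint p(i) → p(i + 1) only for the indices that propagation
actually reaches, instead of producing all ground instances upfront:

    # Toy illustration of lazy grounding (not the IDP implementation):
    # the constraint "p(i) -> p(i+1) for all 0 <= i < N" is instantiated
    # on demand, driven by propagation, rather than grounded upfront.

    N = 10**9  # a domain far too large to ground eagerly

    def propagate_lazily(goal: int) -> dict[int, bool]:
        interpretation = {0: True}  # the fact p(0)
        todo = [0]  # true atoms whose rule instances are not yet grounded
        while todo:
            i = todo.pop()
            if i == goal:
                break               # the search needs nothing beyond this point
            if i + 1 < N and (i + 1) not in interpretation:
                # Ground the single instance "p(i) -> p(i+1)" and propagate it.
                interpretation[i + 1] = True
                todo.append(i + 1)
        return interpretation

    model = propagate_lazily(goal=5)
    print(len(model), "atoms grounded instead of", N)  # 6 atoms, not 10**9

   Similarly, the cofinite-set idea above admits a direct finite representation:
a default prescription plus a finite set of exceptions. A minimal sketch, again
our own illustration:

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class CofiniteSet:
        """A set over an infinite domain, represented finitely: a default
        membership rule, deviated from on a finite set of exceptions."""
        default: Callable[[int], bool]  # the prescription, e.g. "all even numbers"
        exceptions: set[int] = field(default_factory=set)

        def __contains__(self, x: int) -> bool:
            # Membership flips the prescription exactly on the exceptions.
            return self.default(x) != (x in self.exceptions)

    # "All even numbers, except 2, but additionally 7" -- no enumeration needed.
    s = CofiniteSet(default=lambda x: x % 2 == 0, exceptions={2, 7})
    print(4 in s, 2 in s, 7 in s)  # True False True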

   These techniques are called “lifted reasoning” because the reasoning task is partly
done at the predicate level, as opposed to the current state of the art, which defers
all reasoning until after the grounding phase, when it can be done at the propositional
level.
   The development and adaptation of these techniques will pose theoretical and
implementation challenges.
   Furthermore, significant software architectural questions must be answered to
insert these techniques into the classical two-phase system with the appropriate level
of modularity and encapsulation.


                                        3 Goal
The goal of this research is to add higher order support to logic specification
languages running on ground-and-solve systems, specifically with data representation
and data mining applications in mind. This higher order support will come in the
form of structured recursive types, generic types, and relations as first class citizens.
   We expect that these additional features will make data handling and reasoning more
natural, and that they will allow us to write shorter, more specific and modular
specifications. These modular specifications can then be combined into a larger
specification for solving larger problems, paving the way for a software design
methodology for logic specifications.
   We expect to provide an implementation of the ideas and results stemming from
this research for the FO(·) language and the IDP system, which supports and provides
inferences for this language.


                       4 Current State and Future Work
Current research topics consist of analyzing how the work on lazy grounding (De Cat
et al. 2015), justifications (Denecker et al.) and relevance can be leveraged to
work with infinite domains. The idea behind justifications is that, given a certain
interpretation, the truth value of every ground literal can be justified on the basis
of the program. For example, a true literal can be justified by the body of one of
its rules whose body is true; this is called a direct justification. For instance,
given the rule p ← q ∧ r and an interpretation making q and r true, the set {q, r}
directly justifies p. Combining all these justifications results in the justification
graph.
   From all possible justification graphs, the relevance of certain literals can be
deduced. Some literals are irrelevant: their truth values do not matter for the truth
value of the program. Combining relevance derived from the justification graph with
lazy grounding will likely lead to ways of efficiently handling predicates with an
infinite domain, by not providing an interpretation for subdomains where the predicate
can be chosen freely.
   This hypothesis of course needs an experimental evaluation on a well-chosen set
of real-world problems. We expect to write at least one summary of the devised
methods and their evaluation for publication, as well as to publish an IDP release
with full support for lazy grounding.
   Furthermore, we investigate our hypothesis that the justification graph of
predicates will allow us to learn characterizations of the ‘behavior’ of a predicate
on large (infinite) parts of its domain, which can be used for lifted clause learning.
The idea is that these clauses can then adequately describe the properties of the
predicate without having to compute it fully. For example, an arbitrary predicate
could, by analysis of its justification graph, be detected to be an ordered list of
five elements of a generic type, leading to a new representation as:
    • five constants of a generic type,
    • a constraint that all five constants have the same type a,
    • a constraint that a must contain at least five domain elements, and provide an
      ordering.
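   Purely as an illustration (the concrete form of such lifted clauses is still an
open research question), this characterization could be rendered for a predicate P as

          ∃x₁ , . . . , x₅ [a] : P = [x₁ , x₂ , x₃ , x₄ , x₅ ] ∧ x₁ < x₂ < x₃ < x₄ < x₅ ,

where a is the generic type and < its ordering.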
   Moreover, we are exploring how the concept of oracles can be implemented using
subcalls to the same solver. To this end, we are looking at ways to share information,
data structures and other necessary state between different solver calls, with the
aim of reducing overhead and setup costs. Later, we will write up our findings
regarding oracles for publication.
   We are also collecting a benchmark set, which we will use to test how eager the
system must be to perform a subsolver call, as well as how detailed the learned clauses
should be when a subsolver does not produce the wanted result, i.e. represents a
conflict. Using this benchmark set, we will evaluate the different answers to the
questions above and publish a paper detailing these experimental results. It will
also provide us with an important indication of whether to publish an IDP version
incorporating these techniques.
   Lastly, we are looking at set theory as supported by the B method (Cansell and
Méry 2003): the B method provides many set comprehensions and set operations.
Currently, our view on predicates is that they double as sets: P(a) is the same
as a ∈ P. Studying this use of sets and their operations, and relating them to the
view where a set is exclusively defined by its ‘elementOf’ relation P, will lead to
interesting new insights for porting the capabilities of reasoning about higher order
concepts, as available in B method systems (e.g. ProB), to ground-and-solve systems
(such as IDP). We believe that this comparison and the derived insights will provide
material for another publication.
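   In symbols, this identification can be sketched as follows (our own rendering of
the correspondence, not notation taken from ProB): P(a) ⇔ a ∈ P, and a B-style
comprehension {x | φ(x)} corresponds to a predicate S defined by ∀x : S(x) ⇔ φ(x).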

                                     References
Cansell, D. and Méry, D. 2003. Foundations of the B method. Computers and Artificial
  Intelligence 22, 3-4, 221–256.
Chen, W., Kifer, M., and Warren, D. S. 1993. HILOG: A foundation for higher-order
  logic programming. J. Log. Program. 15, 3, 187–230.
Cooper, D. C. 1972. Theorem proving in arithmetic without multiplication. Machine
  Intelligence 7 , 91–99.
Dasseville, I., van der Hallen, M., Janssens, G., and Denecker, M. 2016. Seman-
  tics of templates in a compositional framework for building logics. TPLP 16. Accepted
  for publication.
de Cat, B., Bogaerts, B., Bruynooghe, M., and Denecker, M. 2014. Predicate logic
  as a modelling language: The IDP system. CoRR abs/1401.6312.
De Cat, B., Bogaerts, B., Denecker, M., and Devriendt, J. 2013. Model expan-
  sion in the presence of function symbols using constraint programming. In IEEE 25th
  International Conference on Tools with Artificial Intelligence, ICTAI 2013, Washington
  D.C., USA, November 4-6, 2013. 1068–1075.
De Cat, B., Denecker, M., and Stuckey, P. 2012. Lazy model expansion by incre-
  mental grounding. In Technical Communications of the 28th International Conference
  on Logic Programming, ICLP 2012, September 4-8, 2012, Budapest, Hungary, A. Dovier
  and V. Santos Costa, Eds. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 201–211.
De Cat, B., Denecker, M., Stuckey, P., and Bruynooghe, M. 2015. Lazy model
  expansion: Interleaving grounding with search. The Journal of Artificial Intelligence
  Research 52, 235–286.
Denecker, M., Brewka, G., and Strass, H. A formal theory of justifications. Accepted.
Denecker, M. and Ternovska, E. 2008. A logic of nonmonotone inductive definitions.
  ACM Trans. Comput. Log. 9, 2.
Dovier, A., Pontelli, E., and Rossi, G. 2006. Set unification. TPLP 6, 6, 645–701.
Kifer, M. 2005. Nonmonotonic reasoning in FLORA-2. In Logic Programming and Non-
  monotonic Reasoning, 8th International Conference, LPNMR 2005, Diamante, Italy,
  September 5-8, 2005, Proceedings, C. Baral, G. Greco, N. Leone, and G. Terracina, Eds.
  Lecture Notes in Computer Science, vol. 3662. Springer, 1–12.
Kifer, M., Lausen, G., and Wu, J. 1995. Logical foundations of object-oriented and
  frame-based languages. J. ACM 42, 4, 741–843.
Palù, A. D., Dovier, A., Pontelli, E., and Rossi, G. 2009. GASP: answer set pro-
  gramming with lazy grounding. Fundam. Inform. 96, 3, 297–322.
Warren, D. S. 1998. Programming with tabling in XSB. In Programming Concepts
  and Methods, IFIP TC2/WG2.2,2.3 International Conference on Programming Con-
  cepts and Methods (PROCOMET ’98) 8-12 June 1998, Shelter Island, New York, USA,
  D. Gries and W. P. de Roever, Eds. IFIP Conference Proceedings, vol. 125. Chapman
  & Hall, 5–6.