<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Binary Decision Diagrams to Enumerate Inductive Logic Programming Solutions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hikaru Shindo</string-name>
          <email>hikarushindo@iip.ist.i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masaaki Nishino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akihiro Yamamoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Informatics, Kyoto University</institution>
          ,
          <addr-line>Kyoto</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NTT Communication Science Laboratories, NTT Corporation</institution>
          ,
          <addr-line>Kyoto</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper proposes a method for efficiently enumerating all solutions of a given ILP problem. Inductive Logic Programming (ILP) is a machine learning technique that assumes that all data, background knowledge, and hypotheses can be represented by first-order logic. The solution of an ILP problem is called a hypothesis. Most ILP studies propose methods that find only one hypothesis per problem. In contrast, the proposed method uses Binary Decision Diagrams (BDDs) to enumerate all hypotheses of a given ILP problem. A BDD yields a very compact graph representation of a Boolean function. Once all hypotheses are enumerated, the user can select the preferred hypothesis or compare the hypotheses using an evaluation function. We propose an efficient recursive algorithm for constructing a BDD that represents the set of all hypotheses. In addition, we propose an efficient method to get the best hypothesis given an evaluation function. We empirically show the practicality of our approach on ILP problems using real data.</p>
      </abstract>
      <kwd-group>
        <kwd>Inductive Logic Programming</kwd>
        <kwd>Binary Decision Diagram</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        This paper proposes a method that can efficiently enumerate all solutions of a
given ILP problem. Inductive Logic Programming (ILP) is a machine learning
technique that assuming that each datum, background knowledge, and
hypothesis can be represented by first-order logic. Our work is based on the foundation of
ILP established by Plotkin [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Shapiro [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]: inductive inference with clausal
logic. The goal of an ILP system is to find a hypothesis that explains all positive
examples while admitting no negative examples. A lot of methods for finding
hypotheses have been proposed. Inverse entailment [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and saturation [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] are well
known methods of hypothesis discovery. Unfortunately, there are restrictions on
the hypotheses that can be generated by these methods as they are designed to
output only one hypothesis per problem. This restriction may prevent
important hypotheses from being generated, i.e. discovered. To avoid this problem, we
propose a method that can enumerate all hypotheses.
      </p>
      <p>
        As most ILP problems have an infinite number of hypotheses in the absence
of any restriction, we need a new algorithm that enumerates all hypotheses in
a bounded hypothesis space, i.e., every hypothesis is a subset of a finite set of
clauses. Even with this restriction, it is implausible to naively enumerate all
hypotheses because there are 2n candidate hypotheses where n is the number of
clauses in the hypothesis space. Our proposal is an enumeration method that
exploits Binary Decision Diagrams (BDDs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A BDD is a very compact graph
representation of a Boolean function, and allows several operations to be
performed efficiently. The merits of enumerating hypotheses are following:
Preventing hypothesis omission The importance of a hypothesis depends
on the case, so algorithms that give only one hypothesis may not return the
best hypothesis. By using a BDD to enumerate all hypotheses, the user is
assured of getting the important hypothesis for the desired case as the BDD
ensures that it will be enumerated.
      </p>
      <p>Hypothesis selection Users can select a hypothesis or compare some
hypotheses using an evaluation function. Moreover, users can select the top-k best
hypotheses. If there are multiple evaluation metrics for evaluating
hypothesis importance, then finding just one hypothesis may insufficient. We show
that this selection can be executed efficiently with BDDs.</p>
      <p>Online-learning BDDs support efficient binary operations. By using this
feature, we can efficiently perform online leaning, i.e., updating the current set
of hypothesis when new examples are added.</p>
      <p>
        We introduce propositional variables that represent if a clause is contained in
a hypothesis or not. This makes the hypothesis enumeration problem equivalent
to the problem of identifying an n-ary Boolean function. We show an efficient way
of constructing a BDD for Boolean function representation. As an application
of the constructed BDD, we show how to get the best hypothesis in terms of
minimizing an evaluation function. As an example, this paper introduces an
evaluation function based on Minimum Description Length (MDL) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We can
find the best hypothesis in time linear to the number of nodes by dynamic
programming. We apply these methods to real data sets and show that the
resulting BDDs can compactly represent hypotheses.
2
2.1
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>Inductive Logic Programming (ILP)</title>
        <p>
          In clausal logic, formulae are constructed of five symbols [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]: constants,
variables, function symbols, predicate symbols, and connectives. A term is a
constant, a variable, or an expression f (t1, . . . , tn) where f is an n-ary function and
t1, . . . , tn are terms. If P is an n-ary predicate symbol and t1, . . . , tn are terms,
then P (t1, . . . , tn) is a formula called an atom. A ground atom is an atom that
contains no variables. A literal is an atom or its negation. A positive literal is
just an atom. A negative literal is the negation of an atom. A clause is a finite
disjunction (∨) of literals. A definite clause is a clause with exactly one positive
literal. If A, B1, . . . , Bn are atoms, then A ∨ ¬B1 ∨ . . . ∨ ¬Bn is a definite clause.
We write definite clauses in the form of the implication A ← B1 ∧ . . . ∧ Bn. Atom
A in the clause is called the head, and the set of negative atoms {B1, . . . , Bn} in
the clause is called the body. We denote the constant true as T, and the constant
f alse as F.
        </p>
        <p>ILP concerns learning a theory from given examples, possibly taking
background knowledge into account. We can distinguish between two kinds of
examples: positive and negative. Usually, we assume that a theory is represented as a
finite set, Σ, of definite clauses, and the positive examples, the negative
examples, and background knowledge are given respectively as finite sets E +, E −, B
of ground atoms.</p>
        <p>Formally, the ILP problem treated in this paper is defined as follows.</p>
        <sec id="sec-2-1-1">
          <title>Input Finite set E +, E −, and B of ground atoms,</title>
          <p>Output A set of definite clauses Σ such that
f or all A ∈ E + Σ ∪ B |= A and f or all A ∈ E − Σ ∪ B 2 A.
(1)
In this paper, we call theory Σ that satisfies equation (1) a hypothesis.
Example 1.</p>
          <p>
            E + = {e(0), e(s2(0))}, E − = {e(s(0))}, B = {},
where e is a 1-ary predicate symbol that becomes true if its argument is an
even number, 0 is a constant symbol interpreted as natural number 0, s is a
1-ary function symbol and sn(0) is interpreted as natural number n. Examples
of correct hypotheses are as follows:
Σ = {e(0), e(s2(0))},
A Binary Decision Diagram (BDD) [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] is a directed acyclic graph that represents
a Boolean function. Fig. 1 is a BDD that represents Boolean function (x0 ∧ x1) ∨
x2. The nonterminal node that has no parents is called root. Each nonterminal
node is labeled with a variable, and has a low-branch, shown as a dashed line,
and a high-branch, shown as a solid line. We denote the node whose index is i as
node i. We start at the root and take the low-branch from node j when xj = 0,
the high-branch when xj = 1. Each terminal node is labeled 0 or 1. A path
from the root to the terminal node labeled 0 corresponds to an assignment for
which the function returns false, and a path from the root to the terminal node
labeled 1 corresponds to an assignment for which the function returns true. In
this paper, the number of nodes means the number of nodes other than terminal
nodes.
          </p>
          <p>2
0
0
1
1</p>
          <p>A binary decision tree is obtained by applying Shannon decomposition
recursively a Boolean function, and a BDD is given by deleting redundant nodes and
sharing equivalent subgraphs of a binary decision tree. BDDs have the following
features.</p>
          <p>
            – The shape of the minimum BDD is uniquely determined by the order of
input variables. If the order of input variables is given, this feature enables
the equivalency of two Boolean functions to be checked in constant time if
they are represented as BDDs.
– The nummber of nodes of a BDD that represents an m input Boolean function
is O( 2m ) in the worst case [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. However, many practical Boolean functions
can be represented as a BDD, which is a comparatively small graph.
– Binary operations between BDDs can be executed efficiently [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. For
example, given two BDDs representing logical functions F and G, then the BDD
representing H = F ∧ G can be computed in time linear to input BDD sizes.
– Users can count the number of solutions, solve the Boolean optimization
problem, and randomly sample a solution in O(n), where n is the number
of the nodes of the BDD (see [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]).
          </p>
          <p>In this paper, we assume that the index of terminal node 0 is 0, that of
terminal node 1 is 1, and that of the root is n + 1, where n is the number of
nodes of the BDD. We also assume for all nodes i and node j, where 0 ≤ i ≤ n+1
and 0 ≤ j ≤ n + 1,</p>
          <p>i &lt; j ⇒ node j is not the child of node i.
3
3.1</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Restrictions</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Enumerating hypotheses by constructing BDDs</title>
      <p>In general, there are an infinite number of hypotheses for an ILP problem if no
restrictions are set. Furthermore, the algorithm proposed here does not finish
in finite steps if there is a recursive structure in the hypothesis space. Let us
introduce some concepts for bounding the hypothesis space.
Definition 1. (Mutually recursive clauses) Let H be a hypothesis space that is
finite set of definite clauses. If a series of definite clauses {Ci ∈ H}i=0,...,n and
substitutions θ1, . . . , θn exist, and they are expressed as</p>
      <p>C1θ1 = A ← . . . ∧ X1 ∧ . . . ,
C2θ2 = X1 ← . . . ∧ X2 ∧ . . . ,
.
.</p>
      <p>.</p>
      <p>Cnθn = Xn−1 ← . . . ∧ A ∧ . . . ,
then C1, C2, . . . , Cn are mutually recursive clauses.</p>
      <p>
        Definition 2. (Variable-bounded [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) Definite clause A ← B1 ∧ . . . ∧ Bn is
variable-bounded if v(A) ⊇ v(Bi) (i = 1, . . . , n), where v(C) is the set of
all variables in C. The hypothesis space H is variable-bounded if all C ∈ H are
variable-bounded.
      </p>
      <p>The requirements that hypothesis space H should hold are as follows:
Requirement 1 The hypothesis space does not contain any mutually recursive
clauses.</p>
      <p>Requirement 2 The hypothesis space is variable-bounded.</p>
      <p>Requirement 1 ensures that we can trace all literals present in the
hypothesis space in a finite number of steps. Requirement 2 ensures that for a clause
Cθ = A ← B1 ∧ . . . ∧ Bn ∈ H, if A has no variables, then Bi (i = 1, . . . , n) also
has no variables.
3.2</p>
      <sec id="sec-3-1">
        <title>Introduction of propositional variables</title>
        <p>If the hypothesis space is H, every ILP problem is interpreted as a search
problem of the subset of H. The enumeration is understood as a problem involving
the family of sets. This means we can formulate the enumeration problem as
an identification problem of a Boolean function with appropriate propositional
variables.</p>
        <p>For each clause C ∈ H, we introduce a propositional variable vC∈Σ which
becomes true if and only if clause C is in hypothesis Σ. For readability, we use
an expression [C ∈ Σ] instead of vC∈Σ , that is we let</p>
        <p>C ∈ Σ ⇔ [C ∈ Σ] = T.
(2)
We construct a BDD that represents the set of hypotheses with these
propositional variables.</p>
        <p>Example 2. When the hypothesis space H is defined as

e(s(x)) ← e(x),  ,
e(s2(x)) ← e(x),



an example of an assignment to the propositional variables generated is:
[e(0) ∈ Σ] = T, [e(s2(0)) ∈ Σ] = F,
[e(x) ∈ Σ] = F, [e(s2(x)) ∈ Σ] = F,
[e(s(0)) ∈ Σ] = F, [e(s2(x)) ← e(x) ∈ Σ] = T,
[e(s(x)) ∈ Σ] = F, [e(s2(x)) ← e(s(x)) ∈ Σ] = F,
[e(s(x)) ← e(x) ∈ Σ] = F, [e(s2(x)) ← e(s(x)) ∧ e(x) ∈ Σ] = F.</p>
        <p>We get the following hypothesis from this assignment:
In this section, we show how to construct a BDD that represents the set of
hypotheses. We assume that positive examples E +, negative examples E −,
background knowledge B, and hypothesis space H are given. We also assume that</p>
        <sec id="sec-3-1-1">
          <title>E +, E −, and B consist of only ground atoms. Let us denote the ILP problem as</title>
          <p>P = (E +, E −, B, H). For C ∈ E + ∪ E −, we define FC as a BDD that represents
the Boolean function whose input corresponds to Σ ⊆ H; it becomes true if and
only if Σ ∪ B |= C on problem P . We also define IC as a BDD that represents
the Boolean variable [C ∈ Σ], and BKC as a BDD that represents a constant
which becomes true if and only if C ∈ B. By using binary operations between</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>BDDs, then FC where C ∈ E + ∪ E − is recursively defined as</title>
          <p>FC = BKC ∨
_</p>
          <p>D∈H
Dθ=C←∃Bθ1∧...∧Bn</p>
          <p>ID ∧ ^ FBi .</p>
          <p>(3)
The right side of equation (3) represents the fact that Σ ∪ B |= C if C ∈ B,
or there exists substitution θ for definite clause D ∈ H such that the head of
Dθ becomes C by the substitution θ, and Σ ∪ B |= Bi (i = 1, . . . , n) where
B1, . . . , Bn are the atoms of the body of Dθ.</p>
          <p>A BDD that represents the set of hypotheses of problem P is</p>
          <p>^
C∈E+</p>
          <p>FC ∧</p>
          <p>¬FC .</p>
          <p>^
C∈E−
(4)
The goal is to construct a BDD that represents the expression (4). We calculate
each FC where C ∈ E + ∪ E − following equation (3). The algorithm is written
as Algorithm 1. Here we say that FT is a BDD that represents the constant
true, and FF is a BDD that represents the constant f alse. The function
constructBDD returns the BDD that represents the set of hypotheses by calling
constructF for each clause C ∈ E + ∪ E −. The function constructF takes
clause C as an input and returns a BDD that represents the Boolean function
that becomes true if and only if Σ ∪ B |= C.
Example 3. We show how to construct a BDD that represents a set of
hypotheses. We assume the positive examples E +, the negative examples E − and
background knowledge B in Example 1, and the hypothesis space H in Example 2.
Fe(0), Fe(s(0)), and Fe(s2(0)) are constructed as follows,</p>
          <p>Fe(0) = Ie(0) ∨ Ie(x),
Fe(s(0)) = Ie(s(0)) ∨ Ie(x) ∨ Ie(s(x)) ∨ Ie(s(x))←e(x) ∧ Fe(0) ,
Fe(s2(0)) = Ie(s2(0)) ∨ Ie(x) ∨ Ie(s(x)) ∨ Ie(s2(x)) ∨ Ie(s2(x))←e(x) ∧ Fe(0)
∨ Ie(s2(x))←e(s(x)) ∧ Fe(s(0)) ∨ Ie(s2(x))←e(s(x))∧e(x) ∧ Fe(s(0)) ∧ Fe(0) .
The BDD that represents the set of hypotheses is</p>
          <p>Fe(0) ∧ Fe(s2(0)) ∧ ¬Fe(s(0)).</p>
          <p>Generated propositional variables and the constructed BDD are as shown in Fig.
2.</p>
          <p>The BDD in Fig. 2 represents the set of 28 hypotheses with 8 nodes. We get
individual hypotheses by tracing from the root to terminal node 1 . Examples
0 [e(0) ∈ Σ]
1 [e(x) ∈ Σ]
7 [e(s2(x)) ← e(x) ∈ Σ]
8 [e(s2(x)) ← e(s(x)) ∈ Σ]
9 [e(s2(x)) ← e(s(x)) ∧ e(x) ∈ Σ]
0
1
2
0
3
4
7
5
6
1
of the hypotheses of this problem are listed as follows,</p>
          <p>{e(0), e(s2(0))}, {e(0), e(s2(x))}, {e(0), e(s2(x)) ← e(x)}.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Applications of BDD-based enumeration</title>
      <p>As described in Section 1, there are several benefits to using a BDD to enumerate
hypotheses. Here we show how BDDs can be used for finding the hypothesis that
maximizes a given evaluation function. Here we take the Minimum Description
Length (MDL) principle as an example in a practical evaluation of the proposed
method.
4.1</p>
      <sec id="sec-4-1">
        <title>Minimum description length</title>
        <p>
          Minimum description length (MDL) is a principle of model selection that
states that the best model has the shortest description. The MDL principle was
introduced to machine learning by Quinlan and Rivest [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. MDL has been used
in ILP in [
          <xref ref-type="bibr" rid="ref12 ref13 ref4">13, 12, 4</xref>
          ]. In Progol [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the length of hypothesis Σ is measured as
the number of atoms in the hypothesis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] as follows
        </p>
        <p>DL(Σ) =</p>
        <p>X l(C),
C∈Σ
(5)
where l(C) is the number of atoms composing clause C.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Search for the best hypothesis</title>
        <p>We can get the best hypothesis consistent with the data from the BDD by finding
the minimum-weight path from the root to the terminal node. For each clause
C in hypothesis space H, we assign the number of atoms of C as the weight of
the corresponding node’s high-branch for a BDD written as (4). We denote the
weight of the high-branch of node i as wi with the value
wi = l(Ci),
(6)
where Ci is the clause that corresponds to node i. Every low-branch has weight
of 0.</p>
        <p>We get the best hypothesis that minimize description length (5) by finding
the minimum-weight path from the root to terminal node 1 on a BDD whose
weights are assigned following equation (6).</p>
        <p>
          It is known that this type of optimization problem can be efficiently solved in
time linear to BDD size by using dynamic programming (see [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). The running
time is O(n+B), where B is the number of BDD nodes. The algorithm is written
as Algorithm 2. Here highi is the index of the node that is traced using the
high-branch from node i, and lowi is the index of the node that is traced using
the low-branch from node i. The value of m[i] represents the minimal sum of
the weights from the node i to a terminal node. The value of t[i] represents
which branch should be taken at node i to minimize the sum of the weights.
Here t[i] = 0 means that the low-branch should be taken, t[i] = 1 means that
the high-branch should be taken, and t[i] = 2 means that either branch can be
taken.
        </p>
        <p>
          Similarly, the problem of finding the top-k best hypothesis corresponds to
finding the top-k shortest paths on the BDD. We omit detail due to the space
limit, but it is straightforward to apply the algorithm for finding the top-k
shortest paths [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to obtain the top-k best hypotheses.
4.3
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Enumeration of the best hypotheses</title>
        <p>In some cases, some different hypotheses have the same best sore. In such cases,
outputting just one of them is not effective. By using BDDs, it is easy to output
a set of all such hypotheses. We call this the top-tie BDD. We denote F as a
constructed BDD that represents the set of hypotheses. We can construct F by
using the result of Algorithm 2. After calculating mi, ti for F in Section 4.2,
we execute the following operations until the number of nodes of F becomes one.
First, we set i as the index of the root. Then,
1. When ti = 0, switch the target of the high-branch of the node i to terminal
node 0 . Then call the procedure recursively for the partial BDD traced by
the low-branch from node i.
2. When ti = 1, switch the target of the low-branch of the node i to the terminal
node 0 . Then call the procedure recursively for the partial BDD traced by
the high-branch from node i.</p>
        <p>
          Algorithm 2 Search for the best hypothesis
Input: BDD F , weights w, hypothesis space H
Output: the best hypothesis Σbest
1: n ← the number of nodes of F
2: m[0] ← ∞
3: m[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] ← 0
4: from i = 2 to n + 1
5: if m[lowi] &lt; m[highi] + wi
6: m[i] ← m[lowi]
7: t[i] ← 0
8: if m[lowi] &gt; m[highi] + wi
9: m[i] ← m[highi] + wi
10: t[i] ← 1
11: else
12: m[i] ← m[lowi]
13: t[i] ← 2
14: Σ ← ∅
15: j ← n + 1
16: while j 6= 1 then
17: if tj = 1
18: Add Cj to Σ
19: j ← highj
20: else
21: j ← lowj
22: return Σ
3. When ti = 2, call the procedure recursively for the partial BDDs traced by
the high-branch and low-branch from node i, respectively.
        </p>
        <p>We say FT denotes the BDD constructed by these operations, and U is the set
of all variables. S denotes the set of variables that appear on F , and D = U − S.
FD denotes the BDD that represents the logical conjunction of the negations of
elements in D. The top-tie BDD is given by subjecting FT and FD to the logical
“and” operation. The algorithm is schematized in Algorithm 3. Here F.top is a
BDD that consists of only the root of F , F.high is a partial BDD that is traced
using the high-branch from the root of F , and F.low is a partial BDD that is
traced using the low-branch from the root of F .</p>
        <p>Theorem 1. The BDD yielded by Algorithm 3 represents the set of the best
hypotheses.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>In order to show the efficiency of our method, we apply it to the following ILP
problems,
Algorithm 3 Construction of top-tie BDD
Input: BDD F , set of variables U , array t
Output: a BDD that represents the set of the best hypotheses
1: S ← the set of variables appearing on F
2: D ← U − S
3: FD ← a BDD that represents the logical conjunction of the negations of elements
of D
4: FT ← topTie(F, t)
5: return FT ∧ FD</p>
      <sec id="sec-5-1">
        <title>Classification problem of natural numbers We assumed that even num</title>
        <p>bers are given as positive examples, and odd numbers are given as negative
examples.</p>
        <p>Classification problem of real data We used a public data set. We assumed
that instances whose class label corresponds to a target concept are given as
positive examples, others are given as negative examples.</p>
        <p>The experiment environment is as follows, OS: Ubuntu 17.10, CPU: Intel Xeon(R)
E5-1650 v3 3.50GHz×12 , RAM: 125.8GB, Language: Scala 2.11.4, BDD
implementation: JavaBDD3.
5.1</p>
      </sec>
      <sec id="sec-5-2">
        <title>Classification of natural numbers</title>
        <p>We created an ILP problem involving natural numbers in which even numbers
are given as positive examples, and odd numbers are given as negative examples.
We constructed the BDD representing the set of hypotheses for various problem
sizes. We assumed positive examples E + and negative examples E − are given as
follows,
When n is even,</p>
        <p>E + = {e(0), e(s2(0)), . . . , e(sn(0))},</p>
        <p>E − = {e(s(0)), e(s3(0)), . . . , e(sn+1(0))}.
3 http://javabdd.sourceforge.net/</p>
        <p>We assumed B = ∅. Let J be the set consisting of definite clauses such as
C = A ← B1 ∧ . . . ∧ Bn which satisfy the following conditions. (i) C holds
Requirement 1 and Requirement 2, (ii) |A| &lt; D∈mE+a∪xE−|D|, where |A| is the
sum of the number of constant symbols, variable symbols, and function symbols
appearing in atom A. (iii) Predicate symbols, function symbols and constants
appearing on C also appear on E + ∪ E −. (iv) If a variable appears on a clause,
it also appears on all literals of the clause. We assume H = J ∪ E + ∪ E −.
Example 4. In the case of n = 1, E +, E −, B, and H are, respectively,
E + = {e(0), e(s2(0))},
E − = {e(s(0))},</p>
        <p>B = ∅, and
The results are given in Table1 1. For each n, the number of nodes is smaller
than that of hypotheses. n = 8 allows more than 1.80 × 10308 hypotheses to be
represented by a BDD consisting of 219 nodes. This shows that many hypotheses
can be compactly represented as a BDD. The BDD construction time is faster
than searching for all hypotheses naively.
5.3</p>
      </sec>
      <sec id="sec-5-3">
        <title>Classification of real data</title>
        <p>We created an ILP problem from real data, and constructed a BDD that
represents the set of the hypotheses. We also searched for the best hypothesis and
constructed a BDD that represents the set of the best hypotheses. We assumed
that the hypothesis space consists of just definite clauses that have body length
under 2. We used data (1) Soybean(small)4 and (2) Shuttle Landing Control5
from UCI Machine Learning Repository6. We set the target concept to D1,
no auto.
best top-tie BDD best hypothesis
Problem variables nodes hypotheses construction time search time
Soybean 2243 2251 2 1153msec 911msec</p>
        <p>Shuttle 117 117 1 6msec 9msec
In Table 2, the BDD has very few nodes compared to the number of hypotheses.
We can conclude the set of the hypotheses can be expressed compactly as a BDD.
Taking the number of hypotheses into account, we can get the best hypothesis
faster by searching the BDD than by naive full search of the hypotheses. The
best hypotheses found in problem (1) Soybean(small) are,
Σbest = {class(x, D1) ← stem canker(x, above soil)},
Σbest = {class(x, D1) ← f ruiting bodies(x, absent)}.</p>
        <p>We can see that the best hypotheses consist of a simple definite clause that have
body length of 1.
4 https://archive.ics.uci.edu/ml/datasets/soybean+(small)
5 https://archive.ics.uci.edu/ml/datasets/Shuttle+Landing+Control
6 http://archive.ics.uci.edu/ml/index.php</p>
        <p>
          In Table 3, the number of nodes of the top-tie BDD is close to the number
of variables. This is because almost all variables have a constraint that should
never be true. This yields a redundant BDD as almost all nodes have a
highbranch to terminal node 0 and low-branch to the next node. In this case, we
can express the same set more compactly with a ZDD(Zero-suppressed Decision
Diagram) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
6
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        In the field of ILP, many hypothesis discovery methods that include inverse
entailment [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and saturation [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] have been proposed, but no hypothesis
enumeration methods have been proposed. Yamamoto [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] proposed a general
hypothesis finding method that applies upward refinement to the residue hypothesis
that is given from H |= ¬B ∧ E, and also showed that it is a complete method
for solving any hypothesis finding problem in clausal logic. In our method, all
hypotheses are represented as a BDD.
      </p>
      <p>
        BDDs and their variants are also widely used in the field of probabilistic logic
programming for achieving efficient probabilistic inference and parameter
learning [
        <xref ref-type="bibr" rid="ref5 ref7 ref8">5, 8, 7</xref>
        ]. The major difference from these approaches is that our algorithm
exploits BDDs to solve enumeration problems of logic programs. One future work
is to apply our enumeration algorithm to learning tasks of probabilistic relational
rules [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Since our algorithm enables the set of all candidate logic programs to
be stored, it will contribute to efficient search.
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we proposed a BDD-based method to enumerate hypotheses of an
ILP. We pointed out that it is infeasible to enumerate all hypotheses without
any restrictions, and we showed how to construct the BDD that represents the
set of hypotheses in a bounded hypothesis space by applying BDD operations
recursively. We also detailed our construction algorithm. As an application, we
showed that users can get the best hypothesis following an evaluation function
from the constructed BDD in time O(n), where n is the number of nodes. We
also introduced a method to enumerate the best hypothesis as indicated by an
evaluation function. We empirically showed that our method can be applied to
real data.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work was partly supported by JSPS KAKENHI Grant Number 17K19973.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akers</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>Binary decision diagrams</article-title>
          .
          <source>IEEE Trans. Comput</source>
          .
          <volume>27</volume>
          (
          <issue>6</issue>
          ),
          <fpage>509</fpage>
          -
          <lpage>516</lpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arikawa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning elementary formal systems</article-title>
          .
          <source>In: Theoretical Computer Science</source>
          . vol.
          <volume>95</volume>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>113</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bryant</surname>
          </string-name>
          , R.:
          <article-title>Graph-based algorithms for boolean function manipulation</article-title>
          .
          <source>In: IEEE Trans. on Computers</source>
          . vol. C-
          <volume>35</volume>
          , pp.
          <fpage>677</fpage>
          -
          <lpage>691</lpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Conklin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>Complexity-based induction</article-title>
          .
          <source>Machine Learning</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <fpage>203</fpage>
          -
          <lpage>225</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kimmig</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
          </string-name>
          , H.:
          <article-title>Problog: A probabilistic prolog and its application in link discovery</article-title>
          .
          <source>In: Proc. IJCAI</source>
          . pp.
          <fpage>2462</fpage>
          -
          <lpage>2467</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Eppstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Finding the k shortest paths</article-title>
          .
          <source>SIAM J. Comput</source>
          .
          <volume>28</volume>
          (
          <issue>2</issue>
          ),
          <fpage>652</fpage>
          -
          <lpage>673</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Firens</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Van Den Broeck, G.,
          <string-name>
            <surname>Renkens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shterionov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jansssns</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Inference and learning in probabilistic logic programs using weighted boolean formulas</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <volume>358401</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gutmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Learning the parameters of probabilistic logic programs from interpretations</article-title>
          .
          <source>In: Proc. ECML/PKDD</source>
          . pp.
          <fpage>581</fpage>
          -
          <lpage>596</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          :
          <article-title>The Art of Computer Programming</article-title>
          , Volume
          <volume>4</volume>
          ,
          <string-name>
            <surname>Fascicle</surname>
            <given-names>1</given-names>
          </string-name>
          :
          <string-name>
            <given-names>Bitwise</given-names>
            <surname>Tricks</surname>
          </string-name>
          &amp;
          <article-title>Techniques; Binary Decision Diagrams</article-title>
          .
          <string-name>
            <surname>Addison-Wesley Professional</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liaw</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          :
          <article-title>On the obdd-representation of general boolean functions</article-title>
          .
          <source>IEEE Transactions on Computers</source>
          <volume>41</volume>
          (
          <issue>6</issue>
          ),
          <fpage>661</fpage>
          -
          <lpage>664</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Minato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Zero-suppressed bdds for set manipulation in combinatorial problems</article-title>
          .
          <source>In: Proceedings of the 30th International Design Automation Conference</source>
          . pp.
          <fpage>272</fpage>
          -
          <lpage>277</lpage>
          . DAC '93,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michael</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Compression, significance and accuracy</article-title>
          .
          <source>In: In Machine Learning: Proceedings of the Ninth International Workshop</source>
          . pp.
          <fpage>338</fpage>
          -
          <lpage>347</lpage>
          . Morgan Kaufmann (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A strategy for constructing new predicates in first order logic</article-title>
          .
          <source>In: In Proceedings of the Third European Working Session on Learning</source>
          . pp.
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          . Pitman (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Inverse entailment and progol</article-title>
          .
          <source>New Generation Computing</source>
          <volume>13</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>286</lpage>
          (
          <year>Dec 1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning from positive data</article-title>
          . In: Muggleton,
          <string-name>
            <surname>S</surname>
          </string-name>
          . (ed.)
          <source>Inductive Logic Programming</source>
          . pp.
          <fpage>358</fpage>
          -
          <lpage>376</lpage>
          . Springer Berlin Heidelberg (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Nienhuys</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwei</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
          </string-name>
          , R.D.:
          <article-title>Foundations of Inductive Logic Programming (</article-title>
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Plotkin</surname>
          </string-name>
          , G.D.:
          <article-title>A note on inductive generalization</article-title>
          .
          <source>Machine Intelligence</source>
          <volume>5</volume>
          ,
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rivest</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          :
          <article-title>Inferring decision trees using the minimum description length principle</article-title>
          .
          <source>Inf. Comput</source>
          .
          <volume>80</volume>
          (
          <issue>3</issue>
          ),
          <fpage>227</fpage>
          -
          <lpage>248</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Rissanen</surname>
          </string-name>
          , J.: Paper:
          <article-title>Modeling by shortest data description</article-title>
          .
          <source>Automatica</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>465</fpage>
          -
          <lpage>471</lpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rouveriol</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Extensions of inversion of resolution applied to theory completion</article-title>
          .
          <source>In: Inductive Logic Programming</source>
          . pp.
          <fpage>63</fpage>
          -
          <lpage>92</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Shapiro</surname>
            ,
            <given-names>E.Y.</given-names>
          </string-name>
          :
          <article-title>An algorithm that infers theories from facts</article-title>
          .
          <source>In: Proceedings of the 7th International Joint Conference on Artificial Intelligence -</source>
          Volume
          <volume>1</volume>
          . pp.
          <fpage>446</fpage>
          -
          <lpage>451</lpage>
          . IJCAI'
          <fpage>81</fpage>
          , Morgan Kaufmann Publishers Inc. (
          <year>1981</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Hypothesis finding based on upward refinement of residue hypotheses</article-title>
          .
          <source>Theor. Comput. Sci</source>
          .
          <volume>298</volume>
          (
          <issue>1</issue>
          ),
          <fpage>5</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>