<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GPU-Accelerated Hypothesis Cover Set Testing for Learning in Logic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eyad Algahtani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitar Kazakov</string-name>
          <email>kazakov@cs.york.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of York</institution>
          ,
          <addr-line>Heslington, York YO10 5GH</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>ILP learners are commonly implemented to consider sequentially each training example for each of the hypotheses tested. Computing the cover set of a hypothesis in this way is costly, and introduces a major bottleneck in the learning process. This computation can be implemented more efficiently through the use of data-level parallelism. Here we propose a GPU-accelerated approach to this task for propositional logic and for a subset of first order logic. This approach can be used with one's strategy of choice for the exploration of the hypothesis space. At present, the hypothesis language is limited to logic formulae using unary and binary predicates, such as those covered by certain types of description logic. The approach is tested on a commodity GPU and datasets of up to 200 million training examples, achieving run times of below 30 ms per cover set computation.</p>
      </abstract>
      <kwd-group>
        <kwd>Inductive Logic Programming</kwd>
        <kwd>learning in logic</kwd>
        <kwd>hypothesis cover set</kwd>
        <kwd>data parallelism</kwd>
        <kwd>GPU</kwd>
        <kwd>GPGPU</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Classical ILP learners are implemented as sequential algorithms, which either
consider training examples one at a time (Golem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) or they search through the
hypothesis space computing the cover set of one hypothesis at a time (Foil [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]),
or they combine elements of these two sequential strategies, e.g. they select a
positive example, and then explore the hypothesis space implied by it (Progol [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
Aleph [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). In all cases, computing the cover set of a hypothesis is a costly process
that usually involves considering all negative examples<sup>1</sup> along with all relevant
positive examples<sup>2</sup> in a sequential manner. This is a computation that is based
on set membership tests. These tests can be parallelised efficiently through the
use of data parallel computations. In particular, GPU-based data parallelism has
proven very efficient in a number of application areas [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. We propose a
GPU-accelerated approach to the computation of the cover set for a given hypothesis
for a subset of first order logic, which results in a speed-up of several orders of
magnitude. This approach could potentially be integrated with various strategies
for the exploration of the hypothesis space. At present, the hypothesis language is
limited to logic formulae using unary and binary predicates, such as those covered
by certain types of description logic. This should be considered in the context of
an increasing number of ILP algorithms targeting specifically description logic as
a data and hypothesis language [17–19]. We test our approach on a commodity
GPU, an Nvidia GeForce GTX 1070, and datasets of up to
200 million examples, achieving run times of under 30 ms on the full dataset.
<sup>1</sup> These may not always be explicitly provided, but estimated [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or inferred using an
assumption, such as output completeness in Foidl [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
<sup>2</sup> In the case of a cautious learner, that is all positive examples. For eager learners,
only the positive examples not covered yet by another clause are considered.
      </p>
      <p>The rest of this paper is structured as follows: Section 2 reviews related
work, Section 3 introduces background information on GPU and GPGPU,
Section 4 presents and evaluates our novel, parallel implementation of the cover
set computation for propositional data, Section 5 extends our approach to
hypotheses with binary predicates, and Section 6 discusses the results and future
work on the implementation of an ILP algorithm making use of the
GPU-accelerated cover set computation proposed here.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Previous work on the efficient implementation of cover set evaluation in ILP has
included efforts to avoid redundancy, e.g. through the use of query packs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
and the use of concurrent computation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Fonseca et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] provide a review
and an evaluation of strategies for parallelising ILP, which are split into three
groups. Firstly, when the target predicate amounts to a description of separate
classes, each of these can be learned independently, in parallel, potentially from
a separate copy of the whole dataset. Secondly, the search space can be explored
in a parallel fashion, dividing the task among several processes. Thirdly, the data
can be split and processed separately, in parallel, including for the task of testing
hypothesis coverage, where the results of such mapping are merged in the final
step.
      </p>
      <p>
        Another way to categorise parallel evaluation approaches in ILP is according
to whether they use shared or distributed memory [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Ohwada and Mizoguchi
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used the former approach to combine branch-and-bound ILP search with
a parallel logic programming language, while Graham, Page and Kamal [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
parallelised the hypothesis evaluation by assigning different clauses to different
processors to be evaluated at the same time. In the distributed memory
category, Matsui et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] studied the impact of parallel evaluation on FOIL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
achieving linear growth in the speed-up for up to 4 processors, and lower relative
gains for larger numbers as communication overheads started to become more
prominent. In other work, Konstantopoulos [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] combined a parallel coverage
test on multiple machines (using the MPI library) with the Aleph system [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Our approach shares similarities with the last two approaches [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
However, it uses the GPU in a shared memory architecture. The GPU provides much
greater computational power for the same cost when compared to a typical
multi-core CPU; thus, a more efficient and effective use of the available hardware
resources is achieved. Also, in our approach, the entire dataset is copied (gigabytes
of data are transferred within a few seconds) from the main memory to the GPU's
own memory once, before the learning process starts. The data remains in the
GPU's memory until the learning process is completed. Our approach has a
very low communication overhead compared to that of Konstantopoulos [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ];
this reduces the learning time dramatically (especially for millions of examples)
by allowing more rules to be evaluated per second. Moreover, during learning, as
the CPU sends a rule for evaluation (asynchronously) to the GPU, it is
free to focus on the remaining parts of the learning process while waiting for the
result.
      </p>
    </sec>
    <sec id="sec-3">
      <title>GPU and GPGPU</title>
      <p>
        The graphics processing unit (GPU) is specialised hardware with many
processing cores, intended to accelerate graphics-related computations. In recent
years, there has been growing interest in using the massive computational
capabilities of GPUs to accelerate general purpose computations, especially in the
scientific community. This has evolved into the concept of the General-Purpose GPU
(GPGPU), which enables the use of GPUs for the acceleration of general purpose
computations [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The GPU is built upon the Single Instruction Multiple Data
(SIMD) architecture, i.e. it applies the same operation to every data item in
parallel. The GPU essentially assigns each thread to process a certain area of
the input data and output its results. The number of threads depends on the
GPU used, and it is not uncommon to have more than 1000 threads.
      </p>
      <p>Figure 1 demonstrates the differences between the CPU and GPU on the
problem of summing up two arrays. One can observe that in the CPU, the result
for each row is computed sequentially using a single thread. However, in the
GPU, there are many threads each of which will process a group of array rows
(indices) simultaneously, resulting in a massive performance gain, especially for
large data. A sequence diagram of a typical GPU-accelerated program can be
seen in Figure 2.</p>
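      <p>The row-partitioning idea in Figure 1 can be illustrated with a short Python sketch; this is only an illustrative simulation of the data-parallel pattern (the paper's actual implementation targets CUDA), and the function name, thread count and chunking scheme are ours:</p>

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_array_sum(a, b, num_threads=4):
    """Sum two arrays by dividing the row indices among worker threads,
    mimicking how each GPU thread processes its own slice of the rows."""
    n = len(a)
    result = [0] * n
    chunk = (n + num_threads - 1) // num_threads  # rows per thread (ceiling)

    def worker(t):
        # Thread t handles rows [t*chunk, min((t+1)*chunk, n))
        for i in range(t * chunk, min((t + 1) * chunk, n)):
            result[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, range(num_threads)))
    return result
```

      <p>Each worker touches a disjoint slice of the result, so no synchronisation between workers is needed, which is exactly what makes the pattern map well onto SIMD hardware.</p>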
      <p>In that figure, the program is executed mostly by the CPU. However, when
the program contains a computationally intensive operation, it sends it to the
GPU, which calculates the results and returns them to the CPU. After
sending the computation task to the GPU, control is returned immediately
to the CPU, which can either wait for the results or do something else while
waiting. When developing GPU-accelerated programs, the code for both the GPU
and the CPU is written using the same high level programming language, residing
in the same file(s) and compiled at the same time with all other code.</p>
      <p>
        There are many programming platforms available that enable the writing,
debugging and execution of general purpose GPU programs. Some of them are
brand-specific, such as CUDA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], others are brand-agnostic, e.g. OpenCL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Since our work revolves around the CUDA platform, we will discuss it in more
detail. Compute Unified Device Architecture (CUDA) is Nvidia's brand-specific
platform for writing general purpose programs for its GPUs. CUDA divides the
GPU into grids, each of which can run a separate general purpose GPU
program, known as a kernel, and contains multiple blocks of threads (thread blocks);
the number of thread blocks within a grid is known as the grid size. The execution of
thread blocks is done by Streaming Multiprocessors (SMs). A GPU has one
or multiple SMs, where each SM can execute a certain number of thread blocks
in parallel (independently of other SMs). The GPU has a hardware scheduler
that constantly distributes thread blocks to SMs. Whenever an SM is idle (or
has finished computing), the scheduler feeds it another thread block. In other words,
all the SMs are computing results (in parallel) independently of each other while
the hardware scheduler keeps them busy at all times by continuously feeding
thread blocks to the available SM(s). When all SMs in the GPU are actively
computing, the GPU can easily (depending on the hardware) execute tens of thousands
of threads simultaneously. A kernel needs at minimum two parameters: the grid
size (number of blocks) and the block size (number of threads per block). There are
standard, commonly used approaches to computing the number of blocks as a
function of the dataset size and of N, the user-specified number of threads per
block. For the purposes of this study, which focuses on the development of a
suitable representation of the data and the development of the corresponding
algorithms, we will only be concerned with the choice of N, from which all other
relevant parameters of the computation will be derived.
      </p>
      <p>There are many types of memory in the GPU. The first type is the registers,
which are the fastest memory on the GPU and the smallest in capacity. The
second type is global memory. It is the largest memory on the GPU and the
slowest in terms of speed of access. The third type is the local memory, which
is an abstraction of the global memory with smaller capacity and as slow as
the global memory. The fourth type is the shared memory, which is a memory
shared among the threads within the same block. It can be very fast (comparable
to the registers) or slow (caused by multiple threads trying to access the same
memory bank at the same time). The fifth type is the constant memory, which
is a read-only memory that can have high access speeds (very close to register
access speed). The last memory type is the texture memory, which is a read-only
memory suitable for 2D spatial locality needs. Texture memory has higher access
speed than local and global memory. See Figure 3 for the CUDA memory model
and Table 1 for a summary of CUDA memory types.</p>
      <p>Having discussed CUDA and its memory model, we now describe the typical
flow of a GPU-accelerated program using CUDA. For example, assume we want to
calculate the sum of two arrays, A and B, using CUDA. First, we allocate
memory to store three arrays in the GPU's memory, namely, A, B and R (in
order to store the result). Second, we copy the contents of A and B from the
main memory (RAM) into the GPU-allocated memory. Next, we run the GPU
code to sum the two arrays in parallel (just like in Figure 1) and store the result
in R. After the GPU finishes the computations, R holds the result, so we copy it
from the GPU's memory into the CPU's. The flow of the program is identical to
Figure 2. However, in Figure 4 we provide more technical details.</p>
      <p>
        Fig. 3. CUDA memory model [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Fig. 4. Typical CUDA-accelerated application</p>
    </sec>
    <sec id="sec-4">
      <title>Computing Propositional Hypothesis Cover Sets</title>
      <sec id="sec-4-1">
        <title>Proposed Method</title>
        <p>A GPU can manipulate matrices very efficiently. Here this is first used to show
how to speed up the calculation of the cover set of propositional hypotheses.
We adopt the terminology used in Description Logic, and talk about concepts,
defined as sets over a universe of individuals in the same way in which unary
predicates can be defined for a domain consisting of a set of terms. We can
represent a propositional data set as a Boolean 2D matrix M with each column
corresponding to a concept, and each row to an individual (see Figure 5). The
membership of an individual to a concept (i.e. whether the unary predicate is
true when it takes this term as argument) is represented as a Boolean value in
the matrix. To make it possible to process a large amount of data (of the order
of gigabytes), this array is stored in the global memory of the GPU.</p>
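        <p>As a hypothetical illustration of this representation, the membership facts can be packed into such a Boolean matrix M (one row per individual, one column per concept); the function name and the example data are ours:</p>

```python
def build_membership_matrix(individuals, concepts, facts):
    """Build a Boolean matrix M with M[i][c] == 1 iff the individual in
    row i belongs to the concept in column c.
    `facts` is a set of (individual, concept) membership pairs."""
    row = {ind: i for i, ind in enumerate(individuals)}
    col = {con: c for c, con in enumerate(concepts)}
    M = [[0] * len(concepts) for _ in individuals]
    for ind, con in facts:
        M[row[ind]][col[con]] = 1
    return M
```
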
        <p>Using the above representation, it is possible to employ data parallelism
(Single-Instruction-Multiple-Data) for the application of logic operations to any
subset of concepts. The rows of the table are divided among multiple threads.
For maximum efficiency, the number of threads per block (block size) is always
chosen to be equal to a power of 2, N = 2<sup>i</sup>, where the upper limit of i is
imposed by the GPU hardware. Since this article does not focus on performance
optimisation, we only report results for N = 32 (threads/block), a value that
resulted in good performance across the board and is representative of the
problem and GPU at hand. The grid size for the kernel is then calculated from
the number of individuals and the block size. For 200,000,000 individuals and
N = 32, this results in the use of 6,250,000 blocks.</p>
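        <p>The block-count calculation above is the standard ceiling division; as a minimal sketch (the function name is ours, not from the paper's implementation):</p>

```python
def grid_size(num_individuals, threads_per_block=32):
    """Number of thread blocks needed so that every individual gets a
    thread: the ceiling of num_individuals / threads_per_block."""
    return (num_individuals + threads_per_block - 1) // threads_per_block
```

      <p>For 200,000,000 individuals and 32 threads per block this yields the 6,250,000 blocks quoted above.</p>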
        <p>Now the cover set of a conjunction ∩<sub>i=1</sub><sup>m</sup>C<sub>i</sub> of any set of concepts can be
calculated through the parallel execution of all threads T<sub>j</sub>, each of which
processes sequentially the individuals I<sub>k</sub> it has been allocated (see Algorithm 1).
Lazy evaluation is used to calculate whether each individual is a member of the
conjunction of concepts: if it is not covered by one of them, the rest are ignored
and the result is returned immediately (see the if...break line of pseudocode).</p>
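        <p>The per-thread logic of Algorithm 1 can be sketched as a sequential Python simulation (illustrative only; in the real implementation each GPU thread runs this inner loop over its own allocated rows):</p>

```python
def conjunction_cover_set(M, concept_cols):
    """For Boolean matrix M (individuals x concepts), mark each
    individual covered by the conjunction of the given concept columns.
    Lazy evaluation: stop checking a row at the first 0."""
    result = [1] * len(M)
    for i, row in enumerate(M):
        for c in concept_cols:
            result[i] = result[i] and row[c]
            if result[i] == 0:
                break  # remaining concepts are ignored
    return result
```

      <p>The cover set size is then the sum of the result vector, the counterpart of the Thrust reduction used in the paper.</p>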
        <p>The cover set of a disjunction ∪<sub>i=1</sub><sup>m</sup>C<sub>i</sub> of a set of concepts is computed in an
analogous way, starting for each individual with result set to 0, and applying
the logic OR operator until the temporary result is set to 1 or all concepts in
S are considered. Similarly, the negation operator divides all individuals among
multiple threads, and returns a vector result(1..numberOfIndividuals) with
all Boolean values negated.</p>
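        <p>The disjunction and negation operators admit the same kind of sequential sketch (again an illustrative simulation of the per-thread logic, not the CUDA kernels):</p>

```python
def disjunction_cover_set(M, concept_cols):
    """Start each row at 0 and OR in concepts, short-circuiting at 1."""
    result = [0] * len(M)
    for i, row in enumerate(M):
        for c in concept_cols:
            result[i] = result[i] or row[c]
            if result[i] == 1:
                break  # remaining concepts are ignored
    return result

def negation_cover_set(result):
    """Negate a Boolean cover-set vector element-wise."""
    return [1 - v for v in result]
```
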
        <p>
          In all cases, the size of the cover set is computed by summing up the elements
of the vector result(1..numberOfIndividuals). This is done with the help of
the Nvidia Thrust library in a process that adds a very small overhead, which is
included in all reported execution times. For all logic operators, the vector
representing the cover set can be left in the global memory of the GPU to implement
memoization [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. When given two sets of positive and negative examples of a
target concept, computing the cover set for each is controlled by a separate CPU
process, and both are run in parallel. The CPU threads communicate directly
with the GPU to conduct the coverage test (in the GPU) as well as to retrieve
back the results. All results here report the overall run time of executing both
parallel CPU threads.
        </p>
        <p>
          It should be noted that the 2D matrix shown in Figure 5 and discussed
above is represented internally as a 1D array using row-major order in order to
minimise the time for memory allocation/access. E.g. allocating memory for a
10 × 10 array would require 1 + 10 = 11 memory allocation calls, whereas a 1D
array of size 100 would only need one such call. Allocating memory sequentially
also facilitates coalesced global memory access [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
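        <p>The row-major flattening works out as follows (a minimal sketch; the function name is ours):</p>

```python
def flat_index(row, col, num_cols):
    """Map a (row, col) position of a 2D matrix onto its 1D
    row-major array index."""
    return row * num_cols + col
```

      <p>Consecutive columns of the same row land in consecutive array cells, which is what enables the coalesced global memory accesses mentioned above.</p>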
      </sec>
      <sec id="sec-4-2">
        <title>Results and Evaluation</title>
        <p>Due to the lazy evaluation strategy used, the Worst Case Execution Time
(WCET) for evaluating the cover set of a conjunction of concepts occurs when the
membership matrix M contains all 1s. This is also computationally equivalent to
the WCET for evaluating the cover set of a disjunction of concepts with M only
made of zeros. Here we list the parallel (GPU) and serial (GPU and CPU)
execution times for data sets in which the number of individuals (Table 2, Table 3
and Figure 6) or the number of concepts (Table 4 and Figure 7) is varied. The
counting of examples (final step) for both CPU (Serial) and GPU (Serial and
Parallel) is computed in parallel (in the GPU) using Nvidia's Thrust library.
Also, due to CPU caching, there are fluctuations in the CPU execution times,
i.e. in the serial CPU experiment.</p>
        <p>Algorithm 1 For a Boolean matrix M (individuals × concepts)
procedure ParallelConceptConjunctionCoverSet
  set S := list of concepts in conjunction
  parallel_foreach thread T_i
  | foreach individual I_j in thread T_i
  | | set result(row(I_j)) := 1
  | | foreach concept C_k in set S
  | | | set result(row(I_j)) := result(row(I_j)) &amp;&amp; M(row(I_j),column(C_k))
  | | | if (result(row(I_j)) == 0) break
  | | endfor
  | endfor
  endfor
  return Boolean array: result(1..numberOfIndividuals)</p>
        <p>The results show that both the worst case and best case execution times
have linear time complexity with respect to the number of individuals. Clearly,
memoization can be used as a trade-off between time and space complexity, e.g.
by using the existing result for ∩<sub>i=1</sub><sup>m</sup>C<sub>i</sub> in order to compute the cover set of the
conjunction of any superset of these concepts. When the number of concepts is
varied, the slope of the WCET plot appears to suggest an approximately linear
growth.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Computing Cover Sets for Hypotheses with Binary Predicates</title>
      <p>We shall now consider how the data and hypothesis languages can be extended
to include roles, i.e. binary predicates. We adopt a representation for this type
of data as shown in Figure 8. We use an array with three columns in which the
middle contains the name of the role/predicate, here represented with an integer.
The pair of individuals related through a given role are listed in Column 1 and
Column 3, respectively.</p>
      <p>With this representation in mind, we can, for instance, compute a hypothesis
of the type ∃role.(C<sub>1</sub> ⊓ ... ⊓ C<sub>m</sub>), e.g. ∃hasChild.(Female ⊓ Musician), using
Algorithm 1 to compute the conjunction and Algorithm 2 to apply the existential
restriction operator. In Alg. 2, setAllElementsInResultToVal_inParallel(value)
is a utility function used to fill all the elements of the result array (in parallel)
with either 0 or 1. Algorithms for other DL operators, such as value restriction
or number restriction, can be defined in a similar way.</p>
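      <p>As a sequential sketch of the existential restriction over the individual-role-individual table (the role encoding and data below are invented for illustration; the GPU version distributes the pair loop across threads):</p>

```python
def existential_restriction(R, M, role, concept_col, num_individuals):
    """Mark individual a iff some triple (a, role, b) in R relates a to
    a b that belongs to the concept (column concept_col of Boolean
    matrix M)."""
    result = [0] * num_individuals
    for a, r, b in R:
        if r == role:
            # OR: one matching role filler in the concept suffices
            result[a] = result[a] or M[b][concept_col]
    return result
```
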
      <p>Algorithm 2 For an individual-role-individual matrix R and Boolean matrix
M (individuals × concepts)
procedure ParallelExistentialRestriction
  Call setAllElementsInResultToVal_inParallel(0)
  var Concept := the concept in the restriction
  set IndA := list of individualA values from R matrix (same role)
  set IndB := list of individualB values from R matrix (same role)
  parallel_foreach thread T_i
  | foreach pair (I_A, I_B) from (IndA, IndB) allocated to thread T_i
  | | set result(row(I_A)) := result(row(I_A)) || M(row(I_B),column(Concept))
  | endfor
  endfor</p>
      <p>Fig. 8. Individual-Role-Individual matrix representation</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>
        We have demonstrated here how GPGPU can be used to accelerate the
computation of the cover set for a hypothesis expressed in propositional logic. The
results show that even the use of a commodity GPU can provide the ability to
process data sets of size well beyond what can be expected from a CPU-based
sequential algorithm of the same type, and within a time that makes the
evaluation of many thousands of hypotheses on a dataset with 10<sup>8</sup> training examples
a viable proposition. We have also indicated how our approach can be extended
to handle binary predicates. The goal in sight is the ability to evaluate
hypotheses on DL databases, e.g. linked open data ontologies. The approach adopted
here fits well with the open world assumption that is commonly made in DL.
An example of the potential benefits is, for instance, the ability to learn about
user preferences in e-commerce [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. A case study with real-world data would
need to identify the type of DL needed, potentially extend the cover set
algorithms appropriately, and add a suitable search
strategy.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Feng</surname>
          </string-name>
          , C.:
          <article-title>E cient Induction of Logic Programs</article-title>
          .
          <source>Turing Institute</source>
          , pp.
          <volume>368</volume>
          –
          <issue>381</issue>
          (
          <year>1990</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>CUDA</given-names>
            <surname>Zone</surname>
          </string-name>
          . URL: https://developer.nvidia.com/cuda-zone.
          <source>Retrieved: June</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>OpenCL</given-names>
            <surname>Overview</surname>
          </string-name>
          . URL: https://www.khronos.org/opencl/. Retrieved:
          <year>June 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          : Learning Logical De nitions from Relations.
          <source>Machine Learning</source>
          <volume>5</volume>
          :
          <fpage>239</fpage>
          –
          <fpage>266</fpage>
          (
          <year>1990</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Inverse Entailment and Progol. New Generation Computing</source>
          <volume>13</volume>
          (
          <issue>3</issue>
          ):
          <volume>245</volume>
          –
          <fpage>286</fpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Aleph Manual (</article-title>
          <year>2001</year>
          ). URL: http://www.cs.ox.ac.uk/activities/machinelearning/Aleph/aleph.html
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning from Positive Data</article-title>
          .
          <source>Selected Papers from the 6th International Workshop on Inductive Logic Programming</source>
          , pp.
          <volume>358</volume>
          –
          <issue>376</issue>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Camacho</surname>
          </string-name>
          , R.: Strategies to Parallelize
          <source>ILP Systems. Inductive Logic Programming: Proc. of the 15th Intl Conf</source>
          . (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dehaspe</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demoen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vandecasteele</surname>
          </string-name>
          , H.:
          <article-title>Executing Query Packs in ILP</article-title>
          .
          <source>International Conference on Inductive Logic Programming ILP</source>
          <year>2000</year>
          , pp.
          <volume>60</volume>
          –
          <issue>77</issue>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Camacho</surname>
          </string-name>
          , R.:
          <article-title>Parallel ILP for Distributed-memory Architectures</article-title>
          .
          <source>Machine Learning</source>
          <volume>74</volume>
          (
          <issue>3</issue>
          ):
          <volume>257</volume>
          –
          <fpage>279</fpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ohwada</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mizoguchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Parallel Execution for Speeding up Inductive Logic Programming Systems</article-title>
          .
          <source>Proceedings of the 9th Intl Workshop on Inductive Logic Programming</source>
          , pp.
          <volume>277</volume>
          –
          <issue>286</issue>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Accelerating the Drug Design Process through Parallel Inductive Logic Programming Data Mining</article-title>
          .
          <source>Proceedings of the 2003 IEEE Bioinformatics Conference CSB2003</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Matsui</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inuzuka</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Itoh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Comparison of Three Parallel Implementations of an Induction Algorithm</article-title>
          .
          <source>Proceedings of the 8th Intl Parallel Computing Workshop</source>
          , pp.
          <fpage>181</fpage>
          –
          <lpage>188</lpage>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Konstantopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A Data-parallel Version of Aleph</article-title>
          .
          <source>Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Owens</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luebke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Govindaraju</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Kruger, J.,
          <string-name>
            <surname>Lefohn</surname>
            ,
            <given-names>A. E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Purcell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Survey of General-Purpose Computation on Graphics Hardware</article-title>
          .
          <source>Computer Graphics Forum</source>
          ,
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>80</fpage>
          –
          <lpage>113</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Califf</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming</article-title>
          .
          <source>New Generation Computing</source>
          <volume>16</volume>
          ,
          <fpage>263</fpage>
          –
          <lpage>281</lpage>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Buhmann, L.,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Westphal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <string-name>
            <surname>DL-Learner</surname>
          </string-name>
          {
          <article-title>A Framework for Inductive Learning on the Semantic Web</article-title>
          .
          <source>Journal of Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>39</volume>
          ,
          <fpage>15</fpage>
          –
          <lpage>24</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Fanizzi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>DL-FOIL: Concept Learning in Description Logic</article-title>
          .
          <source>Proc. of the 18th Conf. on Inductive Logic Programming</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Qomariyah</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Learning from Ordinal Data with Inductive Logic Programming in Description Logic</article-title>
          .
          <source>Proc. of the Late Breaking Papers of the 27th Intl Conf. on Inductive Logic Programming</source>
          ,
          <fpage>38</fpage>
          –
          <lpage>50</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <collab>NVIDIA</collab>
          :
          <source>CUDA Programming Guide 1.0</source>
          (
          <year>2007</year>
          ). URL: http://developer.download.nvidia.com/compute/cuda/1.0/NVIDIA CUDA Programming Guide 1.0.pdf. Retrieved: May
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Michie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Memo Functions and Machine Learning</article-title>
          ,
          <source>Nature</source>
          <volume>218</volume>
          :
          <fpage>19</fpage>
          –
          <lpage>22</lpage>
          (
          <year>1968</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>