                     Propositional Rule Extraction from Neural
                      Networks under Background Knowledge

                          Maryam Labaf1,2 , Pascal Hitzler1 , and Anthony B. Evans2
                1
                    Data Semantics (DaSe) Laboratory, Wright State University, Dayton, OH, USA
                      2
                        Dept. of Math. and Stat., Wright State University, Dayton, OH, USA



                      Abstract. It is well-known that the input-output behaviour of a neural
                      network can be recast in terms of a set of propositional rules, and under
                      certain weak preconditions this is also always possible with positive (or
                      definite) rules. Furthermore, in this case there is in fact a unique minimal
                      (technically, reduced) set of such rules which perfectly captures the input-
                      output mapping.
                      In this paper, we investigate to what extent these results and correspond-
                      ing rule extraction algorithms can be lifted to take additional background
                      knowledge into account. It turns out that uniqueness of the solution can
                      then no longer be guaranteed. However, the background knowledge often
                      makes it possible to extract simpler, and thus more easily understand-
                      able, rulesets which still perfectly capture the input-output mapping.


            1       Introduction
            The study of rule extraction from trained artificial neural networks [2,8,15] ad-
            dresses the desire to make the learned knowledge accessible to human interpre-
            tation and formal assessment. Essentially, in the propositional case, activations
            of input and output nodes are discretized by introducing an arbitrary thresh-
            old. Each node is interpreted as a propositional variable, and activations above
            the threshold are interpreted as this variable being “true”, while activations be-
            low the threshold are interpreted as this variable being “false”. If I denotes the
            power set (i.e., set of all subsets) of the (finite) set B of all propositional variables
            corresponding to the nodes, then the input-output function of the network can
            be understood as a function f : I → I: For I ∈ I, we interpret each p ∈ I as
             being “true” and all p ∉ I as being “false”. The set f (I) then contains exactly
            those propositional variables which are “true” (or activated) in the output layer.
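To make the discretization step concrete, the following small Python sketch (our own illustration; the variable names, dictionary representation and threshold value are not prescribed by the paper) maps a layer's activations to the corresponding set of propositional variables:

def discretize(activations, threshold=0.5):
    """Return the set of propositional variables counted as 'true', i.e. those
    whose node activation exceeds the chosen threshold."""
    return frozenset(p for p, value in activations.items() if value > threshold)

# e.g. a hypothetical input layer with three nodes:
I = discretize({"p1": 0.9, "p2": 0.2, "p3": 0.7})   # frozenset({'p1', 'p3'})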
                In propositional rule extraction, one now seeks sets Pf of propositional rules
            (i.e., propositional Horn clauses) which capture or approximate the input-output
            mapping f . In order to obtain such sets, there exist two main lines of approaches.
            The first is introspective and seeks to construct rules out of the weights associated
            with the connections between nodes in the network, usually proceeding in a layer-
            by-layer fashion [8]. The second is to regard the network as a black box and to
            consider only the input-output function f . This was, e.g., done in [15] where
            it was shown, amongst other things, that a positive (or definite) ruleset can
             always be extracted if the mapping f is monotonic, and that there is indeed
a unique such reduced ruleset; we will provide sufficient technical details about
these preliminary results in the next section.
    However, rulesets extracted with either method are prone to be large and
complex, i.e., from inspection of these rulesets it is often difficult to obtain real
insights into what the network has learned. In this paper, we thus investigate rule
extraction under the assumption that there is additional background knowledge
which can be connected to network node activations, with the expectation that
such background knowledge will make it possible to formulate simpler rulesets
which still explain the input-output functions of the networks, if the background
knowledge is also taken into account.
    The motivation for this line of work is the fact that in recent years there has
been a very significant increase in the availability of structured data on the World
Wide Web, i.e., it is becoming easier and easier to find such structured
knowledge for all kinds of application domains. This is, among other things,
a result of recent developments in the field of the Semantic Web
[4,12], which is concerned with data sharing, discovery, integration and reuse,
and where corresponding standards, methods and tools are being developed.
E.g., structured data in the form of knowledge graphs, usually encoded using
the W3C standards RDF [3] and OWL [11], has been made available in ever
increasing quantities for over 10 years [5,17]. Other large-scale datasets include
Wikidata [20] and data coming from the schema.org [9] effort which is driven by
major Web search engine providers.
    In order to motivate the rest of the paper, consider the following very sim-
ple example. Assume that the input-output mapping P of the neural network
without background knowledge is

                       p1 ∧ q → r                p2 ∧ q → r

and that we also have background knowledge K in form of the rules

                           p1 → p                p2 → p.

We then obtain the simplified input-output mapping PK , taking background
knowledge into account, as
                                 p ∧ q → r.
    The example already displays a key insight why background knowledge can
lead to simpler extracted rulesets: In the example just given, p serves as a “more
general” proposition, e.g., p1 could stand for “is an apple” while p2 could stand
for “is a banana”, while p could stand for “is a fruit”. If we now also take, e.g.,
q to stand for “is ripe” and r to stand for “can be harvested”, then we obtain a
not-so-abstract toy example, where the background knowledge facilitates a sim-
plification because it captures both apples and bananas using the more general
concept “fruit”.
    In this paper, we will formally define the setting for which we just gave an
initial example. We will furthermore investigate to what extent we can carry over
results regarding positive rulesets from [15] to this new scenario with background
knowledge. We will see that the pleasing theoretical results such as uniqueness of
a solution no longer hold. However, existence of solutions can still be guaranteed
under the same mild conditions as in [15], and we will still be able to obtain
algorithms for extracting corresponding rulesets.
    The rest of the paper will be structured as follows. In Section 2 we will
introduce notation as needed and recall preliminary results from [15]. In Section 3
we present the results of our investigation into adding background knowledge.
In Section 4, we briefly discuss related work, and in Section 5 we conclude and
discuss avenues for future work.


2    Preliminaries

We recall notation and some results from [15] which will be central for the rest of
the paper. For further background on notions concerning logic programs, cf. [13].
    As laid out in the introduction, let B be a finite set of propositional variables,
let I be the power set of B, and we consider functions f : I → I as discretizations
of input-output functions of trained neural networks. In this paper, we consider
only positive (or definite) propositional rules, which are of the form p1 ∧· · ·∧pn →
q, where q and all pi are propositional variables. A set P of such rules is called
a (propositional) logic program. For such a rule, we call q the head of the rule,
and p1 ∧ · · · ∧ pn the body of the rule.
    A logic program P is called reduced if all of the following hold.

 1. For every rule p1 ∧· · ·∧pn → q in P we have that all pi are mutually distinct.
 2. There are no two rules p1 ∧ · · · ∧ pn → q and r1 ∧ · · · ∧ rm → q in P with
    {p1 , . . . , pn } ⊆ {r1 , . . . , rm }.
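Throughout the following illustrative sketches we encode a rule p1 ∧ · · · ∧ pn → q as a pair (frozenset({"p1", ..., "pn"}), "q") and a program as a set of such pairs; this encoding is our own choice, not prescribed by the paper. With it, condition 1 holds automatically (a body is a set), and condition 2 can be checked directly:

def is_reduced(program):
    """Check condition 2 above: no rule's body may be contained in the body of
    another rule with the same head (condition 1 is enforced by using sets)."""
    return not any(h1 == h2 and b1 != b2 and b1 <= b2
                   for b1, h1 in program for b2, h2 in program)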

   To every propositional logic program P over B we can associate a semantic
operator TP , called the immediate consequence operator, which is the function

       TP : I → I :
    TP (I) = {q | there exists p1 ∧ · · · ∧ pn → q in P with {p1 , . . . , pn } ⊆ I}.

This operator is well-known to be monotonic in the sense that whenever I ⊆ J,
then TP (I) ⊆ TP (J).
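Under the rule encoding introduced above, a single application of the immediate consequence operator can be sketched as follows (the function name T is ours):

def T(program, I):
    """Immediate consequence operator: collect the heads of all rules whose
    body is contained in the interpretation I (a set of 'true' variables)."""
    return {head for body, head in program if body <= I}

# Monotonicity is immediate: if I ⊆ J, every rule body contained in I is also
# contained in J, so T(program, I) ⊆ T(program, J).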
    We make some additional mild assumptions: We assume that the proposi-
tional variables used to represent input and output nodes are distinct, i.e., each
propositional variable gets used either to represent an input node, or an output
node, but not both. Technically, this means that B can be partitioned into two
sets B1 and B2 , i.e., B = B1 ∪˙ B2 , and we obtain the corresponding power sets
I1 and I2 such that TP : I1 → I2 .
    While the definition of the immediate consequence operator just presented
is very common in the literature, we will now give a different but equivalent
formalization, which will help us in this paper. For any I = {p1 , . . . , pn } ⊆ B,
let c(I) = p1 ∧ · · · ∧ pn . In fact, whenever I ⊆ B, in the following we will often
simply write I although we may mean c(I), and the context will make it clear
which notation is meant; e.g., if I appears as part of a logical formula, then we
actually mean c(I).
    Now, given a logic program P and I ∈ I1 , we obtain

                             TP (I) = {q ∈ B2 | I ∧ P |= q},

where |= denotes entailment in propositional logic. Please note that we use
another common notational simplification, as I ∧ P is used to denote I ∧ ⋀R∈P R.


    Algorithm 1: Reduced Definite Program Extraction
     Input: A monotone mapping f : I1 → I2 .
     Output: P , a definite logic program with TP (I) = f (I) for all I ∈ I1 .
      1: Initialization: P = ∅.
      2: Choose a total linear order ≺ on I1 , such that for any Ii , Ij ∈ I1 with i < j
         we have |Ii | ≤ |Ij |.
      3: for all I = {p1 , . . . , pn } ∈ I1 , chosen in ascending order according to ≺ do
      4:    for all q ∈ f (I) do
      5:       if there is no q1 ∧ · · · ∧ qm → q in P with {q1 , . . . , qm } ⊆ I then
      6:          add the rule p1 ∧ · · · ∧ pn → q to P .
      7:       end if
      8:    end for
      9: end for
     10: Return P as result.


    In [15], the following was shown.
Theorem 1. Let f : I1 → I2 be monotonic. Then there exists a unique reduced
logic program P with TP = f . Furthermore, this logic program can be obtained
using Algorithm 1.
    If we drop the precondition on f to be monotonic, then Theorem 1 no longer
holds, because of the fact mentioned above that immediate consequence opera-
tors are always monotonic.
    We will now investigate Theorem 1 when considering additional background
knowledge. It will be helpful to have the following corollary from Theorem 1 at
hand.
Theorem 2. Given a logic program P , there is always a unique reduced logic
program Q with TP = TQ .
Proof. Given P , we know that TP is monotonic. Now apply Theorem 1.
   Let us give an example for reducing a given program. Let B1 = {p1 , p2 , p3 }
and B2 = {q1 , q2 } be input and output sets, respectively, and consider the logic
program P given as
                   p1 ∧ p2 → q1                      p1 ∧ p2 ∧ p3 → q1
                   p1 ∧ p3 → q1                                 p1 → q2
                   p1 ∧ p2 → q2 .
Applying Algorithm 1 then yields the reduced program

                  p1 ∧ p2 → q1                     p1 ∧ p3 → q1
                       p1 → q2 .
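The following Python sketch of Algorithm 1 (again using our (body, head) encoding; helper and function names are our own) reproduces the reduction of the example program just given:

from itertools import combinations

def subsets_by_size(atoms):
    # enumerate all subsets of a finite set in ascending cardinality (an order ≺)
    atoms = sorted(atoms)
    for r in range(len(atoms) + 1):
        for c in combinations(atoms, r):
            yield frozenset(c)

def extract_reduced_program(f, B1):
    """Sketch of Algorithm 1: f maps a frozenset of input variables to the set
    of output variables that are 'true'; for monotonic f the result is the
    unique reduced program of Theorem 1."""
    P = set()
    for I in subsets_by_size(B1):                 # ascending order according to ≺
        for q in f(I):
            # add I → q only if no rule with head q already fires on I
            if not any(head == q and body <= I for body, head in P):
                P.add((I, q))
    return P

# The example program P from above, with f = T_P restricted to the inputs:
rules = {(frozenset({"p1", "p2"}), "q1"), (frozenset({"p1", "p3"}), "q1"),
         (frozenset({"p1", "p2", "p3"}), "q1"), (frozenset({"p1"}), "q2"),
         (frozenset({"p1", "p2"}), "q2")}
f = lambda I: {head for body, head in rules if body <= I}
print(extract_reduced_program(f, {"p1", "p2", "p3"}))
# yields exactly p1 ∧ p2 → q1, p1 ∧ p3 → q1 and p1 → q2, as stated above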


3     Rule extraction with Background Knowledge
We consider the following setting. Assume P is a logic program which captures
the input-output function of a trained neural network according to Theorem 1.
Let furthermore K be a logic program which constitutes our background knowl-
edge, and which may use additional propositional variables, i.e., propositional
variables not occurring in P . We then seek a logic program PK such that, for all
I ∈ I1 , we have

              {q ∈ B2 | I ∧ P |= q} = {q ∈ B2 | I ∧ K ∧ PK |= q}.              (1)

In this case, we call PK a solution for (P, K).
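Since all programs involved are definite, entailment of atoms can be decided by forward chaining, so condition (1) can be checked mechanically on small examples. A sketch (encoding as in the earlier sketches; the function names closure and is_solution are our own), applied to the introductory example:

from itertools import combinations

def closure(facts, rules):
    """Forward-chaining closure of a set of atoms under definite rules given as
    (frozenset(body), head) pairs; for definite programs, an atom is entailed
    exactly when it lies in this closure."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def is_solution(PK, P, K, B1, B2):
    # check condition (1) on every input I ⊆ B1 (feasible only for small B1)
    inputs = [frozenset(c) for r in range(len(B1) + 1)
              for c in combinations(sorted(B1), r)]
    return all({q for q in B2 if q in closure(I, P)}
               == {q for q in B2 if q in closure(I, K | PK)}
               for I in inputs)

# The introductory example: P = {p1 ∧ q → r, p2 ∧ q → r}, K = {p1 → p, p2 → p},
# and the simplified program PK = {p ∧ q → r}:
P  = {(frozenset({"p1", "q"}), "r"), (frozenset({"p2", "q"}), "r")}
K  = {(frozenset({"p1"}), "p"), (frozenset({"p2"}), "p")}
PK = {(frozenset({"p", "q"}), "r")}
assert is_solution(PK, P, K, B1={"p1", "p2", "q"}, B2={"r"})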

3.1   Existence of Solutions
We next make two more mild assumptions, namely (1) that no propositional
variable from B2 appears in K, and that (2) propositional variables from B1
appear only in bodies of rules in K. The first is easily justified by the use case,
since we want to explain the network behaviour, and the occurrence of variables
from B2 in K would bypass the network. The second is also easily justified by the
use case, which indicates that network input activations should be our starting
point, i.e. the activations should not be altered by the background knowledge.
    If we drop assumption (2) just stated, then existence of a solution cannot
be guaranteed: Let B1 = {p1 , p2 }, let B2 = {q1 , q2 }. Then, for the given pro-
grams P = {p1 → q1 , p2 → q2 } and K = {p1 → p2 } there is no solution for
(P, K). To see this, assume that PK be a solution for (P, K). Then because
p2 ∧ P |= q2 we obtain that p2 ∧ K ∧ PK |= q2 . But then p1 ∧ K ∧ PK |= q2
although p1 ∧ P 6|= q2 , i.e., PK cannot be a solution for (P, K).
    If condition (2) from above is assumed, though, a solution always exists.

Proposition 1. Under our standing assumptions on given logic programs P and
K, there always exists a solution for (P, K) which is reduced.

Proof. Because rule heads from K never appear in P , we obtain

                {q ∈ B2 | I ∧ P |= q} = {q ∈ B2 | I ∧ K ∧ P |= q}

for all I ∈ I1 , i.e., P is always a solution for (P, K). Existence of a reduced
solution then follows from Theorem 2.

   Our interest of course lies in determining other solutions which are simpler
than P .


    Algorithm 2: Construct all reduced solutions for (P, K)
      Input: Logic programs P and K with TP : I1 → I2 and TK : I1 → I3 which
              satisfy our standing assumptions, where B2 = {q1 , . . . , qn }.
      Output: All the reduced solutions for (P, K).
       1: Set S = ∅ and B = B1 ∪ B3 .
       2: Set I to be the power set of B.
       3: Set R to be the power set of I.
       4: for all (R1 , . . . , Rn ) ∈ Rn do
       5:   for all i ∈ {1, . . . , n} do
       6:      Qi = {c(B) → qi | B ∈ Ri }
       7:   end for S
       8:   Set Q = i∈{1,...,n} Qi .
       9:   if TQ is a solution for (P, K) then
      10:      Apply Algorithm 1 to TQ to obtain a reduced program S with TS = TQ .
      11:      if S 6∈ S then
      12:         Add S to S.
      13:      end if
      14:   end if
      15: end for
      16: Return S as result.



Proposition 2. There exist logic programs P and K which satisfy our standing
assumptions, such that there are two distinct reduced solutions for (P, K).

Proof. Let B1 = {p1 , p2 , p3 } and B2 = {q}. Then consider the programs P as

                     p2 ∧ p3 → q                    p1 ∧ p3 → q

and K as

                    p2 ∧ p3 → r1                   p1 ∧ p3 → r1
                         p1 → r2                        p2 → r2 .

The two logic programs

                           PK1 = {r1 → q}            and
                           PK2 = {p3 ∧ r2 → q}

are then both reduced solutions for (P, K).
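Both programs can be verified mechanically with the forward-chaining check sketched after condition (1); the following snippet restates the data of Proposition 2 in that encoding:

from itertools import combinations

def closure(facts, rules):                        # forward chaining, as before
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

B1, B2 = {"p1", "p2", "p3"}, {"q"}
P   = {(frozenset({"p2", "p3"}), "q"), (frozenset({"p1", "p3"}), "q")}
K   = {(frozenset({"p2", "p3"}), "r1"), (frozenset({"p1", "p3"}), "r1"),
       (frozenset({"p1"}), "r2"), (frozenset({"p2"}), "r2")}
PK1 = {(frozenset({"r1"}), "q")}
PK2 = {(frozenset({"p3", "r2"}), "q")}

inputs = [frozenset(c) for r in range(len(B1) + 1)
          for c in combinations(sorted(B1), r)]
for PK in (PK1, PK2):                             # both programs satisfy condition (1)
    assert all({q for q in B2 if q in closure(I, P)}
               == {q for q in B2 if q in closure(I, K | PK)} for I in inputs)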

    We will see later, in the proof of Theorem 3, that the number of reduced
solutions is actually worst-case exponential in the combined size of P and K.

3.2     Algorithms
    We first present a naive algorithm for computing all reduced solutions for
given (P, K). It is given as Algorithm 2 and it uses a brute-force approach to
check, for each logic program which can be constructed over the given propositional
variables, whether it constitutes a solution for (P, K). For each such
solution, it then invokes Algorithm 1 to obtain a corresponding reduced pro-
gram, which is then added to the solution set. The algorithm is quite obviously
correct and always terminating, and we skip a formal proof of this.
    The given algorithm is of course too naive to be practically useful for anything
other than toy examples. Still, it is worst-case optimal, as the following theorem
shows – note that Algorithm 2 has exponential runtime because of line 4.
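A direct Python sketch of this brute-force procedure is given below (our own illustration; it enumerates one set of candidate bodies per output variable, as in line 4 of Algorithm 2, and reduces each surviving program by dropping subsumed rules, which by Theorem 2 yields the same reduced program as applying Algorithm 1). It is feasible only for very small B1 ∪ B3:

from itertools import combinations, product

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def closure(facts, rules):
    # forward-chaining closure; rules are (frozenset(body), head) pairs
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def reduce_program(rules):
    # drop any rule whose body strictly contains the body of another rule with
    # the same head; this yields the unique equivalent reduced program
    return frozenset((b, h) for b, h in rules
                     if not any(h2 == h and b2 < b for b2, h2 in rules))

def all_reduced_solutions(P, K, B1, B2, B3):
    """Brute-force sketch of Algorithm 2: assign to every output variable a set
    of candidate bodies over B1 ∪ B3, keep the resulting programs Q for which
    K ∪ Q reproduces the behaviour of P on every input, and collect their
    reductions."""
    bodies = powerset(B1 | B3)                    # possible rule bodies
    inputs = powerset(B1)                         # all inputs I ∈ I1
    target = {I: {q for q in B2 if q in closure(I, P)} for I in inputs}
    heads = sorted(B2)
    solutions = set()
    for choice in product(powerset(bodies), repeat=len(heads)):   # (R1, ..., Rn)
        Q = {(body, q) for q, R in zip(heads, choice) for body in R}
        if all({q for q in B2 if q in closure(I, K | Q)} == target[I]
               for I in inputs):
            solutions.add(reduce_program(Q))
    return solutions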

Theorem 3. The problem of finding all solutions to (P, K) is worst-case expo-
nential in the combined size of P and K.

Proof. Let n be any positive integer. Define the logic program Pn to consist of
the single rule p1 ∧ · · · ∧ pn → q and let

                    Kn = {pi → ri,1 , pi → ri,2 | i = 1, . . . , n}.

Then, for any function f : {1, . . . , n} → {1, 2}, the logic program

                        Pf = {r1,f (1) ∧ · · · ∧ rn,f (n) → q}

is a reduced solution for (Pn , Kn ). Since there exist 2^n distinct such functions
f , the number of reduced solutions in this case is 2^n , i.e., producing them all
requires time exponential in n, while the combined size of Pn and Kn grows only
linearly in n.

   A more efficient algorithm for obtaining only one reduced solution is given
as Algorithm 3. It is essentially a combination of Algorithms 1 and 2.

    Algorithm 3: Reduced solution for (P, K)
     Input: Logic programs P and K with TP : I1 → I2 and TK : I1 → I3 which
             satisfy our standing assumptions.
     Output: A reduced solution for (P, K).
      1: Set S = ∅ and B = B1 ∪ B3 .
      2: Set I to be the power set of B.
      3: Choose a total linear order ≺ on I, such that for any Ii , Ij ∈ I with i < j we
         have |Ii | ≤ |Ij |.
      4: for all I = {p1 , . . . , pm } ∈ I1 , chosen in ascending order according to ≺ do
      5:   for all q ∈ TP (I) do
      6:      if I ∧ K ∧ S ⊭ q then
      7:         Set endloop = false.
      8:         Choose first J = {b1 , . . . , bn } ∈ I according to ≺.
      9:         while endloop = false do
     10:             if I ∧ K ∧ S ∧ (J → q) |= q and
     11:                {H ∈ I1 | H ∧ K ∧ S ∧ (J → q) |= q} ⊆ {H ∈ I1 | q ∈ TP (H)}
                      then
     12:                 Add the rule J → q to S and set endloop = true.
     13:             else
     14:                 Choose next J = {b1 , . . . , bn } ∈ I according to ≺.
     15:             end if
     16:         end while
     17:      end if
     18:   end for
     19: end for
     20: Return S as a result.


Proposition 3. Algorithm 3 is correct and always terminating.

Proof. Like Algorithm 1, Algorithm 3 checks all combinations of I ∈ I1 and
q ∈ TP (I) and makes sure that there are rules in the output program such that
I ∧ K ∧ S |= q. The candidate rules for the output program are checked one by
one in increasing length until a suitable one is found. Note that the rule I → q is
going to be checked at some stage, i.e., the algorithm will either choose this rule,
or a shorter one, but in any case we will eventually have I ∧ K ∧ S |= q. This
shows that the algorithm always terminates and that we obtain I ∧ K ∧ S |= q
for all q ∈ TP (I).
    In order to demonstrate that the algorithm output S is indeed a solution
for (P, K), we also need to show that for all q ∈ B2 and H ∈ I1 we have that
H ∧ K ∧ S |= q implies q ∈ TP (H). This is in fact guaranteed by line 11 of
Algorithm 3, i.e., the algorithm output S is indeed a solution for (P, K).
    We finally show that the output of the algorithm is reduced. Assume other-
wise. Then there are I1 → q and J → q in S with I1 ⊊ J. By our condition
on the order we thus have I1 ≺ J and so we know that I1 → q was added
to S earlier in the algorithm than J → q. Now let us look at the instance of
line 12 in Algorithm 3 when the rule J → q was added to S. In this case (using
notation from the algorithm description, and S denoting the current S at that
moment) we know that I ∧ K ∧ S ∧ (J → q) |= q and I ∧ K ∧ S ⊭ q. This
implies I ∧ K ∧ S |= J, and because I1 ⊆ J we obtain I ∧ K ∧ S |= I1 . But we
have also already observed that I1 → q is already contained in S at this stage,
and thus we obtain I ∧ K ∧ S |= q, which contradicts the earlier statement that
I ∧ K ∧ S ⊭ q. We thus have to reject the assumption that S is not reduced;
hence S is indeed reduced. This completes the proof.
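For completeness, here is a Python sketch of Algorithm 3 in the same encoding as the earlier sketches (function names are ours; the order ≺ is realized by enumerating subsets in ascending cardinality, and entailment is again decided by forward chaining):

from itertools import combinations

def subsets_by_size(atoms):
    # all subsets of a finite set in ascending cardinality (an order ≺)
    atoms = sorted(atoms)
    for r in range(len(atoms) + 1):
        for c in combinations(atoms, r):
            yield frozenset(c)

def closure(facts, rules):
    # forward-chaining closure; rules are (frozenset(body), head) pairs
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def reduced_solution(P, K, B1, B2, B3):
    """Sketch of Algorithm 3. P and K are sets of (frozenset(body), head)
    pairs satisfying the standing assumptions; the returned set S of rules has
    bodies over B1 ∪ B3 and heads in B2, and K ∪ S reproduces the behaviour
    of P on every input I ⊆ B1."""
    inputs = list(subsets_by_size(B1))            # I1 in ascending order
    candidates = list(subsets_by_size(B1 | B3))   # candidate bodies J in ascending order
    S = set()
    for I in inputs:
        for q in sorted(a for a in B2 if a in closure(I, P)):   # q ∈ TP(I)
            if q in closure(I, K | S):
                continue                          # line 6: q is already entailed
            for J in candidates:
                trial = K | S | {(J, q)}
                if q not in closure(I, trial):
                    continue                      # line 10: J → q does not yield q on I
                # line 11: J → q must not make q derivable on inputs H with q ∉ TP(H)
                if all(q in closure(H, P)
                       for H in inputs if q in closure(H, trial)):
                    S.add((J, q))                 # line 12
                    break
    return S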



   To close, we give a somewhat more complex example. Let B1 = {p1 , p2 , p3 }
and B2 = {q1 , q2 , q3 , q4 }. Consider the program P as


                       p1 → q1                             p2 → q1
                       p1 → q2                             p2 → q2
                       p1 → q3                             p2 → q3
                       p1 → q4                        p2 ∧ p3 → q4
and K as
                     p1 → r1                       p2 ∧ p3 → r1
                     p1 → r2                            p2 → r2
                     p2 → r3 .
Then there is only one reduced solution PK for (P, K), which is
                      r2 → q1                         r2 → q2
                      r2 → q3                         r1 → q4 .
Note that PK is simpler and shorter than P .

4     Related Work
It would be out of place to have a lengthy discussion of related work in neural-
symbolic integration, or even just on the topic of rule extraction, in this brief
paper. We hence limit ourselves to some key pointers including overview texts.
We already discussed the rule-extraction work [15] on which our work is based,
and [8] which pursues a different approach based on inspecting weights. For
more extensive entry points to literature on neural-symbolic integration we refer
to [2,6,7,10] and to the proceedings of the workshop series on Neural-Symbolic
Learning and Reasoning.3
    Regarding the novel aspect of this work, namely the utilization of background
knowledge for rule extraction, we are not aware of any prior work which pursues
this. However, concurrently the second author has worked on lifting the idea to
the application level in [19], by utilizing description logics and Semantic Web
background knowledge in the form of ontologies and knowledge graphs [12] to-
gether with the DL-Learner system [16] for rule extraction. The results herein,
which are constrained to the propositional case, can be considered foundational
for the more application-oriented work currently pursued along the lines of [19].
    We are also grateful that a reviewer pointed out a possible relationship of our
work with work laid out in [18] in the context of abduction in logic programming.
Looked at on a very generic level, the general abduction task is very similar to
our formulation in equation (1), which means that the field of abduction may in-
deed provide additional insights or even algorithms for our setting. On the detail
level, however, [18] differs significantly. Most importantly, [18] considers literals
or atoms as abducibles, i.e., an explanation consists of a set of literals, while in
our setting explanations are actually rule sets. Another difference is that [18]
considers logic programs under the non-monotonic answer set semantics, i.e.,
logic programs with default negations, while we consider only logic programs
without negation in our work – it was laid out in much detail in [15] that propo-
sitional rule extraction under negation has significantly different dynamics.
Nevertheless, the general field of abduction in propositional logic programming
may provide inspiration for further developing our approach, but working out
the exact relationships appears to be a more substantial investigation.
3 http://neural-symbolic.org/
5    Conclusions and Further Work
We have investigated the issue of propositional rule extraction from trained neu-
ral networks under background knowledge, for the case of definite rules. We have
shown that a mild assumption on the background knowledge and monotonicity
of the input-output function of the network suffices to guarantee that a reduced
logic program can be extracted such that the input-output function is exactly
reproduced. We have also shown that the solution is not unique. Furthermore,
we have provided algorithms for obtaining corresponding reduced programs.
    We consider our results to be foundational for further work, rather than
directly applicable in practice. Our observation that background knowledge can
yield simpler extracted rulesets of course carries over to more expressive logics
which extend propositional logic.
    It is such extensions which we intend to pursue, which hold significant promise
for practical applicability: structured information on the World Wide Web, as
discussed in the Introduction, is provided in logical forms which are usually
non-propositional fragments of first-order predicate logic, or closely related for-
malisms. In particular, description logics [1], i.e. decidable fragments of first-
order predicate logic, form the foundation of the Web Ontology Language OWL.
First-order rules are also commonly used [14]. This raises the question how to
extract meaningful non-propositional rules from trained neural networks while
taking (non-propositional) background knowledge, in a form commonly used on
the World Wide Web, into account.

Acknowledgements. The first two authors acknowledge support by the Ohio Fed-
eral Research Network project Human-Centered Big Data.


References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.):
    The Description Logic Handbook: Theory, Implementation, and Applications.
    Cambridge University Press, 2nd edn. (2010)
 2. Bader, S., Hitzler, P.: Dimensions of neural-symbolic integration – A structured
    survey. In: Artëmov, S.N., Barringer, H., d’Avila Garcez, A.S., Lamb, L.C., Woods,
    J. (eds.) We Will Show Them! Essays in Honour of Dov Gabbay, Volume One. pp.
    167–194. College Publications (2005)
 3. Beckett, D., Berners-Lee, T., Prud’hommeaux, E., Carothers, G.: RDF 1.1 Turtle –
    Terse RDF Triple Language. W3C Recommendation (25 February 2014), available
    at http://www.w3.org/TR/turtle/
 4. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American
    284(5), 34–43 (May 2001)
 5. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data – The Story So Far. Interna-
    tional Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)
 6. d’Avila Garcez, A., Besold, T.R., de Raedt, L., Földiak, P., Hitzler, P., Icard,
    T., Kühnberger, K.U., Lamb, L.C., Miikkulainen, R., Silver, D.L.: Neural-
    symbolic learning and reasoning: Contributions and challenges. In: McCallum,
    A., Gabrilovich, E., Guha, R., Murphy, K. (eds.) Proceedings of the AAAI 2015
    Spring Symposium on Knowledge Representation and Reasoning: Integrating Sym-
    bolic and Neural Approaches. AAAI Press Technical Report, vol. SS-15-03. AAAI
    Press, Palo Alto (2015)
 7. d’Avila Garcez, A.S., Lamb, L.C., Gabbay, D.M.: Neural-Symbolic Cognitive Rea-
    soning. Cognitive Technologies, Springer (2009)
 8. d’Avila Garcez, A.S., Zaverucha, G.: The connectionist inductive learning and
    logic programming system. Applied Intelligence 11(1), 59–77 (1999)
 9. Guha, R.V., Brickley, D., Macbeth, S.: Schema.org: evolution of structured data
    on the web. Commun. ACM 59(2), 44–51 (2016)
10. Hammer, B., Hitzler, P. (eds.): Perspectives of Neural-Symbolic Integration, Stud-
    ies in Computational Intelligence, vol. 77. Springer (2007)
11. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.):
    OWL 2 Web Ontology Language Primer (Second Edition). W3C Recommendation
    (11 December 2012), http://www.w3.org/TR/owl2-primer/
12. Hitzler, P., Krötzsch, M., Rudolph, S.: Foundations of Semantic Web Technologies.
    CRC Press/Chapman & Hall (2010)
13. Hitzler, P., Seda, A.K.: Mathematical Aspects of Logic Programming Semantics.
    CRC Press/Chapman and Hall (2010)
14. Krisnadhi, A., Maier, F., Hitzler, P.: OWL and rules. In: Polleres, A., d’Amato, C.,
    Arenas, M., Handschuh, S., Kroner, P., Ossowski, S., Patel-Schneider, P.F. (eds.)
    Reasoning Web. Semantic Technologies for the Web of Data – 7th International
    Summer School 2011, Galway, Ireland, August 23-27, 2011, Tutorial Lectures. Lec-
    ture Notes in Computer Science, vol. 6848, pp. 382–415. Springer (2011)
15. Lehmann, J., Bader, S., Hitzler, P.: Extracting reduced logic programs from arti-
    ficial neural networks. Appl. Intell. 32(3), 249–266 (2010)
16. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement
    operators. Machine Learning 78(1-2), 203–250 (2010)
17. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N.,
    Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia – A large-
    scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2),
    167–195 (2015)
18. Lin, F., You, J.: Abduction in logic programming: A new definition and an abduc-
    tive procedure based on rewriting. Artif. Intell. 140(1/2), 175–205 (2002)
19. Sarker, M.K., Xie, N., Doran, D., Raymer, M., Hitzler, P.: Explaining trained
    neural networks with semantic web technologies: First steps. In: Proceedings of the
    Twelfth International Workshop on Neural-Symbolic Learning and Reasoning,
    NeSy’17, London, UK, July 2017 (2017), to appear
20. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Com-
    mun. ACM 57(10), 78–85 (2014)