<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>10:30 - 11:00 Oliver Ray and Bruno Golenia A Neural Network Approach for First-Order Abductive Inference 11:00 - 11:30 Amitabha Mukerjee Using attentive focus to discover action structures from perceptual data 11:30 - 12:00 Kun Tu and Hava Siegelmann Text-based Reasoning with Symbolic Memory Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lunch break</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <volume>1</volume>
      <fpage>32</fpage>
      <lpage>38</lpage>
      <abstract>
        <p>15:30 - 16:00 Sebastian Bader Extracting Propositional Rules from Feed-forward Neural Networks by Means of Binary Decision Diagrams 16:00 - 16:30 Amitabha Mukerjee and Madan Dabeeru Symbol emergence in design 16:30 - 16:45 Leo de Penning, Bart Kappe and Karel van den Bosch A Neural-Symbolic System for Automated Assessment in Training Simulators: A Position Paper 16:45 - 17:45 Discussion session, chaired by Luis Lamb</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Coffee break
Artur d'Avila Garcez, City University London, UK
Pascal Hitzler, University of Karlsruhe (TH), Germany
Programme Committee
Sebastian Bader, University of Rostock, Germany
Howard Blair, Syracuse University, NY, U.S.A.</p>
      <p>Claudia d’Amato, University of Bari, Italy
Marco Gori, University of Siena, Italy
Barbara Hammer, TU Clausthal, Germany
Ioannis Hatzilygeroudis, University of Patras, Greece
Steffen Hölldobler, TU Dresden, Germany
Henrik Jacobsson, Google Zurich, Switzerland
Ekaterina Komendantskaya, University of St. Andrews, Scotland
Kai-Uwe Kühnberger, University of Osnabrück, Germany
Luis Lamb, Federal University of Rio Grande do Sul, Brazil
Hannes Leitgeb, University of Bristol, UK
JamesL. McClelland, Stanford University, CA, U.S.A
Anthony K. Seda, University College Cork, Ireland
Ron Sun, Rensselaer Polytechnic Institute, NY, U.S.A.</p>
      <p>Frank van der Velde, Leiden University, The Netherlands
Gerson Zaverucha, Federal University of Rio de Janeiro, Brazil
Artificial Intelligence researchers continue to face huge challenges in their quest
to develop truly intelligent systems. The recent developments in the field of
neural-symbolic computation bring an opportunity to integrate well-founded
symbolic artificial intelligence with robust neural computing machinery to help
tackle some of these challenges.</p>
      <p>Neural-symbolic systems combine the statistical nature of learning and the
logical nature of reasoning.</p>
      <p>The Workshop on Neural-Symbolic Learning and Reasoning provides a forum for
the presentation and discussion of the key topics related to neural-symbolic
integration.</p>
      <p>Topics of interest include:
•
•
•
•
•
•
•
•</p>
      <p>The representation of symbolic knowledge by connectionist systems;
Learning in neural-symbolic systems;
Extraction of symbolic knowledge from trained neural networks;
Reasoning in neural-symbolic systems;
Biological inspiration for neural-symbolic integration;
Neural networks and probabilities;
Neural networks and relational learning;</p>
      <p>Applications in robotics, semantic web, engineering, bioinformatics, etc.
Keynote talk by Ben Goertzel: Cognitive Synergy: A Principle to
Guide the Tight Integration of Heterogeneous Components in
Integrative AI Systems………………………………………………………………1
Oliver Ray and Bruno Golenia: A Neural Network Approach for
First-Order Abductive Inference……………………………………………………2
Amitabha Mukerjee: Using attentive focus to discover action
structures from perceptual data……………………………………………………9
Kun Tu and Hava Siegelmann: Text-based Reasoning with
Symbolic Memory Model…………………………………………………………..16
Sebastian Bader: Extracting Propositional Rules from
Feed-forward Neural Networks by Means of Binary Decision
Diagrams……………………………………………………………………………22
Amitabha Mukerjee and Madan Dabeeru: Symbol emergence
in design……………………………………………………………………………28
Leo de Penning, Bart Kappe and Karel van den Bosch: A
Neural-Symbolic System for Automated Assessment in Training
Simulators: A Position Paper……………………………………………………..35
Cognitive Synergy: A Principle to Guide the Tight Integration
Heterogeneous Components in Integrative AI Systems
of
Ben Goertzel, Novamente LLC
The concept of "cognitive synergy" is introduced, as a formalization of the idea
that in a cognitive system containing multiple heterogeneous learning processes,
the different processes should be connected in such a way that each one can get
help from the others when it "gets stuck." The role of cognitive synergy in the
OpenCog integrative AI architecture is described, with examples given involving
the application of OpenCog to control animated agents in virtual worlds. The
potential implications of cognitive synergy for the design of neural-symbolic
systems is also discussed, in the context of an in-development system called
XIA-MAN that is intended to combine neural net evolution with OpenCog to
control a Nao humanoid robot.
A Neural Network Approach for First-Order Abductive Inference
Oliver Ray and Bruno Gole´nia
Department of Computer Science</p>
      <p>University of Bristol</p>
      <p>Bristol, BS8 1UB, UK
foray,goleniabg@cs.bris.ac.uk</p>
    </sec>
    <sec id="sec-2">
      <title>Abstract</title>
      <p>This paper presents a neural network approach for
first-order abductive inference by generalising an
existing method from propositional logic to the
first-order case. We show how the original
propositional method can be extended to enable the
grounding of a first-order abductive problem; and
we also show how it can be modified to allow the
prioritised computation of minimal solutions. We
illustrate the approach on a well-known abductive
problem and explain how it can be used to perform
first-order conditional query answering.
1</p>
    </sec>
    <sec id="sec-3">
      <title>Introduction</title>
      <p>Neurosymbolic research aims to combine neural inference
methods with symbolic knowledge formalisms in order to
better understand and exploit the cognitive functions of brain
and mind. The integration of neural networks and logic
programs is a major focus in this field. But, most existing work
only deals with neural representations of propositional logic
programs – which are not very well suited to applications
with complex or incomplete information. Thus, our goal is
to develop a connectionist approach for typed abductive logic
programs – that are specifically intended for this purpose.</p>
      <p>Abductive logic programs extend normal logic programs
by allowing the truth of particular literals, called abducibles,
to be assumed subject to certain conditions, called integrity
constraints, when deciding which instances of a query, or
goal, succeed from some program, or theory. Thus, any
solution to a first-order abductive problem has two parts: a set
of variable bindings, called an answer, denoting a successful
instance of the goal; together with a set of abducibles, called
an explanation, denoting a set of auxiliary assumptions. In
this way, abductive logic programming can be seen as a form
of conditional query answering.</p>
      <p>This paper presents a neural network approach for
firstorder abductive inference. The approach generalises an
existing method of Ray &amp; Garcez [9] from propositional logic
to the first-order case. Like its predecessor, the approach is
based on translating an abductive problem into a neural
network such that the fixpoints of the network are in one-to-one
correspondence with the solutions of the original problem.
Unlike other methods for neural abduction, such as [3; 10; 2;
13; 7; 12; 8; 1], our approach has the benefits of placing no
restrictions on the use of negation or recursion and being able
to systematically compute all required solutions.</p>
      <p>Our main contributions are twofold: we show how the
propositional method can be extended to enable the
practical grounding of a first-order abductive problem; and we also
show how it can be modified to allow the prioritised
computation of minimal solutions. The former extension uses
techniques from the field of Answer Set Programming (ASP) [6]
to reduce the number of logically redundant clauses and
literals in the ground program. The latter modification ensures
the hypothesis space is searched in a way that gives higher
priority to solutions with fewer abducibles. We illustrate the
approach and explain how it can be used to perform first-order
conditional query answering.</p>
      <p>The rest of the paper is structured as follows: Section 2
recalls the relevant background material; Section 3 presents
our approach; and the paper concludes with a summary and
directions for future work.
2</p>
    </sec>
    <sec id="sec-4">
      <title>Background</title>
      <p>This section recalls some basic notions of neural networks
and logic programs. The definitions closely follow those of
Ray &amp; Garcez [9], except that variables are now explicitly
typed and may appear in any input to an abductive problem.
(Threshold) Neural Networks: A neural network, or just
network, is a graph (N; E) whose nodes N are called neurons
and whose edges E N N are called connections. Each
neuron n 2 N is labelled with a real number t(n) called its
threshold and each connection (n; m) 2 E is labelled with a
real number w(n; m) called its weight. The state of a network
is a function s that assigns to each neuron the value 0 or 1. A
neuron is said to be active if its value is 1 and it is said to be
inactive if its value is 0. For each state s of the network, there
is a unique successor state s0 such that a neuron n is active in
s0 iff its threshold is exceeded by the sum of the weights on
the connections coming into n from nodes which are active
in s. A network is said to be relaxed iff all of its neurons are
inactive. A fixpoint of the network is any state that is identical
to its own successor state. If a fixpoint t is reachable from an
initial state s by repeatedly computing successor states, then
t is referred to as the fixpoint of s.
8 wont start(C) f lat battery(C) 9
&gt;&gt;&gt;&gt;&gt;&lt; wwoonntt ssttaarrtt((CC)) snpoafrkuepll(uCg)s dirty(C) &gt;&gt;&gt;&gt;=&gt;</p>
      <p>head lights work(c1)
&gt;&gt;:&gt;&gt;&gt; sfpuaerlkgapuluggesedmirpttyy((cc32)) &gt;&gt;&gt;&gt;;&gt;
wont start(C) ^ :spark plugs dirty(C)
f lat battery(C) ^ head lights work(C)
no f uel(C) ^ :f uel gauge empty(C) ^ :broken gauge(C)
8&lt; fnloatf ubaelt(tCer)y(C) =9
: broken gauge(C) ;</p>
      <p>car(C)
(Typed) Logic Programs: A typed logic program, or just
program, is a set of rules of the form H B1 ^ : : : ^ Bn ^
:C1 ^ : : : ^ :Cm in which the H, Bi and Cj are atoms
and in which any variable is associated with a unary
predicate called its type or domain. The atom to the left of the
arrow is called head of the rule, while the literals to the right
comprise the body. The head atom H and the positive body
atoms Bi are said to occur positively in the rule, while the
negated body atoms Cj are said to occur negatively. A rule
with no negative body literals is called a definite clause and
written H B1 ^ : : : ^ Bn. A rule with no body literals
is called a fact and simply written H. A rule with no head
literal is called a denial and written B1 ^ : : : ^ Bn. If P
is a program, then BP (the Herbrand base of P ) is the set
of all atoms built from the predicate and function symbols
in P ; and GP (the ground expansion of P ) is the program
comprising all well-typed ground instances of the clauses in
P . In addition, Ap+ and AP denote, respectively, the sets of
ground atoms that occur positively and negatively in GP . A
stable model of P is a Herbrand interpretation I BP that
coincides with the least Herbrand model of the definite
program P I obtained by removing from GP each rule containing
a negative literal not in I, and by deleting all of the negative
literals in the remaining rules.</p>
      <p>Abductive Logic Programs: An abductive logic program
[4] is a triple (T; IC; A) where T is a program (the theory),
IC is a set of denials (integrity constraints), A is a set of
facts (abducibles). Given a conjunction G of literals (goals),
the task of ALP is to compute a set of variable substitutions
(an answer) and a set of ground abducibles (an
explanation) such that there is a stable model of T [ (which, in
the terminology of [5], is called a generalised stable model
of T ) that satisfies all of the denials in IC and all of the
literals in G . To specify an abductive problem, one must state
the theory T , goal G, constraints IC, and abducibles A. If
needed, the types of any variables can be explicitly given as a
set of facts D (domain declarations) with one atom p(X) for
every variable X of type p. For convenience, we will refer
to the collection of five inputs (T; G; IC; A; D) as a (typed)
abductive context. A context is propositional if it contains no
variables. When a context has many different solutions it is
usual to prefer explanations with the fewest number of
abducibles. Intuitively, this corresponds to the principle of
Occam’s Razor, which favours the simplest hypotheses that fit
the data. In practice, this means that subset-minimal and/or
cardinality-minimal explanations are usually required.</p>
      <p>Example 2.1. Consider the abductive context in Figure 1.</p>
      <p>The theory T describes a collection of three classic cars. It
states that a car C wont start if its battery is flat, if its fuel tank
is empty, or if its spark plugs are dirty. It also states that the
headlights of the first car c1 are working, that the fuel gauge
of the second car c2 is showing empty, and that the spark
plugs of the final car c3 are dirty. The constraints IC state,
firstly, that the battery of a car cannot be flat if the headlights
of that car are working and, secondly, that it is impossible
for a car to have no fuel if its fuel gauge is not empty and
not broken. The abducibles A allow us to assume that any
car has a flat battery, has no fuel, and/or has a broken fuel
gauge. The goal G asks for which cars C it is possible to
show that (1) the car does not start and that (2) the car does
not have dirty spark plugs. The domain declarations D assert
that all occurrences of the variable C represent cars.</p>
      <p>This problem has many solutions. There are two
cardinality-minimal solutions which bind C to c2 after
abducing no f uel(c2) or f lat battery(c2), respectively. There
is one more subset-minimal solution which binds C to c1
after abducing no f uel(c1) and broken gauge(c1). Note
that, by the second constraint, no f uel(c1) can only be
abduced if broken gauge(c1) is also abduced (at its gauge
is not showing empty). Note also, by the first constraint,
f lat battery(c1) cannot be abduced (as its headlights work).</p>
      <p>More than a hundred non-minimal solutions can be obtained
by adding redundant abducibles to the explanations above.</p>
      <p>As non-minimal solutions are often of little practical use,
this paper will develop a neural approach for preferentially
finding minimal solutions of first-order abductive problems.</p>
      <p>First order Neural network abduction
The approach in this section builds on well-known methods
for translating propositional logic programs into neural
networks whose fixpoints correspond to stable models. In actual
fact, we build upon an extension of these methods proposed
by Ray and Garcez [9] for translating propositional abductive
contexts into neural networks whose fixpoints correspond to
generalised stable models. This is done by rewriting negative
literals as abducibles and adding clauses that, when translated
into a neural network, perform a systematic search through
the space of abductive solutions.</p>
      <p>In principle, one could try and solve a first-order problem
by applying the method of [9] to the propositional context
obtained by taking all possible well-typed ground instances of
the abductive inputs. But, in practice, this naive approach is
not feasible as it generates neural networks that are too large
and take too long to find the minimal solutions.</p>
      <p>To address these limitations, we developed an improved
method for more effectively computing the minimal solutions
of first-order problems. The new approach exploits an ASP
system called LPARSE [11] to allow the efficient grounding
of a first-order program; and it also ensures the prioritised
computation of minimal solutions by modifying the search
strategy of the original method.</p>
      <p>Given a typed first-order abductive context, the new
method has six main steps: First it introduces new predicates
into the language to represent abducibles and negations. Then
it translates the typed abductive context into a normal
propositional abductive context using LPARSE together with some
appropriate pre- and post-processing. Next it rewrites any
negative literals in the theory as abducibles subject to some
simple integrity constraints that preserve their logical
meaning. After that, it adds some clauses that allow the
resulting network to compute abductive solutions. At this point, it
translates the resulting context into a neural network. Finally
it allows the network to compute the abductive solutions in
order of minimality. These steps are now described in turn.</p>
      <p>STEP 0: Extending the Language
The logical transformations used in our method require the
introduction of predicates to represent abducibles and
negations. For every predicate p in a given abductive context, we
assume two new predicates, denoted py and p , which
represent the assumption of p and the negation of p, respectively.</p>
      <p>STEP 1: Obtaining a Propositional Context
Our approach for translating a typed first-order abductive
context into a normal propositional abductive context uses a
system called LPARSE which, as described in [11], is
practical tool for grounding typed logic programs. Compared to a
naive propositionalisation approach, LPARSE further
simplifies the ground program by removing literals that are known
to be true from the body of a clause and by removing clauses
whose bodies are known to be false from the program. Since
LPARSE is not specifically designed to work with abductive
logic programs, some pre- and post-processing is needed to
make the approach work. Thus we have three sub-steps:</p>
      <p>Pre-processing: We first turn an abductive context into
a logically equivalent program by temporarily employing
negative cycles to represent abducibles. This is needed to
prevent LPARSE from treating abducibles as undefined
domain predicates, which would otherwise be removed from
the ground program. Then domain declarations of the form
#domain p(V ) are added for each variable V of type p in D
(which causes LPARSE to add the atom p(V ) into the body of
every clause containing V ). Finally, the goal G is turned into
a clause by using a new propositional atom goal as the head.</p>
      <p>This is carried out by the function below.</p>
      <p>Definition 3.1 ( ). Let X = (T; G; IC; A; D) be a typed
abductive context and let G0, A0 and D0 be as follows</p>
      <p>G0 = fgoal L1 ^ : : : ^ Ln j L1 ^ : : : ^ Ln = Gg,
A0 = aa a 2 A
:a
:a
D0 = f#domain p(V ) j p(V ) 2 Dg
Then Y = (X) is the logic program T [G0 [IC [A0 [D0.</p>
      <p>Grounding: The resulting program is now grounded and
simplified by LPARSE, as indicated by the function below.</p>
      <p>Definition 3.2 ( ). Let Y be a logic program (possibly
containing domain declarations). Then Z = (Y ) is the ground
logic program obtained by running LPARSE on Y .</p>
      <p>Post-processing: After running LPARSE, any surviving
ground instances of the negative cycles introduced by the
preprocessing phase are converted back into explicit abducibles.</p>
      <p>This is done by replacing each cycle of the form a :a
and a :a with a bridging clause a ay and making
the ground atom ay abducible. It is interesting to note that
this re-conversion is not strictly necessary as any negations
will be replaced by abducibles and integrity constraints in the
next step. But, it is more efficient to exploit the fact that each
cycle represents a single abducible in order to avoid the
unnecessary introduction of one additional abducible and two
additional integrity constraints. The bridging clauses allow
to distinguish instances of an atom whose truth is abduced
from those whose truth is implied. In addition, any resulting
ground instances of the goal clause are added to the theory
T1 and their head atom goal becomes the new propositional
goal. This process is performed by the function below.
define
Definition 3.3 ( ). Let Z be any ground logic program and</p>
      <p>ZIC = f L1 ^ : : : ^ Ln 2 Zg
ZG = fgoal L1 ^ : : : ^ Ln 2 Zg
ZA
= fa
:a 2 Zg [ fa</p>
      <p>:a 2 Zg</p>
      <p>ZT [ ZG [ fa</p>
      <p>ZT = Z n (ZIC [ ZG [ ZA)</p>
      <p>Then W = (Z) is the propositional abductive context
(T1; G1; IC1; A1; D1) such that</p>
      <p>T1 =
:a g 2 ZAg
ay j fa
G1
IC1
A1
D1
=
=
=
= ;
fgoalg
ZIC
fay j fa
:a g 2 ZAg
&gt;
&gt;
&gt;
&gt;
&gt; car(c3)
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c1)
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c2)
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c3)
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c1)
&gt;
&gt;
&gt;
&gt;
&lt; wont start(c2)</p>
      <p>wont start(c3)
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c1)
&gt;
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c2)
&gt;
&gt;
&gt;
&gt;
&gt;
&gt; wont start(c3)
&gt;
&gt;
&gt;
&gt;
&gt; spark plugs dirty(c3)
&gt;
&gt;
&gt;
&gt;
&gt; f uel gauge empty(c2)
&gt;
&gt;
&gt;
&gt;
: head lights work(c1)
8
&gt;
&gt;
&gt;
&lt;
&gt;
&gt;
8
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
:
f
:
:
goal</p>
      <p>g
goal
goal
&gt; goal
:
wont start(c1)
wont start(c2)
wont start(c3)
^:
^:
^ :
spark plugs dirty(c1)
spark plugs dirty(c2)
spark plugs dirty(c3) &gt;
;
9
&gt;
&gt;
&gt;
=
&gt;
&gt;
9
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
=
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
;
f lat battery(c1) head lights work(c1)</p>
      <p>^
f lat battery(c2) head lights work(c2)</p>
      <p>^
f lat battery(c3) head lights work(c3)</p>
      <p>^
no f uel(c1) broken gauge (c1)
^
^:</p>
      <p>f uel gauge empty(c1)
^ :
no f uel(c2)
broken gauge(c2)</p>
      <p>f uel gauge empty(c2)
^ :
no f uel(c3) broken gauge (c3)
^</p>
      <p>^:
broken gauge (c1) broken gauge (c1)</p>
      <p>^
broken gauge (c1)</p>
      <p>broken gauge (c1)
broken gauge (c3)</p>
      <p>broken gauge (c3)
^
broken gauge (c3)</p>
      <p>broken gauge (c3)
^ :
^ :</p>
      <p>f uel gauge empty(c3)
f lat battery(c1)
f lat battery(c2)
f lat battery(c3)
no f uel(c1)
no f uel(c2)
no f uel(c3)
spark plugs dirty(c3)
&gt; &gt;
&gt; &gt; no f uel(c3)
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
spark plugs dirty(c1)
&gt; &gt;
&gt; &gt; f lat battery(c1)
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
spark plugs dirty(c2) &gt; &gt; f lat battery(c2)
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; :</p>
      <p>f lat battery(c3)
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt; 8
&gt; broken gauge(c1)
&gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt; broken gauge(c2)
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt; broken gauge(c3)
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt; no f uel(c1)
&gt; &gt;
&gt; &gt;
= &lt;</p>
      <p>no f uel(c2)
[
no f uely(c1)
no f uely(c2)
no f uely(c3)
broken gaugey(c1)
&gt;
&gt;
&gt;
broken gaugey(c2) &gt;
&gt;
&gt;
&gt;
&gt;
broken gaugey(c3) &gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
=
9
f lat batteryy(c1)
f lat batteryy(c2)
f lat batteryy(c3)
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
;
no f uely(c1); f lat batteryy(c1); broken gaugey(c1); broken gauge (c1)
no f uely(c2); f lat batteryy(c2); broken gaugey(c2)
no f uely(c3); f lat batteryy(c3); broken gaugey(c3); broken gauge (c3)
Example 3.1. The result of applying the above
transformation to the abductive context of Example 2.1 is shown in
Figure 2 — which, for convenience, also shows, within boxes, the
clauses and literals removed by LPARSE. Note that the
omitted rules can never be true and the omitted constraints can
never be violated.</p>
      <p>STEP 2: Obtaining a Definite Theory
To avoid potential problems resulting from the unrestricted
use of negation in the theory (which could result in networks
where some states have no fixpoints) we use a well-known
equivalence between negation and abduction which allows us
to treat each negative literal :p(t1; : : : ; tn) as an abducible
p (t1; : : : ; tn) subject to integrity constraints stating that an
atom and its negation cannot both be true or both be false in
the same model. Intuitively, we are free to assume the
falsity of any atom if it is consistent to do so. As formalised by
the function , below, we add such constraints for all ground
atoms appearing negatively in the context. For convenience,
this function uses some notation for representing the positive
form of an expression obtained by replacing all negative
literals by their abducible proxies.</p>
      <p>Definition 3.4 ( ). Let C = H B1 ^ : : : ^ Bn ^ :C1 ^
: : : ^ :Cm be a clause. Then C is the (definite) clause C =
H B1 ^ : : : ^ Bn ^ C1 ^ : : : ^ Cm.</p>
      <p>Definition 3.5 ( ). Let W = (T1; G1; IC1; A1; D1) be a
propositional abductive context. Then (W ) is the abductive
context, (T2; G2; IC2; A2; D2) such that</p>
      <p>T2 =
G2 =
IC2 =
fC j C 2 T1g
fC j C 2 G1g
fC j C 2 IC1g
a ^ a
:a ^ :a
A [ fa j a 2 AT1[IC1[G1 g
D1</p>
      <p>a 2 AT1[IC1[G1
A2
D2
STEP 3: Adding Clauses for Abduction
We now add some extra clauses that, when translated into a
neural network, will perform a systematic search through the
space of abductive solutions.</p>
      <p>As in the method of Ray &amp; Garcez [9], we use a binary
counter whose output determines which abducibles are
assumed and which are not. In earlier work, the output of this
counter followed a simple binary sequence: 000, 001, 010,
011, 100, etc. But, in this paper, we wish to keep the
number of abducibles as small as possible for as long as possible.</p>
      <p>Thus we use a modified search sequence (called a Banker’s
sequence) where the number of true bits increases
monotonically: 000, 100, 010, 001, 110, etc.</p>
      <p>In fact, we use two cascaded Banker’s sequences, one for
pure abducibles ay 2 A2 and one for (the negations of)
the negative literals a 2 A2. The clauses which define
these counters are formalised in the theory C below. Firstly,
C1;n1 denotes a counter with n1 bits a1n1 ; : : : ; a11 bits
corresponding to the abducibles ayn1 ; : : : ; ay1. Secondly, C2;n2
denotes a counter with n2 bits a2n2 ; : : : ; a21 bits corresponding
to the abducibles an2 ; : : : ; a1. Each counter i has HOLDi
and M OV Ei conditions that determine its behaviour when a
global next signal is applied to the network. If HOLDi is
true, it will keep its current value. If M OV Ei is true, it will
advance to the next value in the Banker’s sequence. If neither
is true, the counter will reset to zero.</p>
      <p>The definition of each counter Ci;n is split into five macros
Ci;n to Ci;n that together provide a logical specification of</p>
      <p>1 5
the counter state transition table. In brief, bij is true iff the
jth bit of counter i is true; cij is true iff bij is true and bij 1 is
false; dij is true iff cij is true and cik is false for all k &lt; j; eij is
true iff bij is false and bij 1 is true; and alli is true iff all the
bits of counter i are true.</p>
      <p>C
Ci;n
Ci;n</p>
      <p>1
Ci;n</p>
      <p>2
Ci;n</p>
      <p>3
Ci;n</p>
      <p>4
Ci;n
5
=
=
=
=
=
=
=</p>
      <p>C1;n1 [ C2;n2
5
[ Ci;n
k
such that</p>
      <p>where
k=1
8
&gt;&gt;&gt; bik
&gt;
&gt;
&gt;
n &gt;&gt;
[ &lt;
bik
n
^
j=1
n
^</p>
      <p>9
:dij ^ bin k ^ M OV Ei &gt;&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
=
:dij ^ ein k ^ M OV Ei
k8=0 &gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;: bbiikk n jbb=iikk1^^ :HnOeLxtDi 9
&gt;&gt;&gt;&lt;&gt; bin ^ :bij ^ M OV Ei &gt;&gt;&gt;&gt;=
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
;
j=0
n
^ bi</p>
      <p>j
j=0
&gt;&gt;&gt; alli
&gt;
:
n k 1
[ [
k=2 j=0
8 eik
&gt;&gt;&gt; ci
&gt; k
&gt;
n &gt;&gt;
[ &lt; dik
k=1 &gt;&gt;</p>
      <p>&gt; i
n &gt;:&gt;&gt;&gt; nbakik 1
[ [
&gt;
&gt;
&gt;
&gt;
;
9
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
=
bik 2 j
bj ^ dk ^ M OV Ei
i i
i i
:bk ^ bk 1
i i
bk ^ :bk 1
k 1
^</p>
      <p>i i
:cj ^ ck
ABDki
j=1 &gt;
dik ^ M OV Ei &gt;&gt;&gt;&gt;&gt;
&gt;
;
bij
bj ^ dk ^ M OV Ei
i i
next ^ all2
next ^ :all2
:all2
f alse
b1k for all k
:b2k for all k</p>
      <p>The 2nd counter increments until its maximum is reached,
whence it resets. The 1st counter holds until the 2nd
resets, whence it increments. To preferentially minimise the
number of assumed abducibles the outputs a1n1 : : : a11 of the
1st counter are obtained by copying the corresponding bits
b1n1 : : : b11. To preferentially maximise the number of atoms
assumed false, the outputs a2n2 : : : a21 of the 2nd counter are
obtained by negating the corresponding bits b2n2 : : : b21.</p>
      <p>As in [9], we advance the counter if a fixpoint f p is reached
that violates the goal goal or integrity constraints ic.
Previously, this was done by a clock with a constant period
determined by worst case propagation delay through the
network generated by the program. For efficiency, we now add
clauses to detect as soon as a fixpoint is reached. These are
given by the theory S below, which uses four new
propositions z0; z+; z ; z$ to compare the current state of each atom
z (in a set Z of atoms) with their previous states.</p>
      <p>In brief, z0 denotes the previous state of z; z+ is true iff
the current and previous states are both true; z is true iff
they are both false; and z$ is true iff they are the same. The
w1-w4 are wait signals that just give the counters enough
time to compute their next value before the network decides
whether the current abducibles are a solution (in which
case the signal soln is activated) or whether another set
of abducibles are needed (in which case the signal next is
activated). f p indicates when a fixpoint has been reached.</p>
      <p>The signal done indicates when all solutions are exhausted.</p>
      <p>S
SZ</p>
      <p>1
SZ</p>
      <p>2
SZ
3
=
=
=
=</p>
      <p>SZ
1 [ S2Z [ S3Z
8 z0 z
&gt;&gt;&gt; z+
[ &lt; z</p>
      <p>$
z2Z &gt;&gt;&gt;: zz$
where</p>
      <p>9
z ^ z0 &gt;&gt;&gt;
:z ^ :z0 =
z+ &gt;&gt;
&gt;
;
z
next
w1
w2
w3
8 w1
&gt;&gt; w2
&gt;
&gt;
&gt;
&gt;&lt; w3</p>
      <p>w4
&gt;
&gt;
&gt;&gt;&gt;&gt; f p
:
8 next
&gt;&lt; next</p>
      <p>soln
&gt;: done
^ z$ ^ ^4 :wi ^ :next &gt;&gt;&gt;&gt;&gt;&gt;
;
9
&gt;
=
z2Z i=1
:next ^ f p ^ ic
:next ^ :goal ^ f p
:next ^ goal ^ f p ^ :ic
all1 ^ all2 ^ f p
&gt;
;
9
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
=</p>
      <p>To properly specify these additional clauses, it is necessary
to state the parameters n1, n2 and Z in C and S. This is done
by the function below, which defines n1 as the number of
abducibles ay, n2 as the number of negations a , and Z as
the set of all atoms appearing (positively or negatively) in
the context. In addition, this function adds a special atom
ic into the head of every integrity constraint, and adds the
literal :next into the body of every rule to ensure the
network is fully relaxed whenever a new solution is attempted.</p>
      <p>The resulting network architecture is summarised in Figure 3.</p>
      <p>Definition 3.6 ( ). Let (T2; G2; IC2; A2; D2) be a
propositional abductive context with G2 = fgoalg and D2 = ;.</p>
      <p>Now let n1 = jfay 2 A2gj, let n2 = jfa 2 A2gj, let
Z = AT+2[G2[IC2 [ AT2[G2[IC2 , and let R be the program</p>
      <p>R = H B1 ^ : : : ^ Bn ^ :next</p>
      <p>j H B1 ^ : : : ^ Bn 2 T2
Now we translate the clauses obtained so far into a network
using the method of Ray &amp; Garcez [9], recalled below, which
is based on well-known neurosymbolic techniques.</p>
      <p>Definition 3.7 ( ). Let P be a logic program, then (P ) is
the network (N; E) such that
N = [
E = [
r2GP
r2GP
and for all r = H
t(r) = n</p>
      <p>1=2
r; H; B1; : : : ; Bn; C1; : : : ; Cm
j r = H B1 ^ : : : ^ Bn ^ :C1 ^ : : : ^ :Cm
(r; H); (B1; r); : : : ; (Bn; r); (C1; r); : : : ; (Cm; r)
j r = H B1 ^ : : : ^ Bn ^ :C1 ^ : : : ^ :Cm</p>
      <p>B1^: : :^Bn^:C1^: : :^:Cm 2 GP
t(H) = 1=2
t(Bi) = 1=2
t(Cj ) = 1=2
w(r; H) = 1
w(Bi; r) = 1
w(Cj ; r) =
1</p>
      <p>STEP 5: Computing Abductive Solutions
In the last step, we compute the answers with the neural
network. Starting from a relaxed network, we activate the
node next. Then, on every other subsequent state, we check
whether the network has reached a fixpoint where soln
activated. If it has, then the current abducibles are recorded and
we see which of the goal rules is activated in order to extract
the successful ground instances of the query. If more
solutions are required then we activate next in order to seek the
next answer. We stop this process when a suitable answer is
computed or done is true, meaning the entire search space
has been explored.
4</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>We believe that the integration of abductive and inductive
inference is necessary to develop improved learning and
reasoning approaches. In this paper we presented a neurosymbolic
approach for first-order abductive inference. Unlike most
other such approaches, we do not impose any restrictions on
the use of negation or recursion. However, the ASP
grounding procedure assumes that all variables have finite domains,
which limits the use of function symbols. We have
implemented our method using the C++ programming language
and applied it to the abductive context shown in Figure 1.
In this way, we correctly obtained the minimal solutions
followed by all of the other non-minimal solutions. Even in this
simple example we note that the search time increases
significantly if a naive grounding is used in place of our more
efficient ASP approach. We also note that the time needed
to compute all minimal solutions is significantly less than
the time needed to compute all (non-minimal) solutions. In
this sense we have improved existing methods of
neurosymbolic abduction. However, the complexity of the approach
is still exponential in the number of abducibles, which limits
its practical use. In future work, we will compare our neural
approach with related symbolic approaches and investigate
ways of parallelising our method.
stable
states
z $i
fp
z i</p>
      <p>SZ2</p>
      <p>SZ3
next
current
states
SZ1</p>
      <p>R
soln done
goal</p>
      <p>ic
a1† a†n1 a1* a*n2
C2,n2</p>
      <p>C1,n1</p>
      <sec id="sec-5-1">
        <title>MOVE2</title>
      </sec>
      <sec id="sec-5-2">
        <title>HOLD2</title>
      </sec>
      <sec id="sec-5-3">
        <title>MOVE1</title>
        <p>HOLD1
all 2
an22
a21
all 1
a1n1
a11
negative assumptions
abducible assumptions</p>
        <p>Using attentive focus to discover action ontologies from perception</p>
        <p>Amitabha Mukerjee
Department of Computer Science &amp; Engineering</p>
        <p>Indian Institute of Technology Kanpur</p>
        <p>amit@cse.iitk.ac.in
The word “symbol”, as it is used in logic and
computational theory, is considerably
different from its usage in cognitive linguistics and
in everyday life. Formal approaches that
define symbols in terms of other symbols
ultimately need to be grounded in
perceptualmotor terms. Based on cognitive evidence that
the earliest action structures may be learned
from perception alone, we propose to use
attentive focus to identify the agents
participating in an action, map the characteristics of
their interaction, and ultimately discover
actions as clusters in perceptuo-temporal space.</p>
        <p>We demonstrate its applicability by learning
actions from simple 2D image sequences, and
then demonstrate the learned predicate by
recognizing 3D actions. This mapping, which also
identifies the objects involved in the
interaction, informs us on the argument structure of
the verb, and may help guide syntax.
Ontologies in such systems are learned as different
granularities in the clustering space; action
hierarchies emerge as membership relations
between actions.
1
Learning the concepts for concrete objects require the
perceptual system to abstract across visual presentations
of these objects. In contrast, modeling actions present
a more complex challenge [Fleischman and Roy, 2005],
[Sugiura and Iwahashi, 2007]. Yet actions are the
central structure for organizing concepts; the corresponding
language units (verbs) also acts as “heads” (predicates)
in sentences, controlling how an utterance is to be
interpreted. Typically the structure for an action/verb
includes a set of possible constituents that participate
in the action, and also some constraints on the type of
action (e.g. the type of motion that may constitute “A
chases B”).</p>
        <p>In this work, we consider the learning of the
structure of actions, based on image sequences. Cognitively,
there is evidence that some action schemas are acquired
through perception in a pre-linguistic stage [Mandler,
2004]; later these are reinforced via participation, and
may eventually seed linguistic aspects such as argument
structure.</p>
        <p>We postulate that a key aspect of this process is the
role of perceptual attention [Regier, 2003],[Ballard and
Yu, 2003]. Thus, an action involving two agents may
involve attention shifts between them, which helps limit
the set of agents participating in the action. The set of
agents participating in an action eventually generalizes
to the argument structure. In [Ballard and Yu, 2003],
human gaze was directly tracked and matched with
language fragments, and verbs such as “picking up” and
“stapling” were associated with certain actions.
However, the verbal concepts learned were specific to the
context, and no attempt was made to generalize these
into action schemas, applicable to new scenes or
situations. Top down attention guided by linguistic inputs is
used to identify objects in [Roy and Mukherjee, 2005].</p>
        <p>More recently, in [Guha and Mukerjee, 2007] attentive
focus is used to learn labels for simple motion
trajectories, but this is also restricted to a particular visual
domain.
1.1</p>
        <p>From Percept to Concept to Symbol
The word “symbol”, as it is used in logic and
computational theory is considerably different from its usage
in cognitive linguistics and in everyday life. The OED
defines it as “Something that stands for, represents, or
denotes something else”. This meaning carries over to
the cognitive usage, where it is viewed as a tight coupling
of a set of mental associations (the semantic pole) with
the psychological impression of the sound (the
phonological pole) [?]. Formally, however, a symbol is detached
from any meaning, it is just a token constructed from
some finite alphabet, and is related only to other such
tokens. A computer system dealing with such symbols
can define many relations with other symbols, but finds
it difficult to relate it to the world, and this makes it
difficult also to keep the relations between symbols up to
date. The objective of this work is to try to align a
symbol to a perceptual stimulus, so as to provide grounding
for the symbols used in language or in reasoning.</p>
        <p>In other work, we have addressed the question of
learning the language label (or the phonological pole)
of a symbol [Satish and Mukerjee, 2008]. Here we
focus on modeling the semantic pole, especially with
respect to action ontologies. Such models, called Image
Schema in Cognitive Linguistics [Langacker, 1999] or
Perceptual Schema in Experimental Psychology
[Mandler, 2004], involve abstractions on low-level features
extracted from sensorimotor modalities (positions and
velocities), as well as the argument structure.</p>
        <p>We ask here if, given a system that is observing a
simple 2D scene (see fig. 1) with shapes like squares and
circles chasing each other, is it possible for it to cluster
all 2-agent interactions in some meaningful way into a
set of action schemas? If so, do these action schemas
relate reliably to any useful conceptual structures?
Further, is there any possibility of learning any
relationships between these action schemata, thus constructing
a primitive ontology? Note that all this has to take place
without any language, witout any human inputs in any
form.</p>
        <p>Constructing such action templates has a long history
in Computer vision, but most gather statistics in
viewspecific ways with an emphasis on recognition [Xiang
and Gong, 2006; ?]. We restrict ourselves to two-object
interactions, using no priors, and our feature vectors are
combinations of relative position and velocity vectors of
the objects (we use a simple inner product). We perform
unsupervised clustering on the spatio-temporal feature
space using the Merge Neural Gas algorithm [Strickert
and Hammer, 2005]; the resulting clusters constitute our
action schemas. By considering different levels of cluster
granularity in the unsupervised learning process, we also
learn subsets of coarse concepts as finer action concepts,
resulting in an action hierarchy which may be thought
of as a rudimentary ontology.</p>
        <p>Having learned the action schema based on a given
input, we apply it to recognize novel 2-body interactions
in a 3D fixed camera video, in which the depth of a
foreground object is indicated by it’s image y-coordinate.</p>
        <p>We show that the motion features of humans can be
labelled using the action schemas learned.
2</p>
        <p>Analysis: Role of Attentive Focus
One of the key issues we explore in this work is the
relevance of perceptual attention. It turns out that
restricting computation to attended events somehow results in
a better correlation with motions that are named in
language. This may reflect a bias in conceptualization
towards actions that attract attention. Like other models
that use attention to associate agents or actions to
language [Ballard and Yu, 2003; Guha and Mukerjee, 2007],
we use attentive focus to constrain the region of visual
salience, and thereby the constituents participating in an
action. We use a computational model of dynamic visual
attention [Singh et al., 2006] to identify agents possibly
in focus.</p>
        <p>In order to analyze the different types of motion
possible in the scene, we first perform a qualitative analysis
of the motions. We assume that all objects have an
intrinsic frame with a privileged “front” direction defined
either by its present direction of motion, or by the last
such observed direction. Let the reference object be A,
then the pose of located object B w.r.t. the frame of A
can be described as a 2-dimensional qualitative vector
[Forbus et al., 1987], where each axis is represented as
{−, 0, +} instead of quantitative values. This results in
eight possible non-colliding states for the pose of B. In
each pose, the velocity of B is similarly encoded,
resulting in 9 possible velocities (including non-moving).</p>
        <p>This results in 72 possible relations, and
distinguishing the situation when the reference object A is
moving, from that when it is stationary, results in a total of
144 possible states. Linguistic labels(Come-Close(CC),
Move-Away(MA), Chase(CH), Go-Around(GoA),
MoveTogether(MT), Move-Opposite(MO)) are manually
assigned to each of these qualitative relative motion states.</p>
        <p>The motion in nearly half the states do not appear to
have clear linguistic terms associated with them, and
these undenominated interactions are left empty. The
remaining classes assigned are shown in Figure 2.
Qualitative classification for the frames in Fig.1 is shown in
Fig. 3.</p>
        <p>Next, we analyze the frequency of these cases observed
on the Chase video. Fig. 2 compares the frequency of the
qualitative states with non-stationary first object, in the
situation where all possible object pairs are considered
(no attentive focus), versus that where using attentive
cues pairs of agents attended to within a temporal
window of 20 frames become candidates for mutual
interaction; all other agent pairings are ignored. The frequency
of indeterminate qualitative cases are 58% in the first
situation and 24% in the second. Thus, attentive focus
biases the learning towards relations that we have names
for in language.
3
We consider a bottom-up model of visual attention (not
dependent on task at hand) [Itti, 2000]. Here we
consider a model designed to capture bottom-up attention
in dynamic scenes based on motion saliency [Singh et al.,
2006]. Objects are taken as the attentive foci instead of
pixels. Motion saliency map is computed from optical
flow, a confidence map is introduced to assign higher
salience to objects not visited for a long time. A small
3</p>
        <sec id="sec-5-3-1">
          <title>Name</title>
          <p>pos·velDiff
pos·velSum</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>Formula</title>
          <p>(~xB − ~xA) · (~vB − ~vA)
(~xB − ~xA) · (~vB + ~vA)
foveal bias is introduced to mediate in favour of proximal
fixations against large saccadic motions.
Winner-TakeAll network on the combined saliency map gives the most
salient object for fixation.
4</p>
          <p>Unsupervised Perceptual Clustering
Perceptual systems return certain abstractions of the
raw sensory data - “features” - which are used for
recognition, motor control, categorization, etc. In this work
we use two features that capture the interaction of two
agents. All learning takes place in the space of these two
features, (Table 1); the first feature captures the
combination of relative position and velocity, the second the
relative position and magnitude.</p>
          <p>These feature vectors are then clustered into
categories in an unsupervised manner based on a notion of
distance between individuals. We use the Merge
Neural Gas(MNG) algorithm[Strickert and Hammer, 2005]
for unsupervised learning which has been shown to be
well-suited for processing complex dynamic sequences as
compared to the other existing models for temporal data
processing like Temporal Kohonen map, Recursive SOM
etc. This class of temporal learning algorithms are more
flexible with respect to the state specifications and time
history compared to HMMs or VLMMs. MNG algorithm
performs better than other unsupervised clustering
algorithms like K-Windows [Vrahatis et al., 2002], DBSCAN
[Ester et al., 1996] because of the utilization of the
temporal information present in the frame sequences unlike
the other algorithms.
The Neural Gas algorithm [Martinetz and Schulten,
1994] learns important topological relations in a given
set of input vectors (signals) in an unsupervised manner
by means of a simple Hebb-like learning rule. It takes a
distribution of high-dimensional data, P(ξ) and returns
a densely connected network resembling the topology of
the input.</p>
          <p>For input feature vectors arriving from temporally
connected data, the basic neural gas algorithm can be
generalized by including explicit context representation
which utilizes the temporal ordering present in the
feature vectors of the frames, resulting in the Merge Neural
Gas algorithm [Strickert and Hammer, 2005]. Here, a
Context vector is adjusted based on the present winning
neuron data. Cluster labels for the frames are obtained
in the final iteration of the algorithm based on the
winner neuron.
5</p>
          <p>Concept Acquisition: Chase video
Unsupervised clustering using the Merge Neural Gas
algorithm is used on the feature vectors from the video,
corresponding to object pairs that were in attentive
focus around the same time. Salient objects in a scene
are ordered by a computational model of bottom-up
dynamic attention[Singh et al., 2006]. The most salient
object is determined for each frame, and other objects
that were salient within k frames before and after (we
use k = 10) are considered as attended simultaneously.</p>
          <p>Dyadic feature vectors are computed for all object pairs
in these 2k frames.</p>
          <p>Owing to the randomized nature of the algorithm, the
number of clusters varies from run to run. Clusters with
less than ten frames are dropped. With the aging
parameter set to 30, the number of clusters came out to
be four in 90% of the runs; the set of four clusters with
highest total classification accuracy (refer Table 2) are
considered below.</p>
          <p>In order to validate these clusters with human
concepts, we asked three subjects (Male, Hindi-English/
Telugu-English bilinguals, Age-22, 20 and 30) to label
the scenes in the video. They were shown the video twice
and in the third viewing they were asked to speak out
one of three action labels (CC, MA, Chase) which was
recorded. Given the label and the frame when this was
uttered, the actual event boundaries and participating
objects for the groundtruth data were assigned by
inspection. In case of disagreement, we took the majority
view.</p>
          <p>The percentage accuracies shown in table 2 do not
reflect the degree of match, since although an event may
last over 15 frames, even if 10 frames have been detected,
it is usually quite helpful. This can be seen in 6 which
present results along a time line for Chase; each row
reflects a different combination of agents (small square,
big square, circle). At first glance, figures like 6 would
seem to reflect a higher accuracy than 84% in table 2.</p>
          <p>A surprising result was found when by experimenting
with the edge aging parameter in the Merge Neural Gas
4
4000
3000
) 2000
m
luS 1000
e
.svo 0
p
(2−1000
e
tru−2000
a
e
F−3000
−4000
algorithm. The number of clusters increase as aging
parameter is decreased, and at one stage eight clusters were
formed (edge aging parameter=16). The Total
Classification Accuracy (TCA) was about 51 and we would have
discarded the result, but inspecting the frames revealed
that the clusters may be reflecting what appeared to be
hierarchy of action types. Thus cluster C1 from the
earlier classification (majority correlation=CC) was broken
up into C1, C5, C6. C1 was found to contain frames where
both objects are moving towards each other whereas C5
contains frames where the smaller object is stationary
and the other moves closer. Thus Come-Closer and
Move-Away appear to be sub-classified into 3 classes
(two one object static cases, and one both moving case).</p>
          <p>This ‘finer’ classification is given in Table 3.</p>
          <p>Argument order in Action Schemas
In another experiment, we investigated the importance
of argument ordering by re-classifying the same frames,
but reversing the order of the objects used in the dyadic
vector computation. Earlier, if the larger object was
arg1 or reference object, now it became arg2 or
nonreference object. If the corresponding concept changed,
especially if it flipped, this would reflect a semantic
necessity to preserve the argument order; otherwise the
arguments were commutative. Using the coarser clusters,
we observe that the argument order is immaterial since
the majority relation is unchanged (black) for C1 and
C2 (CC,MA respectively). On the other hand, both C3
and C4 (correlations with Chase) are flipped (Table 4).</p>
          <p>Thus, the fact that argument order is important for
Chase is learned implicitly within the action schema
itself. The non-commutativity of CCone−object−static and
M Aone−object−static could not be established because of
the skewed distribution of frames in the input video
amongst the two sub-classes for the action verbs.
5.2</p>
          <p>Comparison with K-Windows</p>
          <p>Clustering
We compare the clustering accuracy obtained by the
unsupervised Merge Neural Gas algorithm with K-windows
algorithm [Vrahatis et al., 2002]. K-Windows is an
improvement of K-Means clustering algorithm with a
better time complexity and clustering accuracy. We set the
value of k in this algorithm to 4 and run it on the input
feature vectors obtained after attentive pruning. The
initial cluster points for the algorithm are set randomly.</p>
          <p>Table 5 gives the clustering results obtained.</p>
          <p>The lower accuracy (as compared to results in Table 2)
is expected because K-windows treats each feature vector
as a separate entity without utilizing the information
present in the temporal ordering of the frames.
6</p>
          <p>Recognizing actions in 3D
In order to test the effectiveness of the clusters learned,
we test the recognition of motions from a 3D video of
three persons running around in a field (Fig.7). In
human classification of the action categories (into one of
CC, MA, Chase), the dominant predicate in the video,
(777 out of 991 frames), is Chase.</p>
          <p>In the image processing stage, the system learns the
background over the initial frames based on which it
segments out the foreground blobs. It is then able to track
all the three agents using the Meanshift algorithm.
Assuming camera height near eye level, the bottom-most
point in each blob corresponds to that agent’s contact
with the ground, from which its depth can be determined
within some scaling error (157 frames with extensive
occlusion between agents were omitted). Given this depth,
one can solve for the lateral position - thus, we are able to
obtain, from a single view video, the (x, y) coordinates
for each agent in each frame, within a constant scale.</p>
          <p>Based on this, the relative pose and motion parameters
are computed for each agent pair, and therefrom the
features as outlined earlier. Now these feature vectors are
classified using the action schemas (coarse clusters)
al5
ready obtained from the Chase video (2D) (Table 6).
7</p>
          <p>Discussion and Conclusion
We have outlined how our unsupervised approach learns
action schemas of two-agent interactions resulting in an
action ontology. The image schematic nature of the
clusters are validated by producing a description for a 3D
video. The approach provided here underlines the role
of concept argument structures in aligning with linguistic
expressions, and that of bottom-up dynamic attention in
pruning the visual input and in aligning linguistic focus.</p>
          <p>Once a few basic concepts are learned, other
concepts can be learned without direct grounding, by using
conceptual blending mechanisms on the concept itself.</p>
          <p>These operations are often triggered by linguistic cues,
resulting in new concepts, as well as their labels being
learned together, in a later stage. Indeed, the vast
majority of our vocabularies are learned later purely from the
linguistic input [Bloom, 2000]. But this is only possible
because of the grounded nature of the first few concepts,
without which these later concepts cannot be grounded.</p>
          <p>Thus the perceptually grounded nature of the very first
concepts are crucial to subsequent compositions.
[Xiang and Gong, 2006] T. Xiang and S. Gong. Beyond
tracking: Modelling activity and understanding behaviour.</p>
          <p>International Journal of Computer Vision, 67(1):21–51,
2006.</p>
          <p>7
Text-based Reasoning with Symbolic Memory Model</p>
          <p>Kun Tu1;2 and Hava Siegelmann2
1 College of Computer Science and Engineering, South China University of Technology
2 Department of Computer Science, University of Massachusetts Amherst</p>
          <p>hava@cs.umass.edu</p>
          <p>Abstract
In this paper, a sentence-based reasoning model is
introduced for the prediction of new individual
activities by means of memory reconsolidation that
enables the integration of incoming evidence with
related past experience. Both the evidence and
previous experience are stored in extended semantic
networks (ESN) as memory. They are then
processed in Bayesian networks for inferring new and
unified memory. Symbolic approaches, which
focus on the structural aspect of language, ensure the
correct extraction of the key information of words
according to the context. Effective mechanisms for
information propagation, Bayesian networks (BN)
construction and combination are adopted to enable
inference reasonable and adaptive to different
scenarios based on the topic domain. Our model is
compared to other reasoning systems through
experiments. The results show that our model can
both deduce more implicit information from texts,
and avoid some incorrect reasoning caused by
confusing data in the knowledgebase.
1
Successful reasoning on activities from text should be more
adaptive to different scenarios than other data-driven
inference models that focus on structure and parameter
leaning. For that, a preprocessing must include both the
integration with commonsense knowledge and the representation of
events after parsing the sentences [Klein and Manning, 2003].</p>
          <p>Previous approaches have applied some language
processing operations and built effective reasoning models. One
method establishes “coreference mappings” of data in a
memory system to reduce the number of ambiguous sentence
interpretations [Livingston and Riesbeck, 2009]; another builds
a large commonsense knowledgebase to make inference from
key words [Liu and Singh, 2004]. These methods, although
efficient in many cases, lack a systematic approach for
disambiguation of word senses and scenario topics [Dahlgren,
1988], and hence may lead to partially incorrect inference.</p>
          <p>Additionally, some data from commonsense knowledge are
not always true given a specific scenario. For example, the
assertion “if someone is a lawyer, he practices law” could sound
correct, however, there are non-practice lawyers and lawyers
are not likely to practice law during “their holidays”
[Kathleen and Joyce, 1989].</p>
          <p>To improve on the inference of individual activities from
simple sentences, we propose a reasoning model with the
characteristic of memory reconsolidation. Memory
reconsolidation can be thought as an information-processing
procedure where a recall of memory can be updated or
strengthened as a result of integration of incoming information into
the pre-existing “memory network”[Tulving and Thomson,
1973]. During this procedure, information similar in
meanings, topics or scenarios within a domain is activated and
selectively moved to working memory for cognitive
process. Inspired by this phenomenon, we build a robust and
scenario-adaptive memory system with extended semantic
networks (ESN) as the symbolic representation of the
language. Stanford Parser [Klein and Manning, 2003] is
embedded to parse sentences. Key information such as subjects,
verb and objects is extracted into working memory and
referred to WordNet [Fellbaum, 1998] and VerbNet [Kipper
et al., 2006] for disambiguation of word senses and
topics. Ontology categories of noun phrases, built from
WordNet and stored in directed acyclic graphs, describe the
features of entities of individual subjects. The relations of
entities(representing activities, events or features of entities),
are stored as sub-symbolic memory in multiple Bayesian
networks (BN). The BNs are built from a)statistical analysis of
information in ESN, b)sentences containing causal relation
and c)other knowledge bases. Related BNs are adaptively
selected by matching the keywords of topics in a scenario, and
are combined together if they have nodes in common.</p>
          <p>The selection and combination of BNs enable inference
processes to avoid inaccurate conclusion and obtain more
implicit information than previous models.
2
We use the following programs to enable our memory model:</p>
          <p>1) Stanford parser, a Java program, is used to analyze the
grammatical structure of sentences and obtain subjects,
objects and predicates [Klein and Manning, 2003]. 2) WordNet
is an English lexical database that provides multiple
meanings and topic domains of a word. It also groups words into
sets of cognitive synonyms and indicate the relations of the
words. (e.g. “dog” belongs to “canine”, “canine” belongs to
“carnivore”, “placental” and “mammal”.) We use it to
construct an acyclic graph for word disambiguation.</p>
          <p>3) VerbNet classifies verbs into classes. Each class is
assigned with thematic roles, selectional restrictions on the
arguments and syntactic frames. For example, the Class
“Hit18.1” has a constraint on the syntax of the class members,
showing the subject of “hit” should be a human or animal.</p>
          <p>Such constraints are used to disambiguate the meanings from
WordNet and ConceptNet.</p>
          <p>4) ConceptNet provides a commonsense knowledgebase
that can describe concepts of nouns and provides causal
relations between predicates. The knowledge in ConceptNet is
expressed as five-tuple assertions (“relation type”, “A”, “B”,
“f”, “I”), where “relation type” indicates the relation of “A”
and “B”, f is the number of times a fact is uttered in their
training corpus, i counts how many times an assertion was inferred
during the ’relaxation’ phase. For example, (CapableOf
“animal” “grow” “f=2; i=2;”) has a relation type “CapabaleOf”,
which indicates the category of “animal” is capable to
perform the activity “grow”. Since some relation types provide
causal relations of the verb phrases (VP), we build two-node
Bayesian networks from each of the assertions and use them
for inference on focused topics.
3 Information Representation
Predicates in sentences describe the features or activities of
the subjects. Some predicates can break into a form of “joiner
+ object”, where the joiner contains a verb, indicating the
relations between the subjects and the objects (see Figure 1(a))
. From a graphical point of view, a vertex can symbolically
represent a subject or an object while an edge can represent
their relation. For this reason, our extended semantic
networks(ESNs) are used as a memory that represents the
information derived from sentences .</p>
          <p>The ESNs focus on the relations between the subject and
the object. Since the meaning of a word can be captured by
the distribution of commonly co-occurring words or phrases
[Landauer and Dumais, 1997]. The semantic roles of verbs
have been characterized with nouns, and were shown to
predict the brain activity associated with the meanings of nouns
[Mitchell et al., 2008]. The relation in an ESN contains a verb
that helps to understand the subjects in a certain scenario.
3.1</p>
          <p>Definition of Extend Semantic Network
Suppose S = (s1; s2; : : : ; sn) is a set of sentences, an
extended semantic network to store S is a graph G0 = (V 0; E0)
, where V 0 = fvkjk = 1; 2; : : : ; mg is a set of vertices
representing the concepts of subjects or objects in the sentences in
S; E0 = fed(Vi; Vj )jd = 1; 2; : : : ; i; j = 1; 2; : : : ; mg is a
set of edges representing the relations between vertex Vi and
Vj , which are indicated in the sentences in S.</p>
          <p>Notice that ed(Vi; Vj ) is the d-th edge from vertex Vi to
Vj , which means that there can be more than one relations
between two concepts. Vertex Vi can be the same as Vj (that
is, i = j), in which case it is a unary relation of Vi (e.g. edge
e1(V3; V3) in Figure 1(b)) representing a intransitive verb or
a property of a vertex.</p>
          <p>In our model, the memory system has two kinds of
memory: a long-term memory and a working memory. The
longterm memory is an ESN that stores all the refined information
from working memory. The working memory is a
temporarily created ESN and receives information from the Stanford
Parser, long-term memory and inferences from BNs. The
working memory provides a disambiguation mechanism to
ensure the accuracy of information by topic matching. Then,
the disambiguated information would be either stored into the
long-term memory as an update or propagated to BNs as
evidence for inferences.
3.2 Parsing a Sentence
The subject and predicate of an input sentence are first
extracted with the Stanford Parser.</p>
          <p>Figure 1 shows the storing of sentences in an ESN. In
Figure 1(a), the subject “Mike” and the objects “apple” and
“room” are stored as vertices. The joiners “eat” and “is in”
in the Verb Phases (VP) are stored as directed edges between
vertices (Figure 1(c)).</p>
          <p>Sentence1: Mike             eats              the apple.</p>
          <p>Sentence2: Mike             is in               the room.</p>
          <p>subject
joiner</p>
          <p>object
predicate
(a) predicates containing an object
Sentence3: the apple              is red.</p>
          <p>Sentence3: the apple        is red.</p>
          <p>subject</p>
          <p>predicate
(b) a predicate with no
objects
V1</p>
          <p>Mike
e1(V1,V1) is red e1(V3,V3) is red
e1(V1,V3) eat apple apVp1le V3
e1(V3,V1) eaten_by</p>
          <p>V2
e1(V1,V2) is in</p>
          <p>room
(c) symbolic representation in ESN</p>
          <p>Ontology Categories
Ontology categories classify the vertices according to the
word senses. WordNet provides such hierarchical category
data with regard to the word meanings.</p>
          <p>Our model uses WordNet to build directed acyclic graphs
to represent these categories. Each category has descriptions
of its features and activities. Figure 2 shows a sub-graph of
the category graph generated from data in WordNet.
Suppose “research professor” belongs to the categories of
“professor” and “researcher”. In WordNet, “professor” is a kind
of “academician” who “works at a college or university”.
After extracting the subject and predicate in the annotation of
“professor” with the Stanford Parser, “professor” can be
assigned to the category of “academician” and the feature
activities are “work at a college” or “work at a university”.
Similarly, “research professor” is also in the descendant categories
of “educator” (“someone who educates young people”) and
“researcher” ( “scientist who cdeavotteesghoimrseylf to research”).</p>
          <p>professional </p>
          <p>person
educator
academician
professor
person
scientist
researcher
Research 
professor
……
…..</p>
          <p>……</p>
          <p>In our category graph, a category can obtain the features
or activities either from its parent categories or by importing
data from the knowledgebase in ConceptNet.</p>
          <p>The ontology categories are important to the entities in
working memory because they can help to disambiguate the
word sense and provide predicates to connect BNS(see next
section).</p>
          <p>Possible States of a ESN Vertex
During information processing in the human brain, some
neurons are highly active, whereas others remain silent. This
leads to the formation of neural circuits for specific memory.</p>
          <p>In analogy with biological neural memory, a vertex in an ESN
has different states: 1)active, 2)semi-active and 3)inactive .</p>
          <p>Table 1 shows the differences among the three states of a
vertex. An active vertex represents an entity ready for
inference. It comes from a subject or an object in a sentence or
from the long-term memory. Its meaning is disambiguated
and its renewed relations in the working memory with other
entities will be directly updated to the long term memory. A
semi-active vertex will become an active vertex if it is linked
to another active vertex once there exists a new relation
between them. An inactivated vertex is not for inference and
remains in the long term memory. This can reduce the size of
ESNs, which makes computation less complex.</p>
          <p>state
active
semi-active
inactive</p>
          <p>Long-term
memory
Yes
No
Yes</p>
          <p>Working
Memory
Yes
Yes
No</p>
          <p>For
inference
Yes
Yes
No
information
update
Yes
No
No
assigned
a category
Yes
No</p>
          <p>No
3.4 The Directed Edges
In a sentence, a predicate with an object is regarded as the
form of “joiner + object”. Edges in the ESNs represent the
joiners. They denote the relations between vertices in ESNs
and have features that are important to the inference in BNs:</p>
          <p>1) The indicator of the evidence. Edges are indicated as
evidence if they are generated from input sentences. These
edges will be directly stored in both the long-term and
working memory regardless of their probabilities.</p>
          <p>2) The probability of an edge. The non-evidence edges
are generated from inferences by BNs. Their probabilities
can change after each inference. In a working memory, a
nonevidence edge is added or removed depending on whether its
probability is beyond or below a given threshold. After
inferences in the working memory are finished, the non-evidence
edges are transported to the long-term memory.</p>
          <p>3) Edges representing a passive voice. As each BN in our
model predicts the activities of only one subject, the
interaction of two entities cannot be inferred in one BN. A reversed
edge representing the passive voice of a verb is used in the
ESN to ensure information can be propagated among the BNs
for different entities. For example (Figure 1), “Mike eats the
apple”. An edge e3(V3; V1) representing “be eaten by” will
be created, and a node representing “be eaten by Mike” in a
BN if the activities of the apple need to be inferred.</p>
          <p>4) Unary and binary relation. Edges connecting two
different vertices indicate binary relations. Unlike standard SNs,
loop edges are allowed as unary relations in our ESN when
the predicate has no object(e.g. “stops” ), or when the
predicate only describes a property of its subject (e.g. “is red”
e1(V3; V3) in Figure 1(b)).</p>
          <p>5) Multiple edges between two vertices. Multiple
relations between two vertices are represented as multiple
edges in the ESN. In referring to Figure 1(c), suppose also
“Mike likes the room”, then another edge “like”, denoted as
e2(V1; V2), would connect “Mike” to the “room”.
3.5 Disambiguation for Meanings
Words with multiple senses are classified into multiple
categories in WordNet and cause ambiguities. We suggest
operations that can disambiguate the word senses:
1) choosing categories with constraints in VerbNet :</p>
          <p>For example, “a bat catches insets”. The “bat” can be a
kind of “mammal” or a kind of “club” for ball game.
VerbNet constrain that “Catch” should have a human or animal as
agent, so the meaning of “bat” should be a “mammal”(
belonging to the “animal” category) not a “club” for sport.</p>
          <p>2)selecting the word sense by matching the topics in the
working memory and those in BNs:</p>
          <p>E.g. “bank” has more than four meanings in its
noun form in WordNet and thus can be related to
topics such as “deposit”, “depository financial institution”,
“flight maneuver”, “slope” and so on. The sentence “he
goes to the bank” can be confusing because the “bank”
may relate to any of the topics. Two BNs
containing conditional probability P (depositmoneyjgotobank) and
P (jumpintothewaterjgotothebank) can limit the meaning
to “bank building” and “sloping land beside water” within the
topic domains “deposit” or “slope, water”. Other information
in the working memory (e.g. a vertex representing “river”,
has the same topic “water” as “sloping land beside water”)
can then decide that the “bank” should be related to a “slope”.
4</p>
          <p>An Adaptive BN Mechanism for Reasoning
Our model focuses on update new activities of entities based
on new information in memory. Each BN at a time infers
activities of only one subject and takes in predicates from
working memory to infer new ones.</p>
          <p>During inference, activities in the working memory are
regarded as in the same scenario, and BNs are selected to the
topic domain only.
BNs are used in our model to update information in ESNs.</p>
          <p>Each BN is constructed with the causal relations between
predicates. Thus, the agent of the predicate should be the
same as the inferred ESN. Each BN has a vector containing
words and representing a topic domain. Multiple BNs will be
used for different entities in the working memory. Figure 3
shows two BNs for the hunter and antelope.</p>
          <p>e1(V1,V2) hunt
V1 hunter</p>
          <p>antelope V2
e1(V2,V1) be hunted by
(a) hunting scenario in an ESN</p>
          <p>Be  hunted 
by hunter</p>
          <p>Get 
injured</p>
          <p>Run 
away</p>
          <p>Be 
catched
(b) a BN for the antelope
Hunt 
antelope
success</p>
          <p>Be 
happy
(c) a BN for the hunter</p>
          <p>The Nodes
We choose the predicates as the nodes of BNs. Figure 3
illustrates how the predicate “be hunted by hunter” is passed from
the working memory to a BN as a node .</p>
          <p>Evidence in BNs
In BNs, each node has a property called “evidence”. At the
beginning of inference, nodes are set to “evidence” if they
match the edges of evidence in the working memory. “be
hunted by hunter”(Figure 3(b)) and “hunt antelope” (Figure
3(c)) are viewed as evidence.
4.2</p>
          <p>BN Construction
We propose a mechanism for automaitc self-generation of BN
nodes. There are two ways to construct a BN in our model.</p>
          <p>The first way is to learn from experience. After an update
in the long-term memory, if two predicates of a subject
coappear for a large number of times, they are considered to
have causal relation and are extracted as a predicate pattern
pair. The predicate pattern here consists of a verb phrase and
the specific object category.</p>
          <p>For example, an “alarm sounds” and 8 out of 12 “person”s
hear it. The vertex “alarm” and the 12 vertices of different
“person” are activated and moved to the working memory.</p>
          <p>The 12 “person”s are in the same category of human
according to WordNet. So the predicate pattern is “be heard by +
human”. “Sound” and “heard by human” are converted to
BN nodes. Both predicates share “alarm” as their subject.</p>
          <p>VerbNet has the syntax restriction to ensure the verb phrase
“be heard by” to have a correct “agent” alarm and object
“human”. The edge “be heard by human” only goes from the
vertex “alarm” to the other 12 “human” vertices. When counting
the frequency of “be heard by”, the maximum number should
be 12. Thus p(heardby = truejsound = true) = 8/12 =2/3.</p>
          <p>Another conditional distribution we can get from the data is
p(soundjheardby). The conditional distribution is initialized
according to the frequency.</p>
          <p>The second method is to extract information from other
knowledge bases. For example, BN can be constructed from
data in the eight of the twenty semantic relations in
ConceptNet as described in Table 2.</p>
          <p>relation</p>
          <p>type
(PrerequisiteEventOf</p>
          <p>“A”,“B”)
(FirstSubeventOf
“A”,“B”)
(EffectOf
“A”,“B”)
(CapableOf
“A”,“B”)
(SubeventOf</p>
          <p>“A”,“B”)
(MotivationOf</p>
          <p>“A”,“B”)
(DesirousEffectOf
“A”,“B”)</p>
          <p>(IsA
“A”,“B”)
probability
function
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (A = truejB = true)
P (A = truejB = f alse)
P (B = truejA = true)
P (B = truejA = f alse)</p>
          <p>Since some commonsense knowledge bases do not provide
the probabilities of causal relations, we initialize the
probability distributions to default values(Table 2). The probabilities
of nodes without a parent are set to 0.5.</p>
          <p>BN Selection and Connection
BNs need to be selected and joined together as a new large
Bayesian net for the activity prediction.</p>
          <p>The BN selection is based on the predicates in the
sentences and the topics in the working memory. The first step
is to search BNs whose nodes represent the same predicate as
in the sentences. E.g. for the sentence “He goes to the bank”,
BNs containing probability P (depositmoneyjgotobank) or
P (jumpintothewaterjgotothebank) are all selected.</p>
          <p>The second step is to remove BNs with a different topic
from the working memory. E.g. if the working memory has
entities in the categories related to “money” or edges related
to financial operation, BNs without these topics are removed.</p>
          <p>An adjacency matrix M is built for combining BNs. If BN
i and BN j share a common predicate, then Mij = Mji = 1,
otherwise, the element in the matrix is 0. By multiply the
adjacency matrix, a path connecting the BNs can be found. This
approach saves us from learning and inferring about unrelated
data, and hence reduces much complexity.</p>
          <p>The Joint Distribution Function
Suppose the shared node s has parents sk 2 pai(s) in BN i
and parents yl 2 paj (s) in BN j, where yl 2= pai(s). the
conditional distribution p(sjpai(s)) in BN i and p(sjpaj (s))
in BN j are known and the new joint conditional distribution
p(sjpai(s); paj (s)) should be initialized for the new network.
5</p>
          <p>Making Inference on Evidence
There are several main steps to infer new information from
new sentences:</p>
          <p>1)A new entity from arriving sentences are automatically
created as an active vertex in the working memory. If there
are other entities have relation to the new entity, they are
brought to the working memory as well.</p>
          <p>2)Classify the entities into categories.</p>
          <p>3)Predicates in the coming sentences are evidences. New
predicates can be inferred from the evidence via BNs. BNs
containing the to-be-inferred predicates are target BNs. They
are selected for the next step.</p>
          <p>4)Find a link path from the target BN to the BN containing
evidence with the adjacency matrix. (e.g. fig.5) (If the link
does not exist, then the evidence does not affect the activities
in that target BN.)</p>
          <p>5)Combine the BNs in the path to a large BN. Set the joint
conditional distribution values of the share nodes.</p>
          <p>6)Search a d-connecting path from the evidence node to the
target node.</p>
          <p>7)Calculate the probability of the variables in the BN
8)Update the working memory and the long-term memory
6</p>
          <p>Case Study
In the following, we will compare our model and Direct
Memory Access(DMAP) on the data provided by
[Livingston and Riesbeck, 2009]. A scenario is built to compare
the inference result of our system and ConceptNet.
6.1</p>
          <p>Comparison with DMAP
DMAP uses a story of bombing attack at U.S. soldiers to test
how well the model can understand the text. One of the
stories is as follow: “An attack occurred in Afghanistan. The
bombing was performed by Al Qaeda. The attack occurred
on July 18, 2008. The attack targeted United States soldiers.”</p>
          <p>The aim of DMAP is to integrate the information of
“bombing” and “attack” by using a language pattern to map
the two words to the same reference in its memory
system.E.g, the sentence “the bombing was performed by Al
Qaeda” is represented as (performedBy, Bombing-54
AlQaeda). DMAP then searches if there are assertions such
as (performedBy ?attack Al-Qaeda) to map the two words.</p>
          <p>DMAP successed in integrating the information only if
“bombing” and “attack” share the same assertion pattern
(performedBy xxx Al-Qaeda). However, if there is another
keyword that share the assertion pattern as “bombing”, DMAP
would be confused. For example, if the “Al-Qaeda performed
a celebration” after the attack, the three word “bombing”,
“attack” and “celebration” would share the same pattern
“performed by Al-Qaeda”. Thus, DMAP could falsely reason that
“bombing” is equal to “celebration”.</p>
          <p>In our model, a disambiguation for the word sense is
performed in the working memory by using WordNet. The
meaning of “bombing” can be referred to “bomb”, which has
a meaning “throw bombs or attack with bombs”. The
keyword “attack” in the annotation is extracted by the Stanford
Parser and matches the “attack” in the sentences. In this way,
the word “celebration” can be omitted as an unrelated word.
6.2 Comparison with ConceptNet
Suppose there are three sentences in a scenario: “Mike
swings a bat”, “John throws baseball” and “Jones catches the
ball”. ConceptNet and our model will use the same
knowledgebase to reason on the sentences. The following
assertions can be found in the database in ConceptNet(explanation
of the assertion can be seen in section 2):
1. (CapableOf “batter” “hit ball” “f=2;i=1;”)
2. (SubeventOf “play tennis” “hit ball” “f=7;i=1;”)
3. (CapableOf “baseball player” “hit ball” “f=7;i=1;”)
4. (CapableOf “bat” “hit ball” “f=2;i=0;”)
5. (MotivationOf “play tennis” “hit ball” “f=3;i=0;”)
6. (Isa “batter” “baseball player” “f=2;i=0”)
7. (CapableOfReceivingAction “baseball pitcher” “throw
base</p>
          <p>ball” “f=2;i=0;”)
8. (CapableOf “baseball player” “throw ball” “f=3;i=1;”)
9. (SubeventOf “play baseball” “throw ball” “f=2;i=1;”)
10. (CapableOf “ baseball pitcher” “throw ball” “f=4;i=1;”)
11. (SubeventOf “play football” “throw ball” “f=2;i=0;”)
12. (Capableof “cathcer” “catch ball”)
13. (CapableOf “batter” “swing bat” “f=2;i=0;”)
14. (Isa “catcher” “baseball player” “f=2; i=0”)</p>
          <p>By matching the keywords, ConceptNet can get the
following results about “Mike”: “Mike might be a batter.(from
assertion 13)” “Mike can hit ball.( from assertion 1)”, “Mike
might be a baseball player.( from assertion 3)” “Mike might
play tennis(from assertion 2)” “Mike might be a bat.( from
assertion 4)” Notice that “tennis” and “baseball” are
different sports and “Mike” cannot play them at the same time.</p>
          <p>Additionally, ConceptNet cannot infer the exact activities of
“John” because there are four different assertions regarding
“throw ball”(from assertion 8 to 11).</p>
          <p>In our model, the probability generated from data in
ConceptNet is initialized as in Table 2. Joint conditional
distribution is initialized as follow:P (Sjxk; yl) = 0:8,
P (Sjxk; yl) = P (Sjxk; yl) = 0:55,P (Sjxk; yl) = 0:1.</p>
          <p>BNs were constructed and combined according to the
knowledge base(Figure 4).</p>
          <p>Is 
baseball 
pitcher
Catch 
ball</p>
          <p>Is 
baseball 
player
pitch
Throw 
baseball
Hit ball</p>
          <p>Swing  
bat</p>
          <p>Is 
batter</p>
          <p>Table 3 summaries the inferred result. We choose the
threshold of probability for adding an edge in memory as
0.65, hence the first three row in table 3 are selected ass new
inferred information and added to long-term memory. Notice
that “is a baseball pitcher” is not in the D-connecting path
given the evidence “swing bat”. This means it is independent
of “swing bat”. Table 4 shows the inference about John’s
activities. Our concluded inferences, as shown in Table 3 and
Table 4, are more reasonable than that in the ConceptNet.
probability
P (Isbatter)
P (Hitball)
P (Isbaseballplayer)
P (T hrowball)
P (Isbaseballpitcher</p>
          <p>P (Catchball)</p>
          <p>Discussion and Future Work
In this paper, we built a reasoning model to represent and
infer new information from texts. Experiments compare
DMAP, ConceptNet and our model. Results show that our
model is both robust and scalable because the disambiguation
mechanism enables it to avoid inaccurate reasoning caused by
confusing data. Through calculation, our model can obtain
reasonable probabilities of the activities of entities.</p>
          <p>The parameters of BNs are set to default values when BNs
are first constructed. In the future, we will add a training
mechanism of BNs to adjust the parameters after the BN
combination. The threshold for adding edges should also be
adjusted according to a specific scenario.
8</p>
          <p>Acknowledgement
We thank Chinese Scholarship Council and the ONR grant
“Symbolic and Continuous Representation in Adaptive
Memory with Efficient Inference” number N00014-09-1-0069.</p>
          <p>Extracting Propositional Rules from Feed-forward Neural Networks by
Means of Binary Decision Diagrams</p>
          <p>Sebastian Bader
Department of Computer Science, University of Rostock, Germany</p>
          <p>sebastian.bader@uni-rostock.de
We discuss how to extract symbolic rules from
a given binary threshold feed-forward
network. The proposed decompositional approach
is based on an internal representation using
binary decision diagrams. They allow for an e
cient composition of the intermediate results as
well as for an easy integration of integrity
constraints into the extraction. We also discuss
some experimental results indicating a good
performance of the approach.
1
During the training process, neural networks acquire
knowledge by generalising from raw data. Unfortunately,
this learnt knowledge is hidden in the weights associated
to the connections and humans have no direct access
to it. One goal of rule extraction is the generation of
a human-readable description of the output units
behaviour with respect to the input units. Usually, the
result is described in form of if-then rules, giving
conditions that activate (or inactivate) a given output unit.</p>
          <p>Rule extraction from connectionist systems is still an
open research problem, even though a number of
algorithms exists. For an overview of di erent approaches we
refer to [Andrews et al., 1995] and [Jacobsson, 2005].
Extraction techniques can be divided into pedagogical and
decompositional approaches. While the rst conceives
the network as a black box, the latter decomposes the
network, constructs rules describing the behaviour of the
simpler parts, and then re-composes those results.</p>
          <p>In [Bader et al., 2007], we proposed the
CoOpalgorithm, a decompositional approach for the
extraction of propositional rules from feed-forward neural
networks. Here, we discuss an extension of this approach. In
this new extension, binary decision diagrams (BDD) are
used to store intermediate results, i.e., rules extracted
from single units (perceptrons). This representation has
three advantages:
1. results are stored in a very compact form,
2. intermediate results can easily be combined, and
3. integrity constraints can easily be incorporated.</p>
          <p>After a presentation of all necessary concepts in
Section 2, we discuss the proposed extension of the
CoOpapproach, i.e., the extraction of BDDs from simple
perceptrons. Afterwards, we discuss how to compose the
intermediate results and how to incorporate integrity
constraints. In Section 6 rst experimental results are
presented and further work is discussed in Section 7.
2</p>
          <p>Preliminaries
In this section, we introduce some necessary concepts.</p>
          <p>After de ning feed-forward neural network and the
ruleextraction problem, we discuss binary decision diagrams.</p>
          <p>Feed-forward arti cial neural networks (ANNs), also
called connectionist systems, consist of simple
computational units (neurons) which are connected. The set of
units U together with the connections C U U form an
acyclic directed graph. In this paper we concentrate on
networks with units applying the 1-threshold function.</p>
          <p>Such a neural network can be represented as a 6-tuple
hU; Uinp ; Uout ; C; !; i. Uinp ; Uout U denote input and
output units of the network, i.e., are sources and sinks
of the underlying graph. The functions ! : C ! R
assign a weight to every connection and : U ! R a
threshold to every unit. Every unit u has an activation
value actu 2 f 1; +1g which is set from outside for
input units, or computed based on the activation value of
its predecessor units and the threshold (u) as follows:
actu =
(+1 if P
1</p>
          <p>c=(v;u)2C actv !(c)
otherwise
Figure 1 shows a simple network serving as running
example throughout the paper.</p>
          <p>Because every unit can be active (actu = +1) or
inactive (actu = 1) only, we can associate a propositional
variable u to it, which is assumed to be true if and only if
the unit u is active, and we use u to denote the negation
of u. Furthermore, we can characterise network inputs
as interpretations I Uinp of the propositional variables
Uinp . We use actu(I) to denote the state of unit u if
all input units contained in I are active and all other
input units are inactive. Using this notation, we can
de ne the rule extraction problem as follows: The rule
extraction problem for a given node u of a feed-forward
a
b
2
e
4
f
4
4
2
g
h
a
b
c
d
e
f
c d e f
g h
threshold network hU; Uinp ; Uout ; C; !; i is the
construction of a propositional formulae F over Uinp such that
for all interpretations I we nd
actu(I) =
+1 if I j= F</p>
          <p>1 otherwise
I.e., we are looking for propositional formula stating
necessary and su cient conditions (in terms of the
activation of input units) such that the unit u is active.</p>
          <p>Usually not all input combinations make sense in a
given application domain, because some of them would
correspond to invalid states of the world. We use the
term valid inputs to denote the set of allowed input
combinations. Even more important for the extraction is the
fact that all training samples are taken from this subset.</p>
          <p>Therefore, the network learns to solve a task under the
implicit conditions hidden in the selection of inputs.
Integrity constraints are a way to make those conditions
explicit during the extraction. An integrity constraint
is a formula IC over Uinp describing the set of valid
inputs V P(Uinp ) as follows: For all I Uinp we nd
I j= IC if and only if I 2 V . Using integrity constraints,
we can reformulate the extraction problem as follows:
The rule extraction problem for a given network N and
a given integrity constraint IC is the construction of a
propositional formulae F over Uinp such that for all
interpretations I with I j= IC we nd</p>
          <p>Networks can be decomposed into their basic
building blocks, namely single units together with their
incoming connections. Those single units can be seen as
simple sub-networks (perceptrons) consisting of a
number of inputs and a single output unit, together with the
corresponding weighted connections. To simplify the
notations we use Pp = h ; I; !i to denote the perceptron
corresponding to the unit p together with its threshold
, the set of predecessor units I and the weight function
!. Figure 2 shows the perceptron for the output unit g.</p>
          <p>Binary decision diagrams (BDD) are a data structure
to represent propositional formulae in a very compact
way and to manipulate them easily. A nice introduction
e
f
1
2
3
5
4
g
!(x) =</p>
          <p>Pg = h ; I; !i with</p>
          <p>= 4
I = fc; d; e; f g
81 if x = c
&gt;&lt;&gt;2 if x = d
&gt;3 if x = e
&gt;
:5 if x = f
pf(2) = (b ^ ?) _ (:b ^ &gt;) = :b
pf(3) = (b ^ &gt;) _ (:b ^ ?) = b
pf(4) = (a ^ :b) _ (:a ^ b)
can be found for example in [Andersen, 1999].
Intuitively, a BDD is a directed acyclic graph with a variable
associated to every node and such that all nodes n 6= 0; 1
have exactly two successors, called high and low branch
of n. The nodes 0 and 1 are the sinks of the graph. We
use h ; 0; 1; R; N i to refer to a BDD with sinks 0 and
1, a set of nodes N and a root-node with identi er R.</p>
          <p>And we use hi; v; h; li to denote the node with
identi</p>
          <p>er i, with variable v = var(i), and with high and low
branch pointing to the nodes with identi ers h and l,
respectively.</p>
          <p>Usually a BDD is assumed to be ordered and reduced.</p>
          <p>It is called ordered i there exists a linear order on
the variables and the successors of a node are marked
with variables that are bigger with respect to . It is
called reduced if no two nodes for the same variable have
identical high and low branch, and for no node high and
low-branch coincide.</p>
          <p>BDDs represent propositional formulae in if-then-else
normal form. The corresponding formula for a given
node is de ned recursively as follows:
pf(0) := ?</p>
          <p>pf(1) := &gt;
pf(i) := (var(i) ^ pf(h)) _ (: var(i) ^ pf(l))
Figure 3 shows a simple BDD using a graphical
representation as well as the underlying data structure and
the corresponding logic formulae for every node.</p>
          <p>e 5
fe;fg</p>
          <p>3 c 1
fd;fg fc;fg</p>
          <p>-1
fd;eg
c -3</p>
          <p>fc;eg</p>
          <p>9 c 7 c 5
fd;e;fg fc;e;fg fc;d;fg
c 1</p>
          <p>fc;d;eg
c 11</p>
          <p>fc;d;e;fg</p>
          <p>-7
fdg
c -5
fc;dg
c -9</p>
          <p>fcg
f -1</p>
          <p>ffg
Cg
f -1
ffg</p>
          <p>-11
fg
e -5</p>
          <p>feg
-11
fg
d
0</p>
          <p>From Perceptrons to Search Trees
In this section, we discuss an algorithm to extract a BDD
from a single unit such that the BDD represents
necessary and su cient condition on the inputs to turn the
unit active. Following [Bader et al., 2007], we de ne
input patterns I as subsets of the inputs I of a given
perceptron Pp = h ; I; !i which are assumed to be
active. The inputs not contained in I can be either active
or inactive. And we de ne the corresponding minimal
input imin(I) as follows:
imin(I) =</p>
          <p>X !(a)
a2I</p>
          <p>X j!(a)j
a2InI
The minimal input is computed by adding the
contribution of the xed inputs Pa2I !(a) and the minimal
input caused by all other inputs. A perceptron is called
positive, if all weights are positive. For the following
constructions, we assume the perceptrons to be positive.</p>
          <p>In Section 5, we discuss how to apply the extraction to
arbitrary perceptrons.</p>
          <p>The construction of BDDs below is based on the search
trees described in [Bader et al., 2007]. These search trees
contain a node for every possible input pattern. Children
of a given node correspond to input patterns which
contain exactly one symbol more and all nodes are sorted
with respect to their minimal inputs. If the minimal
input of some node exceeds the threshold, that node is
marked (I.e., the corresponding input pattern represents
a su cient condition to turn the perceptron active). The
complete tree is pruned by removing all those nodes for
which no descendant is marked and all those nodes which
are descendants of marked nodes. The construction of a
pruned tree is shown in Algorithm 1. Figure 4 shows the
full and the resulting pruned search tree on top of it.</p>
          <p>Input: A positive perceptron Pp+.</p>
          <p>Output: A pruned search tree.
1 Fix an order such that b c if !(b) !(c).
2 Create a root node for the empty input pattern.
3 Add a child labelled x for each input symbol x</p>
          <p>(sorted wrt. ).
4 foreach newly added node labelled y do
5 Add a new child c for every symbol z with</p>
          <p>y z (sorted wrt. ).
6 Label c with the corresponding pattern I.
7 Mark c if imin(I) &gt; (p).
8 Remove all descendants of marked nodes.
9 Remove all nodes for which no descendant is</p>
          <p>marked.</p>
          <p>Algorithm 1: Constructing a pruned search tree.</p>
          <p>Exploiting the structure of these search trees, we can
easily construct BDDs representing conditions to turn
the perceptron active. I.e., we nd the perceptron to be
active for all those input patterns which, understood as
interpretation, turn the logic formula corresponding to
the BDD true.
e 5
fe;fg
d</p>
          <p>3 c 1
fd;fg fc;fg</p>
          <p>-1
fd;eg
c -3</p>
          <p>fc;eg</p>
          <p>9 c 7 c 5
fd;e;fg fc;e;fg fc;d;fg
c 1</p>
          <p>fc;d;eg
c 11
fc;d;e;fg
1
e -5
feg
d</p>
          <p>-7
fdg
c -5
fc;dg
c -9</p>
          <p>fcg
Before presenting an algorithm to construct BDDs from
a given pruned search tree, we introduce some further
notations. Every node in the search tree is represented
as a pair hI; Ci with I being the corresponding input
pattern and C being the set of children. id(n) denotes
a unique identi er for the node n (e.g., the
corresponding input pattern, or some index), this identi er is also
used as internal index for the BDD nodes. We assume
id(n) := 0 if there is no node n. var(n) denotes the
symbol which is added to the input pattern at node n.</p>
          <p>The construction of a BDD for a given search tree
is shown as Algorithm 2. This algorithm transforms a
search tree into a BDD, by traversing the tree in a
leftdepth- rst manner. A node's high branch points to 1, if
its minimal input exceeds the threshold. Otherwise, it
points to its left-most child, or to 0 if there is no child.</p>
          <p>The low-branch points to the right sibling, or to 0 if
there is none. The result for the perceptron Pg is shown
in Figure 5.
10
11
12
13
14
15
16</p>
          <p>Input: A search tree T for Pp = h ; I; !i wrt. .</p>
          <p>Output: A corresponding OBDD h ; 0; 1; R; N i.
1 if T is empty then
2 R = 0 and N = fg
3 else if T contains only the root node then</p>
          <p>R = 1 and N = fg
R = id(rl) for the leftmost child of the root.</p>
          <p>N = fg.
foreach leaf node n in T do</p>
          <p>Add hid(n); var(n); 1; li to N with l = id(r)
for the right sibling r.
foreach node hI; Ci in T with left sibling l do</p>
          <p>Let hid(l); var(l); hl; lli be the node for l
Let hid(ll); var(n); hl1 ; ll1 i be the node for
the leftmost child ll of l
if mci(l) 2!(l) + 2!(n) &gt; then</p>
          <p>Add hid(n); var(n); ll1 ; lni to N with
ln = id(rn) for the right sibling rn of n
foreach other non-root node hI; Ci do</p>
          <p>Add a node hid(n); var(n); id(c); li to N for
the leftmost child c and l = id(rn) for the
right sibling rn of n and l = 0 if there is
none</p>
          <p>Algorithm 2: Constructing a BDD.</p>
          <p>Please note, that the BDD can be constructed without
constructing the search tree rst. The tree is used only
to describe the underlying ideas. All conditions tested
in Algorithm 2 can be tested by expanding the tree
stepby-step. Looking a little closer at the constructed search
tree we nd that some sub-trees have an identical
internal structure, which is exempli ed in Figure 6. If the
condition tested in Line 13 of Algorithm 2 is ful lled, two
neighbouring sub-trees are structured identically. mci(n)
denotes the minimum of all minimal inputs associated
to nodes below n. Please note, that mci(n) can be
computed without expanding the sub-tree by looking at the
associated input pattern.</p>
          <p>The mentioned structural equivalence can be exploited
by using a shortcut into the already constructed BDD
and thus preventing the expansion of an identical
subtree. Figure 6 contains a number of those shortcuts,
e.g., one from node fbg to the node fc; ag, because the
children of fbg are annotated the same way as the node
below and right of fc; ag. Please note that this identity
can be recognised without expanding the second
subtree, i.e., the construction of whole tree below fbg can
be avoided.</p>
          <p>The condition on Line 13 is ful lled whenever the
perceptron shows a so called n-of-m behaviour, i.e., if there
are m inputs from which n su ce to turn the
perceptron active. In this case, there will be n equivalent
subtrees, which can be shortcut. As discussed in [Towell and
Shavlik, 1993], this occurs quite frequently while training
neural networks.
h
a</p>
          <p>1</p>
          <p>Composition of Intermediate Results
In the previous section, we have been concerned with
positive perceptrons only. But we can easily turn every
perceptron into a positive one, by multiplying negative
weights by 1 and inverting the corresponding input
symbols. By doing so, we can apply the algorithm to
all output units of a given network, and obtain a BDD
describing necessary conditions with respect to the
predecessor units that turn the output unit active. But
some of the input symbols may have been inverted. I.e.,
we need another algorithm to construct BDDs stating
conditions which turn a perceptron inactive. Due to the
symmetry of the threshold function, we nd this
algorithm to be dual to Algorithm 1 and 2. I.e., by inverting
the order and the inequalities we obtain an algorithm
that constructs such a BDD.</p>
          <p>Once we have extracted the BDD for a given output
unit, we can continue by substituting the nodes
testing non-input nodes (i.e., nodes not corresponding to
input units of the network) by their corresponding BDDs.</p>
          <p>A non-negated node is replaced by the BDD as
constructed above, and negated nodes are replaced by the
dual BDDs. As mentioned above, BDDs have been
designed to allow for an e cient manipulation of logic
formulae. And in fact it is straightforward to compose the
intermediate results into an overall diagram by simply
replacing the nodes. But this is not the best approach,
because the resulting `global' BDD would not be ordered
any more. But while expanding the BDD, we can keep it
ordered (and reduced) as described in [Andersen, 1999].</p>
          <p>After expanding all non-input nodes, we obtain a nal
BDD representing necessary and su cient conditions on
the network's input to turn a given output unit active.</p>
          <p>Using BDDs as internal data structure has some
further advantages. We can actually extract all output
units into the same global BDD. Doing so leads
automatically to a sharing of intermediate results, because
common substructures are contained only once within
this BDD. Figure 7 shows the nal `global' BDD for the
network from Figure 1. Please note that the right node
labelled b is used for both output units g and h.</p>
          <p>Furthermore, we can integrate integrity constraints in
a straightforward fashion. Instead of starting with an
empty BDD, we extract the output nodes into a BDD
representing the integrity constraints. To exemplify this,
a network has been trained to the Encode-Decoder task.</p>
          <p>It contains 8 input, 8 output units and 3 hidden units
Ct
a -3</p>
          <p>fag</p>
          <p>-1
fb;ag
c -1
fc;ag
d</p>
          <p>-1
fd;ag
e -1
fe;ag
c -1
fc;bg
d</p>
          <p>-1
fd;bg
e -1
fe;bg
d</p>
          <p>-1
fd;cg
e -1
fe;cg
c 1
fc;b;ag
d</p>
          <p>1 e 1 d 1 e 1 e 1
fd;b;ag fe;b;ag fd;c;ag fe;c;ag fe;d;ag
d</p>
          <p>1 e 1 e 1
fd;c;bg fe;c;bg fe;d;bg
e 1</p>
          <p>fe;d;cg</p>
          <p>-3
fbg
c -3
fcg
d</p>
          <p>-3
fdg
e -1
fe;dg
e -3
feg
d</p>
          <p>3 e 3 e 3
fd;c;b;agfe;c;b;agfe;d;b;ag
e 5
fe;d;c;b;ag
e 3</p>
          <p>fe;d;c;ag
1
0</p>
          <p>-5
fg
e 3
fe;d;c;bg
and is trained to learn the identity mapping for all inputs
in which exactly one unit is active. I.e., the network has
to learn a compressed representation within the hidden
layer. But applying the algorithm presented above yields
an unwanted result shown in Figure 8 on the left. Using
the integrity constraint that at most one input is active
at a time yields the BDD shown on the right. 900 nodes
have to be constructed (including all intermediate
results while constructing the BDD) for the `normal' BDD,
but only 124 while using the integrity constraint. This
shows the advantage of using integrity constraints right
from the beginning of the extraction process. Usually
they are used to re ne the extraction result afterwards.</p>
          <p>This would be possible here as well by simple computing
the conjunction of the `normal' BDD with one
representing the integrity constraint. But starting with the
constraint avoids the construction of many intermediate
nodes which would be removed afterwards.
6
To evaluate the approach a Prolog implementation has
been used to gather some statistics. The results are
shown in Table 1. The table shows average numbers
for di erent numbers of inputs, the size of the full search
tree, the number of minimal input patterns, the size of
the corresponding BDD and the number of BDD nodes
per input pattern. All numbers have been collected from
100 random perceptrons per size. The extraction using
the full search tree is not feasible due to the exponential
growth. The number of input patterns is a conservative
lower bound for the size of the pruned search tree,
because those trees have at least one node per minimal
input pattern. The result shows that the use of BDD
proposed here yields a very compact representation. Even
though the number of nodes in the BDD grows, the ratio
(node/IP) of size of the BDD and the number of minimal
coalitions decreases.
i4
i4
i4
i4
i4
i4</p>
          <p>i4
i5
i6
i5
i6
i7
i5
i6
i7
i3
i5
i6
i7
i2
i3
i5
i6
i7
i8
o1
i1
i2
i3
i5
i6
i7
i8
i3
i5
i6
i7
i5
i6
i7
i5
i6
i7
i5
i6
o1
i1
i2
i3
i4
i5
i6
i7
i8
1
0
0</p>
          <p>1
Figure 8: The result of extracting output unit 1 from
an 8-3-8 encoder-decoder network. The BDD on the left
is the result of the `normal' extraction. On the right
the constraint that at most one unit is active has been
incorporated.</p>
          <p>jIj
1
5
10
15
20</p>
          <p>T
j j
2
32
1024
32768
1048576</p>
          <p>A second experiment has been performed to show the
e ect of the usage of integrity constraints while
extracting the BDDs. A network with 6 inputs, 4 hidden and 2
jBDDj
791
258
3
497
341:90
151
94
1
118:11494
2
199
4
442:62</p>
          <p>396:98
111
5
92
6</p>
          <p>n
output units has been used for the experiment. The
possible inputs have been constrained by a maxn integrity
constraint for 0 n 6. The results are presented
in Figure 9. For every n the experiment has been
conducted for the same 100 randomised networks and the
following numbers have been collected: the size of the
sub-BDD encoding the constraint, the minimal,
maximal and average size of the nal BDD. Please note that
the numbers show the total number of internal nodes
constructed for the BDD, i.e., including all necessary
intermediate nodes. For n = 1, i.e., the biggest restriction,
we obtain very small BDDs. The size of the BDD grows
up to n = 4 and decreases again for n &gt; 4. From those
observations we can conclude that the incorporation of
integrity constraints into the extraction process can lead
to big savings in terms of nodes constructed for the nal
BDD. Without their use during the extraction, we would
have to construct the BDD corresponding to n = 6. This
big BDD of 400 nodes would have to be re ned with
respect to the constraints afterwards. There seem to be
cases (e.g., for n = 4) where the use of integrity constrain
yields larger BDDs, but nonetheless, the nal BDD does
not have to be revised afterwards, and the di erence is
not too big.
7</p>
          <p>Conclusions and Future Work
A novel approach for the extraction of propositional rules
from feed-forward networks of threshold units has been
presented. After decomposing the network into
perceptrons, binary decision diagrams representing
preconditions that activate or inactivate the perceptron have
been extracted. Those intermediate representations can
be composed using the usual algorithms for BDDs, or
they can be combined during their construction by
extracting one into the other. The latter approach does
also allow for an incorporation of integrity constraints
{ already during the extraction of the intermediate
results. As already mentioned in [Bader et al., 2007], the
pruned search trees constructed above are related to the
approach presented in [Krishnan et al., 1999]. But due
to a di erent order, we do not need to expand them
completely, which would otherwise be necessary.</p>
          <p>The extraction as presented here is applicable to
all feed-forward networks composed of binary threshold
units computing 1-threshold function. This limitation
can be softened by allowing arbitrary symmetric
threshold functions. The symmetry is necessary to construct
negative and positive forms of the perceptron without
changing the global network function.</p>
          <p>Finally, we discussed rst experimental results
indicating a good performance of the approach. On the
one hand, we obtain a very compact representation and
on the other hand, we circumvent the construction of
non-necessary intermediate results while incorporating
integrity constraints right from the start.</p>
          <p>Nonetheless, much remains to be done. In particular,
the extraction for non-threshold units has to be studied.</p>
          <p>For the encoder-decoder experiments mentioned above
the algorithm has simply been applied to networks
computing the symmetric hyperbolic tangent as activation
function. Interestingly, the result coincide with our
expectations. This is due to the fact, that networks when
trained to compute crisp decisions tend to behave like
threshold networks. But the details of this need to be
investigated in the future. Furthermore, a detailed
analysis of the performance is necessary, in particular using
networks trained for real-world problems. The approach
as presented here detects equivalent sub-BDDs for
nof-m patterns. But there are more cases for equivalent
sub-BDDs [Mayer-Eichberger, 2008]. Those have to be
integrated into the extraction procedure. It would also
be interesting to study the evolution of a network during
the training process by repeatedly applying the
extraction method and compare the results.</p>
          <p>Acknowledgements The author is thankful for the
comments of Valentin Mayer-Eichberger and two
anonymous reviewers.</p>
          <p>References
Amitabha Mukerjee, Madan M Dabbeeru</p>
          <p>Indian Institute of Technology Kanpur
amit@cse.iitk.ac.in, mmadan@iitk.ac.in
A key step in mapping the more conceptual stages
of design onto computational systems involves
identifying a vocabulary and ontology. While a
number of high-level ontologies have been
proposed, these are difficult to ground in terms of
actual design instances, and manual definitions of the
symbols are often incomplete and difficult to
maintain. As an alternative, we propose an ”infant
designer” paradigm which abstracts patterns for the
”functionally feasible regions” (FFR) while
evaluating many individual configurations in the design
space. These learned FFR patterns (which may
arise due to minimal levels of functional
acceptability, or from optimization) often embody
dependency relationships among the design parameters,
i.e. the good designs lie along lower-dimensional
manifolds in the design parameter space. We show
how such manifolds exist in several design
situations; each combination of the original design
parameters may be thought of as a ”chunk”; the space
of these chunks models only the ”good designs”.</p>
          <p>Next, we show how the patterns defined based on
these chunks constitute image schemas, which may
be implicit (e.g. the pattern for an FFR), or
explicit (where the relationship is observable). These
patterns or image schemas are incipient semantic
model leading to symbols. We present examples
of how such image schemas are arrived at with the
help of universal motor design.
1</p>
          <p>Efforts towards standardizing the design
vocabulary
Evolving a standardized vocabulary for design has emerged
as an important focus in engineering design. Possible
applications include developing design repositories [Bohm et al.,
2005], computer assisted conceptual design [Gero and
Fujii, 2000], etc. It is clear that vocabularies are structured,
that is there are considerable relations between terms.
Often, this is viewed as an ontology or as a structured
relationship that captures a part of the semantics of these terms.</p>
          <p>One popular view of the engineering system considers the
flow of energy, information, etc, and proceeds downward into
detailed design. With its roots in value engineering ideas
from the 1940s, these notions were seeded by the analysis
in Pahl and Beitz [Pahl and Beitz, 19881996] and a
particularly influential study by Welch and Dixon [Richard and
Dixon, 1994], leading to modern ontological models like
the widely used functional basis model [Hirtz et al., 2002]
or implementations on ontology tools [Nanda et al., 2007;
Szykman et al., 2001].</p>
          <p>The above represents the human-engineered approach to
defining symbols. This type of approach is initially
tempting because it tends to meet immediate applications, but a
long history in knowledge-based systems has shown it to be
brittle, i.e. subject to failure under even minor deviations in
the domain. In general, it may be that symbols are more
meaningfully developed by abstracting from existing data.</p>
          <p>The novel contribution of this paper is to show that at least
in certain types of design tasks, lower-dimensional surfaces
are revealed by multi-objective optimization. The intrinsic
dimensions in these pareto-surfaces might constitute one
approach to obtaining “symbols” directly from experiential data
as opposed to engineering them by programming definitions /
rules. These approaches are detailed further in section 1.2 and
section 3, but first we look more closely at the term “symbol”,
and what is understood by its semantics.
1.1</p>
          <p>The semantics of design symbols
Unfortunately the term “symbol”, as it is used in the logic
and computational theory is considerably different from its
usage in cognitive linguistics and in everyday life. In the
latter usage, symbols are imbued with meaning grounded on
experience, whereas in the formal usage, it is merely a
token constructed from some finite alphabet, and is related only
to other such tokens. If we present an analogy, a blind man
knows “red” is a different color from “blue” and “green” but
his understanding of red is dramatically different from that of
a sighted person, because the semantic pole is not connected
to direct experience.On the other hand, “symbol” has come to
be understood in cognitive science (and also traditionally in
linguistics, e.g. de Saussure ( [De Saussure, 19161986]), as
the tight binding of the of the psychological impression of the
sound (the “phonological pole”) with the mental image of the
meaning (the semantic pole) [Langacker, 1986]. The mental
image or image schema includes all sorts of associations and
Design
Exp. 2
Design
Exp.3</p>
          <p>Pattern 1
Pattern 2
Pattern 3</p>
          <p>Chunk Image schema
(Semantic Pole)</p>
          <p>term
Image schema
is somewhat different for each user, though social convention
ensures a degree of overlap between mental images within the
language community.</p>
          <p>However, the notion of symbol is more far-reaching than
communication. It turns out that to some extent, the
symbols help divide up the world into classes, and eventually, it
may reflect changes in how we think. For instance, Korean
language makes a distinction between spatial tight-fit
situations, kkita, (as in “put the cap on the pen”, “hand in glove”)
from other usages of “in” or “on”. Infants growing up in
English and Korean linguistic environments were sensitive to
both contrasts, but English children appear to lose this
sensitivity around the time they start acquiring language,
suggesting that the language construct may have weakened their
sensitivity to these changes [McDonough et al., 2003].</p>
          <p>On the other hand, incompatibility of design vocabulary is
rarely a problem between humans (that’s why exceptions
often become memorable). If designers A and B are talking,
and A does not have a particular symbol λ, its image-schema
may emerge through a small amount of discussion; in many
cases, just a single example may be enough to stretch an
existing concept λ0 in A to the current one. Of course, the new
symbol λ0 remains imprecise, and designer A is aware of it,
and subsequent uses of λ0 will serve to ground it. All this is
possible because the semantic pole for the human is a
complex, elastic set of associations that cannot be defined in terms
of a single predicate or even a range, it is the set of all
situations where the symbol may be encountered (figure 2). All
these associations need to be learned, and cannot be inferred
based on a single definition (not to mention issues such as
nonmonotonicity); hence the programmer-given single
definition, usually created to demonstrate the example at hand, is
a hopelessly inadequate semantics for a design symbol; and
that is why we need bottom-up symbol discovery in order to
ground a design vocabulary.
jeep
Image 1
is-a</p>
          <p>is-a
sedan
Image 2</p>
          <p>car
Image 4</p>
          <p>High
level
is-a
hatchback
Image 3</p>
          <p>Low
level
An alternative that has been proposed for modeling design
concepts is to attempt to move more towards the human
process, to learn symbols based on design experience[Gero and
Fujii, 2000]. The human design process is a constant,
motivated exploration of the design space, e.g. through sketching.</p>
          <p>All the while, the designer is focusing on the designs that are
“good” in some functional sense, and eventually, some kinds
of patterns emerge as the common characteristics of these
designs. This is one sense in which sketches “talk back” to the
designer [Goldschmidt, 2003]. These patterns result in
constraints whereby many of the initial design variables can be
combined, a process cognitively known as chunking [Gobet
et al., 2001].</p>
          <p>For example, in designing a padlock, we may learn that the
shackle diameter increases roughly in proportion with body
size. Thus these two parameters can then be brought down to
a single chunk. These chunks, which limit the choices used
in “good designs”, may be what are used by expert designers
[Gross, 1986].</p>
          <p>An early attempt at discovering patterns in the design space
of shapes may be seen in relation to 2D shapes in the work of
[Park and Gero, 1999]. [Moss et al., 2004] have developed
a system in which a design observer agent considers trends
among good designs and try to extracts chunks. Similarly
a recent approach by [Sarkar et al., 2008], considers
Singular Value Decomposition (SVD) on a co-occurance matrix of
matrix of variables and constraints to identify the relations
between different variable groups.</p>
          <p>However, none of these proposals attempt to learn their
symbols in a grounded manner, and therefore lack the
flexibility of the human designer. By grounded, we refer to the
progressive manner in which a human designer learns her
concepts - the more abstract ones are based on earlier,
concrete concepts, but are still presented through instances. In
the end, many concepts are grounded in terms of a number
of experiential instances. For a human designer, this
learning cannot be limited to the years of training as a designer,
but must include all of her knowledge about the world, the so
called commonsense knowledge. Thus, the fact that a fat peg
will not go into a thin hole is part of her prior knowledge.
Indeed, it is likely that the process by which she acquires these
patterns, built upon many layers of pre-existing knowledge,
may be similar in some salient ways with her earliest
learning.</p>
          <p>In this work, we propose to take the first step towards
building such a grounded semantics, which we call the birth of
symbols. In a human design scenario, say while “talking” to
a sketch, a designer may get a conscious awareness of a
constraint without verbalizing it - this is referred to as reification,
becoming real - and is a key step in forming new symbols.</p>
          <p>Sometimes, amorphous implicit schemas, which are formed
well before we are aware of them [Gladwell et al., 2005] are
incipient symbols, but they need to prove their mettle before
they become true symbols. This interpretation is in line with
a long tradition in psychology and linguistics, that symbols
are “aware” or conscious [Mandler, 2004].
2 Infant designer
A system learning symbols is like a baby who is first
discovering regularity of object behaviour in the world. She can
make various choices, and evaluate them based on some
notion of function. Considering the peg-in-hole task just alluded
to, we see how she might learn the concept that a peg must be
smaller than a hole.</p>
          <p>The functional model considered is simple - the design is
functionally feasible if the peg can go in (actually our system
computes the configuration space - the penetration region
disappears when w &gt; t). We consider a horizontal version of the
peg-in-hole - a latch is entering a slot on a bolt, say. Figure 3
shows how after evaluating a number of instances in the
design space of latch-widths w and slot-widths t; in (w, t) space,
a clear 45 degree line emerges, separating the “good designs”
from the bad.</p>
          <p>Does this constitute symbolic knowledge for the infant
designer? Most likely not. However, it is something that might
become a symbol as she acquires other concepts that she
can refer to. What is interesting in the results of figure 3
is how, after experiencing just a few instances, the pattern is
inchoate, so the baby keeps trying to insert the fat square into
the smaller circle, filling up the negative (black) area of the
figure. Eventually the defining boundary becomes sharper,
and at some point it can be said to knows the principle, at
least implicitly.</p>
          <p>At the next step for our infant designer, we consider the
concept that a designer knows as “fit”. By now our infant
learner will attempt to insert pegs only if they are smaller than
(a) Latch-in-Slot assembly</p>
          <p>(b) 10 instances
1
0.8
the slot. The function is defined in terms of the degree of fit
- how much does it wiggle? Defining the wiggle in terms of
the area of the free-space in the configuration space, we see
that if the wiggle desired is very small, we get the situation
on the left, and if it is very large, we get the situation on the
right. Eventually, the learner learns the concept of “fit” as
a chunk (composed as w − t) - thus, given a level of fit, it
imposes a constraint where w and t are related in a manner
where they constitute a one-dimensional chunk instead of two
independent variable.</p>
          <p>Of course, from a machine learning perspective, both these
examples are rather elementary. Our objective in presenting
it is merely to emphasize the role of even the earliest
knowledge in many advanced design situations. These two
concepts are also among our earliest knowledge achievements;
typically, infants learn containment (peg in hole) by about
3 months, and tight vs loose by 5 months [Casasola et al.,
2003]. Many cognitive scientists believe that our concepts of
abstraction, including the is-a crucial to constructing
hierarchies, is a metaphorical extension of containment [Lakoff and
Johnson, 1999].
3</p>
          <p>Symbol emergence
As the designer matures from infancy, we can consider the
more general process by which symbols form. These may
correspond to the stages shown in figure 5. At first, the
designer explores with instances in the design space,
distinguishing the good designs from the bad. Eventually a
subset of the design space emerges as the Functionally Feasible
region (FFR), or the space of “good designs”. Often, FFRs
correspond to narrow bands of functional feasibility. This
Tight fit</p>
          <p>Medium fit</p>
          <p>Loose fit
5
w
10</p>
          <p>k
00
-1.5
5
w
0
10</p>
          <p>k
-1.5
5
w
10
may be because they are the result of (possibly unconscious)
multi-objective optimization - thus, if there are k design
objectives, then they constitute a k − 1 surface in the
objective space. In continuum design situations (i.e. the search
space is continuous and not combinatorial), if the function
measures that map from the design variable space to the
objective space are continuous, their Jacobians would be
wellposed, and the near neighbours in the objective space may
correspond to near neighbours in the design space. While
this assumption is flawed for a large class of difficult
optimization problems (e.g. Quadratic assignment), it often holds
for a large if not preponderant fraction of real tasks. Thus, in
such situations, we may designs that lie along a k − 1
paretosurface (or “manifold”) in the objective space (shown as a
folded patch in the figure), and a similar lower-dimensional
manifold in the design space as well. Each dimension of this
lower dimensional space reflects an inter-relation between
independent design parameters (e.g. the shackle diameter and
the lock size). Sometimes, some of these dimensional
mappings or chunks may recur in many design situations - this
makes the chunk useful, which is an important criteria for
becoming a symbol. In the interim, the designer may use
these chunks with a dim awareness of it for a long period,
even several years. Later, a label may get attached to it, and
many other associations would eventually accrue to this term
/ image-schema pair; it would then constitute a truly reified
symbol.</p>
          <p>Thus a key aspect of design symbol formation is
dimensionality reduction, - i.e. finding low-dimensional patterns
in high-dimensional space. There are two classes of
dimensionality reduction algorithms - linear methods like PCA or
ICA [Bishop, 2006], or nonlinear approaches, which may
be global (Isomaps [Tenenbaum et al., 2000] ) or local
(Lo</p>
          <p>Ω
Design Space
Ω</p>
          <p>FFR</p>
          <p>FFR
Ω</p>
          <p>FFR</p>
          <p>D Dimensionality
reduction
d</p>
          <p>Reification
Manifold space</p>
          <p>Symantic pole
Phonological pole
cally Linear Embedding or LLE [Saul and Roweis, 2003] and
Laplacian Eigenmaps [Belkin and Niyogi, 2002]). Here we
present some results based on the LLE algorithm, which is an
eigenvector method that works based on the assumption that
the same weighted sum between neighbours would hold both
in the high and the low dimensional spaces (algorithm 1).
We illustrate the working of the process based on the
Universal Motor,which has been well studied in the product family
design literature [Simpson, 1998]. The design space
consists of eight design variables: Nc (number of wire turns
on armature) Ns (number of turns on each field pole), Awa
(cross-section area of armature wire), Awf (cross-section
area of the field wire), ro (radius of motor), t (thickness
of stator) , I: (current drawn by motor), L (stack length).</p>
          <p>Function is measured through a set of performance
behaviors: strength, mass, energy and efficiency. The
corresponding performance metrics in terms of these design variables
can be πtorque(~v) =</p>
          <p>NcΠφI , πmass(~v) = masswindings +
massarmature + masswindings, πpower(~v) = VtI − I2(Ra +
Rs) − 2I, and πefficiency(~v) = πpower . (following
[Simp</p>
          <p>VtI
son, 1998]). We may now consider that the feasible designs
have (i) the magnetizing intensity H &lt; 5000 and (ii) the outer
Algorithm 1 Local Linear Embedding
1. Compute the neighbors Xj of each data</p>
          <p>point,Xi.
2. Compute the weights Wij that best
reconstruct each data point Xi from its
neighbors, minimizing the reconstruction
error ( (W ) = Pi |Xi − Pj WijXj|2) by
constrained linear fits.
3. Compute the vectors Γi best reconstructed
by the weights Wij, minimizing the
quadratic form (Φ(Γ) = Pi |Γi − Pj WijΓj|2) by
its bottom nonzero eigenvectors.
radius of the stator ro greater than the thickness of the stator
t.</p>
          <p>We next outline two experiments designed to reveal the
inter-relationships in the parameter space when it comes to
the optimized designs. The results suggest that the optimized
designs are not scattered uniformly across the design space,
but reveal certain inter-relations between the design
parameters. Thus, the initial parameter space of 8 parameters may
actually constitute only two independent parameters when it
comes to the optimized designs. While these results hold only
for these design classes, the implications might be more
general, and imply far-reaching consequences in obtaining
symbols as dimension-reducing patterns in continuous parameter
space of a wide ranging set of problems. However, whether
these results will scale up to other remains a subject of
considerably more research; the results below only indicate that
this may be so.
3.2 Two-dimensional design space
In an initial experiment, we consider a minimal parameter
set for the universal motor - modeling the design
variability in terms of only two design parameters L and I, while
keeping other parameters constant [Simpson, 1998]. For a
desired functional range of power 280 W&lt; πpower &lt; 295,
the FFR (the valid designs resulting from this constraint) is
shown in Figure 6(a). These lie along a small band, which
can be thought of as a curved 1-D manifold (with a slight
thickness). 6(b).</p>
          <p>The mapping between the nonlinear feasible region (Fig.
6 (b)) and the one-dimensional chunk for it below (Fig. 6
(c)) shows the continuity of mapping between these. If we
take three data points A,B, and C in L, I space. Let us say
X = [A B C], each data point is a real-valued vector, with
of dimensionality 2. With the help of Local Linear
Embedding (LLE) algorithm [Roweis and Saul, 2000], we
construct a neighborhood preserving mapping from L, I space
to Γ. The three points A= (32.0, 4.09), B= (22.5, 3.5455)
and C= (10.5, 12.000) and their corresponding mappings in
the lower-dimensional manifold are γA = −0.2102, γB =
−0.1430 and γC = 0.0007.</p>
          <p>This reduction of the two design parameters to a single γ
represents the first stage of symbol formation. If, later, this
γ chunk is discovered in other situations, then a label, say
“gavagai”, may attach to it. Then as the term “gavagai” may
spread in the design community, and might occur in many
4
10
15 20 25
Stack Length ( L )
It)(10
n
e
rr 8
u
C
c
iltrc 6
e
E
4
A
10</p>
          <p>20 25
Stack Length (L)
(b)</p>
          <p>B</p>
          <p>Figure 6: Chunking on the L, I subspace for Universal
Motors: (a) The implicit constraint on the L, I subspace of the
Design Space is learned for 2000 design instances under the
functional specification 280 W&lt; πpower &lt; 295 W. (b) The
feasible designs in the L, I subspace. (c) The mapping onto
a low-dimensional (1-D) space; this reveals that for good
designs, the stack-length L and the motor current I are related.</p>
          <p>A,B, C: individual design instances in (b) and (c).
other situations, and each such association would form part of
the semantics of the term gavagai. A computational system
that learns this term in this way would need to participate in
such discussions in the design community to keep its
semantics current. This is another reason why static programmed
machine semantics, even if they can capture all the usages
at a given point of time, fail in the long run as human usage
changes.</p>
          <p>High-dimensional spaces: Multi-Objective</p>
          <p>Optimization
If we are to consider the eight-dimensional design space for
the Universal motor, a more useful approach towards finding
FFRs may be to consider a multi-objective optimization
problem based on a set of performance metrics. If design solution
A is better than solution B in all the functional criteria, we
say that A dominates— B. The set of all non-dominated
solutions is the non-dominated front or pareto-front , and usually
lies along a surface in the space of objective functions. For
the Universal motors example, the multi-objective
optimization problem may be formulated as follows:</p>
          <p>Multi-Objective Optimization</p>
          <p>Minimize
Maximize
Maximize
Subject to
πmass(v)
πefficiency(v)
πtorque(v)
g1(v) ≡ r − t &gt; 0
g2(v) ≡ 5000 − H &gt; 0,
g3(v) ≡ 2.0 − πMass ≥ 0,
g4(v) ≡ 0.5 ≤ πtorque ≤ 5.0,
g5(v) ≡ 300 ≤ πP ower ≤ 600
g6(v) ≡ πefficiency − 0.15 ≥ 0
4
e
u
q
r
πto2
0
1
E</p>
          <p>D
1
−3
−2</p>
          <p>C
D</p>
          <p>E
di4mension (6d)</p>
          <p>10</p>
          <p>We use the well known NSGA-II [Deb, 2001]
evolutionary algorithm, with population size 2000, and probability of
crossover 0.8, mutation probability 0.33 and 0.1 (for real/
binary). The estimated pareto front for maximizing both the
torque (πtorque) and efficiency (πefficiency) while
minimizing the mass (πmass) is shown in Fig. 7(a). The designs in
this non-dominated front in objective space are identified in
the original 8-parameter design space. We now attempt to
see if these 8-D points actually constitute a lower
dimensionality manifold, by considering the reconstruction error when
mapped to differing dimensionalities from 2 to 8 (figure Fig.
8; the sharp knee at 2 indicates considerable information
abstraction, and Fig. 7(b) shows the mapping to a 2-dimensional
space obtained by LLE. This mapping reveals that neighbours
in the high dimensional space remain nearby in the
lowerdimensional space at least for this universal motor problem.</p>
          <p>The results here signify that for the universal motor,
obtaining the FFR as a 2-dimensional non-dominated surface in
objective space can lead to a dimensionality reduction to 2 in
the design space as well. These two dimensions possibly
reflect inter-relations between the original eight parameters that
pertain to the better designs in the design space. In terms of
symbol formation, these two dimensions (“XX” and “YY”,
say), if they are found repeatedly in other domains as well,
may eventually become symbols. With sufficient experience,
the relation between these two parameters and the design may
eventually be encoded into design rules: e.g. “higher YY is
u4sually associated with the more efficient designs”.
Subsequent experience may also alter the way we understand these
chunks, and therefore rules like the above that are built on it;
through this demonstration we are primarily arguing that by
keeping these symbols grounded, it would be possible to keep
updating their semantics and their inter-relations (the rules),
thus providing a truly flexible symbol system, in contrast to
static symbol systems.</p>
          <p>We must be careful to point however, that in general a
k−1dimensional pareto-surface in objective space may not map
to an equivalent manifold in design space - there are a large
number of situations where the performance metrics mapping
from design space to objective space are not so well-behaved,
and such results may not hold. Nonetheless, even if a subset
of design parameters are well-behaved, at least some
dimensionality reduction may occur in these spaces. To obtain an
estimate of the dimension of the manifold for our data set,
we use the technique based on the idea that a dimensionality
reduction algorithm should preserve information on a global
scale, so that the inverse mapping error should be minimal.</p>
          <p>For a given input dataset X = {X1, . . . XN } ⊂ RD, the
dimensional reduction algorithm such as LLE provide a
reduced dimensional representation Y = {Y1, . . . YN } ⊂ Rd
of the original data set X. How to determine the
reduceddimensionality d is not clear; one approach may be to
consider several d’s and select that which minimizes the residual
bijection error (rd) = Pi ||fd−1(fd(Xi − Xi)||,[Martin and
Backer, 2005] wherefd : X → Y is the map produced by
LLE. By observing the behavior of rd for different values of
d shown in Fig. 8 we can suggest the intrinsic dimension
for the universal motor is most likely 2; i.e. the initial space
of 8 parameters can, given these optimization conditions, be
reduced to two incipient “symbols”.
4</p>
          <p>Conclusion
The main contributions of this work is the proposal that
nonlinear manifold learning may constitute an important step in
discovering latent relationships among the many parameters
that define how the world works. A key constraint is our
incomplete characterization of the situations in which such a
lower-dimensional characterization would exist.</p>
          <p>Among the work that would need to be done next is to the
conjoints of more than one symbol; i.e. given the design
elements each as an individual symbol, we need to be able to say
what the conjunction of these elements (the syntax) will do,
and whether the resulting object - a design instance - will be
adequate to meet the design task or not. Again, depending on
the “good designs” that emerge in the process, a combination
of symbols may come to be designated as a symbol on its own
right, leading to the birth of abstract symbols.</p>
          <p>The argument presented here implies that in the long run,
to create viable computer vocabularies for design or AI, we
must train the systems to learn these relationships, by
experiencing many design and real world situations. This may be
done in an accelerated manner, but the system must be
exposed to something like the vast array of experiences of a
human - or possibly many more, since the abstraction
processes as computationally available today may not be as
efficient. As different systems are deployed in solving different
problems, their somewhat differing input sets would result in
somewhat different abstractions for the same symbols. These
resulting design agents may therefore be somewhat less
predictable than current computers, but such is the price of
flexibility.
[Belkin and Niyogi, 2002] M. Belkin and P. Niyogi. Laplacian</p>
          <p>Eigenmaps and Spectral Techniques for Embedding and
Clustering. Advances in Neural Information Processing Systems, 1:585–
592, 2002.</p>
          <p>Pattern recognition and machine
[Bohm et al., 2005] M.R. Bohm, R.B. Stone, and S. Szykman.
Enhancing Virtual Product Representations for Advanced Design
Repository Systems. Journal of Computing and Information
Science in Engineering, 5:360, 2005.
[Casasola et al., 2003] Marianella Casasola, Leslie B. Cohen, and</p>
          <p>Elizabeth Chiarello. Six-month-old infants’ categorization of
containment spatial relations. Child Development, 74:679–693,
2003.
[De Saussure, 19161986] F. De Saussure. Course in general
lin</p>
          <p>guistics. Open Court, 1916/1986.
[Deb, 2001] Kalyanmoy Deb. Multi-Objective imization using
Evolutionary Algorithms. Chichester, John Wiley and Sons, Ltd., 1
edition, 2001.
[Gero and Fujii, 2000] JS Gero and H. Fujii. A computational
framework for concept formation for a situated design agent.</p>
          <p>Knowledge-Based Systems , 13(6):361–368, 2000.
[Gladwell et al., 2005] M. Gladwell, H. Finder, and C-SPAN
(Television network. Blink: The power of thinking without thinking.</p>
          <p>Penguin Books, 2005.
[Gobet et al., 2001] F. Gobet, P.C.R. Lane, S. Croker, P.C.H.</p>
          <p>Cheng, G. Jones, I. Oliver, and J.M. Pine. Chunking mechanisms
in human learning. Trends in Cognitive Sciences, 5(6):236–243,
2001.
[Goldschmidt, 2003] G. Goldschmidt. The backtalk of
self</p>
          <p>generated sketches. Design Issues, 19(1):72–88, 2003.
[Gross, 1986] Mark Donald Gross. Design as Exploring
Constraints. PhD thesis, Department of Architecture ,Massachusetts</p>
          <p>Institute of Technology, February 1986.
[Hirtz et al., 2002] J. Hirtz, R.B. Stone, D.A. McAdams, S.
Szykman, and K.L. Wood. A functional basis for engineering design:
Reconciling and evolving previous efforts. Research in
Engineering Design, 13(2):65–82, 2002.
[Lakoff and Johnson, 1999] George Lakoff and Mark Johnson.
Philosophy in the Flesh: The embodied mind and its challenge to
Western thought. Basic Books, New York, 1999.
[Sarkar et al., 2008] Somwrita Sarkar, Andy Dong, and John S</p>
          <p>Gero. A learning and inference mechanism for design
optimization problem (re)-formulation using singular value
decomposition. In Proceedings of DETC’08, ASME Design Engineering
Technical Conferences and Computers and Information in
Engineering Conference, August 2008.
[Simpson, 1998] Timothy W. Simpson. A Concept Exploration</p>
          <p>Method for Product Family Design. PhD thesis, Georgia Tech</p>
          <p>University, Dept Mechanical Engineering, 1998.
[Szykman et al., 2001] S. Szykman, R.D. Sriram, and W.C. Regli.</p>
          <p>The role of knowledge in next-generation product development
systems. Journal of computing and information Science in
Engineering, 1:3, 2001.
[Tenenbaum et al., 2000] J.B. Tenenbaum, V. Silva, and J.C.
Langford. A global geometric framework for nonlinear dimensionality
reduction, 2000.</p>
          <p>A Neural-Symbolic System for Automated Assessment in Training Simulators</p>
          <p>A Position Paper
Leo de Penning, Bart Kappé, Karel van den Bosch</p>
          <p>TNO Defense, Security and Safety</p>
          <p>Soesterberg, The Netherlands
leo.depenning@tno.nl, bart.kappe@tno.nl, karel.vandenbosch@tno.nl
Performance assessment in training simulators is a
complex task. It requires monitoring and
interpreting the student’s behaviour in the simulator using
knowledge of the training task, the environment
and a lot of experience. Assessment in simulators is
therefore generally done by human observers. To
capture this process in an automated system is
challenging and requires innovative solutions. This
paper proposes a new module for automated
assessment in simulators that is based on
NeuralSymbolic Learning and Reasoning and the
Recurrent Temporal Restricted Boltzmann Machine
(RTRBM). The module is capable of using existing
and learning new rules for performance
assessment, by observing experts and students
performing the training tasks. These rules are used to
validate and support the assessment process and to
automatically assess student performance in a
training simulator. The module will be developed in a
three year research project on assessment in driving
simulators for testing and examination.
1
Performance assessment in training simulators has always
been a complex task that is generally performed by human
observers. Performance assessment by automated systems is
often limited to simple training tasks, because assessing
complex tasks requires the modelling of all interrelations
between the information present in the simulation, the
training tasks, and the constructs being assessed (e.g.
competences). Also, when it comes to more subjective assessments
(e.g., how ‘safe’ is the student driving), conventional
modelling techniques fall short, as the applied assessment rules are
often implicit and difficult to elicitate from the simulation or
domain experts.</p>
          <p>We propose a new module for automated assessment as
part of the Virtual Instruction platform SimSCORM
[Penning et al., 2008]. This assessment module will be able to
learn new rules from the task description, (real-time)
simulation data, related assessment data of domain experts or
students and already existing rules (also called background
knowledge). These rules can be presented in a
humanreadable (‘symbolic’) form, facilitating the validation of the
assessment rules and supporting the assessment process.</p>
          <p>The automated assessment module requires real-time
interaction with the simulator(s), the student and human
assessors, and a description of the training task, a student profile
and the simulated environment. SimSCORM provides a
generic platform for definition and presentation of
simulation based training content and interaction between the
content, its users and the simulation based on international
standards (e.g. SCORM, HLA, XML, etc.). Via this
platform the automated assessment module can easily access the
objects and attributes in the simulation and get information
on the student profile and progress.</p>
          <p>
            Figure 1 depicts the automated assessment module
(named CogAgent) in the SimSCORM context.
SimSCORM provides a player that presents a SCORM based
training task to the students and possibly one or more
assessors (e.g. teachers, examiners or students) via a (web-based)
Learning Management System. This player uses SimAgent
to interact with the simulator(s) and CogAgent to do
automated performance assessment and learn new assessment
rules from observation. Therefore, the player configures
CogAgent with information on the training task, measured
variables, student profile, assessed constructs and existing
symbolic rules. During execution of the training task,
assessors can provide feedback on the assessed constructs which
will be presented to CogAgent as short-term evaluations
(depicted as assessment data). SimAgent will act as a
generic interface between the simulator(s) and CogAgent, and
pre-processes received data from the simulator(s) based on
measured variable descriptions. Based on the information
from the player and SimAgent, CogAgent determines an
overall (or long-term) evaluation for the assessed constructs
which will be presented to the students (and assessors) as
assessment result. Parallel to this it uses the measured data
and assessment data to adapt the internal knowledge on
assessment rules, resulting in new rules that can be validated
afterwards. All information, including the symbolic rules,
will be encoded in XML as part of the working memory of
the agents and will be distributed via SOAP (either locally
or via a web-service).
The CogAgent must be able to learn new rules from
observation and existing rules, infer conclusions from these rules
and present them in a human readable form. Research on
Neural-Symbolic Learning and Reasoning focuses on the
integration of learning techniques and architectures from
Neural Networks with the symbolic presentation and
reasoning techniques in (Fuzzy) Logic Programs
            <xref ref-type="bibr" rid="ref21">(see [Bader and
Hitzler, 2005])</xref>
            .
          </p>
          <p>The Neural-Symbolic model proposed for CogAgent is
based on the Recurrent Temporal Restricted Boltzmann
Machine (RTRBM) [Sutskever et al., 2009] and is depicted
in Figure 2. This partially connected symmetric neural
network implements an auto-associative memory of its input
layers (called visible layers). CogAgent contains three
visible layers that represent its beliefs, desires and intentions
(introduced by [Bratman, 1999]). Beliefs are variables
related to the training task (initial conditions, dynamic
behaviour and measured variables) and the student profile.
Intentions are variables related to actions or instructions. And
desires are variables related to performance assessments
(e.g. evaluations or rewards). Beliefs and intentions are
directly related to the current state of the context whereas
desires will be related to future states as well using Temporal
Difference learning [Sutton, 1988]. This technique learns
the model to predict a maximum obtainable value for its
desires (e.g. overall evaluation scores) based on the current
and previous states. Otherwise, the model would only learn
to map short-term evaluations, which is not desired in this
case.</p>
          <p>The hidden layer of the RTRBM is connected to the
visible layers with symmetric connections. Each hidden unit
represents a rule or relation between one or more visible
units. It also contains recurrent hidden-to-hidden
connections that enable the RTRBM to learn the temporal
dynamics in the visible layers using an algorithm based on
contrastive divergence and backpropagation through time. Using
this layer we can infer the posterior probability of beliefs,
intentions and desires in relation to the state of current and
previous beliefs, intentions and desires.
3.1</p>
          <p>Symbolic Rules and Fuzzy Atoms
As described in section 2, the rules CogAgent needs to
encode, learn and reason about are relations (or causalities)
between XML encoded constructs, which will be called
atoms hereafter. An XML based atom describes a belief,
intention or desire as a function of measured data from the
simulator and/or assessment data from the assessors (or
students). In case of training simulators this data is often
expressed in both continuous and binary values. Therefore we
need to use functions in the visible units that can express
both. In [Chen and Murray, 2003] sigmoid functions are
introduced that contain a ‘noise-control’ parameter to allow
a smooth translation from noise-free deterministic behaviour
to binary-stochastic behaviour. These continuous stochastic
functions can express both binary and continuous variables.</p>
          <p>The ‘noise-control’ parameter controls the steepness of the
sigmoid function and can be trained, such that the behaviour
of a function dynamically changes according to the
distribution of its input values. We will extend our model with such
functions to create a Recurrent Temporal Continuous
Restricted Boltzmann Machine (RTCRBM).</p>
          <p>To express relations between atoms in symbolic rules we
propose to use the temporal propositional logic described in
Lamb et al., [2007]. This logic contains several modal
operators that extend classical modal logic with a notion of
past and future. All these operators can be translated to a
form that relates only to the immediate previous timestep
(denoted by the temporal operator ●). This allows us to
encode any rule from this language in the RTCRBM as a
combination of visible units (or atoms) and recurrent hidden
units that represent applied rules in the previous timestep.</p>
          <p>For example the proposition α“β denotes that a proposition
α has been true since the occurrence of proposition β. This
can be translated to: β → α“β and α ∧ ●(α“β) → α“β,
where α and β are modelled by visible units and ●(α“β) is
modelled by a recurrent hidden unit.</p>
          <p>We extend this logic with the use of equality and
inequality formulas to represent the atoms for continuous variables
(e.g. A=x, A&lt;x, etc). Note that the atoms for binary
variables can also be represented as A=true or A=false, which
allows us to handle the outcome of these atoms in the same
way as with the continuous atoms. But for readability we
will use the classical notion A and ¬A.</p>
          <p>Due to the stochastic nature of the sigmoid functions used
in our model, the atoms can be regarded as fuzzy sets with a
Gaussian membership function. This allows as to represent
fuzzy concepts, like good and bad or fast and slow or
approximations of learned values, which is especially useful
when reasoning with implicit and subjective rules. In fact
our model can be regarded as a neural-fuzzy system similar
to the fuzzy systems described in [Kosko, 1992] and [Sun,
1994].</p>
          <p>Now let’s take the training task depicted in Figure 3.
Using our extended temporal propositional logic, we can
describe rules about the conditions, scenario and performance
assessment related to this task.
Figure 3. Example training task for driving simulation. The
Trainee drives on an urban road, approaching an intersection. The
Trainee has to apply the yield-to-the-right-rule.</p>
          <p>Example rules for a driver training task:
Conditions:
(Area = urban)
(Weather ≥ good)
(Time ≥ 6) ∧ (Time ≤ 18)
Scenario:
(Speed &gt; 0) ∧ ApproachingIntersection → CrossIntersection
ApproachingIntersection ∧ ◊(ApproachingTraffic = right)
((Speed &gt; 0) ∧ (HeadingIntersection)) “ (DistanceIntersection &lt; x) →</p>
          <p>ApproachingIntersection
Assessment:</p>
          <p>ApproachingIntersection ∧ (DistanceIntersection = 0) ∧
(ApproachingTraffic = right) ∧ □(Speed = 0) → (Evaluation = good)
ApproachingIntersection ∧ (DistanceIntersection = 0) ∧
(ApproachingTraffic = right) ∧ ◊(Speed &gt; 0) → (Evaluation = bad)
The rule with temporal operator “, denotes that
ApproachingIntersection is true when the driver has been driving
towards an intersection since a certain distance x to an
interTo encode and extract symbolic rules in symmetric
connectionist networks, like the RBM, Pinkas [1995] describes a
generic method that directly maps these rules to the energy
function of such networks. Therefore he describes an
extension to propositional logic, called penalty logic that applies
a penalty to each rule. This penalty can be regarded as the
“certainty” or “reliability” of a rule and is directly related to
the weights of the connections between the units that form
this rule. To apply the encoding and extraction algorithms of
Pinkas successfully to our model we need extend our
temporal propositional logic with the use of penalties. [Sun, 1994]
describes a method to map atoms with classical modal
operators to real values. We propose to extend this method to
create a mapping of atoms and rules with the modal
operators used in our model to penalties. Furthermore we need to
investigate what changes are required to the algorithms to
handle the use of equality formulas and continuous
variables. For example, we need to prove that it is possible to
infer the correct value for unknown continuous variables in
a rule via pattern reconstruction based on known values and
(previously) applied rules. And to encode and extract rules
with inequality formulas we need to be able to transform
these to and from rules that contain only equality formulas.</p>
          <p>The penalties that are encoded or learned by our model
can be used to rank the rules according to their applicability
in a certain context or scenario, giving the students and
assessors a nice overview of the applied rules. Also they allow
us to solve ambiguities in the application of rules, by using
such a ranking to select the most applicable (or reliable) rule
in each case.
4</p>
          <p>References
[Bratman, 1999] Michael E. Bratman. Intention, Plans, and</p>
          <p>Practical Reason. Cambridge University Press, June
1999.
[Pinkas, 1995] Gadi Pinkas. Reasoning, nonmonotonicity
and learning in connectionist networks that capture
propositional knowledge. In Artificial Intelligence v.77 n.2,
pages 203-247, September 1995.
[Sun, 1994] Ron Sun. A neural network model of causality.</p>
          <p>In IEEE Transactions on Neural Networks, Vol. 5, No. 4.</p>
          <p>pages 604-611. July, 1994.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Dahlgren</source>
          , 1988]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dahlgren</surname>
          </string-name>
          .
          <article-title>Using Commonsense Knowledge to Disambiguate Word Sense. Natural Language Understanding and Logic Programming</article-title>
          . Amsterdam, North Holland,
          <volume>2</volume>
          <fpage>edition</fpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Fellbaum</source>
          , 1998]
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum. WordNet</surname>
          </string-name>
          :
          <article-title>An Electronic Lexical Database</article-title>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Kathleen and Joyce</source>
          , 1989]
          <article-title>Dahlgren Kathleen and McDowell Joyce</article-title>
          .
          <article-title>Knowledge representation for commonsense reasoning with text</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>149</fpage>
          -
          <lpage>170</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Kipper et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>Karin</given-names>
            <surname>Kipper</surname>
          </string-name>
          , Anna Korhonen, Neville Ryant, and
          <string-name>
            <given-names>Martha</given-names>
            <surname>Palmer</surname>
          </string-name>
          .
          <article-title>Extending verbnet with novel verb classes</article-title>
          .
          <source>In Fifth International Conference on Language Resources and Evaluation (LREC</source>
          <year>2006</year>
          ), Genoa, Italy,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Klein and Manning</source>
          , 2003]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Klein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Fast exact inference with a factored model for natural language parsing</article-title>
          . pages
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Landauer and Dumais</source>
          , 1997]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          .
          <article-title>A solution to plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge</article-title>
          .
          <source>Psychol. Rev</source>
          ,
          <volume>104</volume>
          :
          <fpage>211</fpage>
          -
          <lpage>240</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Liu and Singh</source>
          , 2004]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <article-title>Conceptnet: A practical commonsense reasoning toolkit</article-title>
          .
          <source>BT Technology Journal</source>
          , Volume
          <volume>22</volume>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Livingston and Riesbeck</source>
          , 2009]
          <string-name>
            <given-names>K.</given-names>
            <surname>Livingston</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Riesbeck</surname>
          </string-name>
          .
          <article-title>Resolving references and identifying existing knowledge in a memory based parser</article-title>
          .
          <source>In the AAAI 2009 Spring Symposium</source>
          ,
          <article-title>Learning by Reading and</article-title>
          and Learning to Read, California, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mitchell et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Tom M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <surname>Svetlana</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Shinkareva</surname>
          </string-name>
          , Andrew Carlson,
          <string-name>
            <surname>Kai-Min</surname>
            <given-names>Chang</given-names>
          </string-name>
          , Vicente L. Malave, Robert A.
          <string-name>
            <surname>Mason</surname>
          </string-name>
          , and Marcel Adam Just.
          <article-title>Predicting human brain activity associated with the meanings of nouns</article-title>
          .
          <source>Science</source>
          ,
          <volume>320</volume>
          (
          <issue>5880</issue>
          ):
          <fpage>1191</fpage>
          -
          <lpage>1195</lpage>
          ,
          <year>2008</year>
          .
          <volume>10</volume>
          .1126/science.1152876.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Tulving and Thomson</source>
          , 1973]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tulving</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Thomson</surname>
          </string-name>
          .
          <article-title>Encoding specificity and retrieval processes in episodic memory</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>80</volume>
          (
          <issue>5</issue>
          ):
          <fpage>352</fpage>
          -
          <lpage>373</lpage>
          ,
          <year>1973</year>
          . ISI Document Delivery No.:
          <source>Q9393 Times Cited: 1541 Cited Reference Count: 91 AMER PSYCHOLOGICAL ASSOC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Andersen</source>
          , 1999]
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Andersen</surname>
          </string-name>
          .
          <article-title>An introduction to binary decision diagrams</article-title>
          .
          <source>Lecture Notes</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Andrews et al.,
          <year>1995</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Andrews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Diederich</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Tickle</surname>
          </string-name>
          .
          <article-title>A survey and critique of techniques for extracting rules from trained arti cial neural networks</article-title>
          .
          <source>Knowledge{Based Systems</source>
          ,
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Bader et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bader</surname>
          </string-name>
          , S. Holldobler, and V.
          <string-name>
            <surname>Mayer-Eichberger</surname>
          </string-name>
          .
          <article-title>Extracting propositional rules from feed-forward neural networks | a new decompositional approach</article-title>
          .
          <source>In Proceedings of the 3rd International Workshop on Neural-Symbolic Learning and Reasoning</source>
          , NeSy'07,
          <year>January 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Jacobsson</source>
          , 2005]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jacobsson</surname>
          </string-name>
          .
          <article-title>Rule extraction from recurrent neural networks: A taxonomy and review</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>17</volume>
          (
          <issue>6</issue>
          ):
          <volume>1223</volume>
          {
          <fpage>1263</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Krishnan et al.,
          <year>1999</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          , G. Sivakumar, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          .
          <article-title>A search technique for rule extraction from trained neural networks</article-title>
          .
          <source>Non-Linear Anal.</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <volume>273</volume>
          {
          <fpage>280</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [
          <string-name>
            <surname>Mayer-Eichberger</surname>
          </string-name>
          ,
          <year>2008</year>
          ]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mayer-Eichberger</surname>
          </string-name>
          .
          <article-title>Towards solving a system of pseudo boolean constraints with binary decision diagrams. Master's thesis</article-title>
          , Universidade Nova de Lisboa,
          <string-name>
            <surname>SEP</surname>
          </string-name>
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Towell and Shavlik</source>
          , 1993]
          <string-name>
            <given-names>G.</given-names>
            <surname>Towell</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          .
          <article-title>Extracting re ned rules from knowledge-based neural networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>13</volume>
          :
          <fpage>71</fpage>
          {
          <fpage>101</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>t 8 n</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>ru6 C</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[Langacker</source>
          , 1986]
          <string-name>
            <given-names>Ronald W.</given-names>
            <surname>Langacker</surname>
          </string-name>
          .
          <article-title>An introduction to cognitive grammar</article-title>
          .
          <source>Cognitive Science</source>
          , (
          <volume>10</volume>
          ):
          <fpage>l</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>[Martin and Backer</source>
          , 2005]
          <string-name>
            <given-names>S</given-names>
            <surname>Martin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A</given-names>
            <surname>Backer</surname>
          </string-name>
          .
          <article-title>Estimating manifold dimension by inversion error</article-title>
          .
          <source>In ACM Symposium on Applied Computing</source>
          , pages
          <fpage>22</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>[McDonough</surname>
          </string-name>
          et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>L.</given-names>
            <surname>McDonough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Choi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Mandler</surname>
          </string-name>
          .
          <article-title>Understanding spatial relations: Flexible infants, lexical adults</article-title>
          .
          <source>Cognitive Psychology</source>
          ,
          <volume>46</volume>
          (
          <issue>3</issue>
          ):
          <fpage>229</fpage>
          -
          <lpage>259</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Moss et al.,
          <year>2004</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Moss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cagan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotovsky</surname>
          </string-name>
          .
          <article-title>Learning from design experience in an agent-based design system</article-title>
          . Research in Engineering Design,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>77</fpage>
          -
          <lpage>92</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Nanda et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>J</given-names>
            <surname>Nanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.J.</given-names>
            <surname>Thevenot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.W.</given-names>
            <surname>Simpson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.B.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bohm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.B.</given-names>
            <surname>Shooter</surname>
          </string-name>
          .
          <article-title>Product family design knowledge representation, aggregation, reuse, and analysis</article-title>
          .
          <source>AI EDAM</source>
          ,
          <volume>21</volume>
          (
          <issue>02</issue>
          ):
          <fpage>173</fpage>
          -
          <lpage>192</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>[Pahl and Beitz</source>
          , 19881996]
          <string-name>
            <given-names>G</given-names>
            <surname>Pahl</surname>
          </string-name>
          and
          <string-name>
            <given-names>W</given-names>
            <surname>Beitz</surname>
          </string-name>
          .
          <article-title>Design:A Systematic Approach</article-title>
          , pages
          <fpage>199</fpage>
          -
          <lpage>400</lpage>
          . Council/Springer-Verlag, London/Berlin,
          <year>1988</year>
          /
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>[Park and Gero</source>
          ,
          <year>1999</year>
          ]
          <string-name>
            <given-names>SH</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.S.</given-names>
            <surname>Gero</surname>
          </string-name>
          .
          <article-title>Qualitative representation and reasoning about shapes</article-title>
          .
          <source>In Visual and Spatial Reasoning in Design</source>
          , volume
          <volume>99</volume>
          , pages
          <fpage>55</fpage>
          -
          <lpage>68</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Richard and Dixon, 1994] Welch V Richard and John R Dixon.
          <article-title>Guiding conceptual design through behavioural reasoning</article-title>
          . Research in Engineering,
          <volume>6</volume>
          :
          <fpage>169</fpage>
          -
          <lpage>188</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <source>[Roweis and Saul</source>
          , 2000]
          <string-name>
            <given-names>S.T.</given-names>
            <surname>Roweis</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.K.</given-names>
            <surname>Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding</surname>
          </string-name>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>[Saul and Roweis</source>
          , 2003]
          <string-name>
            <given-names>L K</given-names>
            <surname>Saul and S T Roweis</surname>
          </string-name>
          .
          <article-title>Think globally, fit locally: unsupervised learning of low dimensional manifolds</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>4</volume>
          :
          <fpage>119</fpage>
          -
          <lpage>155</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>