<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A vector representation of Fluid Construction Grammar using Holographic Reduced Representations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yana Knight</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Spranger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luc Steels</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Laboratory, Free University of Brussels (VUB)</institution>
          ,
          <addr-line>Pleinlaan 2, 1050 Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
          , and
          <institution>Institut de Biologia Evolutiva (UPF-CSIC)</institution>
          ,
          <addr-line>Dr. Aiguader 88, 08003 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sony Computer Science Laboratories 3-14-13 Higashigotanda</institution>
          ,
          <addr-line>141-0022 Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>560</fpage>
      <lpage>565</lpage>
      <abstract>
        <p>The question of how symbol systems can be instantiated in neural network-like computation is still open. Many technical challenges remain and most proposals do not scale up to realistic examples of symbol processing, for example, language understanding or language production. Here we use a top-down approach. We start from Fluid Construction Grammar, a well-worked-out framework for language processing that is compatible with recent insights into Construction Grammar, and investigate how we could build a neural compiler that automatically translates grammatical constructions and grammatical processing into neural computations. We proceed in two steps. FCG is translated from symbolic processing to numeric processing using a vector symbolic architecture, and this numeric processing is then translated into neural network computation. Our experiments are still in an early stage but already show promise.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Since the early days of cognitive science in the late nineteen
fifties, there has been a struggle to reconcile two approaches
to modeling intelligence and cognition: a symbolic and a
numeric one. The symbolic approach postulates an abstract
layer with symbols, symbolic structures, and operations over
these symbolic structures, so that it is straightforward to
implement the kind of analysis that logicians, linguists, and
psychologists tend to make. AI researchers have built remarkable
technology to support such implementations based on high
level ‘symbolic’ languages like LISP.</p>
      <p>The numeric approach views cognitive
processing in terms of numeric operations. It is motivated by the fact
that biological neuronal networks are dynamical systems and
that numeric processing can model self-organizing processes.
So the numeric approach tries to get intelligent behavior
without needing to postulate symbolic structures and operations
explicitly. There have been several waves exploiting this
numeric approach under the heading of neural networks and, most
recently, deep learning.</p>
      <p>
        The symbolic approach has proven its worth in
modeling very large scale language systems, search engines,
problem solvers, models of expert knowledge, ontological and
episodic memory, etc., but most of these applications rely
heavily on a human analyst who identifies the relevant
symbols and symbol processing operations. It is usually claimed
that the symbolic approach is unable to deal with learning
and grounding, but this criticism often ignores work within
the large field of (symbolic) machine learning and work on
grounding symbolic representations in perception and action
by physical robots. While the numeric approach has proven
its worth in the domain of pattern recognition, which includes
feature extraction, category formation, and pattern detection,
it has not been equally successful in the implementation of
‘true’ physical symbol systems
        <xref ref-type="bibr" rid="ref8">(Newell &amp; Simon, 1976)</xref>
        .
More specifically, it turns out to be non-trivial to represent
a group of properties of an object (a feature structure), to
compare feature-structures to each other, and to handle
variable binding and feature structure merging - all operations
which many researchers have argued to be necessary for
intelligence. We believe the symbolic and the numeric approach
can only be reconciled when they are viewed as two levels of
description of the same system, whereby the former describes
and models natural objects at a higher level than the latter.
Each level has its own abstractions at which regularities are
revealed and its own laws of operation. It is necessary and
highly interesting to find out how the different levels map to
each other. This paper takes some small steps in this
direction. We do not go immediately from the symbol level to
the numeric level but rather use a two-step process: mapping
the symbolic level to a symbolic vector layer, as suggested
by several researchers
        <xref ref-type="bibr" rid="ref10 ref2 ref4 ref6">(Hinton, 1990; Neumann, 2001; Plate,
1994; Gayler, Levy, &amp; Bod, 2010)</xref>
        and then mapping this
layer to a possible neural implementation level in terms of
populations of neurons, which has also been explored already
in
        <xref ref-type="bibr" rid="ref1">(Eliasmith, 2013)</xref>
        .
      </p>
      <p>
        This paper focuses only on the first step. Experiments have
also been done for the second step using the Nengo
framework
        <xref ref-type="bibr" rid="ref1">(Eliasmith, 2013)</xref>
        but are not reported here. The
paper begins by introducing Fluid Construction Grammar as a
challenging test case for studying how to map symbolic
processing to numeric processing. It then proceeds to describe
a potential approach for the translation of FCG to vector
form, namely Holographic Reduced Representations (HRR).
Finally, it presents the results of experiments using HRR to
produce a vector representation of FCG feature structures and
core operators.
      </p>
    </sec>
    <sec id="sec-2">
      <title>FCG and its key operations</title>
      <p>
        Fluid Construction Grammar is a computational platform for
implementing construction grammars
        <xref ref-type="bibr" rid="ref13">(Steels, 2011)</xref>
        . It is
a typical example of a complex symbol system addressing
a core competence of the human brain, namely the
representation and processing (comprehension, production,
learning) of language. FCG was originally designed for modeling
language learning and language change
        <xref ref-type="bibr" rid="ref14">(Steels, 2012)</xref>
        , and
language-based robot interaction
        <xref ref-type="bibr" rid="ref15">(Steels &amp; Hild, 2012)</xref>
        . More
recently, research has focused on challenging problems in
linguistics and broader coverage grammars. The components of
FCG are symbols, feature structures, transient structures and
constructions.
      </p>
      <p>Symbols are the elementary units of information. They
stand in for syntactic categories (like ‘noun’ or ‘plural’),
semantic categories (like ‘animate’ or ‘future’), unit-names
(e.g. ‘noun-phrase-17’), grammatical functions (like
‘subject’ or ‘head’), ordering relations of words and phrases (e.g.
‘meets’ or ‘precedes’), meaning-predicates, etc. A basic
grammar of a human language like English would certainly
feature thousands of such symbols, and the set of
meaning-predicates is basically open-ended. Symbols can be bound to
variables, which are written as names with a question-mark
in front, as in ?unit, ?gender, ?subject, etc. Symbol names are
chosen to make sense to us, but of course the FCG interpreter
has no clue what they mean. The meaning of a symbol only
comes from its functions in the rest of the system.</p>
      <p>Feature structures are a way to group information about
a particular linguistic unit, for example, a word or a phrase.
A feature structure has a name to index it (which is again a
symbol, possibly a variable) and a set of features and
values. Construction grammars group all features of a unit
together, whatever the level. So a feature structure has phonetic
and phonologic features, morphological information,
syntactic and semantic categories, pragmatic information, as well
as structural information about the many possible relations
between units (constituent structure, functional structure,
argument structure, information structure, temporal structure,
etc.). All of these are represented explicitly using features
and values. The values of a feature can be elementary
symbols, sets of symbols (e.g. the constituents of a phrase form
a set), sequences or feature structures, thus allowing a
hierarchically structured feature structure.</p>
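      <p>As a purely illustrative sketch (our own rendering, not actual FCG notation), such a hierarchically structured unit can be written down as nested data in Python, the language used for the experiments reported below; the feature names are taken from examples in this paper, the unit names are hypothetical:</p>
      <preformat>
# Illustrative only: a unit's feature structure as nested data.
# A "?"-prefixed name stands for a variable, bound during processing.
unit = {
    "name": "noun-phrase-17",
    "features": {
        "phrasal-cat": "NP",
        "constituents": {"article-2", "noun-5"},  # a set of unit names (hypothetical)
        "sem-cat": ["animate"],                   # values may be symbols or sequences
        "referent": "?referent",                  # a variable
    },
}
      </preformat>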
      <p>Feature structures are used to represent transient
structures. These are the structures built up during
comprehension and production. The features are grouped into a semantic
pole, which contains the more semantic oriented features,
including pragmatics and semantic categorisations, and a
syntactic pole, which contains the form-oriented features. For
comprehension, the initial transient structure contains all the
information that could be gleaned from the form of the
utterance by perceptual processes and then this transient
structure is progressively expanded until it contains enough
information to interpret the utterance. For production, the initial
transient structure contains the meaning to be expressed and
then this structure is transformed until enough information is
present to render a concrete utterance. There are often
multiple ways to expand a transient structure so a search space is
unavoidable.</p>
      <p>Constructions are also represented as feature structures
and they are more abstract than transient structures. They
typically contain variables that can be bound to the elements of
transient structures and they contain less information about
some of the units. Constructions have a conditional part
which has to match with the transient structure they try to
expand and a contributing part which they add to the transient
structure if the conditional part matches. The conditional part
is decomposed into a production lock which constrains the
activation of a construction in production and a
comprehension lock which constrains the construction in
comprehension. When the lock fits with the transient structure, all
information from the construction which is not there yet is merged
into the transient structure. So match and merge are the most
fundamental operations of the grammar.</p>
      <p>
        Here is a simplified example of the double object
construction
        <xref ref-type="bibr" rid="ref3">(Goldberg, 1995)</xref>
        handling phrases like “she gave him a
book”. It has a unit for the clause as a whole
(?ditransitive-clause) and for the different constituents (?NP-1, ?verb,
?NP-2 and ?NP-3). The conditional part is on the right-hand side
of the arrow and the contributing part on the left-hand side.
Units in the conditional part have a comprehension lock (on
top) and a production lock (below). The sign between units
means ‘immediately precedes’.
      </p>
      <p>[Construction diagram omitted: the double-object construction as
coupled feature structures. Recoverable detail: the ?ditransitive-clause
unit lists the constituents {?NP-1, ?verb, ?NP-2, ?NP-3}; the ?verb unit
carries referent: ?event, sem-function: predicating and a syn-valence
with subj(?NP-1) and dir-obj(?NP-3); the ?NP-2 unit carries
referent: ?receiver, sem-function: referring, sem-cat: animate,
phrasal-cat: NP and case: not-nominative; the ?NP-3 unit carries
referent: ?transferred, sem-function: referring, sem-cat: physobj,
phrasal-cat: NP and case: not-nominative.]</p>
      <p>A regular speaker of a language probably knows something
on the order of half a million constructions. So it is not
possible to simply throw them in one bag and try constructions
randomly. FCG therefore features various mechanisms to
fine-tune which construction should be selected and, if more than
one construction matches, which one should be pursued
further. They include priming networks, organisation of
constructions into sets, partial orderings, a scoring mechanism,
footprints preventing some constructions from becoming
active, etc.</p>
      <p>Obviously Fluid Construction Grammar is a sophisticated
computational formalism but all the mechanisms it proposes
are absolutely necessary to achieve accurate (as opposed to
approximate) comprehension and correct production of
utterances given a particular meaning. Due to space
limitations, the reader is referred to the rapidly growing literature
on FCG for more details (see also emergent-languages.org).</p>
    </sec>
    <sec id="sec-3">
      <title>Holographic Reduced Representations</title>
      <p>
        The kind of structures used in FCG can be represented
using the AI techniques provided by symbolic programming
languages such as LISP. Implementing FCG is non-trivial
but doable, and adequate implementations exist. We now
come to the key question: Can similar mechanisms also be
implemented using a numeric approach? This means that the
basic elements of FCG are encoded in a numeric format and
the basic FCG-operations are translated into numeric
operations over them. Various efforts to implement symbolic
systems in neural terms have already been undertaken
        <xref ref-type="bibr" rid="ref11">(Shastri
&amp; Ajjanagadde, 1993)</xref>
        . The key problem, however, is scaling:
the number of neurons required to represent the millions of
symbols in human-scale grammars becomes
biologically unrealistic
        <xref ref-type="bibr" rid="ref1">(Eliasmith, 2013)</xref>
        .
      </p>
      <p>
        Vector-based approaches known as Vector Symbolic
Architectures (VSA) have demonstrated promising results for
representing and manipulating symbolic structures using
distributed representations. Smolensky’s tensor product is one
of the simplest variants of VSA
        <xref ref-type="bibr" rid="ref12">(Smolensky, 1990)</xref>
        .
However, the main problem with this approach is that a
tensor binding results in an n²-dimensional vector and, in the case
of recursive representations, does not scale well
        <xref ref-type="bibr" rid="ref1">(Eliasmith,
2013)</xref>
        . Alternative approaches have been suggested, such
as Binary Spatter Codes
        <xref ref-type="bibr" rid="ref5">(Kanerva, 1997)</xref>
        and Holographic
Reduced Representations (HRR)
        <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
        , where
binding is done with circular convolution, which results in an
n-dimensional vector, so that the number of dimensions does
not increase. Hinton explored distributed representations
based on the idea of reduced representations
        <xref ref-type="bibr" rid="ref4">(Hinton, 1990)</xref>
        and later
        <xref ref-type="bibr" rid="ref6">(Neumann, 2001)</xref>
        demonstrated that connectionist
representational schemes based on the concept of reduced
representation and on the functional composition of
hierarchical structures can support structure-sensitive processes which
show a degree of systematicity. VSAs provide a means for
representing structured knowledge using distributed vector
representations and as such provide a way to translate
symbolic to vector representations
        <xref ref-type="bibr" rid="ref1">(Eliasmith, 2013)</xref>
        . Since
vectors can be used by many machine learning methods (for
example, neural networks, support-vector machines, etc.), once
a symbol system has been translated to a vector space
architecture, a subsequent implementation of such a system in
numeric terms should give us access to the machine learning
methods associated with distributed representations.
      </p>
      <p>
        Given the claims made by these various authors, we
decided to explore VSA, more specifically Holographic
Reduced Representations (HRR), for implementing Fluid
Construction Grammar, and then further translate this
representation using existing neural mappings
        <xref ref-type="bibr" rid="ref1">(Eliasmith, 2013)</xref>
        . The
remainder of this section reflects on what is required for the
mapping from FCG to VSA.
      </p>
      <sec id="sec-3-1">
        <title>Representing FCG entities</title>
        <p>
          Symbols A symbol in FCG can be mapped to a randomly
generated n-dimensional vector. All the elements of the
vectors are drawn from a normal distribution N(0, 1/n), following
          <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
          . The symbol and its symbol vector are stored in
an error-correction memory as explained later.
        </p>
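        <p>A minimal sketch of such a symbol memory, in Python with numpy (the tools used in our experiments below); the dimensionality and the symbol names are our own choices:</p>
        <preformat>
import numpy as np

n = 1024  # dimensionality; see the experiments below for how n must scale

def make_symbol(n):
    """Random symbol vector with elements drawn from N(0, 1/n) (Plate, 1994)."""
    return np.random.normal(0.0, np.sqrt(1.0 / n), n)

# the error-correction memory pairs each symbol name with its vector
memory = {name: make_symbol(n) for name in ("noun-phrase-17", "lex-cat", "noun", "?x")}
        </preformat>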
        <p>
          Feature-value pairs A feature-value pair (the primary
component of a feature structure) can be mapped to a circular
convolution of two vectors, the feature vector and the value
vector. Following
          <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
          , we define convolution as
follows:
        </p>
        <p>
          $$Z = X \circledast Y \tag{1}$$
          $$z_j = \sum_{k=0}^{n-1} x_k\, y_{j-k}, \qquad j = 0, \ldots, n-1 \tag{2}$$
Once we have a combined feature-value vector, we can, given
the feature, extract the value using circular correlation
          <xref ref-type="bibr" rid="ref10 ref6">(Plate,
1994; Neumann, 2001)</xref>
          , which convolves the pair with the
approximate inverse of the feature vector
          <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
          :
          $$X = Z \oslash Y \tag{3}$$
          $$x_j = \sum_{k=0}^{n-1} z_k\, y_{j+k}, \qquad j = 0, \ldots, n-1 \tag{4}$$
        </p>
        <p>
          Feature-set A feature-set consists of a set of feature-value
pairs. This can be mapped to HRR using vector addition
          <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
          :
          $$Z = X \circledast Y + T \circledast S \tag{5}$$
        </p>
        <p>
          Feature structures Feature structures in FCG consist of
feature-sets combined into units. Each unit has a unique name
which is stored as a symbol in the symbol memory. A feature
structure is constructed in the same way as a feature-set, i.e. by
convolution and addition, except that to also include units, we
now convolve unit-feature-value triples rather than feature-value
pairs. The feature structure is the addition of all triples:
          $$Z = U \circledast X \circledast Y + U \circledast T \circledast S \tag{6}$$
        </p>
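        <p>A minimal sketch of these operators in Python/numpy (helper names are ours; the FFT route is one standard way to compute circular convolution, not necessarily what a full FCG implementation would use):</p>
        <preformat>
import numpy as np

def cconv(x, y):
    """Circular convolution (Equations 1-2), via FFT in O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def inverse(y):
    """Approximate inverse (involution) of y (Plate, 1994)."""
    return np.roll(y[::-1], 1)

def correlate(z, y):
    """Circular correlation (Equations 3-4): convolve with the approximate inverse."""
    return cconv(z, inverse(y))

def encode_pairs(pairs):
    """Feature-set (Equation 5): sum of feature/value convolutions."""
    return sum(cconv(f, v) for f, v in pairs)

def encode_triples(triples):
    """Feature structure (Equation 6): sum of unit/feature/value convolutions."""
    return sum(cconv(u, cconv(f, v)) for u, f, v in triples)
        </preformat>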
        <p>Since we can represent feature structures, we can also
represent transient structures as well as constructions of arbitrary
length.</p>
        <p>Parts of the structure (also called a trace) can be retrieved
using the correlation operation (Equation 3). For example,
given U ⊛ X and the whole structure, we can obtain Y.</p>
        <p>
          However, correlation on traces is noisy. A trace preserves
just enough information to recognise the result but not to
reconstruct it. Therefore, we need an error-correction memory
that stores vectors for possible units, features and values. The
memory is used to compare the noisy output of the
correlation operation with all vectors known to the system. Various
comparison measures can be used; however, the most
standard one is the dot product, which for two normalized vectors is
equal to the cosine of their angle
          <xref ref-type="bibr" rid="ref6">(Neumann, 2001)</xref>
          . We define
the following similarity for two vectors:
          $$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\|X\|\,\|Y\|} \tag{7}$$
        </p>
        <p>This similarity is used to retrieve the vector stored in the
error-correction memory with the highest similarity to the
output of correlation. That vector represents the most
plausible value of the feature in a particular trace.</p>
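        <p>A corresponding sketch of the similarity measure and the error-correction (cleanup) memory, again in Python/numpy with our own names:</p>
        <preformat>
import numpy as np

def sim(x, y):
    """Equation 7: dot product divided by the product of the norms (cosine)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def cleanup(noisy, memory):
    """Return the stored (name, vector) pair most similar to a noisy trace output.
    memory maps symbol names to their vectors."""
    name = max(memory, key=lambda k: sim(noisy, memory[k]))
    return name, memory[name]
        </preformat>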
      </sec>
      <sec id="sec-3-2">
        <title>Matching and Merging</title>
        <p>The promise of distributed representations is that they can do
very fast operations over complete feature structures (such as
comparing them) without traversing the components as would
be done in a symbolic implementation. Let us see how far we
might be able to get without decomposition. FCG basically
needs (i) the ability to copy a feature structure, (ii) to compare
two feature structures (matching) and (iii) to merge a feature
structure with another one.</p>
        <p>
Copying It is not difficult to copy feature structures
because it simply means copying their vectors. However, we
often need to replace all variables in a feature structure either
by new variables (e.g. when copying a construction before
applying it) or by their bindings (e.g. when creating the
expanded transient structure after matching). It has been
suggested that copy-with-variation can be done by convolving
the current structure A with a transformation vector T
          <xref ref-type="bibr" rid="ref10">(Plate,
1994)</xref>
          :
        </p>
        <p>
          $$A \circledast T = B \tag{8}$$
          The transformation vector is first constructed by convolving
the new values with the inverse of the current values, then
adding up the pairs by vector addition. For example, in order
to set the value of the lex-cat feature with the current value
?x to the new value which is the binding of ?x, e.g. noun,
the inverse of ?x should be convolved with noun. The full
transformation vector is
          $$T = x' \circledast y + z' \circledast w \tag{9}$$
where x' and z' are the approximate inverses of the current
values and y and w the new values. Such vectors can be hand-constructed
          <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
          – which
is not desirable – or learnt from examples as shown in
          <xref ref-type="bibr" rid="ref7">(Neumann, 2002)</xref>
          .
        </p>
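        <p>A self-contained numerical check of Equations 8 and 9 for the lex-cat example above (helper names are ours; for multi-pair structures the cross terms add noise, so the result is only approximate):</p>
        <preformat>
import numpy as np

n = 1024
rand = lambda: np.random.normal(0.0, np.sqrt(1.0 / n), n)
cconv = lambda x, y: np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
inv = lambda y: np.roll(y[::-1], 1)  # approximate inverse
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

lexcat, qx, noun = rand(), rand(), rand()  # vectors for lex-cat, ?x and noun
A = cconv(lexcat, qx)                      # trace holding lex-cat: ?x
T = cconv(inv(qx), noun)                   # inverse of old value convolved with new value
B = cconv(A, T)                            # Equation 8: A convolved with T gives B
print(cos(B, cconv(lexcat, noun)))         # high (around 0.7): B approximates lex-cat: noun
        </preformat>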
        <p>Matching In general, matching two feature structures can
be done by the same principle that is used in the
error-correcting memory, i.e. similarity (Equation 7). Since we use
the dot product as our similarity measure, we have a
computationally fast operation, which is well understood
mathematically. Using dot product provides us with a way to compare a
feature structure to every structure in a pool of structures and
to find the structure with the highest similarity as the closest
match. However this ignores two problems we have not
tackled yet: (i) match in FCG is an inclusion rather than a
similarity operation: if the source structure is a subset of the target
structure, they should still match, even if the similarity
between the two structures is low. In fact, this is very common
because the lock of a construction is always a subset of the
transient structure, and (ii) this does not yet take variable
bindings into account.</p>
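        <p>A sketch of this plain similarity-based lookup (names are ours); note that it is a nearest-neighbour search by Equation 7, not yet the subset-sensitive match that FCG needs:</p>
        <preformat>
import numpy as np

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def best_match(probe, pool):
    """Index of the encoded structure in pool most similar to probe."""
    return max(range(len(pool)), key=lambda i: cosine(probe, pool[i]))
        </preformat>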
        <p>Merging Merging two feature structures is straightforward
because their respective vector representations can simply be
added. It is possible to deal with variable bindings by first
transforming both feature structures by replacing the
variables by their bindings as discussed earlier. However, there
are also some tricky issues to be resolved (e.g. variables may
be bound to other variables making a variable-chain and then
one of these has to be substituted for all the others).</p>
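        <p>Under these assumptions, merging reduces to a short sketch (names are ours), with variable substitution handled beforehand by a transformation vector as in Equations 8 and 9:</p>
        <preformat>
import numpy as np

def cconv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def merge(transient, contribution, transform=None):
    """Merge by vector addition; optionally substitute variables in the
    contribution by their bindings first (Equations 8-9)."""
    if transform is not None:
        contribution = cconv(contribution, transform)
    return transient + contribution
        </preformat>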
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Preliminary implementation experiments</title>
      <p>We now report on first steps in implementing the FCG→VSA
mapping described above. Experiments were carried out in
Python using the numerical library numpy.</p>
      <p>
        Feature encoding and retrieval First, we tested the
precision of value retrieval from a feature set and a feature
structure. We were particularly interested in the relationship
between HRR vector dimensionality, length of FCG feature
structure and retrieval accuracy. We therefore tested
different lengths of FCG feature sets/structures (5, 50, 500) vs
dimensionality of HRR vectors (10, 100, 1,000, etc.). We did
100 runs for each combination (results averaged). Each time
HRR vectors for individual features were randomly initialized
and combined into feature-structure-representing HRR
vectors using convolution and addition. Then we attempted to
retrieve all feature values and measured the number of
correct retrievals divided by the original FCG feature set length.
Figure 1 (top) illustrates how precision score increases with
vectors of higher dimensionality, consistent with previous
experiments with HRR
        <xref ref-type="bibr" rid="ref6">(Neumann, 2001)</xref>
        . To encode FCG
feature sets with an average length of about 50-100 features, we
required around 3,000-dimensional HRR vectors. This figure
also illustrates how differences in HRR vector dimensionality
are related to the cardinality of the feature-set. For example,
in order to represent and successfully retrieve all values from
a 5-pair set, around 300 dimensions appears to be sufficient,
while a 500-pair feature-set requires just over 30,000. Our
feature-value pairs behave in accordance with
        <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
        ,
which can be described as follows:
          $$n = 3.16\,(k - 0.25)\,\ln\frac{m}{q^3} \tag{10}$$
where n is a lower bound for the dimensionality of the vectors
in order for retrieval to have a probability of error q; k is the
number of pairs in a trace and m is the number of vectors
in the error-correction memory. For example, to have a q of
10⁻¹ in a 5-pair trace with 1,500 items in the error-correction
memory, n should be approximately 213. For a smaller q
of 10⁻², around 300 dimensions are required. This roughly
follows the n and q observed and illustrated in Figure 1 (top).</p>
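      <p>As a concrete check of Equation 10, and a self-contained sketch of the retrieval experiment described above (helper names are ours; the number of runs is kept small for illustration):</p>
      <preformat>
import numpy as np

def min_dim(k, m, q):
    """Equation 10: lower bound on n for k pairs, m cleanup items, error probability q."""
    return 3.16 * (k - 0.25) * np.log(m / q ** 3)

print(round(min_dim(5, 1500, 0.1)))   # -> 213, as in the text
print(round(min_dim(5, 1500, 0.01)))  # -> 317, i.e. around 300 dimensions

def cconv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def retrieval_accuracy(n, k, runs=20):
    """Encode k random feature-value pairs in n dimensions and measure the
    fraction of values retrieved correctly via a cleanup over all value vectors."""
    hits = 0
    for _ in range(runs):
        feats = np.random.normal(0, np.sqrt(1 / n), (k, n))
        vals = np.random.normal(0, np.sqrt(1 / n), (k, n))
        trace = sum(cconv(f, v) for f, v in zip(feats, vals))
        for i in range(k):
            noisy = cconv(trace, np.roll(feats[i][::-1], 1))  # correlation (Equation 3)
            hits += int(np.argmax(vals @ noisy) == i)         # cleanup by dot product
    return hits / (runs * k)

# e.g. retrieval_accuracy(300, 5) approaches 1.0, while retrieval_accuracy(50, 5) degrades
      </preformat>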
      <p>Feature structures (triples) behave similarly to pairs,
although the number of dimensions required to encode triples increases.
Figure 1 (bottom) illustrates how both feature sets
and feature structures scale for various structure sizes
(5, 10, 50, 100, 500, 1,000 pairs/triples). These results can be
directly translated to FCG. A toy grammar starts at 1-5 units
per construction with 1-10 feature-value pairs in each. A
more complex grammar can have around 10 units with
approximately the same number of pairs in each unit.
Represented as triples, such structures can be encoded in vectors
of around 6,000 dimensions. Really large grammars of 30
units and 30 feature-value pairs in each unit require roughly
100,000 dimensions.</p>
      <p>Matching We numerically investigated if HRR
representations can be used to implement the FCG match operation in
two phases: we investigated changes in sim(X, Y) under various
conditions, working with feature sets (rather than feature
structures) for simplicity.</p>
      <p>
        First, we investigated how similarity (Equation 7) between
two HRR vectors responds to structural changes vs changes
in underlying feature values. Figure 2 (top, bottom) shows
that changes to feature values result in a greater decrease in
similarity (reaching 0.0 after 10² for a 100-pair structure)
than structural changes, i.e. adding new pairs, which led to
a more gradual similarity degradation (reaching 0.0 after 10⁵
for the same structure size). The difference between these
two types of changes is important in FCG, where structures
can be structurally different and still match, while structures
that, for example, differ in feature values should not. This
finding is also in line with previous experiments comparing
HRR structures using dot product
        <xref ref-type="bibr" rid="ref10">(Plate, 1994)</xref>
        , where
similarity was more sensitive to the content of feature-value pairs
than to the quantity of pairs.
      </p>
      <p>When adding or removing new pairs, similarity (Equation
7) is affected as illustrated in Figure 3. Initially, both
structures contained 1,000 pairs; the second structure was
subsequently changed by an order of magnitude at a time. Thus the
first structure gradually became a subset of the second. The
graph illustrates that as the structure becomes extended with
new pairs, similarity of the two structures begins to drop,
despite the fact that the structures share the initial 1,000 pairs.
However, this degradation is very gradual, and similarities
reach 0.0 only after 10⁵ new pairs have been added.
Furthermore, it can be seen that for larger structures such degradation
is more gradual than for smaller ones (see Figure 2 bottom).
For example, for 10-pair structures similarity is almost 0.0
after 10³ new pairs have been added. This is still, however,
fairly gradual, considering that a 10-pair structure had 1,000
pairs added to it before becoming dissimilar to the original.
The drop in sim(X, Y) appears to be asymmetrical: removing
pairs gives lower similarity than adding the same number of
pairs. This is expected since removing pairs results in less
shared information between the structures than adding pairs.</p>
      <p>These findings are good and bad news at the same time.
On the one hand it is good that feature value changes have
a more drastic effect on sim(X, Y) than structure changes.
On the other hand, the system will have to be able to
autonomously find out whether two HRR vectors represent
feature sets which differ structurally or in terms of feature values.
Possibly this distinction can be learnt. But finding a solution
that is invariant to HRR vector dimension size and feature set
cardinality is likely not an easy task. Another problem is the
commutative nature of sim(X, Y), which essentially does not
allow us to determine which is the more inclusive feature set.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>This paper has speculated on how a linguistically and
computationally adequate formalism for language, namely Fluid
Construction Grammar, could be represented in a Vector
Symbolic Architecture, more specifically Holographic Reduced
Representations, as a step towards a neural implementation.
We proposed a number of steps and reported some
preliminary implementation experiments with promising results.
The main conclusion so far is that a number of
fundamental issues remain to be solved to make the FCG→VSA
mapping fully operational, particularly the issue of implementing
a matching operation that uses a binding list and possibly
extends it while matching takes place.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Research reported in this paper was funded by the Marie
Curie ESSENCE ITN and carried out at the AI lab, Vrije
Universiteit Brussel and the Institut de Biologia Evolutiva
(UPF-CSIC), Barcelona, financed by the FET OPEN Insight Project
and the Marie Curie Integration Grant EVOLAN. We are
indebted to Johan Loeckx and Emilia Garcia
Casademont for their comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Eliasmith</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>How to build a brain: A neural architecture for biological cognition</article-title>
          . New York, NY: Oxford University Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Gayler</surname>
            ,
            <given-names>R. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>S. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bod</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Explanatory aspirations and the scandal of cognitive neuroscience</article-title>
          .
          <source>In Proceedings of the First Annual Meeting of the BICA Society</source>
          (pp.
          <fpage>42</fpage>
          -
          <lpage>51</lpage>
          ). Amsterdam: IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Constructions: A construction grammar approach to argument structure</article-title>
          . Chicago: University of Chicago Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>Mapping part-whole hierarchies into connectionist networks</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>46</volume>
          ,
          <fpage>47</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Fully distributed representation</article-title>
          .
          <source>In Proc. real world computing symposium</source>
          (pp.
          <fpage>358</fpage>
          -
          <lpage>365</lpage>
          ). Tsukuba-city, Japan: Real World Computing Partnership.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Holistic processing of hierarchical structures in connectionist networks</article-title>
          .
          <source>Doctoral dissertation</source>
          , School of Informatics, The University of Edinburgh, UK.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Learning the systematic transformation of holographic reduced representations</article-title>
          .
          <source>Cognitive Systems Research</source>
          ,
          <volume>3</volume>
          ,
          <fpage>227</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Newell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>H. A.</given-names>
          </string-name>
          (
          <year>1976</year>
          ).
          <article-title>Computer science as empirical inquiry: Symbols and search</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>19</volume>
          ,
          <fpage>113</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Ohlsson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Langley</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1985</year>
          ).
          <article-title>Identifying solution paths in cognitive diagnosis</article-title>
          (Tech. Rep. No.
          <source>CMU-RI-TR-85-2</source>
          ). Pittsburgh, PA: Carnegie Mellon University, The Robotics Institute.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Plate</surname>
            ,
            <given-names>T. A.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Distributed representations and nested compositional structure</article-title>
          .
          <source>Doctoral dissertation</source>
          , Graduate Department of Computer Science, University of Toronto.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Shastri</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ajjanagadde</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>1993</year>
          ).
          <article-title>From simple associations to systematic reasoning: A connectionist representation of rules, variables, and dynamic bindings</article-title>
          .
          <source>Behavioral and Brain Sciences</source>
          ,
          <volume>16</volume>
          ,
          <fpage>417</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Smolensky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>Tensor product variable binding and the representation of symbolic structures in connectionist systems</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>46</volume>
          ,
          <fpage>159</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Steels</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (Ed.). (
          <year>2011</year>
          ).
          <article-title>Design patterns in Fluid Construction Grammar</article-title>
          . Amsterdam: John Benjamins.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Steels</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (Ed.). (
          <year>2012</year>
          ).
          <article-title>Experiments in cultural language evolution</article-title>
          . Amsterdam: John Benjamins.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Steels</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hild</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (Eds.). (
          <year>2012</year>
          ).
          <article-title>Language grounding in robots</article-title>
          . New York: Springer-Verlag.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>