A vector representation of Fluid Construction Grammar
using Holographic Reduced Representations

Yana Knight (1), Michael Spranger (2) and Luc Steels (1)

(1) Artificial Intelligence Laboratory, Free University of Brussels (VUB),
Pleinlaan 2, 1050 Brussels, Belgium, and Dr. Aiguader 88, Barcelona 08003, Spain
(2) Sony Computer Science Laboratories, 3-14-13 Higashigotanda, 141-0022 Tokyo, Japan

Abstract

The question of how symbol systems can be instantiated in neural network-like computation is still open. Many technical challenges remain and most proposals do not scale up to realistic examples of symbol processing, for example, language understanding or language production. Here we use a top-down approach. We start from Fluid Construction Grammar, a well worked-out framework for language processing that is compatible with recent insights into Construction Grammar, and investigate how we could build a neural compiler that automatically translates grammatical constructions and grammatical processing into neural computations. We proceed in two steps. FCG is translated from symbolic processing to numeric processing using a vector symbolic architecture, and this numeric processing is then translated into neural network computation. Our experiments are still in an early stage but already show promise.

Keywords: Vector Symbolic Architectures; Fluid Construction Grammar; Connectionist Symbol Processing

Introduction

Since the early days of cognitive science in the late nineteen fifties, there has been a struggle to reconcile two approaches to modeling intelligence and cognition: a symbolic and a numeric one. The symbolic approach postulates an abstract layer with symbols, symbolic structures, and operations over these symbolic structures, so that it is straightforward to implement the kind of analysis that logicians, linguists, and psychologists tend to make. AI researchers have built remarkable technology to support such implementations based on high-level ‘symbolic’ languages like LISP.

The numeric approach looks at cognitive processing in terms of numeric operations. It is motivated by the fact that biological neuronal networks are dynamical systems and that numeric processing can model self-organizing processes. The numeric approach thus tries to get intelligent behavior without needing to postulate symbolic structures and operations explicitly. There have been several waves exploiting this numeric approach under the heading of neural networks and, most recently, deep learning.

The symbolic approach has proven its worth in modeling very large scale language systems, search engines, problem solvers, models of expert knowledge, ontological and episodic memory, etc., but most of these applications rely heavily on a human analyst who identifies the relevant symbols and symbol processing operations. It is usually claimed that the symbolic approach is unable to deal with learning and grounding, but this criticism often ignores work within the large field of (symbolic) machine learning and work on grounding symbolic representations in perception and action by physical robots. While the numeric approach has proven its worth in the domains of pattern recognition, which include feature extraction, category formation, and pattern detection, it has not been equally successful in the implementation of ‘true’ physical symbol systems (Newell & Simon, 1976). More specifically, it turns out to be non-trivial to represent a group of properties of an object (a feature structure), to compare feature structures to each other, and to handle variable binding and feature structure merging - all operations which many researchers have argued to be necessary for intelligence. We believe the symbolic and the numeric approach can only be reconciled when they are viewed as two levels of description of the same system, whereby the former describes and models natural objects at a higher level than the latter. Each level has its own abstractions at which regularities are revealed and its own laws of operation. It is necessary and highly interesting to find out how the different levels map to each other. This paper sets some small steps in this direction. We do not go immediately from the symbol level to the numeric level but rather use a two-step process: mapping the symbolic level to a symbolic vector layer, as suggested by several researchers (Hinton, 1990; Neumann, 2001; Plate, 1994; Gayler, Levy, & Bod, 2010), and then mapping this layer to a possible neural implementation level in terms of populations of neurons, which has also been explored already (Eliasmith, 2013).

This paper focuses only on the first step. Experiments have also been done for the second step using the Nengo framework (Eliasmith, 2013) but are not reported here. The paper begins by introducing Fluid Construction Grammar as a challenging test case for studying how to map symbolic processing to numeric processing. It then proceeds to describe a potential approach for the translation of FCG to vector form, namely Holographic Reduced Representations (HRR). Finally, it presents the results of experiments using HRR to produce a vector representation of FCG feature structures and core operators.


FCG and its key operations

Fluid Construction Grammar is a computational platform for implementing construction grammars (Steels, 2011). It is a typical example of a complex symbol system addressing a core competence of the human brain, namely the representation and processing (comprehension, production, learning) of language. FCG was originally designed for modeling language learning and language change (Steels, 2012) and language-based robot interaction (Steels & Hild, 2012). More recently, research has focused on challenging problems in linguistics and broader coverage grammars. The components of FCG are symbols, feature structures, transient structures and constructions.

Symbols are the elementary units of information. They stand in for syntactic categories (like ‘noun’ or ‘plural’), semantic categories (like ‘animate’ or ‘future’), unit names (e.g. ‘noun-phrase-17’), grammatical functions (like ‘subject’ or ‘head’), ordering relations of words and phrases (e.g. ‘meets’ or ‘precedes’), meaning predicates, etc. A basic grammar of a human language like English would certainly feature thousands of such symbols, and the set of meaning predicates is basically open-ended. Symbols can be bound to variables, which are written as names with a question mark in front, as in ?unit, ?gender, ?subject, etc. Symbol names are chosen to make sense for us, but of course the FCG interpreter has no clue what they mean. The meaning of a symbol only comes from its functions in the rest of the system.

Feature structures are a way to group information about a particular linguistic unit, for example, a word or a phrase. A feature structure has a name to index it (which is again a symbol, possibly a variable) and a set of features and values. Construction grammars group all features of a unit together, whatever the level. So a feature structure has phonetic and phonological features, morphological information, syntactic and semantic categories, pragmatic information, as well as structural information about the many possible relations between units (constituent structure, functional structure, argument structure, information structure, temporal structure, etc.). All of these are represented explicitly using features and values. The values of a feature can be elementary symbols, sets of symbols (e.g. the constituents of a phrase form a set), sequences or feature structures, thus allowing a hierarchically structured feature structure.

Feature structures are used to represent transient structures. These are the structures built up during comprehension and production. The features are grouped into a semantic pole, which contains the more semantically oriented features, including pragmatics and semantic categorisations, and a syntactic pole, which contains the form-oriented features. For comprehension, the initial transient structure contains all the information that could be gleaned from the form of the utterance by perceptual processes, and then this transient structure is progressively expanded until it contains enough information to interpret the utterance. For production, the initial transient structure contains the meaning to be expressed, and then this structure is transformed until enough information is present to render a concrete utterance. There are often multiple ways to expand a transient structure, so a search space is unavoidable.

Constructions are also represented as feature structures, and they are more abstract than transient structures. They typically contain variables that can be bound to the elements of transient structures and they contain less information about some of the units. Constructions have a conditional part which has to match with the transient structure they try to expand and a contributing part which they add to the transient structure if the conditional part matches. The conditional part is decomposed into a production lock which constrains the activation of a construction in production and a comprehension lock which constrains the construction in comprehension. When the lock fits with the transient structure, all information from the construction which is not there yet is merged into the transient structure. So match and merge are the most basic fundamental operations of the grammar.

Here is a simplified example of the double object construction (Goldberg, 1995) handling phrases like “she gave him a book”. It has a unit for the clause as a whole (?ditransitive-clause) and for the different constituents (?NP-1, ?verb, ?NP-2 and ?NP-3). The conditional part is on the right-hand side of the arrow and the contributing part on the left-hand side. Units in the conditional part have a comprehension lock (on top) and a production lock (below). The ≤ sign between units means ‘immediately precedes’.

[Construction schema not reproduced here: the contributing part adds a ?ditransitive-clause unit with constituents {?NP-1, ?verb, ?NP-2, ?NP-3} and the meaning predicates {cause-receive(?event), causer(?event, ?causer), transferred(?event, ?transferred), receiver(?event, ?receiver)}; the conditional part contains comprehension and production locks on ?NP-1, ?verb, ?NP-2 and ?NP-3, constraining features such as sem-function, referent, sem-cat, sem-valence, syn-valence, phrasal-cat and case, with the word order constraints ?NP-1 ≤ ?verb ≤ ?NP-2 ≤ ?NP-3.]
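To make the notion of a unit more concrete, the fragment below renders the ?NP-1 unit of this example as nested Python data. This is purely illustrative: the actual FCG system is implemented in Common Lisp and has its own internal representation, and the sem-pole/syn-pole layout shown here simply mirrors the description above.

    # Purely illustrative: one unit of a transient structure as nested Python data.
    # The real FCG implementation (Common Lisp) uses its own representation.
    np_1_unit = {
        "unit-name": "?NP-1",              # a variable, marked by the leading '?'
        "sem-pole": {                      # semantically oriented features
            "sem-function": "referring",
            "referent": "?causer",
            "sem-cat": {"animate"},        # a set-valued feature
        },
        "syn-pole": {                      # form-oriented features
            "phrasal-cat": "NP",
            "case": "nominative",
        },
    }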


A regular speaker of a language probably knows something on the order of half a million constructions. So it is not possible to simply throw them in one bag and try constructions randomly. FCG therefore features various mechanisms to fine-tune which construction should be selected and, if more than one construction matches, which one should be pursued further. They include priming networks, organisation of constructions into sets, partial orderings, a scoring mechanism, footprints preventing some constructions from becoming active, etc.

Obviously, Fluid Construction Grammar is a sophisticated computational formalism, but all the mechanisms it proposes are absolutely necessary to achieve accurate (as opposed to approximate) comprehension and correct production of utterances given a particular meaning. Due to space limitations, the reader is referred to the rapidly growing literature on FCG for more details (see also emergent-languages.org).

Holographic Reduced Representations

The kind of structures used in FCG can be represented using the AI techniques provided by symbolic programming languages such as LISP. It is very non-trivial to implement FCG, but it is doable and adequate implementations exist. We now come to the key question: Can similar mechanisms also be implemented using a numeric approach? This means that the basic elements of FCG are encoded in a numeric format and the basic FCG operations are translated into numeric operations over them. Various efforts to implement symbolic systems in neural terms have already been undertaken (Shastri & Ajjanagadde, 1993). The key problem, however, is scaling: the number of neurons required to represent the millions of symbols in human-scale grammars becomes biologically totally unrealistic (Eliasmith, 2013).

Vector-based approaches known as Vector Symbolic Architectures (VSA) have demonstrated promising results for representing and manipulating symbolic structures using distributed representations. Smolensky's tensor product is one of the simplest variants of VSA (Smolensky, 1990). However, the main problem with this approach is that a tensor binding results in an n²-dimensional vector and, in the case of recursive representations, does not scale well (Eliasmith, 2013). Alternative approaches have been suggested, such as Binary Spatter Codes (Kanerva, 1997) and Holographic Reduced Representations (HRR) (Plate, 1994), where binding is done with circular convolution, which results in an n-dimensional vector, so that the number of dimensions does not increase. Hinton explored distributed representations based on the idea of reduced representations (Hinton, 1990), and later Neumann (2001) demonstrated that connectionist representational schemes based on the concept of reduced representation and on the functional composition of hierarchical structures can support structure-sensitive processes which show a degree of systematicity. VSAs provide a means for representing structured knowledge using distributed vector representations and as such provide a way to translate symbolic to vector representations (Eliasmith, 2013). Since vectors can be used by many machine learning methods (for example, neural networks, support-vector machines, etc.), once a symbol system has been translated to a vector space architecture, a subsequent implementation of such a system in numeric terms should give us access to the machine learning methods associated with distributed representations.

Given the claims made by these various authors, we decided to explore VSA, more specifically Holographic Reduced Representations (HRR), for implementing Fluid Construction Grammar, and then further translate this representation using existing neural mappings (Eliasmith, 2013). The remainder of this section reflects on what is required for the mapping from FCG to VSA.

Representing FCG entities

Symbols  A symbol in FCG can be mapped to a randomly generated n-dimensional vector. All the elements of the vectors are drawn from a normal distribution N(0, 1/n), following (Plate, 1994). The symbol and its symbol vector are stored in an error-correction memory, as explained later.

Feature-value pairs  A feature-value pair (the primary component of a feature structure) can be mapped to the circular convolution of two vectors, the feature vector and the value vector. Following (Plate, 1994), we define convolution as follows:

    Z = X ⊗ Y                                                  (1)
    z_j = Σ_{k=0}^{n−1} x_k y_{j−k}    for j = 0, ..., n − 1   (2)

Once we have a combined feature-value vector, we can, given the feature, extract the value using circular correlation (Plate, 1994; Neumann, 2001), which convolves the pair with the approximate inverse of the feature vector (Plate, 1994):

    X = Z ⊕ Y                                                  (3)
    x_j = Σ_{k=0}^{n−1} z_k y_{j+k}    for j = 0, ..., n − 1   (4)

Feature-set  A feature-set consists of a set of feature-value pairs. This can be mapped to HRR using vector addition (Plate, 1994):

    Z = X ⊗ Y + T ⊗ S                                          (5)

Feature structures  Feature structures in FCG consist of feature-sets combined into units. Each unit has a unique name which is stored as a symbol in the symbol memory. A feature structure is constructed in the same way as a feature-set, i.e. by convolution and addition, except that, to also include units, we now convolve unit-feature-value triples rather than feature-value pairs. The feature structure is the addition of all triples:

    Z = U ⊗ X ⊗ Y + U ⊗ T ⊗ S                                  (6)
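As an illustration of Equations 1-6 (not part of the implementation reported below; the names random_hrr, bind, unbind and Cleanup are our own), the following Python/numpy sketch encodes a one-unit feature structure using FFT-based circular convolution and retrieves a value through an error-correction (cleanup) memory based on the dot product:

    import numpy as np

    def random_hrr(n, rng):
        """Random symbol vector with elements drawn from N(0, 1/n) (Plate, 1994)."""
        return rng.normal(0.0, 1.0 / np.sqrt(n), n)

    def bind(x, y):
        """Circular convolution x ⊗ y (Equations 1-2), computed via the FFT."""
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

    def unbind(z, y):
        """Circular correlation: convolve z with the approximate inverse of y (Equations 3-4)."""
        return np.real(np.fft.ifft(np.fft.fft(z) * np.conj(np.fft.fft(y))))

    class Cleanup:
        """Error-correction memory: returns the stored symbol closest to a noisy trace."""
        def __init__(self):
            self.items = {}
        def add(self, name, vec):
            self.items[name] = vec
        def closest(self, noisy):
            cos = lambda a, b: np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            return max(self.items, key=lambda name: cos(self.items[name], noisy))

    n, rng = 2048, np.random.default_rng(0)
    memory = Cleanup()
    symbols = {s: random_hrr(n, rng) for s in
               ["?NP-1", "sem-function", "referring", "phrasal-cat", "NP"]}
    for name, vec in symbols.items():
        memory.add(name, vec)

    # A one-unit feature structure: the sum of unit ⊗ feature ⊗ value triples (Equation 6).
    unit = symbols["?NP-1"]
    fs = (bind(bind(unit, symbols["sem-function"]), symbols["referring"])
          + bind(bind(unit, symbols["phrasal-cat"]), symbols["NP"]))

    # Retrieve the value of sem-function in unit ?NP-1 and clean it up.
    trace = unbind(unbind(fs, unit), symbols["sem-function"])
    print(memory.closest(trace))   # expected: 'referring'

Unbinding twice (first by the unit, then by the feature) corresponds to correlating with the unit-feature pair, since the approximate inverse distributes over convolution.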


Since we can represent feature structures, we can also represent transient structures as well as constructions of arbitrary length.

Parts of the structure (also called a trace) can be retrieved using the correlation operation (Equation 3). For example, given U ⊗ X and the whole structure, we can obtain Y.

However, correlation on traces is noisy. A trace preserves just enough information to recognise the result but not to reconstruct it. Therefore, we need an error-correction memory that stores vectors for possible units, features and values. The memory is used to compare the noisy output of the correlation operation with all vectors known to the system. Various comparison measures can be used; the most standard one is the dot product, which for two normalized vectors is equal to the cosine of their angle (Neumann, 2001). We define the following similarity for two vectors:

    sim(X, Y) = (X · Y) / (||X|| ||Y||)                        (7)

This similarity is used to retrieve the vector stored in the error-correction memory with the highest similarity to the output of correlation. That vector represents the most plausible value of the feature in a particular trace.

Matching and Merging

The promise of distributed representations is that they can do very fast operations over complete feature structures (such as comparing them) without traversing the components as would be done in a symbolic implementation. Let us see how far we might be able to get without decomposition. FCG basically needs (i) the ability to copy a feature structure, (ii) to compare two feature structures (matching) and (iii) to merge a feature structure with another one.

Copying  It is not difficult to copy two feature structures because it means copying the two vectors. However, we often need to replace all variables in a feature structure either by new variables (e.g. when copying a construction before applying it) or by their bindings (e.g. when creating the expanded transient structure after matching). It has been suggested that copy-with-variation can be done by convolving the current structure A with a transformation vector T (Plate, 1994):

    A ⊗ T = B                                                  (8)

The transformation vector is constructed by convolving the new values with the inverses of the current values and then adding up the pairs by vector addition. For example, in order to set the value of the lex-cat feature from its current value ?x to the new value which is the binding of ?x, e.g. noun, the inverse of ?x should be convolved with noun. The full transformation vector is

    x′ ⊗ y + z′ ⊗ w                                            (9)

Such vectors can be hand-constructed (Plate, 1994), which is not desirable, or learnt from examples, as shown in (Neumann, 2002).

Matching  In general, matching two feature structures can be done by the same principle that is used in the error-correction memory, i.e. similarity (Equation 7). Since we use the dot product as our similarity measure, we have a computationally fast operation which is well understood mathematically. Using the dot product provides us with a way to compare a feature structure to every structure in a pool of structures and to find the structure with the highest similarity as the closest match. However, this ignores two problems we have not tackled yet: (i) match in FCG is an inclusion rather than a similarity operation: if the source structure is a subset of the target structure, they should still match, even if the similarity between the two structures is low. In fact, this is very common because the lock of a construction is always a subset of the transient structure; and (ii) this does not yet take variable bindings into account.

Merging  Merging two feature structures is straightforward because their respective vector representations can simply be added. It is possible to deal with variable bindings by first transforming both feature structures by replacing the variables by their bindings, as discussed earlier. However, there are also some tricky issues to be resolved (e.g. variables may be bound to other variables, making a variable chain, and then one of these has to be substituted for all the others).
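Continuing the same illustrative sketch (the helper names similarity, involution and transformation are ours; the block reuses bind and random_hrr from above), Equation 7 and the copy-with-variation of Equations 8-9 can be written as follows, with merging reduced to vector addition:

    def similarity(x, y):
        """Normalized dot product of two HRR vectors (Equation 7)."""
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def involution(y):
        """Approximate inverse y′ with y′[j] = y[(n − j) mod n] (Plate, 1994)."""
        return np.concatenate(([y[0]], y[1:][::-1]))

    def transformation(substitutions):
        """Transformation vector of Equation 9: sum of old-value-inverse ⊗ new-value."""
        return sum(bind(involution(old), new) for old, new in substitutions)

    # Copy-with-variation (Equation 8): replace the binding ?x by 'noun'
    # inside the bound pair lex-cat ⊗ ?x.
    lex_cat, var_x, noun = (random_hrr(n, rng) for _ in range(3))
    pair = bind(lex_cat, var_x)
    substituted = bind(pair, transformation([(var_x, noun)]))
    print(similarity(substituted, bind(lex_cat, noun)))    # clearly above chance, though noisy
    print(similarity(substituted, bind(lex_cat, var_x)))   # much lower: the old binding is gone

    # Merging two feature structures is simply the addition of their HRR vectors.
    merged = pair + substituted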


Preliminary implementation experiments

We now report on first steps in implementing the FCG→VSA mapping described above. Experiments were carried out in Python using the mathematical extension numpy.

Feature encoding and retrieval  First, we tested the precision of value retrieval from a feature set and a feature structure. We were particularly interested in the relationship between HRR vector dimensionality, the length of the FCG feature structure and retrieval accuracy. We therefore tested different lengths of FCG feature sets/structures (5, 50, 500) against different dimensionalities of HRR vectors (10, 100, 1,000, etc.). We did 100 runs for each combination and averaged the results. Each time, HRR vectors for individual features were randomly initialized and combined into a feature-structure HRR vector using convolution and addition. Then we attempted to retrieve all feature values and measured the number of correct retrievals divided by the original FCG feature set length. Figure 1 (top) illustrates how the precision score increases with vectors of higher dimensionality, consistent with previous experiments with HRR (Neumann, 2001). To encode FCG feature sets with an average length of about 50-100 features, we required around 3,000-dimensional HRR vectors. This figure also illustrates how differences in HRR vector dimensionality are related to the cardinality of the feature set. For example, in order to represent and successfully retrieve all values from a 5-pair set, around 300 dimensions appear to be sufficient, while a 500-pair feature set requires just over 30,000. Our feature-value pairs behave in accordance with (Plate, 1994), which can be described as follows:

    n = 3.16 (k − 0.25) ln(m / q³)                             (10)

where n is a lower bound on the dimensionality of the vectors in order for retrieval to have a probability of error q, k is the number of pairs in a trace and m is the number of vectors in the error-correction memory. For example, to have a q of 10⁻¹ in a 5-pair trace with 1,500 items in the error-correction memory, n should be approximately 213. For a smaller q of 10⁻², around 300 dimensions are required. This roughly follows the n and q observed and illustrated in Figure 1 (top).

Feature structures (triples) behave similarly to pairs, although the dimensions required to encode triples increase. Figure 1 (bottom) illustrates how both feature sets and feature structures scale for various structure sizes (5; 10; 50; 100; 500; 1,000 pairs/triples). These results can be directly translated to FCG. A toy grammar starts at 1-5 units per construction with 1-10 feature-value pairs in each. A more complex grammar can have around 10 units with approximately the same number of pairs in each unit. Represented as triples, such structures can be encoded in vectors of around 6,000 dimensions. Really large grammars of 30 units and 30 feature-value pairs in each unit require roughly 100,000 dimensions.

Figure 1: Top: The effects of dimensionality on precision scores in feature sets and structures of various lengths. Bottom: Scaling of sets and structures (vector dimensionality at which precision scores become 1.0).

Matching  We numerically investigated whether HRR representations can be used to implement the FCG match operation, in two phases, by studying changes in sim(X, Y) under various conditions, working with feature sets (rather than feature structures) for simplicity.

First, we investigated how the similarity (Equation 7) between two HRR vectors responds to structural changes versus changes in underlying feature values. Figure 2 (top, bottom) shows that changes to feature values result in a greater decrease in similarity (reaching 0.0 after 10² changes for a 100-pair structure) than structural changes, i.e. adding new pairs, which led to a more gradual similarity degradation (reaching 0.0 after 10⁵ additions for the same structure size). The difference between these two types of changes is important in FCG, where structures can be structurally different and still match, while structures that, for example, differ in feature values should not. This finding is also in line with previous experiments comparing HRR structures using the dot product (Plate, 1994), where similarity was more sensitive to the content of feature-value pairs than to the quantity of pairs.

Figure 2: Top: Comparison of similarity values for changes in structure vs changes in bindings for a structure of 10,000 pairs. Bottom: Comparison of similarity as structures of different original length are extended.
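As a rough check of these numbers, the sketch below (again illustrative only; it reuses numpy and the bind, unbind and random_hrr helpers from the earlier sketches, and the function names are ours) computes the dimensionality bound of Equation 10 and runs a miniature version of the retrieval-precision experiment:

    import math

    def dimension_bound(k, m, q):
        """Lower bound on n for k pairs, m cleanup-memory items and
        error probability q (Equation 10, after Plate, 1994)."""
        return 3.16 * (k - 0.25) * math.log(m / q ** 3)

    print(round(dimension_bound(5, 1500, 1e-1)))   # 213, the worked example above
    print(round(dimension_bound(5, 1500, 1e-2)))   # about 317, i.e. roughly 300

    def retrieval_precision(num_pairs, n, rng, memory_size=1500):
        """Encode num_pairs feature-value pairs in one trace and report the
        fraction of values retrieved correctly after cleanup."""
        features = [random_hrr(n, rng) for _ in range(num_pairs)]
        values = [random_hrr(n, rng) for _ in range(num_pairs)]
        distractors = [random_hrr(n, rng) for _ in range(memory_size - num_pairs)]
        candidates = values + distractors          # the error-correction memory
        trace = sum(bind(f, v) for f, v in zip(features, values))
        correct = 0
        for i, f in enumerate(features):
            noisy = unbind(trace, f)
            best = max(range(len(candidates)),
                       key=lambda j: np.dot(candidates[j], noisy))
            correct += int(best == i)
        return correct / num_pairs

    print(retrieval_precision(5, 300, rng))    # typically 1.0 for a 5-pair set
    print(retrieval_precision(50, 300, rng))   # substantially lower at this dimensionality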


When adding or removing new pairs, similarity (Equation 7) is affected as illustrated in Figure 3. Initially, both structures contained 1,000 pairs; the second structure was subsequently changed by an order of magnitude at a time. Thus the first structure gradually became a subset of the second. The graph illustrates that as the structure becomes extended with new pairs, the similarity of the two structures begins to drop, despite the fact that the structures share the initial 1,000 pairs. However, this degradation is very gradual, and similarities reach 0.0 only after 10⁵ new pairs have been added. Furthermore, it can be seen that for larger structures such degradation is more gradual than for smaller ones (see Figure 2, bottom). For example, for 10-pair structures similarity is almost 0.0 after 10³ new pairs have been added. This is still, however, fairly gradual, considering that a 10-pair structure had 1,000 pairs added to it before becoming dissimilar to the original. The drop in sim(X, Y) appears to be asymmetrical: removing pairs gives lower similarity than adding the same number of pairs. This is expected, since removing pairs results in less shared information between the structures than adding pairs.

These findings are good and bad news at the same time. On the one hand, it is good that feature value changes have a more drastic effect on sim(X, Y) than structure changes. On the other hand, the system will have to be able to autonomously find out whether two HRR vectors represent feature sets which differ structurally or in terms of feature values. Possibly this distinction can be learnt. But finding a solution that is invariant to HRR vector dimension size and feature set cardinality is likely not an easy task. Another problem is the commutative nature of sim(X, Y), which essentially does not allow one to determine which is the more inclusive feature set.

Figure 3: Gradual changes to a feature set of 1,000 feature-value pairs and their effects on similarity values.

Conclusions

This paper has speculated how a linguistically and computationally adequate formalism for language, namely Fluid Construction Grammar, could be represented in a Vector Symbolic Architecture, more specifically Holographic Reduced Representations, as a step towards a neural implementation. We proposed a number of steps and reported some preliminary implementation experiments with promising results. The main conclusion so far is that a number of fundamental issues remain to be solved to make the FCG→VSA mapping fully operational, particularly the issue of implementing a matching operation that uses a binding list and possibly extends it while matching takes place.

Acknowledgments

Research reported in this paper was funded by the Marie Curie ESSENCE ITN and carried out at the AI lab, Vrije Universiteit Brussel, and the Institut de Biologia Evolutiva (UPF-CSIC), Barcelona, financed by the FET OPEN Insight Project and the Marie Curie Integration Grant EVOLAN. We are indebted to comments from Johan Loeckx and Emilia Garcia Casademont.

References

Eliasmith, C. (2013). How to build a brain: A neural architecture for biological cognition. New York, NY: Oxford University Press.

Gayler, R. W., Levy, S. D., & Bod, R. (2010). Explanatory aspirations and the scandal of cognitive neuroscience. In Proceedings of the First Annual Meeting of the BICA Society (pp. 42–51). Amsterdam: IOS Press.

Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Hinton, G. (1990). Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46, 47–75.

Kanerva, P. (1997). Fully distributed representation. In Proc. Real World Computing Symposium (pp. 358–365). Tsukuba-city, Japan: Real World Computing Partnership.

Neumann, J. (2001). Holistic processing of hierarchical structures in connectionist networks. Doctoral dissertation, School of Informatics, The University of Edinburgh, UK.

Neumann, J. (2002). Learning the systematic transformation of holographic reduced representations. Cognitive Systems Research, 3, 227–235.

Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19, 113–126.

Ohlsson, S., & Langley, P. (1985). Identifying solution paths in cognitive diagnosis (Tech. Rep. No. CMU-RI-TR-85-2). Pittsburgh, PA: Carnegie Mellon University, The Robotics Institute.

Plate, T. A. (1994). Distributed representations and nested compositional structure. Doctoral dissertation, Graduate Department of Computer Science, University of Toronto.

Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables, and dynamic bindings. Behavioral and Brain Sciences, 16, 417–494.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46, 159–216.

Steels, L. (Ed.). (2011). Design patterns in Fluid Construction Grammar. Amsterdam: John Benjamins.

Steels, L. (Ed.). (2012). Experiments in cultural language evolution. Amsterdam: John Benjamins.

Steels, L., & Hild, M. (Eds.). (2012). Language grounding in robots. New York: Springer-Verlag.

