=Paper=
{{Paper
|id=Vol-1419/paper0039
|storemode=property
|title=Usage-based Grammar Learning as Insight Problem Solving
|pdfUrl=https://ceur-ws.org/Vol-1419/paper0039.pdf
|volume=Vol-1419
|dblpUrl=https://dblp.org/rec/conf/eapcogsci/CasademontS15
}}
==Usage-based Grammar Learning as Insight Problem Solving==
Emilia Garcia-Casademont (2), Luc Steels (1,2)

(1) ICREA; (2) Institut de Biologia Evolutiva (UPF-CSIC), Dr. Aiguader 88, Barcelona 08003, Spain

Abstract: We report on computational experiments in which a learning agent incrementally acquires grammar from a tutoring agent through situated embodied interactions. The learner is able to detect impasses in routine language processing, such as missing a grammatical construction to integrate a word in the rest of the sentence structure, to move to a meta-level to repair these impasses, primarily based on semantics, and then to expand or restructure his grammar using the insights gained from repairs. The paper proposes a cognitive architecture able to support this kind of insight learning and tests it on a grammar learning task.

Keywords: Usage-based language learning; insight problem solving; Fluid Construction Grammar.

===What is usage-based language learning===

Many researchers have argued for a usage-based approach (Langacker, 2000) to language learning, based on empirical data from child language acquisition (Tomasello, 1992). This approach emphasizes that learning is essentially data-driven and gradual (Bybee, 2006). It takes place in embodied, situated interactions in which not only the utterance and its syntactic properties (e.g. the ordering of the words) but also the underlying meanings and the communicative context are available to the learner.

The usage-based approach goes hand in hand with a constructional perspective on grammar (Goldberg, 2006), which emphasizes that language structure is motivated by usage instead of by innate, arbitrary, formal principles of universal grammar. Grammar learners have to discover what role grammatical markers and syntactic patterns play in expressing meaning and achieving communicative functions, and how syntax helps to dampen combinatorial explosions and avoid ambiguity. Construction grammar therefore argues that the fundamental unit of grammar is the construction, a sign that relates meaning or function with form through the intermediary of syntactic and semantic categorizations. A construction is very rich, packaging constraints at different layers of language (from phonetics and phonology to morphology, syntax, semantics and pragmatics) and from many different perspectives (phrase structure, functional structure, argument structure, temporal structure, information structure, etc.). Constructions are acquired in a gradual, data-driven way, with learners progressively creating a huge inventory of ever more sophisticated constructions linked in networks (Tomasello, 1992).

Computational simulations of usage-based constructional learning are rare, despite widespread support for a usage-based approach in cognitive science circles and the growing success of construction grammar in empirical linguistics, studies of child language, historical linguistics and other areas of language studies, even though there is some exceptional work, such as that of Nancy Chang (Chang, 2008). This computational gap is undoubtedly due to the fact that the field of computational construction grammar is still in its infancy and not enough computational research has been done so far on possible learning strategies.

Nevertheless, computer simulations are a valuable complement to the empirical work on language learning. They help us to get a much clearer idea of what constructions look like from a computational point of view and how they are acquired. Computer simulations allow us to test the role of different cognitive mechanisms for grammar learning by performing experiments with and without these mechanisms and by varying the learning challenges and the nature and amount of the input the learner gets.

===Insight Problem Solving===

Today, most machine learning of language uses a form of Bayesian unsupervised grammar learning operating over large amounts of data (Bod & Scha, 2003). Because these models are data-driven, they subscribe to one of the key tenets of a usage-based approach to grammar learning, and they have therefore already been used for simulating child grammar acquisition (Bannard, Lieven, & Tomasello, 2009). Although this approach is obviously relevant, we explore here a complementary learning method which views grammar learning as a form of insight problem solving.

Insight problem solving is familiar from the psychological literature on problem solving (Ohlsson, 1984). It makes a distinction between two modes: routine and meta-level problem solving. For routine problem solving, the problem solver uses a set of micro-operations that either provide a direct solution or are a step towards a solution. The main challenge for the problem solver is to find a path between the initial problem state and the goal in a search space of possible hypotheses.

Meta-level problem solving is necessary when the problem solver reaches an impasse: the given statement of the problem (as understood by the problem solver), the representations being used, and the known operators do not allow a straightforward solution. Meta-level operators may then unblock the impasse. For example, they may reinterpret the problem statement by relaxing some of its constraints (as needed for solving the 9-dot problem), they may change internal representations inferred from the problem (as in the Anthony and Cleopatra problem (Patrick & Ahmed, 2014)), or they may introduce new operators.
One very common meta-operator, studied in particular by Koehler in his classic insight learning experiments with chimpanzees, is to coerce objects to have functions that they normally do not have (Koehler, 1956), for example, to view a shoe as a hammer so that it can play the role of the instrument in a hitting action.

In the case of language, the micro-operations for routine problem solving are the application of constructions, which expand a transient structure (capturing a particular hypothesis for comprehending or producing an utterance) into a more extended transient structure. For example, a construction might combine an article and a noun into a noun phrase. Although some exploration and backtracking may be necessary to know which construction needs to be applied, the problem solver is in principle able to solve the problem, i.e. to reconstruct the meaning of an utterance as listener, or to produce an utterance expressing the target meaning as speaker.

An impasse here means, for example, that the hearer encounters an unknown word, a particular word does not fit within the syntactic pattern implied by its immediate context, an ordering constraint is violated, no structure can be found that integrates all words in the utterance, or there is syntactic and semantic ambiguity, implying that some grammatical signals preventing ambiguity have not been picked up properly. These impasses are frequent when learning a new language. The speaker may also reach an impasse because, for example, he may lack a word to express a particular meaning, a word he would like to use does not fit with a construction already chosen, the available constructions may leave significant ambiguity, etc.

Meta-level problem solving for language involves a set of meta-operators and a cognitive architecture as in Figure 1. For example, the listener could try to handle a sentence which he cannot parse by ignoring possible errors in it (e.g. a wrong word order). Or he may handle the presence of an unknown word by inferring, from the syntactic context, the situation and the ontology, what a possible meaning might be. The speaker could coerce a word to have an additional lexical category so that it fits, as in "He googled him", where the noun "google" has been coerced into a verb.

Figure 1: Cognitive architecture supporting insight language learning. There is a layer of routine problem solving, diagnostics continuously monitor activity at this layer in order to discover an impasse, and meta-level operators try to repair the impasse, for example by coercing a word to fit with an available grammatical construction.

But that is not the end of the story. After solving a problem using meta-operators, a learner should then try to expand his available repertoire of routine operators (i.e. constructions) to capture the insight gained from dealing with the impasse, so that the impasse does not occur anymore in the future. When that is the case, we talk about insight learning (Koehler, 1956). It is based on a set of learning-operators that make small changes to the grammar, for example adding a new lexical category to the lexical construction of a word, or making a construction more general by relaxing some constraints on one of its slots.

The rest of the paper reports on experiments attempting to computationally implement insight language learning. Our goal is not to handle the full complexity of human language (that would not be possible at this point, and the results would be very hard to analyse) but rather to create a minimal model in order to test the feasibility of the approach and examine the first concrete examples of meta-operators and learning-operators. We therefore use a minimal representation of meaning and focus on a minimal form of syntactic structure, namely phrase structure.

The experiments are intended for the MYON robotic platform shown in Figure 2 (Steels & Hild, 2012). There are two humanoid robots, one acting as tutor and the other one as learner. The situation is manipulated by an experimenter who puts objects on the table (e.g. a small piece of paper, a wooden block, etc.) and performs actions with them. The robots are able to perceive the situation before them, segment the objects and detect relations, such as spatial relations or movements. They can also point to objects and gesture success or failure in the language game. The relations are compositional (as in "a black round wooden block") with unlimited recursion (as in "a small paper which is on a small paper which is on the table"). The situation model can also include mental states (e.g. "Jorge wants the wooden block (to be) on the table").

Figure 2: Example of interaction between two agents, one acting as tutor and the other as learner. A human experimenter creates scenes about which the agents can converse. The robots are equipped with sophisticated components for perception, action, world modeling, and script following in order to play a language game. Grammar learning and tutoring take place within the context of such games.
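The routine/meta-level split of Figure 1 can be illustrated with a small, self-contained sketch. This is our own toy Python illustration, not the authors' implementation (which is built on Fluid Construction Grammar): "constructions" are simplified to rewrite rules over category sequences, an impasse is diagnosed when no rule applies, and a coercion repair recategorizes a word so that routine processing can continue.

```python
# Toy sketch of the control flow in Figure 1 (our own illustration, not the
# authors' code). Constructions are rewrite rules over category sequences.
CONSTRUCTIONS = {
    ("adjective", "noun"): "noun-phrase",
    ("noun-phrase", "verb"): "clause",
}

def routine_step(cats):
    """Apply the first matching construction; None signals an impasse."""
    for i in range(len(cats) - 1):
        phrase = CONSTRUCTIONS.get((cats[i], cats[i + 1]))
        if phrase:
            return cats[:i] + [phrase] + cats[i + 2:]
    return None

def coerce(cats, potential):
    """Meta-level repair: recategorize one unit (category -> alternatives,
    a stand-in for per-word potential lexical categories) so a rule fits."""
    for i, cat in enumerate(cats):
        for alt in potential.get(cat, []):
            trial = cats[:i] + [alt] + cats[i + 1:]
            if routine_step(trial) is not None:
                return trial
    return None

def parse(cats, potential):
    """Reduce a category sequence to a single top category, or None."""
    while len(cats) > 1:
        nxt = routine_step(cats)
        if nxt is None:                 # impasse diagnosed
            cats = coerce(cats, potential)
            if cats is None:            # unrepairable: the game fails
                return None
        else:
            cats = nxt                  # routine construction application
    return cats[0]
```

For instance, `parse(["adjective", "noun", "noun"], {"noun": ["verb"]})` first builds a noun phrase, hits an impasse, coerces the remaining noun into a verb (the "googled" case), and completes with a clause.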
The next section first describes routine language processing. Then we turn to meta-level problem solving, discussing syntactic and semantic expansions, and to learning operators. The paper concludes with some experimental results. There is an on-line web demo and additional material available at www.biologiaevolutiva.org/lsteels/demos/EAP-garcia-steels-2015/

==='Routine' Language Processing===

====Meaning representation====

The complex visual processing and conceptualization processes required for these experiments have been described at length in (Spranger, Loetzsch, & Steels, 2012). The situation models of speaker and hearer are not identical because their perception is different, but in general they are sufficiently shared that communication is possible; otherwise the language game fails. For the purposes of the experiments reported here, we have defined a 'situation model generator' that generates possible situation models in the same format as obtained by visual processing.

For the representation of the situation model we use well-known standard techniques from (typed second-order) logic. The situation model is internally represented in terms of n-ary predicates, which have objects perceived in the world as arguments. The objects are labeled obj-1, obj-2, etc. The objects can be bound to variables, which are written with a question mark in front, as in ?obj-1, ?obj-2, etc. Each predicate consists of an attribute, which also acts as a type, and a value.

Unary predicates are written as

 (attribute value object-to-which-predicate-applies)

as in (color red obj-1) or (material plastic ?obj-6). In the first example, the color red is predicated about obj-1, i.e. obj-1 is red in the current situation model. In the second example, the material property plastic is predicated about a variable object ?obj-6. It will be true if there is an object in the world which has the material property plastic.

N-ary predicates are decomposed into a number of separate predicates: one for the predicate itself and then predicates for each of the arguments. For example, suppose there is a predicate of type moving (which covers all types of movement) with a possible value away; then there are two predicates for its arguments, as illustrated in the following example.

 (moving away ?r-2)      ; the main predicate
 (away-arg-1 ?r-2 ?o-3)  ; the object that is moving
 (away-arg-2 ?r-2 ?o-1)  ; the object being moved away from

?r-2 gets bound to the event of moving, ?o-3 to the object that is moving, and ?o-1 to the object ?o-3 moves away from.

Different predications can be combined into conjunctions, and they are linked together by reusing the same variable names or object names. For example, the utterance "a small paper moves away from a wooden table" would be represented as

 (moving away ?r-2)      ; the main predicate
 (away-arg-1 ?r-2 ?o-3)  ; the object that is moving
 (away-arg-2 ?r-2 ?o-1)  ; away from
 (size small ?o-3)       ; the moving object is small
 (material paper ?o-3)   ; and its material is paper
 (physobj table ?o-1)    ; the movement is away from a table
 (material wood ?o-1)    ; which is made of wood

The different linkages between the predications through their argument equalities may be represented graphically as a semantic network (see Figure 3). One of the objects in this network is the topic of the utterance, for example the event itself (i.e. ?r-2), or the object which is moving (i.e. ?o-3), as in the utterance "the small paper moving-away from the wooden table".

Figure 3: Semantic network representing the meaning of an utterance. Each node is a predication and the links represent co-referential relations between the arguments.

To simplify the experiments, we use utterances with only content-carrying lexical items, and only word order and phrase structure as the means for expressing syntactic structure, ignoring other syntactic devices like morphology, grammatical function words, agreement, etc. Thus, the utterance "a small paper moves away from a wooden table" would be rendered as "small paper moves-away-from wooden table". There is of course no necessity to use English-like words; that is only done to make the utterances understandable for the reader, and there is no reason why the grammar will be English-like, except that English also makes heavy use of phrase structure.

====Grammar representation====

We use Fluid Construction Grammar (FCG) for the representation of the grammar (Steels, 2011). FCG views language processing in terms of operations over transient structures. A transient structure captures all that is known about a particular utterance being parsed or produced. In routine language processing, transient structures are expanded by the application of constructions in a process of matching (to see whether the construction can apply) and merging (to add information from the construction to the transient structure).

FCG represents transient structures in terms of feature structures, similar to many other computational formalisms in use today, such as HPSG. A feature structure consists of a set of units which correspond to words and phrases, and features associated with these units. The features can be at any level of linguistic description.
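The meaning representation just described can be written down directly as data. The following is a minimal Python sketch of our own (not the authors' FCG code; the helper names are hypothetical): predications are tuples, terms starting with "?" are variables, and the links of the semantic network are the co-references between predications that share a variable.

```python
# Minimal sketch (our own illustration): predications as tuples, semantic
# network links recovered from shared variables.
from itertools import combinations

def is_variable(term):
    return isinstance(term, str) and term.startswith("?")

def semantic_network(predications):
    """Return links (i, j, var): predications i and j share variable var."""
    links = []
    for (i, p), (j, q) in combinations(enumerate(predications), 2):
        shared = {t for t in p[1:] if is_variable(t)} & set(q[1:])
        for var in sorted(shared):
            links.append((i, j, var))
    return links

# "a small paper moves away from a wooden table"
meaning = [
    ("moving", "away", "?r-2"),        # the main predicate
    ("away-arg-1", "?r-2", "?o-3"),    # the object that is moving
    ("away-arg-2", "?r-2", "?o-1"),    # away from
    ("size", "small", "?o-3"),         # the moving object is small
    ("material", "paper", "?o-3"),     # and its material is paper
    ("physobj", "table", "?o-1"),      # the movement is away from a table
    ("material", "wood", "?o-1"),      # which is made of wood
]
```

Each link corresponds to one edge of the semantic network of Figure 3, e.g. the main predicate and (size small ?o-3) are connected through ?r-2 and ?o-3 respectively.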
For the present experiments, {?word-unit-1, ?word-unit-2} features of a unit include: meaning (the set of predications), sem-cat: material ← syn-cat: noun-phrase referent (the object referred to by the unit), form (the strings head: word-unit-2 and ordering relations), args (the arguments of these predica- referent: ?obj tions which can be used to combine this unit with the mean- ?word-unit-1 ?word-unit-2 ing of other units), sem-cat (the semantic categorizations), sem-cat: size sem-cat: material boundaries (the left and right boundaries), sem-subunits and referent: ?obj ≤ referent: ?obj syn-subunits (the constituents), potential-syn-cat (the poten- potential-syn-cat: potential-syn-cat: tial lexical categories (parts of speech)), syn-cat (the actual {adjective} {noun} lexical category or the phrasal category), head (the subunit acting as the head of the phrase), footprints (constructions that changed this unit). Meta-level processing A construction is an association between meaning and The consecutive application of constructions expands the function (the semantic pole) and form constraints (the syntac- transient structure to go from meaning to form or vice versa. tic pole). Lexical constructions associate one or more pred- But occasions will arise when no construction can be applied, icates and a semantic categorization of the predicates (equal particularly in language learning. The listener then moves to to the attribute (i.e. type) of the predicate) in the semantic the meta-level to repair the situation and then consolidate the pole with the occurrence of a word-string and lexical cate- outcome of the repair. gories (i.e. parts of speech) of that word in the syntactic pole. 
Grammatical constructions, in this case restricted to phrase Syntactic meta-operators structure constructions, associate a pattern of units with se- The listener can try to find a construction which is only par- mantic constraints in the semantic pole with syntactic cate- tially matching, and either coerce words to fit into that con- gories (lexical or phrasal categories) and word orders in the struction (syntactic coercion) or expand the applicability of syntactic pole. Each construction has a score (between 0.0 the construction (extension). More specifically, and 1.0) which reflects the success of that construction in past + Coercion: A construction is found that is semantically language interactions. compatible but one or more words do not have the appro- Constructions in FCG consist of two parts. A conditional priate lexical category (as in the example of “googled” where part (written on the right hand side) which specifies under a noun occurs in a context where a verb is expected). The which conditions a construction can become active and a con- meta-operator then adds the lexical category to the word-unit tributing part (written on the left hand side) which specifies in the transient structure and the construction then applies. what the construction contributes to the transient structure. + Extension: A construction is found for which all the com- Constructions must be usable both in comprehension and pro- ponents match but there is another word ordering. In this duction. So the conditional part is split into two ’locks’. A case, the ordering constraint can be relaxed and the construc- production lock (on top) which has to match with the transient tion applied. structure in production and a comprehension lock (below it) + Reduction: All components of an existing construction which has to match in comprehension. 
When a lock fits with could be matching with the transient structure but there is a a transient structure all the information of the construction superfluous unit and no matching construction without this (the other lock and the contributing part) gets merged in. The unit. This superfluous unit can be ignored and the construc- fact that constructions in FCG can be used both for compre- tion as well as possible applied. hension and production is crucial for insight learning because Semantic meta-operators once the learner has been able to deduce the meaning (pos- sibly indirectly), he can produce the same meaning himself When no partially matching constructions could be found, it with his own inventory and thus compare it to the utterance may be possible to use the situation model and combine units produced by the speaker. for words or phrases based on semantics, specifically: A lexical construction for the word “paper” is: + Build-or-extend-group: If two words or word groups ex- pressing unary predicates refer to the same object, they can ?word ?word be combined. For example, if there is a group for wooden sem-cat: material # meaning: potential-syn-cat: ← {(material paper ?obj)} table (based on an existing construction) and the utterance is small wooden table, the word-unit for small can be linked in {adj, noun} # form: referent: ?obj {(string ?word paper)} with the group-unit for wooden table. The group-unit retains the same referent. “Paper” has two potential lexical categories: adjective (as + Build-hierarchy When a relational word is encountered, in “a paper basket”) and noun (as in “a small paper”). Here is i.e. a word which introduces a predicate with more than 261 one argument, such as moves-away-from, and no construc- a stream of utterances each describing a particular topic (ob- tions are available to integrate it in the transient structure, the ject or event) in a scene. 
Some example utterances are “”Paul meta-operator looks in the world-model to detect which ob- sees (the) red carpet (that) Emilia wants”, or “Paul believes ject plays which role and then combines the units for these (that) Emilia wants (the) carpet on (the) big wooden table.”. objects into a new hierarchical unit. The meta-operator also The learner is initialized with the same lexicon (because we needs to decide which of the arguments is going to be the ref- focus here on the acquisition of grammar). He is endowed erent. In principle, it could be any argument (e.g. the event of with the various operators described above, but without any moving, the mover, or the object being moved away from). In grammatical constructions or syntactic categories (parts of practice, the referent is determined by the semantic network speech and phrases). that is expressed. For example, in the sentence ”the ball on In a concrete interaction (following the classical Naming the table”, the referent of the unit based on the relational word Game), tutor and learner share the same situation model (rep- ”on” is the ball, whereas in ”He wants the ball (to be) on the resented as a semantic network as in Figure 2). The tutor then table” the referent is the on-relation itself. chooses a topic (one object to be referred to) and produces a description of this topic using his lexicon and grammar. Then Consolidation the learner tries to parse the utterance and interpret the ex- When an utterance could be successfully parsed after one tracted meaning against the situation model. The interaction or more repairs, the learner activates consolidation operators is successful when the learner was able to identify the topic. that integrate the insights that were obtained into his construc- This may involve both routine and meta-level processing fol- tion inventory. In some cases this is straightforward, for ex- lowed by learning. 
Each experiment is carried out for 5 dif- ample, coercion can be consolidated by adding the relevant ferent tutor-learner pairs, using random choices from a set 20 lexical category to the potential lexical categories of a word. situations, so that results can be compared. In other cases, more complex manipulations are required. Ex- isting constructions are first copied and then additions and Experiment 1. Lexical categories given changes made. The first experiment tests out the learning operators assuming that the learner already knows the lexical categories, i.e. parts Alignment of speech, of the words in the lexicon. Moreover grammatical The meta-operators and learning-operators are hypotheses constructions are limited to those having only one semantic made by the learner about the language of the speaker and category and one syntactic category per subunit. The task mistakes are unavoidable. The learner therefore needs an ad- is to learn the grammatical constructions. Figure 4 shows ditional mechanism to progressively discard wrong hypothe- the results, measuring communicative success, inventory size ses based on further input. We have achieved this by updating (the number of constructions of the learner) and alignment the score of constructions using the well known lateral inhi- (how often the listener’s reproduction of the meaning of an bition learning rule. Knowing which constructions ci need utterance is equal to the utterance of the speaker). an increased score is easy: they are the constructions that ������������������� were used on the path towards the final transient structure. �� ��� We use the following update rule: σci ← σci (1 − γ) + γ, with ��� γ = 0.1 a constant. What competing constructions c j need to ���� ��� ����������������������� be decreased? First of all, all constructions that started off ���� ��� a wrong branch in the search space during comprehension, i.e. a branch which is not on the path to the final solution. 
���� ��� Next, the listener produces himself the utterance based on the ��� ���� meaning deduced from the comprehension process and then �� finds all constructions that start off a wrong branch while pro- �� �� �� ���� ���� ���� ���� ���� ���� ���� ���� ducing. Their scores need to be decreased as well. We use the ������������ ��������� ����������������������� ��������������������� following update rule: σc j ← σc j (1 − γ). Results Figure 4: Grammar learning with lexical categories known. 800 language games are shown (x-axis). The learner reaches the same We now report on two (of many more) experiments exercis- number of grammatical constructions (namely 30 on right y-axis) ing the cognitive architecture in Figure 1 and the meta- and and a total alignment (left y-axis), demonstrating that he success- learning-operators from the previous section. We have imple- fully acquired the grammar. The shading around the lines represent the confidence interval of 5 runs. mented a tutor which is initialized with a lexicon of 40 lex- ical constructions and a grammar with 30 grammatical con- structions. The tutor grammar includes adverbs, adjectives, Experiment 2. Multiple semantic categories nouns, verbs, prepositions, pronouns and relative pronouns The second experiment assumes that the learner does not as well as noun phrases of different levels of complexity, verb know any a priori lexical categories. This obviously makes phrases, main clauses and relative clauses. The tutor produces the learning problem harder. Also, grammatical constructions 262 can have more than one semantic category per slot, which Acknowledgments means that constructions can be more general. They still The research reported here has been funded by an ICREA Re- have only one syntactic category for slots, whereas individ- search fellowship to LS and a Marie Curie Integration Grant ual words can have several lexical categories (syncretism). EVOLAN. 
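The lateral inhibition rule of the Alignment section is easy to state as code. The following is a Python sketch of our own (the function name and data layout are assumptions; the paper specifies only the two update formulas with γ = 0.1): constructions on the path to the final transient structure are rewarded, competitors that started wrong branches are punished.

```python
# Lateral inhibition score update (sketch; names are our own assumptions).
# Rewarded:  sigma <- sigma * (1 - gamma) + gamma   (used on solution path)
# Punished:  sigma <- sigma * (1 - gamma)           (started a wrong branch)
def align(scores, used, competing, gamma=0.1):
    """Return updated construction scores (dict: name -> score in [0, 1])."""
    new = dict(scores)  # leave the input untouched
    for c in used:
        new[c] = new[c] * (1 - gamma) + gamma
    for c in competing:
        new[c] = new[c] * (1 - gamma)
    return new
```

Repeated application drives the scores of consistently successful constructions towards 1.0 and those of their competitors towards 0.0, which is how wrong hypotheses are progressively discarded.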
===Experiment 2. Multiple semantic categories===

The second experiment assumes that the learner does not know any lexical categories a priori. This obviously makes the learning problem harder. Also, grammatical constructions can now have more than one semantic category per slot, which means that constructions can be more general. They still have only one syntactic category per slot, whereas individual words can have several lexical categories (syncretism). Because constructions can be more general, we end up with a smaller inventory.

Results for inventory size and alignment are shown in Figure 5. The graph also shows syntactic ambiguity (the number of superfluous branches when applying constructions, plus the number of possible combinations in the semantic meta-operators, divided by the number of words in the utterance) and semantic ambiguity (the number of situations considered but cut off in erroneous branches, divided by the number of variables introduced by all words). Thanks to grammar, both types of ambiguity are drastically reduced, which shows that an important function of grammar is to dampen combinatorial search.

Figure 5: Grammar learning without lexical categories known. The learner reaches fewer grammatical constructions (25) and total alignment, although this takes longer compared to Figure 4.

===Conclusion===

The paper explored the hypothesis that usage-based grammar learning can be modeled as an insight problem solving process. The computational simulations showed that this is possible, requiring (i) a strong semantic component to make meaning-driven learning possible, (ii) a meta-level architecture similar to SOAR or ACT-R, and (iii) appropriate learning and alignment operators.

Of course, the model is much simpler than the full complexity of the resources that human language learners bring to bear on language acquisition. However, the advantage of a simpler system is that the minimal characteristics of a language learner become more starkly visible. Moreover, this method allows us to causally examine the impact of operators, and thus supports empirical research into which operators are available to human language learners. The simulations have already been scaled up to sentences with much higher complexity, including full recursion, and will in the future be tested against corpora of child-directed speech, such as CHILDES.

===Acknowledgments===

The research reported here has been funded by an ICREA Research Fellowship to LS and a Marie Curie Integration Grant EVOLAN. The project has received further funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 308943, the FET-OPEN Insight project. We are indebted to Paul Van Eecke for comments.

===References===

* Bannard, C., Lieven, E., & Tomasello, M. (2009). Modeling children's early grammatical knowledge. PNAS, 106(41), 17284–17289.
* Bod, R., & Scha, R. (2003). Data-oriented parsing. Stanford: Center for the Study of Language and Information.
* Bybee, J. (2006). From usage to grammar: The mind's response to repetition. Language, 82(4), 711–733.
* Chang, N. (2008). Constructing grammar: A computational model of the emergence of early constructions. Berkeley: Computer Science Division, University of California, Berkeley.
* Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
* Koehler, W. (1956). The mentality of apes. New York: Routledge and Kegan Paul.
* Langacker, R. W. (2000). A dynamic usage-based model. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language (pp. 1–63).
* Ohlsson, S. (1984). Restructuring revisited: II. An information processing theory of restructuring and insight. Scandinavian Journal of Psychology, 25, 117–129.
* Patrick, J., & Ahmed, A. (2014). Facilitating representation change in insight problems through training. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(2), 532–543.
* Spranger, M., Loetzsch, M., & Steels, L. (2012). A perceptual system for language game experiments. In L. Steels & M. Hild (Eds.), Language grounding in robots (pp. 89–110). New York: Springer.
* Steels, L. (Ed.). (2011). Design patterns in Fluid Construction Grammar. Amsterdam: John Benjamins.
* Steels, L., & Hild, M. (Eds.). (2012). Language grounding in robots. New York: Springer.
* Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge: Cambridge University Press.