Body-Mind-Language: Multilingual Knowledge Extraction Based on Embodied Cognition

Dagmar Gromann1 and Maria M. Hedblom2
1 Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain
2 Free University of Bozen-Bolzano, Bozen-Bolzano, Italy

Abstract

Cognitive linguistics has provided compelling evidence that semantic structure in natural language reflects conceptual structure that arises from our embodied experience in the world. To capture this conceptual structure, a set of spatio-temporal cognitive building blocks called image schemas was introduced by Lakoff and Johnson. Detecting image schemas in natural language can provide further insights into how embodied experiences are encoded in natural language and potentially contribute to research on conceptual understanding and symbol grounding in cognitive systems. Methods for (semi-)automatically extracting image schemas from natural language remain an open challenge. We propose a spectral clustering approach paired with semantic role labeling to semi-automatically extract image schemas from multilingual text, obtaining a precision of more than 80% on three languages.

1 Introduction

Embodied agents and cognitive systems need to learn spatio-temporal knowledge in order to be able to interact with the world. Cognitive linguistics and psychology have introduced convincing evidence that spatio-temporal relations are highly frequent in natural language [17,37,39]. This is why language represents an important source for embodied agents to learn spatial relations [1], e.g. for human-robot interaction [16,27], making agents learn like a baby [10], and basically any mapping between symbols and objects in the physical world [19]. However, the symbol grounding problem of how signs are assigned meaning, relate to real-world objects, and are cognitively represented remains open.

While human cognition is still a mystery on many levels, the latest paradigm shift in the view of cognition, embodied cognition, provides some promising new insights. Embodied cognition states that all cognition occurs as a consequence of the body's sensorimotor experiences with the environment [34]. Image schemas were introduced within the context of embodied cognition and are described as spatio-temporal relations that infants learn in their early years through repeated exposure to particular events. For example, the image schema CONTAINMENT is learnt when the child is repeatedly exposed to objects moving in and out of containers [24].

Extracting image schemas from natural language can provide a principled way to investigate the connection between thought and language and gain new insights into the cognitive grounding of natural language. To the best of our knowledge, automatically detecting image schemas in natural language is an open challenge. Until recently, many linguistically relevant studies on image schemas have focused on the lexical surface structure of expressions [3,8] or provided manually curated examples [6,12] to support their claims. In this paper, we address this challenge by proposing a semi-automated method to detect image schemas in three languages based on unsupervised spectral clustering paired with semantic role labeling, extending our previous work on methods in English [9]. The method builds on findings from research on spatial language (e.g. [17,39,42]) in which prepositions were utilized as spatial indicators and verbs as indicators of motion and temporal change.
In the proposed method, verb-preposition pairs are clustered with co-occurring nouns as features and their relative frequencies as feature values. For instance, for "continue-along" the feature vector is ['route': 13, 'road': 37, 'path': 94, 'lines': 53]. Verb-preposition pairs are clustered based on their feature vectors. The above example is grouped in the cluster ['continue-along', 'continuing-along', 'progress-along', 'set out-on', 'set-on']. We use existing automated tools for semantic role labeling to separate clusters into spatial and non-spatial based on the main preposition senses, e.g. DIRECTION for the majority of prepositions in the above cluster, which we consider spatial. All clusters are then manually annotated with image schemas, e.g. SOURCE_PATH_GOAL in the above example. The outcome is a repository of verb-preposition clusters with their feature nouns and their original sentences in English, German, and Swedish, annotated with role labels and identified image schemas, which carries the potential to improve spatial language understanding and its cognitive grounding.

The remainder of this paper is structured as follows. First, we introduce image schemas and some related approaches to extracting spatial information from text. Second, we describe the utilized dataset and the proposed method. Third, the results of the method are presented as well as a summary of the identified image schemas across the three languages. Finally, we discuss the results and in the conclusion include a few remarks on future work.

2 Foundation

As cognitive research started to focus on sensorimotor experiences as the foundation of cognition, rather than the classical cognitivist view (i.e. 'cognition is computation') that previously dominated cognitive science [34], image schemas were introduced by Lakoff [20] and Johnson [13] as a theory to explain certain cognitive phenomena, in particular the conceptualisation of concepts in terms of spatial language. While image schemas evolve from concrete sensorimotor experiences, their mental representation is considered abstract. Psychological support for image schemas comes from how they offer infants conceptual grounds to make predictions about their surroundings [24,7]. Indeed, work in linguistics (e.g. [6]) and psychology (e.g. [24]) reveals image-schematic involvement in reasoning and language development.

In developmental psychology, image schemas demonstrate how key concepts are transferred through analogical reasoning and conceptual metaphors [21]. For example, if an infant has learned that 'tables SUPPORT plates', it can infer that 'desks SUPPORT books'. It is proposed that a similar mechanism is applied when language is developed, in particular where abstract concepts are concerned. Statements such as "to offer SUPPORT to a friend in need" or "to put in a good word" provide good examples of how concrete sensorimotor experiences are transferred to abstract adult communication. Pauwels [30] went so far as to claim that any abstract use of the word "put" requires the understanding of CONTAINMENT, stressing the importance of verbs in image schema analyses.

One semi-automated method designed to extract spatial relations between a trajector and a landmark was introduced by Kordjamshidi et al. [17]. Their method uses machine learning on word triples by connecting 'the trajector' and 'the landmark' through a preposition, or as they call it 'a spatial indicator'.
Prepositions have been demonstrated to be essential for revealing spatial information [22], yet they do not always capture motion and temporal change. For this, verbs also need to be taken into account, as they play a central role in identifying the relation between trajectors and landmarks [16], which, however, is not the case in Kordjamshidi et al. [17]. Approaches that have taken verbs into account, such as [25], instead often rely on handcrafted rules or predefined spatial expressions to extract motion verbs across languages. Most work on extracting image schemas from natural language has either been done by curating manual examples [6] or by conducting corpus studies based on specific lexico-syntactic patterns [3,8].

As image schemas tap into the core of conceptual metaphors [21,26], their identification in language also opens up new possibilities for identifying not only the concrete but also the abstract. But for this it is necessary to abstract away from the lexical surface structure of expressions, which is not possible with lexico-syntactic patterns. For instance, the expression "to have an empty life" characterizes life as having a feature that can either be full or empty, directly transferred from physical characteristics of CONTAINERs, which, however, could never be found when querying for specific verbs or nouns. We methodologically benefit from this connection to metaphors by building on clustering approaches for multilingual metaphor detection [36].

3 Dataset

To comprehensively evaluate the occurrences of image schemas in natural language as well as to efficiently perform clustering, a large natural language corpus is needed. Additionally, in order to make the results comparable across different languages, the corpus needs to be parallel. Thus, we decided to use the Europarl corpus [15], which is aligned across several languages, commonly used in linguistic approaches, and, with its 1,959,830 sentences in English, large enough for our purposes. The corpus contains sentences extracted from the proceedings of the European Parliament, meaning that its coverage is somewhat limited to topics revolving around governance and political issues.

4 Methodology

One established exploratory data analysis method is unsupervised clustering [2]. The chosen normalized spectral clustering algorithm proposed by Ng et al. [28] has been effectively applied to various lexical acquisition tasks (e.g. [36,41,38]). We build on successful methods for conceptual metaphor extraction [36] in combination with findings from spatial language analysis (e.g. [39]). The combination of verbs and prepositions as indicators of spatial and potentially image-schematic structures is backed by manual corpus-based analyses of image schema detection (e.g. [6]). This section describes the individual steps from text to image schema depicted in Fig. 1.

Figure 1. Individual steps of the proposed method

4.1 Dependency Parsing

Dependency parsing identifies how individual sentential elements depend on each other by first part-of-speech (POS) tagging each element and then analyzing the structure of the whole sequence. We parse the dataset for each language and extract verb-preposition-noun combinations. From each sentence, such as (1), we first extract the preposition ("along") and its dependent noun ("road") or noun phrase, and search through all their dependency relations for a verb ("continue") and a potential phrasal particle, as depicted in Step (1) of Fig. 1.
(1) This is why Turkey must be encouraged to continue along this road.

Thereby, we obtain verb-preposition pairs and all their co-occurring nouns with relative frequencies, which represent the feature vector for our clustering algorithm. Only verb-preposition pairs that occurred at least ten times with the extracted noun were considered for clustering, to avoid a distortion of the clusters by rare words or dependency parsing errors. For English and German we used the Stanford Dependency Parser [5]. However, Swedish is not available in the Stanford tool set, so we chose Stagger [29] for POS tagging and the data-driven parser-generator MaltParser [11] for dependency parsing. We use the Swedish MaltParser model (swemalt-1.7.2.mco) that was trained on the Talbanken section of the Swedish Treebank.

4.2 Spectral Clustering

Spectral clustering is particularly attractive since it is reasonably fast and transforms data clustering into a graph partitioning problem, where partitions are determined by edge weights. It takes a similarity matrix and the number of clusters as input. Computing a similarity matrix depends on the choice of the semantic distance measure best suited to the given data. We tested Term Frequency-Inverse Document Frequency (TF-IDF) as one of the most common similarity measures, Positive Pointwise Mutual Information (PPMI), and the Jensen-Shannon divergence (JSD), a symmetric and smoothed version of the Kullback-Leibler divergence that has successfully been used in conceptual metaphor clustering [36]. We evaluated each of those similarity measures by generating similarity matrices and submitting them to the algorithm. The results of this process were analyzed by semantic role labeling, and success was defined as the best separation of information into spatial and non-spatial clusters.

We finally used PPMI for the image schema annotation (detailed in Section 5), which is why we only discuss this measure further. Pointwise mutual information quantifies the difference between the probability of two textual units occurring together and their presumed co-occurrence under the independence assumption. With x representing a verb-preposition pair and y a noun it co-occurs with, we use Positive Pointwise Mutual Information (PPMI) as defined in Equation 1 to create a similarity matrix, where PMI(x, y) is set to zero if its value is below zero.

PMI(x, y) = log( p(x, y) / (p(x) p(y)) )    (1)

PPMI(x, y) = PMI(x, y) if PMI(x, y) > 0, and 0 otherwise.

The similarity matrix captures the semantic distance between verb-preposition pairs based on their co-occurring nouns, which represent the features of the algorithm. We tested unnormalized spectral clustering as well as the normalized variants of Shi and Malik [35] and of Ng et al. [28]. Algorithm 1 presents the most successful normalized algorithm by Ng et al. [28], where success is defined as balanced cluster sizes and correct separation of spatial and non-spatial information. The major problem with the Shi and Malik normalization [35] for our approach was the strong variation in cluster size, leading at times to clusters of more than 2,000 verb-preposition pairs.

In Algorithm 1, the difference between the degree matrix and the weighted adjacency matrix yields the graph Laplacian L. The normalized matrix of eigenvectors of the normalized L is then used as input to the k-means algorithm.
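To make this step concrete, the following minimal sketch combines a PPMI-weighted co-occurrence matrix with normalized spectral clustering. It is an illustration under stated assumptions, not our exact pipeline: only the "continue-along" counts are loosely taken from the example above, the remaining toy counts and pairs are invented, the fully connected Gaussian affinity (one of the graph constructions discussed below) and its bandwidth heuristic are arbitrary choices, and scikit-learn's SpectralClustering is used as a stand-in for Algorithm 1.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.cluster import SpectralClustering

# Toy co-occurrence counts: rows = verb-preposition pairs, columns = feature nouns.
# Only the "continue-along" row is based on the example above; the rest is invented.
pairs = ["continue-along", "progress-along", "invest-in", "participate-in"]
nouns = ["route", "road", "path", "lines", "funds", "negotiations"]
counts = np.array([[13, 37, 94, 53,  0,  0],
                   [ 5, 20, 40, 25,  0,  0],
                   [ 0,  1,  0,  0, 70, 50],
                   [ 0,  0,  0,  0, 50, 65]], dtype=float)

# PPMI as in Equation 1: PMI(x, y) = log(p(x, y) / (p(x) p(y))), negatives set to zero.
total = counts.sum()
p_xy = counts / total
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.log(p_xy / (p_x * p_y))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Fully connected similarity graph over the PPMI feature vectors using a Gaussian
# kernel; the median-distance bandwidth is an assumption, not a prescribed setting.
dist = euclidean_distances(ppmi)
sigma = np.median(dist[dist > 0])
affinity = np.exp(-dist ** 2 / (2 * sigma ** 2))

# Normalized spectral clustering on the precomputed affinity matrix.
k = 2  # in the experiments k ranges over 50, 100, 200, 300
labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                            assign_labels="kmeans", random_state=0).fit_predict(affinity)
print(dict(zip(pairs, labels)))  # pairs co-occurring with similar nouns share a cluster
```

On this toy input, the two motion-related pairs end up in one cluster and the two abstract "in" pairs in the other, which is the behavior the assumption behind our method relies on.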
We followed von Luxburg [40] and tested the ε-neighborhood, k-nearest neighbors, and fully connected graph building methods on our dataset. The algorithm outputs the number of clusters initially indicated. To optimize this variable, we experimented with different sizes of k, as detailed in Section 5. Our assumption for this method was that verb-preposition pairs co-occurring with similar nouns/noun phrases might exhibit a similar spatio-temporal behavior. Each cluster groups verb-preposition pairs based on that similarity.

Algorithm 1 Normalized Spectral Clustering [28]
1: Input: similarity matrix S ∈ R^{n×n}, number of clusters k
2: Construct a degree matrix D with d_ii = Σ_{j=1}^{n} w_ij and d_ij = 0 if i ≠ j
3: Construct a similarity graph and its weighted adjacency matrix W
4: Construct the graph Laplacian L = D − W
5: Compute the normalized Laplacian L_sym := D^{−1/2} L D^{−1/2}
6: Compute the first k eigenvectors v_1, ..., v_k of L_sym and write them as columns into the matrix U ∈ R^{n×k}
7: Compute the matrix T ∈ R^{n×k} from U by normalizing the rows, i.e., set t_ij = u_ij / (Σ_k u_ik²)^{1/2}
8: Let y_i be the vector corresponding to the i-th row of T
9: Cluster the points (y_i), i = 1, ..., n, with the k-means algorithm into clusters C_1, ..., C_k
10: Output: clusters C_1, ..., C_k with C_i = {j | y_j ∈ C_i}

4.3 Semantic Role Labeling

From experimenting with similarity matrices (TF-IDF, PPMI, JSD), cluster sizes k (50, 100, 200, 300), and clustering algorithms, we obtained 3,900 clusters for each language. This called for an automated method to compare the resulting clusters. We found that semantic role labeling can effectively be used to separate spatial from other verb-preposition pairs for each cluster, thus providing us with a first purity estimation of the individual clusters. Semantic role labeling abstracts away from syntactic variation and assigns labels to arguments of sentence predicates by means of predefined relations.

In semantic role labeling, the determining factor for spatial information is the preposition sense, which is assigned based on the verb and noun the preposition relates to. We thus take the verb-preposition pairs of each cluster and query existing tools for the preposition sense using the feature nouns of each pair, e.g. "continue-along-road" is a DIRECTION. Second, we accumulate all labels obtained for a specific verb-preposition pair, e.g. one of them being the above DIRECTION with the noun "road" for "continue-along". The most frequent of those accumulated labels is assigned to the pair. We classified all preposition senses as either spatial or non-spatial. If most verb-preposition pairs in a cluster have spatial labels, we consider the whole cluster spatial, such as the one in Fig. 1. As the example shows, we did not perform lemmatization ("continue" and "continuing" are in the cluster) and we considered phrasal verbs, such as "set out", as well as noun phrases.

The purity of a cluster is estimated based on the verb-preposition role labels and their frequency in the cluster. We differentiate between spatial (>80% spatial labels), mixed (between 30% and 80% spatial labels), and other (<30% spatial labels) clusters. Semantic role labeling is language-specific, which means we had to find different solutions for each language.

For English, we employed a semantic role labeling tool called Curator [31], which follows the notation of the PropBank project [14] and its fine-grained labeling of preposition roles.
Curator provided highly accurate as well as detailed semantic role labels for the whole English dataset, including annotations for verb, noun, and preposition senses. The preposition sense annotation was a main motivator for choosing Curator, since other tools frequently only tag prepositions as "prepositions" without specifying their sense in detail. We classify all role labels into either spatial or non-spatial, where the former in our case are: Location, StartState, EndState, Source, Destination, Direction, PhysicalSupport, and Journey.

For Swedish we relied on the preposition senses provided with the Swedish Treebank model of MaltParser, which differentiates between spatial (RA), temporal (TA), and several other types of adverbials, providing a less fine-grained and less accurate, but still viable, estimation of which cluster setting to analyze for image schemas.

For German, semantic role labeling is a challenging endeavor since many parsers [4,32] focus on valency-bound complements. We could not find any parser that provided preposition sense labeling results equivalent to those for English and Swedish, and thus decided to manually annotate the list of verb-preposition-noun triples extracted from text, differentiating spatial and non-spatial labels. This manual annotation was then used to evaluate the cluster settings.

4.4 Image Schema Annotation

In a final step we analyzed the resulting clusters for their image-schematic content. Our definition of image schemas was based on definitions obtained from Johnson [13], Lakoff [20], and Kövecses [18], as exemplified in Table 1.

Table 1. Examples of image schemas and definitions

Image Schema | Description
CONTAINMENT | Boundary, enclosed area or volume, or excluded area or volume [13]
SOURCE_PATH_GOAL | Source or starting point, goal or endpoint, a series of contiguous locations connecting those two, and movement [13,20]
SUPPORT | Contact between two objects in the vertical domain [23]

Each cluster was manually analyzed based on the definitions exemplified in Table 1 to determine which image schema, if any, was the most dominant in the cluster. For English, two annotators separately assigned image schemas and, upon disagreement, a third annotator took the final decision. For Swedish and German only one annotator was available per language, resulting in less reliable results than for English.

Consider, for instance, the triple "bring-into-disrepute", which describes a transformation from the state of good, or no, reputation to a negative reputation, 'disrepute'. Despite being abstract, there is a clear boundary between the two states, and certain events may cause this state to change; in this case an event 'brings' about this transformation. It can be argued to correspond to movement 'into' a CONTAINER. Annotators evaluated each verb-preposition-noun triple in a cluster to decide whether it represents any image-schematic structure.

5 Results

Due to the size of the corpus, we obtain a large collection of potentially image-schematic clusters. In Section 4, we explained how spatially relevant and image-schematic clusters are detected from this collection. This section presents quantified results of the best combination of settings and the number of obtained image-schematic structures. Dependency parsing is taken as given, even though some verb-preposition pairs might have been overlooked by the parser, and we start by describing the clustering results.
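For illustration, the role-label triage from Section 4.3, which underlies the spatial/mixed/other counts reported below, can be sketched as follows. The label inventory mirrors the spatial senses used for English; the data structures and the per-pair example labels are assumptions made purely for this illustration.

```python
# Illustrative triage of clusters into spatial / mixed / other, following the
# thresholds from Section 4.3 (>80% spatial labels, 30-80%, below 30%).
SPATIAL_LABELS = {"Location", "StartState", "EndState", "Source",
                  "Destination", "Direction", "PhysicalSupport", "Journey"}

def classify_cluster(pair_labels):
    """pair_labels: one majority preposition-sense label per verb-preposition pair."""
    if not pair_labels:
        return "other"
    spatial_share = sum(lab in SPATIAL_LABELS for lab in pair_labels) / len(pair_labels)
    if spatial_share > 0.8:
        return "spatial"
    if spatial_share > 0.3:
        return "mixed"
    return "other"

# Example: the cluster from Fig. 1, with hypothetical majority labels per pair.
cluster = {"continue-along": "Direction", "continuing-along": "Direction",
           "progress-along": "Direction", "set out-on": "Direction",
           "set-on": "Location"}
print(classify_cluster(list(cluster.values())))  # -> "spatial"
```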
5.1 Clustering Results

We obtained 3,900 clusters for each language from three similarity metrics (JSD, PPMI, TF-IDF) to build the similarity matrices, three algorithms (unnormalized, two normalized), three graph building methods (knn, ε, fully connected), and four cluster input sizes k (50, 100, 200, 300). A comparison for English with normalized clustering [28] is illustrated in Table 2 to exemplify the selection process for the best settings, since space does not permit a detailed description for all languages. With knn, PPMI, and k of size 300, a total of 92 clusters (31%) were tagged as purely spatial, 49 (16%) as mixed spatial and other labels, and 159 (53%) contained less than 30% spatial labels. This represented the highest number of spatial clusters, which is why we analyzed this combination of settings for image schemas.

While knn clearly returned the best results for all languages, in German JSD and TF-IDF with normalized clustering by Shi and Malik [35] returned higher numbers than PPMI, especially for size 50 clusters. However, upon manual inspection the clusters turned out to be very large and to contain highly mixed information. Thus, we manually inspected ten clusters of each setting combination, which showed that the best results were obtained for the same settings as for English. In Swedish, the role frequency-based approach likewise pointed to other combinations, which turned out to be of inferior quality, and also for Swedish the best settings proved to be the same as for English.

Table 2. Comparison of setting combinations for English clustering (percentage of spatial clusters)

Cluster size | PPMI knn | PPMI fc | PPMI ε | JSD knn | JSD fc | JSD ε
100 | 29% | 22% | 24% | 18% | 20% | 18%
200 | 21% | 22% | 22% | 6% | 23% | 19%
300 | 31% | 29% | 24% | 8% | 23% | 24%

We clustered a total of 2,259 English, 3,234 Swedish, and 2,739 German unique verb-preposition pairs based on their feature vectors, with an overall frequency above ten occurrences in the corpus. The average cluster size for English was 10.40 verb-preposition pairs, for Swedish 10.78, and for German 9.13. The first language we clustered was English, where we found that linking devices distorted our results. For instance, for the verb-preposition pair "make-of", by far the most frequent noun was "course", an undesirable result caused by the expression "of course". Thus, we excluded linkers from the English clustering data. For Swedish this problem was less prominent, perhaps due to the use of a different tagger and parser, and in the German data this problem was not observed.

5.2 Semantic Role Labeling Results

As exemplified in Table 2, semantic role labeling provided the basis for choosing the best parameter settings for the clustering algorithm. The results for the best setting combination of knn, PPMI, and k size 300 with normalized clustering by Ng et al. [28] for all languages are presented in Table 3, which also shows the absolute frequency of image-schematic clusters. We analyzed all clusters, spatial and non-spatial, for their image-schematic content for the chosen 300 clusters. In English, the majority of detected image schema clusters were also labeled with spatial semantic roles, as is the case for German. In Swedish, however, 34% of all image schema clusters were detected in non-spatial clusters. We attribute this to the lower accuracy of the semantic role labeler, especially since a substantially higher number of clusters (66%) were not assigned a spatial label.
Table 3. Total number of clusters per label and number of image schema (IS) clusters

Label | English Clusters | English IS | Swedish Clusters | Swedish IS | German Clusters | German IS
Spatial | 92 | 74 | 64 | 59 | 88 | 80
Mixed | 49 | 18 | 38 | 22 | 18 | 13
Other | 159 | 18 | 198 | 42 | 194 | 10
Total | 300 | 110 | 300 | 123 | 300 | 103

5.3 Image Schema Identification Results

Our assumption was that the proposed method groups verb-preposition pairs based on their respective nouns into spatial and potentially image-schematic clusters. Thus, we are interested in how many of the spatial clusters actually are image-schematic. To calculate the accuracy of our method, we compare the number of obtained spatial clusters to the number of image-schematic clusters with a spatial label, and relate this to the total number of image-schematic clusters; the data are presented in Tables 3 and 4. For English, this provides 74 image schema clusters in 92 spatial clusters with a total of 110 image schema clusters across the whole set of 300, which yields an accuracy of 80.43% (74/92) and, combined with a recall of 67.27% (74/110), an F-measure of 73.27%. For Swedish, the accuracy is 82.81% (F-measure: 56.38%); Swedish also has the largest number of image schema clusters in the set of non-spatial clusters compared to the other languages, hence the low F-measure. For German, the accuracy is 90.91% (F-measure: 83.77%) because most image schema clusters have a spatial role label. This can be attributed to the manual annotation of the semantic roles, which was performed only for German.

In English the identification of image-schematic structures was conducted by two experts with an inter-annotator agreement of 78% on the 92 spatial clusters. For the 20 clusters that were not assigned the same image schema by the two experts, a third expert was consulted. For Swedish and German we only had one expert annotate each language for this first experiment.

Table 4. Detected image schema clusters, absolute numbers (A) and percentages (%)

Image Schema | English A | English % | Swedish A | Swedish % | German A | German %
CONTAINMENT | 61 | 55 | 81 | 66 | 57 | 55
SOURCE_PATH_GOAL | 22 | 20 | 14 | 11 | 27 | 26
SUPPORT | 16 | 14 | 21 | 17 | 6 | 6
SURFACE | 4 | 4 | 0 | - | 7 | 7
VERTICALITY | 3 | 3 | 3 | 2 | 0 | -
Other | 4 | 4 | 4 | 3 | 6 | 6
Total schemas | 110 | | 123 | | 103 |

The most commonly identified image schema in all languages is CONTAINMENT, as illustrated in Table 4. One of the reasons for this is the description provided by Johnson [13] with the 'inside-border-outside' relationship, which establishes a rather general reference that fits many scenarios. To account for the spatial relationships of the image schemas, as well as their conceptual correspondence, it would be more accurate to divide this image schema into a family (an idea supported by e.g. [33,3,12]). While the members of a CONTAINMENT family still need to be properly established, it is clear that spatial relations such as "being on the outside/inside", "degrees of parthood", "going in/out", and "going through" are fundamentally different despite all belonging to CONTAINMENT. To account for these differences in our data, we would have to conduct a more detailed analysis considering the context of each verb-preposition-noun combination, which could be interesting for further investigations.

Swedish has a higher number of CONTAINMENT clusters since its overall count of verb-preposition pairs is higher than in the other two languages. The second most frequent image schema is SOURCE_PATH_GOAL, such as "fortsätta-på-väg" (continue-along-road), with a lower frequency in Swedish. In contrast, Swedish had more SUPPORT schemas, especially in comparison to German, e.g. the German expression "lasten-auf-schultern" (rest-upon-shoulders).
'Other' in the image schema column of Table 4 refers to the collected occurrences of the image schema structures NEAR-FAR, SPLITTING, PART-WHOLE, SCALING, and CENTER-PERIPHERY. Other than for SUPPORT, the numbers and types of image schemas are quite comparable across languages. We would like to point out that the numbers presented in Tables 3 and 4 are image schema clusters containing several verb-preposition pairs with several feature nouns; e.g. for English we obtain 2,567 image-schematic verb-preposition-noun triples from the 110 clusters. Table 5 provides one CONTAINMENT cluster for each language to exemplify the results. While they are not automatically aligned by our method, are not identical, and vary in size, they have several features in common, such as all of them referring to the CONTAINER "hands", as in "play-into-hands" or "lie-in-hands". The resulting repository can then either be represented as verb-preposition clusters with the annotated image schema, as in Table 5, or in a more detailed way combining pairs with their feature nouns, as depicted in Table 6.

Table 5. Cluster example for CONTAINMENT

English: 'concentrated-in', 'lie-in', 'lies-in', 'play-into', 'played-into', 'playing-into', 'plays-into', 'sit-on', 'suffer-at', 'suffered-at'
Swedish: 'finnas-i', 'föra-till', 'föras-till', 'förs-till', 'noteras-i'
German: 'fallen-in', 'fällt-in', 'fällt-unter', 'gelangt-in', 'herausgenommen-aus', 'verbleiben-in'

Table 6. Repository example with annotations

Verb-Prep-Noun | Image Schema | Role Label | Sentence
concentrated-in-areas | CONTAINMENT | Location | ... unemployment is quite concentrated in particular areas ...
plays-into-hands | CONTAINMENT | Destination | ... which plays straight into the hands of the radicals ...
continue-along-road | SOURCE_PATH_GOAL | Direction | ... must be encouraged to continue along this road ...

6 Discussion

In our corpus, the manifestations of image schemas in natural language relate to both the abstract and the concrete. For instance, in "Beziehungen mit neuem Leben erfüllen" (fill relationships with new life) "relationships" are abstract CONTAINERs, while in "sie füllen ihre Taschen mit Gold" (they are filling their pockets with gold) "pockets" (real or hypothetical) are concrete CONTAINERs. This juxtaposition of abstract and concrete concepts makes the corpus a good dataset for investigations into the nature of image schemas, since they offer a way to ground abstract phenomena in physical sensorimotor experiences. In addition, this analogous relationship between the concrete and the abstract assists the task of annotating the clusters for image schemas, for if "pockets" is a container, then "relationships" has to be evaluated against the CONTAINER criteria as well.

The most challenging part of our method was the semantic role labeling of preposition senses, which had to be approached with different methods for the three languages in our study. In fact, we believe that the low F-measure in Swedish might be attributed to the role labeling, since both the high-quality English labels and the German manual annotation of semantic roles returned more satisfactory F-measures. Naturally, it would be preferable if comparable semantic labeling tools were available for all the investigated languages. One possible solution would be to follow the method in [36] and manually label the clusters from the beginning.
The consequences this might have had on our results are of less importance than for studies that aim to provide an in-depth crosslingual comparison of image schemas, which we intend to do as future work. Another solution could be to crowdsource the spatial role labeling task.

Another important aspect in need of improvement is the manual annotation of image schemas. It is error-prone and biased by the human evaluators. This problem was highlighted through a preliminary crosslingual comparison. For instance, "put-on-market" was annotated as SUPPORT in English, CONTAINMENT in German, and as none in Swedish. This disjoint annotation is partly due to individual annotators, but also due to the different connotations natural languages carry; it might therefore not be wrong in itself, but it provides little confidence in terms of automated cross-lingual comparisons. To improve the annotation process, we hope to rely on different supervised approaches based on good examples from the natural language image schema repository we created for this paper. Alternatively, we might also consider crowdsourcing for this step.

The current analysis extracted image schemas from natural languages belonging to the same language family. To truly confirm that our method can be used effectively to extract image schemas from multilingual corpora, we need to test it on other language families as well, a process that has been started but could not be finished in time for this paper due to the time-consuming annotation process.

Regarding the results, the spatial clusters return an overwhelming number of CONTAINMENT schemas. While CONTAINMENT undeniably is one of the most essential image schemas, there is room for improvement here. As previously observed [33,12], image schemas do not always appear in isolation, but rather as families. During the process of annotating the image schemas, many kinds of CONTAINMENT schemas could be detected. Following previous approaches [3,8], it would be interesting to analyze the elements involved in image schemas (e.g. border, inside, outside of CONTAINMENT) and their interaction in natural language instead of only identifying abstract image schemas.

Our results show that findings from investigations into spatial language can effectively be used to extract image schemas from natural languages. They also show that image schemas are prevalent in natural language, even highly abstract language. Knowing that the abstract "relationships" from the above example have the same underlying image schema of CONTAINMENT as the physical "pockets" can further our understanding of the influence of sensorimotor experiences on language use. It means that we conceptualize both, in certain contexts, as containers with a fill level. Our study found a rather similar distribution of image schemas across three languages. The resulting repository of multilingual expressions annotated with image schemas provides a good starting point for a crosslingual comparison, which we intend to conduct including language families other than those considered here. Such an investigation can contribute to research on the universality of image schemas.

7 Conclusion and future work

We present a method to semi-automatically extract image schemas from natural language, which provides promising results that confirm our assumption that verb-preposition pairs with their context nouns as features are good indicators of spatial and also image-schematic language. We exemplify the approach in English, Swedish, and German.
Parts of the method are manual and still in a preliminary stage. Future work therefore includes using the results from this study as examples for a supervised approach, working towards a method that requires less manual effort. We also intend to broaden the current experiment to different language families and to test more feature combinations than the three word classes used herein.

References

1. Alomari, M., Duckworth, P., Hogg, D., Cohn, A.: Learning of object properties, spatial relations, and actions for embodied agents from language and vision. In: To be confirmed. AAAI Press (2017)
2. Baroni, M., Dinu, G., Kruszewski, G.: Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL (1). pp. 238–247 (2014)
3. Bennett, B., Cialone, C.: Corpus guided sense cluster analysis: a methodology for ontology development (with examples from the spatial domain). In: Garbacz, P., Kutz, O. (eds.) 8th International Conference on Formal Ontology in Information Systems (FOIS). Frontiers in Artificial Intelligence and Applications, vol. 267, pp. 213–226. IOS Press (2014)
4. Björkelund, A., Hafdell, L., Nugues, P.: Multilingual semantic role labeling. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task. pp. 43–48. Association for Computational Linguistics (2009)
5. Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP. pp. 740–750 (2014)
6. Dodge, E., Lakoff, G.: Image schemas: From linguistic analysis to neural grounding. In: Hampe, B., Grady, J.E. (eds.) From Perception to Meaning: Image Schemas in Cognitive Linguistics, pp. 57–91. Mouton de Gruyter, Berlin (2005)
7. Gibbs, R.W., Colston, H.L.: The cognitive psychological reality of image schemas and their transformation. Cognitive Linguistics 6, 347–378 (1995)
8. Gromann, D., Hedblom, M.M.: Breaking down finance: A method for concept simplification by identifying movement structures from the image schema path-following. In: Proc. of the Joint Ontology Workshops (JOWO) (2016)
9. Gromann, D., Hedblom, M.M.: Kinesthetic mind reader: A method to identify image schemas in natural language. In: Proceedings of Advancements in Cognitive Systems (2017)
10. Guerin, F.: Learning like a baby: A survey of AI approaches. The Knowledge Engineering Review 00(0), 1–22 (2008)
11. Hall, J., Nivre, J., Nilsson, J.: A hybrid constituency-dependency parser for Swedish. In: Proceedings of NODALIDA. pp. 284–287 (2007)
12. Hedblom, M.M., Kutz, O., Neuhaus, F.: Choosing the Right Path: Image Schema Theory as a Foundation for Concept Invention. Journal of Artificial General Intelligence 6(1), 22–54 (2015)
13. Johnson, M.: The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reasoning. The University of Chicago Press (1987)
14. Kingsbury, P., Palmer, M.: From TreeBank to PropBank. In: LREC. pp. 1989–1993. Citeseer (2002)
15. Koehn, P.: Europarl: A parallel corpus for statistical machine translation. In: MT Summit. vol. 5, pp. 79–86 (2005)
16. Kollar, T., Tellex, S., Roy, D., Roy, N.: Grounding verbs of motion in natural language commands to robots. In: Experimental Robotics. pp. 31–47. Springer (2014)
17. Kordjamshidi, P., Van Otterlo, M., Moens, M.F.: Spatial role labeling: Towards extraction of spatial relations from natural language. ACM Transactions on Speech and Language Processing (TSLP) 8(3), 4 (2011)
18. Kövecses, Z.: Metaphor: A Practical Introduction. Oxford University Press, USA (2010)
19. Krishnamurthy, J., Kollar, T.: Jointly learning to parse and perceive: Connecting natural language to the physical world. Transactions of the Association for Computational Linguistics 1, 193–206 (2013)
20. Lakoff, G.: Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. The University of Chicago Press (1987)
21. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press (1980)
22. Litkowski, K.: Pattern Dictionary of English Prepositions. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. vol. 1, pp. 1274–1283. Baltimore, Maryland (2014)
23. Mandler, J.M.: How to build a baby: II. Conceptual primitives. Psychological Review 99(4), 587 (1992)
24. Mandler, J.M., Pagán Cánovas, C.: On defining image schemas. Language and Cognition pp. 1–23 (2014)
25. Mani, I., Pustejovsky, J.: Interpreting Motion: Grounded Representations for Spatial Language. No. 5 in Explorations in Language and Space, Oxford University Press (2012)
26. Mason, Z.J.: CorMet: A computational, corpus-based conventional metaphor extraction system. Computational Linguistics 30(1), 23–44 (2004)
27. Misra, D.K., Sung, J., Lee, K., Saxena, A.: Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions. The International Journal of Robotics Research 35(1-3), 281–300 (2016)
28. Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: Analysis and an algorithm. In: NIPS. vol. 14, pp. 849–856 (2001)
29. Östling, R.: Stagger: An open-source part of speech tagger for Swedish. Northern European Journal of Language Technology (NEJLT) 3, 1–18 (2013)
30. Pauwels, P.: Levels of metaphorization: The case of put. In: Goossens, L. (ed.) By Word of Mouth: Metaphor, Metonymy and Linguistic Action in a Cognitive Perspective, pp. 125–158. John Benjamins Publishing Company, Amsterdam (1995)
31. Punyakanok, V., Roth, D., Yih, W.: The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics 34(2) (2008)
32. Roth, M., Lapata, M.: Neural semantic role labeling with dependency path embeddings. arXiv preprint arXiv:1605.07515 (2016)
33. Santibáñez, F.: The object image-schema and other dependent schemas. Atlantis 24(2), 183–201 (2002)
34. Shapiro, L.: Embodied Cognition. New Problems of Philosophy, Routledge, London and New York (2011)
35. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
36. Shutova, E., Sun, L., Gutierrez, D., Lichtenstein, P., Narayanan, S.: Multilingual metaphor processing: Experiments with semi-supervised and unsupervised learning. Computational Linguistics (2016), forthcoming
37. Spranger, M.: The Evolution of Grounded Spatial Language (2016)
38. Sun, L., Korhonen, A.: Improving verb clustering with automatically acquired selectional preferences. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2. pp. 638–647. Association for Computational Linguistics (2009)
39. Talmy, L.: The fundamental system of spatial schemas in language. In: Hampe, B., Grady, J.E. (eds.) From Perception to Meaning: Image Schemas in Cognitive Linguistics, Cognitive Linguistics Research, vol. 29, pp. 199–234. Walter de Gruyter (2005)
40. Von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)
41. Xu, Z., Ke, Y.: Effective and efficient spectral clustering on text and link data. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management. pp. 357–366. ACM (2016)
42. Zlatev, J.: Spatial semantics. In: Geeraerts, D., Cuyckens, H. (eds.) The Oxford Handbook of Cognitive Linguistics, pp. 318–350. Oxford University Press (2010)