<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CLEF 2017: Multimodal Spatial Role Labeling Task Working Notes</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Parisa</forename><surname>Kordjamshidi</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tulane University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Taher</forename><surname>Rahgooy</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tulane University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marie-Francine</forename><surname>Moens</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Katholieke Universiteit Leuven</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">James</forename><surname>Pustejovsky</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Brandeis University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Umar</forename><surname>Manzoor</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tulane University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kirk</forename><surname>Roberts</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">The University of Texas Health Science Center at Houston</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">CLEF 2017: Multimodal Spatial Role Labeling Task Working Notes</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">85F4662F65E992C67CD3CE2AFFE32E93</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The extraction of spatial semantics is important in many real-world applications such as geographical information systems, robotics and navigation, semantic search, etc. Moreover, spatial semantics are the most relevant semantics related to the visualization of language. The goal of multimodal spatial role labeling task is to extract spatial information from free text while exploiting accompanying images. This task is a multimodal extension of spatial role labeling task which has been previously introduced as a semantic evaluation task in the SemEval series. The multimodal aspect of the task makes it appropriate for the CLEF lab series. In this paper, we provide an overview of the task of multimodal spatial role labeling. We describe the task, sub-tasks, corpora, annotations, evaluation metrics, and the results of the baseline and the task participant.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The multimodal spatial role labeling task (mSpRL) is a multimodal extension of the spatial role labeling shared task of SemEval-2012 <ref type="bibr" target="#b4">[5]</ref>. Although more extensive extensions of the data and the task were proposed in Kolomiyets et al. <ref type="bibr" target="#b3">[4]</ref> and Pustejovsky et al. <ref type="bibr" target="#b12">[13]</ref>, the SemEval-2012 data was more appropriate for the goal of incorporating the multimodal aspect. SemEval-2012 annotates the CLEF IAPR TC-12 Image Benchmark <ref type="bibr" target="#b0">[1]</ref>, which consists of touristic pictures accompanied by textual descriptions. The descriptions were originally provided in multiple languages, though we use the English ones for the purpose of our research.</p><p>The goal of mSpRL is to develop natural language processing (NLP) methods for the extraction of spatial information from both images and text. Extracting spatial semantics is helpful in various domains such as semantic search, question answering, geographical information systems, and even robotic settings where robots are given navigational instructions or instructions for grabbing and manipulating objects. It is also essential for specific tasks such as text-to-scene conversion (and vice versa) and scene understanding, as well as for general information retrieval over the huge amount of available multimodal data from various resources. Moreover, we have noticed an increasing interest in the extraction of spatial information from medical images that are accompanied by natural language descriptions. The textual descriptions of a subset of the images are annotated with spatial roles according to the spatial role labeling annotation scheme <ref type="bibr" target="#b6">[7]</ref>. 
We should note that combining the vision and language modalities has become a very popular research challenge. We distinguish our work and our data from existing vision-and-language research (inter alia, <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b2">3]</ref>) in that we consider explicit formal spatial semantic representations and provide direct supervision for machine learning techniques through our annotated data. The formal meaning representation would help to exploit explicit spatial reasoning mechanisms in the future. In the rest of this overview paper, we introduce the task in Section 2; we describe the annotated corpus in Section 3; the baseline and the participant systems are described in Section 4; Section 5 reports the evaluation metrics and results. Finally, we conclude in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Task Description</head><p>The task of text-based spatial role labeling (SpRL) <ref type="bibr" target="#b7">[8]</ref> aims at mapping natural language text to a formal spatial meaning representation. This formal representation specifies spatial entities based on cognitive-linguistic concepts and the relationships between those entities, in addition to the type of those relationships in terms of qualitative spatial calculi models. A concise ontology of the main target concepts is drawn in Figure <ref type="figure" target="#fig_1">1</ref> and the details are described later in this section. The applied ontology includes a subset of the concepts proposed in the scheme described in <ref type="bibr" target="#b6">[7]</ref>. We divide this task into three sub-tasks. To clarify these sub-tasks, we use the example of Figure <ref type="figure" target="#fig_2">2</ref>. This figure shows a photograph and a few English sentences that describe it. Given the first sentence "About 20 kids in traditional clothing and hats waiting on stairs.", we need to perform the following tasks: -Sub-task 1: The first sub-task is to identify the phrases that play spatial roles, namely the trajector, the landmark, and the spatial indicator. A trajector is an entity whose location is described, and a landmark is a reference object for describing the location of a trajector. In the above-mentioned sentence, the location of about 20 kids, the trajector, is described with respect to the stairs, the landmark, using the preposition on, the spatial indicator. These are examples of the spatial roles that we aim to extract from the sentence. -Sub-task 2: The second sub-task is to identify the relations/links between the spatial roles. Each spatial relation is represented as a triplet of (spatial-indicator, trajector, landmark). Each sentence can contain multiple relations, and individual phrases can even take part in multiple relations. Furthermore, roles can occasionally be implicit in the sentence (i.e., a null item in the triplet). 
In the above example, we have the triplet (kids,on,stairs), which forms a spatial relation/link between the three above-mentioned roles. Recognizing the spatial relations is very challenging because there can be several spatial roles in a sentence and the model should be able to recognize the right connections. For example, (waiting, on, stairs) is a wrong relation here because "kids", not "waiting", is the trajector in this sentence. -Sub-task 3: The third sub-task is to recognize the type of the spatial triplets. The types are expressed in terms of multiple formal qualitative spatial calculi models, as in Figure <ref type="figure" target="#fig_1">1</ref>. At the most coarse-grained level, the relations are classified into the three categories of topological (regional), directional, and distal. Topological relations are classified according to the well-known RCC (region connection calculus) qualitative representation. A variation of RCC8 with five relations, shown in Figure <ref type="figure" target="#fig_1">1</ref>, includes Externally connected (EC), Disconnected (DC), Partially overlapping (PO), Proper part (PP), and Equality (EQ). The data is originally annotated with RCC8, which additionally distinguishes Tangential proper part (TPP) and Tangential proper part inverse (TPPi). For this lab the original RCC8 annotations are used. In <ref type="bibr" target="#b8">[9]</ref>, these categories are merged because each has few examples in the corpus and they are semantically closely related. Directional relations include six relative directions: left, right, above, below, back, and front. In the above example, we can state that the type of the relation between the roles in the triplet (kids,on,stairs) is "above". In general, we can assign multiple types to each relation. 
This is due to the polysemy of spatial prepositions as well as the difference in the level of specificity between spatial relations expressed in language and formal spatial representation models. However, multiple assignments are not frequent in our dataset.</p><p>The task that we describe here is similar to the specification provided in Kordjamshidi et al. <ref type="bibr" target="#b8">[9]</ref>; however, the main point of this CLEF lab was to provide an additional source of information (the accompanying images) and to investigate the ways in which the images can be exploited to improve the accuracy of text-based spatial extraction models. The way the images are used is left open to the participants. Previous research has shown that this task is very challenging <ref type="bibr" target="#b7">[8]</ref>, particularly given the small set of available training data, and we aim to investigate whether using the images that accompany the textual data can improve the recognition of the spatial objects and their relations. Specifically, our hypothesis is that the images could improve the recognition of the type of relations, given that the geometrical features of the boundaries of the objects are available in the images.</p></div>
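To make the three sub-tasks concrete, the following is a minimal, hypothetical Python sketch of the outputs for the example sentence. The class and field names are illustrative assumptions and not part of the lab's annotation format:

```python
from dataclasses import dataclass
from typing import Optional

# Sub-task 1 output: spatial roles identified over phrases (illustrative).
roles = {
    "trajector": "about 20 kids",
    "spatial_indicator": "on",
    "landmark": "the stairs",
}

@dataclass
class SpatialRelation:
    # Sub-task 2 output: a (spatial-indicator, trajector, landmark) triplet;
    # any slot may be None when a role is implicit in the sentence.
    indicator: Optional[str]
    trajector: Optional[str]
    landmark: Optional[str]
    # Sub-task 3 output: coarse-grained category (topological, directional,
    # or distal) and fine-grained type (an RCC8 label or a relative direction).
    category: Optional[str] = None
    fine_type: Optional[str] = None

# The relation extracted from "About 20 kids ... waiting on stairs."
rel = SpatialRelation("on", "kids", "stairs",
                      category="directional", fine_type="above")
```

A relation may carry multiple type assignments in principle; the single `fine_type` field here reflects the common single-label case in the dataset.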
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Annotated Corpora</head><p>The annotated data is a subset of the IAPR TC-12 Image Benchmark <ref type="bibr" target="#b0">[1]</ref>. It contains 613 text files with a total of 1,213 sentences. The original corpus was available without copyright restrictions. The corpus contains 20,000 images taken by tourists, with textual descriptions in up to three languages (English, German, and Spanish). The texts describe objects and their absolute or relative positions in the image, which makes the corpus a rich resource for spatial information. However, the descriptions are not always limited to spatial information, which makes the task more challenging. The data has been annotated with the roles and relations described in Section 2, and the annotated data can be used to train machine learning models to perform this kind of extraction automatically. The text had been annotated in previous work (see <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b5">6]</ref>). The role annotations are provided on phrases rather than single words. The statistics about the data are given in Table <ref type="table">1</ref>. For this lab, we augmented the textual spatial annotations with a reference to the aligned images in the XML annotations and fixed some of the annotation mistakes to provide a cleaner version of the data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">System Descriptions</head><p>We, as the organizers of the lab, provided a baseline inspired by previous research for the sake of comparison. The shared task had one official participant, who submitted two systems. In this section, we describe the submitted systems and the baseline. Table <ref type="table">1</ref>. The statistics of the annotated CLEF Image Benchmark; some of the spatial relations are annotated with multiple types, e.g., having both region and direction labels.</p><p>-Baseline: For sub-task 1, classifying each role (Spatial Indicator, Trajector, and Landmark), we created a sparse perceptron binary classifier that uses a set of lexical, syntactic, and contextual features, such as lexical surface patterns, phrase headwords, part-of-speech tags, dependency relations, subcategorization, etc. For classifying the spatial relations, we first trained two binary classifiers on pairs of phrases: one detects Trajector-SpatialIndicator pairs and the other detects Landmark-SpatialIndicator pairs. We used the spatial indicator classifier from sub-task 1 to find the indicator candidates and considered all noun phrases as role candidates. Each combination of SpatialRole-SpatialIndicator candidates is considered a candidate pair on which the pair classifiers are trained. We used a number of relational features between the pairs of phrases, such as their distance and relative order, to classify them. In the final phase, we combined the predicted phrase pairs that share a common spatial indicator to create the final relation/triplet for sub-task 2. For example, if the pair (kids,on) is classified as Trajector-SpatialIndicator and (stairs,on) is predicted as Landmark-SpatialIndicator, then we generate (on,kids,stairs) as a spatial triplet, since both the trajector and the landmark relate to the same preposition 'on'. The features of this baseline model are inspired by the work in <ref type="bibr" target="#b8">[9]</ref>. 
For sub-task 3, training the general type and specific value classifiers, we used a very naive pipeline model as the baseline. In this pipeline, the triplets predicted in the previous stage are used for training the relation type classifiers. For these classifiers, the phrase features of each argument of the triplet are simply concatenated and used as features. Obviously, we miss a large number of relations at the spatial relation extraction stage of sub-task 2, since we depend on its recall.</p><p>-LIP6: The LIP6 group built a system for sub-task 3 that classifies relation types. For sub-tasks 1 and 2, the model proposed in Roberts and Harabagiu <ref type="bibr" target="#b13">[14]</ref> was used.</p><p>In particular, an implementation of that model in the Saul <ref type="bibr" target="#b9">[10]</ref> language/library was applied. These models assign roles to single words rather than phrases; however, since our evaluation counts overlapping phrases as matches, correctly classified single words are counted as correct predictions. For every relation, an embedding is built from the available data: the textual relation triplet and visual features from the associated image. Pre-trained word embeddings <ref type="bibr" target="#b11">[12]</ref> are used to represent the trajector and landmark, and a one-hot vector indicates which spatial indicator is used; visual features and embeddings from the segmented regions of the trajectors and landmarks are extracted and projected into a low-dimensional space. Given those generated embeddings, a linear SVM model is trained to classify the spatial relations while the embeddings remain fixed. Several experiments were conducted to try various classification modes and discuss the effect of the model parameters, and particularly to investigate the impact of the visual modality. 
As the best performing model ignores the visual modality, these results highlight that exploiting multimodal data to enhance natural language processing is a difficult task and requires further effort in terms of model design.</p></div>
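The baseline's final phase combines predicted Trajector-SpatialIndicator and Landmark-SpatialIndicator pairs that share a spatial indicator into triplets. A hypothetical sketch of that join step follows; the pair representation, with an indicator given as a (word, token index) tuple, is an illustrative assumption and not the baseline's actual code:

```python
def combine_pairs(tr_pairs, lm_pairs):
    """Join Trajector-Indicator and Landmark-Indicator pairs on their
    shared indicator occurrence, yielding (indicator, trajector, landmark)
    triplets as in sub-task 2."""
    triplets = []
    for trajector, ind_tr in tr_pairs:
        for landmark, ind_lm in lm_pairs:
            if ind_tr == ind_lm:  # same spatial indicator occurrence
                triplets.append((ind_tr, trajector, landmark))
    return triplets

# "About 20 kids ... waiting on stairs.": both pairs share the indicator
# "on" (token index 7 is an assumed position for illustration).
tr_pairs = [("kids", ("on", 7))]
lm_pairs = [("stairs", ("on", 7))]
triplets = combine_pairs(tr_pairs, lm_pairs)
# triplets == [(("on", 7), "kids", "stairs")]
```

Joining on the indicator occurrence (rather than the indicator word alone) matters when the same preposition appears more than once in a sentence.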
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluation Metrics and Results</head><p>About 50% of the data was used as the test set for the evaluation of the systems. The evaluation metrics were precision, recall, and F1-measure, defined as:</p><formula xml:id="formula_1">recall = TP / (TP + FN), precision = TP / (TP + FP), F1 = (2 × recall × precision) / (recall + precision)</formula><p>where TP (true positives) is the number of predicted components that match the ground truth, FP (false positives) is the number of predicted components that do not match the ground truth, and FN (false negatives) is the number of ground-truth components that do not match any predicted component. These metrics are used to evaluate the performance on recognizing each type of role, the relations, and each type of relation separately. Since the annotations are provided on phrases, overlapping phrases are counted as correct predictions; an evaluation with exact matching between phrases would yield lower performance than the figures reported here. The relation type evaluation for sub-task 3 includes coarse- and fine-grained metrics. The coarse-grained metric (overall-CG) averages over the labels of region, direction, and distance. The fine-grained metric (overall-FG) shows the performance over all lower-level nodes in the ontology, including the RCC8 types (e.g., EC) and the relative directional types (e.g., above, below). Table <ref type="table" target="#tab_1">2</ref> shows the results of our baseline system described in the previous section. Though the results of the role and relation extraction are fairly comparable to the state of the art <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b8">9]</ref>, the results of the relation type classifiers are less mature because only the simple pipeline described in Section 4 was used. 
Table <ref type="table">3</ref> shows the results of the participant systems.</p><p>As mentioned before, LIP6 uses the model suggested in <ref type="bibr" target="#b13">[14]</ref> and its implementation in Saul <ref type="bibr" target="#b9">[10]</ref> for sub-task 1 and sub-task 2, and focuses on designing a model for sub-task 3. The experimental results using textual embeddings alone are shown under Text only in the table, and a second set of results is reported that exploits the accompanying images and trains visual embeddings from the corpora. The LIP6 system significantly outperforms the provided baseline for the relation type classifiers. Contrary to our expectations, the results that use the visual embeddings are worse than those that ignore the images. In addition to the submitted systems, the LIP6 team slightly improved their results by using a larger feature size in their dimensionality reduction procedure with their text-only features. This model outperforms their submitted systems and is listed in Table <ref type="table">3</ref> as Best model. Table <ref type="table">3</ref>. LIP6 performance with various models for Sub-task 3; LIP6 uses Roberts and Harabagiu <ref type="bibr" target="#b13">[14]</ref> for Sub-tasks 1 and 2.</p><p>Discussion. Confirming previous research results, the results of the LIP6 team show that this task is challenging, particularly when using this small set of training data. LIP6 was able to outperform the provided baseline using the textual embeddings for relation types, but incorporating the images, on the contrary, dropped the performance. This result indicates that integrating the visual information needs more investigation; otherwise it can merely add noise to the learning system. One very basic question to be answered is whether the images of this specific dataset can potentially provide complementary information or help resolve ambiguities in the text at all; this investigation might need a human analysis. 
Although the visual embeddings did not help the best participant system in the current experiments, alternative embeddings trained on large corpora might help improve this task. Given the current interest of the vision and language communities in combining the two modalities and the benefits that this trend will have for information retrieval, many new corpora are becoming available (e.g., <ref type="bibr" target="#b10">[11]</ref>) which can be valuable sources of information for obtaining appropriate joint features. There is a separate annotation of the same benchmark that includes the ground truth of the co-references between text and image <ref type="bibr" target="#b1">[2]</ref>. This annotation was generated for the co-reference resolution task, but it seems very useful on top of our spatial annotations for finding a better alignment between spatial roles and image segments. In general, current related language and vision resources do not consider formal spatial meaning representations, but they can be used indirectly to train informative representations or as a source of indirect supervision for the extraction of formal spatial meaning.</p></div>
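The overlap-based scoring described above can be sketched as follows. This is a hypothetical illustration, not the lab's official evaluation script, and it assumes phrases are given as (start, end) offsets:

```python
def overlap(a, b):
    """Two phrase spans (start, end) match if they overlap at all,
    mirroring the lenient matching used in the evaluation."""
    return a[0] < b[1] and b[0] < a[1]

def evaluate(predicted, gold):
    """Compute precision, recall, and F1 with overlap matching."""
    tp = sum(any(overlap(p, g) for g in gold) for p in predicted)
    fp = len(predicted) - tp
    fn = sum(not any(overlap(g, p) for p in predicted) for g in gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One exact match, one overlapping match, one spurious span, one miss.
gold = [(0, 3), (5, 8), (10, 12)]
pred = [(0, 3), (6, 9), (20, 22)]
p, r, f1 = evaluate(pred, gold)  # each equals 2/3
```

With exact-span matching the second prediction would count as both a false positive and a false negative, which is why the exact-matching scores are lower than the reported ones.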
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>The goal of the multimodal spatial role labeling lab was to provide a benchmark for investigating how adding grounded visual information can help in understanding the spatial semantics of natural language text and in mapping language to a formal spatial meaning representation. The prior hypothesis has been that the visual information should help the extraction of such semantics, because spatial semantics are the most relevant semantics for visualization and the geometrical information conveyed in the visual medium should easily help in disambiguating spatial meaning. Although there are many recent research works on combining vision and language, none of them consider obtaining a formal spatial meaning representation as a target, nor do they provide supervision for training such representations. However, the experimental results of our mSpRL lab participant show that, even given ground-truth segmented objects in the images and exact geometrical information about their relative positions, adding useful information for understanding the spatial meaning of the text is very challenging. The experimental results indicate that using the visual embeddings and the similarity between the objects in the image and the spatial entities in the text can end up adding noise to the learning system and reducing its performance. However, we believe our prior hypothesis is still valid; finding an effective way to exploit vision for spatial language understanding, particularly for obtaining a formal spatial representation appropriate for explicit reasoning, remains an important research question.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The given spatial ontology <ref type="bibr" target="#b8">[9]</ref></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. "About 20 kids in traditional clothing and hats waiting on stairs. A house and a green wall with gate in the background. A sign saying that plants can't be picked up on the right."</figDesc><graphic coords="4,187.82,117.02,239.71,179.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2.</head><label>2</label><figDesc>Baseline results: classic classifiers and linguistically motivated features based on <ref type="bibr" target="#b8">[9]</ref></figDesc><table><row><cell>Label</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>SP</cell><cell>94.76</cell><cell>97.74</cell><cell>96.22</cell></row><row><cell>TR</cell><cell>56.72</cell><cell>69.56</cell><cell>62.49</cell></row><row><cell>LM</cell><cell>72.97</cell><cell>86.21</cell><cell>79.04</cell></row><row><cell>Overall</cell><cell>74.36</cell><cell>83.81</cell><cell>78.68</cell></row><row><cell>Triplets</cell><cell>75.18</cell><cell>45.47</cell><cell>56.67</cell></row><row><cell>Overall-CG</cell><cell>64.72</cell><cell>37.91</cell><cell>46.97</cell></row><row><cell>Overall-FG</cell><cell>47.768</cell><cell>23.490</cell><cell>26.995</cell></row><row><cell>Label</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>SP</cell><cell>97.59</cell><cell>61.13</cell><cell>75.17</cell></row><row><cell>TR</cell><cell>79.29</cell><cell>53.43</cell><cell>63.84</cell></row><row><cell>LM</cell><cell>94.05</cell><cell>60.73</cell><cell>73.81</cell></row><row><cell>Overall</cell><cell>89.55</cell><cell>58.03</cell><cell>70.41</cell></row><row><cell>Triplets</cell><cell>68.33</cell><cell>48.03</cell><cell>56.41</cell></row><row><cell>Text only</cell><cell>Overall-CG</cell><cell>63.829</cell><cell>44.835</cell><cell>52.419</cell></row><row><cell>Text only</cell><cell>Overall-FG</cell><cell>56.488</cell><cell>39.038</cell><cell>43.536</cell></row><row><cell>Text+Image</cell><cell>Overall-CG</cell><cell>66.366</cell><cell>46.539</cell><cell>54.635</cell></row><row><cell>Text+Image</cell><cell>Overall-FG</cell><cell>58.744</cell><cell>40.716</cell><cell>45.644</cell></row><row><cell>Best model</cell><cell>Overall-CG</cell><cell>66.76</cell><cell>46.96</cell><cell>55.02</cell></row><row><cell>Best model</cell><cell>Overall-FG</cell><cell>58.20</cell><cell>41.05</cell><cell>45.93</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The IAPR benchmark: a new evaluation resource for visual information systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Grubinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Clough</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Deselaers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Language Resources and Evaluation (LREC)</title>
				<meeting>the International Conference on Language Resources and Evaluation (LREC)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="13" to="23" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Referit game: Referring to objects in photographs of natural scenes</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kazemzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ordonez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Berg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Unifying visual-semantic embeddings with multimodal neural language models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Zemel</surname></persName>
		</author>
		<idno>CoRR abs/1411.2539</idno>
		<ptr target="http://arxiv.org/abs/1411.2539" />
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Semeval-2013 task 3: Spatial role labeling</title>
		<author>
			<persName><forename type="first">O</forename><surname>Kolomiyets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)</title>
				<meeting>the Seventh International Workshop on Semantic Evaluation (SemEval 2013)<address><addrLine>Atlanta, Georgia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-06">June 2013</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="255" to="262" />
		</imprint>
	</monogr>
	<note>Second Joint Conference on Lexical and Computational Semantics (*SEM)</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SemEval-2012 task 3: Spatial role labeling</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Joint Conference on Lexical and Computational Semantics: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval)</title>
				<meeting>the First Joint Conference on Lexical and Computational Semantics: the Sixth International Workshop on Semantic Evaluation (SemEval)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="365" to="373" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Spatial role labeling annotation scheme</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Otterlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of Linguistic Annotation</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Pustejovsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ide</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Spatial role labeling: task definition and annotation scheme</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Otterlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC&apos;10)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<meeting>the Seventh Conference on International Language Resources and Evaluation (LREC&apos;10)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="413" to="420" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Spatial role labeling: towards extraction of spatial relations from natural language</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Otterlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Speech and Language Processing</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="1" to="36" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Global machine learning for spatial ontology population</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Web Semant</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="3" to="21" />
			<date type="published" when="2015-01">Jan 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Saul: Towards declarative learning based programming</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the International Joint Conference on Artificial Intelligence (IJCAI)</title>
				<meeting>of the International Joint Conference on Artificial Intelligence (IJCAI)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">7</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Visual genome: Connecting language and vision using crowdsourced dense image annotations</title>
		<author>
			<persName><forename type="first">R</forename><surname>Krishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kravitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kalantidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Shamma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EMNLP</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1532" to="1543" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SemEval-2015 task 8: SpaceEval</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pustejovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kordjamshidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Levine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dworman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yocum</surname></persName>
		</author>
		<ptr target="https://lirias.kuleuven.be/handle/123456789/500427" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 9th international workshop on semantic evaluation (SemEval 2015)</title>
				<meeting>the 9th International Workshop on Semantic Evaluation (SemEval 2015), 9th international workshop on semantic evaluation (SemEval 2015)<address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<publisher>ACL</publisher>
			<date type="published" when="2015-06">June 2015</date>
			<biblScope unit="page" from="884" to="894" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">UTD-SpRL: A joint approach to spatial role labeling</title>
		<author>
			<persName><forename type="first">K</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Harabagiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval&apos;12)</title>
				<meeting>the Sixth International Workshop on Semantic Evaluation (SemEval&apos;12)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="419" to="424" />
		</imprint>
	</monogr>
	<note>The First Joint Conference on Lexical and Computational Semantics</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
