<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">MACID - Multimodal ACtion IDentification: A CALAMITA Challenge</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrea</forename><forename type="middle">Amelio</forename><surname>Ravelli</surname></persName>
							<email>andreaamelio.ravelli@unibo.it</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">ABSTRACTION Research Group</orgName>
								<orgName type="institution" key="instit2">University of Bologna</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rossella</forename><surname>Varvara</surname></persName>
							<email>rossella.varvara01@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Independent Researcher</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lorenzo</forename><surname>Gregori</surname></persName>
							<email>lorenzo.gregori@unifi.it</email>
							<affiliation key="aff2">
								<orgName type="institution">University of Florence</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">MACID - Multimodal ACtion IDentification: A CALAMITA Challenge</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">4669EC708D99AEE9A879C2B2B431AF1C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>human action recognition</term>
					<term>action types</term>
					<term>find the intruder</term>
					<term>LLM</term>
					<term>CALAMITA</term>
					<term>CLiC-it</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the Multimodal ACtion IDentification challenge (MACID), part of the first CALAMITA competition. The objective of this task is to evaluate the ability of Large Language Models (LLMs) to differentiate between closely related action concepts based on textual descriptions alone. The challenge is inspired by the "find the intruder" task, where models must identify an outlier among a set of 4 sentences that describe similar yet distinct actions. The dataset is composed of "pushing" events, and it highlights action-predicate mismatches, where the same verb may describe different actions or different verbs may refer to the same action. Although currently mono-modal (text-only), the task is designed for future multimodal integration, linking visual and textual representations to enhance action recognition. By probing a model's capacity to resolve subtle linguistic ambiguities, the challenge underscores the need for deeper cognitive understanding in action-language alignment, ultimately testing the boundaries of LLMs' ability to interpret action verbs and their associated concepts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction and Motivation</head><p>Human language and vision systems are deeply linked, and the two may have a common evolutionary basis. According to the Mirror System Hypothesis <ref type="bibr" target="#b0">[1]</ref>, the mechanism that supports language in the human brain may have evolved atop the mirror neuron system for grasping, taking advantage of its ability to recognize a set of actions, and adapting it to deal with linguistic acts (i.e., utterances) and to discriminate linguistic objects (i.e., audio patterns for words). Thus, according to this hypothesis, humans "invented" language by adapting the pattern recognition system, initially developed within the vision system to recognize actions, to identify and imitate audio patterns, and to link them to real-world entities (i.e., objects and events) and their mental representation. In other words, language is a form of action, and it is probably from action capabilities that language emerged during human evolution. In this view, understanding and discriminating actions are of paramount importance for the broader scope of language understanding. Natural Language Processing is experiencing an unprecedented revolution due to the development of models capable of understanding and generating language; these models show human-like performance on many tasks (and above-human performance on some). Moreover, the recent development of multimodal LLMs has enabled deep reasoning tasks involving the simultaneous processing of both textual and visual data.</p><p>With the MACID task at CALAMITA <ref type="bibr" target="#b1">[2]</ref>, we aim to challenge LLMs on their ability to finely discriminate between linguistic expressions referring to actions that are cognitively distinct but linguistically similar, because the same (or remarkably close) word labels are used to describe them. While the discrimination of very distant actions is quite a simple task (e.g. 
to distinguish between "opening a box" and "pressing a button"), grasping the nuances between actions that are much closer semantically is not so obvious (e.g. "pressing a button" and "pressing the wood"). These nuances are easy for a human to spot, since a human can activate a simulated execution and thus find differences in motor execution, whereas a model without a physical dimension cannot. We aim to test to what degree an LLM can find the relevant information to recognize action concepts from their linguistic description. Moreover, visual information, in these scenarios, can facilitate the task for the computational model, providing more cues to disambiguate. For this reason, the proposed dataset has been conceived as a multimodal resource, with links between textual descriptions of actions and the short movie segments where these actions are performed.</p><p>Currently, the CALAMITA challenge does not deal with multimodal LLMs, so for the first MACID competition we are presenting the text-only version of the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Challenge Description</head><p>We propose a task modeled on the typical "find the intruder" game, similar to Chang et al. <ref type="bibr" target="#b2">[3]</ref>, but extended to sentences instead of words in isolation. Given a group of 4 video-caption pairs, the model is asked to select the one that does not refer to the same kind of action as the other three. To make the task challenging, we focus on action-predicate mismatches:</p><p>• different action concepts that may be described by the same verb (e.g. "pressing a button" and "pressing the wood"); • the same action concept expressed through different verbs (e.g. "pressing a button" and "pushing a button").</p><p>The challenge is currently mono-modal (i.e., text-only), but it is ready to be turned into a multimodal task (i.e., visual and linguistic information through video-caption pairs).</p><p>The task is similar to a word-sense discrimination task, since different senses of an action verb refer to different actions. However, the present task requires a deeper cognitive understanding of the sentences provided, given that the same action can be described through different predicates and, conversely, the same predicate can extend to a variety of actions. Indeed, the task forces the model to question the assumption of a one-to-one relationship between meaning and form.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data description</head><p>We derived the data for this proposal from a small portion of the LSMDC dataset <ref type="bibr" target="#b3">[4]</ref>, which contains short video clips extracted from movies, along with English DVS (descriptive video services) transcriptions for visually impaired people. The LSMDC dataset is the result of merging two previous datasets, both built upon DVS from movies: the Max Planck Institut für Informatik Movie Description Dataset (MPII-MD) <ref type="bibr" target="#b4">[5]</ref>, and the Montreal Video Annotation Dataset (M-VAD) <ref type="bibr" target="#b5">[6]</ref>. The subset considered for this task is a collection of video-caption pairs restricted to the variation of the actions (and action verbs) linked to "pushing" events.</p><p>Data have been manually filtered and annotated <ref type="bibr" target="#b6">[7]</ref> using the action conceptualization derived from the IMAGACT Multilingual and Multimodal Ontology of Actions <ref type="bibr" target="#b7">[8]</ref>. IMAGACT is a multimodal and multilingual ontology of actions that provides a fine-grained categorization of action concepts, each represented by one or more visual prototypes in the form of recorded videos and 3D animations. IMAGACT currently contains 1,010 scenes that encompass the action concepts most commonly referred to in everyday language usage. Scenes belonging to the same action concept are grouped together and labeled with a unique identification number. The categorization of action concepts proposed in the theoretical framework behind IMAGACT has been validated in a series of experiments with high inter-annotator agreement <ref type="bibr" target="#b8">[9]</ref>, confirming that the theoretical framework can be considered well-founded and reproducible.</p><p>We wrote an Italian caption for each of the selected videos from LSMDC, which originally had only an English textual description. 
The captioning aimed at natural-sounding Italian descriptions, so we chose the most appropriate verb (and construction) to describe the action depicted in each video. Moreover, we chose to keep the anonymization proposed in LSMDC, but instead of using SOMEONE as the only replacement for nouns, we used general expressions such as il ragazzo (the boy), la donna (the woman), and so on. In this way, we removed some ambiguities from the original dataset (e.g., SOMEONE pushes SOMEONE).</p><p>The MACID Task can also be framed as a multilingual task, given the already available parallel English captions and the possibility of providing translations in other languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data format</head><p>The MACID dataset is available on HuggingFace (https://huggingface.co/datasets/loregreg/MACID). The dataset consists of groups of 4 captions (or video-caption pairs, in the case of the multimodal version), three of which belong to the same action concept, and one of which describes another action type.</p><p>Data are released in CSV format (columns: id, s1, v1, s2, v2, s3, v3, s4, v4, intruder), with the following meaning:</p><p>• id: the tuple id; • s1-4: the 4 sentences describing physical actions; • v1-4: the 4 videos depicting physical actions; • intruder: the number (1-4) of the sentence (and video) which is the intruder in the group.</p><p>An additional folder with the video files is included in the dataset for future extension to the multimodal task.</p><p>An example of the textual data follows. For each group, the model must select the caption referring to the intruder action. The action ID will be masked from the system and used for evaluating the model's performance, whereas the ID of the corresponding video will be provided, so that researchers can also evaluate multimodal models.</p></div>
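<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the column layout described above, the following minimal sketch parses MACID-style CSV text into one record per tuple. It assumes a comma-separated file with a header row matching the listed column names; the function name read_tuples, the tuple id and the video file names in the sample are hypothetical placeholders, not part of the released data.</p>
<p>
```python
import csv
import io


def read_tuples(csv_text):
    """Parse MACID-style CSV text into one dict per 4-caption tuple."""
    tuples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tuples.append({
            "id": row["id"],
            "sentences": [row[f"s{i}"] for i in range(1, 5)],
            "videos": [row[f"v{i}"] for i in range(1, 5)],
            # 1-based position of the caption describing a different action
            "intruder": int(row["intruder"]),
        })
    return tuples


# Hypothetical sample: the captions come from the example tuple in the paper,
# while the tuple id and the video file names are made-up placeholders.
SAMPLE = "\n".join([
    "id,s1,v1,s2,v2,s3,v3,s4,v4,intruder",
    "t1,"
    "I due ragazzi spingono il carrello verso la colonna,clip_a.mp4,"
    "La donna spinge la signora anziana sulla sedia a rotelle,clip_b.mp4,"
    "L'uomo spinge a terra l'aggressore,clip_c.mp4,"
    "L'infermiere spinge la barella,clip_d.mp4,"
    "3",
])

if __name__ == "__main__":
    first = read_tuples(SAMPLE)[0]
    # Print the caption marked as the intruder in the sample tuple.
    print(first["sentences"][first["intruder"] - 1])
```
</p>
<p>Keeping the intruder as a 1-based integer mirrors the column description, so the value can be compared directly with the number returned by a model.</p></div>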
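<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Example of the prompt used for zero-shot evaluation</head><p>The task is evaluated with a zero-shot prompt only. The prompt used is reported in the example below.</p><p>Le seguenti 4 frasi sono descrizioni di azioni fisiche. Tre di queste azioni sono dello stesso tipo, mentre una è di un tipo diverso. Individua la frase che descrive l'azione di tipo diverso rispondendo soltanto con il numero della frase (1, 2, 3 o 4).<lb/>1: I due ragazzi spingono il carrello verso la colonna<lb/>2: La donna spinge la signora anziana sulla sedia a rotelle<lb/>3: L'uomo spinge a terra l'aggressore<lb/>4: L'infermiere spinge la barella</p><p>(English translation of the instruction: "The following 4 sentences are descriptions of physical actions. Three of these actions are of the same type, while one is of a different type. Identify the sentence that describes the action of a different type, answering only with the number of the sentence (1, 2, 3 or 4).")</p></div>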
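<div xmlns="http://www.tei-c.org/ns/1.0"><p>The prompt above can be assembled programmatically. The sketch below is only one possible way to do it: the instruction text is taken from the prompt reported in this section, while the one-sentence-per-line layout, the function names build_prompt and parse_answer, and the answer-parsing rule are assumptions made for illustration. No model call is included.</p>
<p>
```python
import re

# Instruction text as reported in the zero-shot prompt of this section.
INSTRUCTION = (
    "Le seguenti 4 frasi sono descrizioni di azioni fisiche. "
    "Tre di queste azioni sono dello stesso tipo, mentre una è di un tipo diverso. "
    "Individua la frase che descrive l'azione di tipo diverso "
    "rispondendo soltanto con il numero della frase (1, 2, 3 o 4)."
)


def build_prompt(sentences):
    """Join the instruction and the numbered sentences, one per line (assumed layout)."""
    lines = [INSTRUCTION]
    for number, sentence in enumerate(sentences, start=1):
        lines.append(f"{number}: {sentence}")
    return "\n".join(lines)


def parse_answer(reply):
    """Return the first standalone digit between 1 and 4 in the reply, or None."""
    match = re.search(r"\b[1-4]\b", reply)
    return int(match.group()) if match else None


if __name__ == "__main__":
    print(build_prompt([
        "I due ragazzi spingono il carrello verso la colonna",
        "La donna spinge la signora anziana sulla sedia a rotelle",
        "L'uomo spinge a terra l'aggressore",
        "L'infermiere spinge la barella",
    ]))
```
</p>
<p>The parsing rule is deliberately simple: it fits replies that contain only the requested number, and a verbose reply mentioning several digits would need a stricter rule.</p></div>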
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Frequency list of verbs used in the textual captions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Detailed data statistics</head><p>The MACID dataset consists of 100 tuples, each containing 4 textual descriptions of human actions in the form of short sentences in Italian, and 4 video segments depicting those actions. See Table <ref type="table">1</ref> for general details. The whole dataset is built using 307 hand-crafted captions, with each caption appearing at least once and at most 3 times (either as a positive sentence or as an intruder, counting both roles).</p><p>The dataset contains 18 action types, belonging to the semantic area of pushing events. Table <ref type="table">2</ref> reports the frequency list of verbs used to describe the actions.</p><p>In building the 4-sentence tuples, we balanced close and distant action concepts as much as possible, by choosing the intruder captions on the basis of the distance computed over the whole IMAGACT ontology data <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. Thus, we compiled the stimuli by paying attention to the distance between the action concept of the three positive sentences and that of the intruder, trying to balance as much as possible between intruders with action concepts of high, medium or low similarity with respect to the action concept shared by the other three sentences in the stimulus. Furthermore, we took care to create stimuli that are varied in terms of action verbs, resulting in 5 possible patterns of verb distribution across the 4 sentences of a stimulus:</p><p>1. four different verbs, i.e. one unique verb per sentence (1_1_1_1); 2. three different verbs, with two sentences sharing the same verb (2_1_1); 3. two different verbs, each shared by two sentences (2_2); 4. two different verbs, with three sentences sharing the same verb and one with a different one (3_1); 5. one verb in all four sentences (4). 
Table <ref type="table" target="#tab_1">3</ref> reports the distribution of the stimuli across the 5 schemes. Across all the stimuli and the distribution schemes, the intruder shares its verb with at least one other sentence in 62 out of 100 cases. </p></div>
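<div xmlns="http://www.tei-c.org/ns/1.0"><p>The verb variation schemes listed above can be derived mechanically from the verbs of a tuple. The following sketch assumes that the verb lemma of each sentence is already available (lemmatization is not covered here); the function names verb_scheme and intruder_shares_verb and the lemmas in the usage example are illustrative assumptions, not the procedure actually used to build the dataset.</p>
<p>
```python
from collections import Counter


def verb_scheme(verbs):
    """Return the scheme label for 4 verb lemmas, e.g. '3_1' or '2_1_1'."""
    # Count how many sentences use each verb, largest group first.
    counts = sorted(Counter(verbs).values(), reverse=True)
    return "_".join(str(count) for count in counts)


def intruder_shares_verb(verbs, intruder):
    """True if the intruder (1-based index) uses the verb of another sentence."""
    others = verbs[:intruder - 1] + verbs[intruder:]
    return verbs[intruder - 1] in others


if __name__ == "__main__":
    # Lemmas read off the two example tuples; this reading is an assumption.
    tuple_1 = ["spingere", "spingere", "spingere", "spingere"]
    tuple_2 = ["spingersi", "sollevarsi", "alzarsi", "premere"]
    print(verb_scheme(tuple_1), intruder_shares_verb(tuple_1, 3))
    print(verb_scheme(tuple_2), intruder_shares_verb(tuple_2, 4))
```
</p>
<p>Sorting the group sizes in descending order yields exactly the five labels used in the list above (1_1_1_1, 2_1_1, 2_2, 3_1, 4).</p></div>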
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Metrics</head><p>The evaluation metric proposed for the MACID Task is simple accuracy: participating models are evaluated on the percentage of 4-sentence tuples in which they correctly select the intruder sentence.</p></div>
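<div xmlns="http://www.tei-c.org/ns/1.0"><p>The accuracy metric can be sketched in a few lines. The function name accuracy and the choice to count an unparsable answer (None) as an error are assumptions of this sketch, not details stated by the task description.</p>
<p>
```python
def accuracy(predictions, gold):
    """Percentage of tuples whose predicted intruder equals the gold intruder."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and aligned")
    # A prediction of None (no usable answer) never equals a gold label,
    # so it is counted as wrong.
    correct = sum(1 for pred, ref in zip(predictions, gold) if pred == ref)
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Three of the four hypothetical predictions match the gold intruders.
    print(accuracy([3, 4, 1, 2], [3, 4, 2, 2]))
```
</p>
<p>Since each tuple has four candidates, a uniformly random guesser is expected to reach 25% accuracy, which gives a reference point for reading the scores.</p></div>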
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations</head><p>The main limitation of the MACID Task dataset is its size. We propose a set of 100 4-sentence tuples because the MACID Task is intended as a zero-shot, LLM-only challenge; we therefore did not design it as a typical Machine Learning task with train(-dev)-test splitting. Having many more stimuli would make it possible to tackle the task with other kinds of models, and also to provide exemplars that better inform LLMs about the required behavior.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example of the data from the MACID Task.</figDesc><graphic coords="2,89.29,84.19,416.69,234.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Example of the textual data</head><figDesc>TUPLE_1: (1) I due ragazzi spingono il carrello verso la colonna (The two boys push the cart toward the column) [action id: 65431186] (2) La donna spinge la signora anziana sulla sedia a rotelle (The woman pushes the elderly lady in the wheelchair) [action id: 65431186] (3) L'uomo spinge a terra l'aggressore (The man pushes the attacker to the ground) [action id: 18ad2fa9] (4) L'infermiere spinge la barella (The nurse pushes the gurney) [action id: 65431186]. TUPLE_2: (1) La donna si spinge fuori dalla piscina (The woman pushes herself out of the pool) [action id: 950a69d5] (2) L'uomo si solleva leggermente dalla donna sdraiata (The man lifts himself slightly off the lying woman) [action id: 950a69d5] (3) Il ragazzo a terra si alza in ginocchio con fatica (The boy on the ground gets up to his knees with difficulty) [action id: 950a69d5] (4) L'uomo preme il fazzoletto contro la sua narice (The man presses the tissue against his nostril) [action id: 8b2675f8]. Dataset: https://huggingface.co/datasets/loregreg/MACID</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Distribution of the verb variation scheme across the stimuli of the MACID dataset.</figDesc><table><row><cell>Verb variation scheme</cell><cell>Count</cell></row><row><cell>1_1_1_1</cell><cell>7</cell></row><row><cell>2_1_1</cell><cell>16</cell></row><row><cell>2_2</cell><cell>9</cell></row><row><cell>3_1</cell><cell>44</cell></row><row><cell>4</cell><cell>24</cell></row><row><cell>Total</cell><cell>100</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially supported by the Project ERC-2021-STG-101039777 (ABSTRACTION), funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Neural expectations: A possible evolutionary path from manual skills to language</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arbib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzolatti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communication and Cognition</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="393" to="424" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">CALAMITA: Challenge the Abilities of LAnguage Models in ITAlian</title>
		<author>
			<persName><forename type="first">G</forename><surname>Attanasio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Borazio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Musacchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Scalena</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024-12-06">December 4 - December 6, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Reading tea leaves: How humans interpret topic models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gerrish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boyd-Graber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Movie description</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tandon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">123</biblScope>
			<biblScope unit="page" from="94" to="120" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A dataset for movie description</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tandon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3202" to="3212" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Torabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1503.01070</idno>
		<title level="m">Using descriptive video services to create a large data source for video annotation research</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Annotation of linguistically derived action concepts in computer vision datasets</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ravelli</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>University of Florence</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The IMAGACT visual ontology. An extendable multilingual infrastructure for the representation of lexical encoding of action</title>
		<author>
			<persName><forename type="first">M</forename><surname>Moneglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">W</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Frontini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gagliardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Monachini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panunzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation-LREC&apos;14, European Language Resources Association (ELRA)</title>
				<meeting>the Ninth International Conference on Language Resources and Evaluation-LREC&apos;14, European Language Resources Association (ELRA)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3425" to="3432" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Rappresentazione dei concetti azionali attraverso prototipi e accordo nella categorizzazione dei verbi generali. una validazione statistica</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gagliardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Italian Conference on Computational Linguistics-CLiC-it</title>
				<meeting>the First Italian Conference on Computational Linguistics-CLiC-it</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="180" to="185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Action type induction from multilingual lexical features</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gregori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Varvara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ravelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="85" to="92" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Comparing refvectors and word embeddings in a verb semantic similarity task</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ravelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gregori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Varvara</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Natural Language for Artificial Intelligence</title>
				<meeting>the 3rd Workshop on Natural Language for Artificial Intelligence</meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="0" to="0" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Towards a crosslinguistic identification of action concepts. Automatic clustering of video scenes based on the IMAGACT multilingual ontology</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gregori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moneglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panunzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Annotation, Recognition and Evaluation of Action, On line Areaworkshop</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
	<note>AREA II workshop</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
