[CEUR-WS Vol-1749, paper 51 — https://ceur-ws.org/Vol-1749/paper51.pdf — dblp: https://dblp.org/rec/conf/clic-it/TaslimipoorDCMM16]
    Language resources for Italian: towards the development of a corpus of
                  annotated Italian multiword expressions
                  Shiva Taslimipoor                  Anna Desantis, Manuela Cherchi
           University of Wolverhampton, UK              University of Sassari, Italy
                shiva.taslimi@wlv.ac.uk                  annadesantis_91@libero.it,
                                                        manuealacherchi82@gmail.com


                    Ruslan Mitkov                              Johanna Monti
           University of Wolverhampton, UK         "L’Orientale" University of Naples, Italy
                   r.mitkov@wlv.ac.uk                           jmonti@unior.it


                     Abstract

English. This paper describes the first resource annotated for multiword expressions (MWEs) in Italian. Two versions of this dataset have been prepared: the first with a fast mark-up list of out-of-context MWEs, and the second with an in-context annotation, where the MWEs are entered together with their contexts. The paper also discusses annotation issues and reports the inter-annotator agreement for both types of annotation. Finally, the results of the first exploitation of the new resource, namely the automatic extraction of Italian MWEs, are presented.

Italiano. Questo contributo descrive la prima risorsa italiana annotata con polirematiche. Sono state preparate due versioni del dataset: la prima con una lista di polirematiche senza contesto, e la seconda con annotazione in contesto. Il contributo discute le problematiche emerse durante l'annotazione e riporta il grado di accordo tra annotatori per entrambi i tipi di annotazione. Infine vengono presentati i risultati del primo impiego della nuova risorsa, ovvero l'estrazione automatica di polirematiche per l'italiano.

1   Rationale

Multiword expressions (MWEs) are a pervasive phenomenon in language, and their computational treatment is crucial for users and NLP applications alike (Baldwin and Kim, 2010; Granger and Meunier, 2008; Monti et al., 2013; Monti and Todirascu, 2015; Seretan and Wehrli, 2013). However, despite being desiderata for linguistic analysis and language learning, as well as for the training and evaluation of NLP tasks such as term extraction (and Machine Translation in multilingual scenarios), resources annotated with MWEs are a scarce commodity (Schneider et al., 2014b). The need for such resources is even greater for Italian, which does not benefit from the variety and volume of resources that English does.

This paper outlines the development of a new language resource for Italian, namely a corpus annotated with Italian MWEs of a particular class: verb-noun expressions such as fare riferimento, dare luogo and prendere atto. Such collocations are reported to be the most frequent class of MWEs, and they are of high practical importance both for automatic translation and for language learning. To the best of our knowledge, this is the first resource of its kind for Italian.

The development of this corpus is part of a multilingual project addressing the challenge of the computational treatment of MWEs. The project covers English, Spanish, Italian and French, and its goal is to develop a knowledge-poor methodology for automatically identifying MWEs and retrieving their translations (Taslimipoor et al., 2016) for any pair of languages. The resulting methodology will be used for Machine Translation and multilingual dictionary compilation, as well as in computer-aided tools supporting the work of language learners and translators.

Two versions of the above resource have been produced. The first consists of lists of MWEs annotated out of context, with a view to performing fast evaluation of the developed methodology (out-of-context mark-up). The second consists of annotated MWEs along with their concordances (in-context annotation).
The latter type of annotation is time-consuming, but provides the contexts in which the annotated MWEs occur.

2   Annotation of MWEs: out-of-context mark-up and in-context annotation

After more than two decades of computational studies on MWEs, the lack of a proper gold standard is still an issue. Lexical resources such as dictionaries have limited coverage of these expressions (Losnegaard et al., 2016), and there is no properly tagged corpus of MWEs in any language (Schneider et al., 2014b).

Most previous studies on the computational treatment of MWEs have focused on extracting types (rather than tokens) [1] of MWEs from corpora (Ramisch et al., 2010; Villavicencio et al., 2007; Rondon et al., 2015; Salehi and Cook, 2013). The widely used toolkits mwetoolkit (Ramisch et al., 2010) and Xtract (Smadja, 1993) extract expressions whose statistical distributions indicate that they are likely to be MWEs. Evaluation of type-based extraction of MWEs has mostly been performed against a dictionary (de Caseli et al., 2010), a lexicon (Pichotta and DeNero, 2013) or a list of human-annotated expressions (Villavicencio et al., 2007). However, there are expressions such as have a baby which, in exactly the same form and structure, may be an MWE (meaning to give birth) in some contexts and a literal expression in others.

[1] Type refers to the canonical form of an expression, while token refers to each instance (usage) of the expression, in any morphological form, in text.

As for the automatic identification of tokens of MWEs, Fazly et al. (2009) make use of both linguistic properties and the local context in determining the class of an MWE token. They report an unsupervised approach to identifying idiomatic and literal usages of an expression in context. Their method is evaluated on a very small, human-annotated sample of expressions in a small portion of the British National Corpus (BNC). Schneider et al. (2014a) developed a supervised model whose purpose is to identify MWEs in context. Their methodology results in a corpus of automatically annotated MWEs. It is not clear, however, whether the methodology is able to tag a specific expression as an MWE in one context and as a non-MWE in another. The PARSEME shared task [2] is also devoted to annotating verbal MWEs in several languages. The shared task, while generating interesting discussions in the area, has embarked upon the labour-intensive annotation of verbal MWEs.

[2] http://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes

Since there is no list of verb-noun MWEs in Italian, we first automatically compile a list of such expressions to be annotated by human experts. This follows previous attempts at extracting a lexicon of MWEs (as in Villavicencio (2005)). Annotators are not provided with any context, and hence the task is more feasible in terms of time. Human annotators are asked to label the expressions as MWEs only if they have a sufficient degree of idiomaticity; in other words, a Verb+Noun MWE does not convey a literal meaning, in that its verb is delexicalised.

However, we believe that idiomaticity is not a binary property; rather, it is known to fall on a continuum from completely semantically transparent, or literal, to entirely opaque, or idiomatic (Fazly et al., 2009). This makes the task of out-of-context mark-up of expressions more challenging for annotators, since they have to pick a value covering all the possible contexts of a target expression. This ambiguity, and the fact that many expressions are MWEs in some contexts and not in others, prompted us to initiate a subsequent annotation in which MWEs are tagged in their contexts. The idea is to extract the concordances around all the occurrences of a Verb+Noun expression and provide annotators with these concordances, so that they can decide the degree of idiomaticity of the specific verb-noun expression. We compare the reliability of the in-context and out-of-context annotations by way of the agreement between annotators.

2.1   Experimental expressions

Highly polysemous verbs, such as give and take in English and fare and dare in Italian, widely participate in Verb+Noun MWEs, in which they contribute a broad range of figurative meanings that must be recognised (Fazly et al., 2007). We focus on four highly frequent Italian verbs: fare, dare, prendere and trovare. We extract all the occurrences of these verbs when followed by any noun from the itWaC corpus (Baroni and Kilgarriff, 2006), using SketchEngine (Kilgarriff et al., 2004). For the first experiment, all the Verb+Noun types are extracted with the verb lemmatised; for the second experiment, all the concordances of these verbs when followed by a noun are generated.

2.2   Out-of-context mark-up of Verb+Noun(s)

The extraction of Verb+Noun candidates for the four verbs in focus, and the removal of the expressions with frequencies lower than 20, result in a dataset of 3,375 expressions. Two native speakers annotated every candidate expression: 1 for an MWE if the expression was idiomatic, and 0 for a non-MWE if the expression was literal. We also defined the tag 2 for expressions that behave as MWEs in some contexts but not in others, e.g. dare frutti, which has a literal usage meaning to produce fruits but in some contexts means to produce results and is an MWE in those contexts. While this out-of-context 'fast track' annotation procedure saves time and yields a long list of marked-up expressions, annotators often feel uncomfortable due to the lack of context. The agreement between annotators in terms of Kappa is shown in Table 2, where it is compared with the in-context annotation of MWEs explained in Section 2.3.

Table 1: Annotation details (A: Annotator)

  Annotation task   A     tag 0    tag 1 (MWE)   tag 2
  Out-of-context    1st    2,491        792         92
  Out-of-context    2nd    2,112      1,127        136
  In-context        1st   10,478     19,616          -
  In-context        2nd    9,058     21,036          -

Table 2: Inter-annotator agreement

  Annotation task   Kappa   Observed Agreement
  Out-of-context    0.40          0.73
  In-context        0.65          0.85

2.3   Annotating Verb+Noun(s) in context

We design an annotation task in which we provide a sample of all usages of any type of Verb+Noun expression to be annotated. For this purpose, we employ SketchEngine to list all the concordances of each verb when it is followed by a noun. Concordances include the verb in focus with roughly ten words before and ten words after it. SketchEngine reports only 100,000 concordances for each query. Among them, we filter out the concordances that include Verb+Noun expressions with frequencies lower than 50, and we randomly select 10% of the concordances for each verb. As a result, there are 30,094 concordances to be annotated. The two annotators annotate all usages of Verb+Noun expressions in these concordances, considering the context in which the expression occurred, marking up MWEs with 1 and expressions which are not MWEs with 0. Table 1 reports the details of the annotation tasks and Table 2 shows the agreement details for them.

2.4   Discussion

As seen in Table 2, the inter-annotator agreement is considerably higher when annotating the expressions in context. One of the main causes of disagreement in out-of-context annotation concerns abstract nouns. The annotation of expressions composed of a verb followed by a noun with an abstract meaning is a more complicated process, as the candidate expression may carry a figurative meaning. Out of context, each annotator has to rely on intuition, which leads to effectively random tags for these expressions (e.g. fare notizia, dare identità, prendere possesso). In the case of in-context annotation, however, concordances involving abstract nouns have been annotated in the majority of cases with 1 by both annotators.

In-context annotation is also very helpful for annotating expressions with both idiomatic and literal meanings. An interesting observation, reported in Table 3, relates to the number of expressions that are detected, in context, with both idiomatic and non-idiomatic usages.

Table 3: Statistics on the in-context annotation

                   0 tagged   1 tagged   context-depending
  1st annotator       924        195            530
  2nd annotator       696        424            529

As can be seen in Table 3 [3], among the 1,649 types of expressions in the concordances, 530 (32%) could be MWEs in some contexts and non-MWEs in others (context-depending), according to the first annotator. The same annotator had annotated only 3% of the expressions with tag '2' without context.

[3] Note that the numbers in Table 3 cannot be interpreted to validate agreement between annotators, i.e. no conclusion about agreement can be derived from Table 3.
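Assuming the Kappa reported in Table 2 is Cohen's kappa over the two annotators, both it and the observed agreement can be computed from the annotators' parallel label sequences. A minimal sketch (the label sequences shown are illustrative, not the actual annotations):

```python
from collections import Counter

def agreement(labels_a, labels_b):
    """Observed agreement and Cohen's kappa for two parallel label lists."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected (chance) agreement from each annotator's marginal distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Kappa corrects observed agreement for agreement expected by chance.
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Illustrative labels (0 = non-MWE, 1 = MWE), not taken from the dataset.
a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1]
p_o, kappa = agreement(a, b)
print(round(p_o, 2), round(kappa, 2))  # → 0.75 0.5
```

The gap between the two columns of Table 2 reflects exactly this correction: the out-of-context task has fairly high observed agreement (0.73) but only moderate kappa (0.40), because much of the raw agreement is attributable to chance given the skewed label distribution.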
3   First use of the MWE resource: comparative evaluation of the automatic extraction of Italian MWEs

In our multilingual project (see Section 1), we regard the automatic translation of MWEs as a two-stage process. The first stage is the extraction of MWEs in each of the languages; the second stage is a matching procedure which proposes translation equivalents among the MWEs extracted for each language. In this study, the extraction of MWEs is based on statistical association measures (AMs).

These measures have been proposed to determine the degree of compositionality and fixedness of expressions. The less compositional, or the more fixed, expressions are, the more likely they are to be MWEs (Evert, 2008; Bannard, 2007). According to Evert (2008), there is no ideal association measure for all purposes. We aim to evaluate AMs as a baseline approach against the annotated data which we have prepared. We focus on a selection of five AMs that have been widely discussed as among the best measures for identifying MWEs: MI3 (Oakes, 1998), log-likelihood (Dunning, 1993), T-score (Krenn and Evert, 2001), logDice (Rychlý, 2008) and Salience (Kilgarriff et al., 2004), all as defined in SketchEngine. We also include frequency of occurrence (Freq) as a sixth measure for ranking the candidate MWEs. We evaluate the effect of these measures in ranking MWEs on both kinds of datasets.

3.1   Experiments on type-based extraction of MWEs

In the first experiment, the list of all extracted Verb+Noun combinations (as explained in Section 2.1) is ranked according to the above measures, computed from itWaC as a reference corpus. To perform the evaluation against the list of annotated expressions, we use all 2,415 expressions for which the annotators agreed on tag 0 or 1. After ranking the expressions by each measure, we examine its retrieval performance by computing the 11-point Interpolated Average Precision (11-p IAP). This reflects how well a measure ranks the relevant items (here, MWEs) before the irrelevant ones. To this end, the interpolated precision at the 11 recall values 0, 10%, ..., 100% is calculated. As detailed in Manning et al. (2008), the interpolated precision at a certain recall level r is defined as the highest precision found for any recall level r' ≥ r. The average of these 11 values is reported as 11-p IAP in Table 4.

Table 4: 11-p IAP for ranking MWEs using different AMs

  AM               11-p IAP
  Freq               0.49
  MI3                0.51
  log-likelihood     0.49
  Salience           0.49
  logDice            0.48
  T-score            0.49

As can be seen in Table 4, the selected association measures perform similarly in ranking this type of MWEs, with MI3 performing slightly better than the others.

3.2   Experiments on token-based identification of MWEs

In the second experiment, we seek to establish the effectiveness of these measures in identifying the usages of MWEs in our dataset of in-context annotations. We set a threshold for each score computed for the Verb+Noun expression types, and use it to measure the classification accuracy of the AMs in identifying MWEs among the usages of Verb+Noun expressions in a corpus. Specifically, each Verb+Noun candidate in the concordances is automatically tagged as an MWE if its lemmatised form has a score higher than the threshold, and as a non-MWE otherwise. For each measure, we compute the arithmetic mean of the values of that measure over all expressions, and set this average as the threshold.

The accuracies of classifying the candidate Verb+Noun expressions are computed against the human annotations of the concordances and are shown in Table 5. The classification accuracies of the AMs are again very close to each other; however, this time log-likelihood and Freq fare slightly better than the others in classifying tokens of Verb+Noun expressions.

Table 5: Accuracy of AMs in classifying usages of Verb+Noun(s)

  AM               Accuracy
  Freq               0.72
  MI3                0.68
  log-likelihood     0.72
  Salience           0.69
  logDice            0.67
  T-score            0.69

3.3   Usage-related features

Our new resource of concordances contains useful linguistic information related to the usages of expressions, and as such important features can be
extracted from the resource to help identify MWEs. One of these features can be obtained from the statistics of the different inflections of the verb component of an expression. Based on the premise of the fixedness of MWEs, we expect the verb component of a verb-noun MWE to occur in only a limited number of inflections. We implement this feature by dividing the frequency of occurrence of each expression by the number of inflections in which its verb component occurs. Note that to count the number of different inflections of the verb component, we rely on the sub-corpus of concordances that we gathered.

We evaluate this approach only on the 1,077 expressions that occur in the concordances. We rank the expressions according to this newly computed score, which we call INF-VAR, as it depends on the variety of inflections. Over all verbs, INF-VAR performs comparably to Frequency in ranking MWEs higher than non-MWEs, but for the verb trovare we obtain a better 11-p IAP using this score than using Frequency (see Table 6).

Table 6: Performance of the new score in ranking MWEs in terms of 11-p IAP

               total   trovare
  Frequency     0.57     0.44
  INF-VAR       0.58     0.48

4   Conclusions and future work

In this paper, we outline our work towards a gold-standard dataset tagged with Italian verb-noun MWEs along with their contexts. We show the reliability of this dataset through its considerable inter-annotator agreement, compared with the moderate inter-annotator agreement on verb-noun expressions presented without context. We also report the results of the automatic extraction of MWEs using this dataset as a gold standard. One of the advantages of this dataset is that it includes both 0-tagged and 1-tagged tokens of expressions, so it can be used for classification and other statistical NLP approaches. In future work, we are interested in extracting context features from the concordances in this resource to automatically recognise and classify the expressions that are MWEs in some contexts but not in others.

References

Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Handbook of Natural Language Processing, second edition, pages 267–292. CRC Press.

Colin Bannard. 2007. A measure of syntactic flexibility for automatically identifying multiword expressions in corpora. In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, pages 1–8. Association for Computational Linguistics.

Marco Baroni and Adam Kilgarriff. 2006. Large linguistically-processed web corpora for multiple languages. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations, EACL '06, pages 87–90, Stroudsburg, PA, USA. Association for Computational Linguistics.

Helena Medeiros de Caseli, Carlos Ramisch, Maria das Graças Volpe Nunes, and Aline Villavicencio. 2010. Alignment-based extraction of multiword expressions. Language Resources and Evaluation, 44(1-2):59–77.

Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.

Stefan Evert. 2008. Corpora and collocations. In Corpus Linguistics. An International Handbook, volume 2, pages 1212–1248.

Afsaneh Fazly, Suzanne Stevenson, and Ryan North. 2007. Automatically learning semantic knowledge about multiword predicates. Language Resources and Evaluation, 41(1):61–89.

Afsaneh Fazly, Paul Cook, and Suzanne Stevenson. 2009. Unsupervised type and token identification of idiomatic expressions. Computational Linguistics, 35(1):61–103.

Sylviane Granger and Fanny Meunier. 2008. Phraseology: An Interdisciplinary Perspective. John Benjamins Publishing Company.

Adam Kilgarriff, Pavel Rychlý, Pavel Smrz, and David Tugwell. 2004. The Sketch Engine. In EURALEX 2004, pages 105–116, Lorient, France.

Brigitte Krenn and Stefan Evert. 2001. Can we do better than frequency? A case study on extracting PP-verb collocations. In Proceedings of the ACL Workshop on Collocations, pages 39–46.

Gyri Smørdal Losnegaard, Federico Sangati, Carla Parra Escartín, Agata Savary, Sascha Bargmann, and Johanna Monti. 2016. PARSEME survey on MWE resources. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Johanna Monti and Amalia Todirascu. 2015. Multi-word units translation evaluation in machine translation: another pain in the neck? In Proceedings of the MUMTTT Workshop, Corpas Pastor G., Monti J., Mitkov R., Seretan V. (eds), Multi-word Units in Machine Translation and Translation Technology.

Johanna Monti, Ruslan Mitkov, Gloria Corpas Pastor, and Violeta Seretan. 2013. Multi-word units in machine translation and translation technologies.

Michael P. Oakes. 1998. Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.

Karl Pichotta and John DeNero. 2013. Identifying phrasal verbs using many bilingual corpora. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Seattle, WA, October.

Carlos Ramisch, Aline Villavicencio, and Christian Boitet. 2010. mwetoolkit: a framework for multiword expression identification. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, May. European Language Resources Association.

Alexandre Rondon, Helena Caseli, and Carlos Ramisch. 2015. Never-ending multiword expressions learning. In Proceedings of the 11th Workshop on Multiword Expressions, pages 45–53, Denver, Colorado, June. Association for Computational Linguistics.

Pavel Rychlý. 2008. A lexicographer-friendly association score. In RASLAN 2008, pages 6–9, Brno. Masarykova Univerzita.

Bahar Salehi and Paul Cook. 2013. Predicting the compositionality of multiword expressions using translations in multiple languages. In Second Joint Conference on Lexical and Computational Semantics (*SEM), 1:266–275.

Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith. 2014a. Discriminative lexical semantic segmentation with gaps: Running the MWE gamut. TACL, 2:193–206.

Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, and Noah A. Smith. 2014b. Comprehensive annotation of multiword expressions in a social web corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 455–461, Reykjavik, Iceland. European Language Resources Association (ELRA).

Violeta Seretan and Eric Wehrli. 2013. Syntactic concordancing and multi-word expression detection. International Journal of Data Mining, Modelling and Management, 5(2):158–181.

Frank Smadja. 1993. Retrieving collocations from text: Xtract. Computational Linguistics, 19:143–177.

Shiva Taslimipoor, Ruslan Mitkov, Gloria Corpas Pastor, and Afsaneh Fazly. 2016. Bilingual contexts from comparable corpora to mine for translations of collocations. In Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing'16. Springer.

Aline Villavicencio, Valia Kordoni, Yi Zhang, Marco Idiart, and Carlos Ramisch. 2007. Validation and evaluation of automatically acquired multiword expressions for grammar engineering. In EMNLP-CoNLL, pages 1034–1043.

Aline Villavicencio. 2005. The availability of verb-particle constructions in lexical resources: How much is enough? Computer Speech & Language, 19(4):415–432.