Tagging Semantic Types for Verb Argument Positions

Francesca Della Moretta
University of Pavia / Pavia, Italy
francesca.dellamoretta01@universitadipavia.it

Anna Feltracco
Fondazione Bruno Kessler / Trento, Italy
University of Pavia / Pavia, Italy
University of Bergamo / Bergamo, Italy
feltracco@fbk.eu

Elisabetta Jezek
University of Pavia / Pavia, Italy
jezek@unipv.it

Bernardo Magnini
Fondazione Bruno Kessler / Trento, Italy
magnini@fbk.eu

Abstract

English. Verb argument positions can be described by the semantic types that characterise the words filling that position. We investigate a number of linguistic issues underlying the tagging of an Italian corpus with the semantic types provided by the T-PAS (Typed Predicate Argument Structure) resource. We report both quantitative data about the tagging and a qualitative analysis of cases of disagreement between two annotators.

Italiano (translated). The argument positions of a verb can be described by the semantic types that characterise the words filling that position. In this contribution we address some linguistic issues underlying the annotation of an Italian corpus with the semantic types used in the T-PAS (Typed Predicate Argument Structure) resource. We report both quantitative data on the annotation and a qualitative analysis of the cases of disagreement between two annotators.

1 Introduction

Words that fill a certain verb argument position are characterised by their semantic properties. For instance, the fillers of the object position of the verb "eat" are typically required to share the fact that they are edible objects, like "meat" and "bread". A vast literature in lexical semantics has addressed this issue from different perspectives, including the notion of selectional preferences (Resnik, 1997; McCarthy and Carroll, 2003), the notion of prototypical categories (Rosch, 1973), and the notion of lexical sets (Hanks and Jezek, 2008; Jezek and Hanks, 2010). However, despite the large theoretical interest, there is still a limited amount of empirical evidence (e.g. annotated corpora) that can be used to support linguistic theories. In particular, for the Italian language there has been no systematic attempt to annotate a corpus with semantic tags for verb argument positions.

In this paper we assume a corpus-based perspective, and we focus on manually tagging verb argument positions in a corpus with their corresponding semantic classes, selected from those used in the T-PAS resource (Jezek et al., 2014). We make use of an explicit set of semantic categories (i.e., an ontology of Semantic Types), hierarchically organised (e.g. inanimate subsumes food): we are interested in a qualitative analysis, a rather different perspective with respect to recent works that exploit distributional properties of words filling argument positions (Ponti et al., 2016; Ponti et al., 2017). We run a pilot annotation on a corpus of sentences. We aim at investigating how human annotators assign semantic types to argument fillers, and to what extent they agree or disagree.

A mid-term goal of this work is the extension of the T-PAS resource with a corpus of annotated sentences aligned with the T-PASs of the verbs (see Section 2). This would have a twofold impact: it would allow a corpus-based linguistic investigation, and it would provide a unique dataset for training semantic parsers for Italian.

The paper is structured as follows. Section 2 introduces T-PAS and the ontology of semantic types used in the resource. Section 3 describes the annotation task and the guidelines for annotators. Section 4 presents the annotated corpus and the data of the inter-annotator agreement.
Finally, Section 5 discusses the most interesting phenomena that emerged during the annotation exercise.

2 Overview of the T-PAS resource

The T-PAS resource is an inventory of 4241 Typed Predicate Argument Structures (T-PASs), for example [[Human]] partecipa a [[Event]] (Eng. 'takes part in'), for 1000 Italian verbs of average polysemy. The T-PASs were acquired from the ItWaC corpus (Baroni and Kilgarriff, 2006) by manual clustering of distributional information about Italian verbs (Jezek et al., 2014), following the Corpus Pattern Analysis (CPA) procedure (Hanks, 2004; Hanks and Pustejovsky, 2005), which consists in recognising the relevant structures of a verb and identifying the Semantic Types (STs) for their argument slots by generalizing over the lexical sets observed in a sample of 250 concordances. The current list of about 230 semantic types used in the resource (e.g. human, event, location, artifact; henceforth, STs) is corpus-derived, that is, STs are the result of manual generalization over the lexical sets found in the argument positions in the concordances. For example, in the [[Event]] argument position of partecipare we find gara ('competition'), riunione ('meeting'), selezione ('selection'), and so forth. Besides the T-PASs and the hierarchically organized list of STs, the resource contains a corpus of sentences that instantiate the different T-PASs for each verb. Each sentence is currently tagged with the number of the T-PAS it instantiates; the tag is located on the verb. No further information is present in the instance except for the T-PAS number.

3 Annotating Semantic Types

The main goal of the annotation effort reported in this paper is to enrich the annotation already present in the examples associated with each T-PAS. Specifically, given a T-PAS of a verb and an example from the corpus, we annotate the lexical items (in the example) generalised by the STs (in the T-PAS). For instance, Example (1) shows T-PAS#1 of the verb vendere (Eng. 'to sell') and a sentence associated with it. The task consists in annotating prodotti tipici (Eng. 'traditional products') as a lexical item for [[Inanimate]]-obj.

(1) [[Human | Business Enterprise]] vendere [[Inanimate | Animal]]
"[..] il nome di un'associazione brasiliana che vendeva anche prodotti tipici"
(Eng. '[..] the name of that Brazilian association that was selling traditional products')

We annotate the content word(s) that constitute the head noun, both in the case of noun phrases (NP) (e.g. cake in give a cake) and in the case of prepositional phrases (PP) (e.g. son in give a cake to his little son). If the head noun is a quantifier, the quantifier is not tagged but the quantified element is (e.g. cake in to give a piece of cake).

Notice that more than one token can be annotated, e.g. in the case of multiword expressions such as prodotti tipici in Example (1), and more than one item can be tagged for the same argument position, e.g. in the case of coordination, as in "[..] che vendeva anche prodotti tipici e cartoline" (Eng. '[..] that was selling traditional products and postcards').

If an argument is not present in the sentence (for instance, when the subject of the verb is unexpressed), we do not signal its absence. On the other hand, the annotation accounts for the following cases.

Semantic mismatches. Lexical items are annotated according to the T-PAS; however, the annotator can use a different ST if she/he thinks the one specified in the T-PAS does not apply. For instance, Example (2) reports another instance of T-PAS#1 of vendere in which lavoro has been annotated as [[Activity]], a ST not selected by T-PAS#1 of vendere in object position (see the T-PAS in Example (1)).

(2) "il lavoro come qualsiasi altra cosa può essere acquistato e venduto."
(Eng. 'jobs can be sold and bought just like anything.')

Syntactic mismatches. We account for cases in which the syntactic role of the lexical items does not match the one proposed in the T-PAS, e.g. passive forms of verbs, where the subject and the prepositional phrase introduced by da correspond respectively to the object and the subject of the active construction. In Example (2), lavoro is the syntactic subject of the passive clause, and it is generalized by [[Activity]] in the object position of the T-PAS. In such cases we annotate both the ST of the lexical item and its grammatical relation, using the relation given in the T-PAS.

Pronouns. If the argument of the verb is realised as a pronoun, we tag the pronoun without assigning a ST. The pronoun is then linked to the noun(s) it refers to, and it is this noun that is tagged with the ST label. If the pronoun is agglutinated to the verb (i.e. it is found in the same token as the verb, e.g. venderla, Eng. 'to sell it'), the part of the token corresponding to the pronoun is specified and, as described above, the noun is annotated with the ST.

Impersonal constructions. In impersonal constructions with an indefinite pronoun, the pronoun is annotated and the ST it refers to is specified: e.g. in "In Germania [..] si vende a 10 euro al chilo" (Eng. 'In Germany, they sell it at 10 euro per kilo'), si is annotated with [[Human]].

We annotated the examples in T-PAS using CAT (Content Annotation Tool, https://dh.fbk.eu/resources/cat-content-annotation-tool), a general-purpose text annotation tool (Bartalesi Lenzi et al., 2012).

4 Results of the Pilot Annotation

The pilot annotation consisted of a selection of 3554 sentences extracted from the current version of T-PAS (http://tpas.fbk.eu), associated with 25 Italian verbs selected with different levels of polysemy (from a minimum of 2 to a maximum of 10 T-PASs) and argument structure. The average polysemy of the 25 verbs (i.e. the number of senses divided by the number of verbs) is 4.08, and for each T-PAS (sense) we have an average of 34.84 annotated sentences. The annotation was carried out by a master's student in linguistics, who was trained on the T-PAS resource but had no previous experience in annotation. The annotator was able to tag the 3554 sentences in one month.

Table 1 shows the main data of the pilot annotation. Overall, we annotated 5342 argument positions expressed in the 3554 sentences, with an average of 1.5 arguments per sentence. Out of the 230 Semantic Types available in the T-PAS ontology, 99 were selected during the annotation, which means that we used roughly 43% of the STs contained in the hierarchy.

Data                      Total
# Verbs                   25
# T-PASs                  102
# Examples                3554
# Examples per T-PAS      34.84
# Semantic Types used     99
Table 1: Pilot annotation results.

4.1 Inter-annotator Agreement

In order to assess the reliability of the annotated data, we ran an Inter-Annotator Agreement (IAA) test. A comparable IAA on pattern identification with the CPA procedure was carried out by Cinková et al. (2012) on 30 English verbs. We asked a second annotator to annotate a sample of 11 T-PASs associated with 3 different verbs (i.e., pulire 'to clean', vendere 'to sell' and sbottonare 'to unbutton'). These verbs were chosen because they correspond to about 10% of the annotated sentences. Moreover, we selected them because they present a low or middle degree of polysemy with respect to the group of 25 verbs initially annotated. The second annotator was provided with the task guidelines, and a training session was held to resolve potential uncertainties in annotation. The second annotator was trained on a selection of corpus instances derived from verb lemmas which are not included in the evaluation we report here.

Table 2 shows the results of the IAA for each T-PAS. We measured both the agreement on argument annotation, calculated with Dice's coefficient (Rijsbergen, 1979), and the agreement on ST annotation, calculated as the accuracy (Manning et al., 2008) between the two annotators. As reported in the last row of Table 2, the average agreement is 0.87 for argument annotation and 0.83 for ST annotation.

T-PAS                   Argument (Dice's value)   ST (Accuracy)
Pulire, T-PAS#1         0.83                      0.74
Pulire, T-PAS#2         1                         1
Sbottonare, T-PAS#1     0.94                      0.89
Sbottonare, T-PAS#2     0.95                      0.98
Sbottonare, T-PAS#3     1                         1
Sbottonare, T-PAS#4     0.88                      0.90
Vendere, T-PAS#1        0.87                      0.81
Vendere, T-PAS#2        0.33                      0.5
Vendere, T-PAS#3        0.8                       1
Vendere, T-PAS#4        1                         1
Vendere, T-PAS#5        1                         1
Overall average         0.87                      0.83
Table 2: Inter-annotator agreement.

A special case is vendere T-PAS#2, which shows the lowest score for both argument and ST annotation. The annotation task allowed annotators to discard sentences which, in their opinion, did not fit the sense of the T-PAS under consideration. Vendere T-PAS#2 has only a few corpus instances, which were mostly discarded or tagged differently by the two annotators, causing low agreement for this T-PAS.

T-PAS                   Expected STs (according to the T-PAS)   STs used (A+B)
Pulire, T-PAS#1         4                                       23
Pulire, T-PAS#2         3                                       4
Sbottonare, T-PAS#1     2                                       6
Sbottonare, T-PAS#2     2                                       4
Sbottonare, T-PAS#3     1                                       1
Sbottonare, T-PAS#4     1                                       4
Vendere, T-PAS#1        4                                       23
Vendere, T-PAS#2        2                                       3
Vendere, T-PAS#3        3                                       3
Vendere, T-PAS#4        1                                       1
Vendere, T-PAS#5        1                                       1
Table 3: Expected and used STs in the IAA test.

5 Discussion

This section discusses the most interesting phenomena that emerged during the annotation exercise, particularly in light of the inter-annotator agreement.

5.1 Discussion: Argument Tagging

In this paragraph, we focus on the disagreements we found in argument tagging.
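The argument score in Section 4.1 is a Dice coefficient, so the disagreements discussed here (a pronoun tagged by only one annotator, a co-referent annotated twice) lower it directly, while the ST accuracy is affected only by label choices. The sketch below is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes each annotator's output is a mapping from a hypothetical (sentence id, start token, end token) key to an ST label, with None for pronouns (which the guidelines tag without an ST), that spans match only when identical, and that ST accuracy is computed over the arguments both annotators tagged with an ST. The paper does not specify these details; the sentence id "s2" and the [[Human]] label for the subject of Example (3) are invented for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical span key: (sentence_id, first_token, end_token_exclusive).
Span = Tuple[str, int, int]
# Value is the ST label; None marks a pronoun tagged without an ST (per the guidelines).
Annotation = Dict[Span, Optional[str]]


def dice(a: Set[Span], b: Set[Span]) -> float:
    """Dice's coefficient 2|A ∩ B| / (|A| + |B|) over exactly matching argument spans."""
    if not a and not b:
        return 1.0  # two empty annotations trivially agree
    return 2 * len(a & b) / (len(a) + len(b))


def st_accuracy(a: Annotation, b: Annotation) -> float:
    """Share of spans that both annotators tagged with an ST and labelled identically."""
    shared = [s for s in a.keys() & b.keys() if a[s] is not None and b[s] is not None]
    if not shared:
        return 0.0
    return sum(a[s] == b[s] for s in shared) / len(shared)


# Example (3), "Giles pulisce una lente dei suoi occhiali", tokenised as
# 0 Giles, 1 pulisce, 2 una, 3 lente, ...; "s2" is an invented second sentence
# in which only annotator A tags a pronoun.
annotator_a: Annotation = {
    ("ex3", 0, 1): "Human",
    ("ex3", 3, 4): "Artifact",
    ("s2", 0, 1): None,
}
annotator_b: Annotation = {
    ("ex3", 0, 1): "Human",
    ("ex3", 3, 4): "Physical Object Part",
}

print(dice(set(annotator_a), set(annotator_b)))  # 2*2 / (3+2) = 0.8
print(st_accuracy(annotator_a, annotator_b))     # 1 of 2 shared STs match = 0.5
```

Under exact matching, a boundary difference such as tagging prodotti tipici as one multiword span versus tagging prodotti alone counts as a complete miss for both spans; a partial-overlap criterion would score such cases more leniently.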
The annotation task was difficult because the annotators had to identify the semantic structure of the verbs, using syntactic criteria to distinguish whether a lexical element was an argument or not. Annotating pronouns was also a very demanding process, since it implies the identification of co-reference chains. Differences in argument annotation between the two annotators, which affect the argument Dice score, lie mainly in the annotation of pronouns and in the identification of co-referents. One annotator usually tends to annotate all the pronouns contained in an utterance, whereas the other tags only the pronoun which is an argument of the verb under consideration. In addition, one annotator usually does not identify co-referents that are lexically realised far from the tagged verb, whereas the other sometimes annotates co-referents even if the argument has already been identified. There are also differences concerning the extent of the annotation, e.g. one annotator interpreted prodotti tipici as a multiword expression and the other did not. Overall, we obtained good agreement results, although some disagreements remain even though we tried to reduce potential differences in annotation by covering as many cases as possible in the guidelines.

5.2 Discussion: Semantic Type Tagging

The main goal of this section is to analyse the results of the IAA on ST selection. Annotators used approximately 40 STs even though their expected number (according to the T-PAS resource) was 11. Table 3 reports the ST usage in the IAA experiment for each T-PAS.

Annotators used approximately the expected number of semantic types with some T-PASs, while with others they used many more. A higher number of STs employed corresponds to a lower ST accuracy score (see Tables 2 and 3); more specifically, this correlation is shown by pulire T-PAS#1, sbottonare T-PAS#1 and #4, and vendere T-PAS#1. There are a number of reasons that account for this ST usage. In some cases one annotator tends to tag the entity denoted by single lexical items instead of the generalisations made by the T-PASs. This produces a sentence-specific annotation that employs STs that are end nodes in the hierarchy, which do not correspond to the ones in the reference T-PAS. As future work, we plan to develop a methodology to normalize the STs to the appropriate level of abstraction.

There are also linguistic reasons behind the assignment of different STs to the same lexical element. Annotators repeatedly captured the phenomenon known as inherent polysemy by tagging the same lexical elements in two totally different ways. An inherently polysemous noun denotes, depending on the context, a single aspect of an entity which is inherently complex, i.e. one that can be described simultaneously by more than one ST (see Jezek (2016) and references therein). An example is provided by the nouns denoting countries, which in our annotation exercise have been tagged as [[Business Enterprise]], [[Institution]] or [[Area]], pointing out their complex nature as territorial, political and economic entities. In some cases annotators have privileged different semantic components in the ST annotation process. This is due to the context in which the words are embedded, which favours certain interpretations over others. However, sometimes the compositionality principle does not strictly determine the meaning of an utterance. Hence some lexical items remain underspecified, so that they can receive more than one ST at once.

For instance, in Example (3) one annotator tagged lente as [[Artifact]], highlighting its nature as a manufactured object, whereas the other annotated the lexical item as [[Physical Object Part]], focusing on its nature as a constituent element of a bigger object.

(3) "Giles pulisce una lente dei suoi occhiali."
(Eng. 'Giles cleans a lens of his glasses')

Moreover, there are differences in ST assignment caused by regular polysemy (Apresjan, 1974), a systematic alternation of meaning that applies to classes of words (Jezek, 2016). The IAA results reveal regular polysemy patterns for nouns.

6 Conclusions

We performed a pilot experiment to tag the arguments of verbs, as recorded in the T-PAS resource, with their associated semantic types. We obtained good results in the annotation. By analyzing the cases of inter-annotator disagreement, we were able to identify phenomena which lie at the core of such disagreements, such as the presence of inherently polysemous words. Ongoing work includes spelling out the rules for tagging polysemous words more clearly in the guidelines.

References

Iurii Derenikovich Apresjan. 1974. Regular polysemy. Linguistics, 32.

Marco Baroni and Adam Kilgarriff. 2006. Large linguistically-processed web corpora for multiple languages. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations, pages 87–90. Association for Computational Linguistics.

Valentina Bartalesi Lenzi, Giovanni Moretti, and Rachele Sprugnoli. 2012. CAT: the CELCT annotation tool. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 333–338.

Silvie Cinková, Martin Holub, Adam Rambousek, and Lenka Smejkalová. 2012. A database of semantic clusters of verb usages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3176–3183.

Patrick Hanks and Elisabetta Jezek. 2008. Shimmering lexical sets. In Proceedings of the XIII EURALEX International Congress, pages 391–402.

Patrick Hanks and James Pustejovsky. 2005. A pattern dictionary for natural language processing. Revue française de linguistique appliquée, 10(2):63–82.

Patrick Hanks. 2004. Corpus pattern analysis. In Proceedings of the Eleventh EURALEX International Congress.

Elisabetta Jezek and Patrick Hanks. 2010. What lexical sets tell us about conceptual categories. Lexis, 4(7):22.

Elisabetta Jezek, Bernardo Magnini, Anna Feltracco, Alessia Bianchini, and Octavian Popescu. 2014. T-PAS: a resource of corpus-derived typed predicate-argument structures for linguistic analysis and semantic processing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).

Elisabetta Jezek. 2016. The lexicon: an introduction. Oxford University Press.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA.

Diana McCarthy and John Carroll. 2003. Disambiguating nouns, verbs, and adjectives using automatically acquired selectional preferences. Computational Linguistics, 29(4):639–654.

Edoardo Maria Ponti, Elisabetta Jezek, and Bernardo Magnini. 2016. Grounding the lexical sets of causative-inchoative verbs with word embedding. In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2016).

Edoardo Maria Ponti, Elisabetta Jezek, and Bernardo Magnini. 2017. Distributed representations of lexical sets and prototypes in causal alternation verbs. Italian Journal of Computational Linguistics, to appear.

Philip Resnik. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How, pages 52–57.

C. J. van Rijsbergen. 1979. Information Retrieval.

Eleanor H. Rosch. 1973. Natural categories. Cognitive Psychology, 4(3):328–350.