<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Two corpus-based experiments with the Portuguese and English Wordnets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandre Rademaker</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabricio Chalub</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Freitas</string-name>
        </contrib>
        <aff>IBM Research <email>alexrad@br.ibm.com</email></aff>
        <aff>IBM Research <email>fchalub@br.ibm.com</email></aff>
        <aff>PUC-Rio <email>claudiafreitas@puc-rio.br</email></aff>
      </contrib-group>
      <abstract>
        <p>This paper presents two experiments with real-world applications of word sense disambiguation, wordnets, and dependency parsing. The first is an effort towards a Portuguese wordnet-annotated corpus. We manually annotated 30 sentences using OpenWordNet-PT as a lexicon and then compared the results with an automatic annotation. In addition to the system's evaluation, the results provided valuable insights about how to deal with such an ambitious task. The second experiment deals with using the Princeton WordNet as part of an NLP pipeline for information extraction from technical texts in the mining domain, and the issues found while integrating word sense disambiguation with a syntactic analysis of the sentences.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the current context of the computational processing of languages, in which
systems are no longer prototypes, resources capable of handling meaning
processing are in the spotlight. Such resources may take the form of semantically
annotated corpora or computational lexicons, or lexical databases. For the
English language, the Princeton WordNet (PWN) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is the canonical example of
a general and robust lexical base, widely used by natural language processing
(NLP) systems. For the Portuguese language, among the resources similar
to PWN [19], we highlight OpenWordNet-PT [20] (OWN-PT), which is aligned with
Princeton's WordNet and has 47,700 synsets (33,604 nouns, 6,805 verbs,
6,233 adjectives, and 1,058 adverbs).
      </p>
      <p>
        OWN-PT was adopted by the Freeling library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and by the Open Multilingual
Wordnet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. While it does not yet have a corpus aligned to it, first steps toward such a
resource were reported in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The purpose of this article is to discuss the contributions of an alignment
with a wordnet as one of the stages of natural language understanding. We
report two studies. The first, in Portuguese and based on OWN-PT (available for
download at http://github.com/own-en/openWordnet-PT/ and for online navigation
at http://wnpt.brlcloud.com/wn/), was carried out on journalistic texts. This
experiment was first reported in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], published
in Portuguese, and we repeat here some of its results. Building on some of the
results of this first exercise, we developed the second study. This time, we used
English as the target language and the Princeton WordNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Instead of journalistic
texts, we relied on texts from a specific domain: technical reports from Canadian
mining companies.
      </p>
      <p>In the second experiment, problems arising from the limitations of OWN-PT
itself were eliminated, and we tried to minimize the impact of polysemy, since
the phenomenon is less expected in specific domains. If the first study is taken
as an exploratory investigation of the difficulty of producing a corpus annotated
with OWN-PT senses, the second one is a follow-up, an investigation of how the
difficulties found in the first experiment can be mitigated with the use of the
Princeton WordNet (a more mature wordnet) and a domain-specific corpus.</p>
    </sec>
    <sec id="sec-2">
      <title>Wordnets evaluation</title>
      <p>Many works in NLP use a form of evaluation that measures both soundness and
completeness, for which the existence of a gold data set is essential. However,
for the evaluation of lexical databases, such measures are not easily applicable.
In particular, what would the notion of completeness mean? The ratio of
knowledge correctly acquired in relation to all the knowledge that must be acquired?
The problem is how to define "all the knowledge that must be acquired", since
the same set of facts can lead to different interpretations and, consequently, to
different types of "knowledge".</p>
      <p>
        Although there are attempts to evaluate wordnets or similar resources in
Portuguese [19], such evaluations are always comparisons, and they do not tell
us much about the intrinsic quality of each resource. In addition, we agree with [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
when they indicate that a possible evaluation criterion for ontologies is to map
them to data (a data-driven evaluation). Therefore, an alignment between
existing synsets and a corpus is a good way to verify their completeness, even
though we know that a corpus will always be a limited portion of the language.
      </p>
    </sec>
    <sec id="sec-3">
      <title>First experiment</title>
      <p>
        The Freeling library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] has a Word Sense Disambiguation (WSD) module, which
performs an alignment between the words in the text and any semantic lexical
database; it is an implementation of the graph-based method for WSD proposed
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In order to verify the accuracy of the automatic disambiguation system, we
created an experiment in which different annotators should select the appropriate
synset (or synsets) for a word in context. Then, we compared the results obtained
from the annotators with those of the Freeling WSD module.
      </p>
      <p>
        We selected 30 sentences from the Brazilian portion of the Bosque corpus,
the revised part of the Floresta sintá(c)tica [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The choice of the Brazilian
variant aims to guarantee a good attribution of the senses, since the annotators
were Brazilian. In addition, we considered only the nouns, and we selected sentences
with at least five nouns. The restriction to nouns is due to the well-known
verbal polysemy, which would make the task more difficult for the annotators.
The total number of nouns evaluated was 226, with 204 different words. Each
annotator received a form with the 30 sentences, and below each sentence we
listed the target nouns, which in turn directed the annotators to the OWN-PT
page with all the synsets in which the parsed word participates. The annotators
should then select the appropriate synset, indicating the synset code in the form
field. More than one synset could be chosen, as long as both fit the context
equally, according to the annotators. The annotators were instructed to leave
the field blank if they did not consider any synset suitable, regardless of the
nature of the inadequacy.
      </p>
      <p>It should be noted that the annotators did not receive any special training
that would ensure familiarity with OWN-PT. Nine undergraduate students from
linguistics (a translation course) and one professional translator, considered
"inexperienced" annotators, participated in the study. In addition, two linguists with
annotation experience also participated, the "experienced" annotators.</p>
      <sec id="sec-3-1">
        <title>Results and error analysis</title>
        <p>
          Using the Kappa coefficient [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which measures the degree of agreement among
annotators, we performed two types of agreement assessment: human-only
agreement, and human agreement versus Freeling's disambiguation module.
        </p>
        <p>In the inter-annotator agreement, considering only the inexperienced
annotators and only one synset per annotator, the agreement index was 0.67.
When, in the same group of annotators, we consider all the synsets chosen for
the same word, the agreement index falls to 0.55. The low agreement rate is
noteworthy, but it is equally striking that the agreement among only the
experienced annotators was also 0.67.</p>
        <p>Specifically for the experienced annotators, when we compare Freeling's WSD
module and annotator 1, the agreement is 0.45; the agreement between the
WSD module and annotator 2 is 0.52; and the agreement between both
annotators and the WSD module is 0.56. Because agreement was low even among
experienced annotators, the evaluation with the Freeling WSD module is poorly
informative with respect to system quality. That is, if it is difficult for humans
to agree on the appropriateness of a synset, what performance should we expect
from the system?</p>
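<p>The pairwise agreement figures above can be sketched with Cohen's kappa. The snippet below is an illustrative implementation of the coefficient for two annotators who each assign one synset per target noun; the synset IDs and label sequences are toy data, not the study's annotations.</p>

```python
# Illustrative sketch (not the authors' code): Cohen's kappa for two
# annotators who each assign one synset ID per target noun.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Compute Cohen's kappa between two equal-length label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: synset choices (hypothetical IDs) for five target nouns.
annotator_1 = ["02084071-n", "00007846-n", "03309808-n", "02084071-n", "00007846-n"]
annotator_2 = ["02084071-n", "00007846-n", "09278537-n", "02084071-n", "03309808-n"]
print(round(cohen_kappa(annotator_1, annotator_2), 2))  # prints 0.44
```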
        <p>In about 20% of the cases, the absence of an adequate synset was pointed
out. This absence, in turn, does not necessarily mean a gap in OWN-PT, since
the alignment of words with synsets is preceded by the steps of tokenization and
lemmatization. When some of these steps fail, the meaning assignment also fails.
(Footnote: in PWN, for example, the average number of senses for verbs is 2.17,
with 36 verbs having over 20 senses, while for nouns it is 1.22, with only five
nouns having over 20 senses. Footnote: throughout the evaluation, we noticed that
there were more thorough annotators, who systematically chose to list all the
synsets considered appropriate, as opposed to more economical annotators, who
listed only the first appropriate synset they encountered; the option of evaluating
one synset per annotator sought to prevent the divergence in the quantity of
chosen synsets from influencing the disagreement.)</p>
        <p>The following are the situations in which this occurred:
1. Errors in the attribution of the part-of-speech class: six cases of nouns
mistaken for adjectives or vice versa;
2. Lemmatization errors regarding the `number' feature: there are words with
slightly different meanings when they are in the singular or plural: "recursos"
(resources) can be the plural of "recurso" (resource) but, with the sense of
goods and financial resources, it will always be used in the plural. The word
"vésperas" (plural of eve) also has a less precise meaning than "véspera"
(eve);
3. Tokenization errors and multiword units: it is difficult to find the appropriate
synset when it contains a multiword unit, but tokenization is done on a word-by-word
basis [21, 22]; this happened in about 20% of the non-aligned words.</p>
        <p>We know that some of these "failures" are not exactly errors, but rather
non-consensual points in NLP that are reflected in wordnets.</p>
        <p>Another point is the need for a more systematic treatment of prefixes and
other hyphenated compounds. In our exercise it was not possible to
disambiguate "super-acordo" (super-agreement), which is absent from OWN-PT, and
it does not seem to us that this should be different. On the other hand, we
would like "social-democrata" (social democrat) to be in some synset. The
existence of synsets related to US politics also poses challenges in annotating texts
from another culture, and it may be necessary to create synsets relevant to the
Portuguese-speaking countries. Finally, we do not know how to deal with
stylistic effects, such as the use of the expression "iron and fire" in Example 1,
which refers to the idiom "a ferro e fogo" (by iron and fire) but also to the
literal iron and fire of the grill.</p>
        <p>(1) "Iti Fuji conquista clientela a ferro e fogo. O restaurante tem seu ponto
forte no balcão de grelhados, que se sobrepõe aos prosaicos sushis e
sashimis." (Iti Fuji wins over customers with iron and fire. The
restaurant has the grill counter as its strong point, which upstages the prosaic
sushis and sashimis.)</p>
        <p>
          The possibility of assigning more than one synset to a word also contributed
to the low agreement. Although we are aware of the perhaps excessive granularity
of PWN, and of the well-known difficulty of clearly separating word senses [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ],
cases where more than one synset was suitable in the context of a sentence were
far more common than we expected. According to one of the experienced annotators,
in at least 8% of the annotated words more than one synset would be acceptable.
        </p>
        <p>This last point was one of the main motivators for carrying out a second
study. We know that polysemy/vagueness tends to be less frequent in
terminology, and words typical of specific domains tend to be monosemous. When
dealing with a more robust wordnet (PWN) and a specific domain (mining), would
the alignment be easier?</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Second experiment</title>
      <p>The second experiment reported here is part of a project of information
extraction (mentions of entities and relations between those entities) from reports in
the mining domain. Specialists in the domain identified 16 PDF files to
be used as seeds. These PDF files are scanned documents, and we used Apache
Tika for text extraction, followed by manual fixing of the most obvious errors
related to typos and formatting. The final corpus produced from these 16 PDF
files contains 2,629 sentences and 60,455 tokens.</p>
      <p>The corpus was processed by an NLP pipeline composed of tokenization,
sentence segmentation, lemmatization, part-of-speech tagging, word sense
disambiguation against the PWN, and parsing. The pipeline produces a set of parse
trees enhanced with sense annotation using the PWN synset identifiers. The
resulting parse trees were used in two different stages: in the first stage, for queries
retrieving patterns of interest; in the second stage, the patterns identified
in the first stage were used to create rules for extracting facts to populate
a knowledge base (e.g., mentions of entities or relations between entities). Once
the rules and the NLP pipeline are refined, they can be reused for processing new
documents. The final goal is to create a knowledge base that allows geologists
to explore the data in an effective manner and obtain useful insights.</p>
      <p>For example, the query &lt;amod|&lt;compound L=deposit (sentences that
contain a token governed, through an amod or compound dependency, by a
token with lemma `deposit') gives us many candidate references to styles of
mineralization of rocks: gold deposit, glacial sedimentary deposits, glaciofluvial
deposits, mineral deposits, Zn-Cu deposit, stratiform gold deposits, etc.</p>
      <p>However, the word `deposition' may also be used as a synonym of `deposit',
both being included in the synset 13462191-n. It would be desirable to extend the
query language to express a reference to a synset instead of only lemmas
or surface forms, i.e., &lt;amod|&lt;compound S=13462191-n. Another practical
use of the PWN synset annotation is the canonical reference to chemical elements,
which are referred to either by a name or by a symbol: gold/Au 14638799-n, zinc/Zn
14661977-n, copper/Cu 14635722-n, etc. Of course, we can also explore the
hyponym/hypernym relations to query for references to any chemical element
(14622893-n).</p>
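<p>The hypernym-based query can be sketched as a reachability test over hypernym links. The toy graph below uses the synset IDs mentioned above, but its edges are simplified assumptions for illustration, not an extract of PWN.</p>

```python
# Illustrative sketch: use the hypernym relation so that a query on the
# synset `chemical element' (14622893-n) matches mentions of gold, zinc, etc.
# The toy hypernym edges below are simplified assumptions, not PWN data.
HYPERNYM = {
    "14638799-n": "14622893-n",  # gold -> chemical element (simplified)
    "14661977-n": "14622893-n",  # zinc -> chemical element (simplified)
    "14635722-n": "14622893-n",  # copper -> chemical element (simplified)
    "14622893-n": "00019613-n",  # chemical element -> substance (simplified)
}

def is_a(synset, target):
    """Follow hypernym links from `synset' and test whether `target' is reached."""
    while synset is not None:
        if synset == target:
            return True
        synset = HYPERNYM.get(synset)
    return False

# Sense-annotated tokens: (surface form, synset ID); the last one is not an element.
tokens = [("gold", "14638799-n"), ("Zn", "14661977-n"), ("rock", "14696793-n")]
hits = [form for form, syn in tokens if is_a(syn, "14622893-n")]
print(hits)  # prints ['gold', 'Zn']
```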
      <p>
        We used Freeling for the tokenization phase, POS tagging, lemmatization,
sentence splitting, and word sense disambiguation. For the syntactic analysis, since
we wanted to use the Universal Dependencies [18], we opted for the SyntaxNet
parser for its ease of use and for the fact that we could customize its tokenization
step. We used SyntaxNet with the model pre-trained on the UD 1.3 release English
corpus. Figures 1a and 1b present examples of sentences annotated with UD
1.3 relations and POS tags.
(a) The dependency tree of the sentence "This highway allows access to other major
highways such as Highway 144 allowing access to other major mining centers such as
Timmins and Sudbury."
(b) The dependency tree of the sentence "In 2007, Dianor Resources Inc. uncovered
several potential diamond-bearing conglomerates in Dumas Township."
Combining both systems posed a challenge: in order to use SyntaxNet, we
need to provide it with a sequence of tokens that is compatible with the way the
model we are using was trained. That means that several modules had
to be disabled in Freeling, including ones that directly affect the word sense
disambiguation, such as the multiword recognition and number detection. For
example, in the sentence of Figure 1a, it is very likely that `such as' would be
tokenized by Freeling as a single token, which differs from how UD would
annotate this expression (using the mwe relation). On the other hand, we chose
to keep Freeling's Named Entity Recognition (NER) module active, and this
module identifies names such as `Dianor Resources Inc.' and produces a single
token (see Figure 1b). (Notes: Apache Tika is available at https://tika.apache.org.
The query language shown earlier is loosely inspired by TGrep2 and TRegex, but is
designed for querying general dependency graphs; see [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and the online reference page http://bionlp.utu.fi/searchexpressions-new.html.
In geology, mineralization is the deposition of economically important metals in the
formation of rocks; it may also refer to the product resulting from this process. For
example, mineralization may introduce metals, such as iron, into a rock, and that rock
may then be referred to as possessing iron mineralization.)
      </p>
      <sec id="sec-4-1">
        <title>Results and error analysis</title>
        <p>
          Since both SyntaxNet and Freeling produce POS tags for each token, one of the
first obvious ideas was to measure the agreement between the systems. The
disagreements happen mostly between: (1) tokens that Freeling tags as NOUN but
SyntaxNet tags as ADJ, ADP, ADV, or AUX; and (2) tokens that Freeling tagged as
VERB but SyntaxNet tagged as ADJ or NOUN. Many of these errors are
expected, since the two POS tagging models were trained with corpora from different
domains: Freeling was trained with the Penn Treebank [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and SyntaxNet was
trained with the UD 1.3 release (English corpus [23]). The errors in POS tagging
naturally propagate to errors in the WSD. The word `till' is one such case: in
75% of its occurrences, it was tagged by Freeling as a preposition, leading to a
lack of sense in PWN. Nevertheless, almost all occurrences in the corpus should
be tagged as a noun with the sense 15074772-n (`till' has only three noun senses
in PWN). The word `fault' is another such case: of its 84 occurrences, 62% were
tagged as 14464203-n and only 2% received the expected sense 09278537-n (a
crack in the earth's crust resulting from the displacement of one side with
respect to the other); this word has 8 senses in PWN, but only three were used in
our corpus. Table 1 presents the words with more than 100 occurrences in the
corpus and the respective senses used. High-frequency words tend to be more
polysemous, but Table 1 shows that, at least for the words with more than 100
occurrences in this corpus, in most cases one sense predominates over the others.
In Table 1, we also note that the most frequent sense chosen by Freeling's
WSD module is the one we expected to be chosen in all occurrences,
which is directly related to our expectation discussed in Section 1. As we
discussed at the end of Section 3.1, a clear distinction between word senses can
be difficult to sustain; for example, the two senses for the word `rock' are both
acceptable. (Notes: SyntaxNet is available at
https://github.com/tensorflow/models/tree/master/syntaxnet; the pre-trained model is
described at https://github.com/tensorflow/models/blob/master/syntaxnet/g3doc/universal.md.)
        </p>
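<p>The tagger comparison can be sketched as a simple disagreement tabulation over parallel tag sequences. This is our illustration with toy tags, not the project's evaluation code.</p>

```python
# Sketch (our illustration): tabulate POS disagreements between two taggers
# on the same token sequence, to surface patterns like the NOUN-vs-ADJ
# confusions described above.
from collections import Counter

def disagreements(tags_a, tags_b):
    """Count (tag_a, tag_b) pairs at positions where the two taggers differ."""
    return Counter((a, b) for a, b in zip(tags_a, tags_b) if a != b)

# Toy parallel tag sequences for the same five tokens.
freeling_tags  = ["NOUN", "VERB", "NOUN", "ADP", "NOUN"]
syntaxnet_tags = ["NOUN", "ADJ",  "ADJ",  "ADP", "NOUN"]
print(dict(disagreements(freeling_tags, syntaxnet_tags)))
```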
        <p>Freeling's POS tagger is highly dependent on its dictionary: for every
token, Freeling first assigns all possible lemma/POS pairs, and only after this
step does the system choose the best option in context (based on
a trained statistical model). If a word is not in the dictionary, Freeling tries to
guess the possible POS tag and lemma for the token. Table 2 shows the most
frequent words not found in Freeling's dictionary. For instance, `volcanic
rock' is often shortened to `volcanics' in scientific contexts. In all occurrences of
`volcanics', the POS tag was guessed correctly, but the lemma was kept as `volcanics',
which is not found in the PWN, although `volcanic rock' is in 14933314-n. This case
shows that PWN is missing the word `volcanics' in this same synset.</p>
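<p>The dictionary-dependent behavior can be sketched as a lookup with a fallback guess. The dictionaries below are toy stand-ins for Freeling's form dictionary and for PWN, assembled only to illustrate why the out-of-dictionary form `volcanics' keeps its surface form as lemma and then misses in the lexicon.</p>

```python
# Naive sketch (toy dictionaries, not Freeling's or PWN's actual data):
# look a token up in a form->lemma dictionary and fall back to keeping
# the surface form when the token is unknown.
FORM_DICT = {"rocks": "rock", "deposits": "deposit", "holes": "hole"}
WORDNET_LEMMAS = {"rock", "deposit", "hole", "volcanic rock"}  # toy PWN view

def lemmatize(form):
    if form in FORM_DICT:   # dictionary hit: use the listed lemma
        return FORM_DICT[form]
    return form             # guess: keep the surface form as the lemma

lemma = lemmatize("volcanics")
# The guessed lemma is not in the (toy) wordnet, so sense lookup fails,
# even though `volcanic rock' is present.
print(lemma, lemma in WORDNET_LEMMAS)  # prints: volcanics False
```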
        <p>A serious challenge for NLP understanding is how to combine word sense
disambiguation and parsing. The Universal Dependencies model, chosen as our
morphosyntactic framework, emphasizes single words as the logical unit of
analysis. Multiword expressions are to be related to other words via specific
dependencies, such as name, compound, or mwe (in UD 1.3). The motivation behind
this decision is the goal of dealing with discontinuous expressions that could not
be joined into a single token, the main limitation of Freeling's multiword module.</p>
        <p>How can we specify a sense that comprises multiple words, such as 14933314-n
(`volcanic rock'), which in the parse trees is represented as two tokens connected
by the amod relation, volcanic &lt;amod rock?</p>
        <p>Another case is that of phrasal verb particles, annotated with the dependency
relation compound:prt. This relation holds between the verb and its particle,
but in many cases only the verb was tagged with a sense, and we end up losing
semantics. One example is the expression `carry out', with two senses in PWN,
versus the verb `carry'. In our corpus we found 100 occurrences of `carry', two of
them part of the expression `carry out'; in 88% of them, the word `carry'
was tagged with 01449974-v (move while supporting) and in 8% it was
tagged as 02079933-v (transmit or serve as the medium for transmission), two very
similar senses. When we search in PWN for senses of all pairs of tokens connected
by compound:prt in our corpus, we find: `carry out' (two senses), `put
down' (eight senses), `drop off' (five senses), `open up' (seven senses), `make up'
(nine senses), `follow up' (two senses), `pick up' (16 senses).</p>
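<p>The search for such pairs can be sketched by joining each verb with its compound:prt particle into a candidate multiword lemma, the form under which PWN lists phrasal verbs (e.g. `carry_out'). The edges are toy data and the function is our illustration, not the project's code.</p>

```python
# Sketch (our illustration): join verbs with their compound:prt particles
# into candidate multiword lemmas, so that `carry' + `out' can be looked
# up as a single PWN entry such as `carry_out' instead of the bare verb.
def phrasal_candidates(edges):
    """edges: (head_lemma, relation, dependent_lemma) dependency triples."""
    return [f"{head}_{dep}" for head, rel, dep in edges
            if rel == "compound:prt"]

# Toy dependency edges.
edges = [
    ("carry", "compound:prt", "out"),
    ("pick",  "compound:prt", "up"),
    ("carry", "dobj",         "sample"),  # not a particle: ignored
]
print(phrasal_candidates(edges))  # prints ['carry_out', 'pick_up']
```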
        <p>Unfortunately, the parser did not produce a consistent annotation of the
expressions already present in PWN. If we try the inverse of the previous
analysis, that is, if we search for how the MWEs found in PWN were annotated in
our corpus, we find many different dependency relations being used: `carry out'
(compound:prt and advmod), `drill hole' (compound), `as well' (mwe), `base metal'
(compound), `up to' (mwe), `at least' (case), `east side' (amod), `be due' (cop),
`representative sample' (amod).</p>
        <p>Also with regard to the way PWN deals with multiword expressions, we do find
a number of inconsistencies when attempting to verify the completeness of a
particular domain. For example, we find synsets such as 14996395-n (`porphyritic
rock'), but not one for `aplitic rock'. There are adjectives for `porphyritic' and
`aplitic', which suggests that we could opt out of having a noun for `porphyritic
rock' and use a more compositional model, combining adjectives and nouns to
form new types of rocks (a sort of special case of the semantically decomposable
MWEs from [22]). While this is in line with the Universal Dependencies model, it
carries the disadvantage of losing some semantic information, as there is no hierarchy
in PWN for adjectives. For example, while we know that 14697485-n (`arenaceous
rock') is a hyponym of 14698000-n (`sedimentary rock'), there isn't a connection
between 00142040-a (`arenaceous') and 02952109-a (`sedimentary'), nor should
there be, since those adjectives can be applied to other nouns, not necessarily
only types of rocks. On the other hand, it is also well known that the
compositional model does not work for certain types of MWEs, like `round' and `round
robin'. (Footnote: the synset 02079933-v in PWN seems to contain an error in the
verb frames associated with it; the gloss suggests that `Something ----s something'
is missing.)</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        One important conclusion of this article can be taken from the discussion of
Table 1, where we noted that the most frequent senses chosen by
Freeling's WSD module were the right ones, supporting our expectation from the
introduction. It is widely recognized that the fine-grained senses of PWN make
word sense disambiguation harder. We have also shown that some senses can be
equally acceptable in some contexts. These observations suggest that methods
taking domain information into consideration (word domain disambiguation),
such as the ones explored in [
        <xref ref-type="bibr" rid="ref11 ref14 ref15">15, 14, 11</xref>
        ], as well as the use of the `Domain of
synset' semantic relations from PWN or the `subject area tests' from the
English Slot Grammar, are all valuable alternatives for the direct sense tagging
of words with PWN synsets.
      </p>
      <p>Bringing together word sense disambiguation and parsing is a delicate balance
that needs to consider the POS tagging and dependency training corpora, the WSD
algorithm, and tokenization. We have shown that it is possible to combine multiple
NLP pipelines to achieve this goal, at the expense of losing valuable information,
such as MWE senses. But how can we tag word senses together with a dependency
model that favors single words as the basic lexical unit? One possible idea is
that the sense should be assigned to the head of the MWE and to none of the other
words that belong to that MWE, provided that the dependencies are correctly
annotated. This difficulty is an unfortunate side effect of having each NLP step
independent of the others, with separate training models and so on.</p>
      <p>
        We believe that a more integrated approach to parsing, WSD, and morphological
analysis seems worth investigating, such as the one taken by grammar-based
formalisms like Constraint Grammar [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the English Slot Grammar [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and the
HPSG-based English Resource Grammar [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>We also aim to investigate possible approaches to guess the general area of
an unknown word. For example, the word `mafic' does not exist in PWN, but in
our corpora it is connected to a number of words in the geology domain, such
as `volcanics', `rock', `gabbro', and `breccia', which all have senses.</p>
      <p>18. Nivre, J., de Marneffe, M.C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D.,
McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., Zeman, D.:
Universal Dependencies v1: A multilingual treebank collection. In: Calzolari, N.,
Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J.,
Mazo, H., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Tenth
International Conference on Language Resources and Evaluation (LREC 2016).
European Language Resources Association (ELRA), Paris, France (May 2016)
19. Oliveira, H.G., de Paiva, V., Freitas, C., Rademaker, A., Real, L., Simões, A.: As
Wordnets do Português, vol. 7, pp. 397-424. OSLa, Oslo, Norway (March 2015),
https://www.journals.uio.no/index.php/osla/issue/view/100/showToc
20. de Paiva, V., Rademaker, A., de Melo, G.: OpenWordNet-PT: An Open
Brazilian WordNet for Reasoning. In: Proceedings of the 24th International Conference
on Computational Linguistics. COLING (Demo Paper) (2012)
21. Ramisch, C.: A generic framework for multiword expressions treatment: from
acquisition to applications. In: Proceedings of the ACL 2012 Student Research Workshop.
pp. 61-66. Association for Computational Linguistics (2012)
22. Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword
Expressions: a pain in the neck for NLP. In: Conference on Intelligent Text Processing
and Computational Linguistics. pp. 1-15. Springer, Berlin, Heidelberg (2002)
23. Silveira, N., Dozat, T., de Marneffe, M.C., Bowman, S., Connor, M., Bauer, J.,
Manning, C.D.: A gold standard dependency corpus for English. In:
Proceedings of the Ninth International Conference on Language Resources and Evaluation
(LREC-2014) (2014)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Afonso</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bick</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Floresta sintá(c)tica: um treebank para o português</article-title>
          . In: Goncalves,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Correia</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.N.</surname>
          </string-name>
          <article-title>(eds.) Actas do XVII Encontro Nacional da Associação Portuguesa de Linguística (</article-title>
          <year>APL 2001</year>
          ). pp.
          <volume>533</volume>
          -
          <fpage>545</fpage>
          . APL, Lisboa, Portugal (2-4 de Outubro de
          <year>2001</year>
          2002)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Personalizing pagerank for word sense disambiguation</article-title>
          .
          <source>In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . pp.
          <volume>33</volume>
          -
          <fpage>41</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bick</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Didriksen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>CG-3 – beyond classical constraint grammar</article-title>
          .
          <source>In: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13</source>
          ,
          <year>2015</year>
          , Vilnius, Lithuania. pp.
          <fpage>31</fpage>
          –
          <lpage>39</lpage>
          . Linköping University Electronic Press (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foster</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Linking and extending an open multilingual wordnet</article-title>
          . In:
          <source>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          . vol.
          <volume>1</volume>
          , pp.
          <fpage>1352</fpage>
          –
          <lpage>1362</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brewster</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dasmahapatra</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Data driven ontology evaluation</article-title>
          .
          <source>In: Int. Conf. on Language Resources and Evaluation</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Carletta</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Assessing agreement on classification tasks: the kappa statistic</article-title>
          .
          <source>Computational Linguistics 22(2)</source>
          ,
          <fpage>249</fpage>
          –
          <lpage>254</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Carreras</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chao</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>FreeLing: An open-source suite of language analyzers</article-title>
          .
          <source>In: Proceedings of the 4th LREC</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (ed.):
          <source>WordNet: An Electronic Lexical Database (Language, Speech, and Communication)</source>
          . The MIT Press (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Flickinger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Accuracy vs. robustness in grammar engineering</article-title>
          .
          <source>Language from a Cognitive Perspective: Grammar, Usage, and Processing</source>
          , pp.
          <fpage>31</fpage>
          –
          <lpage>50</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Real</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rademaker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Anotação de corpus com a openwordnet-pt: um exercício de desambiguação</article-title>
          . In:
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rademaker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology</source>
          . pp.
          <fpage>51</fpage>
          –
          <lpage>55</lpage>
          . Natal, Brazil (Nov
          <year>2015</year>
          ), http://www.aclweb.org/anthology/W15-5607
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gella</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nastase</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Mapping WordNet Domains, WordNet Topics and Wikipedia categories to generate multilingual domain specific resources</article-title>
          .
          <source>In: LREC</source>
          . pp.
          <fpage>1117</fpage>
          –
          <lpage>1121</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kilgarriff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>I don't believe in word senses</article-title>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>33</lpage>
          (Dec
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Luotolahti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyysalo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ginter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>SETS: Scalable and efficient tree search in dependency graphs</article-title>
          .
          <source>In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations</source>
          . pp.
          <fpage>51</fpage>
          –
          <lpage>55</lpage>
          . Association for Computational Linguistics (
          <year>2015</year>
          ), https://aclweb.org/anthology/N/N15/N15-3011.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavaglia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Integrating subject field codes into WordNet</article-title>
          .
          <source>In: LREC</source>
          . pp.
          <fpage>1413</fpage>
          –
          <lpage>1418</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pezzulo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using domain information for word sense disambiguation</article-title>
          .
          <source>In: The Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems</source>
          . pp.
          <fpage>111</fpage>
          –
          <lpage>114</lpage>
          . Association for Computational Linguistics (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkiewicz</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacIntyre</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bies</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferguson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schasberger</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The Penn Treebank: Annotating predicate argument structure</article-title>
          .
          <source>In: Proceedings of the Workshop on Human Language Technology</source>
          . pp.
          <fpage>114</fpage>
          –
          <lpage>119</lpage>
          . HLT '94, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>1994</year>
          ), http://dx.doi.org/10.3115/1075812.1075835
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>McCord</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          :
          <article-title>The slot grammar lexical formalism</article-title>
          .
          <source>Tech. Rep. RC23977 (W0607-020)</source>
          , IBM Research (Jul
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>