=Paper=
{{Paper
|id=Vol-1885/193
|storemode=property
|title=Coreference Resolution System Not Only for Czech
|pdfUrl=https://ceur-ws.org/Vol-1885/193.pdf
|volume=Vol-1885
|authors=Michal Novák
|dblpUrl=https://dblp.org/rec/conf/itat/Novak17
}}
==Coreference Resolution System Not Only for Czech==
J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 193–200, CEUR Workshop Proceedings Vol. 1885, ISSN 1613-0073, © 2017 M. Novák

Michal Novák
Charles University, Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Malostranské náměstí 25, CZ-11800 Prague 1
mnovak@ufal.mff.cuni.cz

Abstract: The paper introduces Treex CR, a coreference resolution (CR) system not only for Czech. As its name suggests, it has been implemented as an integral part of the Treex NLP framework. The main feature that distinguishes it from other CR systems is that it operates on the tectogrammatical layer, a representation of deep syntax. This feature allows for natural handling of elided expressions, e.g. unexpressed subjects in Czech, as well as generally ignored English anaphoric expressions – relative pronouns and zeros. The system implements a sequence of mention-ranking models specialized in particular types of coreferential expressions (relative, reflexive, and personal pronouns, etc.). It takes advantage of a rich feature set extracted from data linguistically preprocessed with Treex. We evaluated Treex CR on Czech and English datasets and compared it with other systems as well as with the modules used in Treex so far.

1 Introduction

Coreference resolution (CR) is the task of discovering coreference relations in a text. Coreference connects mentions of the same real-world entity. Knowing coreference relations may help in understanding the text better, and thus it can be used in various natural language processing applications including question answering, text summarization, and machine translation.

Most of the work on CR has focused on English. In English, a mention almost always corresponds to a chunk of actual text, i.e. it is expressed on the surface. Czech, for instance, is a different story. Czech is a typical example of a pro-drop language. In other words, a pronoun in the subject position is usually dropped, as in the following example: "Honza miluje Márii. Taky miluje pivo." ("John loves Mary. He also loves beer.") If we ignored Czech subject zeros, we would not be able to extract a lot of information encoded in the text.

But subject zeros are not the only coreferential expressions that may be dropped from the surface. Indeed, such zero mentions may appear even in a language where one would not expect them. For instance, the following English sentence does not express the relative pronoun: "John wants the beer Mary drinks."

This paper presents the Treex Coreference Resolver (Treex CR).¹ It has been primarily designed with a focus on coreference resolution in Czech texts. Therefore, Treex CR naturally supports coreference resolution of zero mentions.

The platform that ensures this and that our system operates on is the tectogrammatical layer, a deep-syntax representation of the text. It has been proposed in the theory of Prague tectogrammatics [32]. The tectogrammatical layer represents a sentence as a dependency tree whose nodes are formed by content words only. All the function and auxiliary words are hidden in a corresponding content node. On the other hand, the tectogrammatical tree can represent a content word that is unexpressed on the surface as a full-fledged node.

The t-layer is also the place where coreference is represented. A generally used style of representing coreference is by co-indexing continuous chunks of surface text. Tectogrammatics adopts a different style: a coreference link always connects two tectogrammatical nodes that represent the mentions' heads. Unlike the surface style, tectogrammatics does not specify the span of the mention, though. Such a representation should be easier for a resolver to handle, as the errors introduced by wrong identification of mention boundaries are eliminated. On the other hand, for some mentions it may be unclear what their head is.²

At this point, let us introduce the linguistic terminology that we use in the rest of the paper. Multiple coreferential mentions form a chain. Splitting the chain into pairs of mentions, we can adopt the terminology used for a related phenomenon – anaphoric relations. An anaphoric relation connects a mention which depends upon another mention used in the earlier context.³ The later mention is denoted as the anaphor, while the earlier mention is called the antecedent.

This work is motivated by cross-lingual studies of coreferential relations. We thus concentrate mostly on pronouns and zeros, which behave differently in distant languages such as Czech and English.⁴ Coreference of nominal groups is not in the scope of this work because it is less interesting from this perspective.

However, Treex CR is still supposed to be a standard coreference resolver. We thus compare its performance with three coreference resolvers from the Stanford Core NLP toolkit, which are the current and former state-of-the-art systems for English. Since we evaluate all the systems on two datasets using a measure that may focus on specific anaphor types, this work also offers a non-traditional comparison of established systems for English.

1 It is freely available at https://github.com/ufal/treex as the module Treex::Scen::Coref in the Treex framework.
2 As we demonstrate in Section 5.
3 As opposed to cataphoric relations, where the dependence is oriented to the future context.
4 A thorough analysis of correspondences between Czech and English coreferential expressions has been conducted in [26].
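The tectogrammatical representation described above can be illustrated with a minimal sketch; the attribute names below are simplifications chosen for this example, not the actual Treex (Perl) API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified t-layer node: content words only, with
# reconstructed (zero) nodes as full-fledged members of the tree and
# coreference links connecting the mentions' heads directly.
@dataclass
class TNode:
    lemma: str                        # content-word lemma ("#PersPron" for zeros)
    functor: str                      # semantic role, e.g. "ACT", "PAT", "PRED"
    is_generated: bool = False        # True for nodes unexpressed on the surface
    parent: Optional["TNode"] = None
    children: List["TNode"] = field(default_factory=list)
    coref_antecedent: Optional["TNode"] = None  # link between mention heads

    def add_child(self, child: "TNode") -> "TNode":
        child.parent = self
        self.children.append(child)
        return child

# "Honza miluje Márii. Taky miluje pivo." - the dropped subject of the
# second clause is a generated node linked to its antecedent "Honza".
love1 = TNode("milovat", "PRED")
honza = love1.add_child(TNode("Honza", "ACT"))
love1.add_child(TNode("Marie", "PAT"))

love2 = TNode("milovat", "PRED")
zero = love2.add_child(TNode("#PersPron", "ACT", is_generated=True))
love2.add_child(TNode("pivo", "PAT"))
zero.coref_antecedent = honza
```

Note that no mention span is stored anywhere: the coreference link connects two head nodes only, exactly as the t-layer annotation style prescribes.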
2 Related Work

Coreference resolution has experienced the evolution typical for most problems in natural language processing. Starting with rule-based approaches (summarized in [20]), a period of supervised (summarized in [23]) and unsupervised learning methods (e.g. [6] and [15]) followed. This period has been particularly colorful, having defined three standard models for CR and introduced multiple adjustments of system design. Our Treex CR system implements some of them: the mention-ranking model [10], joint anaphoricity detection and antecedent selection, and specialized models [11]. The recent tsunami of deep neural networks appears to be a small wave in the field of coreference research. The neural Stanford system [8] set a new state of the art, yet the change of direction has not been as rapid and massive as for other, more popular topics, e.g. machine translation.

The evolution of CR for Czech proceeded in a similar way. It started during the annotation work on the Prague Dependency Treebank 2.0 [16, PDT 2.0] with a set of deterministic filters for personal pronouns proposed by [17], followed by a rule-based system for all coreferential relations annotated in PDT 2.0 [24]. The release of the first coreference-annotated treebank opened the door for supervised methods. A supervised resolver for personal pronouns and subject zeros [25] is the biggest inspiration for the present work. We use a similar architecture implementing multiple mention-ranking models [10] specialized in individual anaphor types [11]. Unlike [25], we use a richer feature set and extend the resolver to other anaphor types as well.

Moreover, we rectify a fundamental shortcoming of all these coreference resolvers for Czech – the experiments with them were conducted on the manual annotation of the tectogrammatical layer. In this way, the systems could take advantage of gold syntax or disambiguated genders and numbers. While the rule-based system [24] reports around 99% F-score on relative pronouns, a fair evaluation of a similar method run on automatic tectogrammatical annotation reports only 57% F-score (see Table 2). If a system uses linguistically pre-processed data, the pre-processing must always be performed automatically.

3 System Architecture

The Treex Coreference Resolver has been developed as an integral part of the Treex framework for natural language processing [29]. Treex CR is a unified solution for finding coreferential relations on the t-layer. For that reason, it requires the input texts to be automatically pre-processed up to this level of linguistic annotation. The system is based on machine learning, thus making all the components fully trainable if appropriate training data is available. Up to now, the system has been built for Czech, English, Russian and German.⁵ In this paper, we focus only on its implementation for Czech and English.

5 The Russian and German versions have been trained on automatic English coreference labeling projected to these languages through a parallel corpus. See [27] for more details.

3.1 Preprocessing to a Tectogrammatical Representation

Before coreference resolution is carried out, the input text must undergo a thorough analysis producing a tectogrammatical representation of its sentences. Treex CR cannot process a text that has not been analyzed this way. Input data must comply with at least the basics of this annotation style. The text should be tokenized and labeled with part-of-speech tags in order for the resolver to focus on nouns and pronouns as mention candidates. However, the real power of the system lies in exploiting the rich linguistic annotation that can be represented by tectogrammatics.

Czech and English analysis. We make use of the rich pipelines for Czech and English available in the Treex framework, previously applied for building the Czech-English parallel treebank CzEng 1.6 [4]. Sentences are first split into tokens, which is ensured by rule-based modules. Subsequently, the tokens are enriched with morphological information including part-of-speech tags, morphological features, and lemmas. Whereas in English the Morče tool [33] is used to collect part-of-speech tags, followed by a rule-based lemmatizer, the Czech pipeline utilizes the MorphoDiTa tool [34] to obtain all of them.

A dependency tree is built on top of this annotation, using the MST parser [19] and its adapted version [28] for English and Czech, respectively. Named entity recognition is carried out by the NameTag tool [35] in both languages. The NADA tool [3] is applied to help distinguish referential and non-referential occurrences of the English pronoun "it". Every occurrence is assigned a probability estimate based on n-gram features.

The transition from a surface dependency tree to the tectogrammatical one consists of the following steps. As tectogrammatical nodes correspond to content words only, function words such as prepositions, auxiliary verbs, particles, and punctuation must be hidden. Morpho-syntactic information is transferred to the tectogrammatical layer by two channels: (i) morpho-syntactic tags called formemes [13] and (ii) features of deep grammar called grammatemes. All nodes are then subject to semantic role labeling, assigning them roles such as Actor and Patient, and to linking of verbs to items in a Czech valency dictionary [12].

Reconstructing zeros. To mimic the style of tectogrammatical annotation in automatic analysis, some nodes that are not present on the surface must be reconstructed. We focus on cases that directly relate to coreference. Such nodes are added by heuristics based on syntactic structures.

Subject zeros are the most prominent anaphoric zeros in Czech. A subject is generated as a child of a finite verb if the verb has no children in subject position or in nominative case. Grammatical person, number and gender are inferred from the form of the verb.
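The subject-zero heuristic just described can be sketched as follows; the dict-based tree and the attribute names (`afun`, `case`, etc.) are assumptions for illustration, not the actual Treex implementation:

```python
# Illustrative sketch of the Czech subject-zero heuristic: for each
# finite verb with no child in subject position or nominative case,
# generate a zero subject carrying the verb's person/number/gender.
def add_subject_zeros(finite_verbs):
    added = []
    for verb in finite_verbs:
        has_subject = any(
            c.get("afun") == "Sb" or c.get("case") == "nom"
            for c in verb["children"]
        )
        if not has_subject:
            zero = {
                "lemma": "#PersPron",
                "is_generated": True,
                # person, number and gender inferred from the verb form
                "person": verb.get("person"),
                "number": verb.get("number"),
                "gender": verb.get("gender"),
                "children": [],
            }
            verb["children"].append(zero)
            added.append(zero)
    return added

# "Miluje pivo." - a finite verb with an object but no surface subject
verb = {"lemma": "milovat", "person": "3", "number": "sg", "gender": None,
        "children": [{"lemma": "pivo", "afun": "Obj", "case": "acc"}]}
zeros = add_subject_zeros([verb])
```

As the paper notes later in Section 3.3, morphological disambiguation errors (nominative mislabeled as accusative) are a typical source of spurious zeros generated by exactly this kind of check.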
Perhaps surprisingly, English uses zeros as well. The coreferential ones can be found in relative clauses (see the example in Section 1) and non-finite verbal constructions, e.g. in participles and infinitives. We seek such constructions and add a zero child with a semantic role corresponding to the type of the construction. This work extends the original Treex module for generating English zeros, which addressed only infinitives.

3.2 Model design

Treex CR models coreference in a way that can be easily optimized by supervised learning. In particular, we use logistic regression with stochastic gradient descent optimization implemented in the Vowpal Wabbit toolkit.⁶ The design of the model employs multiple concepts that have proved to be useful and simple at the same time.

6 https://github.com/JohnLangford/vowpal_wabbit/wiki

Mention-ranking model. Given an anaphor and a set of antecedent candidates, mention-ranking models [10] are trained to score all the candidates at once. Competition between the candidates is thus captured in the model. Every antecedent candidate describes solely the actual mention; it does not represent a possible cluster of coreferential mentions built up to the moment. Antecedent candidates for an anaphor are selected from a context window of a predefined size. This is done only for the nodes satisfying simple morphological criteria (e.g. nouns and pronouns). Both the window size and the filtering criteria can be altered as hyperparameters.

Joint anaphoricity detection and antecedent selection. What we denote as an anaphor in the model is, in fact, an anaphor candidate. There is no preprocessing that would filter out non-referential anaphor candidates. Instead, both decisions, i.e. (i) determining whether the anaphor candidate is referential and (ii) finding the antecedent of the anaphor, are performed in a single step. This is ensured by adding a fake "antecedent" candidate representing the anaphor candidate itself. By selecting this candidate, the model labels the anaphor candidate as non-referential.

A cascade of specialized models. Properties of coreferential relations are so diverse that it is worth modeling individual anaphor types separately rather than jointly, as shown in [11]. For instance, while personal pronouns may refer to one of the previous sentences, the antecedent of relative and reflexive pronouns always lies in the same sentence. By representing coreference of these expressions separately in multiple specialized models, the abovementioned hyperparameters can be adjusted to suit the particular anaphor type. Processing of these anaphor types may be sorted in a cascade, so that the output of one model might be taken into account in the following models. Currently, we do not take advantage of this feature, though. Models are thus independent of each other and can be run in any order.

3.3 Feature extraction

The preprocessing stage (see Section 3.1) enriches a raw text with a substantial amount of linguistic material. The feature extraction stage then uses this material to yield features consumable by the learning method. In addition, Vowpal Wabbit, the learning tool we use, supports grouping features into namespaces. The tool may introduce combinations of features as a Cartesian product of selected namespaces and thus massively extend the feature space. This can be controlled by hyperparameters to Vowpal Wabbit.

Features used in Treex CR can be categorized by their form. The categories differ in the number of input arguments they require. Unary features describe only a single node, either the anaphor or an antecedent candidate. Such features start with the prefixes anaph and cand, respectively. Binary features require both the anaphor and the antecedent candidate for their construction. Specifically, they can be formed by agreement or concatenation of the respective unary features, but they can generally describe any relation between the two arguments. Finally, ranking features need all the antecedent candidates along with the anaphor candidate to be yielded. Their purpose is to rank antecedent candidates with respect to a particular relation to an anaphor candidate.

Our features also differ by their content. They can be divided into three categories: (i) location and distance features, (ii) (deep) morpho-syntactic features, and (iii) lexical features. The core of the feature set was formed by adapting the features introduced in [25].
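The mention-ranking setup with the fake non-referential candidate and the anaph/cand namespaces might be serialized for Vowpal Wabbit roughly as sketched below. This is a hedged illustration in the spirit of VW's label-dependent (csoaa_ldf-style) multiline format; the exact feature names, costs, and serialization details used by Treex CR are assumptions here:

```python
# One multiline ranking example per anaphor candidate: the anaphor's
# unary features are shared, each antecedent candidate gets its own
# line, and a fake candidate (index 0) stands for the non-referential
# reading of the anaphor.
def vw_ranking_example(anaph_feats, candidates, gold_index):
    lines = [f"shared |anaph {' '.join(anaph_feats)}"]
    # fake "antecedent" = the anaphor candidate itself (non-referential)
    all_cands = [["fake_candidate"]] + candidates
    for i, cand_feats in enumerate(all_cands):
        cost = 0 if i == gold_index else 1  # gold candidate gets zero cost
        lines.append(f"{i}:{cost} |cand {' '.join(cand_feats)}")
    return "\n".join(lines)

example = vw_ranking_example(
    anaph_feats=["lemma=on", "gender=masc"],
    candidates=[["lemma=Honza", "gender=masc"], ["lemma=pivo", "gender=neut"]],
    gold_index=1,  # "Honza" is the gold antecedent; 0 would mean non-referential
)
```

Selecting index 0 at prediction time would correspond to labeling the anaphor candidate as non-referential, which is how the joint anaphoricity/antecedent decision collapses into a single ranking step. Cartesian-product combinations of the `anaph` and `cand` namespaces are what VW's namespace-crossing hyperparameters would then control.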
Location and distance features. Positions of the anaphor and an antecedent in a sentence were inspired by [6]. The position of the antecedent is measured backward from the anaphor if they lie in the same sentence; otherwise it is measured forward from the start of the sentence. As for distance features, we use various granularities to measure the distance between an anaphor and an antecedent candidate: the number of sentences, clauses and words. In addition, the ordinal number of the current candidate antecedent among the others is included. All location and distance features are bucketed into predefined bins.

(Deep) morpho-syntactic features. These utilize the annotation provided by part-of-speech taggers, parsers and tectogrammatical annotation. Their unary variants capture the mention head's part-of-speech tag and morphological features, e.g. gender, number, person, and case. As gender and number are considered important for resolution of pronouns, we do not rely on their disambiguation and work with all possible hypotheses. We do the same for some Czech words that are in nominative case but which disambiguation labeled with the accusative case. Such a case is a typical source of errors in generating a subject zero, as it fills a missing nominative slot in the governing verb's valency frame. To discover potentially spurious subject zeros, we also inspect whether the verb has multiple arguments in accusative and whether the argument in nominative is refused by the valency, as in the phrase "Zdá se mi, že…" ("It seems to me that…"). Furthermore, the unary features contain (deep) syntax features including the dependency relation, semantic role, and formeme. We exploit the structure of the syntactic tree as well, extracting some features from the mention head's parent.

Many of these features are combined into binary variants by agreement and concatenation. Heuristics used in the original Treex modules for some anaphor types gave birth to another pack of binary features. For instance, the feature indicating whether a candidate is the subject of the anaphor's clause should target coreference of reflexive pronouns. Similarly, signaling whether a candidate governs the anaphor's clause should help with resolution of relative pronouns.

Lexical features. Lemmas of the mentions' heads and of their parents are directly used as features. Such features may have an effect only if built from frequent words, though. By combining them with an external lexical resource, this data sparsity problem can be reduced. Firstly, we used a long list of noun-verb collocations collected by [25] on the Czech National Corpus [9]. Having these statistics, we can estimate how probable it is that the anaphor's governing verb collocates with an antecedent candidate. Another approach to fighting data sparsity is to employ an ontology. Apart from the actual word, we can include all its hypernymous concepts from the hierarchy as features. We exploit WordNet [14] and EuroWordNet [38] for English and Czech, respectively. To target proper nouns, we also extract features from the tags assigned by the named entity recognizers run during the preprocessing stage.

4 Datasets

We exploited two treebanks for training and testing purposes: the Prague Dependency Treebank 3.0 [2, PDT] and the Prague Czech-English Dependency Treebank 2.0 Coref [22, PCEDT] for Czech and English, respectively. Although PCEDT is a Czech-English parallel treebank, we used only its English side. Both treebanks are collections of newspaper and journal articles. In addition, they both follow the annotation principles of the theory of Prague tectogrammatics [32]. They also comprise a full-fledged manual annotation of coreferential relations.⁷

The training and evaluation test datasets for Czech are formed by the PDT sections train-* and etest, respectively. As for English, these two datasets are collected from PCEDT sections 00-18 and 22-24, respectively.⁸ In addition, we used the official testset of the CoNLL 2012 Shared Task to evaluate the English systems [31]. This dataset has been sampled from the OntoNotes 5.0 corpus [30]. OntoNotes, and thus the CoNLL 2012 testset as well, differs from the two treebanks in the following main aspects: (i) coreference is annotated on the surface, where mentions of the same entity are co-indexed spans of consecutive words; (ii) it contains no zeros, and relative pronouns are not annotated for coreference.⁹ These differences must be reflected when evaluating on this dataset (see Section 5).

             Czech              English
             Train   Eval test  Train   Eval test  CoNLL 2012
  sents      38k     5k         39k     5k         9.5k
  words      652k    92k        912k    130k       170k
  t-nodes    528k    75k        652k    91k        116k
  anaph      92k     14k        103k    15k        15k
  Relative   7.2k    1k         6.4k    0.8k       –
  Reflexive  3.4k    0.6k       0.4k    0.05k      0.1k
  PP3        –       –          19k     2.4k       4.5k
  SzPP3      12k     2k         –       –          –
  Zero       –       –          23k     3.2k       –
  Other      70k     10k        54k     8.0k       10.4k

Table 1: Basic statistics of the used datasets. The class SzPP3 stands for 3rd person subject zeros, personal and possessive pronouns, while the class PP3 excludes subject zeros.

Basic statistics collected on these datasets are shown in Table 1. The anaphor types treated by Treex CR cover around 50% and 25-30% of all anaphors in the English and Czech tectogrammatical treebanks, respectively. The main reason for the disproportion is that we did not include Czech non-subject zeros in the collection (class Zero). Czech subject zeros are merged into a common class with personal and possessive pronouns in 3rd person (class SzPP3), as they are trained in a joint model (see Section 5). For the same reason, English personal and possessive pronouns in 3rd person form a common class PP3. As the CoNLL 2012 testset has no annotation of relative pronouns and zeros, Treex CR covers 30% of all the anaphors there.

7 See [21] for more information on the coreference annotation.
8 During development of our system, we employed the rest of the treebanks' data as a development test dataset for intermediate testing.
9 Reasons for ignoring relative pronouns in OntoNotes are unclear. They might be seen as so tied up with rules of grammar and syntax that annotation of such cases is too unattractive to deal with.
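The bucketing of location and distance features mentioned in Section 3.3 can be sketched as follows; the bin boundaries are illustrative, not the ones actually used in Treex CR:

```python
# A minimal sketch of bucketed distance features: raw distances are
# mapped to coarse bin labels so that the sparse feature space stays
# small and the model generalizes over similar distances.
BINS = [0, 1, 2, 3, 5, 10]  # hypothetical bucket edges

def bucket(value, bins=BINS):
    """Map a raw distance to the label of the highest bin it reaches;
    the last bin is open-ended ("10+")."""
    label = f"{bins[0]}"
    for b in bins:
        if value >= b:
            label = f"{b}+" if b == bins[-1] else f"{b}"
    return label

def distance_features(sent_dist, clause_dist, word_dist):
    """Distances at the three granularities used in the paper."""
    return {
        "sent_dist": bucket(sent_dist),
        "clause_dist": bucket(clause_dist),
        "word_dist": bucket(word_dist),
    }
```

For example, word distances 11, 25 and 100 all end up in the same "10+" bucket, so the learner sees them as one feature value instead of three rare ones.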
5 Experiments and Evaluation

Our system uses two specialized models for relative and reflexive pronouns in both languages. The Czech system in addition contains a joint model for subject zeros, personal and possessive pronouns in 3rd person (denoted as SzPP3). The English system contains two more models: one for personal and possessive pronouns in 3rd person (denoted as PP3) and another one for zeros.

                   Relative  Reflexive  SzPP3   All
  Count            1,075     579        1,950   3,604
  Treex CzEng 1.0  57.14     67.57      50.52   55.20
  Treex CR         78.40     76.19      61.31   68.46

Table 2: F-scores of Czech coreference resolvers measured on all anaphor types both separately and altogether. The type SzPP3 denotes 3rd person subject zeros, personal and possessive pronouns.

Systems to compare. To show the performance of Treex CR in context, we evaluated multiple other systems on the same data. Since there is currently no other publicly available system for Czech to our knowledge, we compare it with the original Treex set of modules for coreference. The set consists of rule-based modules for relative and reflexive pronouns, and a supervised model for SzPP3 mentions. It has been previously used for building the Czech-English parallel treebank CzEng 1.0 [5].

We also report the performance of the English predecessor of Treex CR used to build CzEng 1.0. It comprises a rule-based module for relative pronouns and zeros, and a joint supervised model for reflexives and PP3 mentions. In addition, we include the Stanford Core NLP toolkit in the evaluation. It contains three approaches to full-fledged CR that all claimed to improve over the state of the art at the time of their release: deterministic [18], statistical [7], and neural [8]. In fact, the neural system has not been outperformed yet.

Stanford Core NLP predicts surface mentions, which is not compatible with the evaluation schema designed for tectogrammatical trees. The surface mentions thus must be transformed to the tectogrammatical style of coreference annotation, i.e. the mention heads must be connected with links. We could use the information on mention heads provided by the Stanford system itself. However, using this approach we observed completely contradictory results on different datasets. Manual investigation of a sample of the data revealed that the Stanford system often in fact identified a correct antecedent mention, but selected a head different from the one in the data. Most of these cases, e.g. company names like "McDonald's Corp." or "Walt Disney Co.", have no clear head, though. Therefore, we decided to use the gold tectogrammatical tree to identify the head of the mention labeled by the Stanford system. Even though employing gold information for a system's decision is bad practice, here it should not affect the results so much, and we use it only for the third-party systems, not for our Treex CR.

Evaluation measure. Standard evaluation metrics (e.g. MUC [37], B³ [1]) are not suitable for our purposes, as they do not allow for scoring only a subset of mentions. Instead, we use a measure similar to the scores proposed by [36]. For an anaphor candidate a_i, we increment the three following counts:

• true(a_i) if a_i is anaphoric in the gold annotation,
• pred(a_i) if the CR system claims a_i is anaphoric,
• both(a_i) if both the system and the gold annotation claim a_i is anaphoric and the antecedent found by the system belongs to the transitive closure of all mentions coreferential with a_i in the gold annotation.

After aggregating these counts over all anaphor candidates, we compute the final Precision, Recall and F-score as follows:

  P = Σ_i both(a_i) / Σ_i pred(a_i)    R = Σ_i both(a_i) / Σ_i true(a_i)    F = 2PR / (P + R)

To evaluate only a particular anaphor type, the aggregation runs over all anaphor candidates of the given type.

The presented evaluation schema, however, needs to be adjusted for the CoNLL 2012 dataset. As mentioned in Section 4, in this dataset relative pronouns are not considered coreferential and zeros are missing altogether. As a result, a system that marks such expressions as antecedents would be penalized. We thus apply the following patch specifically for the CoNLL 2012 dataset to rectify this issue. If the predicted antecedent is a zero or a relative pronoun, instead of using it directly we follow the predicted coreferential chain until an expression outside of these two categories is met. The found expression is then used to calculate the counts, as described above. If no such expression is found, the direct antecedent is used, even if it is a zero or a relative pronoun. All the scores presented in the rest of the paper are F-scores.

Results and their analysis. Table 2 shows the results of evaluation on the Czech data. The Czech version of Treex CR succeeded in its ambition to replace the modules used in Treex until now. It significantly¹⁰ outperformed the baseline for each of the anaphor types, with the overall score higher by 13 percentage points. The jump for relative pronouns was particularly high. The analysis of improved examples for this category shows that apart from the syntactic principles used in the rule-based module, the system also exploits other symptoms of coreference. The most prominent are agreement of the anaphor and the antecedent in gender and number, as well as the distance between the two. It also succeeds in identifying non-anaphoric examples, for instance interrogative pronouns, which use the same forms.

10 Significance has been calculated by bootstrap resampling with a confidence level of 95%.

                          PCEDT Eval                              CoNLL 2012 test set
                          Relative  Reflexive  PP3    Zeros  All    Reflexive  PP3    All
  Count                   842       49         2,494  3,260  6,645  111        4,583  4,710
  Stanford deterministic  1.16      55.67      63.65  0.00   34.96  71.11      60.55  60.79
  Stanford statistical    0.00      63.74      72.71  0.00   39.09  80.56      71.07  71.29
  Stanford neural         0.00      70.97      76.36  0.00   41.56  80.73      70.45  70.70
  Treex CzEng 1.0         70.64     65.93      73.52  28.48  55.34  76.02      67.93  68.13
  Treex CR                75.99     81.63      74.11  45.37  60.87  79.65      66.64  66.96

Table 3: F-scores of English coreference resolvers measured on all anaphor types both separately and altogether. The type PP3 denotes personal and possessive pronouns in 3rd person.

Results of evaluation on the English data are shown in Table 3. Similarly to the Czech system, the English version of Treex CR outperforms its predecessor in Treex by a large margin of 15 percentage points on the PCEDT Eval testset. Most of the gain stems from a large improvement on the biggest class of anaphors, zeros. Unlike for Czech relative pronouns, the supervised CR is not the only reason for this leap. It largely results from the extension that we made to the method for adding zero arguments of non-finite clauses (see Section 3.1). Consequently, the coverage of these nodes compared to their gold annotation rose from 34% to 80%. Comparing these two versions of the Treex system on the CoNLL 2012 testset, we see a different picture. The systems' performances are more similar; the baseline system even slightly outperforms the new Treex CR for PP3.

As for the comparison with the Stanford systems, we should not look at the scores aggregated over all the anaphor types under scrutiny, because the Stanford systems apparently do not address zeros and relative pronouns.¹¹ In fact, the Stanford systems try to reconstruct coreference as it is annotated in OntoNotes 5.0.

The classes of reflexive and PP3 pronouns are the only ones within the scope of all the resolvers. The Stanford deterministic system seems to be consistently outperformed by all the other approaches. Performance rankings on reflexive pronouns differ for the two datasets, which is probably a consequence of the low frequency of reflexives in the datasets. Regarding the PP3 pronouns, Treex CR does not achieve the performance of the state-of-the-art Stanford neural system. On the CoNLL 2012 testset it is outperformed even by the Stanford statistical system. Nevertheless, in all the cases the performance gaps are not so big, and it is thus reasonable to use Treex CR for further experiments in the future.

To the best of our knowledge, no analysis of how the Stanford systems perform for individual anaphor types has been published yet. Interestingly, our results show that even though the overall performance of the neural system on the CoNLL 2012 testset is reported to be higher [8], for personal and possessive pronouns in third person it is slightly outperformed by the statistical system. However, as the evaluation on the PCEDT Eval testset shows completely the opposite, we cannot arrive at any conclusion on their mutual performance comparison for this anaphor type.

11 On the other hand, they address coreference of nominal groups and pronouns in first and second person. Treex CR does not provide Czech or English models for these classes so far. Nevertheless, experimental projection-based models already exist for German and Russian [27].
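The evaluation measure from Section 5 can be sketched directly in code; the per-candidate flag representation below is an assumption chosen to keep the example self-contained:

```python
# Sketch of the paper's evaluation measure: counts true(a_i), pred(a_i)
# and both(a_i) are aggregated over anaphor candidates, then P, R and F
# are computed from the aggregated sums.
def prf(candidates):
    """Each candidate is a dict with 0/1 `true`, `pred` and `both` flags,
    as defined in Section 5."""
    true = sum(c["true"] for c in candidates)
    pred = sum(c["pred"] for c in candidates)
    both = sum(c["both"] for c in candidates)
    p = both / pred if pred else 0.0
    r = both / true if true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy aggregation: 2 correctly resolved out of 3 predicted / 4 gold anaphors
cands = (
    [{"true": 1, "pred": 1, "both": 1}] * 2    # correctly resolved anaphors
    + [{"true": 0, "pred": 1, "both": 0}]      # spurious (non-anaphoric) prediction
    + [{"true": 1, "pred": 0, "both": 0}] * 2  # missed anaphors
)
p, r, f = prf(cands)  # P = 2/3, R = 1/2
```

Restricting `candidates` to a single anaphor type before calling `prf` reproduces the per-type scoring described in the paper.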
However, as the on the biggest class of anaphors, zeros. Unlike for Czech evaluation on the PCEDT Eval testset shows completely relative pronouns, the supervised CR is not the only rea- the opposite, we cannot arrive at any conclusion on their son for this leap. It largely results from the extension that mutual performance comparison on this anaphor type. we made to the method for adding zero arguments of non- finite clauses (see Section 3.1). Consequently, the cover- age of these nodes compared to their gold annotation rose from 34% to 80%. Comparing these two versions of the 6 Conclusion Treex system on the CoNLL 2012 testset, we see a differ- ent picture. The systems’ performances are more similar, the baseline system for PP3 even slightly outperforms the new Treex CR. We described Treex CR, a coreference resolver not only As for the comparison with the Stanford systems, we for Czech. The main feature of the system is that it op- should not look at the scores aggregated over all the erates on the tectogrammatical layer, which allows it to anaphor types under scrutiny, because Stanford systems address also coreference of zeros. The system uses a su- apparently do not address zeros and relative pronouns.11 pervised model, supported by a very rich set of linguis- In fact, the Stanford systems try to reconstruct coreference tic features. We presented modules for processing Czech as it is annotated in OntoNotes 5.0. and English and evaluated them on several datasets. For The classes of reflexive and PP3 pronouns are the only comparison, we conducted the evaluation with the prede- ones within the scope of all the resolvers. The Stanford de- cessors of Treex CR and three versions of the Stanford terministic system seems to be consistently outperformed system, one of which was a state-of-the-art neural resolver by all the other approaches. Performance rankings on re- for English. 
Our system seems to have outperformed the flexive pronouns differ for the two datasets, which is prob- baseline system on Czech. On English, although it could ably the consequence of low frequency of reflexives in the not outperform the best approaches in the Stanford sys- datasets. Regarding the PP3 pronouns, Treex CR does not tem, its performance is high enough to be used in future achieve the performance of the state-of-the-art Stanford experiments. Furthermore, it may be used for resolution of neural system. On the CoNLL 2012 testset it is outper- anaphor types that are ignored by most of the coreference formed even by the Stanford statistical system. Neverthe- resolvers for English, i.e. relative pronouns and zeros. 11 On the other hand, they address coreference of nominal groups and In the future work, we would like to use Treex CR in pronouns in first and second person. Treex CR does not provide Czech cross-lingual coreference resolution, where the system is or English models for these classes, so far. Nevertheless, experimental applied on parallel corpus and thus it may take advantage projection-based models already exist for German and Russian [27]. of both languages. Coreference Resolution System Not Only for Czech 199 Acknowledgments Germany, 2016. Association for Computational Linguis- tics. This project has been funded by the GAUK grant 338915 [9] CNC. Czech national corpus – SYN2005, 2005. and the Czech Science Foundation grant GA-16-05394S. [10] Pascal Denis and Jason Baldridge. A Ranking Approach This work has been also supported and has been using lan- to Pronoun Resolution. In Proceedings of the 20th Inter- guage resources developed and/or stored and/or distributed national Joint Conference on Artifical Intelligence, pages by the LINDAT/CLARIN project No. LM2015071 of the 1588–1593, San Francisco, CA, USA, 2007. Morgan Kauf- Ministry of Education, Youth and Sports of the Czech Re- mann Publishers Inc. public. [11] Pascal Denis and Jason Baldridge. 
Specialized Models and Ranking for Coreference Resolution. In Proceedings of the Conference on Empirical Methods in Natural Language References Processing, pages 660–669, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics. [1] Amit Bagga and Breck Baldwin. Algorithms for Scoring [12] Ondřej Dušek, Jan Hajič, and Zdeňka Urešová. Verbal Va- Coreference Chains. In In The First International Confer- lency Frame Detection and Selection in Czech and English. ence on Language Resources and Evaluation Workshop on In The 2nd Workshop on EVENTS: Definition, Detection, Linguistics Coreference, pages 563–566, 1998. Coreference, and Representation, pages 6–11, Stroudsburg, [2] Eduard Bejček, Eva Hajičová, Jan Hajič, Pavlína Jínová, PA, USA, 2014. Association for Computational Linguis- Václava Kettnerová, Veronika Kolářová, Marie Mikulová, tics. Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lu- [13] Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin cie Poláková, Magda Ševčíková, Jan Štěpánek, and Šárka Majliš, Michal Novák, and David Mareček. Formemes Zikánová. Prague Dependency Treebank 3.0, 2013. in English-Czech Deep Syntactic MT. In Proceedings of [3] Shane Bergsma and David Yarowsky. NADA: A Robust the Seventh Workshop on Statistical Machine Translation, System for Non-referential Pronoun Detection. In Proceed- pages 267–274, Montréal, Canada, 2012. Association for ings of the 8th International Conference on Anaphora Pro- Computational Linguistics. cessing and Applications, pages 12–23, Berlin, Heidelberg, [14] Christiane Fellbaum. WordNet: An Electronic Lexical 2011. Springer-Verlag. Database (Language, Speech, and Communication). The [4] Ondřej Bojar, Ondřej Dušek, Tom Kocmi, Jindřich Li- MIT Press, 1998. bovický, Michal Novák, Martin Popel, Roman Sudarikov, [15] Aria Haghighi and Dan Klein. Coreference Resolution in and Dušan Variš. CzEng 1.6: Enlarged Czech-English a Modular, Entity-centered Model. 
In Human Language Parallel Corpus with Processing Tools Dockered. In Text, Technologies: The 2010 Annual Conference of the North Speech, and Dialogue: 19th International Conference, American Chapter of the Association for Computational TSD 2016, number 9924 in Lecture Notes in Artificial In- Linguistics, pages 385–393, Stroudsburg, PA, USA, 2010. telligence, pages 231–238, Heidelberg, Germany, 2016. Association for Computational Linguistics. Springer International Publishing. [16] Jan Hajič et al. Prague Dependency Treebank 2.0. CD- [5] Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra ROM, Linguistic Data Consortium, LDC Catalog No.: Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, LDC2006T01, Philadelphia, 2006. Michal Novák, Martin Popel, and Aleš Tamchyna. The Joy [17] Lucie Kučová and Zdeněk Žabokrtský. Anaphora in Czech: of Parallelism with CzEng 1.0. In Proceedings of the 8th In- Large Data and Experiments with Automatic Anaphora. In ternational Conference on Language Resources and Eval- Lecture Notes in Artificial Intelligence, Proceedings of the uation (LREC 2012), pages 3921–3928, Istanbul, Turkey, 8th International Conference, TSD 2005, volume 3658 of 2012. European Language Resources Association. Lecture Notes in Computer Science, pages 93–98, Berlin / [6] Eugene Charniak and Micha Elsner. EM Works for Pro- Heidelberg, 2005. Springer. noun Anaphora Resolution. In Proceedings of the 12th [18] Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Conference of the European Chapter of the Association for Chambers, Mihai Surdeanu, and Dan Jurafsky. Stanford’s Computational Linguistics, pages 148–156, Stroudsburg, Multi-Pass Sieve Coreference Resolution System at the PA, USA, 2009. Association for Computational Linguis- CoNLL-2011 Shared Task. In Proceedings of the Fifteenth tics. Conference on Computational Natural Language Learn- [7] Kevin Clark and Christopher D. Manning. 
Entity-Centric ing: Shared Task, pages 28–34, Portland, Oregon, USA, Coreference Resolution with Model Stacking. In Proceed- 2011. Association for Computational Linguistics. ings of the 53rd Annual Meeting of the Association for [19] Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Computational Linguistics and the 7th International Joint Hajič. Non-projective Dependency Parsing Using Span- Conference on Natural Language Processing (Volume 1: ning Tree Algorithms. In Proceedings of the Conference Long Papers), pages 1405–1415, Beijing, China, 2015. As- on Human Language Technology and Empirical Methods sociation for Computational Linguistics. in Natural Language Processing, pages 523–530, Strouds- [8] Kevin Clark and Christopher D. Manning. Improving burg, PA, USA, 2005. Association for Computational Lin- Coreference Resolution by Learning Entity-Level Dis- guistics. tributed Representations. In Proceedings of the 54th An- [20] Ruslan Mitkov. Anaphora Resolution. Longman, London, nual Meeting of the Association for Computational Lin- 2002. guistics (Volume 1: Long Papers), pages 643–653, Berlin, 200 M. Novák [21] Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie [33] Drahomíra Spoustová, Jan Hajič, Jan Votrubec, Pavel Kr- Mikulová, and Jiří Mírovský. Coreference in Prague bec, and Pavel Květoň. The Best of Two Worlds: Coop- Czech-English Dependency Treebank. In Proceedings eration of Statistical and Rule-based Taggers for Czech. of the 10th International Conference on Language Re- In Proceedings of the Workshop on Balto-Slavonic Natu- sources and Evaluation (LREC 2016), pages 169–176, ral Language Processing: Information Extraction and En- Paris, France, 2016. European Language Resources Asso- abling Technologies, pages 67–74, Stroudsburg, PA, USA, ciation. 2007. Association for Computational Linguistics. [22] Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie [34] Jana Straková, Milan Straka, and Jan Hajič. Open-Source Mikulová, and Jiří Mírovský. 
Prague czech-english depen- Tools for Morphology, Lemmatization, POS Tagging and dency treebank 2.0 coref, 2016. Named Entity Recognition. In Proceedings of 52nd An- [23] Vincent Ng. Supervised Noun Phrase Coreference Re- nual Meeting of the Association for Computational Lin- search: The First Fifteen Years. In Proceedings of the 48th guistics: System Demonstrations, pages 13–18, Baltimore, Annual Meeting of the Association for Computational Lin- Maryland, 2014. Association for Computational Linguis- guistics, pages 1396–1411, Stroudsburg, PA, USA, 2010. tics. Association for Computational Linguistics. [35] Jana Straková, Milan Straka, and Jan Hajič. Open-Source [24] Giang Linh Nguy. Návrh souboru pravidel pro analýzu Tools for Morphology, Lemmatization, POS Tagging and anafor v českém jazyce. Master’s thesis, MFF UK, Prague, Named Entity Recognition. In Proceedings of 52nd An- Czech Republic, 2006. In Czech. nual Meeting of the Association for Computational Lin- [25] Giang Linh Nguy, Václav Novák, and Zdeněk Žabokrtský. guistics: System Demonstrations, pages 13–18, Baltimore, Comparison of Classification and Ranking Approaches to Maryland, 2014. Association for Computational Linguis- Pronominal Anaphora Resolution in Czech. In Proceedings tics. of the SIGDIAL 2009 Conference, pages 276–285, London, [36] Don Tuggener. Coreference Resolution Evaluation for UK, 2009. The Association for Computational Linguistics. Higher Level Applications. In Gosse Bouma and Yannick [26] Michal Novák and Anna Nedoluzhko. Correspondences Parmentier, editors, Proceedings of the 14th Conference between Czech and English Coreferential Expressions. of the European Chapter of the Association for Computa- Discours: Revue de linguistique, psycholinguistique et in- tional Linguistics, EACL 2014, April 26-30, 2014, Gothen- formatique., 16:1–41, 2015. burg, Sweden, pages 231–235. The Association for Com- puter Linguistics, 2014. [27] Michal Novák, Anna Nedoluzhko, and Zdeněk Žabokrtský. 
Projection-based Coreference Resolution Using Deep Syn- [37] Marc Vilain, John Burger, John Aberdeen, Dennis Con- tax. In Proceedings of the 2nd Workshop on Coreference nolly, and Lynette Hirschman. A Model-theoretic Coref- Resolution Beyond OntoNotes (CORBON 2017), pages 56– erence Scoring Scheme. In Proceedings of the 6th Con- 64, Valencia, Spain, 2017. Association for Computational ference on Message Understanding, pages 45–52, Strouds- Linguistics. burg, PA, USA, 1995. Association for Computational Lin- guistics. [28] Václav Novák and Zdeněk Žabokrtský. Feature engineer- ing in maximum spanning tree dependency parser. volume [38] Piek Vossen. Introduction to EuroWordNet. Computers 4629, pages 92–98, Berlin / Heidelberg, 2007. Springer. and the Humanities, Special Issue on EuroWordNet, 32(2– 3), 1998. [29] Martin Popel and Zdeněk Žabokrtský. TectoMT: Modular NLP Framework. In Proceedings of the 7th International Conference on Advances in Natural Language Processing, pages 293–304, Berlin, Heidelberg, 2010. Springer-Verlag. [30] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. Towards Robust Linguistic Anal- ysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learn- ing, pages 143–152, Sofia, Bulgaria, 2013. Association for Computational Linguistics. [31] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Joint Conference on Empirical Meth- ods in Natural Language Processing and Computational Natural Language Learning - Proceedings of the Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes, EMNLP-CoNLL 2012, pages 1–40, Jeju, Ko- rea, 2012. Association for Computational Linguistics. [32] Petr Sgall, Eva Hajičová, Jarmila Panevová, and Jacob Mey. 
The meaning of the sentence in its semantic and prag- matic aspects. Springer, 1986.