J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 193–200
CEUR Workshop Proceedings Vol. 1885, ISSN 1613-0073, © 2017 M. Novák



                            Coreference Resolution System Not Only for Czech

                                                              Michal Novák

                                           Charles University, Faculty of Mathematics and Physics
                                                 Institute of Formal and Applied Linguistics
                                               Malostranské náměstí 25, CZ-11800 Prague 1
                                                      mnovak@ufal.mff.cuni.cz

Abstract: The paper introduces Treex CR, a coreference resolution (CR) system not only for Czech. As its name suggests, it has been implemented as an integral part of the Treex NLP framework. The main feature that distinguishes it from other CR systems is that it operates on the tectogrammatical layer, a representation of deep syntax. This feature allows for natural handling of elided expressions, e.g. unexpressed subjects in Czech as well as generally ignored English anaphoric expressions – relative pronouns and zeros. The system implements a sequence of mention-ranking models specialized in particular types of coreferential expressions (relative, reflexive, personal pronouns etc.). It takes advantage of a rich feature set extracted from data linguistically preprocessed with Treex. We evaluated Treex CR on Czech and English datasets and compared it with other systems as well as with the modules used in Treex so far.

1   Introduction

Coreference Resolution (CR) is the task of discovering coreference relations in a text. Coreference connects mentions of the same real-world entity. Knowing coreference relations may help in understanding the text better, and thus it can be used in various natural language processing applications including question answering, text summarization, and machine translation.

Most of the work on CR has focused on English. In English, a mention almost always corresponds to a chunk of actual text, i.e. it is expressed on the surface. But Czech, for instance, is a different story. Czech is a typical example of a pro-drop language. In other words, a pronoun in the subject position is usually dropped, as it is in the following example: “Honza miluje Márii. Taky  miluje pivo.” (“John loves Mary. He also loves beer.”) If we ignored Czech subject zeros, we would not be able to extract a lot of information encoded in the text.

But subject zeros are not the only coreferential expression that may be dropped from the surface. Indeed, such zero mentions may appear even in languages where one would not expect them. For instance, the following English sentence does not express the relative pronoun: “John wants the beer  Mary drinks.”

This paper presents the Treex Coreference Resolver (Treex CR).1 It has been primarily designed with a focus on resolution in Czech texts. Therefore, Treex CR naturally supports coreference resolution of zero mentions.

The platform that ensures this and that our system operates on is called the tectogrammatical layer, a deep-syntax representation of the text. It has been proposed in the theory of Prague tectogrammatics [32]. The tectogrammatical layer represents a sentence as a dependency tree whose nodes are formed by content words only. All the function and auxiliary words are hidden in a corresponding content node. On the other hand, the tectogrammatical tree can represent a content word that is unexpressed on the surface as a full-fledged node.

The t-layer is also the place where coreference is represented. A generally used style of representing coreference is by co-indexing continuous chunks of surface text. Tectogrammatics adopts a different style. A coreference link always connects two tectogrammatical nodes that represent the mentions’ heads. Unlike the surface style, though, tectogrammatics does not specify the span of the mention. Such a representation should be easier for a resolver to handle, as the errors introduced by wrong identification of mention boundaries are eliminated. On the other hand, for some mentions it may be unclear what the head is.2

At this point, let us introduce the linguistic terminology that we use in the rest of the paper. Multiple coreferential mentions form a chain. Splitting the chain into pairs of mentions, we can adopt the terminology used for a related phenomenon – anaphoric relations. An anaphoric relation connects a mention that depends upon another mention used in the earlier context.3 The later mention is denoted as the anaphor, while the earlier mention is called the antecedent.

This work is motivated by cross-lingual studies of coreferential relations. We thus concentrate mostly on pronouns and zeros, which behave differently in distant languages, such as Czech and English.4 Coreference of nominal groups is not in the scope of this work because it is less interesting from this perspective.

However, Treex CR is still supposed to be a standard coreference resolver. We thus compare its performance with three coreference resolvers from the Stanford Core NLP toolkit, which are the current and former state-of-the-art systems for English. Since we evaluate all the systems on two datasets using a measure that may focus on specific anaphor types, this work also offers a non-traditional comparison of established systems for English.

2   Related Work

Coreference resolution has experienced an evolution typical of most problems in natural language processing. Starting with rule-based approaches (summarized in [20]), the period of supervised (summarized in [23]) and unsupervised learning methods (e.g. [6] and [15]) followed. This period has been particularly colorful, having defined three standard models for CR and introduced multiple adjustments of system design. For instance, our Treex CR system implements some of them: the mention-ranking model [10], joint anaphoricity detection and antecedent selection, and specialized models [11]. The recent tsunami of deep neural networks appears to be a small wave in the field of research on coreference. The neural Stanford system [8] set a new state of the art, yet the change of direction has not been as rapid and massive as for other, more popular topics, e.g. machine translation.

The evolution of CR for Czech proceeded in a similar way. It started during the annotation work on the Prague Dependency Treebank 2.0 [16, PDT 2.0] with a set of deterministic filters for personal pronouns proposed by [17], followed by a rule-based system for all coreferential relations annotated in PDT 2.0 [24]. The release of the first coreference-annotated treebank opened the door for supervised methods. A supervised resolver for personal pronouns and subject zeros [25] is the biggest inspiration for the present work. We use a similar architecture implementing multiple mention-ranking models [10] specialized in individual anaphor types [11]. Unlike [25], we use a richer feature set and extend the resolver also to other anaphor types.

Moreover, we rectify a fundamental shortcoming of all these coreference resolvers for Czech – the experiments with them were conducted on the manual annotation of the tectogrammatical layer. In this way, the systems could take advantage of gold syntax or disambiguated genders and numbers. While the rule-based system [24] reports around 99% F-score on relative pronouns, a fair evaluation of a similar method run on automatic tectogrammatical annotation reports only 57% F-score (see Table 2). If a system uses linguistically pre-processed data, the pre-processing must always be performed automatically.

3   System Architecture

The Treex Coreference Resolver has been developed as an integral part of the Treex framework for natural language processing [29]. Treex CR is a unified solution for finding coreferential relations on the t-layer. For that reason, it requires the input texts to be automatically pre-processed up to this level of linguistic annotation. The system is based on machine learning, thus making all the components fully trainable if appropriate training data is available. Up to now, the system has been built for Czech, English, Russian and German.5 In this paper, we focus only on its implementation for Czech and English.

3.1 Preprocessing to a Tectogrammatical Representation

Before coreference resolution is carried out, the input text must undergo a thorough analysis producing a tectogrammatical representation of its sentences. Treex CR cannot process a text that has not been analyzed this way. Input data must comply with at least the basics of this annotation style. The text should be tokenized and labeled with part-of-speech tags in order for the resolver to focus on nouns and pronouns as mention candidates. However, the real power of the system lies in exploiting the rich linguistic annotation that can be represented by tectogrammatics.

Czech and English analysis. We make use of the rich pipelines for Czech and English available in the Treex framework, previously applied for building the Czech-English parallel treebank CzEng 1.6 [4].

Sentences are first split into tokens, which is ensured by rule-based modules. Subsequently, the tokens are enriched with morphological information including the part-of-speech tag, morphological features as well as lemmas. Whereas in English the Morče tool [33] is used to collect part-of-speech tags, followed by a rule-based lemmatizer, the Czech pipeline utilizes the MorphoDiTa tool [34] to obtain all of this information.

A dependency tree is built on top of this annotation, using the MST parser [19] and its adapted version [28] for English and Czech, respectively. Named entity recognition is carried out by the NameTag tool [35] in both languages.

The NADA tool [3] is applied to help distinguish referential and non-referential occurrences of the English pronoun “it”. Every occurrence is assigned a probability estimate based on n-gram features.

The transition from a surface dependency tree to the tectogrammatical one consists of the following steps. As tectogrammatical nodes correspond to content words only, function words such as prepositions, auxiliary verbs, particles, and punctuation must be hidden. Morpho-syntactic information is transferred to the tectogrammatical layer by two channels: (i) morpho-syntactic tags called formemes [13] and (ii) features of deep grammar called grammatemes. All nodes are then subject to semantic role labeling, assigning them roles such as Actor and Patient, and to linking of verbs to items in the Czech valency dictionary [12].

    1 It is freely available at https://github.com/ufal/treex as the module Treex::Scen::Coref in the Treex framework.
    2 As we demonstrate in Section 5.
    3 As opposed to cataphoric relations, where the dependence is oriented to the future context.
    4 A thorough analysis of correspondences between Czech and English coreferential expressions has been conducted in [26].
    5 The Russian and German versions have been trained on automatic English coreference labeling projected to these languages through a parallel corpus. See [27] for more details.
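The step of hiding function words inside their governing content nodes can be sketched as follows. This is only a minimal illustration of the idea; the Node class, the tag inventory, and the output format are assumptions for this sketch, not the actual Treex API.

```python
# Sketch: collapse function words into content nodes when deriving a
# t-layer-like tree from a surface dependency tree. Tag names follow the
# Universal Dependencies UPOS inventory; everything else is illustrative.

FUNCTION_TAGS = {"ADP", "AUX", "PART", "PUNCT", "DET", "SCONJ", "CCONJ"}

class Node:
    def __init__(self, form, tag, children=None):
        self.form = form
        self.tag = tag
        self.children = children or []

def to_tnodes(node):
    """Return t-layer-like nodes: content words only, with the forms of
    their function-word dependents kept as auxiliary information."""
    tnodes = []
    for child in node.children:
        tnodes.extend(to_tnodes(child))
    if node.tag in FUNCTION_TAGS:
        # Function word: emit no node of its own; its form is attached
        # to the governing content node below.
        return tnodes
    aux = [c.form for c in node.children if c.tag in FUNCTION_TAGS]
    tnodes.append({"t_lemma": node.form.lower(), "aux_forms": aux})
    return tnodes

# "on the table": the preposition and determiner hide inside "table"
tree = Node("table", "NOUN", [Node("on", "ADP"), Node("the", "DET")])
print(to_tnodes(tree))  # [{'t_lemma': 'table', 'aux_forms': ['on', 'the']}]
```

The converse direction mentioned above – materializing unexpressed content words as full-fledged nodes – is handled separately by the zero-reconstruction heuristics.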
Reconstructing zeros. To mimic the style of tectogrammatical annotation in automatic analysis, some nodes that are not present on the surface must be reconstructed. We focus on cases that directly relate to coreference. Such nodes are added by heuristics based on syntactic structures.

Subject zeros are the most prominent anaphoric zeros in Czech. A subject is generated as a child of a finite verb if the verb has no children in the subject position or in the nominative case. Grammatical person, number and gender are inferred from the form of the verb.

Perhaps surprisingly, English uses zeros as well. The coreferential ones can be found in relative clauses (see the example in Section 1) and non-finite verbal constructions, e.g. in participles and infinitives. We search for such constructions and add a zero child with a semantic role corresponding to the type of the construction. This work extends the original Treex module for generating English zeros, which addressed only infinitives.

3.2 Model design

Treex CR models coreference in a way that can be easily optimized by supervised learning. In particular, we use logistic regression with stochastic gradient descent optimization implemented in the Vowpal Wabbit toolkit.6 The design of the model employs multiple concepts that have proved to be useful and simple at the same time.

Mention-ranking model. Given an anaphor and a set of antecedent candidates, mention-ranking models [10] are trained to score all the candidates at once. Competition between the candidates is thus captured in the model. Every antecedent candidate describes solely the actual mention. It does not represent a possible cluster of coreferential mentions built up to the moment.

Antecedent candidates for an anaphor are selected from a context window of a predefined size. This is done only for the nodes satisfying simple morphological criteria (e.g. nouns and pronouns). Both the window size and the filtering criteria can be altered as hyperparameters.

Joint anaphoricity detection and antecedent selection. What we denote as an anaphor in the model is, in fact, an anaphor candidate. There is no preprocessing that would filter out non-referential anaphor candidates. Instead, both decisions, i.e. (i) to determine if the anaphor candidate is referential and (ii) to find the antecedent of the anaphor, are performed in a single step. This is ensured by adding a fake “antecedent” candidate representing solely the anaphor candidate itself. By selecting this candidate, the model labels the anaphor candidate as non-referential.

A cascade of specialized models. Properties of coreferential relations are so diverse that it is worth modeling individual anaphor types separately rather than jointly, as shown in [11]. For instance, while personal pronouns may refer to one of the previous sentences, the antecedent of relative and reflexive pronouns always lies in the same sentence. By modeling coreference of these expressions separately in multiple specialized models, the abovementioned hyperparameters can be adjusted to suit the particular anaphor type. Processing of these anaphor types may be arranged in a cascade so that the output of one model might be taken into account by the following models. Currently, we do not take advantage of this feature, though. The models are thus independent of each other and can be run in any order.

3.3 Feature extraction

The preprocessing stage (see Section 3.1) enriches a raw text with a substantial amount of linguistic material. The feature extraction stage then uses this material to yield features consumable by the learning method. In addition, Vowpal Wabbit, the learning tool we use, supports grouping features into namespaces. The tool may introduce combinations of features as a Cartesian product of selected namespaces and thus massively extend the feature space. This can be controlled by hyperparameters passed to Vowpal Wabbit.

Features used in Treex CR can be categorized by their form. The categories differ in the number of input arguments they require. Unary features describe only a single node, either the anaphor or an antecedent candidate. Such features start with the prefixes anaph and cand, respectively. Binary features require both the anaphor and the antecedent candidate for their construction. Specifically, they can be formed by agreement or concatenation of the respective unary features, but they can generally describe any relation between the two arguments. Finally, ranking features need all the antecedent candidates along with the anaphor candidate to be yielded. Their purpose is to rank antecedent candidates with respect to a particular relation to the anaphor candidate.

Our features also differ in their content. They can be divided into three categories: (i) location and distance features, (ii) (deep) morpho-syntactic features, and (iii) lexical features. The core of the feature set was formed by adapting the features introduced in [25].

Location and distance features. Features capturing the positions of an anaphor and an antecedent in a sentence were inspired by [6]. The position of the antecedent is measured backward from the anaphor if they lie in the same sentence, otherwise it is measured forward from the start of the sentence. As for distance features, we use various granularities to measure the distance between an anaphor and an antecedent candidate: the number of sentences, clauses and words. In addition, the ordinal number of the current antecedent candidate among the others is included. All location and distance features are bucketed into predefined bins.

(Deep) morpho-syntactic features. These utilize the annotation provided by part-of-speech taggers, parsers and tectogrammatical annotation. Their unary variants capture the mention head’s part-of-speech tag and morphological features, e.g. gender, number, person, case. As gender and number are considered important for resolution of pronouns, we do not rely on their disambiguation and work with all possible hypotheses. We do the same for some Czech words that are in the nominative case but whose disambiguation labeled them with the accusative case. Such a case is a typical source of errors in generating a subject zero, as it fills a missing nominative slot in the governing verb’s valency frame. To discover potentially spurious subject zeros, we also inspect whether the verb has multiple arguments in the accusative and whether an argument in the nominative is refused by the valency, as it is in the phrase “Zdá se mi, že. . . ” (“It seems to me that. . . ”). Furthermore, the unary features contain (deep) syntax features including the node’s dependency relation, semantic role, and formeme. We exploit the structure of the syntactic tree as well, extracting some features from the mention head’s parent.

Many of these features are combined into binary variants by agreement and concatenation. Heuristics used in the original Treex modules for some anaphor types gave rise to another pack of binary features. For instance, the feature indicating whether a candidate is the subject of the anaphor’s clause should target coreference of reflexive pronouns. Similarly, signaling whether a candidate governs the anaphor’s clause should help with resolution of relative pronouns.

Lexical features. Lemmas of the mentions’ heads and their parents are directly used as features. Such features may have an effect only if built from frequent words, though. By combining them with an external lexical resource, this data sparsity problem can be reduced.

Firstly, we used a long list of noun-verb collocations collected by [25] on the Czech National Corpus [9]. Using these statistics, we can estimate how probable it is that the anaphor’s governing verb collocates with an antecedent candidate.

Another approach to fight data sparsity is to employ an ontology. Apart from the actual word, we can include all its hypernymous concepts from the hierarchy as features. We exploit WordNet [14] and EuroWordNet [38] for English and Czech, respectively.

To target proper nouns, we also extract features from tags assigned by the named entity recognizers run during the preprocessing stage.

4   Datasets

We exploited two treebanks for training and testing purposes: the Prague Dependency Treebank 3.0 [2, PDT] and the Prague Czech-English Dependency Treebank 2.0 Coref [22, PCEDT] for Czech and English, respectively. Although PCEDT is a Czech-English parallel treebank, we used only its English side. Both treebanks are collections of newspaper and journal articles. In addition, they both follow the annotation principles of the theory of Prague tectogrammatics [32]. They also comprise a full-fledged manual annotation of coreferential relations.7

                       Czech                English
                Train  Eval test    Train  Eval test  CoNLL 2012
  sents           38k         5k      39k         5k        9.5k
  words          652k        92k     912k       130k        170k
  t-nodes        528k        75k     652k        91k        116k
  anaph           92k        14k     103k        15k         15k
    Relative     7.2k         1k     6.4k       0.8k           –
    Reflexive    3.4k       0.6k     0.4k      0.05k        0.1k
    PP3             –          –      19k       2.4k        4.5k
    SzPP3         12k         2k        –          –           –
    Zero            –          –      23k       3.2k           –
    Other         70k        10k      54k       8.0k       10.4k

Table 1: Basic statistics of the used datasets. The class SzPP3 stands for 3rd person subject zeros, personal and possessive pronouns, while the class PP3 excludes subject zeros.

The training and evaluation test datasets for Czech are formed by the PDT sections train-* and etest, respectively. As for English, these two datasets are collected from PCEDT sections 00-18 and 22-24, respectively.8

In addition, we used the official testset of the CoNLL 2012 Shared Task to evaluate the English systems [31]. This dataset has been sampled from the OntoNotes 5.0 corpus [30]. OntoNotes, and thus the CoNLL 2012 testset as well, differs from the two treebanks in the following main aspects: (i) coreference is annotated on the surface, where mentions of the same entity are co-indexed spans of consecutive words, and (ii) it contains no zeros, and relative pronouns are not annotated for coreference.9 These differences must be reflected when evaluating on this dataset (see Section 5).

Basic statistics collected on these datasets are shown in Table 1. The anaphor types treated by Treex CR cover around 50% and 25-30% of all anaphors in the English and Czech tectogrammatical treebanks, respectively. The main reason for the disproportion is that we did not include Czech non-subject zeros in the collection (class Zero). Czech subject zeros are merged into a common class with personal and possessive pronouns in the 3rd person (class SzPP3), as they are trained in a joint model (see Section 5). For the same reason, English personal and possessive pronouns in the 3rd person form a common class PP3. As the CoNLL 2012 testset has no annotation of relative pronouns and zeros, Treex CR covers 30% of all its anaphors.

    6 https://github.com/JohnLangford/vowpal_wabbit/wiki
    7 See [21] for more information on the coreference annotation.
    8 During development of our system, we employed the rest of the treebanks’ data as a development test dataset for intermediate testing.
    9 Reasons for ignoring relative pronouns in OntoNotes are unclear. They might be seen as so tied up with rules of grammar and syntax that annotation of such cases is too unattractive to deal with.
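The coverage figures quoted above can be recomputed from the training counts in Table 1. The following sketch is only an illustration of that arithmetic, with the counts entered in thousands as they appear in the table:

```python
# Recomputing the approximate share of anaphors handled by Treex CR
# from the training-set counts in Table 1 (counts in thousands).

czech = {"anaph": 92, "Relative": 7.2, "Reflexive": 3.4, "SzPP3": 12}
english = {"anaph": 103, "Relative": 6.4, "Reflexive": 0.4, "PP3": 19, "Zero": 23}

def coverage(counts):
    """Percentage of all anaphors that fall into the treated classes."""
    treated = sum(v for k, v in counts.items() if k != "anaph")
    return 100 * treated / counts["anaph"]

print(round(coverage(czech)))    # 25 – the lower end of the 25-30% range
print(round(coverage(english)))  # 47 – i.e. around 50%
```

The script reproduces the stated proportions: roughly a quarter of Czech anaphors and about half of English anaphors belong to the classes Treex CR treats.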
5   Experiments and Evaluation

Our system uses two specialized models for relative and reflexive pronouns in both languages. The Czech system in addition contains a joint model for subject zeros and personal and possessive pronouns in the 3rd person (denoted as SzPP3). The English system contains two more models: one for personal and possessive pronouns in the 3rd person (denoted as PP3) and another one for zeros.

Systems to compare. To put the performance of Treex CR in context, we evaluated multiple other systems on the same data. Since, to our knowledge, there is currently no other publicly available system for Czech, we compare it with the original Treex set of modules for coreference. The set consists of rule-based modules for relative and reflexive pronouns, and a supervised model for SzPP3 mentions. It has previously been used for building the Czech-English parallel treebank CzEng 1.0 [5].

We also report the performance of the English predecessor of Treex CR used to build CzEng 1.0. It comprises a rule-based module for relative pronouns and zeros, and a joint supervised model for reflexives and PP3 mentions. In addition, we include the Stanford Core NLP toolkit in the evaluation. It contains three approaches to full-fledged CR that all claimed to improve over the state of the art at the time of their release: deterministic [18], statistical [7], and neural [8]. In fact, the neural system has not been outperformed yet.

Stanford Core NLP predicts surface mentions, which is not compatible with the evaluation schema designed for tectogrammatical trees. The surface mentions thus must be transformed to the tectogrammatical style of coreference annotation, i.e. the mention heads must be connected with links. We may use the information on mention heads

                 Relative   Reflexive   SzPP3      All
  Count             1,075         579   1,950    3,604
  Treex
    CzEng 1.0       57.14       67.57   50.52    55.20
    Treex CR        78.40       76.19   61.31    68.46

Table 2: F-scores of Czech coreference resolvers measured on all anaphor types, both separately and altogether. The type SzPP3 denotes 3rd person subject zeros, personal and possessive pronouns.

  • pred(ai) if the CR system claims ai is anaphoric,

  • both(ai) if both the system and the gold annotation claim ai is anaphoric and the antecedent found by the system belongs to the transitive closure of all mentions coreferential with ai in the gold annotation.

After aggregating these counts over all anaphor candidates, we compute the final Precision, Recall and F-score as follows:

    P = ∑_ai both(ai) / ∑_ai pred(ai)      R = ∑_ai both(ai) / ∑_ai true(ai)      F = 2PR / (P + R)

To evaluate only a particular anaphor type, the aggregation runs over all anaphor candidates of the given type.

The presented evaluation schema, however, needs to be adjusted for the CoNLL 2012 dataset. As mentioned in Section 4, in this dataset relative pronouns are not considered coreferential and zeros are missing altogether. As a result, a system that marks such expressions as antecedents would be penalized. We thus apply the following patch specifically to the CoNLL 2012 dataset to rectify this issue. If
     provided by the Stanford system itself. However, by using         the predicted antecedent is a zero or a relative pronoun,
     this approach results we observed completely contradic-           instead of using it directly we follow the predicted coref-
     tory results on different datasets. Manual investigation on       erential chain until the expression outside of these two cat-
     a sample of the data revealed that often the Stanford sys-        egories is met. The found expression is then used to calcu-
     tem in fact identified a correct antecedent mention, but se-      late the counts, as described above. If no such expression
     lected a head different to the one in the data. Most of these     is found, the direct antecedent is used, even if it is a zero
     cases, e.g. company names like “McDonald’s Corp.” or              or a relative pronoun.
     “Walt Disney Co.”, have no clear head, though. There-                All the scores presented in the rest of the paper are F-
     fore, we decided to use the gold tectogrammatical tree to         scores.
     identify the head of the mention labeled by the Stanford
     system. Even though employing gold information for sys-           Results and their analysis. Table 2 shows results of eval-
     tem’s decision is a bad practice, here it should not affect       uation on the Czech data. The Czech version of Treex CR
     the result so much and we use it only for the third-party         succeeded in its ambition to replace the modules used in
     systems, not for our Treex CR.                                    Treex until now. It significantly10 outperformed the base-
                                                                       line for each of the anaphor type, with the overall score
     Evaluation measure. Standard evaluation metrics (e.g.             by 13 percentage points higher. The jump for relative pro-
     MUC [37], B3 [1]) are not suitable for our purposes as            nouns was particularly high.
     they do not allow for scoring only a subset of mentions.             The analysis of improved examples for this category
     Instead, we use a measure similar to scores proposed by           shows that apart from the syntactic principles used in the
     [36]. For an anaphor candidate ai , we increment the three        rule-based module, it also exploits other symptoms of
     following counts:
                                                                          10 Significance has been calculated by bootstrap resampling with a

         • true(ai ) if ai is anaphoric in the gold annotation,        confidence level 95%.
coreference. The most prominent are agreement of the anaphor and the antecedent in gender and number, as well as the distance between the two. It also succeeds in identifying non-anaphoric examples, for instance interrogative pronouns, which use the same forms.

The results of the evaluation on the English data are shown in Table 3. Similarly to the Czech system, the English version of Treex CR outperforms its predecessor in Treex by a large margin of 15 percentage points on the PCEDT Eval test set. Most of it stems from a large improvement on the biggest class of anaphors, zeros. Unlike for Czech relative pronouns, the supervised CR is not the only reason for this leap. It largely results from the extension that we made to the method for adding zero arguments of non-finite clauses (see Section 3.1). Consequently, the coverage of these nodes compared to their gold annotation rose from 34% to 80%. Comparing these two versions of the Treex system on the CoNLL 2012 test set, we see a different picture. The systems’ performances are more similar; for PP3 the baseline system even slightly outperforms the new Treex CR.

                           PCEDT Eval                        CoNLL 2012 test set
                  Relative  Reflexive    PP3    Zeros    All   Reflexive    PP3     All
   Count               842         49  2,494    3,260  6,645         111  4,583   4,710
   Stanford
     deterministic    1.16      55.67  63.65     0.00  34.96       71.11  60.55   60.79
     statistical      0.00      63.74  72.71     0.00  39.09       80.56  71.07   71.29
     neural           0.00      70.97  76.36     0.00  41.56       80.73  70.45   70.70
   Treex
     CzEng 1.0       70.64      65.93  73.52    28.48  55.34       76.02  67.93   68.13
     Treex CR        75.99      81.63  74.11    45.37  60.87       79.65  66.64   66.96

Table 3: F-scores of English coreference resolvers measured on all anaphor types, both separately and altogether. The type PP3 denotes personal and possessive pronouns in the 3rd person.

As for the comparison with the Stanford systems, we should not look at the scores aggregated over all the anaphor types under scrutiny, because the Stanford systems apparently do not address zeros and relative pronouns.^11 In fact, the Stanford systems try to reconstruct coreference as it is annotated in OntoNotes 5.0.

   ^11 On the other hand, they address coreference of nominal groups and pronouns in the first and second person. Treex CR does not provide Czech or English models for these classes so far. Nevertheless, experimental projection-based models already exist for German and Russian [27].

The classes of reflexive and PP3 pronouns are the only ones within the scope of all the resolvers. The Stanford deterministic system seems to be consistently outperformed by all the other approaches. Performance rankings on reflexive pronouns differ for the two datasets, which is probably a consequence of the low frequency of reflexives in the datasets. Regarding the PP3 pronouns, Treex CR does not achieve the performance of the state-of-the-art Stanford neural system. On the CoNLL 2012 test set it is outperformed even by the Stanford statistical system. Nevertheless, in all the cases the performance gaps are not so big, and it is thus reasonable to use Treex CR for further experiments in the future.

To the best of our knowledge, no analysis of how the Stanford systems perform on individual anaphor types has been published yet. Interestingly, our results show that even though the overall performance of the neural system on the CoNLL 2012 test set is reported to be higher [8], for personal and possessive pronouns in the third person it is slightly outperformed by the statistical system. However, as the evaluation on the PCEDT Eval test set shows the complete opposite, we cannot arrive at any conclusion on their mutual comparison for this anaphor type.


6   Conclusion

We described Treex CR, a coreference resolver not only for Czech. The main feature of the system is that it operates on the tectogrammatical layer, which allows it to also address coreference of zeros. The system uses a supervised model, supported by a very rich set of linguistic features. We presented modules for processing Czech and English and evaluated them on several datasets. For comparison, we conducted the evaluation with the predecessors of Treex CR and three versions of the Stanford system, one of which was a state-of-the-art neural resolver for English. Our system seems to have outperformed the baseline system on Czech. On English, although it could not outperform the best approaches in the Stanford system, its performance is high enough for it to be used in future experiments. Furthermore, it may be used for resolution of anaphor types that are ignored by most of the coreference resolvers for English, i.e. relative pronouns and zeros.

In future work, we would like to use Treex CR in cross-lingual coreference resolution, where the system is applied to a parallel corpus and thus may take advantage of both languages.
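The anaphor-level evaluation measure from Section 5 can be summarized in a short sketch. The function names and the data representation (antecedent links as dictionaries keyed by anaphor candidates) are our own illustrative assumptions, not part of Treex CR; the sketch also includes the CoNLL 2012 patch that redirects a predicted zero or relative-pronoun antecedent along the predicted chain.

```python
# Minimal sketch of the anaphor-level P/R/F measure, assuming mentions are
# plain strings and coreference links are dicts {anaphor: antecedent}.
# All names here are hypothetical, for illustration only.

def gold_entity(gold_links, mention):
    """Transitive closure of gold coreference links containing `mention`."""
    entity = {mention}
    changed = True
    while changed:
        changed = False
        for ana, ante in gold_links.items():
            # a link touching the entity on exactly one side extends it
            if (ana in entity) != (ante in entity):
                entity.update((ana, ante))
                changed = True
    return entity

def redirect(pred_links, antecedent, is_zero_or_relative):
    """CoNLL 2012 patch: follow the predicted chain until an expression
    outside the zero/relative categories is found; if none exists,
    fall back to the direct antecedent."""
    seen, current = set(), antecedent
    while (is_zero_or_relative(current) and current in pred_links
           and current not in seen):
        seen.add(current)
        current = pred_links[current]
    return antecedent if is_zero_or_relative(current) else current

def evaluate(candidates, pred_links, gold_links,
             is_zero_or_relative=lambda m: False):
    """Aggregate true/pred/both counts and return (P, R, F)."""
    true = pred = both = 0
    for ai in candidates:
        if ai in gold_links:
            true += 1
        if ai in pred_links:
            pred += 1
            ante = redirect(pred_links, pred_links[ai], is_zero_or_relative)
            if ai in gold_links and ante in gold_entity(gold_links, ai):
                both += 1
    p = both / pred if pred else 0.0
    r = both / true if true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Restricting `candidates` to anaphors of a single type gives the per-type scores reported in Tables 2 and 3.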
Acknowledgments

This project has been funded by the GAUK grant 338915 and the Czech Science Foundation grant GA-16-05394S. This work has also been supported by, and has been using, language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project No. LM2015071 of the Ministry of Education, Youth and Sports of the Czech Republic.


References

 [1] Amit Bagga and Breck Baldwin. Algorithms for Scoring Coreference Chains. In The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pages 563–566, 1998.
 [2] Eduard Bejček, Eva Hajičová, Jan Hajič, Pavlína Jínová, Václava Kettnerová, Veronika Kolářová, Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Magda Ševčíková, Jan Štěpánek, and Šárka Zikánová. Prague Dependency Treebank 3.0, 2013.
 [3] Shane Bergsma and David Yarowsky. NADA: A Robust System for Non-referential Pronoun Detection. In Proceedings of the 8th International Conference on Anaphora Processing and Applications, pages 12–23, Berlin, Heidelberg, 2011. Springer-Verlag.
 [4] Ondřej Bojar, Ondřej Dušek, Tom Kocmi, Jindřich Libovický, Michal Novák, Martin Popel, Roman Sudarikov, and Dušan Variš. CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In Text, Speech, and Dialogue: 19th International Conference, TSD 2016, number 9924 in Lecture Notes in Artificial Intelligence, pages 231–238, Heidelberg, Germany, 2016. Springer International Publishing.
 [5] Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, and Aleš Tamchyna. The Joy of Parallelism with CzEng 1.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 3921–3928, Istanbul, Turkey, 2012. European Language Resources Association.
 [6] Eugene Charniak and Micha Elsner. EM Works for Pronoun Anaphora Resolution. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 148–156, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
 [7] Kevin Clark and Christopher D. Manning. Entity-Centric Coreference Resolution with Model Stacking. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1405–1415, Beijing, China, 2015. Association for Computational Linguistics.
 [8] Kevin Clark and Christopher D. Manning. Improving Coreference Resolution by Learning Entity-Level Distributed Representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 643–653, Berlin, Germany, 2016. Association for Computational Linguistics.
 [9] CNC. Czech National Corpus – SYN2005, 2005.
[10] Pascal Denis and Jason Baldridge. A Ranking Approach to Pronoun Resolution. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1588–1593, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
[11] Pascal Denis and Jason Baldridge. Specialized Models and Ranking for Coreference Resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 660–669, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
[12] Ondřej Dušek, Jan Hajič, and Zdeňka Urešová. Verbal Valency Frame Detection and Selection in Czech and English. In The 2nd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, pages 6–11, Stroudsburg, PA, USA, 2014. Association for Computational Linguistics.
[13] Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin Majliš, Michal Novák, and David Mareček. Formemes in English-Czech Deep Syntactic MT. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 267–274, Montréal, Canada, 2012. Association for Computational Linguistics.
[14] Christiane Fellbaum. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press, 1998.
[15] Aria Haghighi and Dan Klein. Coreference Resolution in a Modular, Entity-centered Model. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 385–393, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[16] Jan Hajič et al. Prague Dependency Treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia, 2006.
[17] Lucie Kučová and Zdeněk Žabokrtský. Anaphora in Czech: Large Data and Experiments with Automatic Anaphora. In Proceedings of the 8th International Conference, TSD 2005, volume 3658 of Lecture Notes in Computer Science, pages 93–98, Berlin / Heidelberg, 2005. Springer.
[18] Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pages 28–34, Portland, Oregon, USA, 2011. Association for Computational Linguistics.
[19] Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective Dependency Parsing Using Spanning Tree Algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.
[20] Ruslan Mitkov. Anaphora Resolution. Longman, London, 2002.
[21] Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, and Jiří Mírovský. Coreference in Prague Czech-English Dependency Treebank. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 169–176, Paris, France, 2016. European Language Resources Association.
[22] Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, and Jiří Mírovský. Prague Czech-English Dependency Treebank 2.0 Coref, 2016.
[23] Vincent Ng. Supervised Noun Phrase Coreference Research: The First Fifteen Years. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1396–1411, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[24] Giang Linh Nguy. Návrh souboru pravidel pro analýzu anafor v českém jazyce [A proposal of a set of rules for the analysis of anaphors in Czech]. Master’s thesis, MFF UK, Prague, Czech Republic, 2006. In Czech.
[25] Giang Linh Nguy, Václav Novák, and Zdeněk Žabokrtský. Comparison of Classification and Ranking Approaches to Pronominal Anaphora Resolution in Czech. In Proceedings of the SIGDIAL 2009 Conference, pages 276–285, London, UK, 2009. The Association for Computational Linguistics.
[26] Michal Novák and Anna Nedoluzhko. Correspondences between Czech and English Coreferential Expressions. Discours: Revue de linguistique, psycholinguistique et informatique, 16:1–41, 2015.
[27] Michal Novák, Anna Nedoluzhko, and Zdeněk Žabokrtský. Projection-based Coreference Resolution Using Deep Syntax. In Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017), pages 56–64, Valencia, Spain, 2017. Association for Computational Linguistics.
[28] Václav Novák and Zdeněk Žabokrtský. Feature Engineering in Maximum Spanning Tree Dependency Parser. Volume 4629, pages 92–98, Berlin / Heidelberg, 2007. Springer.
[29] Martin Popel and Zdeněk Žabokrtský. TectoMT: Modular NLP Framework. In Proceedings of the 7th International Conference on Advances in Natural Language Processing, pages 293–304, Berlin, Heidelberg, 2010. Springer-Verlag.
[30] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. Towards Robust Linguistic Analysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152, Sofia, Bulgaria, 2013. Association for Computational Linguistics.
[31] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning – Proceedings of the Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes, EMNLP-CoNLL 2012, pages 1–40, Jeju, Korea, 2012. Association for Computational Linguistics.
[32] Petr Sgall, Eva Hajičová, Jarmila Panevová, and Jacob Mey. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Springer, 1986.
[33] Drahomíra Spoustová, Jan Hajič, Jan Votrubec, Pavel Krbec, and Pavel Květoň. The Best of Two Worlds: Cooperation of Statistical and Rule-based Taggers for Czech. In Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies, pages 67–74, Stroudsburg, PA, USA, 2007. Association for Computational Linguistics.
[34] Jana Straková, Milan Straka, and Jan Hajič. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13–18, Baltimore, Maryland, 2014. Association for Computational Linguistics.
[35] Jana Straková, Milan Straka, and Jan Hajič. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13–18, Baltimore, Maryland, 2014. Association for Computational Linguistics.
[36] Don Tuggener. Coreference Resolution Evaluation for Higher Level Applications. In Gosse Bouma and Yannick Parmentier, editors, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26–30, 2014, Gothenburg, Sweden, pages 231–235. The Association for Computer Linguistics, 2014.
[37] Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. A Model-theoretic Coreference Scoring Scheme. In Proceedings of the 6th Conference on Message Understanding, pages 45–52, Stroudsburg, PA, USA, 1995. Association for Computational Linguistics.
[38] Piek Vossen. Introduction to EuroWordNet. Computers and the Humanities, Special Issue on EuroWordNet, 32(2–3), 1998.