S. Krajči (ed.): ITAT 2018 Proceedings, pp. 130–137
CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, © 2018 Michal Novák



                           A Study on Bilingually Informed Coreference Resolution

                                                                          Michal Novák

                                                   Charles University, Faculty of Mathematics and Physics
                                                         Institute of Formal and Applied Linguistics
                                                       Malostranské náměstí 25, CZ-11800 Prague 1
                                                              mnovak@ufal.mff.cuni.cz

Abstract: Coreference is a basic means of retaining the coherence of a text, and it likely exists in every language. However, languages may differ in how a coreference relation is manifested on the surface. A possible way to measure the extent and nature of such differences is to build a coreference resolution system that operates on a parallel corpus and extracts information from both language sides of the corpus. In this work, we build such a bilingually informed coreference resolution system and apply it to Czech-English data. We compare its performance with a system that learns only from a single language. Our results show that the cross-lingual approach outperforms the monolingual one. They also suggest that a system for Czech can exploit the additional English information more effectively than the other way round. The work concludes with a detailed analysis that tries to reveal the reasons behind these results.

1   Introduction

Cross-lingual techniques are becoming increasingly popular. Even though they have not bypassed the task of Coreference Resolution (CR), the research there is mostly limited to cross-lingual projection. Other cross-lingual techniques remain a largely unexplored area for this task.

One of the yet neglected cross-lingual techniques is called bilingually informed resolution. It is an approach in which decisions in a particular task are made based on information from bilingual parallel data. Parallel texts must be available not only when a method is trained but also at test time, that is, when a trained model is applied to new data. In real-world scenarios, the availability of parallel data at test time requires applying a machine translation service to acquire it (MT-based bilingually informed resolution).

Nevertheless, for limited purposes it may pay off to use human-translated parallel data instead (corpus-based bilingually informed resolution). If it outperforms the monolingual approach, it may be used in building automatically annotated parallel corpora. Such corpora with more reliable annotation could be useful for corpus-driven theoretical research.1 Furthermore, they can also be used for automatic processing. For instance, improved resolution on big parallel data might be leveraged in a weakly supervised manner to boost models trained in a monolingual way.

The present work is concerned with corpus-based bilingually informed CR on Czech-English texts. Specifically, it focuses on the resolution of pronouns and zeros, as these are the coreferential expressions whose grammatical and functional properties differ considerably across the two languages. For instance, whereas in English most non-living objects are referred to with pronouns in neuter gender (e.g. “it”, “its”), genders are distributed more evenly in Czech. Information on Czech genders thus may be useful for filtering out English candidates that are highly improbable to be coreferential with the pronoun. By comparing the performance of the bilingually informed system with a monolingual approach and by a thorough analysis of the results, our work aims at discovering the extent and nature of such differences.

The paper is structured as follows. After mentioning related work (Section 2), we introduce a coreference resolver (Section 3) in both its monolingual and cross-lingual variants. Section 4 describes the dataset used in the experiments in Section 5. Before we conclude, the results of the experiments are thoroughly analyzed (Section 6).

2   Related Work

Building a bilingually informed CR system requires a parallel corpus with at least the target-language side annotated with coreference. Even today, very few such corpora exist, e.g. Prague Czech-English Dependency Treebank 2.0 Coref [14], ParCor 1.0 [9] and parts of OntoNotes 5.0 [19].

It is thus surprising that the peak of popularity of this approach came around ten years before these corpora were published. Harabagiu and Maiorano [10] present a heuristics-based approach to CR. The set of heuristics is expanded by exploiting the transitivity property of coreferential chains in a bootstrapping fashion. Moreover, they expand the heuristics even further, following mention counterparts in translations of source English texts into Romanian with coreference annotation. Mitkov and Barbu [13] adjust a rule-based pronoun coreference resolution system to work on a parallel corpus. After providing a linguistic comparison of English and French pronouns and their behavior in discourse, the authors distill their findings into a set of cross-lingual rules to be integrated into the CR system. In evaluation, they observe improvements in resolution accuracy of up to 5 percentage points compared to the monolingual approach.

As for more recent works, the authors of [5] address the task of overt pronoun resolution in Chinese. Among other things, they propose an MT-based bilingually informed approach. A model is built on Chinese coreference, exploiting Chinese features. These are augmented with English features extracted from the Chinese texts machine-translated into English. This allows for taking advantage of English nouns’ gender and number lists, which according to the authors correspond to the distribution of genders and numbers over Chinese nouns.

The experiments of Novák and Žabokrtský [17], the first ones using bilingually informed CR on Czech-English data, are most relevant to the present work. With a focus on English personal pronouns only, their best cross-lingual configuration managed to outperform the monolingual CR by one F-score point. Taking advantage of a more developed version of their CR system, we extend their work in several directions. First, we explore the potential of this approach for a wider range of English coreferential expressions. Next, we perform experiments in the opposite direction, i.e. Czech CR informed by English. And finally, we provide a very detailed analysis of the results, unveiling the nature of the cross-lingual aid.

3   Coreference Resolution System

For coreference resolution we adopt a more developed version of the resolver utilized in [17]. This new version builds on the monolingual Treex CR system [15] and augments it with the cross-lingual extension presented in [17]. The difference between the current system and the system in [17] lies mostly in that it can target a wider range of expressions, it exploits a richer feature set, and the pre-processing stage analyzing the text up to the tectogrammatical representation is of higher quality. Instead of listing all the changes, we briefly introduce the monolingual (Section 3.1) and the cross-lingual component (Section 3.2) of Treex CR from scratch.2

3.1   Monolingual Resolution

Treex CR operates on the tectogrammatical layer. It is a layer of deep syntax based on the theory of Functional Generative Description [20]. The tectogrammatical representation of a sentence is a dependency tree with rich linguistic features consisting of the content words only. Furthermore, some surface ellipses are restored at this layer. This includes anaphoric zeros (e.g. zero subjects in Czech, unexpressed arguments of non-finite clauses in both English and Czech), which are introduced at the tectogrammatical layer with a newly established node.

The tectogrammatical layer is also the place where coreference relations are annotated. A relation is technically represented as a link between two coreferential nodes:3 the anaphor (the referring expression) and the antecedent (the referred expression).

Each input text must first be automatically pre-processed up to this level of linguistic annotation. The CR system, based on supervised machine learning, then takes advantage of the information available in the annotation.

Pre-processing. The input text must undergo an analysis producing a tectogrammatical representation of its sentences before coreference resolution is carried out. We use the pipelines for the analysis of Czech and English available in the Treex framework [18]. The analysis starts with rule-based tokenization, morphological analysis and part-of-speech tagging (e.g. [21] for Czech), dependency parsing to surface trees (e.g. the MST parser [12] for English) and named entity recognition [22]. In addition, the NADA tool [3] is applied to help distinguish referential and non-referential occurrences of the English pronoun “it”.

Tectogrammatical trees are created by a transformation from the surface trees. All function words are made hidden, morpho-syntactic information is transferred, and semantic roles are assigned to tectogrammatical nodes [4]. At the tectogrammatical layer, certain types of ellipsis can be restored. The automatic pre-processing focuses only on restoring nodes that might be anaphoric. Such nodes are added by heuristics based on syntactic structures. The restored nodes include Czech zero subjects and both Czech and English zeros in non-finite clauses, e.g. zero relative pronouns and unexpressed arguments of infinitives and of past and present participles.

Model design. Treex CR models coreference in a way that can be easily optimized by supervised learning. In particular, we use logistic regression with stochastic gradient descent optimization implemented in the Vowpal Wabbit toolkit.4 The design of the model employs multiple concepts that have proved to be useful and simple at the same time.

Given an anaphor and a set of antecedent candidates, mention-ranking models [6] are trained to score all the candidates at once. On the one hand, a mention-ranking model is able to capture competition between the candidates; on the other hand, its features describe solely the actual mentions, not the whole clusters built up to the moment. Antecedent candidates for an anaphor (both positive and negative) are selected from a context window of a predefined size.

No anaphor detection stage precedes the coreference resolution. Unless another measure were taken, this would lead, for instance, to all occurrences of the pronoun “it” being labeled as referential.

    1 In case a cross-lingual origin of the annotation does not matter.
    2 Please refer to [15] for more details on the monolingual component of the system.
    3 A mention is determined only by its head in tectogrammatics. No mention boundaries are specified. Therefore, it is sufficient for a coreference link to determine only two nodes, the mentions’ head nodes.
    4 https://github.com/JohnLangford/vowpal_wabbit/wiki
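The mention-ranking design with a joint anaphoricity decision can be sketched in a few lines. The scoring function below is a hand-written stand-in for the trained logistic-regression model, and all data structures are illustrative; this is not the actual Treex CR code.

```python
# Mention-ranking sketch: score all antecedent candidates at once, plus a
# dummy candidate that stands for "the anaphor is non-referential". The
# argmax therefore decides anaphoricity and antecedent selection jointly.

DUMMY = None  # choosing this candidate labels the anaphor non-referential

def rank(anaphor, candidates, score):
    """Return the highest-scoring candidate (possibly DUMMY)."""
    return max(candidates + [DUMMY], key=lambda cand: score(anaphor, cand))

def toy_score(anaphor, cand):
    """Illustrative scorer: reward number agreement; fixed dummy baseline."""
    if cand is DUMMY:
        return 0.5
    return 1.0 if cand["number"] == anaphor["number"] else 0.0

anaphor = {"form": "it", "number": "sg"}
# No number-compatible candidate in the window -> the dummy wins,
# i.e. the anaphor is treated as non-referential.
best = rank(anaphor, [{"form": "talks", "number": "pl"}], toy_score)
print(best is DUMMY)  # True
```

With a compatible candidate in the window (e.g. a singular noun), the same call returns that candidate instead, which mirrors how a single trained model handles both decisions.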
Nevertheless, the model determines whether the anaphor is referential jointly with selecting its antecedent. This is ensured by adding a dummy candidate representing solely the anaphor itself. By selecting this candidate, the model claims that the anaphor is in fact non-referential.

The diverse properties of the various types of coreferential relations (e.g. the different referential scopes of personal and relative pronouns) encouraged us to model individual anaphor types separately. A specialized model is built for (1) personal and possessive pronouns in the 3rd person (and zero subjects in Czech), (2) reflexive pronouns, (3) relative pronouns, and (4) zeros in non-finite clauses. Treex CR runs them in a sequence.

Features. The pre-processing stage enriches a raw text with a substantial amount of linguistic information. The feature extraction stage then uses this material to yield features consumable by the learning method. Features always relate to at most two nodes – an anaphor candidate and an antecedent candidate.

The features can be divided into three categories. Firstly, location and distance features indicate the positions of the anaphor and the antecedent candidate in a sentence and their mutual distance in terms of words, clauses and sentences. Secondly, a big group of features reflects (deep) morpho-syntactic aspects of the candidates. It includes the mention head’s part-of-speech tag and morphological features (e.g. gender, number, person, case), (deep) syntax features (e.g. dependency relation, semantic role) as well as some features exploiting the structure of the syntactic tree. Many of the features are combined by concatenation or by agreement, i.e. indicating whether the anaphor’s value agrees with the antecedent’s one. Finally, lexical features focus on the lemmas of the mentions’ heads and their parents. These are used directly or through frequencies collected from the large data of the Czech National Corpus [1] indexed in a list of noun-verb collocations. Furthermore, all hypernymous concepts of a mention are extracted as features from ontologies (e.g. WordNet [7]), and named entity labels are also employed.

3.2   Cross-lingual Extension

The extension enables bilingually informed CR. Like the monolingual CR, it addresses coreference in one target language at a time. However, instead of data in a single language, it must be fed with parallel data in two languages. Both language sides (Czech and English in this case) of the data must first be pre-processed with the pipelines analyzing the texts up to the tectogrammatical layer. Furthermore, to facilitate access to important information in the other language, the pre-processing stage also seeks an alignment between tectogrammatical nodes. The bilingually informed approach then augments the monolingual features with those accessing the other side of the parallel data. The design of the model remains the same as for the monolingual approach.

Alignment. It is central for our cross-lingual approach to have the English and Czech texts aligned at the level of tectogrammatical nodes. The alignment is based on unsupervised word alignment performed by MGIZA++ [8] trained on the data from CzEng 1.0 [4], and projected to the tectogrammatical layer. Furthermore, it is augmented with a supervised method [17] addressing selected coreferential expressions, including potentially anaphoric zeros.

Features. Cross-lingual features describe the nodes aligned to the coreferential candidates in the target language – the anaphor candidate and the antecedent candidate. To collect such nodes, we follow the alignment links connected to these two candidates. For each of the nodes, we take at most one of its aligned counterparts. In this way, we obtain at most two nodes aligned to the pair of potentially coreferential nodes, for which we can extract cross-lingual features. If no aligned counterpart is found, no cross-lingual features are added.

We extract two sets of cross-lingual features:

   • aligned_all: it consists of all the features contained in the monolingual set for the given aligned language;

   • aligned_coref: it consists of a single indicator feature, taking the true value only if the two aligned nodes belong to the same coreferential entity. This feature can be activated only if there exists a monolingual coreference resolver for the aligned language. We employ Treex CR and its monolingual models for English and Czech, but any CR system, even a rule-based one, could be used.

We do not manually construct features combining both language sides. Nevertheless, such features are formed automatically by the machine-learning tool Vowpal Wabbit.

4   Datasets

We employ the Prague Czech-English Dependency Treebank 2.0 Coref [14, PCEDT 2.0 Coref] to train and test our CR systems. It is a Czech-English parallel corpus consisting of almost 50k sentence pairs (its basic statistics are shown in the upper part of Table 1). The English part of the treebank is based on texts from the Wall Street Journal collected for the Penn Treebank [11]. The Czech part was manually translated from English. All texts have been annotated at multiple layers of linguistic representation up to the tectogrammatical layer.

Although PCEDT 2.0 Coref has been extensively annotated by humans, we strip almost all manual annotations and replace them with the output of the pre-processing pipeline (see Sections 3.1 and 3.2). The only manually annotated information that we retain are the coreferential links.

We do not split the data into train and test sections. All the experiments are conducted using 10-fold cross-validation.
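The extraction of the two cross-lingual feature sets described in Section 3.2 can be sketched as follows. Plain dicts stand in for tectogrammatical nodes and for the MGIZA++-based alignment; none of these identifiers come from the actual system, and gender/number stand in for the full monolingual feature set behind aligned_all.

```python
# Sketch of cross-lingual feature extraction: follow alignment links from
# the anaphor and antecedent candidates, keep at most one counterpart each,
# and emit the aligned_all and aligned_coref feature sets.

def counterpart(node_id, alignment):
    """Return at most one aligned counterpart of a node (or None)."""
    targets = alignment.get(node_id, [])
    return targets[0] if targets else None

def crosslingual_features(anaph_id, ante_id, alignment, nodes, entity_of):
    feats = {}
    a_id = counterpart(anaph_id, alignment)
    b_id = counterpart(ante_id, alignment)
    if a_id is None or b_id is None:
        return feats  # no counterpart found -> no cross-lingual features
    # aligned_all: monolingual features of the aligned nodes
    for name in ("gender", "number"):
        feats["align_anaph_" + name] = nodes[a_id].get(name)
        feats["align_ante_" + name] = nodes[b_id].get(name)
    # aligned_coref: single indicator - do the counterparts corefer
    # according to a monolingual resolver run on the aligned language?
    feats["aligned_coref"] = (entity_of.get(a_id) is not None
                              and entity_of.get(a_id) == entity_of.get(b_id))
    return feats

# English anaphor and candidate aligned to two Czech feminine nodes that a
# Czech resolver has placed in the same entity.
nodes = {"cs1": {"gender": "fem", "number": "sg"},
         "cs2": {"gender": "fem", "number": "sg"}}
feats = crosslingual_features("en1", "en2",
                              {"en1": ["cs1"], "en2": ["cs2"]},
                              nodes, {"cs1": "e7", "cs2": "e7"})
print(feats["aligned_coref"])  # True
```

Features combining both language sides are then left to the learner’s automatic feature interactions, as described above.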
         Mention type             Czech        English
         Sentences               49,208         49,208
         Tokens               1,151,150      1,173,766
         Tecto. nodes           931,846        838,212
         Mentions (total)       183,277        188,685
           Personal pron.         3,038         14,887
           Possessive pron.       3,777          9,186
           Refl. poss. pron.      4,389              —
           Reflexive pron.        1,272            484
           Zero subject          16,875              —
           Zero in nonfin. cl.    6,151         29,759
           Relative pron.        15,198          8,170
           Other                132,577        126,199

       Table 1: Basic and coref. statistics of PCEDT 2.0 Coref.


                              Czech                                       English
   Mention type      monoling            biling              monoling            biling
                    P     R     F      P     R     F        P     R     F      P     R     F
   Personal      63.84 61.24 62.51  67.82 64.38 66.06    76.34 71.37 73.77  78.57 72.64 75.49
   Possessive    71.93 71.51 71.72  75.73 74.85 75.29    80.07 79.54 79.81  81.46 81.00 81.23
   Refl. poss.   85.61 85.42 85.52  87.70 87.04 87.36        —     —     —      —     —     —
   Reflexive     66.91 56.60 61.33  67.24 55.66 60.90    77.31 72.67 74.92  75.88 71.01 73.37
   Zero subj.    73.18 55.46 63.10  78.88 57.64 66.61        —     —     —      —     —     —
   Zero nonfin.  78.98 41.51 54.42  81.52 42.63 55.98    71.48 54.62 61.92  73.31 54.75 62.68
   Relative      81.51 79.94 80.72  83.48 81.62 82.54    83.47 76.23 79.69  85.76 77.13 81.21
   Total         76.83 65.17 70.52  80.27 67.09 73.09    75.93 65.26 70.19  77.85 65.95 71.41

       Table 2: Anaphora scores (P/R/F) of monolingual and bilingually informed coreference resolution.


                                                                      by the system as anaphoric, recall averages over all true
                                                                      anaphoric mentions. A decision on an anaphor candidate
         As mentioned in Section 3.1, our CR system consists          is correct if the system correctly labels it as non-anaphoric
      of four models targeting different types of mentions as         or the antecedent found by the system really belongs to the
      anaphors. In evaluation, we split the anaphor candidates to     same entity as the anaphor. In the following tables, we use
      even finer categories, namely: (1) personal pronouns, (2)       R F to format the three components of the anaphora score.
                                                                      P

      possessive pronouns, (3) reflexive possessive pronouns,
      (4) reflexive pronouns, all four types of pronouns in the       Bilingually informed vs. Monolingual CR. Table 2 lists
      3rd or ambiguous person, (5) zero subjects, (6) zeros in        the anaphora scores measured on the output of 10-fold
      non-finite clauses, and (7) relative pronouns (the statistics   cross-validation. In overall, cross-lingual models succeed
      of coreferential mentions is collected in the bottom part of    in exploiting additional knowledge from parallel data and
      Table 1). Driven by the findings in an analysis of Czech-       perform better than the monolingual approach. The F-
      English correspondences [16], these expressions are very        score improvement benefits mainly from a rise in preci-
      interesting from a cross-lingual point of view, as they often   sion, but recall also gets improved. In both languages,
      transform to a different type or carry different grammati-      personal and possessive pronouns are the types that ex-
      cal properties, when translated. We assume this aspect is       hibit the greatest improvement. In Czech, the top-scoring
      not so significant in case of nominal groups, for instance,     mention types include zero subjects, too. Nevertheless,
      which represent the majority of remaining mentions. The         English as an aligned language seems to have a stronger
other types grouped under the category Other are demonstrative pronouns, pronouns in the 1st and 2nd person, etc. This category of anaphors is not targeted by our CR method.

5 Experiments

The following experiments compare the performance of the monolingual and the bilingually informed system. Both systems are trained on the PCEDT dataset. All the design choices (except for the feature sets) and hyperparameter values are shared by both systems.

Evaluation measure. We expect different mention types to behave differently in the cross-lingual approach. Standard evaluation metrics (e.g. MUC [23], B3 [2]), however, do not allow for scoring only a subset of mentions. Instead, we use the anaphora score, an anaphor-decomposable measure proposed by [15]. The score consists of three components: precision, recall, and F-score as the harmonic mean of the previous two. While precision expresses the success rate of a system averaged over all mentions labeled […]

[…] impact on resolution in Czech (the difference between the systems is 2.5 F-score points) than Czech has on resolution in English (the difference of 1.2 F-score points).

6 Analysis of the Results

The results of the experiments undoubtedly show the superiority of the cross-lingual CR over the monolingual one. Here, we delve more into the comparison of these two approaches. Firstly, we conduct a quantitative analysis of the resolvers' decisions. It should show how many decision changes the switch to the cross-lingual approach introduces for individual mention types and what the role of anaphoricity in these changes is. Secondly, we inspect randomly sampled examples in a qualitative analysis. We attempt to disclose the typical cases in which the system benefits from the other language and, on the other hand, whether there is a systematic case in which the cross-lingual approach hurts.
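The quantitative analysis of decision changes outlined above amounts to simple bookkeeping over paired system outputs. The following sketch is only illustrative: the record layout, entity ids, and category names are hypothetical, and correctness is simplified to "linked to the gold entity" (the paper's score is agnostic to the choice among correct antecedents):

```python
from collections import Counter

# Hypothetical instance record: the gold entity id (None = non-anaphoric)
# and the entity id chosen by the monolingual (M) and cross-lingual (C)
# systems (None = the system labeled the candidate non-anaphoric).

def category(gold, m, c):
    """Classify one pair of decisions against the gold annotation.
    Correctness is simplified to exact agreement with the gold entity."""
    m_ok, c_ok = m == gold, c == gold
    if m_ok and c_ok:
        return "Both OK"
    if not m_ok and not c_ok:
        return "Both wrong"
    return "M > C" if m_ok else "M < C"

def decision_table(instances):
    """Proportions (in %) of the four categories, split by whether the
    candidate is in fact anaphoric; every row sums to 100%."""
    table = {}
    for row, anaphoric in (("Anaph", True), ("Non-anaph", False)):
        sub = [i for i in instances if (i["gold"] is not None) == anaphoric]
        counts = Counter(category(i["gold"], i["M"], i["C"]) for i in sub)
        table[row] = {cat: 100.0 * counts[cat] / len(sub)
                      for cat in ("Both OK", "Both wrong", "M > C", "M < C")}
    return table
```

Aggregating such rows separately per mention type yields the kind of breakdown discussed in the analysis.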
[Table: proportions of the decision categories (Both ✓, Both ×, M > C, M < C) per mention type, split into anaphoric and non-anaphoric candidates; the numeric body of the table is not preserved in this copy.]

   • M's decision was correct while C's decision was incorrect (M > C),
   • C's decision was correct while M's decision was incorrect (M < C).

A decision is either the assignment of the anaphor candidate to a coreferential entity⁵ or labeling it as non-anaphoric. The tables also distinguish whether the candidate is in fact anaphoric or non-anaphoric. Numbers in the tables represent proportions (in %) of these categories aggregated over all instances. Every row thus sums to 100%.

Conditioning on anaphoricity allows us to directly relate this analysis to the anaphora scores shown in Table 2. Note that while resolution on anaphoric mentions may have an effect on both the precision and the recall component of the anaphora score, resolution on non-anaphoric mentions affects only the precision.

Changed decisions account for around 10% in both Czech and English. More importantly, whereas over 7% of decisions change for the better in Czech, it is only 5.5% of decisions in English. This accords with the extent of improvement observed on the anaphora score. In Czech, the difference between improved and worsened decisions is only a bit higher for anaphoric mentions. It means that the positive effect of English on resolution […]

[…] the resolution deteriorates with cross-lingual features. The systems' decisions differ the least for Czech reflexive possessives (7%) and English relative pronouns (6%). Here, we also observe a varied effect on the anaphora score. While the resolution of Czech reflexive possessives is hardly improved by English features, the small number of changed decisions on English relative pronouns suffices to achieve one of the biggest improvements among English coreferential expressions.

Anaphora scores in Table 2 have already shown that basic reflexive pronouns are the only mention type where the cross-lingual approach falls behind the monolingual one. The quantitative analysis of changed decisions confirms it, especially for anaphoric occurrences.

The gains of the Czech cross-lingual system on non-anaphoric mentions can be attributed mostly to zeros. Also thanks to the resolution on non-anaphoric mentions, the highest margin between the proportions of improved and worsened instances (5%) is observed on Czech zero subjects. It leads to one of the biggest improvements in terms of the anaphora F-score (see Table 2).

6.2 Qualitative Analysis

In the following, we scrutinize more closely the typical cases where the cross-lingual system makes a different decision.

⁵ Some of the anaphors that were assigned to the same entity (columns Both ✓ and Both ×) may in fact have been paired with different antecedents by each of the CR algorithms. As our anaphora score is agnostic to such changes, we do not distinguish these cases.
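Why non-anaphoric mentions influence only the precision component can be seen from a toy scorer that merely mimics the structure of the anaphora score (precision over the mentions a system links, recall over the truly anaphoric ones, F-score as their harmonic mean). This is an illustrative simplification, not the exact measure of [15]:

```python
def toy_scores(instances):
    """Toy anaphora-style scores. Each instance is a (gold, predicted)
    pair of entity ids, where None means non-anaphoric. Illustrative
    only; the real anaphora score of [15] is defined differently."""
    linked   = [(g, p) for g, p in instances if p is not None]
    anaphors = [(g, p) for g, p in instances if g is not None]
    correct  = [(g, p) for g, p in instances if g is not None and g == p]
    precision = len(correct) / len(linked)
    recall = len(correct) / len(anaphors)
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# A wrong link on a truly non-anaphoric mention lowers precision only:
base = [("e1", "e1"), ("e2", None)]
p, r, _ = toy_scores(base)                      # p = 1.0, r = 0.5
p2, r2, _ = toy_scores(base + [(None, "e3")])   # p2 = 0.5, r2 = 0.5
```

A wrong decision on an anaphoric mention, by contrast, counts in both denominators but not in the numerator, so it can drag down both components.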
[Table: proportions of the decision categories (Both ✓, Both ×, M > C, M < C) per mention type, split into anaphoric and non-anaphoric candidates; the numeric body of the table is not preserved in this copy.]