The annotation coreference task at
      IberEval’2017: the experience of CLUL/UE

            Amália Mendes1, Sandra Antunes1, and Paulo Quaresma2
              1 Center for Linguistics of the University of Lisbon, Portugal
              2 Computer Science Department, University of Évora, Portugal
    amaliamendes@letras.ulisboa.pt, sandra.antunes@gmail.com, pq@uevora.pt


        Abstract. In this paper the process of coreference annotation in Por-
        tuguese texts in the context of a task of IberEval 2017 is described and
        the main observed problems are discussed. The work was done by a team
        of researchers from the Centre for Linguistics of the University of Lisbon
        (CLUL) and from the Computer Science Department of the University
        of Évora (UE). Due to time constraints and the complexity of the task,
        only researchers from CLUL were able to successfully finish the annota-
        tion process. The main problems are presented and discussed and some
        possible solutions are proposed. Nevertheless, the obtained results are
        similar to the overall results of the task.


1     Introduction
We report here our annotation experience in the scope of the coreference anno-
tated corpus task at IberEval 2017. The first task was for each team to select
a set of texts to be made available for annotation. The texts selected by the
CLUL/UE team are taken from the LE-PAROLE Corpus, a 3-million-word
corpus of European Portuguese from different genres that was compiled for the
LE-PAROLE project as the Portuguese counterpart of a set of comparable cor-
pora of 20 European languages to be made available free of copyrights. For each
language, a subset of 250,000 words was also annotated for POS and manually
revised [5]. We selected texts from this corpus to ensure that the texts would be
cleared for copyright issues and that the results of the coreference annotation
task could be freely distributed to the community. Another reason was that this
corpus gave us access to texts from a set of different genres. For the coref-
erence annotation task, texts were meant to have a maximum of 1200 words. We
selected the shortest texts of newspapers from the LE-PAROLE and, when the
texts were longer, we adjusted the length of the document.
    The annotation was performed using the editor CorrefVisual [6], developed
by the Group of Natural Language Processing PLN-PUCRS at Pontifícia Uni-
versidade Católica do Rio Grande do Sul. We first annotated a text sent by the
organizing committee for training and to report any problem that we might en-
counter with the editor. The result of the annotation of this text was then sent
back to the organization of the task. This first stage of the task was very impor-
tant to get used to the editor and to the guidelines, and for a first impression of
the task.
Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)


               The goal was then for each member of our team to annotate 10 texts
           selected from the set of texts sent by all the teams. Annotator1 (SA)
           annotated all 10 texts, while Annotator2 (AM) annotated only 6 texts due to
           time constraints. All other annotators failed to finish the task: two of them due
           to difficulties with the computational process (they were not able to install and
           run the annotation tool) and two of them due to time constraints and the linguistic
           complexity associated with the task (these annotators’ background was in computer
           science).
               Our experience annotating with the CorrefVisual editor is reported in section
           2, some issues in identifying coreference relations are discussed in section 3, as
           well as our interannotator agreement in section 4, and some final remarks are
           presented in section 5.


           2      Working with the CorrefVisual editor
           The tool CorrefVisual [6] runs in Java and allows the editing of texts previously
           annotated with CORP [3, 2], a nominal coreference resolution tool for the Por-
           tuguese language. Two different operating systems (Windows and iOS) were used
           during the annotation and the tool worked well on both systems. It is important,
           however, to point out that two annotators were not able to install CorrefVisual
           on their computers due to problems with the operating system and Java versions.
           As they were not able to obtain technical help in time, they were not able to
           successfully finish the annotation.
               We received a short description of the tool and its functionalities that was
           extremely useful to get acquainted with the editor and how to proceed with the
           annotation3 . We also received guidelines with a description of the task, an expla-
           nation of the concept of coreference and examples of coreference chains and also
           negative examples. The guidelines were clear and well structured but, of course,
           considering the complexity of the task, we encountered several cases that were
           not considered in the guidelines and that we will discuss in section 3. Moreover,
           two annotators, having a good knowledge of computer science but a weak lin-
           guistic background, failed to understand all the concepts and implications of the
           task and were not able to annotate the texts. This shows how difficult the
           task is and that it requires very specific skills.
               The two successful annotators in our team worked independently on their
           annotations and did not discuss the work with each other. They relied solely on the
           guidelines that were made available by the organizing team. This was done on
           purpose to properly evaluate interannotator agreement.
               The main problem that was encountered with the editor was the delimitation
           of phrases: when changing the length of a phrase, the editor returned an error
           message saying that more than one phrase was selected, even when only one
           phrase was highlighted. After we reported this issue, we were advised to
           use the ESC key while making sure that at least one free phrase was selected,
            3
                http://www.inf.pucrs.br/linatural/wordpress/index.php/recursos-e-
                ferramentas/correfvisual/




           otherwise the deselection would not work. This solved the problem and it was
           then easy to deselect all current selections (even invisible ones) and proceed
           with the selection and manipulation of a single phrase.
               The NPs are not editable in the editor, and they have to be selected out
           of the list of NPs that have been automatically identified in the preprocessing
           stage. Some of the NPs were automatically attributed to a reference chain and
           the remaining ones were listed in a separate window. The annotator’s task was
           to verify the contents of each suggested reference chain and to modify it when
           necessary, using the list of NPs in the separate window (free nominal phrases).
               The NPs in the reference chains were frequently inadequate and in this case it
           was necessary to move the NPs to the list of free nominal phrases (by dragging
           them to the window) or to manipulate the NP to obtain the right length. When the
           NP that was aimed for was not available in the list of free nominal phrases,
           another NP close in the context had to be selected and its length manipulated
           until it fitted. In selecting a different NP, it was crucial to make sure that it
           would not be required for another reference chain.
               For instance, there are two different reference chains in Após a sua con-
           stituição: the NP refers to one reference chain and the possessive pronoun to
           another one. In order to identify the possessive separately, one had to choose
           a free nominal phrase close by in the context and remove and add tokens until
           the possessive was correctly identified. In other cases, it was not possible to find
           a sequence, among the free nominal phrases available, that could correctly cap-
           ture the NP that we were aiming for. For instance, according to the guidelines,
           the apposition should be identified as a separate referential expression, as in o
           feiticeiro ( o psiquiatra colectivo ou o Moreno de então ), but there was no free
           nominal phrase that would single out the part between parentheses.
               To identify the correct NP in the list of free nominal phrases, the annotator
           could use the function Search nominal phrases, that would highlight the NPs
           containing the sequence that was queried. This was very helpful for the task.
           Several NPs could be highlighted as the result of the search (although not visible
           on the screen) and the annotators had to remember to check the results of the search
           before dragging an NP to the reference chain box, otherwise all the highlighted
           NPs would be dragged together. The identification of the correct NP in the list
           of free nominal phrases was in any case time-consuming, especially when several
           NPs had very similar content. This involved checking them one by one in the
           context before selecting the right one.
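               The search step described above can be sketched as follows. This is only an
           illustration in Python, not CorrefVisual's actual implementation: the function
           name search_free_nps and the sample list are hypothetical, and a simple
           case-insensitive substring match is assumed.

```python
def search_free_nps(free_nps, query):
    """Return the indices of the free NPs that contain the query string."""
    needle = query.casefold()
    return [i for i, np_text in enumerate(free_nps) if needle in np_text.casefold()]


# Hypothetical list of free nominal phrases (adapted from the examples in section 3).
free_nps = [
    "a rede de estações de recolha",
    "a rede de estações meteorológicas",
    "o planeta",
]

hits = search_free_nps(free_nps, "estações")
# More than one hit means each candidate must be checked in context before dragging.
needs_manual_check = len(hits) > 1
```

               With several near-identical NPs in the list, such a query returns more than
           one candidate, which is why each hit had to be inspected individually.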
               Some of the problems that we experienced in the identification of the nominal
           phrases are due to the preprocessing of the texts, namely tokenization and the
           grouping of tokens as named entities. For instance, titles were grouped together
           with the first token of the following sentence as one named entity, as in Hospital
           de Castelo Branco O Hospital Distrital and also Linha de o Corgo Está. Some
           named entities that were automatically identified contained more lexical material
           than required. For instance, in the sequence Extinção do Gabinete de Planeamen-
           to e de Coordenação do Combate à Droga, we couldn’t eliminate the first two
           tokens Extinção do and had to select the whole sequence. The tokenization pro-




           cess wrongly treated the accusative pronoun nos as the contraction of em and os,
           in example (1). Consequently, we included both tokens as part of the reference
           chain.
           (1)      Para em os explicarmos temos de ir buscar a ponta a o princípio de o
                    século (file dn81701)
               The post-verbal accusative or dative clitic is usually treated as an indepen-
           dent token that can be identified separately as part of a reference chain (as in
           (2-a)), but in some cases, the tokenization didn’t separate verb and clitic, and
           both tokens had to be selected as part of the chain, as illustrated in (2-b).
           (2)      a.    o homem não devia obedecer a a natureza , mas sim vencê –la
                          (dn88218).
                    b.    desafiar e vencer a natureza contrariando-a (dn88218)
               In several cases, the manipulation of the length of the NP would create an
           incorrect tokenization by including parts of the previous or following token. For
           instance, when modifying a sequence to obtain a single token que, it included
           the first letter of the following token hoje. The result of the annotation is que h
           and it couldn’t be corrected.
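               The boundary problem just described (que h instead of que) could in principle
           be avoided by restricting a selection to whole tokens. The following Python
           sketch illustrates the idea under our own assumptions: tokens are taken to be
           whitespace-delimited, the function name trim_to_whole_tokens is hypothetical,
           and nothing here reflects how CorrefVisual handles spans internally.

```python
import re


def trim_to_whole_tokens(text, start, end):
    """Shrink the character span [start, end) to the whole tokens it fully contains."""
    inside = [
        m.span()
        for m in re.finditer(r"\S+", text)  # assumption: tokens are separated by whitespace
        if m.start() >= start and m.end() <= end
    ]
    if not inside:
        raise ValueError("span contains no complete token")
    return inside[0][0], inside[-1][1]


# Illustrative context: the selection spills over into the first letter of "hoje".
text = "que hoje parece"
bad_start, bad_end = 0, len("que h")
bad_selection = text[bad_start:bad_end]

start, end = trim_to_whole_tokens(text, bad_start, bad_end)
fixed_selection = text[start:end]
```

               Here the partial token is dropped, so the selection shrinks back to the
           single token that was intended.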


           3     Linguistic issues in annotating coreference
           Near Identity
           In many contexts, it is difficult to establish with absolute certainty that two
           NPs are coreferent [4]. For instance, in the initial training phase, we decided
           to treat as coreferent the NP os primeiros cães domésticos and the NP cães
           domésticos that occurs in the larger NP fósseis de cães domésticos. It can be
           debated whether the two are coreferent: although the second NP refers to fossils
           which are consequently old, it might not refer exactly to the fossils of the first
           domestic dogs. The two NPs were treated as coreferent to avoid dividing the
           data into many reference chains. This brings about the question of Near-Identity,
           which will be treated according to the level of granularity and the general goals of
           the annotation. Another example from the training phase is the NP diversidade
           genética and the noun diferenças that were treated as coreferent because the
           differences were interpreted contextually as genetic differences. This reference
           chain was already automatically pre-identified in the CorrefVisual editor.
               In another case, we annotated two nearly coreferent NPs as part of different
           reference chains. In example (3), ser humano and a humanidade are treated as
           non-coreferent due to the explicit mention of their different scope in the context,
           although they appear to be used in the rest of the text as synonyms.
           (3)      o ser humano e, por extensão, a humanidade (dn81201)
           Nominal phrases may be lexically distinct but very similar in terms of their
           reference, as in examples (4-a) and (4-b), where estações de recolha and estações




           meteorológicas refer to the same entity and à escala mundial and o planeta refer
           to the same scope of the network. We considered the two NPs as part of the
           same reference chain.
           (4)      a.    a rede de estações de recolha a a escala mundial (pu92214)
                    b.    a rede de estações meteorológicas de o planeta (pu92214)
               The following case raises even more questions about what can be considered
           as coreferent. In example (5-a), the nouns modelos matemáticos and computa-
           dores are modified by an adjectival phrase with very specific lexical material.
           In example (5-b), the same nouns are modified by the less informative adjective
           melhores. Could we consider that the better models and computers that the sec-
           ond example mentions are the ones capable of modelling the weather and the
           meteorological conditions? We consider that this is indeed the case, based on the
           context, and annotated them as coreferent.
           (5)      a.    Há alguns anos , faltavam estações de observação e não havia modelos
                          matemáticos nem computadores capazes de modelizar o clima e as
                          condições meteorológicas (pu92214)
                    b.    Considera que os principais problemas consistem em a falta de dados
                          de base , de melhores modelos matemáticos , de melhores computa-
                          dores (pu92214)

           Quantification
           The quantification of the NPs raises many questions regarding the annotation
           of coreference. The NPs descargas elétricas atmosféricas and um raio, in (6-a)
           denote the same type of entity in the text and differ in terms of their register
           (more vs. less specialized). Although the first is in the plural form and the
           second in the singular, they both have a generic reading that points to a case of
           coreference (or near identity). Compare, however, (6-a) with (6-b): in the second
           sentence, the NP also has a generic reading but it is quantified. The question is
           whether it should be included in the same reference chain. However, quantified
           NPs are not considered coreferent: even if they denote the same type of entity,
           they refer to a specific subset of those entities. These issues should be made
           explicit in the annotation guidelines.
           (6)      a.    Lago de Maracaibo , em a Venezuela , apresenta a concentração
                          mais elevada de descargas elétricas atmosféricas de o mundo . Em
                          algum lugar de o mundo está caindo um raio em este momento .
                          (Texto11.txt)
                    b.    44 descargas elétricas atmosféricas a cada segundo (Texto11.txt)

           Modality
           The presence of epistemic modal markers (i.e., lexical markers that express values
           such as uncertainty, possibility) raises issues in terms of the annotation of coref-
           erence [1]. The same entity is referred to through the view of a different source




           in examples (7-a) and (7-b) (the NPs are in italic, while the modal marker is
           underlined). This would mean that the coreference exists only for this specific
           source: for instance, aquela organização do trabalho and um dos primeiros passos
           num caminho que tende a levar longe have the same reference for the source,
           namely o sacrossanto poder do patrão na empresa, but it is clear that the author
           of the text disagrees with this view. Coreference would then be dependent on
           the view of each source.

           (7)      a.    os depoimentos de quem viveu a cadeia , desde o velho Navel a o
                          recente Haraszti , deixam -nos a impressão de um trabalho destru-
                          idor...” (Texto11.txt)
                    b.    Aquela organização de o trabalho , a o conferir poderes a os trabal-
                          hadores sobre as condições de o trabalho , é vista por o sacrossanto
                          poder de o patrão em a empresa como um de os primeiros passos em
                          um caminho que tende a levar longe . de aí resistências (Texto11.txt)


           In example (8), modality does not refer to the viewpoint of another source.
           Modality is expressed by a conditional clause and the equivalence between the
           two NPs is only valid if the condition applies. In this specific case, the condition
           is to be pragmatically understood as a goal, so that modality doesn’t seem to
           affect the existence of coreference.

           (8)      É o principal objectivo de a OMM e é a coisa mais sensata a fazer se
                    queremos compreender o fenómeno de o aquecimento global e de as suas
                    implicações.


           Embedded and coordinated NPs


           One of the first issues faced during the annotation was whether an embedded
           NP could be part of another reference chain. For instance, the NP illustrated in
           (9) would refer to the two reference chains indicated in the example. We treated
           embedded NPs as part of other reference chains, when applicable.

           (9)      o estudo das sociedades primitivas (dn81201)
                    reference chain 1: o estudo das sociedades primitivas
                    reference chain 2: as sociedades primitivas
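
           The treatment of example (9) can be pictured as two overlapping token spans, each
           assigned to its own chain. The Python sketch below is only illustrative: the
           Mention class, the is_embedded helper and the chain numbers are hypothetical,
           and das is split into de as on the assumption that it follows the tokenization
           seen in the other corpus examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token of the NP
    end: int    # index one past the last token of the NP
    chain: int  # identifier of the reference chain (hypothetical numbering)


def is_embedded(inner, outer):
    """True if `inner` lies within `outer` without being the same span."""
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


# Assumption: "das" is tokenized as "de as", as in the other corpus examples.
tokens = "o estudo de as sociedades primitivas".split()

outer = Mention(start=0, end=6, chain=1)  # o estudo de as sociedades primitivas
inner = Mention(start=3, end=6, chain=2)  # as sociedades primitivas

inner_text = " ".join(tokens[inner.start:inner.end])
```

           Keeping the chain identifier on each span, rather than on the outermost NP only,
           is what allows an embedded NP to belong to a different chain from the NP that
           contains it.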

           The same issue arose in what concerns coordinated NPs. For instance, could the
           NP os primatas in (10) be included in the reference chain os antropóides? We
           treated coordinated and embedded NPs similarly.

           (10)      Os animais inferiores , os primatas e o homem primitivo constituem uma
                     linha evolutiva que se revela sempre mais complexa , até desembocar em
                     o salto qualitativo que é a civilização . (dn81201)




           Modified NPs
           Another issue is whether a modified nominal head can be considered indepen-
           dently of its modifiers. For instance, in (5-a) the complex nominal head modelos
           matemáticos and the head computadores are modified by an adjectival phrase.
           We considered that the modifier had to be included since it restricts the reference
           of the nominal.

           Implicit content
           The antecedent of an anaphoric element can be implicit in the context. For ex-
           ample, the meetings referred to in (11-b) are the meetings of the Commission referred
           to in (11-a), so this entity is implicit in the NP: essas reuniões [da Comissão Inter-
           governamental de Negociação]. The question is whether the NP in (11-b) should
           be considered as part of the reference chain of the entity Comissão Intergover-
           namental de Negociação.
           (11)      a.    Uma Comissão Intergovernamental de Negociação (pu92214)
                     b.    Essas reuniões
           An NP can refer to information that is scattered in the previous context, as with the
           NP todas essas estações de observação que é preciso montar ou reactivar. The
           demonstrative relates to information shown in italics, but there is no clear NP
           that could be considered coreferent with the underlined NP and such cases were
           not annotated:
           (12)      A falta de estações de recolha de dados é a que nos parece de maior im-
                     portância superar , para permitir melhorar a fiabilidade de as previsões
                     . Quanto a os modelos que se usam actualmente , são muito rigorosos
                     e não nos parecem responsáveis por a imprecisão de as previsões : é a
                     qualidade de os dados que se fornecem a o computador que as condiciona
                     . em uma previsão de 24 horas para um país de a Europa ocidental ,
                     por exemplo , é preciso ter dados de toda a Europa , de uma grande parte
                     de África , etc. Se quisermos uma previsão a quatro dias , já se torna
                     necessário cobrir cerca de metade de o globo . E , para uma previsão
                     além dos os quatro ou cinco dias , são precisos dados de todo o globo .
                     Mesmo que se tenha uma densidade óptima de estações de observação
                     em Portugal , não podemos esperar uma boa previsão , nem sequer para
                     Portugal , se o resto de o mundo não estiver bem coberto . Existem mais
                     de dez mil estações terrestres de observação em o mundo e mais de 1500
                     estações atmosféricas . (...)
                     Como espera financiar todas essas estações de observação que é necessá-
                     rio montar ou reactivar?

           Relatives
           Restrictive relatives were included in the NP because they contribute to establish
           the reference of the NP. There could be two possible annotations of such cases,




           illustrated in (13-a) and (13-b). Option 1 (a single NP) was preferred due to
           the fact that the restrictive relative is included in the NP and is crucial for the
           reference, but there was some hesitation in the annotation.

           (13)      a gorila a quem M. Patterson conseguiu...
                     a. [a gorila a quem M. Patterson conseguiu. . . ]
                     b. [a gorila] a [quem]


           Length of the NPs

           Other constructions besides relative clauses raise questions about what to include
           in the NP of a reference chain. For instance, in the case of the context illustrated
           in (4-a), the issue was whether à escala mundial should be included in the NP or
           not. The fact that a similar NP, illustrated in (4-b), occurred in the text led to
           the selection of the whole sequence. In contexts such as (14), the parenthetical
           segment wasn’t included because it is not essential to the reference of the NP
           (just as a non-restrictive relative would also be left out).

           (14)      a definição e concretização de uma estrutura associativa empresarial
                     sólida e eficaz, essencialmente de base regional (dn81625)


           4      Results and interannotator agreement

           Based on two files annotated by both annotators (pu92214 and dn81201), the
           interannotator agreement reached a moderate kappa value of 0.40. We inspected
           our results and compared the annotation of these files. In both files, Annotator2
           treated more reference chains than Annotator1: 65 vs. 40 chains for pu92214
           and 75 vs. 43 chains for dn81201. There is an average of 37 common reference
           chains in the two annotations. The number of NPs in these common chains is
           very similar between the two annotators. The differences lie mostly in the anno-
           tation of embedded NPs, and we believe that this could be easily improved with
           an explicit mention in the guidelines. There is also a difference in the number of
           pronominal elements in the chains (demonstrative, personal, possessive and rel-
           ative pronouns) and in the length of some NPs. For example, in the file dn81201,
           22 reference chains that were identified by Annotator2 (and not by Annotator1)
           involve relative pronouns, as can be observed in (15-a) and (15-b).

           (15)      a.    homem revelou aquela alma espiritual que hoje parece ser suficiente

                     b.    tal como um carro que começa a diminuir a velocidade por falta de
                           combustível
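
           For readers unfamiliar with the kappa statistic mentioned above, the following
           Python sketch shows how Cohen's kappa can be computed from two annotators'
           labels. It is a minimal illustration under our own assumptions: each candidate
           NP is assumed to receive exactly one label per annotator, the chain labels are
           assumed to be already aligned between annotators, and the toy labels are
           invented for the example and are not data from this task.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy labels: the chain assigned to each of six candidate NPs ("none" = free NP).
annotator1 = ["c1", "c1", "c2", "none", "none", "c2"]
annotator2 = ["c1", "c2", "c2", "none", "c1", "c2"]

kappa = cohen_kappa(annotator1, annotator2)
```

           In this toy case the annotators agree on four of six items, the agreement
           expected by chance is one third, and kappa is therefore 0.5.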

           The annotation had to deal with clear cases of Identity but also with Near-
           Identity relations, predicative relations and bridging. Results are positive con-
           sidering the complexity of the task, the lack of training of the annotators and the
           level of granularity of the guidelines that were distributed. Also, we believe that




           manual editing of the NPs would have made the task easier and provided
           conditions for a higher level of agreement among the annotators.


           5     Conclusion
           The main conclusion of our participation in this task is the high level of difficulty
           of the annotation process: it requires some computer science skills and solid
           linguistic knowledge. As a consequence of these requirements, all the annotators
           without a strong linguistic background failed to finish the task.
               The use of pre-identified NPs is also a potential problem: if they are correct,
           they help the annotation process; but if they are incorrect, they add an overhead
           to the process, since the initial annotation has to be undone manually.
           The same problem is associated with the use of the visual editor: it
           helps when the NPs are correct but it proved to lack stronger editing options
           that would allow pre-identified segments to be changed easily.
               In spite of all the described problems we believe this task allowed us to
           better understand the complexity and the details of coreference annotation and
           to contribute to the creation of a reference annotated corpus for the Portuguese
           language.


           6     Acknowledgments
           This work was partially supported by national funds through FCT – Fundação
           para a Ciência e Tecnologia, under projects PEst-OE/LIN/UI0214/2013 and
           PEst/CEC/UI04668/2013.


           References
           1. Bouma, G., Daelemans, W., Hendrickx, I., Hoste, V., Mineur, A.: The COREA-project,
              manual for the annotation of coreference in Dutch texts. Tech. rep. (2007)
           2. Fonseca, E.B., Vieira, R., Vanin, A.: CORP: Coreference resolution for Portuguese.
              In: 12th International Conference on the Computational Processing of Portuguese,
              Demo Session (PROPOR) (2016)
           3. Fonseca, E.B., Sesti, V., Antonitsch, A., Vanin, A.A., Vieira, R.: CORP - uma abor-
              dagem baseada em regras e conhecimento semântico para a resolução de corre-
              ferências. Linguamática 9(1), 3–18 (2017)
           4. Mendes, A.: Organização textual e articulação de orações. pp. 1691–1755. Gramática
              do Português, vol. II. Lisboa: Fundação Calouste Gulbenkian (2013)
           5. do Nascimento, M.F.B., Mendes, A., Pereira, L.: Providing on-line access to Por-
              tuguese language resources: corpora and lexicons. pp. 1825–1828. Proceedings of the
              IV International Conference on Language Resources and Evaluation - LREC2004,
              Lisbon, Centro Cultural de Belém (2004)
           6. Tubino, M.d.O., Silva, M.M.S.: Visualização, manipulação e refinamento de corre-
              ferência em língua portuguesa. Trabalho de conclusão de curso, Pontifícia Universi-
              dade Católica do Rio Grande do Sul (2015)

