Text Structure and Its Ambiguities: Corpus Annotation as a Helpful Guide

Šárka Zikánová
Charles University, Faculty of Mathematics and Physics, Malostranské nám. 25, 118 00 Prague 1, Czech Republic


Abstract
It is typical for natural languages that their texts can be understood differently by individual recipients. A number of
scientific disciplines, from cognitive psychology to linguistics, are devoted to this phenomenon. In this study, we focus
mainly on linguistic factors that may lead to different interpretations of coherence relations in the text (simply speaking,
what is related to what and how). This work presents a pilot typological survey of disagreements in Czech corpus annotations
of coherence relations (discourse relations, coreference, information structure) and their common features. Polysemy
(polyfunctionality) and semantic underspecification of coherent expressions (e.g. discourse connectives), generic / abstract
meaning of autosemantic words, presence of attribution constructions, word order as a potential marker of information
structure, and text size appear to be essential factors for disagreement in interpretation. In addition, subjective reception
of the relative importance of different text parts plays an important role, too. Based on the observation of the material,
we raise questions and propose possible steps for the ongoing research on variability in the perception of text coherence.

Keywords
inter-annotator agreement, human label variation, discourse relations, coreference, information structure


Conference ITAT (Information Technologies – Applications and Theory), 2024: Drienica, Čergovské vrchy, Slovakia
zikanova@ufal.mff.cuni.cz (Š. Zikánová)
https://ufal.mff.cuni.cz/sarka-zikanova (Š. Zikánová)
ORCID: 0000-0002-7805-9649 (Š. Zikánová)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

The availability of digital language resources enables an important step forward in linguistic research, both in its
theoretical and in its applied orientation. The originally collected data, which served mostly for lexical studies and for
the study of syntax proper, gave an impulse to enrich them with various more sophisticated annotation systems. These systems
deal with very diverse phenomena that go beyond the sentence boundary, including text coherence and phenomena related to
inferencing, and they elaborate more levels of granularity in the annotation. The annotated data serve various purposes in
the computational processing of natural languages, for example as training and testing data for the development of language
models.
   Human data annotation is a process based on the interpretation of observed phenomena and thus may lead to different
outcomes. This variation is caused by various factors. Some of them are connected with shortcomings of the annotation
scenario (e.g., not providing instructions for the solution of some cases) or with weaknesses of the underlying theory
(e.g., non-intuitive solutions or discerning categories that are too fine and very close to each other). Other cases of
inter-annotator disagreement are connected with the learning process of annotators: especially the first annotated batches
of data may be influenced by the annotators' unfamiliarity with the annotation scenario. That is the reason why these data
are often re-annotated later. To prevent these kinds of inconsistent analysis of the data, annotators usually attend
frequent trainings; at the same time, their feedback at the beginning of the annotation may improve the annotation scenario
and point out some problematic points in the underlying theory. Before the data are released, annotators' mistakes are
searched for and corrected, e.g. a simple overlooking of phenomena that should be marked; nevertheless, some of the mistakes
can remain even in the final data. Last but not least, a source of disagreement in the annotation is language vagueness,
polysemy and homonymy: in some cases, a language itself allows for several understandings of a sentence.
   Computational linguistics offers several methodological approaches to this variability of data annotation. One of the
solutions is unification: a gold standard is set, e.g. by majority voting or by a third judge.
   Another, more demanding way of data unification is joint annotation, in which annotators mark the data together,
discussing every single case and marking only the result of their discussion.
   In order to accept and capture the uncertainty annotators can face while marking language phenomena, some annotation
scenarios with hierarchical classifications allow the use of more general levels of the classification, without discerning
the finest classification differences in dubious cases. Another way to mark the annotators' certainty is a separate marking
of their confidence as a specific feature (e.g., (a) a discourse relation is marked as a conjunction and (b) the annotator
was absolutely sure about their solution). It is necessary to say that an annotator's high certainty does not necessarily
mean that their solution
is the only possible one; in some cases, another annotator can be equally convinced of a different reading.
   Unification is not the only way to handle the data. Some researchers argue that unification may result in biased data
that miss important information about the variability of language understanding [1]. Consequently, biased language models
are developed on the basis of these data. Therefore, some approaches allow annotators to mark multiple descriptions of the
same phenomenon (e.g., in the Penn Discourse Treebank 3.0 [2], a single discourse relation can be marked as an instantiation
and a cause at the same time, if the annotator understands it in this way). Other annotation projects publish their data
with partial or complete multiple annotations carried out by different annotators; in such data, personal solutions of
similar language phenomena can be observed systematically (cf. the Czech RST Discourse Treebank [3]).

2. Aim of the study

In our research, we approach annotation variation from a different perspective, a linguistic and psycholinguistic one,
focusing on human language understanding. We use data with variations as a source of phenomena that are regularly understood
in different ways, and we search for possible common features of the different readings. We pay special attention to cues
that are inherent to a language, rather than to the diversity among the humans receiving the texts.
   Questions of human language understanding have been addressed on a theoretical level, e.g. in psycholinguistics or in
lexical and syntactic semantics. In our study, we want to make use of our long-term practical experience with large amounts
of language data and possibly offer some new insights into the variation of language interpretation, or contribute to
theoretical discussions with practical findings.

3. Data: Text Coherence Annotation

Multiple readings may arise at many language levels and from many perspectives, such as lexical semantics (cf. the polysemy
of the word bank as an institution and as a river bank), morphology (homonymous singular and plural forms, like sheep or
fish), syntax (having an old friend for dinner) etc. Our research is restricted to the area of text coherence in general.
Specifically, our data cover multiple annotations of the following phenomena: discourse relations, coreference, and
information structure (3.1–3.3).

 Phenomenon            Source                                 Language     Amount of multiple annotations             Reference
 Discourse relations   Prague Dependency Treebank 2.0         Czech        44 documents, 2084 sentences;              [4], [5]
                                                                           2 annotators
                       Enriched Discourse Annotation of       Czech        12 documents, 233 sentences;               [6]
                       Prague Discourse Treebank Subset 1.0                2 annotators
                       Unpublished parallel multilingual      English,     3 documents, 234 sentences, 4720 words     [7]
                       annotation of discourse connectives    Czech,       in the original English; 1-2 annotators
                       in TED talks in five languages         French,      for each language
                                                              Hungarian,
                                                              Lithuanian
                       Czech RST Discourse Treebank 1.0       Czech        5 documents, 63 sentences, 2 annotators    [8]
 Coreference           Prague Dependency Treebank 2.0         Czech        2 annotators, the number of texts and      [9]
                                                                           sentences is not presented
                       Prague Dependency Treebank 3.0         Czech        5 documents, 180 sentences,                [10]
                                                                           2-3 annotators
 Information structure Prague Dependency Treebank 2.0         Czech        879 sentences annotated by 6 annotators,   [11]
                       Control data annotated independently                9825 sentences annotated by 3 annotators
                       from the PDT annotation scenario

Table 1: Multiple annotations of text coherence (data overview)

3.1. Discourse relations

Discourse relations connect so-called discourse arguments (clauses, sentences or larger text segments) and express a certain
semantic relation between the arguments. They are prototypically expressed by discourse connectives (conjunctions,
subjunctions, discourse adverbs etc.), but they may also be formally unexpressed. The former type of relations is called
explicit discourse relations; the latter relations are implicit.
   <Arg1: She enjoyed working in the office> <Arg2: because REASON she had pretty flowers there.>
   We work with data from the following discourse corpora:

(a) Prague Dependency Treebank 2.0 [12] and 3.0 [13]. The annotation scenario of the Prague Dependency Treebank was
motivated by the approach of the Penn Discourse Treebank ([14], following the Lexical Tree-Adjoining Grammar [15]) and is
based on the Functional Generative Description [16] as applied in the family of Prague Dependency Treebanks. It discerns 23
semantic types of discourse relations, such as conjunction, disjunction, concession, generalization etc.; the discourse
connectives are marked explicitly. The annotation is carried out on so-called tectogrammatic (syntactico-semantic)
dependency trees, which allows the discourse annotation to be related to the syntactico-semantic level of a language. The
data in the corpus are in Czech.

(b) Enriched Discourse Annotation of Prague Discourse Treebank Subset 1.0 (PDiT-EDA 1.0, [17]). The annotation scenario
follows the approach of the Prague Dependency Treebank; the annotation is enriched with the marking of implicit discourse
relations.

(c) Data comparing underspecification of discourse connectives in five languages (English, French, Czech, Hungarian,
Lithuanian) as published in [7]. The annotation scenario is based on Crible's classification of discourse relations [7],
which discerns 15 discourse relations (e.g., opening, addition, topic-shift). Unlike the Praguian discourse approach,
Crible's classification takes into account broader pragmatic aspects of discourse (so-called domains), explicitly discerning
the ideational, rhetorical, sequential, and interpersonal domains in which the discourse relations are used.

(d) Czech RST Discourse Treebank 1.0 [3]. The annotation scenario is based on the Rhetorical Structure Theory (RST) as
applied in the Potsdam Commentary Corpus [18]. This theory assumes that a text as a whole is built from smaller segments
which are all interconnected by discourse relations, without any part being left aside. It discerns 37 discourse relations
(e.g., concession, concession as nucleus, textual preparation). A specific feature of RST is that it puts emphasis on
different levels of communicative importance of discourse arguments, marking more important and less important parts
(nucleus and satellite, respectively) in every discourse relation. Relations with balanced importance of both parts are
described as multinuclear.

3.2. Coreference

Coreferential relations connect expressions with the same reference, such as The girl looked into her map, she looked like
she was enjoying the adventure. Madelein had a great sense of orientation. The arguments of coreferential relations are
prototypically noun phrases (nouns, pronouns), including dropped phrases (While [she] walking through the landscape, she
admired the nature's beauty.). A coreferential relation may also hold between a larger text segment, such as a whole thought
or paragraph, and a summarizing pronoun it / this etc.
   We use coreference data including disagreement in the annotation coming from the Prague Dependency Treebank 2.0 [12] and
3.0 [13], where coreference is a part of a multi-level annotation including discourse and syntactic semantics (see above).

3.3. Information Structure

Information structure of a sentence expresses the communicative importance of the individual parts of a sentence in a given
context. In general, it captures the topic (what the sentence is about) and the focus of a sentence (what new information is
said about the topic), cf. (context: There is a cat under the tree.) It TOPIC is ready for a jump FOCUS.
   Our data about information structure come from an experiment carried out on the data of the Prague Dependency Treebank
2.0 [12], where information structure is marked on dependency trees¹ on the tectogrammatic (syntactico-semantic) level.

¹ According to the Functional Generative Description [16], a tectogrammatic tree consists of nodes which prototypically
correspond to autosemantic words; the nodes are connected by edges expressing syntactico-semantic relations (e.g., Actor,
Patient, Addressee). As for the information structure, each node is ascribed a value of contextual boundness (contextually
bound, contextually non-bound, contrastively contextually bound). The nodes are ordered from left to right according to
their so-called communicative dynamism, i.e. the measure to which they contribute to the development of information flow in
the sentence. The values of topic and focus can be derived from these two features (contextual boundness and communicative
dynamism).
4. Methodology

In the present study, we search for general language features of sentences (words, contexts) that allow for variable
readings of text structure. For this purpose, we collect occurrences of inter-annotator disagreement in the language corpora
(see Table 1) and classify them manually, putting aside occurrences of disagreement that obviously result from other types
of reasons (an annotator's mistake, technical solutions of the applied theory). We concentrate on the semantic and
grammatical features of the examined sentences and expressions.²
   The results are compared with and supplemented by a meta-analysis of reports on the annotation of individual corpora;
unfortunately, due to space limitations, the annotation reports often describe the reasons for inter-annotator disagreement
only very briefly.

4.1. Measuring inter-annotator disagreement on a text structure

On the most general level, measuring inter-annotator agreement on textual phenomena concerns two criteria:

(a) How often all the annotators found a certain phenomenon (e.g., a discourse relation). For example, one annotator may
ignore a case which should be marked, whereas the other one does not. This would be a case of disagreement on the existence
of the phenomenon. (Dis)agreement on existence is usually measured with the F1 measure (the harmonic mean of precision and
recall).
(b) Within the cases where all the annotators agree on the existence of a certain phenomenon, it is measured how often the
annotators agree on the classification of the phenomenon found. If one annotator assigns a discourse relation the semantic
type conjunction, whereas the other one sees it as gradation, it is a case of disagreement on the type of the phenomenon.
(Dis)agreement on type is prototypically evaluated as a simple percentage match or with Cohen's kappa.

   Both types of disagreement are relevant to our research: we are looking for linguistic features that can cause one
annotator not to recognize a certain type of contiguity while another does. We are equally interested in the linguistic
reasons why annotators ascribe different meanings to one coherence relation.³

5. Analysis

In our data, which include the annotation of discourse relations, coreference, and information structure, we have
identified seven areas (factors) that repeatedly influence different readings of textual coherence by annotators.

5.1. Synsemantic signals of coherence relations: polysemy

Some words function primarily in the text as explicit markers of coherence relations (discourse connectives for discourse
relations, some pronouns for anaphoric relations). However, these words are often polysemous (polyfunctional) as lexical
units: they can also be used in other, coherence-unrelated roles in the text. For example, conjunctions can have a
connecting function in discourse relations, but they can also become particles and function as communication expressions
without a connecting function (cf. Czech Já peníze nemám, ale CONJUNCTION můj bratr je má. I have no money, but CONJUNCTION
my brother has. vs Ale PARTICLE prosím vás! Co to říkáte? But PARTICLE please! What are you saying?).
   Similarly, in coreferential relations, e.g. the word it can perform a pronominal function and be part of a coreferential
chain (She played great. I really liked it.), but it can also function as a grammatical word without any reference (The
weather is fine. It is not raining anymore.). The presence of such synsemantic expressions in the text does not clearly
signal the presence of a coherence relation; thus, recipients may disagree about the existence of a relation depending on
their readings of the function of the polysemous word, as in the discourse annotation example 1:

(1)    Annotation 1: explicit discourse relation expressed by a discourse connective přece (because)
       <Arg1: Neptejte se mě, proč jsem přijel do Prahy.>
       <Arg2: Je to přece EXPLICATION normální sem přijet.>

² This method has its limitations: it may be questionable how far we interpret the real reasons for inter-annotator
disagreement correctly. What we see as a variation based on a language feature could have been seen by the annotator simply
as a clear oversight of his own. We do not have the annotators' explanations for their solutions. These questions are being
addressed in ongoing research by Anna Nedoluzhko; for the time being, we find this method appropriate for the present
analysis as a pilot study.

³ General information on measuring inter-annotator agreement can be found in [19]. Many annotation projects adapt their
measurement methods to suit the phenomena under investigation more precisely. E.g. in the case of discourse relations,
agreement on existence can be understood strictly, as the case where both annotators agree on the exact scope of both
discourse arguments and assign them to a certain discourse connective. In a looser approach, which respects that the exact
localization of arguments can be difficult in some cases, the mere matching of a discourse connective can be considered an
agreement on existence. In this case, it does not matter which words exactly the annotators mark as parts of the individual
discourse arguments [9].
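
The two measures named in Section 4.1 (F1 for agreement on existence, Cohen's kappa for agreement on type) can be
illustrated with a minimal Python sketch. It is not the evaluation code of any corpus cited here: the function names, the
relation identifiers and the toy labels are hypothetical, and annotator A is arbitrarily treated as the reference when
precision and recall are computed. Identifying a relation by its connective alone would correspond to the looser matching
described in footnote 3.

```python
from collections import Counter


def f1_existence(marked_by_a, marked_by_b):
    """F1 over the items each annotator marked; annotator A is taken as the reference."""
    a, b = set(marked_by_a), set(marked_by_b)
    shared = len(a & b)
    if shared == 0:
        return 0.0
    precision = shared / len(b)  # share of B's items that A also marked
    recall = shared / len(a)     # share of A's items that B also marked
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's own label distribution.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: relation id -> semantic type assigned by each annotator.
relations_a = {1: "conjunction", 2: "conjunction", 3: "reason", 4: "concession"}
relations_b = {2: "conjunction", 3: "gradation", 4: "concession", 5: "reason"}

# (a) Agreement on existence: which relations did each annotator mark at all?
existence_f1 = f1_existence(relations_a, relations_b)

# (b) Agreement on type, computed only over relations that both annotators marked.
shared_ids = sorted(relations_a.keys() & relations_b.keys())
type_kappa = cohen_kappa([relations_a[i] for i in shared_ids],
                         [relations_b[i] for i in shared_ids])

print(f"F1 (existence): {existence_f1:.2f}")
print(f"kappa (type):   {type_kappa:.2f}")
```

Restricting kappa to the shared relations mirrors criterion (b), which is defined only over cases where the annotators
already agree on existence.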
       Don’t ask me why I came. Because EXPLICATION it’s normal to come here.

       Annotation 2: no explicit discourse relation, the word přece (after all) expresses the stance of the speaker
       Neptejte se mě, proč jsem přijel do Prahy. Je to přece normální sem přijet.
       Don’t ask me why I came. After all, it’s normal to come here.
       (according to [6, p. 63]; multiple annotation of the PDiT-EDA 1.0 [17])

5.2. Synsemantic signals of coherence relations: underspecification

Other cases of disagreement are based on the semantic underspecification of words signaling coherence relations: in these
cases, the annotators agree on the existence of a certain relation, but they disagree on the assessment of its meaning
(disagreement on type). This disagreement is typical for discourse relations signaled by discourse connectives with a vague
meaning, cf. (2):

(2)    <Arg1: Za nabídku by se nemusel stydět ani Don Carleone – nebylo možné jí odolat.>
       <Arg2: A tak CONJUNCTION / RESULT do roka a do dne dostalo práci 440 shanonských občanů a do pěti let jich bylo už
       desetkrát tolik.>

       <Arg1: Not even Don Carleone would have to be ashamed of that offer – it was impossible to resist.>
       <Arg2: And so CONJUNCTION / RESULT 440 people of Shannon got a job within a year and a day, and within five years,
       they were already ten times as many.>
       ([4]; multiple annotation of the PDT 2.0, [12])

Different understandings of underspecified discourse conjunctions are also evident in the dataset reported in [7], which
contains the original English subtitles of TED talks and their equivalents in four languages. In the following document, the
original English conjunction but (underspecified discourse connective with contrastive meaning) is translated using the
Czech a (and, underspecified discourse connective with a simple conjunctive meaning).

       za tím jen okouzlující charakter, neobyčejný konverzační um či ostře nabroušené tužky.
       (Dataset of the research reported in [7])

The interchangeability of these words in the given contexts raises certain theoretical questions: for example, what level of
text coherence is necessary for the recipient? In the examples given, it seems sufficient to signal that the two arguments
are connected by a discourse relation. Which meaning type is specifically involved seems to be irrelevant.
   Both examples, (2) and (3), lead at the same time to another question, namely the nature of the semantic types of
discourse relations. In the annotations, we differentiate the individual types very precisely; but in fact, contrastivity,
like causality, can be scalar and gradual and can be located on the same axis as conjunction, and different recipients may
merely perceive different degrees of contrastivity or causality. This property of discourse semantic types can be verified
using psycholinguistic experiments.

5.3. Autosemantic words in coherence relations: genericity and abstractness

Based on the analysis of the data, we make the assumption that autosemantic words with a concrete, non-abstract meaning (cf.
concrete to bake versus abstract to do) and expressions with a specific, not generic reference (the boy vs. the youth as
such) are generally more accessible and representable for the recipients. In this context, we observe that words with an
abstract meaning or with a generic reference can complicate the understanding of the text coherence structure: in sentences
with these expressions, inter-annotator disagreement occurs more often.
   Regarding coreferential relations, Nedoluzhko [10, p. 221] states that "The more nouns with abstract meaning and
expressions with generic reference in the text, the smaller the agreement." It is often difficult to estimate, for example,
whether the concepts of two abstract expressions fully overlap (and are therefore fully coreferential), or one is a part of
the other, or they are independent, cf. (4).
                                                                        (4)     (context: interview with child psychiatrists who
(3)     English original:                                                       published the Czech book Children, Family and
        Today I want to talk to you about the mathematics                       Stress)
        of love. Now, I think that we can all agree that math-                 - Materiálům, které dnes máte k dispozici, předcházel
        ematicians are famously excellent at finding love.                      dlouholetý výzkum.
        But it’s not just because of our dashing personalities,                - Zdeněk Dytrych: Od roku 1969, kdy jsme založili v
        superior conversational skills and excellent pencil                     bývalém Výzkumném ústavu psychiatrickém Oddě-
        cases.                                                                  lení pro výzkum rodiny, se hlavně zabýváme touto
                                                                                problematikou.
        Czech translation:
                                                                                Měli jsme samozřejmě řadu spolupracovníků a za
        Dnes vám chci povědět něco o matematice lásky.
                                                                                pětadvacet let jsme v týmu udělali téměř nekonečnou
        Myslím, že se shodneme na tom, že matematici jsou
                                                                                řadu prací.
        v oblasti lásky proslulí svými schopnostmi. A nestojí
                                                                                Tak například rozsáhlý výzkum rozvodovosti.
       - The materials you have at your disposal today were preceded by a long-term
       research.
       - Zdeněk Dytrych: Since 1969, when we founded the Department for Family Research in
       the former Research Institute of Psychiatry, we have mainly been dealing with this
       issue.
       Of course, we had a number of collaborators, and in twenty-five years we have done
       an almost endless amount of work as a team. [lit.: endless amount of works (plural)
       which can mean publications as well, ŠZ]
       For example, extensive research on the divorce rate.
       ([10, p. 223–226]; multiple annotation of the PDT 3.0 [13])

In example (4), the question is how the last sentence is related to the previous text –
what is the research on the divorce rate supposed to serve as an example of? One annotator
sees the phrase research on the divorce rate as an example of a series (amount) of works in
the previous sentence, while the other one sees it as an example of the long-term research
in the first sentence. Is a series (amount) of works (publications?) the same as research?
Or are the works (publications) only the result of research, i.e. one part of it? Similar
contradictions are quite common in the understanding of the coreference of generic and
abstract terms.

Also in the annotation of discourse relations, words with an abstract, non-specific meaning
lead to inter-annotator disagreement [5]. This is the case of sentences including verbs
with an abstract, general meaning. As the authors say, “The disagreement occurs when it is
not clear whether the potential discourse connective refers to the whole sentence as an
independent abstract object (discourse argument), or just to its complement, typically a
nominal phrase.” [5, p. 2003]. Thus, in example (5), the disagreement between annotators
shows that it is questionable whether the second part of the sentence (while chimneys...)
is related to the whole previous clause including the verbs with abstract meaning (it is
possible to note a small, but distinctive difference between...), or just to the nominal
phrase (a small, but distinctive difference between...).⁴

(5)    <Arg1: Při prohlídce střech Šternberského paláce si lze všimnout drobného, avšak
       charakteristického rozdílu mezi přístupem památkářů koncem 80. let a nyní: COLON >
       SPECIFICATION <Arg2: zatímco komíny staré sněmovny byly zbourány jako zbytečné a
       zůstala jen holá střecha, dělníci KDM mají přikázáno komíny všech čtyř objektů nejen
       ponechat, ale dokonce mírně přizdobit, aby tradiční kolorit malostranských střech
       časem nezmizel.>

       <Arg1: When observing the roofs of the Sternberg Palace it is possible to note a
       small, but distinctive difference between the approaches of preservationists of late
       80’s and now: COLON > SPECIFICATION <Arg2: while chimneys of the old Parliament were
       demolished as functionless and only a clear roof was retained, the KDM workers are
       ordered not only to maintain chimneys of all the four objects, but even to decorate
       them slightly, so that the traditional local atmosphere of Lesser Town roofs does
       not eventually disappear.>
       ([5, p. 2004]; multiple annotation of the PDT 2.0 [12])

⁴ According to the approach of the Prague Dependency Treebank 2.0, a colon is understood as
an explicit discourse connective ([20]).

In fact, this is a disagreement on which level the given phenomenon should be captured (in
this case, coreference or discourse). How to annotate these cases consistently is a rather
academic question. As for the recipients themselves, the difference in the annotation does
not mean a difference in the understanding of the text, as the language levels and
perspectives are inter-related and the annotators can ascribe single phenomena to different
levels without understanding the text coherence in a different way.

5.4. Attribution: verbs of thinking and saying

Attribution is the relation between the (named) author of a section of text and his speech.
A typical component in the attribution construction is the author’s name, the verb of
thinking or speaking or another form expressing speech (colon, phrases such as according
to) and the direct / indirect speech itself (dictum). Language has means of distinguishing
the author’s speech from reported speech. Nevertheless, with attributive constructions it
is often difficult to distinguish how far discourse relations extend and what the scope of
their arguments is, especially when it comes to verbs of thinking and saying. In these
cases, annotators often disagree in their interpretations, cf. examples (6) and (7).

(6)    Annotation 1: the discourse connective ale (but) relates the second sentence to the
       whole previous sentence including the verb of thinking phrase vím, že (I know that).
       “<Arg1: Vím, že se nás Rusů bojíte, že nás nemáte rádi, že námi trochu pohrdáte.>
       <Arg2: Ale OPPOSITION Rusko není jenom Žirinovskij, Rusko není jenom vraždění v
       Čečensku.>”
       “<Arg1: I know that you are afraid of us Russians, that you dislike us, that you
       despise us a little.> <Arg2: But OPPOSITION Russia is not only Zhirinovsky, Russia
       is not only murdering in Chechnya.>”

       Annotation 2: the discourse connective ale (but)
       relates the second sentence to the content of the thought only, without the
       governing verb of thinking.
       “Vím, že <Arg1: se nás Rusů bojíte, že nás nemáte rádi, že námi trochu pohrdáte.>
       <Arg2: Ale OPPOSITION Rusko není jenom Žirinovskij, Rusko není jenom vraždění v
       Čečensku.>”
       “I know that <Arg1: you are afraid of us Russians, that you dislike us, that you
       despise us a little.> <Arg2: But OPPOSITION Russia is not only Zhirinovsky, Russia
       is not only murdering in Chechnya.>”
       ([9, p. 777]; multiple annotation of the PDT 2.0 [12])

(7)    Annotation 1: the discourse connective tudíž (therefore) relates the second sentence
       to the whole previous sentence including the governing verb of saying phrase trvají
       památkáři (preservationists insist); the relation of reason is broader.
       <Arg1: Na tom, aby ve Šternberku ani v paláci Smiřických nevznikaly žádné příčky,
       trvají památkáři.> <Arg2: Poslancům tudíž REASON nebude dopřáno žádné velké
       soukromí.>
       <Arg1: Preservationists insist that no partition walls will be built up neither in
       the Sternberg Palace nor in the Smiřický Palace.> <Arg2: Therefore, REASON MP’s will
       not enjoy great privacy.>

       Annotation 2: the discourse connective tudíž (therefore) relates the second sentence
       to the content of the saying only (dictum), the Arg1 is smaller; the meaning of the
       whole causal relation is different.
       <Arg1: Na tom, aby ve Šternberku ani v paláci Smiřických nevznikaly žádné příčky,>
       trvají památkáři. <Arg2: Poslancům tudíž REASON nebude dopřáno žádné velké
       soukromí.>
       ([5, p. 2005]; multiple annotation of the PDT 2.0 [12])

In general, attribution is one of the ways of text arrangement, in addition to e.g.
parentheses, meta-comments on the communication, etc. All of these ways represent a
digression from the baseline of a simple main narrative with a single narrator. As such,
they can be a source of different interpretations of the text: people can differ in what
they regard as author’s speech and what as reported speech, what as part of the main line
and what as a parenthesis, etc. (see subsection 5.6 below).

5.5. Word order

So far, we have observed cases of disagreement between annotators, which result from the
lexical properties of expressions ensuring coherence (underspecification vs. specificity,
abstractness vs. concreteness) and from the syntactic structure (governing verb of
saying/thinking vs. dictum itself). Word order is another area that plays an important role
in ensuring the coherence of the text and can also become subject to different
interpretations.

In Czech, as in other Slavic languages, the word order is relatively free, with few
grammatical restrictions. It is used to express the information structure of a sentence:
the information belonging to the topic is prototypically placed in the sentence to the
left, the focus is usually located to the right. However, it is also possible to use a
marked word order, when the topic and focus occupy various places in the sentence and are
distinguished by intonation, the use of focalizers, or deduced from the context. This
freedom in the formal expression of information structure results in some cases in
inter-annotator disagreement. Often, annotators interpret the information structure of the
left part of a sentence differently: some tend to consider it less important, regardless of
the expressions used, because it is prototypically a topic position; others are more driven
by context and other indicators of possible focus.

This variability applies especially to adverbials located before the verb in the surface
word order, focalized phrases and predicate verbs in the left part of the sentence [11].
Example (8) presents an ambiguous interpretation of the conditional phrase at the beginning
of the sentence; one of the annotators considers it to be a part of the very message of the
sentence, the other as a mere unimportant circumstance. Thus, both perceive the given
sentence as a response to a different (unspoken) context, as shown by the contextual
questions at the end of each interpretation. (The expressions in topic are underlined; the
focus is marked with bold characters.)

(8)    (Context: Po ekonomech, kteří nyní už opouštějí školu se znalostí pravidel hry v
       tržním prostředí, je hlad. Co hodláte udělat, aby jich bylo dost?
       The economists are now requested who leave the school with a knowledge of the life
       in the market environment. How do you intend to provide a sufficient number of
       them?)

       Annotation 1:
       [Při využití všech výukových prostor od rána až do večera] 0-subject jsme schopni
       ročně přijmout ke studiu okolo 2500 studentů.
       Lit.: [When using all classrooms from morning till evening] we_are able a_year
       to_accept to_studies about 2500 students.
       [When using all our classrooms during the whole day], we are able to accept about
       2500 new students a year.
       (How is your present-day situation?)

       Annotation 2:
       [Při využití všech výukových prostor od rána až do večera] jsme schopni ročně
       přijmout ke studiu okolo 2500 studentů.
       (How will your situation be if you take full advantage of your present-day
       capacities?)
       ([11]; control multiple annotation of the PDT 2.0, [12])

In example (9), there is a collision between two indicators of importance (belonging to the
topic / focus): the observed phrase is located at the beginning of the sentence, a place
typical for the topic; but at the same time it is emphasized by the focalizer. Annotators
perceive its role in the information structure of the sentence differently.

(9)    (Context: Oskar... Firmě Ilja Běhal a spol., zajišťující umělecko-kovářské a
       restaurátorské práce hlavně na střední Moravě.
       The Oscar prize... for the firm Ilja Běhal & Co. which deals with smith craft and
       conservatory works mainly in central Moravia.)

       Annotation 1:
       [Zejména FOCALIZER v Olomouci] firma svými výrobky přispívá ke zvýraznění koloritu
       historického jádra města.
       Lit.: [Especially FOCALIZER in Olomouc] firm with_its products helps accentuation
       of_colouring of_historical centre of_city.
       [Especially in Olomouc], the firm helps to accentuate the colouring of the
       historical centre of the city with its products.
       (What does the firm do? What can we say about the firm?)

       Annotation 2:
       [Zejména v Olomouci] firma svými výrobky přispívá ke zvýraznění koloritu
       historického jádra města.
       (What does the firm do especially in Olomouc?)
       ([11]; control multiple annotation of the PDT 2.0 [12])

In example (10), a striking feature of verbs can be seen: expressions dependent on the
verbs often tend to be communicatively more important than the verbs themselves. This can
make the role of predicate verbs in the information structure unclear: annotators do not
agree whether to classify them as focus or as topic. We have already observed the unclear
importance of verbs with respect to dependent parts in examples (5, unclear role of a verb
with general meaning in a discourse structure) and (6–7, unclear role of a verb of
thinking/saying in a discourse structure, compared to the clear role of dictum).

(10)   (Context:
       - Nářky lidí známe ze svého nejbližšího okolí. Jejich frekvence spíš vzrůstá, než
       aby se tenčila. Proč?
       - We know these complaints from our nearest vicinity. Their frequency is getting
       rather higher than lower. Why?)

       Annotation 1:
       Nejvíc [kritizují a rozčilují se] neschopní.
       Lit.: Most [criticize and get_angry] incompetent.
       Incompetent employees criticize and get angry most of all.
       (What happens?)

       Annotation 2:
       Nejvíc [kritizují a rozčilují se] neschopní.
       (Who criticizes and gets angry most of all?)
       ([11]; control multiple annotation of the PDT 2.0 [12])

5.6. Core of the message: subjective perception of the relative importance

At this point, we allow ourselves a small digression inspired by the information structure.
In many kinds of coherence annotations, we see that annotators differ in what they consider
to be important, central, at a given place in the text.

As the previous subsection showed, the variety of understanding of coherence relations
often comes from certain linguistic forms (specific word order pattern, etc.). However, the
language itself often does not provide a clue: we cannot tell which phrase or syntactic
construction was vague enough to allow for multiple readings. The diversity here comes from
the different experience of the recipients, from their expectations and knowledge of the
world. This type of inter-annotator disagreement is difficult for linguistics to grasp.
Nevertheless, since we can document it well in our data, we take the liberty of presenting
a few of these phenomena here, which can serve as inspiration for e.g. psycholinguistic
research.

At the local level, subjectivity can be seen in the perception of importance in the
information structure (cf. [21]), i.e. what people see as a topic / focus of a sentence.
Furthermore, this variation is found in discourse relations in Rhetorical Structure Theory,
which differentiates between a more substantial and a less substantial argument of a
discourse relation (nucleus and satellite, respectively; cf. [8]). See the following
example (11) where adjacent sentences have the same syntactic structure connected by the
phrase not only – but also. One of the annotators considers both parts of these sentences
to have the same level of importance and marks a multinuclear relation of contrast between
them. The other one understands the second parts (starting with but also) as emphasized,
more important, thus marking the relation as antithesis with the nucleus in the second
part.

(11)   <Arg1: Jan Kotík nemaluje jen očima a rukou,> CONTRAST / ANTITHESIS <Arg2: ale také
       mozkem.>
       <Arg1: Jeho obrazy tedy vyžadují nejen citlivost a vnímavost,> CONTRAST / ANTITHESIS
       <Arg2: ale také přemýšlení.>
       <Arg1: Jan Kotík paints not only with his eyes and
       hands,> CONTRAST / ANTITHESIS <Arg2: but also with his brain.>
       <Arg1: Therefore his paintings require not only sensitivity and receptivity,>
       CONTRAST / ANTITHESIS <Arg2: but also thinking.>
       (Czech RST Discourse Treebank 1.0 [3])

At the global level, in the annotations according to Rhetorical Structure Theory, the
perceived importance of individual parts of news reports differs, too. Typically, while one
annotator understands the introductory part as a central message to which details are added
in the following text, the other perceives the same part as a preparation to which the
actual message is attached afterwards ([8]).

5.7. Text dimensions

Inter-annotator agreement can also be affected by text dimensions. As coreference research
shows, the larger the network of possible antecedents for a given word in a text, the
greater the disagreement between annotators ([10, p. 221]; cf. the opportunities for
disagreement in example (4)). The author further states that divergent interpretations of
coreference can also be chained: if annotators differ in the interpretation of expressions
at the beginnings of the coreference chain, their different interpretations can be
reflected in other expressions with a similar meaning in the text.

It remains an open question how the size of the text affects the variability of
understanding in other coherence relations, such as discourse relations and information
structure. We have not yet conducted research in this direction. For discourse relations,
there can theoretically be more potential arguments in a large text that are connected by a
discourse connective. If the text is longer, it will probably also be more layered in terms
of author’s and reported speech, metacommunication, insertions, etc., which again offers
more possibilities for different understandings of discourse and other relations. On the
other hand, a longer text can more accurately describe the context in which the discourse
relations are interpreted, and thus contribute to the clarity of understanding. In this
regard, another question arises: whether there is a difference in the variability in the
understanding of coherence relations at the beginning of the text (where the text is still
short, there are few potential members of different relations available, but also little
context) and in its later parts.

6. Conclusion

In this study, we observed what common features the occurrences of inter-annotator
disagreement have in coherence relations, specifically in discourse relations, coreference
and information structure. We were mainly concerned with the features given by the language
itself; we only briefly touched on cases of disagreement that result from differences
between speakers. We have also formulated some questions that can be the subject of further
research.

Coherence relations can be divided into formally expressed (e.g. in the discourse structure
relations expressed by an explicit discourse connective or an information structure
expressed by word order) and unexpressed relations that are understood from the context
(e.g. coreference relation between the words text and chapter in a specific text).

In formally unexpressed relations, disagreement occurs naturally: it depends on the
recipients what they infer from the context. Formally expressed relations can also be
interpreted differently. There may be disagreement on the very existence of a coherence
relation; this disagreement is usually based on the polysemy (polyfunctionality) of the
linguistic form (expression), which in some contexts functions as a signal of coherence,
but not in others. In addition, coherence signals can also lead to a different perception
of the semantic type of a discourse relation (in cases where speakers agree on its
existence): this is caused by the semantic underspecification of language forms that
express coherence (discourse connectives). The general question arises whether, as
recipients, we need to understand textual coherence in detail in all contexts, i.e.
distinguish not only the simple existence of coherence relations, but also their semantic
coloring. What level actually represents a functional and sufficient understanding of the
text?

Lexical specificity plays an important role in the understanding of autosemantic words,
too; these expressions do not function primarily as signals of coherence. Coreference
research shows that for abstract and generic nominal phrases in a text, recipients
determine with difficulty whether the words have the same content; in contrast, for words
with a concrete, specific meaning, coreference is easier to determine. The same applies to
the semantic concreteness of verbs: for verbs with more vague, general meanings, it is
difficult for annotators to determine whether or not they are part of discourse arguments.
Their meaning seems to be too insignificant, whereas the content of their dependent words
is more important.

This observation also applies to the verbs of thinking and saying in the relation of
attribution, where the content of reported speech seems to be communicatively more
essential than the act of communication itself. In the case of attribution, there is
another reason for the diverse interpretation of the text: it represents one of the forms
of text arrangement (alongside parentheses, meta-comments on the communication, etc.), i.e.
a complication in the simple basic line of the narrative. It thus provides the possibility
for different recipients to interpret the overall structure of the text differently.

In addition to individual words, such as various coherence operators or autosemantic expressions, word order can also cause disagreement in text understanding. Specifically, in Czech and other Slavic languages, word order affects the understanding of the information structure. If expressions with higher communicative dynamism (informativeness) appear in the left, topical part of the sentence, which has a prototypically low communicative dynamism, typical contradictions in their evaluation occur.

In many types of annotation, it turns out that annotators perceive the importance of individual parts of the text and their (hierarchical) connections differently. These disagreements are often caused not so much by special properties of the text as by differences between the annotators (specifically, knowledge of the language, knowledge of the world, expectations, experience with different text genres, etc.). This area seems particularly suitable for future psycholinguistic research focusing on specific domains of coherence. Here, for example, it is possible to examine the influence of respondents’ literacy on the understanding of coreference in abstract words, or the process by which children learn text arrangement.

The last factor we dealt with is text dimensions. Their effect on different readings was described in coreference (the longer the text, the greater the disagreement in interpretation). For other coherence relations, this factor is still unexplored. We hypothesized that for discourse relations and information structure, text dimensions could influence the degree of disagreement in both directions; the degree of disagreement may also vary by place in the text and amount of preceding context (early vs. later in the text). These ideas suggest possible directions for further research on differences in the comprehension of text coherence.

Acknowledgments

The research reported in this paper was supported by the Czech Science Foundation (project no. 24-11132S, Disagreement in Corpus Annotation and Variation in Human Understanding of Text); a part of the used data comes from the project no. LM2018101 by the Czech Ministry of Education, Youth and Sports (Digital Research Infrastructure for Language Technologies, Arts and Humanities).

The author would like to express her gratitude to Prof. E. Hajičová for careful proofreading of the manuscript, Dr. J. Mírovský for help with the technical processing of the text and F. Zikánová for the language examples. Thank you all for the pleasant cooperation.