<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Structure and Its Ambiguities: Corpus Annotation as a Helpful Guide</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Šárka Zikánová</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Faculty of Mathematics and Physics</institution>
          ,
          <addr-line>Malostranské nám. 25, 118 00 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>It is typical for natural languages that their texts can be understood differently by individual recipients. A number of scientific disciplines, from cognitive psychology to linguistics, are devoted to this phenomenon. In this study, we focus mainly on linguistic factors which may lead to different interpretations of coherence relations in the text (simply speaking, what is related to what and how). This work presents a pilot typological survey of disagreements in Czech corpus annotations of coherence relations (discourse relations, coreference, information structure) and their common features. Polysemy (polyfunctionality) and semantic underspecification of expressions signaling coherence (e.g. discourse connectives), the generic / abstract meaning of autosemantic words, the presence of attribution constructions, word order as a potential marker of information structure, and text size appear to be essential factors for disagreement in interpretation. In addition, the subjective perception of the relative importance of different text parts plays an important role. Based on the observation of the material, we raise questions and propose possible steps for ongoing research on the variability in the perception of text coherence.</p>
      </abstract>
      <kwd-group>
        <kwd>inter-annotator agreement</kwd>
        <kwd>human label variation</kwd>
        <kwd>discourse relations</kwd>
        <kwd>coreference</kwd>
        <kwd>information structure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The availability of digital language resources enables an important step forward in linguistic research, for both its theoretical and its applied orientation. The originally collected data, which served mostly for lexical studies and for the study of syntax proper, gave an impulse to enrich them with various more sophisticated annotation systems dealing with very different phenomena, going beyond the sentence boundary (including, e.g., text coherence and phenomena related to inferencing) and elaborating more levels of granularity in the annotation. The annotated data serve different tasks in the computational processing of natural languages, e.g. as training and testing data for the development of language models.</p>
      <p>Human data annotation is a process based on the interpretation of observed phenomena and thus may lead to different outcomes. This variation is caused by various factors. Some of them are connected with shortcomings of the annotation scenario (e.g., not providing instructions for the solution of some cases) or with gaps in the underlying theory (e.g., non-intuitive solutions or too fine categories that are very close to each other). Other cases of inter-annotator disagreement are connected with the learning process of annotators: especially the first annotated batches of data may be influenced by the annotators' unfamiliarity with the annotation scenario. That is the reason why these data are often re-annotated later. To prevent this kind of inconsistent analysis of the data, annotators usually attend frequent trainings; simultaneously, their feedback at the beginning of the annotation may improve the annotation scenario and point out some problematic points in the underlying theory. Before releasing data, annotators' mistakes (e.g., simply overlooking phenomena that should be marked) are searched for and corrected; nevertheless, some of the mistakes can remain even in the final data. Last but not least, a source of disagreement in the annotation is language vagueness, polysemy and homonymy: in some cases, a language itself allows for several understandings of a sentence.</p>
      <p>Computational linguistics offers several methodological approaches to this variability of the data annotation. One of the solutions is unification: a gold standard is set, e.g. by majority voting or by a third judge. Another, more demanding way of data unification is joint annotation, when annotators mark the data together, discuss each single case and mark only the result of their discussion. In order to accept and capture the uncertainty annotators can face while marking language phenomena, some annotation scenarios with hierarchical classifications allow the use of more general levels of the classification, not discerning the finest classification differences in dubious cases. Another way to mark the annotators' certainty is to record their confidence separately as a specific feature (e.g., (a) a discourse relation is marked as a conjunction and (b) the annotator was absolutely sure about his solution). It is necessary to say that an annotator's high certainty does not necessarily mean that his solution is the only possible one; in some cases, another annotator can be equally convinced about a different reading.</p>
      <p>Unification is not the only way to handle the data. Some researchers argue that unification may result in biased data missing important information about the variability of language understanding [<xref ref-type="bibr" rid="ref1">1</xref>]. Consequently, biased language models are developed based on these data. Therefore, some approaches allow annotators to mark multiple descriptions of the same phenomenon (e.g., in the Penn Discourse Treebank 3.0 [<xref ref-type="bibr" rid="ref2">2</xref>], a single discourse relation can be marked as an instantiation and a cause at the same time, if the annotator understands it in this way). Other annotation projects publish their data with partial or complete multiple annotations carried out by different annotators; in such data, personal solutions of similar language phenomena can be observed systematically (cf. the Czech RST Discourse Treebank [<xref ref-type="bibr" rid="ref3">3</xref>]).</p>
      <p>Conference ITAT (Information Technologies—Applications and Theory), 2024: Drienica, Čergovské vrchy, Slovakia
zikanova@ufal.mf.cuni.cz (Zikánová)
https://ufal.mf.cuni.cz/sarka-zikanova (Zikánová)
0000-0002-7805-9649 (Zikánová)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Aim of the study</title>
      <p>In our research, we deal with annotation variation from a different perspective, namely from the linguistic and psycholinguistic point of view, focusing on human language understanding. We use data with variations as a source of phenomena that are regularly understood in different ways, and we search for possible common features of different readings. We pay special attention to the cues that are inherent to a language, rather than to the diversity among humans receiving the texts.</p>
      <p>Questions of human language understanding have been addressed on a theoretical level, e.g. in psycholinguistics or in lexical and syntactic semantics. In our study, we want to make use of our long-term practical experience with large amounts of language data and possibly to offer some new insights into the variation of language interpretation, or to contribute practical findings to theoretical discussions.</p>
      <p>Multiple readings may arise at many language levels and from various perspectives, such as lexical semantics (cf. the polysemy of the word bank as an institution and as a river bank), morphology (homonymous singular and plural forms, like sheep or fish), syntax (having an old friend for dinner), etc. Our research is restricted to the area of text coherence in general. Specifically, our data cover multiple annotations of the following phenomena: discourse relations, coreference, and information structure (Sections 3.1–3.3).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data: Text Coherence Annotation</title>
      <sec id="sec-3-1">
        <title>3.1. Discourse relations</title>
        <p>Discourse relations connect so-called discourse arguments (clauses, sentences or larger text segments) and express a certain semantic relation between the arguments. They are prototypically expressed by discourse connectives (conjunctions, subjunctions, discourse adverbs etc.), but they may also be formally unexpressed. The former type of relations is called explicit discourse relations, the latter implicit discourse relations.</p>
        <p>&lt;Arg1: She enjoyed working in the office&gt; &lt;Arg2: because REASON she had pretty flowers there.&gt;</p>
        <p>In our data, we work with the data from the following discourse corpora:</p>
        <p>(a) Prague Dependency Treebank 2.0 [<xref ref-type="bibr" rid="ref12">12</xref>] and 3.0 [<xref ref-type="bibr" rid="ref13">13</xref>]. The annotation scenario of the Prague Dependency Treebank was motivated by the approach of the Penn Discourse Treebank ([<xref ref-type="bibr" rid="ref14">14</xref>], following the Lexicalized Tree-Adjoining Grammar [<xref ref-type="bibr" rid="ref15">15</xref>]) and is based on the Functional Generative Description [<xref ref-type="bibr" rid="ref16">16</xref>] as applied in the family of Prague Dependency Treebanks. It discerns 23 semantic types of discourse relations, such as conjunction, disjunction, concession, generalization etc.; the discourse connectives are marked explicitly. The annotation is carried out on so-called tectogrammatical (syntactico-semantic) dependency trees, which allows the discourse annotation to be related to the syntactico-semantic level of a language. The data in the corpus are in Czech.</p>
        <p>(b) Enriched Discourse Annotation of Prague Discourse Treebank Subset 1.0 (PDiT-EDA 1.0 [<xref ref-type="bibr" rid="ref17">17</xref>]). The annotation scenario follows the approach of the Prague Dependency Treebank; the annotation is enriched with the marking of implicit discourse relations.</p>
        <p>(c) Data comparing the underspecification of discourse connectives in five languages (English, French, Czech, Hungarian, Lithuanian) as published in [<xref ref-type="bibr" rid="ref7">7</xref>]. The annotation scenario is based on Crible's classification of discourse relations [<xref ref-type="bibr" rid="ref7">7</xref>], discerning 15 discourse relations (e.g., opening, addition, topic-shift). Unlike the Praguian discourse approach, Crible's classification takes into account broader pragmatic aspects of discourse (so-called domains), explicitly discerning the ideational, rhetorical, sequential, and interpersonal domains where the discourse relations are used.</p>
        <p>(d) Czech RST Discourse Treebank 1.0 [<xref ref-type="bibr" rid="ref3">3</xref>]. The annotation scenario is based on Rhetorical Structure Theory as applied in the Potsdam Commentary Corpus [<xref ref-type="bibr" rid="ref18">18</xref>]. This theory assumes that a text as a whole is built from smaller segments which are all interconnected by discourse relations, without any part being left aside. It discerns 37 discourse relations (e.g., concession, concession as nucleus, textual preparation). A specific feature of RST is that it puts emphasis on different levels of communicative importance of discourse arguments, marking more important and less important parts (nucleus and satellite, respectively) in every discourse relation. Relations with balanced importance of both parts are described as multinuclear.</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Amount of multiple annotations</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>44 documents, 2084 sentences; 2 annotators</td></tr>
              <tr><td>12 documents, 233 sentences; 2 annotators</td></tr>
              <tr><td>3 documents, 234 sentences, 4720 words in the original English; 1–2 annotators for each language</td></tr>
              <tr><td>5 documents, 63 sentences; 2 annotators</td></tr>
              <tr><td>2 annotators; the number of texts and sentences is not presented</td></tr>
              <tr><td>5 documents, 180 sentences; 2–3 annotators</td></tr>
              <tr><td>879 sentences annotated by 6 annotators, 9825 sentences annotated by 3 annotators</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-3-1-1">
          <title>3.2. Coreference</title>
          <p>Coreferential relations connect expressions with the same reference, such as The girl looked into her map, she looked like she was enjoying the adventure. Madelein had a great sense of orientation. The arguments of coreferential relations are prototypically noun phrases (nouns, pronouns), including dropped phrases (While [she] walking through the landscape, she admired the nature's beauty.).</p>
          <p>A coreferential relation may also hold between a larger text segment, such as a whole thought or paragraph, and a summarizing pronoun it / this etc.</p>
          <p>We use coreference data including disagreement in the annotation coming from the Prague Dependency Treebank 2.0 [<xref ref-type="bibr" rid="ref12">12</xref>] and 3.0 [<xref ref-type="bibr" rid="ref13">13</xref>], where coreference is a part of a multi-level annotation including discourse and syntactic semantics (see above).</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.3. Information Structure</title>
          <p>The information structure of a sentence expresses the communicative importance of single parts of a sentence in a given context. In general, it captures the topic (what the sentence is about) and the focus of a sentence (what new information is said about the topic), cf. (context: There is a cat under the tree.) It TOPIC is ready for a jump FOCUS.</p>
          <p>Our data about information structure come from an experiment carried out on the data of the Prague Dependency Treebank 2.0 [<xref ref-type="bibr" rid="ref12">12</xref>], where information structure is marked on dependency trees on the tectogrammatical (syntactico-semantic) level.<sup>1</sup></p>
          <p><sup>1</sup>According to the Functional Generative Description [<xref ref-type="bibr" rid="ref16">16</xref>], a tectogrammatical tree consists of nodes which prototypically correspond to autosemantic words; the nodes are connected by edges expressing syntactico-semantic relations (e.g., Actor, Patient, Addressee). As for the information structure, each node is ascribed a value of contextual boundness (contextually bound, contextually non-bound, contrastively contextually bound). The nodes are ordered from left to right according to their so-called communicative dynamism, i.e. the degree to which they contribute to the development of the information flow in the sentence. The values of topic and focus can be derived from these two features (contextual boundness and communicative dynamism).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
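      <p>As a hedged illustration of the two criteria described in Section 4.1, and of the strict versus loose matching mentioned in footnote 3, the following minimal Python sketch scores the discourse relations of two annotators. It is not the evaluation procedure of any of the cited corpora: the Relation record (token positions of the connective and of both argument spans, plus a sense label), the helper names and the sample data are all hypothetical. Existence is scored with F1, which for two annotators reduces to 2|A ∩ B| / (|A| + |B|); type is scored only on relations found by both annotators, as a percentage match and as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e).</p>
      <preformat>
```python
from collections import Counter
from typing import NamedTuple


class Relation(NamedTuple):
    """One annotated discourse relation (hypothetical record layout)."""
    connective: int  # token position of the discourse connective
    arg1: tuple      # (start, end) token span of Arg1
    arg2: tuple      # (start, end) token span of Arg2
    sense: str       # semantic type, e.g. "conjunction"


def match_key(rel, strict):
    # Strict matching: connective and the exact scope of both arguments must coincide.
    # Loose matching: the same connective alone counts as agreement on existence.
    return (rel.connective, rel.arg1, rel.arg2) if strict else (rel.connective,)


def existence_f1(ann_a, ann_b, strict=True):
    """F1 on existence; for two annotators this equals 2|A ∩ B| / (|A| + |B|)."""
    keys_a = {match_key(r, strict) for r in ann_a}
    keys_b = {match_key(r, strict) for r in ann_b}
    if not keys_a and not keys_b:
        return 1.0  # neither annotator marked anything: trivially in agreement
    common = len(keys_a.intersection(keys_b))
    return 2 * common / (len(keys_a) + len(keys_b))


def type_agreement(ann_a, ann_b, strict=False):
    """Percentage match and Cohen's kappa on the senses of relations both annotators found."""
    senses_a = {match_key(r, strict): r.sense for r in ann_a}
    senses_b = {match_key(r, strict): r.sense for r in ann_b}
    shared = sorted(set(senses_a).intersection(senses_b))
    pairs = [(senses_a[k], senses_b[k]) for k in shared]
    n = len(pairs)
    if n == 0:
        raise ValueError("no relation was found by both annotators")
    p_observed = sum(a == b for a, b in pairs) / n
    # Chance agreement estimated from each annotator's own label distribution.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    p_expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if p_expected == 1.0:
        return p_observed, float("nan")  # one identical label throughout: kappa undefined
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical sample: B differs from A in one sense, one Arg1 span, and one extra relation.
SAMPLE_A = [
    Relation(3, (0, 2), (4, 9), "conjunction"),
    Relation(12, (10, 11), (13, 20), "reason"),
    Relation(25, (21, 24), (26, 30), "opposition"),
]
SAMPLE_B = [
    Relation(3, (0, 2), (4, 9), "gradation"),
    Relation(12, (9, 11), (13, 20), "reason"),
    Relation(25, (21, 24), (26, 30), "opposition"),
    Relation(40, (31, 39), (41, 45), "result"),
]

if __name__ == "__main__":
    print(round(existence_f1(SAMPLE_A, SAMPLE_B, strict=True), 3))   # 0.571
    print(round(existence_f1(SAMPLE_A, SAMPLE_B, strict=False), 3))  # 0.857
    p_o, kappa = type_agreement(SAMPLE_A, SAMPLE_B)
    print(round(p_o, 3), round(kappa, 3))                            # 0.667 0.571
```
      </preformat>
      <p>On this hypothetical sample, strict F1 is 4/7, because the annotators differ in one Arg1 span and the second annotator marks one extra relation; loose F1 rises to 6/7, since only the extra relation still counts against existence. Of the three relations both annotators found, they differ in sense on one (conjunction vs. gradation), giving a percentage match of 2/3 and a kappa of 4/7. In this sketch, two relations sharing one connective would collapse into a single key under loose matching, which is a simplification.</p>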
      <p>In the present study, we search for general language features of sentences (words, contexts) allowing for variable readings of text structure. For this purpose, we collect occurrences of inter-annotator disagreement in the language corpora (see Table 1) and classify them manually, putting aside occurrences of disagreement that obviously result from other reasons (annotators' mistakes, technical solutions of the applied theory). We concentrate on the semantic and grammatical features of the examined sentences and expressions.<sup>2</sup></p>
      <p>The results are compared with and supplemented by a meta-analysis of reports on the annotation of single corpora; unfortunately, due to space limitations, the annotation reports often describe the reasons for inter-annotator disagreement only very briefly.</p>
      <sec id="sec-4-1">
        <title>4.1. Measuring inter-annotator disagreement on a text structure</title>
        <p>On the most general level, measuring inter-annotator agreement on textual phenomena involves two criteria:</p>
        <p>(a) How often all the annotators found a certain phenomenon (e.g., a discourse relation). For example, one annotator may ignore a case which should be marked, whereas the other one does not. This would be a case of disagreement on the existence of the phenomenon. (Dis)agreement on existence is usually measured with the F1 measure (the harmonic mean of precision and recall).</p>
        <p>(b) Within the cases where all the annotators agree on the existence of a certain phenomenon, it is measured how often they agree on the classification of the found phenomenon. If one annotator assigns a discourse relation the semantic type conjunction, whereas the other one sees it as gradation, it is a case of disagreement on the type of the phenomenon. (Dis)agreement on type is prototypically evaluated as a simple percentage match or with Cohen's kappa.</p>
        <p>Both types of disagreement are relevant to our research: we are looking for linguistic features that can cause one annotator not to recognize a certain type of connection while another does. We are equally interested in the linguistic reasons why annotators ascribe different meanings to one coherence relation.<sup>3</sup></p>
      </sec>
      <sec id="sec-4-2">
        <title>Footnote 2</title>
        <p>This method has its restrictions: it may be questionable how far we interpret the real reasons for inter-annotator disagreement correctly; what we see as a variation based on a language feature could have been seen by an annotator just as his clear oversight. We do not have the annotators' explanations for their solutions. These questions are being addressed in ongoing research by Anna Nedoluzhko; for the time being, we find this method appropriate for the present analysis as a pilot study.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Footnote 3</title>
        <p>General information on measuring inter-annotator agreement can be found in [<xref ref-type="bibr" rid="ref19">19</xref>].</p>
        <p>Many annotation projects adapt their measurement methods to suit the phenomena under investigation more precisely. E.g., in the case of discourse relations, agreement on existence can be defined strictly, as the case where both annotators assign the relation to a certain discourse connective and agree on the exact scope of both discourse arguments. A looser approach, which respects that the exact localization of arguments can be difficult in some cases, counts the mere matching of a discourse connective as agreement on existence; in this case, it does not matter which words exactly the annotators mark as parts of single discourse arguments [<xref ref-type="bibr" rid="ref9">9</xref>].</p>
      </sec>
      <sec id="sec-4-4">
        <title>5. Analysis</title>
        <p>In our data, which include the annotation of discourse relations, coreference, and information structure, we have identified seven areas (factors) that repeatedly influence the annotators' different readings of textual coherence.</p>
      </sec>
      <sec id="sec-4-5">
        <title>5.1. Synsemantic signals of coherence relations: polysemy</title>
        <p>Some words function primarily in the text as explicit markers of coherence relations (discourse connectives for discourse relations, some pronouns for anaphoric relations). However, these words are often polysemous (polyfunctional) as lexical units: they can also be used in other, coherence-unrelated roles in the text. For example, conjunctions can have a connecting function in discourse relations, but they can also become particles and function as communication expressions without any connecting function (cf. Czech Já peníze nemám, ale CONJUNCTION můj bratr je má. I have no money, but CONJUNCTION my brother has. vs. Ale PARTICLE prosím vás! Co to říkáte? But PARTICLE please! What are you saying?).</p>
        <p>Similarly, in coreferential relations, e.g. the word it can perform a pronominal function and be part of a coreferential chain (She played great. I really liked it.), but it can also function as a grammatical word without any reference (The weather is fine. It is not raining anymore.).</p>
        <p>The presence of such synsemantic expressions in the text does not clearly signal the presence of a coherence relation; thus, recipients may disagree about the existence of a relation depending on their reading of the function of the polysemous word, as in the discourse annotation example (1):</p>
        <p>(1) Annotation 1: explicit discourse relation expressed by the discourse connective přece (because)</p>
        <p>&lt;Arg1: Neptejte se mě, proč jsem přijel do Prahy.&gt; &lt;Arg2: Je to přece EXPLICATION normální sem přijet.&gt;</p>
        <p>Don't ask me why I came. Because EXPLICATION it's normal to come here.</p>
        <p>Annotation 2: no explicit discourse relation; the word přece (after all) expresses the stance of the speaker</p>
        <p>Neptejte se mě, proč jsem přijel do Prahy. Je to přece normální sem přijet.</p>
        <p>Don't ask me why I came. After all, it's normal to come here.</p>
        <p>(according to [6, p. 63]; multiple annotation of the PDiT-EDA 1.0 [<xref ref-type="bibr" rid="ref17">17</xref>])</p>
      </sec>
      <sec id="sec-4-6">
        <title>5.2. Synsemantic signals of coherence relations: underspecification</title>
        <p>Other cases of disagreement are based on the semantic underspecification of words signaling coherence relations: in these cases, the annotators agree on the existence of a certain relation, but they disagree on the assessment of its meaning (disagreement on type). This disagreement is typical for discourse relations signaled by discourse connectives with a vague meaning, cf. (2):</p>
        <p>(2) &lt;Arg1: Za nabídku by se nemusel stydět ani Don Carleone – nebylo možné jí odolat.&gt; &lt;Arg2: A tak CONJUNCTION / RESULT do roka a do dne dostalo práci 440 shanonských občanů a do pěti let jich bylo už desetkrát tolik.&gt;</p>
        <p>&lt;Arg1: Not even Don Carleone would have to be ashamed of that offer – it was impossible to resist.&gt; &lt;Arg2: And so CONJUNCTION / RESULT 440 people of Shannon got a job within a year and a day, and within five years, they were already ten times as many.&gt;</p>
        <p>([<xref ref-type="bibr" rid="ref4">4</xref>]; multiple annotation of the PDT 2.0 [<xref ref-type="bibr" rid="ref12">12</xref>])</p>
        <p>Different understandings of underspecified discourse conjunctions are also evident in the dataset reported in [<xref ref-type="bibr" rid="ref7">7</xref>], which contains the original English subtitles of TED talks and their equivalents in four languages. In the following document, the original English conjunction but (an underspecified discourse connective with contrastive meaning) is translated using the Czech a (and, an underspecified discourse connective with a simple conjunctive meaning).</p>
        <p>(3) English original:</p>
        <p>Today I want to talk to you about the mathematics of love. Now, I think that we can all agree that mathematicians are famously excellent at finding love. But it's not just because of our dashing personalities, superior conversational skills and excellent pencil cases.</p>
        <p>Czech translation:</p>
        <p>Dnes vám chci povědět něco o matematice lásky. Myslím, že se shodneme na tom, že matematici jsou v oblasti lásky proslulí svými schopnostmi. A nestojí za tím jen okouzlující charakter, neobyčejný konverzační um či ostře nabroušené tužky.</p>
        <p>(Dataset of the research reported in [<xref ref-type="bibr" rid="ref7">7</xref>])</p>
        <p>The interchangeability of these words in the given contexts raises certain theoretical questions: for example, what level of text coherence is necessary for the recipient? In the examples given, it seems sufficient to signal that the two arguments are connected by a discourse relation; which meaning type is specifically involved seems to be irrelevant.</p>
        <p>Both examples, (2) and (3), lead at the same time to another question, namely the nature of the semantic types of discourse relations. In the annotations, we differentiate the individual types very precisely; in fact, however, contrastivity, like causality, can be scalar (gradual) and can be located on the same axis as conjunction, so that different recipients may perceive different degrees of contrastivity or causality. This property of discourse semantic types could be tested in psycholinguistic experiments.</p>
      </sec>
      <sec id="sec-4-7">
        <title>5.3. Autosemantic words in coherence relations: genericity and abstractness</title>
        <p>Based on the analysis of the data, we assume that autosemantic words with a concrete, non-abstract meaning (cf. concrete to bake versus abstract to do) and expressions with a specific, not generic, reference (the boy vs. the youth as such) are generally more accessible and representable for the recipients. In this context, we observe that words with an abstract meaning or with a generic reference can complicate the understanding of the text coherence structure: in sentences with these expressions, inter-annotator disagreement occurs more often.</p>
        <p>Regarding coreferential relations, Nedoluzhko [10, p. 221] states that "The more nouns with abstract meaning and expressions with generic reference in the text, the smaller the agreement." It is often difficult to estimate, for example, whether the concepts of two abstract expressions fully overlap (and are therefore fully coreferential), whether one is a part of the other, or whether they are independent, cf. (4).</p>
        <p>(4) (context: interview with child psychiatrists who published the Czech book Children, Family and Stress)</p>
        <p>- Materiálům, které dnes máte k dispozici, předcházel dlouholetý výzkum.</p>
        <p>- Zdeněk Dytrych: Od roku 1969, kdy jsme založili v bývalém Výzkumném ústavu psychiatrickém Oddělení pro výzkum rodiny, se hlavně zabýváme touto problematikou.</p>
        <p>Měli jsme samozřejmě řadu spolupracovníků a za pětadvacet let jsme v týmu udělali téměř nekonečnou řadu prací.</p>
        <p>Tak například rozsáhlý výzkum rozvodovosti.</p>
        <p>- The materials you have at your disposal today were preceded by long-term research.</p>
        <p>- Zdeněk Dytrych: Since 1969, when we founded the Department for Family Research in the former Research Institute of Psychiatry, we have mainly been dealing with this issue.</p>
        <p>Of course, we had a number of collaborators, and in twenty-five years we have done an almost endless amount of work as a team. [lit.: endless amount of works (plural), which can mean publications as well, ŠZ]</p>
        <p>For example, extensive research on the divorce rate.</p>
        <p>([10, p. 223–226]; multiple annotation of the PDT 3.0 [<xref ref-type="bibr" rid="ref13">13</xref>])</p>
        <p>In example (4), the question is how the last sentence is related to the previous text: what is the research on the divorce rate supposed to serve as an example of? One annotator sees the phrase research on the divorce rate as an example of a series (amount) of works in the previous sentence, while the other one sees it as an example of the long-term research in the first sentence. Is a series (amount) of works (publications?) the same as research? Or are the works (publications) only the result of research, i.e. one part of it? Similar discrepancies are quite common in the understanding of the coreference of generic and abstract terms.</p>
        <p>Also in the annotation of discourse relations, words with an abstract, non-specific meaning result in inter-annotator disagreement [<xref ref-type="bibr" rid="ref5">5</xref>]. This is the case of sentences including verbs with an abstract, general meaning. As the authors say, “The disagreement occurs when it is not clear whether the potential discourse connective refers to the whole sentence as an independent abstract object (discourse argument), or just to its complement, typically a nominal phrase.” [5, p. 2003]. Thus, in example (5), the disagreement between annotators shows that it is questionable whether the second part of the sentence (while chimneys. . . ) is related to the whole previous clause including the verbs with abstract meaning (it is possible to note a small, but distinctive difference between. . . ), or just to the nominal phrase (a small, but distinctive difference between. . . ).<sup>4</sup></p>
        <p>(5) &lt;Arg1: Při prohlídce střech Šternberského paláce si lze všimnout drobného, avšak charakteristického rozdílu mezi přístupem památkářů koncem 80. let a nyní: COLON&gt; SPECIFICATION &lt;Arg2: zatímco komíny staré sněmovny byly zbourány jako zbytečné a zůstala jen holá střecha, dělníci KDM mají přikázáno komíny všech čtyř objektů nejen ponechat, ale dokonce mírně přizdobit, aby tradiční kolorit malostranských střech časem nezmizel.&gt;</p>
        <p>&lt;Arg1: When observing the roofs of the Sternberg Palace it is possible to note a small, but distinctive difference between the approaches of preservationists of the late 80's and now: COLON&gt; SPECIFICATION &lt;Arg2: while chimneys of the old Parliament were demolished as functionless and only a clear roof was retained, the KDM workers are ordered not only to maintain chimneys of all the four objects, but even to decorate them slightly, so that the traditional local atmosphere of Lesser Town roofs does not eventually disappear.&gt;</p>
        <p>([5, p. 2004]; multiple annotation of the PDT 2.0 [<xref ref-type="bibr" rid="ref12">12</xref>])</p>
        <p>In fact, this is a disagreement on the level at which the given phenomenon should be captured (in this case, coreference or discourse). It is rather an academic question how to annotate these cases consistently. As for the recipients themselves, the difference in the annotation does not mean a difference in the understanding of the text, as the language levels and perspectives are interrelated and the annotators can ascribe single phenomena to different levels without understanding the text coherence in a different way.</p>
      </sec>
      <sec id="sec-4-8">
        <title>Footnote 4</title>
        <p>According to the approach of the Prague Dependency Treebank 2.0, a colon is understood as an explicit discourse connective ([<xref ref-type="bibr" rid="ref20">20</xref>]).</p>
      </sec>
      <sec id="sec-4-9">
        <title>5.4. Attribution: verbs of thinking and saying</title>
        <p>Attribution is the relation between the (named) author of a section of text and his speech. A typical component of the attribution construction is the author's name, the verb of thinking or speaking or another form expressing speech (colon, phrases such as according to), and the direct / indirect speech itself (dictum). A language has means to distinguish the author's speech from the reported speech. Nevertheless, with attributive constructions it is often difficult to determine how far discourse relations extend and what the scope of their arguments is, especially when it comes to verbs of thinking and saying. In these cases, annotators often disagree in their interpretations, cf. examples (6) and (7).</p>
        <p>(6) Annotation 1: the discourse connective ale (but) relates the second sentence to the whole previous sentence, including the verb of thinking phrase vím, že (I know that).</p>
        <p>“&lt;Arg1: Vím, že se nás Rusů bojíte, že nás nemáte rádi, že námi trochu pohrdáte.&gt; &lt;Arg2: Ale OPPOSITION Rusko není jenom Žirinovskij, Rusko není jenom vraždění v Čečensku.&gt;”</p>
        <p>“&lt;Arg1: I know that you are afraid of us Russians, that you dislike us, that you despise us a little.&gt;
&lt;Arg2: But OPPOSITION Russia is not only Zhirinovsky, Russia is not only murdering in Chechnya.&gt;”
Annotation 2: the discourse connective ale (but)
(7)
relates the second sentence to the content of the an important role in ensuring the coherence of the text
thought only, without the governing verb of think- and can also become subject to diferent interpretations.
ing. In Czech, similarly as in other Slavic languages, the
“Vím, že &lt;Arg1: se nás Rusů bojíte, že nás nemáte word order is relatively free, with few grammatical
rerádi, že námi trochu pohrdáte.&gt; &lt;Arg2: Ale strictions. It is used to express information structure of a
OPPOSITION Rusko není jenom Žirinovskij, Rusko není sentence: the information belonging to the topic is
projenom vraždění v Čečensku.&gt;” totypically placed in the sentence to the left, the focus is
“I know that &lt;Arg1: you are afraid of us Russians, usually located to the right. However, it is also possible
that you dislike us, that you despise us a little.&gt; to use a marked word order, when the topic and focus
oc&lt;Arg2: But OPPOSITION Russia is not only Zhiri- cupy various places in the sentence and are distinguished
novsky, Russia is not only murdering in Chechnya.&gt;” by intonation, the use of focalizers, or deduced from the
([9, p. 777]; multiple annotation of the PDT 2.0 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]) context. This freedom in the formal expression of
information structure results in some cases in inter-annotator
Annotation 1: the discourse connective tudíž disagreement. Often, annotators interpret diferently
in(therefore) relates the second sentence to the whole formation structure of the left part of a sentence: some
previous sentence including the governing verb tend to consider it less important, disregarding the used
of saying phrase trvají památkáři (preservationists expressions, because it is prototypically a topic position;
insist); the relation of reason is broader. others are more driven by context and other indicators
&lt;Arg1: Na tom, aby ve Šternberku ani v of possible focus.
tpravlaájcíipaSmmáitřkiáckřiý.&gt;ch &lt;Anervgz2n: ikalPyoslažnácdůnmé ptřuíčdkíyž, catTehdibsevfoarreiatbhielivtyerbapinpltihese seusrpfeaccieawllyortdooarddevre,
rfboicaallsizleodREASON nebude dopřáno žádné velké soukromí.&gt; phrases and predicate verbs in the left part of the sentence
&lt;Arg1: Preservationists insist that no partition [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The example (8) presents an ambiguous
interprewalls will be built up neither in the Sternberg Palace tation of the conditional phrase at the beginning of the
nor in the Smiřický Palace.&gt; &lt;Arg2: Therefore, sentence; one of the annotators considers it to be a part
REASON MP’s will not enjoy great privacy.&gt; of the very message of the sentence, the other as a mere
unimportant circumstance. Thus, both perceive the given
sentence as a response to a diferent (unspoken) context,
as shown by the contextual questions at the end of each
interpretation. (The expressions in topic are underlined;
the focus is marked with bold characters.)
Annotation 2: the discourse connective tudíž
(therefore) relates the second sentence to the
content of the saying only (dictum), the Arg1 is
smaller; the meaning of the whole causal relation
is diferent.
&lt;Arg1: Na tom, aby ve Šternberku ani v
paláci Smiřických nevznikaly žádné příčky,&gt;
trvají památkáři. &lt;Arg2: Poslancům tudíž REASON
nebude dopřáno žádné velké soukromí.&gt;
([5, p. 2005]; multiple annotation of the PDT 2.0[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ])
(8)
In general, attribution is one of the ways of text
arrangement, in addition to e.g. parentheses, meta-comments
on the communication etc. All of these ways represent a
digression from the baseline of a simple main narrative
with a single narrator. As such, they can be a source
of diferent interpretations of the text: people can difer
in what they regard as author’s speech and what as
reported speech, what as part of the main line and what as
a parenthesis, etc. (see subsection 5.6 below).
5.5. Word order
So far, we have observed cases of disagreement between
annotators, which result from the lexical properties of
expressions ensuring coherence (underspecification vs.
specificity, abstractness vs. concreteness) and from the
syntactic structure (governing verb of saying/thinking
vs. dictum itself). Word order is another area that plays
(Context: Po ekonomech, kteří nyní už opouštějí
školu se znalostí pravidel hry v tržním prostředí, je
hlad. Co hodláte udělat, aby jich bylo dost?
The economists are now requested who leave the
school with a knowledge of the life in the market
environment. How do you intend to provide a suficient
number of them?)
        </p>
      </sec>
      <sec id="sec-4-10">
        <title>Annotation 1:</title>
        <p>[Při využití všech výukových prostor od rána
až do večera] 0-subject jsme schopni ročně
přijmout ke studiu okolo 2500 studentů.</p>
        <p>
          Lit.: [When using all classrooms from
morning till evening] we_are able a_year to_accept
to_studies about 2500 students.
[When using all our classrooms during the whole
day], we are able to accept about 2500 new students
a year.
(What is your present-day situation?)
Annotation 2:
[Při využití všech výukových prostor od rána až do
večera] jsme schopni ročně přijmout ke studiu
okolo 2500 studentů.
(What will your situation be if you take full
advantage of your present-day capacities?)
([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; control multiple annotation of the PDT 2.0, [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ])
In example (9), there is a collision between two indicators of importance (belonging to the topic / focus): the observed phrase is located at the beginning of the sentence, a position typical for the topic, but at the same time it is emphasized by a focalizer. Annotators perceive its role in the information structure of the sentence differently.
(9)
(Context: Oskar... Firmě Ilja Běhal a spol., zajišťující
umělecko-kovářské a restaurátorské práce hlavně
na střední Moravě.
The Oscar prize. . . for the firm Ilja Běhal &amp; Co., which provides artistic smithing and restoration work mainly in central Moravia.)
Annotation 1:
[Zejména FOCALIZER v Olomouci] firma svými
výrobky přispívá ke zvýraznění koloritu historického jádra města.
Lit.: [Especially FOCALIZER in Olomouc] firm
with_its products helps accentuation
of_colouring of_historical centre of_city.
[Especially in Olomouc], the firm helps to accentuate
the colouring of the historical centre of the city with
its products.
(What does the firm do? What can we say about
the firm?)
Annotation 2:
[Zejména v Olomouci] firma svými výrobky přispívá ke zvýraznění koloritu historického jádra
města.
(What does the firm do especially in Olomouc?)
([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; control multiple annotation of the PDT 2.0 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ])
In example (10), a striking feature of verbs can be seen: expressions dependent on the verbs often tend to be communicatively more important than the verbs themselves. This can make the role of predicate verbs in the information structure unclear: annotators do not agree whether to classify them as focus or as topic. We have already observed the unclear importance of verbs with respect to their dependent parts in examples (5, unclear role of a verb with general meaning in a discourse structure) and (6-7, unclear role of a verb of thinking/saying in a discourse structure, compared to the clear role of dictum).
(10)
(Context:
- Nářky lidí známe ze svého nejbližšího okolí. Jejich
frekvence spíš vzrůstá, než aby se tenčila. Proč?
- We know people’s complaints from our immediate surroundings. Their frequency is rising rather than falling. Why?)
Annotation 1:
Nejvíc [kritizují a rozčilují se] neschopní.
Lit.: Most [criticize and get_angry]
incompetent.
Incompetent employees criticize and get angry most
of all.
(What happens?)
        </p>
      </sec>
      <sec id="sec-4-11">
        <title>Annotation 2:</title>
        <p>
          Nejvíc [kritizují a rozčilují se] neschopní.
(Who criticizes and gets angry most of all?)
([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; control multiple annotation of the PDT 2.0 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ])
        </p>
      </sec>
      <sec id="sec-4-12">
        <title>5.6. Core of the message: subjective perception of the relative importance</title>
        <p>
          At this point, we allow ourselves a small digression inspired by the information structure. In many kinds of coherence annotations, we see that annotators differ in what they consider to be important, central, at a given place in the text. As the previous subsection showed, the variety of understanding of coherence relations often comes from certain linguistic forms (a specific word order pattern, etc.). However, the language itself often does not provide a clue: we cannot tell which phrase or syntactic construction was vague enough to allow for multiple readings. The diversity here comes from the different experience of the recipients, from their expectations and their knowledge of the world. This type of inter-annotator disagreement is difficult for linguistics to grasp. Nevertheless, since we can document it well in our data, we take the liberty of presenting a few of these phenomena here, as they can serve as inspiration for, e.g., psycholinguistic research.
        </p>
        <p>
          At the local level, subjectivity can be seen in the perception of importance in the information structure (cf. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]), i.e., in what people see as the topic / focus of a sentence. Furthermore, this variation is found in discourse relations in Rhetorical Structure Theory, which differentiates between a more substantial and a less substantial argument of a discourse relation (nucleus and satellite, respectively; cf. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). See the following example (11), where adjacent sentences have the same syntactic structure connected by the phrase not only – but also. One of the annotators considers both parts of these sentences to have the same level of importance and marks a multinuclear relation of contrast between them. The other one understands the second parts (starting with but also) as emphasized, more important, thus marking the relation as antithesis with the nucleus in the second part.
        </p>
        <p>
          (11)
&lt;Arg1: Jan Kotík nemaluje jen očima a rukou,&gt;
CONTRAST / ANTITHESIS &lt;Arg2: ale také mozkem.&gt;
&lt;Arg1: Jeho obrazy tedy vyžadují nejen citlivost
a vnímavost,&gt; CONTRAST / ANTITHESIS &lt;Arg2: ale také
přemýšlení.&gt;
&lt;Arg1: Jan Kotík paints not only with his eyes and
hands,&gt; CONTRAST / ANTITHESIS &lt;Arg2: but also with
his brain.&gt;
&lt;Arg1: Therefore his paintings require not only
sensitivity and receptivity,&gt; CONTRAST / ANTITHESIS &lt;Arg2:
but also thinking.&gt;
(Czech RST Discourse Treebank 1.0 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ])
        </p>
        <p>
          At the global level, in the annotations according to Rhetorical Structure Theory, the perceived importance of individual parts of news reports differs, too. Typically, while one annotator understands the introductory part as a central message to which details are added in the following text, the other perceives the same part as a preparation to which the actual message is attached afterwards ([
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]).
        </p>
      </sec>
      <sec id="sec-4-13">
        <title>5.7. Text dimensions</title>
        <p>Inter-annotator agreement can also be affected by text dimensions. As coreference research shows, the larger the network of possible antecedents for a given word in a text, the greater the disagreement between annotators ([10, p. 221]; cf. the opportunities for disagreement in example 4). The author further states that divergent interpretations of coreference can also be chained: if annotators differ in the interpretation of expressions at the beginning of the coreference chain, their different interpretations can be reflected in other expressions with a similar meaning in the text.</p>
        <p>It remains an open question how the size of the text affects the variability of understanding in other coherence relations, such as discourse relations and information structure. We have not yet conducted research in this direction. For discourse relations, there can theoretically be more potential arguments connected by a discourse connective in a large text. If the text is longer, it will probably also be more layered in terms of author’s and reported speech, metacommunication, insertions, etc., which again offers more possibilities for different understandings of discourse and other relations. On the other hand, a longer text can describe more accurately the context in which the discourse relations are interpreted, and thus contribute to the clarity of understanding. In this regard, another question arises: whether there is a difference in the variability of the understanding of coherence relations between the beginning of the text (where the text is still short, so there are few potential members of different relations available, but also little context) and its later parts.</p>
        <p>6. Conclusion</p>
        <p>In this study, we observed which common features the occurrences of inter-annotator disagreement share in coherence relations, specifically in discourse relations, coreference and information structure. We were mainly concerned with the features given by the language itself; we only briefly touched on cases of disagreement that result from differences between speakers. We have also formulated some questions that can be the subject of further research.</p>
        <p>Coherence relations can be divided into formally expressed relations (e.g., discourse relations expressed by an explicit discourse connective, or information structure expressed by word order) and unexpressed relations that are understood from the context (e.g., a coreference relation between the words text and chapter in a specific text). In formally unexpressed relations, disagreement occurs naturally: what is inferred from the context depends on the recipients. Formally expressed relations can also be interpreted differently. There may be disagreement on the very existence of a coherence relation; this disagreement is usually based on the polysemy (polyfunctionality) of the linguistic form (expression), which functions as a signal of coherence in some contexts, but not in others. In addition, coherence signals can also lead to a different perception of the semantic type of a discourse relation (in cases where speakers agree on its existence): this is caused by the semantic underspecification of the language forms that express coherence (discourse connectives). The general question arises whether, as recipients, we need to understand textual coherence in detail in all contexts, i.e., to distinguish not only the simple existence of coherence relations, but also their semantic coloring. What level actually represents a functional and sufficient understanding of the text?</p>
        <p>Lexical specificity plays an important role in the understanding of autosemantic words, too; these expressions do not function primarily as signals of coherence. Coreference research shows that for abstract and generic nominal phrases in a text, recipients have difficulty determining whether the words have the same content; in contrast, for words with a concrete, specific meaning, coreference is easier to determine. The same applies to the semantic concreteness of verbs: for verbs with vaguer, more general meanings, it is difficult for annotators to determine whether or not they are part of discourse arguments. Their meaning seems to be too insignificant, whereas the content of their dependent words is more important. This observation also applies to the verbs of thinking and saying in the relation of attribution, where the content of the reported speech seems to be communicatively more essential than the act of communication itself. In the case of attribution, there is another reason for the diverse interpretation of the text: it represents one of the forms of text arrangement (alongside parentheses, meta-comments on the communication, etc.), i.e., a complication in the simple basic line of the narrative. It thus provides the possibility for different recipients to interpret the overall structure of the text differently.</p>
        <p>In addition to individual words, such as various
coherence operators or autosemantic expressions, word
order can also cause disagreement in text
understanding. Specifically, in Czech and other Slavic languages,
word order affects the understanding of the information
structure. If expressions with higher communicative
dynamism (informativeness) appear in the left, topical part
of the sentence, which prototypically has low
communicative dynamism, annotators typically disagree in their
evaluation.</p>
        <p>In many types of annotation, it turns out that
annotators perceive the importance of individual parts of
the text and their (hierarchical) connections differently.</p>
        <p>These disagreements are often caused not so much by the
special properties of the text as by differences between
the annotators (specifically, in their knowledge of the
language, knowledge of the world, expectations,
experience with different text genres, etc.). This area seems
particularly suitable for future psycholinguistic research
focusing on specific domains of coherence. Here, for
example, it is possible to examine the influence of
respondents’ literacy on the understanding of coreference in
abstract words, or the process by which children learn text
arrangement.</p>
        <p>The last factor we dealt with is text dimensions. Its
effect on different readings has been described for coreference
(the longer the text, the greater the disagreement in
interpretation). For other coherence relations, this factor is
still unexplored. We hypothesized that for discourse
relations and information structure, text dimensions could
influence the degree of disagreement in both directions;
the degree of disagreement may also vary by position in the
text and the amount of preceding context (early vs. later in
the text). These ideas suggest possible directions for
further research on differences in the comprehension of text coherence.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>The research reported in this paper was supported by the Czech Science Foundation (project no. 24-11132S,
Disagreement in Corpus Annotation and Variation in
Human Understanding of Text); a part of the used data
comes from the project no. LM2018101 by the Czech
Ministry of Education, Youth and Sports (Digital Research
Infrastructure for Language Technologies, Arts and
Humanities).</p>
        <p>The author would like to express her gratitude to Prof.
E. Hajičová for careful proofreading of the manuscript,
Dr. J. Mírovský for help with the technical processing
of the text and F. Zikánová for the language examples.
Thank you all for the pleasant cooperation.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>The “problem” of human label variation: On ground truth in data, modeling and evaluation</article-title>
          , in:
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kozareva</surname>
          </string-name>
          , Y. Zhang (Eds.),
          <source>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>10671</fpage>
          -
          <lpage>10682</lpage>
          . URL: https://aclanthology.org/2022.emnlp-main.731. doi:10.18653/v1/2022.emnlp-main.731.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <source>Penn Discourse Treebank Version 3.0</source>
          ,
          <year>2019</year>
          . URL: https://hdl.handle.net/11272.1/AB2/SUU9CB. doi:11272.1/AB2/SUU9CB.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Poláková</surname>
          </string-name>
          , Š. Zikánová,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          , E. Hajičová,
          <source>Czech RST Discourse Treebank 1.0</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jínová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          , L. Poláková,
          <article-title>Analyzing the most common errors in the discourse annotation of the Prague Dependency Treebank</article-title>
          , in:
          <string-name>
            <given-names>I.</given-names>
            <surname>Hendrickx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kübler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simov</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories</source>
          , Universidade de Lisboa, Edições Colibri, Lisboa, Portugal,
          <year>2012</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mladová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jínová</surname>
          </string-name>
          ,
          <article-title>Typical cases of annotators' disagreement in discourse annotations in Prague Dependency Treebank</article-title>
          ,
          <source>in: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC</source>
          <year>2010</year>
          ), European Language Resources Association, Valletta, Malta,
          <year>2010</year>
          , pp.
          <fpage>2002</fpage>
          -
          <lpage>2006</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <article-title>Implicitní diskurzní vztahy v češtině [Implicit Discourse Relations in Czech]</article-title>
          , Charles University,
          <source>Faculty of Mathematics and Physics</source>
          , Prague, Czech Republic,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Crible</surname>
          </string-name>
          , Á. Abuczki, N. Burkšaitienė,
          <string-name>
            <given-names>P.</given-names>
            <surname>Furkó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nedoluzhko</surname>
          </string-name>
          , G. Oleskeviciene, S. Rackevičienė, Š. Zikánová,
          <article-title>Functions and translations of underspecified discourse markers in TED talks: a parallel corpus study on five languages</article-title>
          ,
          <source>Journal of Pragmatics</source>
          (
          <year>2019</year>
          )
          <fpage>139</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Poláková</surname>
          </string-name>
          , J. Mírovský, Š. Zikánová, E. Hajičová,
          <article-title>Developing a Rhetorical Structure Theory Treebank for Czech</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakti</surname>
          </string-name>
          , N. Xue (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          , European Language Resources Association, Torino, Italy,
          <year>2024</year>
          , pp.
          <fpage>4802</fpage>
          -
          <lpage>4810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          , L. Mladová, Š. Zikánová,
          <article-title>Connective-based measuring of the inter-annotator agreement in the annotation of discourse in PDT</article-title>
          , in:
          <string-name>
            <given-names>C.-R.</given-names>
            <surname>Huang</surname>
          </string-name>
          , D. Jurafsky (Eds.),
          <source>Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)</source>
          , volume
          <volume>1</volume>
          , Chinese Information Processing Society of China, Tsinghua University Press, Beijing, China,
          <year>2010</year>
          , pp.
          <fpage>775</fpage>
          -
          <lpage>781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nedoluzhko</surname>
          </string-name>
          ,
          <article-title>Rozšířená textová koreference a asociační anafora (Koncepce anotace českých dat v Pražském závislostním korpusu) [Extended nominal coreference and bridging anaphora (An approach to annotation of Czech data in the Prague Dependency Treebank)]</article-title>
          , Studies in Computational and Theoretical Linguistics, Ústav formální a aplikované lingvistiky, Prague, Czech Republic,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Týnovský</surname>
          </string-name>
          ,
          <article-title>Identification of Topic and Focus in Czech: Comparative Evaluation on Prague Dependency Treebank</article-title>
          , in: G. Zybatow,
          <string-name>
            <given-names>U.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenertová</surname>
          </string-name>
          , P. Biskup (Eds.),
          <source>Studies in Formal Slavic Phonology, Morphology, Syntax, Semantics and Information Structure</source>
          . Formal Description of Slavic Languages
          <volume>7</volume>
          , Universität Leipzig, Peter Lang, Frankfurt am Main, Germany,
          <year>2009</year>
          , pp.
          <fpage>343</fpage>
          -
          <lpage>353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hajič</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Panevová</surname>
          </string-name>
          , E. Hajičová,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sgall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pajas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Štěpánek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Havelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikulová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Žabokrtský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ševčíková-Razímová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Urešová</surname>
          </string-name>
          ,
          <source>Prague Dependency Treebank 2.0</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bejček</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hajičová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hajič</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jínová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kettnerová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kolářová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikulová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nedoluzhko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Panevová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Poláková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ševčíková</surname>
          </string-name>
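          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Štěpánek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <source>Prague Dependency Treebank 3.0</source>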
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dinesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miltsakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Robaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <article-title>The Penn Discourse TreeBank 2.0</article-title>
          , in:
          <source>Proceedings of the 6th International Conference on Language Resources and Evaluation</source>
          , Marrakech, Morocco,
          <year>2008</year>
          , pp.
          <fpage>2961</fpage>
          -
          <lpage>2968</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse</article-title>
          , in:
          <source>Discourse Relations and Discourse Markers</source>
          ,
          <year>1998</year>
          . URL: https://aclanthology.org/W98-0315.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sgall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hajičová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Panevová</surname>
          </string-name>
          ,
          <source>The Meaning of the Sentence in its Semantic and Pragmatic Aspects</source>
          , Springer Science &amp; Business Media
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Synková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mírovský</surname>
          </string-name>
          ,
          <article-title>Enriched discourse annotation of PDiT subset 1.0 (PDiT-EDA 1.0)</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taboada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Annotation Guidelines for Rhetorical Structure</article-title>
          , Manuscript,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Artstein</surname>
          </string-name>
          ,
          <article-title>Inter-annotator agreement</article-title>
          , in:
          <source>Handbook of Linguistic Annotation</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Poláková</surname>
          </string-name>
          ,
          <source>Discourse Relations in Czech</source>
          , Ph.D. thesis, Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Š.</given-names>
            <surname>Zikánová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Týnovský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Havelka</surname>
          </string-name>
          ,
          <article-title>Identification of Topic and Focus in Czech: Evaluation of Manual Parallel Annotations</article-title>
          ,
          <source>The Prague Bulletin of Mathematical Linguistics</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>