<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Text Structure and Its Ambiguities: Corpus Annotation as a Helpful Guide</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Šárka</forename><surname>Zikánová</surname></persName>
							<email>zikanova@ufal.mff.cuni.cz</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Physics</orgName>
								<orgName type="institution">Charles University</orgName>
								<address>
									<addrLine>Malostranské nám. 25</addrLine>
									<postCode>118 00</postCode>
									<settlement>Prague 1</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Conference ITAT (Information Technologies-Applications and Theory)</orgName>
								<address>
									<addrLine>2024: Drienica</addrLine>
									<settlement>Čergovské vrchy</settlement>
									<country key="SK">Slovakia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Text Structure and Its Ambiguities: Corpus Annotation as a Helpful Guide</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">839C88DAC3C2FF542DB9850EFCFC1885</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>inter-annotator agreement</term>
					<term>human label variation</term>
					<term>discourse relations</term>
					<term>coreference</term>
					<term>information structure</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>It is typical of natural languages that their texts can be understood differently by individual recipients. A number of scientific disciplines, from cognitive psychology to linguistics, are devoted to this phenomenon. In this study, we focus mainly on linguistic factors that may lead to different interpretations of coherence relations in a text (simply speaking, what is related to what, and how). This work presents a pilot typological survey of disagreements in Czech corpus annotations of coherence relations (discourse relations, coreference, information structure) and of their common features. Several factors appear to be essential for disagreement in interpretation:<list><item>polysemy (polyfunctionality) and semantic underspecification of expressions signaling coherence (e.g., discourse connectives);</item><item>generic or abstract meaning of autosemantic words;</item><item>the presence of attribution constructions;</item><item>word order as a potential marker of information structure;</item><item>text size.</item></list> In addition, the subjective perception of the relative importance of different parts of a text plays an important role. Based on observation of the material, we raise questions and propose possible steps for ongoing research into variability in the perception of text coherence.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The availability of digital language resources has enabled an important step forward in linguistic research, in both its theoretical and its applied orientation. Data originally collected mostly for lexical studies and for the study of syntax proper gave an impulse to enrich them with more sophisticated annotation systems. These systems deal with diverse phenomena that go beyond the sentence boundary (e.g., text coherence and phenomena related to inferencing) and introduce finer levels of granularity into the annotation. The annotated data serve various tasks in the computational processing of natural languages, for instance as training and testing data for the development of language models.</p><p>Human data annotation is a process based on the interpretation of observed phenomena and may thus lead to different outcomes. This variation is caused by various factors. Some of them are connected with shortcomings of the annotation scenario (e.g., missing instructions for some cases) or with gaps in the underlying theory (e.g., non-intuitive solutions, or overly fine categories that are very close to each other). Other cases of inter-annotator disagreement are connected with the annotators' learning process: especially the first annotated batches of data may be influenced by the annotators' unfamiliarity with the annotation scenario, which is why these data are often re-annotated later. To prevent such inconsistencies, annotators usually attend frequent training sessions. At the same time, their feedback at the beginning of the annotation may improve the annotation scenario and point out problematic points in the underlying theory. Before the data are released, annotators' mistakes (e.g., simply overlooking phenomena that should have been marked) are searched for and corrected; nevertheless, some mistakes may remain even in the final data. Last but not least, disagreement can arise from language vagueness, polysemy and homonymy: in some cases, the language itself allows for several interpretations of a sentence.</p><p>Computational linguistics offers several methodological approaches to this variability in data annotation. One solution is unification: a gold standard is set, e.g., by majority voting or by a third judge.</p><p>Another, more demanding way of unifying the data is joint annotation, in which annotators mark the data together, discuss each individual case and record only the result of their discussion.</p><p>In order to accept and capture the uncertainty annotators may face while marking language phenomena, some annotation scenarios with hierarchical classifications allow the use of more general levels of the classification, so that the finest differences need not be distinguished in dubious cases. Another way to capture the annotators' certainty is to mark their confidence separately as a specific feature (e.g., (a) a discourse relation is marked as a conjunction and (b) the annotator was absolutely sure of this solution). It should be noted that an annotator's high certainty does not necessarily mean that their solution is the only possible one; in some cases, another annotator may be equally convinced of a different reading.</p><p>Unification is not the only way to handle the data. Some researchers argue that unification may result in biased data that miss important information about the variability of language understanding <ref type="bibr">[1]</ref>. Consequently, language models developed on such data are biased as well. Therefore, some approaches allow annotators to mark multiple descriptions of the same phenomenon. For example, in the Penn Discourse Treebank 3.0 <ref type="bibr" target="#b1">[2]</ref>, a single discourse relation can be marked as both instantiation and cause if the annotator understands it that way. Other annotation projects publish their data with partial or complete multiple annotations carried out by different annotators; in such data, individual annotators' solutions of similar language phenomena can be observed systematically (cf. the Czech RST Discourse Treebank <ref type="bibr" target="#b3">[3]</ref>).</p></div>
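<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two of the unification strategies mentioned above can be sketched as follows:<list><item>majority voting, with a tie left to a third judge;</item><item>falling back to a more general level of a hierarchical classification.</item></list> This is a minimal Python sketch under our own assumptions. The dotted, PDTB-style sense labels and the helper names truncate_sense, agreement_at_level and majority_label are hypothetical, and a tie is simply returned as None.</p>

```python
from collections import Counter


def truncate_sense(label, level):
    """Keep the first `level` components of a dotted sense label.

    Assumes PDTB-style hierarchical labels such as 'Contingency.Cause.Reason'
    (a hypothetical input format chosen for illustration).
    """
    return ".".join(label.split(".")[:level])


def agreement_at_level(labels_a, labels_b, level):
    """Share of items on which two annotators agree after truncating labels to `level`."""
    pairs = list(zip(labels_a, labels_b))
    same = sum(truncate_sense(x, level) == truncate_sense(y, level) for x, y in pairs)
    return same / len(pairs)


def majority_label(votes):
    """Most frequent label among the annotators; None on a tie (left to a third judge)."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Invented labels of two annotators for the same three relations.
annotator_a = ["Contingency.Cause.Reason", "Comparison.Contrast", "Expansion.Conjunction"]
annotator_b = ["Contingency.Cause.Result", "Comparison.Concession", "Expansion.Conjunction"]
for level in (1, 2, 3):
    print(level, round(agreement_at_level(annotator_a, annotator_b, level), 2))
# 1 1.0
# 2 0.67
# 3 0.33
print(majority_label(["reason", "reason", "result"]))  # reason
print(majority_label(["reason", "result"]))  # None
```

<p>Agreement rises as the labels are truncated to more general levels. This shows how much of the disagreement concerns only the finest distinctions, information that a single unified gold label would not preserve.</p></div>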
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Aim of the study</head><p>In our research, we approach annotation variation from a different perspective, namely from a linguistic and psycholinguistic point of view, focusing on human language understanding. We use data with variation as a source of phenomena that are regularly understood in different ways, and we search for possible common features of the different readings. We pay special attention to cues that are inherent to the language rather than to the diversity among the humans receiving the texts.</p><p>Questions of human language understanding have been addressed on a theoretical level, e.g., in psycholinguistics or in lexical and syntactic semantics. In our study, we want to draw on our long-term practical experience with large amounts of language data. In this way, we hope to offer new insights into the variation of language interpretation and to contribute practical findings to theoretical discussions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data: Text Coherence Annotation</head><p>Multiple readings may arise at many language levels and from many perspectives, such as lexical semantics (cf. the polysemy of the word bank as a financial institution and as a river bank), morphology (homonymous singular and plural forms, as in sheep or fish), syntax (having an old friend for dinner), etc. Our research is restricted to the area of text coherence in general. Specifically, our data cover multiple annotations of the following phenomena: discourse relations, coreference, and information structure (Sections 3.1-3.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Discourse relations</head><p>Discourse relations connect so-called discourse arguments (clauses, sentences or larger text segments) and express a certain semantic relation between them. They are prototypically expressed by discourse connectives (coordinating and subordinating conjunctions, discourse adverbs, etc.), but they may also be formally unexpressed. Relations of the former type are called explicit discourse relations, those of the latter type implicit, cf. the explicit relation &lt;Arg1: She enjoyed working in the office&gt; &lt;Arg2: because (REASON) she had pretty flowers there.&gt;</p><p>Our data come from the following discourse corpora:</p><p>(a) Prague Dependency Treebank 2.0 [12] and 3.0 <ref type="bibr" target="#b13">[13]</ref>. The annotation scenario of the Prague Dependency Treebank was motivated by the approach of the Penn Discourse Treebank (<ref type="bibr" target="#b14">[14]</ref>, following Lexicalized Tree-Adjoining Grammar <ref type="bibr" target="#b15">[15]</ref>). It is based on the Functional Generative Description <ref type="bibr" target="#b16">[16]</ref> as applied in the family of Prague Dependency Treebanks. It distinguishes 23 semantic types of discourse relations, such as conjunction, disjunction, concession, generalization, etc.; discourse connectives are marked explicitly. The annotation is carried out on so-called tectogrammatical (syntactico-semantic) dependency trees, which allows the discourse annotation to be related to the syntactico-semantic level of the language.
The data in the corpus are in Czech.</p><p>(b) Enriched Discourse Annotation of Prague Discourse Treebank Subset 1.0 (PDiT-EDA 1.0, <ref type="bibr" target="#b17">[17]</ref>). The annotation scenario follows the approach of the Prague Dependency Treebank, enriched with the marking of implicit discourse relations.</p><p>(c) Data comparing the underspecification of discourse connectives in five languages (English, French, Czech, Hungarian, Lithuanian), as published in <ref type="bibr" target="#b7">[7]</ref>. The annotation scenario is based on Crible's classification of discourse relations <ref type="bibr" target="#b7">[7]</ref>, which distinguishes 15 discourse relations (e.g., opening, addition, topic-shift). Unlike the Praguian discourse approach, Crible's classification takes broader pragmatic aspects of discourse into account (so-called domains). It explicitly distinguishes the ideational, rhetorical, sequential, and interpersonal domains in which discourse relations are used.</p><p>(d) Czech RST Discourse Treebank 1.0 <ref type="bibr" target="#b3">[3]</ref>. The annotation scenario is based on Rhetorical Structure Theory as applied in the Potsdam Commentary Corpus <ref type="bibr" target="#b18">[18]</ref>. This theory assumes that a text as a whole is built from smaller segments, all of which are interconnected by discourse relations, with no part left aside. The scenario distinguishes 37 discourse relations (e.g., concession, concession as nucleus, textual preparation). A specific feature of RST is that it puts emphasis on different levels of com- </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Coreference</head><p>Coreferential relations connect expressions with the same reference, such as in The girl looked into her map; she looked like she was enjoying the adventure. Madelein had a great sense of orientation. The arguments of coreferential relations are prototypically noun phrases (nouns, pronouns), including dropped phrases (While [she was] walking through the landscape, she admired the beauty of nature.).</p><p>A coreferential relation may also hold between a larger text segment, such as a whole thought or paragraph, and a summarizing pronoun (it, this, etc.). We use coreference data with annotation disagreements from the Prague Dependency Treebank 2.0 <ref type="bibr" target="#b12">[12]</ref> and 3.0 <ref type="bibr" target="#b13">[13]</ref>. In these treebanks, coreference is part of a multi-level annotation that also includes discourse and syntactic semantics (see above).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Information Structure</head><p>The information structure of a sentence expresses the communicative importance of individual parts of the sentence in a given context. In general, it captures the topic (what the sentence is about) and the focus of a sentence (what new information is said about the topic), cf. (context: There is a cat under the tree.) [It]TOPIC [is ready for a jump]FOCUS.</p><p>Our data on information structure come from an experiment carried out on the data of the Prague Dependency Treebank 2.0 <ref type="bibr" target="#b12">[12]</ref>, where information structure is marked on dependency trees at the tectogrammatical (syntactico-semantic) level.<note place="foot" n="1">According to the Functional Generative Description <ref type="bibr" target="#b16">[16]</ref>, a tectogrammatical tree consists of nodes that prototypically correspond to autosemantic words; the nodes are connected by edges expressing syntactico-semantic relations (e.g., Actor, Patient, Addressee). As for information structure, each node is assigned a value of contextual boundness (contextually bound, contextually non-bound, contrastively contextually bound). The nodes are ordered from left to right according to their so-called communicative dynamism, i.e., the degree to which they contribute to the development of the information flow in the sentence. Topic and focus can be derived from these two features (contextual boundness and communicative dynamism).</note></p></div>
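<div xmlns="http://www.tei-c.org/ns/1.0"><p>The derivation of topic and focus from contextual boundness and communicative dynamism, described above, can be illustrated by a deliberately simplified sketch. We assume PDT-style boundness values ('t' bound, 'c' contrastively bound, 'f' non-bound) and an integer rank for communicative dynamism. The class and function names are hypothetical. The full Functional Generative Description rules (e.g., for the main verb and for embedded nodes) are not implemented: every bound node simply goes to the topic and every non-bound node to the focus.</p>

```python
from dataclasses import dataclass


@dataclass
class Node:
    lemma: str
    tfa: str  # contextual boundness: 't' bound, 'c' contrastively bound, 'f' non-bound
    cd: int   # rank in communicative dynamism (0 = least dynamic, leftmost)


def split_topic_focus(nodes):
    """Simplified split: bound nodes go to the topic, non-bound nodes to the focus.

    Both parts keep the left-to-right order given by communicative dynamism.
    This ignores the finer FGD rules for the main verb and embedded nodes.
    """
    ordered = sorted(nodes, key=lambda n: n.cd)
    topic = [n.lemma for n in ordered if n.tfa in ("t", "c")]
    focus = [n.lemma for n in ordered if n.tfa == "f"]
    return topic, focus


# Context: "There is a cat under the tree." Sentence: "It is ready for a jump."
# Hypothetical node values, listed out of order to show the sorting by dynamism.
sentence = [
    Node("jump", "f", 3),
    Node("it", "t", 0),
    Node("be", "f", 1),
    Node("ready", "f", 2),
]
print(split_topic_focus(sentence))  # (['it'], ['be', 'ready', 'jump'])
```

<p>Under this simplification, two annotators who assign a node different boundness values will necessarily place it in different parts of the sentence. This is the kind of divergence discussed in the section on word order.</p></div>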
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methodology</head><p>In the present study, we search for general language features of sentences (words, contexts) that allow for variable readings of text structure. For this purpose, we collect occurrences of inter-annotator disagreement in the language corpora (see Table <ref type="table">1</ref>) and classify them manually. We set aside disagreements that obviously result from other causes (annotators' mistakes, technical solutions of the applied theory). We concentrate on the semantic and grammatical features of the examined sentences and expressions.<ref type="foot" target="#foot_0">2</ref> The results are compared with and supplemented by a meta-analysis of reports on the annotation of individual corpora. Unfortunately, due to space limitations, these reports often describe the reasons for inter-annotator disagreement only briefly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Measuring inter-annotator disagreement on text structure</head><p>On the most general level, measuring inter-annotator agreement on textual phenomena involves two criteria:</p><p>(a) How often all the annotators found a certain phenomenon (e.g., a discourse relation). For example, one annotator may overlook a case that should be marked, whereas another does not. This would be a disagreement on the existence of the phenomenon.</p><p>(Dis)agreement on existence is usually measured with the F1 measure (the harmonic mean of precision and recall).</p><p>(b) How often the annotators agree on the classification of a phenomenon, within the cases where they all agree on its existence. If one annotator assigns a discourse relation the semantic type conjunction, whereas another sees it as gradation, this is a disagreement on the type of the phenomenon. (Dis)agreement on type is prototypically evaluated as a simple percentage match or with Cohen's kappa.</p><p>Both types of disagreement are relevant to our research. We are looking for linguistic features that can cause one annotator not to recognize a certain coherence relation while another does. We are equally interested in the linguistic reasons why annotators ascribe different meanings to one coherence relation.<ref type="foot">3</ref></p></div>
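<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two measures mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code of any of the corpora. F1 on existence is computed over sets of item identifiers, which assumes that the items of both annotators have already been aligned. Percentage match and Cohen's kappa are computed over the type labels of the items that both annotators found. All identifiers and labels are invented.</p>

```python
from collections import Counter


def existence_f1(marked_a, marked_b):
    """F1 on the existence of phenomena: 2 * |A ∩ B| / (|A| + |B|).

    Symmetric in the two annotators, so neither has to serve as the gold standard.
    Both arguments are sets of aligned item identifiers.
    """
    total = len(marked_a) + len(marked_b)
    if total == 0:
        return 1.0
    return 2 * len(set(marked_a).intersection(marked_b)) / total


def percent_agreement(labels_a, labels_b):
    """Simple share of items with identical type labels."""
    return sum(x == y for x, y in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one and the same label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


found_a = {"r1", "r2", "r3", "r4", "r5"}
found_b = {"r1", "r2", "r3", "r4", "r6", "r7"}
# Types assigned to r1..r4, the relations found by both annotators.
types_a = ["conjunction", "conjunction", "reason", "gradation"]
types_b = ["conjunction", "gradation", "reason", "gradation"]
print(round(existence_f1(found_a, found_b), 3))  # 0.727
print(percent_agreement(types_a, types_b))  # 0.75
print(round(cohens_kappa(types_a, types_b), 3))  # 0.636
```

<p>With these invented data, the annotators share four of the relations they marked (F1 ≈ 0.73). They agree on the type of three of these four (75 %), which Cohen's kappa corrects for chance agreement to about 0.64.</p></div>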
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Analysis</head><p>In our data, which include the annotation of discourse relations, coreference, and information structure, we have identified seven areas (factors) that repeatedly give rise to different readings of text coherence by annotators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Synsemantic signals of coherence relations: polysemy</head><p>Some words function in a text primarily as explicit markers of coherence relations (discourse connectives for discourse relations, some pronouns for anaphoric relations). However, as lexical units, these words are often polysemous (polyfunctional): they can also be used in [...] Similarly, in coreferential relations, the word it can perform a pronominal function and be part of a coreferential chain (She played great. I really liked it.). It can also function as a grammatical word without any reference (The weather is fine. It is not raining anymore.). The presence of such synsemantic expressions in a text thus does not clearly signal the presence of a coherence relation. Recipients may disagree about the existence of a relation depending on how they read the function of the polysemous word, as in discourse annotation example (1):</p><p>(1) Annotation 1: explicit discourse relation expressed by the discourse connective přece (because) &lt;Arg1: Neptejte se mě, proč jsem přijel do Prahy.&gt; &lt;Arg2: Je to přece (EXPLICATION) normální sem přijet.&gt; (Don't ask me why I came to Prague. It is normal to come here, after all.)</p><note place="foot" n="3"><p>General information on measuring inter-annotator agreement can be found in <ref type="bibr" target="#b19">[19]</ref>.</p><p>Many annotation projects adapt their measurement methods to suit the phenomena under investigation more precisely. For discourse relations, for example, agreement on existence can be defined strictly: both annotators agree on the exact scope of both discourse arguments and assign the relation to the same discourse connective. A looser approach acknowledges that the exact extent of the arguments can be difficult to determine. It counts a mere match of the discourse connective as agreement on existence, regardless of which words the annotators mark as parts of the individual discourse arguments <ref type="bibr" target="#b9">[9]</ref>.</p></note></div>
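<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between the strict and the loose criterion for agreement on the existence of a discourse relation, described above, can be sketched as follows. We assume, hypothetically, that each relation is stored as a tuple of the connective's token position and the token spans of both arguments. The function names are illustrative only.</p>

```python
def matched_relations(rels_a, rels_b, strict=True):
    """Number of relations marked by both annotators.

    Each relation is a hypothetical tuple (connective_token, arg1_span, arg2_span),
    spans being (start, end) token offsets. Strict: everything must match.
    Loose: only the connective token must match; argument extents are ignored.
    """
    if strict:
        return len(set(rels_a).intersection(rels_b))
    return len({r[0] for r in rels_a}.intersection(r[0] for r in rels_b))


def relation_f1(rels_a, rels_b, strict=True):
    """F1 on existence under the chosen matching criterion."""
    total = len(rels_a) + len(rels_b)
    return 2 * matched_relations(rels_a, rels_b, strict) / total if total else 1.0


# Annotator B extends Arg2 of the first relation by two tokens.
annot_a = [(12, (0, 8), (13, 20)), (25, (13, 20), (26, 30))]
annot_b = [(12, (0, 8), (13, 22)), (25, (13, 20), (26, 30))]
print(relation_f1(annot_a, annot_b, strict=True))   # 0.5
print(relation_f1(annot_a, annot_b, strict=False))  # 1.0
```

<p>A minor difference in the extent of one argument halves the strict score, while the loose score is unaffected.</p></div>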
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Synsemantic signals of coherence relations: underspecification</head><p>Other cases of disagreement are based on the semantic underspecification of words signaling coherence relations. In these cases, the annotators agree on the existence of a relation but disagree on its meaning (disagreement on type). This type of disagreement is typical of discourse relations signaled by discourse connectives with a vague meaning, cf. (<ref type="formula">2</ref>).</p><p>Different understandings of underspecified discourse connectives are also evident in the dataset reported in <ref type="bibr" target="#b7">[7]</ref>, which contains the original English subtitles of TED talks and their equivalents in four languages. In the following example, the original English conjunction but (an underspecified discourse connective with a contrastive meaning) is translated with the Czech a (and, an underspecified discourse connective with a simple conjunctive meaning).</p><p>(3) English original: Today I want to talk to you about the mathematics of love. Now, I think that we can all agree that mathematicians are famously excellent at finding love. But it's not just because of our dashing personalities, superior conversational skills and excellent pencil cases.</p><p>Czech translation: Dnes vám chci povědět něco o matematice lásky. Myslím, že se shodneme na tom, že matematici jsou v oblasti lásky proslulí svými schopnostmi. A nestojí za tím jen okouzlující charakter, neobyčejný konverzační um či ostře nabroušené tužky. (Literally: Today I want to tell you something about the mathematics of love. I think we will agree that mathematicians are renowned for their abilities in the field of love. And it is not only a charming character, extraordinary conversational skill or sharply pointed pencils that lie behind it.)</p><p>(Dataset of the research reported in <ref type="bibr" target="#b7">[7]</ref>)</p><p>The interchangeability of these words in the given contexts raises certain theoretical questions. For example, what level of text coherence does the recipient need? In the examples given, it seems sufficient to signal that the two arguments are connected by a discourse relation; the specific semantic type involved seems to be irrelevant. Examples (<ref type="formula">2</ref>) and (3) also raise another question, namely that of the nature of the semantic types of discourse relations. In the annotations, we differentiate the individual types very precisely. In fact, however, contrastivity, like causality, may be scalar (gradual) and located on the same axis as conjunction, so that different recipients may merely perceive different degrees of contrastivity or causality. This property of discourse semantic types could be verified in psycholinguistic experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Autosemantic words in coherence relations: genericity and abstractness</head><p>Based on the analysis of the data, we assume the following. Autosemantic words with a concrete, non-abstract meaning (cf. concrete to bake versus abstract to do) and expressions with a specific rather than generic reference (the boy vs. the youth as such) are generally more accessible to recipients and easier for them to represent. In this context, we observe that words with an abstract meaning or with a generic reference can complicate the understanding of a text's coherence structure: in sentences with these expressions, inter-annotator disagreement occurs more often.</p><p>Regarding coreferential relations, Nedoluzhko <ref type="bibr">[10, p. 221]</ref> states that "The more nouns with abstract meaning and expressions with generic reference in the text, the smaller the agreement." It is often difficult to decide whether the concepts of two abstract expressions fully overlap (and are therefore fully coreferential), whether one is part of the other, or whether they are independent, cf. (<ref type="formula">4</ref>). In the annotation of discourse relations, too, words with an abstract, non-specific meaning lead to inter-annotator disagreement <ref type="bibr" target="#b5">[5]</ref>. This is the case of sentences including verbs with an abstract, general meaning. As the authors say, "The disagreement occurs when it is not clear whether the potential discourse connective refers to the whole sentence as an independent abstract object (discourse argument), or just to its complement, typically a nominal phrase." <ref type="bibr">[5, p. 2003]</ref>. Thus, in example <ref type="bibr" target="#b5">(5)</ref>, the disagreement between annotators shows that the attachment of the second part of the sentence (while chimneys...) is questionable. It may relate to the whole previous clause including the verbs with an abstract meaning (it is possible to note a small, but distinctive difference between...), or just to the nominal phrase (a small, but distinctive difference between...).<ref type="foot" target="#foot_1">4</ref> In fact, this is a disagreement about the level at which the given phenomenon should be captured (in this case, coreference or discourse). How to annotate these cases consistently is rather an academic question. As for the recipients themselves, a difference in the annotation does not mean a difference in understanding the text. The language levels and perspectives are interrelated, and annotators can ascribe individual phenomena to different levels without understanding the text coherence differently.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Attribution: verbs of thinking and saying</head><p>Attribution is the relation between the (named) author of a section of text and their speech. Typical components of an attribution construction are:<list><item>the author's name;</item><item>a verb of thinking or saying, or another form introducing speech (a colon, phrases such as according to);</item><item>the direct or indirect speech itself (dictum).</item></list> Languages have means of distinguishing the author's speech from reported speech. Nevertheless, in attribution constructions it is often difficult to determine how far discourse relations extend and what the scope of their arguments is, especially with verbs of thinking and saying. In these cases, annotators often disagree in their interpretations, cf. examples (<ref type="formula" target="#formula_0">6</ref>) and (<ref type="formula">7</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5.">Word order</head><p>So far, we have observed cases of disagreement between annotators that result from two sources. The first is the lexical properties of the expressions ensuring coherence (underspecification vs. specificity, abstractness vs. concreteness). The second is syntactic structure (a governing verb of saying/thinking vs. the dictum itself). Word order is another area that plays an important role in ensuring the coherence of a text and can also be subject to different interpretations. In Czech, as in other Slavic languages, word order is relatively free, with few grammatical restrictions. It is used to express the information structure of a sentence: information belonging to the topic is prototypically placed to the left, while the focus is usually located to the right. However, a marked word order is also possible, in which topic and focus occupy various positions in the sentence. They are then distinguished by intonation or by focalizers, or deduced from the context. In some cases, this freedom in the formal expression of information structure results in inter-annotator disagreement. Annotators often interpret the information structure of the left part of a sentence differently. Some tend to consider it less important regardless of the expressions used, because it is prototypically a topic position; others are guided more by the context and by other indicators of possible focus.</p><p>This variability applies especially to adverbials located before the verb in the surface word order, to focalized phrases, and to predicate verbs in the left part of the sentence <ref type="bibr" target="#b11">[11]</ref>. Example (8) presents an ambiguous interpretation of the conditional phrase at the beginning of the sentence. One annotator considers it to be part of the very message of the sentence, the other a mere unimportant circumstance.
Each of them thus perceives the sentence as a response to a different (unspoken) context, as shown by the contextual questions at the end of each interpretation. (Expressions in the topic are underlined; the focus is marked in bold.) <ref type="bibr" target="#b8">(8)</ref> (What will your situation be like if you take full advantage of your present-day capacities?) <ref type="bibr">([11]</ref>; control multiple annotation of the PDT 2.0, <ref type="bibr" target="#b12">[12]</ref>)</p><p>In example <ref type="bibr" target="#b9">(9)</ref>, two indicators of importance (belonging to the topic or to the focus) collide. The observed phrase is located at the beginning of the sentence, a position typical of the topic, but at the same time it is emphasized by a focalizer. Annotators perceive its role in the information structure of the sentence differently.</p><p>(What does the firm do especially in Olomouc?) <ref type="bibr">([11]</ref>; control multiple annotation of the PDT 2.0 <ref type="bibr" target="#b12">[12]</ref>)</p><p>In example <ref type="bibr" target="#b10">(10)</ref>, a striking feature of verbs can be seen: expressions depending on verbs often tend to be communicatively more important than the verbs themselves. This can make the role of predicate verbs in the information structure unclear: annotators do not agree whether to classify them as focus or as topic. We have already observed the unclear importance of verbs relative to their dependents in two sets of examples. Example (5) shows the unclear role of a verb with a general meaning in the discourse structure. Examples (6-7) show the unclear role of a verb of thinking/saying in the discourse structure, compared to the clear role of the dictum.</p><p>As the previous subsection showed, the variety in understanding coherence relations often stems from certain linguistic forms (a specific word order pattern, etc.).
However, the language itself often provides no clue: we cannot tell which phrase or syntactic construction was vague enough to allow for multiple readings. The diversity here stems from the recipients' different experience, expectations and knowledge of the world. This type of inter-annotator disagreement is difficult for linguistics to grasp. Nevertheless, since we can document it well in our data, we take the liberty of presenting a few of these phenomena here; they may serve as inspiration for, e.g., psycholinguistic research.</p><p>At the local level, subjectivity can be seen in the perception of importance in information structure (cf. <ref type="bibr" target="#b21">[21]</ref>), i.e., in what people see as the topic or focus of a sentence. This variation is also found in discourse relations in Rhetorical Structure Theory. RST differentiates between the more substantial and the less substantial argument of a discourse relation (nucleus and satellite, respectively; cf. <ref type="bibr" target="#b8">[8]</ref>). In example <ref type="bibr" target="#b11">(11)</ref>, adjacent sentences have the same syntactic structure, connected by the phrase not only ... but also. One annotator considers both parts of these sentences to be equally important and marks a multinuclear relation of contrast between them. The other understands the second parts (starting with but also) as emphasized and more important. This annotator thus marks the relation as antithesis, with the nucleus in the second part. <ref type="bibr" target="#b11">(11)</ref> At the global level, in annotations according to Rhetorical Structure Theory, annotators also differ in the perceived importance of individual parts of news reports.
Typically, one annotator understands the introductory part as the central message, to which details are added in the following text, while the other perceives the same part as a preparation for the main message, which only follows afterwards (<ref type="bibr" target="#b8">[8]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.7.">Text dimensions</head><p>Inter-annotator agreement can also be affected by text dimensions. Coreference research shows that the larger the network of possible antecedents for a given word in a text, the greater the disagreement between annotators ([10, p. 221]; cf. the opportunities for disagreement in example (4)). The author further states that divergent interpretations of coreference can also be chained: if annotators differ in their interpretation of expressions at the beginning of a coreference chain, the difference can carry over to other expressions with a similar meaning later in the text.</p><p>It remains an open question how the size of the text affects the variability in understanding other coherence relations, such as discourse relations and information structure; we have not yet conducted research in this direction. For discourse relations, a longer text can in theory offer more candidate arguments for a given discourse connective. A longer text is probably also more layered in terms of the author's speech and reported speech, metacommunication, insertions, etc., which again offers more room for different understandings of discourse and other relations. On the other hand, a longer text can describe more precisely the context in which the discourse relations are interpreted, and thus contribute to clearer understanding. This raises a further question: does the variability in the understanding of coherence relations differ between the beginning of a text (where the text is still short, so few potential members of different relations are available, but there is also little context) and its later parts?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this study, we examined what the occurrences of inter-annotator disagreement in coherence relations, specifically in discourse relations, coreference and information structure, have in common. We were mainly concerned with features given by the language itself; we touched only marginally on cases of disagreement that result from differences between speakers. We have also formulated several questions that may become the subject of further research.</p><p>Coherence relations can be divided into formally expressed relations (e.g. discourse relations signalled by an explicit discourse connective, or information structure expressed by word order) and unexpressed relations that are understood from the context (e.g. the coreference relation between the words text and chapter in a specific text).</p><p>In formally unexpressed relations, disagreement occurs naturally: what is inferred from the context depends on the recipients. Formally expressed relations can also be interpreted differently. Annotators may disagree on the very existence of a coherence relation; such disagreement is usually based on the polysemy (polyfunctionality) of the linguistic form (expression), which functions as a signal of coherence in some contexts but not in others. In addition, coherence signals can lead to different perceptions of the semantic type of a discourse relation (in cases where speakers agree on its existence); this is caused by the semantic underspecification of the language forms that express coherence (discourse connectives). The general question arises whether, as recipients, we need to understand textual coherence in detail in all contexts, i.e. to distinguish not only the bare existence of coherence relations but also their semantic coloring. 
What level actually represents a functional and sufficient understanding of the text?</p><p>Lexical specificity also plays an important role in the understanding of autosemantic words, i.e. expressions that do not function primarily as signals of coherence. Coreference research shows that for abstract and generic noun phrases in a text, recipients have difficulty determining whether the words have the same content; in contrast, for words with a concrete, specific meaning, coreference is easier to determine. The same applies to the semantic concreteness of verbs: for verbs with vaguer, more general meanings, it is difficult for annotators to determine whether or not they are part of discourse arguments. Their meaning seems too insignificant, whereas the content of their dependent words is more important.</p><p>This observation also applies to verbs of thinking and saying in the relation of attribution, where the content of the reported speech seems communicatively more essential than the act of communication itself. In the case of attribution, there is another reason for diverse interpretations of the text: attribution is one of the forms of text arrangement (alongside parentheses, meta-comments on the communication, etc.), i.e. a complication of the simple basic line of the narrative. It thus gives different recipients the opportunity to interpret the overall structure of the text differently.</p><p>In addition to individual words, such as various coherence operators or autosemantic expressions, word order can also cause disagreement in text understanding. Specifically, in Czech and other Slavic languages, word order affects the understanding of the information structure. 
When expressions with higher communicative dynamism (informativeness) appear in the left, topical part of the sentence, which prototypically has low communicative dynamism, annotators typically evaluate them in contradictory ways.</p><p>In many types of annotation, it turns out that annotators perceive the importance of individual parts of the text and their (hierarchical) connections differently. These disagreements are often caused not so much by special properties of the text as by differences between the annotators (e.g. in their knowledge of the language, knowledge of the world, expectations, or experience with different text genres). This area seems particularly suitable for future psycholinguistic research focusing on specific domains of coherence. For example, one could examine the influence of respondents' literacy on the understanding of coreference in abstract words, or how children learn to arrange a text.</p><p>The last factor we dealt with is text dimensions. Its effect on different readings has been described for coreference (the longer the text, the greater the disagreement in interpretation). For other coherence relations, this factor is still unexplored. We hypothesized that for discourse relations and information structure, text dimensions could influence the degree of disagreement in both directions; the degree of disagreement may also vary with the position in the text and the amount of preceding context (early vs. later in the text). These ideas suggest possible directions for further research on variation in the comprehension of text coherence.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>Don't ask me why I came. 
Because EXPLICATION it's normal to come here.</figDesc><table><row><cell>Annotation 2: no explicit discourse relation; the word přece (after all) expresses the stance of the speaker</cell></row><row><cell>Neptejte se mě, proč jsem přijel do Prahy. Je to přece normální sem přijet.</cell></row><row><cell>Don't ask me why I came. After all, it's normal to come here.</cell></row><row><cell>(according to [6, p. 63]; multiple annotation of the PDiT-EDA 1.0 [17])</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head></head><label></label><figDesc>(4) (context: interview with child psychiatrists who published the Czech book Children, Family and Stress)</figDesc><table><row><cell>-Materiálům, které dnes máte k dispozici, předcházel dlouholetý výzkum.</cell></row><row><cell>-Zdeněk Dytrych: Od roku 1969, kdy jsme založili v bývalém Výzkumném ústavu psychiatrickém Oddělení pro výzkum rodiny, se hlavně zabýváme touto problematikou.</cell></row><row><cell>Měli jsme samozřejmě řadu spolupracovníků a za pětadvacet let jsme v týmu udělali téměř nekonečnou řadu prací.</cell></row><row><cell>Tak například rozsáhlý výzkum rozvodovosti.</cell></row><row><cell>-The materials you have at your disposal today were preceded by long-term research.</cell></row><row><cell>-Zdeněk Dytrych: Since 1969, when we founded the Department for Family Research in the former Research Institute of Psychiatry, we have mainly been dealing with this issue.</cell></row><row><cell>Of course, we had a number of collaborators, and in twenty-five years we have done an almost endless amount of work as a team. [lit.: endless amount of works (plural), which can also mean publications, ŠZ]</cell></row><row><cell>For example, extensive research on the divorce rate.</cell></row><row><cell>([10, pp. 223-226]; multiple annotation of the PDT 3.0 [13])</cell></row></table><note>In example (4), the question is how the last sentence is related to the previous text: what is the research on the divorce rate supposed to serve as an example of? One annotator sees the phrase research on the divorce rate as an example of a series (amount) of works in the previous sentence, while the other sees it as an example of the long-term research in the first sentence. Is a series (amount) of works (publications?) the same as research? Or are the works (publications) only the result of research, i.e. one part of it? Similar contradictions are quite common in the understanding of the coreference of generic and abstract terms.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>[Při využití všech výukových prostor od rána až do večera] 0-subject jsme schopni ročně přijmout ke studiu okolo 2500 studentů. Lit.: [When using all classrooms from morning till evening] we_are able a_year to_accept to_studies about 2500 students.</head><label></label><figDesc></figDesc><table><row><cell>(Context: Po ekonomech, kteří nyní už opouštějí školu se znalostí pravidel hry v tržním prostředí, je hlad. Co hodláte udělat, aby jich bylo dost?</cell></row><row><cell>There is a great demand for economists who are now leaving school with a knowledge of the rules of the game in a market environment. What do you intend to do to ensure there are enough of them?)</cell></row><row><cell>Annotation 1:</cell></row><row><cell>[When using all our classrooms during the whole day], we are able to accept about 2500 new students a year.</cell></row><row><cell>(How is your present-day situation?)</cell></row><row><cell>Annotation 2:</cell></row><row><cell>[Při využití všech výukových prostor od rána až do večera] jsme schopni ročně přijmout ke studiu okolo 2500 studentů.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head></head><label></label><figDesc></figDesc><table><row><cell>&lt;Arg1: Jan Kotík nemaluje jen očima a rukou,&gt; CONTRAST / ANTITHESIS &lt;Arg2: ale také mozkem.&gt;</cell></row><row><cell>&lt;Arg1: Jeho obrazy tedy vyžadují nejen citlivost a vnímavost,&gt; CONTRAST / ANTITHESIS &lt;Arg2: ale také přemýšlení.&gt;</cell></row><row><cell>&lt;Arg1: Jan Kotík paints not only with his eyes and hands,&gt; CONTRAST / ANTITHESIS &lt;Arg2: but also with his brain.&gt;</cell></row><row><cell>&lt;Arg1: Therefore his paintings require not only sensitivity and receptivity,&gt; CONTRAST / ANTITHESIS &lt;Arg2: but also thinking.&gt;</cell></row><row><cell>(Czech RST Discourse Treebank 1.0 [3])</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">This method has its limitations: it is questionable to what extent we correctly interpret the real reasons for inter-annotator disagreement. What we see as variation based on a language feature might have been regarded by the annotator as a plain oversight. We do not have the annotators' explanations of their decisions. These questions are being addressed in ongoing research by Anna Nedoluzhko; for the time being, we consider this method appropriate for the present analysis as a pilot study.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">According to the approach of the Prague Dependency Treebank 2.0, a colon is understood as an explicit discourse connective (<ref type="bibr" target="#b20">[20]</ref>).</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The research reported in this paper was supported by the Czech Science Foundation (project no. 24-11132S, Disagreement in Corpus Annotation and Variation in Human Understanding of Text); part of the data used comes from project no. LM2018101 of the Czech Ministry of Education, Youth and Sports (Digital Research Infrastructure for Language Technologies, Arts and Humanities).</p><p>The author would like to express her gratitude to Prof. E. Hajičová for careful proofreading of the manuscript, Dr. J. Mírovský for help with the technical processing of the text, and F. Zikánová for the language examples. Thank you all for the pleasant cooperation.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The &quot;problem&quot; of human label variation: On ground truth in data, modeling and evaluation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Plank</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.emnlp-main.731</idno>
		<ptr target="https://aclanthology.org/2022.emnlp-main.731" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting>the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="10671" to="10682" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Prasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joshi</surname></persName>
		</author>
		<ptr target="https://hdl.handle.net/11272.1/AB2/SUU9CB" />
		<title level="m">Penn Discourse Treebank Version 3.0</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Poláková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hajičová</surname></persName>
		</author>
		<title level="m">Czech RST Discourse Treebank 1.0</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Analyzing the most common errors in the discourse annotation of the Prague Dependency Treebank</title>
		<author>
			<persName><forename type="first">P</forename><surname>Jínová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Poláková</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Hendrickx</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Kübler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Simov</surname></persName>
		</editor>
		<meeting>the 11th International Workshop on Treebanks and Linguistic Theories<address><addrLine>Lisboa, Portugal</addrLine></address></meeting>
		<imprint>
			<publisher>Edições Colibri</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="127" to="132" />
		</imprint>
		<respStmt>
			<orgName>Universidade de Lisboa</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Typical cases of annotators&apos; disagreement in discourse annotations in Prague Dependency Treebank</title>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mladová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jínová</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), European Language Resources Association</title>
				<meeting>the 7th International Conference on Language Resources and Evaluation (LREC 2010), European Language Resources Association<address><addrLine>Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="2002" to="2006" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Implicitní diskurzní vztahy v češtině [Implicit Discourse Relations in Czech]</title>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<pubPlace>Prague, Czech Republic</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Charles University, Faculty of Mathematics and Physics</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Functions and translations of underspecified discourse markers in TED talks: a parallel corpus study on five languages</title>
		<author>
			<persName><forename type="first">L</forename><surname>Crible</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Á</forename><surname>Abuczki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Burkšaitienė</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Furkó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nedoluzhko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Oleskeviciene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rackevičienė</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Pragmatics</title>
		<imprint>
			<biblScope unit="page" from="139" to="155" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Developing a Rhetorical Structure Theory Treebank for Czech</title>
		<author>
			<persName><forename type="first">L</forename><surname>Poláková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hajičová</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), European Language Resources Association</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Hoste</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sakti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</editor>
		<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), European Language Resources Association<address><addrLine>Torino, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="4802" to="4810" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Connective-based measuring of the inter-annotator agreement in the annotation of discourse in PDT</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mladová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)</title>
				<editor>
			<persName><forename type="first">C.-R</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<meeting>the 23rd International Conference on Computational Linguistics (Coling 2010)<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<publisher>Tsinghua University Press</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="775" to="781" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Rozšířená textová koreference a asociační anafora (Koncepce anotace českých dat v Pražském závislostním korpusu) [Extended nominal coreference and bridging anaphora (An approach to annotation of Czech data in the Prague Dependency Treebank)]</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nedoluzhko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Studies in Computational and Theoretical Linguistics, Ústav formální a aplikované lingvistiky</title>
				<meeting><address><addrLine>Prague, Czech Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Identification of Topic and Focus in Czech: Comparative Evaluation on Prague Dependency Treebank</title>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Týnovský</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Studies in Formal Slavic Phonology, Morphology, Syntax, Semantics and Information Structure. Formal Description of Slavic Languages 7</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Zybatow</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Junghanns</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Lenertová</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Biskup</surname></persName>
		</editor>
		<meeting><address><addrLine>Frankfurt am Main, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>Peter Lang</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="343" to="353" />
		</imprint>
		<respStmt>
			<orgName>Universität Leipzig</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Hajič</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Panevová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hajičová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sgall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pajas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Štěpánek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Havelka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mikulová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Žabokrtský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ševčíková-Razímová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Urešová</surname></persName>
		</author>
		<title level="m">Prague Dependency Treebank 2.0</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Bejček</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hajičová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hajič</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jínová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kettnerová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kolářová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mikulová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nedoluzhko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Panevová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Poláková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ševčíková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Štěpánek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<title level="m">Prague Dependency Treebank 3.0</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">The Penn Discourse TreeBank 2.0</title>
		<author>
			<persName><forename type="first">R</forename><surname>Prasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dinesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Miltsakaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Robaldo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings, 6th International Conference on Language Resources and Evaluation</title>
				<meeting>6th International Conference on Language Resources and Evaluation<address><addrLine>Marrakech, Morocco</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="2961" to="2968" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Anchoring a Lexicalized Tree-Adjoining Grammar for discourse</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">L</forename><surname>Webber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Joshi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W98-0315" />
	</analytic>
	<monogr>
		<title level="m">Discourse Relations and Discourse Markers</title>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">The Meaning of the Sentence in Its Semantic and Pragmatic Aspects</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sgall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hajičová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Panevová</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1986">1986</date>
			<publisher>Springer Science &amp; Business Media</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Synková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mírovský</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Enriched discourse annotation of PDiT subset 1.0</title>
		<title level="s">PDiT-EDA 1.0</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Annotation Guidelines for Rhetorical Structure</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taboada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>Manuscript</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Inter-annotator agreement</title>
		<author>
			<persName><forename type="first">R</forename><surname>Artstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of Linguistic Annotation</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="297" to="313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Poláková</surname></persName>
		</author>
		<title level="m">Discourse Relations in Czech</title>
		<meeting><address><addrLine>Prague, Czech Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
		<respStmt>
			<orgName>Faculty of Mathematics and Physics, Charles University in Prague</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Identification of Topic and Focus in Czech: Evaluation of Manual Parallel Annotations</title>
		<author>
			<persName><forename type="first">Š</forename><surname>Zikánová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Týnovský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Havelka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Prague Bulletin of Mathematical Linguistics</title>
		<imprint>
			<biblScope unit="page" from="61" to="70" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
