<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pragmatic Markers in the Corpus “One Day of Speech”: Approaches to the Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kristina Zaides</string-name>
          <email>kristina.zaides@student.spbu.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatiana Popova</string-name>
          <email>tipopova13@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia Bogdanova-Beglarian</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The article describes the scheme for annotating pragmatic markers in the corpus of Russian everyday speech “One Day of Speech”. Pragmatic markers (PMs) are defined as special units of speech that have only a pragmatic function, without any lexical meaning or with a 'bleached' one. The annotation of pragmatic markers is usually performed manually because markers are ambiguous in different contexts. The typology of pragmatic markers includes different groups marked with special annotation tags. The annotation process was split into two stages, since several issues in the tagging of PMs arose. The main problems that occurred during the annotation process and possible ways of solving them are also discussed. The paper proposes improved methods of problem solving during the annotation of pragmatic markers in a corpus of oral speech, which can be useful for the linguistic annotation of other levels of oral speech.</p>
      </abstract>
      <kwd-group>
        <kwd>Pragmatic Marker</kwd>
        <kwd>Spoken Speech</kwd>
        <kwd>Corpus of Everyday Speech</kwd>
        <kwd>Corpus Linguistics</kwd>
        <kwd>Corpus Annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The annotation of a corpus is the main linguistic tool in the corpus structure; it is used
for obtaining correct search results and meta-information about texts and authors
(speakers). The number of corpora of oral speech is growing rapidly
around the world, which raises an important and relevant issue in modern linguistics:
developing the basic principles of speech annotation, including the annotation of
units that have not been described in the scientific literature before. Besides the
well-known and widespread levels of annotation, such as the marking of prosodic units,
part-of-speech tagging, and syntactic and semantic parsing, certain linguistic
information should be tagged for some modern research tasks in communication
studies, in particular discourse and pragmatic annotation. While the automatic
annotation of corpus material is implemented by a number of special parsers,
pragmatic annotation is still carried out manually because the instruments for such
annotation are only expected to be developed in the near future [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Moreover, many kinds
of pragmatic annotation involve patterns and details of speech that cannot be
handled by an automatic tool, e.g., speech act analysis or the identification of pragmatic
markers. This paper presents the results of two stages of pragmatic marker
annotation; therefore, we first focus on the definition of the term pragmatic marker and its
characteristics.
      </p>
      <p>
        Pragmatic marker (PM) is a relatively new term in linguistics, introduced in
this meaning by N.V. Bogdanova-Beglarian [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which refers to particular
speech units: words, expressions and phrases fulfilling different pragmatic functions
in discourse. The meaning of the term discourse marker (DM) does not coincide with
the content of the term pragmatic marker, since the two terms describe different groups of
discourse/pragmatic units, although both groups are able to structure
discourse, albeit by different means. Discourse markers usually either navigate the
paragraphs of a text or reveal temporal, causal, conditional and numerous other relations
between its fragments, being meaningful content words with a certain lexical meaning.
A brief literature review, based on different researchers’ understanding of DMs, can
identify the specificity of these units more precisely.
      </p>
      <p>
        B. Fraser defines the DM as “a pragmatic class, lexical expressions drawn from the
syntactic classes of conjunctions, adverbials, and prepositional phrases” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
representatives of this class mainly “signal a relationship between the segment they
introduce, S2, and the prior segment, S1” [Ibid.]. Basically, according to B. Fraser, they
fall into two types: “those that relate aspects of the explicit message conveyed by S2
with aspects of a message, direct or indirect, associated with S1; and those that relate
the topic of S2 to that of S1” [Ibid.]. The researcher characterizes the DM as “a
linguistic expression only which: (i) has a core meaning which can be enriched by the
context; and (ii) signals the relationship that the speaker intends between the utterance
the DM introduces and the foregoing utterance” [Ibid.]. As it is explained, “they
function like a two-place relation, one argument lying in the segment they introduce, the
other lying in the prior discourse” [Ibid.]. Syntactically, DMs do not form a separate
syntactic category. B. Fraser earlier identified so-called pragmatic markers as
“structures and expressions which linguistically encode aspects of the speaker’s direct
communicative intention” [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that “do not contribute to the propositional content of
the sentence but signal different types of messages” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        D. Schiffrin argues that DMs do not fit completely into some linguistic category
since their main function lies in adding to discourse coherence and providing
“contextual coordinates for ongoing talk” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: DMs are “sequentially dependent elements
which bracket units of talk” [Ibid.], which can be sentences, propositions, speech acts,
tone units, etc.
      </p>
      <p>
        L. Schourup describes as DMs “conversational particles such as well and oh,
parenthetical lexicalized clauses such as y’know and I mean, and a variety of connective
elements in speech and writing, including so, after all, and moreover” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. L. Schourup
also
pointed out that “DMs are more often regarded as comprising a functional class that
draws on items belonging to various syntactic classes” [Ibid.].
      </p>
      <p>
        E. Traugott notes that DMs “allow speakers to display their evaluation not of the
content of what is said, but of the way it is put together, in other words, they do
metatextual work” [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The author suggests that DMs (this work investigates the markers indeed,
in fact, and besides) follow a grammaticalization path from a
clause-internal adverbial through a sentence adverbial to a discourse particle, a subtype of
the class of discourse markers [Ibid.].
      </p>
      <p>
        In annotation practice, hesitation disfluencies are sometimes classified as
discourse markers [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We believe that such an approach is not very productive, since
hesitations can be detected automatically and are usually treated as phonetically filled
hesitation pauses rather than as markers.
      </p>
      <p>
        In contrast, pragmatic markers derive from both content and function words
(nouns, verbs, adverbs, prepositions, etc.). In the course not only of
grammaticalization but also of pragmaticalization, they lose (in whole or in part) their
lexical and/or grammatical meaning and acquire a pragmatic one in some of their everyday
speech usages. A content or function word becomes a PM through
pragmaticalization: as a result, the role of its pragmatic component increases and the role of its
meaning-bearing component decreases. The pragmatic function of a PM becomes the
leading one for a given word, although the grammatical component may still be present
(for example, Aijmer reports that some units like I think are pragmaticalized, but they
still have tense, aspect, and mood [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). In this understanding, pragmatic markers
such as you know, I think, sort of, actually, and that sort of thing, “have the function
of checking that the participants are on the same wavelength or of creating a space for
planning what to say making revisions, etc.” [Ibid.]. PMs in the discourse approach
“express speaker attitude to what has gone before, what follows, the discourse
situation, and so forth” [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The further development of a pragmatic marker includes the
lexicalization of a new meaning in everyday speech through its use as a speech
automatism and the assignment of a special function to this marker in a certain
communicative context [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The group of discourse markers is formed by words and phrases that
grammatically belong to various parts of speech. The existence of this term, for the most part,
points to a new approach to discourse analysis and provides an opportunity to
investigate discourse relations more precisely. Although the words belonging to the group of
discourse markers are different parts of speech, all of them are able
to structure spoken speech or written text. The range of pragmatic
markers, as is assumed here, consists of functionally “new” words,
which originate from existing full-meaning lexemes but are now
related to the original words as homonyms. Thus, the class of discourse markers is largely
a way of analyzing a text in terms of the functions of the markers that organize it,
whereas the group of pragmatic markers arguably forms a new
independent set of function words through their use as speech automatisms; see the
examples below:</p>
      <p>1. ‘vidish/-te’ (V, 2, Sing./Plur.) (you see) is used to attract the listener’s attention
to the subject of speech, but not to point at the item that both the speaker and the
listener see (e.g., it is used during telephone conversation);</p>
      <p>2. ‘sejchas-sejchas-sejchas’ (one moment) or ‘minutochku-minutochku’ (wait a
minute) appear in speech as hesitation pragmatic markers which force the listener to
wait a moment until the word that the speaker is looking for is found.</p>
      <p>
        The distinction between pragmatic and discourse markers is formed by the
following points [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
      </p>
      <p>a) PMs are used in speech unconsciously, without any reflection, at the level of
speech automatisms; DMs are put in text consciously, in order to structure its parts in
a certain order;</p>
      <p>b) PMs have no lexical and/or grammatical meaning, or only a weakened,
partly bleached one; they are almost completely “agrammatical”; DMs have full lexical
meaning and a grammatical paradigm;</p>
      <p>c) PMs are not content-bearing units of speech and have only functions; DMs
have their own definite meaning as content words;</p>
      <p>d) PMs are used essentially only in oral spontaneous speech and cannot be found in
written texts (except for imitations of oral speech, e.g., in modern plays or movies); DMs
occur equally in written and oral texts;</p>
      <p>
        e) PMs usually express speakers’ attitude to the very process of speech production
with all related difficulties being sometimes meta-communicative [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; DMs always
convey only speakers’ evaluation of the subject discussed and its characteristics, but
not of the text that they produce;
      </p>
      <p>f) PMs are not included in dictionaries in their functional diversity; DMs are
an integral part of traditional lexicography as words, on the one hand, and are the
subject of discourse-related studies, on the other.</p>
      <p>The typology of pragmatic markers is discussed in detail in the section of
this paper that concerns the annotation of the material and the system of tags.</p>
    </sec>
    <sec id="sec-2">
      <title>Practical Significance of the Annotation of Pragmatic Markers</title>
      <p>The results obtained by analyzing large corpus material help to clarify
traditional views of the communicative act by identifying discourse units
(different types of pragmatic markers) that are uttered in speech in order to solve
particular communicative tasks. With the help of PMs, a speaker explicitly
verbalizes his/her communicative intentions and attitude to the addressee, and appeals to the
perceptual basis shared with his/her interlocutors. Owing to the presence of PMs,
the hearer can perceive not only the truth-conditional, informative level of speech, but also
its structural level, and can understand how the communication itself functions:
PMs mark the beginning and the end of a speech act or an utterance, the search for words and
omissions of lexemes, the stressing of important parts, any disfluencies, and the call to
continue the interaction.</p>
      <p>A detailed elaboration of the pragmatic annotation of spontaneous speech makes it possible to
create algorithms for automatically checking the annotation. Almost every
PM has a homonymic counterpart that has a full meaning in the sentence and belongs to a part of
speech. Therefore, a distinction based on a hesitation pause after the PM, e.g. ‘sejchas’,
cannot be used, since a hesitation break can follow the adverb ‘sejchas’ as well as
the homonymic PM. Each decision about marking a PM should be made
taking into account the context around the PM “candidate”. However, further annotation
steps will certainly show that some degree of automation can be introduced into the
tagging. The ability to implement the analysis of the functional and structural sides of
language in a natural language processing system will, in turn, contribute to the
formation of an artificial perceptual basis. It will also become possible to improve the
modeling of realistic spoken dialogue in “human–computer/robot/machine” interfaces, which is
a highly relevant issue in robotics and artificial intelligence development.</p>
      <p>
        Obtaining a full inventory of the pragmatic markers of oral speech is also
important in such applications as linguodidactics and translation practice. In particular,
introducing natural spoken speech materials into textbooks for foreign
students is essential for training them to understand fluent Russian speech and to
avoid many communicative failures. PMs that native speakers use
easily and naturally, at the level of speech automatisms, do not prevent them from perceiving the
meaning of a message and remain outside their perceptual field [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
For foreign speakers, however, these markers fall into the perceptual field and can cause considerable
difficulties in communication in a non-native language.
      </p>
      <p>Besides, the typical range of pragmatic markers may be specific to a
particular speaker; consequently, this information may be used to identify
diagnostic features of an age, gender, social or psychological group in
linguistic or forensic examination of audio recordings of oral speech.</p>
      <p>As one can see, the annotation of pragmatic markers is required for different
linguistic, scientific, and practical needs. This study presents one possible way
to organize the process and to develop methods of pragmatic annotation that
can be applied to the analysis of data from different corpora.</p>
    </sec>
    <sec id="sec-3">
      <title>Research Material</title>
      <p>
        The research was carried out on material from the corpus of Russian everyday
speech “One Day of Speech” (ORD), which is one of the most representative
resources for the analysis of Russian oral spontaneous dialogic and polylogic speech.
The ORD corpus contains 1,250 hours of speech recorded from 128 informants,
who are native speakers of Russian living in St. Petersburg, and more than 1,000 of
their interlocutors; together they represent various social groups [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. The recordings
were made using the method of 24-hour recording of a speech day [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and, after
recording, the collected material was transcribed in the ELAN linguistic annotator. The
ELAN files contain several main levels of annotation: the transcribed phrase, the speaker
who pronounced the phrase, his/her voice characteristics, real-life events
that accompanied the recording, phonetic and phrase commentaries, notes, and
the episode to which the communicative situation belongs [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>A pilot subcorpus balanced by gender and age was created for the first
annotation of pragmatic markers. Twelve episodes of corpus speech taken from
12 recordings of different speakers were annotated by a group of four annotators
working independently of one another; the total duration amounts to 1 hour 46 minutes
(10,259 word tokens). For the annotation, additional levels were created in the ELAN files:</p>
      <p>• PM, which contains the pragmatic marker in its orthographic form;</p>
      <p>• Function PM, which indicates the functions of the PM;</p>
      <p>• Speaker PM, which marks the speaker’s code;</p>
      <p>• Comment PM, which holds other comments connected with the
specific PM usage.</p>
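      <p>The four additional levels listed above can be pictured as one record per marker occurrence. The following is only a minimal sketch under stated assumptions: the class name PMAnnotation, its field names, and the helper is_multifunctional are hypothetical, and the code does not reproduce the ELAN file format or any ELAN interface. The sample values follow one of the corpus examples given in this paper (the marker ‘vot’ used by speaker S130 with a navigational and a hesitative function).</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import List


@dataclass
class PMAnnotation:
    """One pragmatic marker occurrence (hypothetical record, not the ELAN format)."""

    marker: str            # level "PM": the marker in its orthographic form
    functions: List[str]   # level "Function PM": one or more function tags
    speaker: str           # level "Speaker PM": the speaker's code
    comment: str = ""      # level "Comment PM": free-text remarks, empty by default

    def is_multifunctional(self) -> bool:
        """Return True when the marker was given more than one function tag."""
        return len(self.functions) > 1


# Sample record: 'vot' tagged as navigational and hesitative for speaker S130.
example = PMAnnotation(marker="vot", functions=["NAVIG", "HES"], speaker="S130")
print(example.is_multifunctional())
```
      </preformat>
      <p>Keeping the function tags in a list rather than in a single string is a design choice of this sketch; it makes multifunctional markers easy to detect and compare across annotators.</p>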
    </sec>
    <sec id="sec-4">
      <title>Development of the System of Tags and Stages of the Annotation</title>
      <p>
        For the annotation, a special system of tags was elaborated that included references
to the groups of pragmatic markers already described in the scientific literature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Briefly, for a marker from each group, the function reflected in the group’s name
is the main one, but many markers have several functions, i.e., share the
common feature of multifunctionality. In the typology of tags below, which was
developed to match the system of pragmatic markers itself, cases of marker
multifunctionality are specially commented on.
      </p>
      <p>1. APPR, a marker-approximator that expresses the speaker’s uncertainty and hedging:</p>
      <p>• ne znayu // *P vidish' / chego-to Kirill% govorit / chto gips luchshe / yesli
(e-e) / tsement bystro vysokhnet / v malen'kikh dyrkakh kak by / yesli
tsement bystro vysokhnet / to (:) on ne budet prochnym [S1];</p>
      <p>2. DEICT, a deictic marker that points at something vague and consists of three
elements, two of which are ‘vot’:</p>
      <p>• nu v obshchem defekt kishki / kogda (e) na nej takoj otrostochek / kak
byvaet vot (...) (e-e) v venakh / kak appendiks / vot takoj vot kakoj-to tam
[S130];</p>
      <p>3. ZAMEST-PR, a replacement marker for a whole enumeration or a part of
it:</p>
      <p>• Natasha% / vy uzhe otpustili etogo / () Аlekseya%(:) / Maksima% / i vsego
prochego ? *P vot [S19];</p>
      <p>• ya govoryu ya togda v devyati tri... tam k devyati pyatnadtsati pridu /
poka to syo... [S124];</p>
      <p>4. ZAMEST-CHR, a replacement marker for someone’s speech, e.g.,
‘bla-bla-bla’:</p>
      <p>• a / my s toboj zhe byli / pomnish' / Nastya% i Katya%. Аaaa… Kat'ku%
ya videla paru raz v universitete / nu / my s nej poskol'ku ne obshchalis' /
postoyali / «privet-privet» tam / bla-bla-bla [this example is borrowed
from the Russian National Corpus];</p>
      <p>5. XEN, a quotational marker which marks someone else’s speech before it
appears in the utterance:</p>
      <p>• nikto poka nichego ne mozhet vnyatnogo skazat' / vse tol'ko razvodyat
rukami / (e) i govoryat / nu / sochuvstvuyu tipa mol / *P namekayut
chto(:) prosto da / oforml... oformlyaj novuyu strakhovku i(:) (...) zhivi
spokojno [S110];</p>
      <p>6. MET, a meta-communicative marker which fulfills a meta-communicative
function: establishing contact and understanding between speakers and expressing the
speaker’s reflection on his/her own speech:</p>
      <p>• nu i Vadik% priezzhaet / *P i oni yemu govoryat slushaj chuvak my tebe
vsyo otremontirovali / *P tol'ko my tebe koroche (...) (e-e) v bak (...)
vmesto(:) (e) dizelya devyanosto vos'moj zalili [S72];</p>
      <p>• nu Аndrej% / togda vy smotrite / znachit ya do devyati budu (...) nu (e)
telefon vyklyuchu / i otvechat' ne budu / to est' ya prosnus' gde-to v devyat' s
kopeechkami / budu uzhe (e) min... vy uzhe v eto vremya budete ekhat'
[S123] (during a telephone conversation);</p>
      <p>7. NAVIG, a navigational marker which serves as a structuring device:</p>
      <p>• nu i (...) a do etogo proverili / zheludok vsyo khorosho / a tut polosnaya
operatsiya / vot eto ya vsyo ... / vot eto pervaya chast' Kazani u menya
byla normal'naya / a vtoraya chast' (...) vot ya vot na etikh samykh zvonkakh
nepreryvnykh [S130] (the marker ‘vot’ also fulfills the hesitative function
here);</p>
      <p>8. SEARCH, a searching marker that helps the speaker to find the word or
expression he/she is looking for:</p>
      <p>• no pri etom b***d' / *P chuvstvuyesh' takoe na***j opustosheniye ! vnutri
katarsis chuvstvuyesh' // kak eto b***d' () Gracheva% govorila nado //
*V ochishcheniye cherez stradaniye [S15];</p>
      <p>9. REFL, a reflexive marker which expresses the speaker’s reaction to what is said:</p>
      <p>• v itoge my vyzyvali kakogo-to traktorista // *P # khorosho chto nashli vy
traktorista // # ugu // *P ili yeshchyo chego-to takoye / i koroche
vytaskivali Vadika% ottuda // @ ugu [S72 and W1];</p>
      <p>10. RHYTHM, a rhythm-forming marker that gives rhythm to the utterance:</p>
      <p>• vot sejchas uzhe batarei dali / uzhe on bystro vysokhnet // a tak by vot /
vot kogda dozhdi shli / vot khorosho bylo by zadelat' [S1];</p>
      <p>11. SELFCORR, a marker of self-correction:</p>
      <p>• yarkaya solnechnaya pogoda // govorit' mozhno? tak byl yark… ∫ eto
samoe ∫ byl ∫ iyul'skij den' / vot / nebo bylo chistym / bezoblachnym /
solntse ∫ svetilo (this case is taken from the corpus “Balanced Annotated
Collection of Texts”, another corpus of oral speech, created by the same group
of linguists who created the ORD corpus);</p>
      <p>12. START, a marker of the beginning of an utterance or of the process of speech
production:</p>
      <p>• ditya moyo / znachit tak // *P ta(:)k ? // v etom (...) (m-m) v sentyabre /
budet tut vsyo vot tak / *V a v oktyabre / a ... # analogichnaya situatsiya
budet na sleduyushchej nedele // # da // @ a ... / a ... (the marker ‘znachit
tak’ also fulfills the hesitative function here);</p>
      <p>13. FIN, a marker of the end of an utterance or of the process of speech production:</p>
      <p>• nu ponyatno delo / nu y**ta / a(:) da tebe voobshche / dazhe zakonnyje
vykhodnyje mogut ne dat' / da ? ya dumayu [S110] (the marker ‘ja
dumaju’ also fulfills the hesitative function here);</p>
      <p>• tak / nu vsyo / ya ostanavlivayu zapis' / potomu chto eto pustoye / slushat'
eti kliki / vsyo ravno ya nichego bol'she ne skazhu / vse uzhe spyat [S123];</p>
      <p>14. HES, a hesitation marker:</p>
      <p>• nu tam (...) sil'no deshevle ne bylo / potomu chto ya () zdes' kak by / oni
vsyo ravno ekhali [S103].</p>
      <p>A special guideline for the annotators was elaborated. At the first stage of the
annotation process, the guideline included the tags, each consisting of the first several letters of the
corresponding function name (as shown above), and instructions: to write
the marker orthographically; to put the tags in alphabetical order, noting first the
main function(s) of the PM and then the additional function(s); and to separate
repeated markers from one another (not joining them with a hyphen). It also included a
description of how to create a new level in the ELAN program. The annotators were also
given the possibility to indicate a new function of a marker. Moreover,
before the first annotation attempt, the markers already identified and described were
illustrated with examples from the corpus, with an indication of the possible functions
they can perform. Fig. 1 shows a fragment of the table that was made to help the
annotators. The table includes the marker, its structure (whether one or more words form the
marker), examples of its use in speech in the main and additional functions, the tag, the
items-per-million value counted in previous research, and the tendency to use it in
dialogues or in monologues. In addition, the table contains a link to a document with
the so-called “described in dictionaries” usages of expressions homonymic to the pragmatic
markers. We believed that such a table would help the annotators to
detect the possible pragmatic functions of markers faster and more easily.</p>
      <p>
        After the first stage of the annotation, it turned out that the inter-annotator agreement
measured with Cohen’s kappa coefficient (for the formula, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) was very
low. The best agreement between the annotators was achieved for only three groups of PMs,
i.e., quotational markers, meta-communicative markers, and reflexives. Therefore, it was
decided to improve the guideline for the annotators. Fig. 2 presents a
fragment of the table with all possible variants of one marker that can be united under its
main type.
      </p>
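      <p>For readers unfamiliar with the coefficient, the standard two-annotator form of Cohen’s kappa compares the observed share of matching labels with the share expected by chance. The sketch below is an illustration only: the function name cohen_kappa and the toy label lists are hypothetical, and the source does not state how the four annotators were paired or how multi-tag decisions were scored, so the sketch assumes exactly one tag per item and two annotators.</p>
      <preformat>
```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items that received the same tag from both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's tag proportions, summed over tags.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same tag throughout; treated here as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (invented for illustration, not taken from the ORD corpus).
annotator_1 = ["HES", "HES", "MET", "XEN"]
annotator_2 = ["HES", "MET", "MET", "XEN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```
      </preformat>
      <p>On the toy data the two annotators agree on three of four items, and the resulting kappa is about 0.64, which is lower than the raw agreement of 0.75 because part of that agreement is expected by chance.</p>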
      <p>This step makes it possible to annotate markers automatically and to narrow down the variants to
one basic construction. Such a variety of grammatical forms reflects the process of
pragmaticalization without grammaticalization, as well as the ability of markers to
combine with other pragmatic or “meaningless” (functional) components of speech
(particles, interjections, conjunctions, etc.), and is found for all the markers
considered in the research: ‘eto’, ‘eto samoje’, ‘kak jego’, ‘ne znaju’, ‘sejchas’, ‘minutu’,
‘sekundu’, ‘tipa’, ‘vrode’, ‘kak by’, ‘takoj’, ‘bla bla’, ‘lia lia’, ‘ili kak eto’, ‘ili kak
jego’, ‘ili chto yeshchyo’ and many others.</p>
      <p>To prepare the next stage of the annotation, it was decided, first, not to reduce
all the variants of a marker to one basic structure but, during the annotation, to leave
the PM in the form in which it occurred in speech, which preserved the variety of
marker structures; second, to shorten the list of PM functions so as to exclude
the most ambiguous cases, which had revealed total disagreement among the annotators; and
third, to give the annotators the opportunity to list the main and additional functions
in free order, because of the multifunctionality of PMs mentioned in the introduction
of this paper.</p>
      <p>
        At the second stage of the annotation process, the new guideline included fewer
tags, as some of them were grouped (e.g., the group of boundary markers (G)
unified the former start, final and navigational markers), and all the tags were
cut to one letter in order to make the annotation process less time-consuming. The
same files were annotated by the same group of annotators
independently of one another, this time using the new instructions and
the new system of tags. The analysis of inter-annotator agreement showed an increased
level of agreement, up to kappa = 0.51, especially for the two annotators who are
authors of the present article [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. This means that the development of the annotation
scheme discussed above, the guideline, and the tables of variants improved the results
of the annotation. The elaborated procedure for annotating PMs is intended for
wide use in investigations involving similar methods and data.
      </p>
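      <p>The grouping of tags at the second stage can be illustrated with a small mapping. This is a sketch under stated assumptions: only the merger of START, FIN and NAVIG into the single tag G is given in the text; the one-letter forms of the other tags are not listed here, so the hypothetical function to_stage2 passes every other tag through unchanged.</p>
      <preformat>
```python
from typing import Dict, List

# Stage-1 boundary tags merged into the single stage-2 tag "G" (as stated in the text).
STAGE2_TAG: Dict[str, str] = {"START": "G", "FIN": "G", "NAVIG": "G"}


def to_stage2(tags: List[str]) -> List[str]:
    """Map stage-1 tags to stage-2 tags, keeping order and dropping repeats."""
    merged: List[str] = []
    for tag in tags:
        # Tags without a known stage-2 form are kept as they are (assumption).
        new_tag = STAGE2_TAG.get(tag, tag)
        if new_tag not in merged:
            merged.append(new_tag)
    return merged


print(to_stage2(["START", "HES"]))
print(to_stage2(["START", "NAVIG", "FIN"]))
```
      </preformat>
      <p>With such a mapping, two annotators who chose START and NAVIG for the same occurrence at the first stage would both be counted as having chosen G, which is one way the coarser tag set can raise measured agreement.</p>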
      <p>However, the annotation process cannot be carried out without any issues. The
human factor and subjectivity cannot be entirely removed from language
analysis, but there are certain annotation problems that corpus linguists can deal
with. Ways of solving this kind of annotation problem are described in the
next section.</p>
      <p>Main Annotation Problems of Corpus Material and Ways of
Their Solution</p>
      <p>
        During the manual annotation of pragmatic markers, the
group of annotators, including the authors of this research, faced several
problems involving the functions of PMs, the difference between a PM and a homonymic
expression (see also: [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]), the human factor, the prosodic features of speech, etc.
These problems and possible methods of solving them are discussed here one
by one.
      </p>
      <sec id="sec-4-1">
        <title>The Syntagmatic Division of Spontaneous Speech</title>
        <p>One of the most important issues was the syntactic and intonational division of speech into
syntagmas, which cannot be clearly defined in some cases. Resolving such
ambiguity is relevant for defining the functions of the PM ‘vot’, which acts as a marker
of the start or the end of a phrase or speech part, depending on its pre- or postposition:</p>
        <p>• da / poka vot () Marina% ne sde... da / i ne posmotrit i ne otfotografiruyet
// *P vot // *P vsyo // pozhalujsta // vsego dobrogo / do svidaniya [S19];</p>
        <p>• ya sejchas pozvonyu Marine% / i vyyasnyu // delo v tom chto / k vam
sobiralas Marina% yekhat' Zhdanova% // ne ne ne ne ne ne // *V Marina%
Glukhareva% // *N vot / *P i (:) (e-e) vot / ya vyyasnyu / poyedet ona
segodnya ili zavtra k vam [S19];</p>
        <p>• postoyannye koroche / bunty kakiye-to / sobraniya kakikh-to partij
raznykh / politicheskikh / tam vsyakikh // tam b***d' partiya na partiyu /
koroche / nu vot // *P zastrelili / odnogo na ulitse / sluchayno // *P (e) vot
/ *P vtoroj spilsya / a glavnyj geroj / koroche / u nego umerla eta
devushka [S15];</p>
        <p>• moj Seva% byl (...) v techeniye (...) tryokh / chetyryokh dnej v reanimatsii
// vo(:)t / sejchas ya yedu / (...) prosto poyedu / net / nu yego uzhe
vypisyvayut v chetverg / poyedu povezu / on menya poprosil / chto privezti
[S130].</p>
        <p>The pause after the marker means that a topic shift takes place in the
utterance. This unit can be classified as a PM of start due to its position at the beginning
of a new phrase. However, it is unclear in these examples whether the marker
belongs to the new topic or discourse fragment itself or closes the
previous speech segment with the meaning of conclusion.</p>
        <p>The annotation of the start, navigational and final markers caused disagreement at
the first stage of the annotation. All these markers clearly share one common
function: marking a boundary, possibly with a change of topic, communication
strategy, or the conditions or manner of speech production. In practice, however,
some markers serve only one definite function, e.g.,
‘znachit tak’ marks a start and ‘vsyo’ marks the end of speech.
Despite this, the most commonly used markers of this type, ‘vot’ and ‘koroche’,
tend to appear in different positions in phrases and have no single preferred place
of occurrence. Therefore, new annotation rules were implemented at the second
stage. As a result, the main goal of the researchers was achieved: a complete list of
markers and their functions on which all the annotators could agree. The variety of
“boundary” tags had resulted in inter-annotator
disagreement, which exposed the weaknesses of the tag system. Reducing the number of tags
by clustering them into groups made the functions easier to identify. Thus,
a single tag “G” was introduced to unite the different boundary-marker tags: “START”,
“FIN”, and “NAVIG”. The specifics of each case of boundary PMs will be described
during the qualitative analysis of the material after the annotation of all corpus data.
The distinctive features of the different types of boundary PMs are also to be
elaborated.</p>
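        <p>The effect of merging the boundary tags can be illustrated with a minimal Python sketch. The tag names “START”, “FIN”, “NAVIG”, and “G” come from the annotation scheme described above; the tag “HES”, the helper functions, and the toy labels are assumptions introduced only for illustration and are not corpus data.</p>
        <preformat>
```python
# First-stage boundary tags that the second stage unites under "G".
BOUNDARY_TAGS = {"START", "FIN", "NAVIG"}


def merge_boundary_tags(tag):
    """Return "G" for any first-stage boundary tag; leave other tags unchanged."""
    return "G" if tag in BOUNDARY_TAGS else tag


def raw_agreement(tags_a, tags_b):
    """Share of items on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("annotators must label the same non-empty item list")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Invented labels for four occurrences of a marker ("HES" is a hypothetical tag).
annotator_1 = ["START", "FIN", "HES", "NAVIG"]
annotator_2 = ["NAVIG", "FIN", "HES", "START"]

before = raw_agreement(annotator_1, annotator_2)
after = raw_agreement(
    [merge_boundary_tags(t) for t in annotator_1],
    [merge_boundary_tags(t) for t in annotator_2],
)
print(before, after)
```
</preformat>
        <p>With these invented labels, raw agreement rises from 0.5 to 1.0 after merging, because the two annotators disagreed only on the subtype of the boundary marker.</p>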
      </sec>
      <sec id="sec-4-2">
        <title>Pragmaticalization as a Continuing Process</title>
        <p>The annotation of pragmatic markers is complicated by ongoing processes in
spontaneous oral speech, namely grammaticalization and pragmaticalization. Thus,
different degrees of pragmaticalization, i.e. of closeness of a unit to the PM class, can be
distinguished, e.g.:
• nu ya sproshu // yesli tsementa ne budet / togda ya gips voz'mu // # v
malen'kikh dyrkakh / *P dlya bolshikh dyrok gips ne podkhodit / a () dlya
bolshikh dyrok podkhodit tsement // *P ya dumayu // nu ya ne znayu / *P
chto takoye bolshaya dyrka // *P v takom-to vot sluchaye [W1 and S1];
• nu ponyatno delo / nu y**ta / a(:) da tebe voobshche / dazhe zakonnyje
vykhodnyje mogut ne dat / da ? ya dumayu // *P u menya tam
podnakopilos' etikh samykh / neispol'zovannogo otpuska / da / poetomu ya i
ispol'zuyu [S110];
• *P kak to tak ona korotkovata nemnozhko poluchilas' // vrode yeshchyo
odin shkaf prositsya // *P kholodilnik ne vkhodit a / tak mesto svobodnoye
est' // *P ne znayu [W1];
• ponyatno / ya prosto khochu vam skazat' / ya ne ... / vernej sprosit' /
snachala dlya nachala / potom uzhe skazat' / *V po povodu etoj
programmy (:) / vot ona (...) nastol'ko zamedlyayet rabotu komp'yutera / *P
chto vot (e-e) / nu mne prikhodyat gigantskiye fajly / ya ne znayu chto tam
/ eto samoye / no ... [S19].</p>
        <p>The first two examples appear to show already pragmaticalized uses of the VP
‘ja dumaju’, which only marks the end of a sentence and does not contribute anything to
the content. These PMs also reflect the speaker’s hesitation and serve as
hedges, as does the unit ‘ne znaju’ in the third example. It should be noted that
these markers can also be interpreted as not fully pragmaticalized, i.e. as units that have only
entered the path of pragmaticalization and are potential rather than actual PMs.</p>
        <p>The last phrase is truncated, but the presence of the hesitation marker (‘eto samoje’) suggests
that the speaker does not know what to say next or how to describe the
problems with the computer in more detail. This leads us to assume that ‘ja ne
znaju’ in this case is a hesitation PM that accompanies the ultimately unsuccessful attempts
to continue speech production. However, this construction can also be analyzed
as a meaningful sentence that the speaker simply abandoned and did not extend. For this reason,
the annotation of such cases is ambiguous, from our perspective. Variability of
analysis is not only possible but also necessary when dealing with PMs. Annotating
a larger dataset may help resolve such cases;
the experts have to set acceptable limits up to which the meaning of a lexeme
is identifiable and the unit is not yet a marker; beyond these limits, it should be considered a
pragmatic unit that has only a function in oral discourse.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Main and Additional Functions of PMs</title>
        <p>
          The dynamic nature of speech production causes certain difficulties in attributing
functions: determining the main and the additional functions of PMs
and the difference between them is further complicated by the constantly changing position of PMs in
phrases. For instance, in the phrases:
• nu tam v osnovnom sovetskuyu chital / znayesh literaturu // nashu tam /
a(:) ! vperyod k kommunizmu ! [S15];
• nu ya pytayus // no tam zhe kak prosto kak by () konkurentsiya // *P // to
est' kak by dazhe yesli ya podnimayu ruku / to yeshchyo ne ... // *V nu ya
v printsipe pochti na kazhdom podnimayu / no menya prosto ne vsegda
sprashivayut [S27]
it is not possible to identify precisely whether approximation or hesitation is the
main function of the PMs ‘tam’ and ‘kak by’. The role of these PMs in discourse is
that they give the speaker a short pause for structuring speech and
an opportunity to express the idea approximately, without further description.
Determining which function is predominant seems hardly possible here (see also:
[
          <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
          ]).
        </p>
        <p>At the second stage of the annotation, we abandoned the distinction between the main
and the additional functions, since inter-annotator agreement on it was
very low. Later, once all the functional sets of a particular
marker have been annotated, it will be possible to determine criteria for the dominance and
increasing prominence of a function.</p>
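        <p>The text does not state how agreement was measured. As one common option, Cohen's kappa for two annotators can be sketched as follows; the function name, the tags “APPR” and “HES”, and the labels are hypothetical and serve only to show the calculation.</p>
        <preformat>
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators corrected for chance."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("annotators must label the same non-empty item list")
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented "main function" labels for four occurrences of a marker.
main_function_a = ["APPR", "HES", "APPR", "HES"]
main_function_b = ["HES", "HES", "APPR", "APPR"]
print(cohen_kappa(main_function_a, main_function_b))
```
</preformat>
        <p>In this toy case the annotators agree on half of the items, which is exactly what chance predicts for these label distributions, so kappa is 0.0.</p>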
        <p>
          The tagging of the rhythm-forming function was also uncoordinated and inconsistent.
The findings of the investigation [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] show that there are rhythm-forming markers
which organize spontaneous speech into isochronous structures:
• vot sejchas uzhe batarei dali / uzhe on bystro vysokhnet // a tak by vot /
vot kogda dozhdi shli / vot khorosho by bylo zadelat' [S1];
• nu i (...) a do etogo proverili / zheludok vsyo khorosho / a tut polosnaya
operatsiya / vot eto ya vsyo ... / vot eto pervaya chast' Kazani u menya
byla normalnaya / a vtoraya chast' (...) vot ya vot na etikh samykh zvonkakh
nepreryvnykh [S130].
        </p>
        <p>We suppose that the rhythm-forming function is realized in the cases in bold. The
first PM ‘vot’ in the first example functions as a boundary marker, the second
expresses hesitation only, the third is presumably a particle that foregrounds new
information, and the last shapes the rhythm and the rate of the utterance,
which are supported by the repetition of ‘vot’. The second example also shows frequent
use of ‘vot’; the occurrence in the last position can be regarded as a rhythm-forming PM.
However, it is possible that all these markers reflect the individual hesitation
style of the particular speaker.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Chains of Markers or One Marker?</title>
        <p>
          Pragmatic markers frequently occur next to one another in
spontaneous dialogues and monologues. This raises the question of what should be considered
a chain of markers and what a new complex PM with a different function. D.
Verdonik, M. Rojc, and M. Stabej [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] analyze discourse markers in TURDIS, a corpus of
Slovenian telephone conversations, and address the co-occurrence of markers
by describing the most widespread chain of markers at the beginning of an
utterance. We assume that a PM which forms one intonation unit and fulfills one
function is a single integral marker; otherwise, it is a chain of different markers following
one another with a gradation of hesitation. However, in the case of hesitation PMs it is
difficult to decide whether the function is intensified or is equally shared by
the sequence of markers:
• pod triumfalnuyu_arku$ tam koroche // vot tipa (...) Kebern% ch... nu(:)
rasskazyval // *P ya nachal chitat' / ya tak_skazat'(?) sovsem drugoye prochital
/ chem chto on mne rasskazyval [S15] (hesitation and approximation
marker(-s));
• vchera my s na... s Nadey% vykhodim s raboty // *P ona menya prosit / u vas
est' tam telefon (e-e) Glukharevoy% ? ya govoryu da // *P nu i znachit tam
(...) nakhozhu / diktuyu yej [S19] (boundary, hesitation and approximation
marker(-s));
• tam to delay / tam kak by tam zadaniye // chego-to kak-to ustayu bezumno
na samom dele // *P prosto voobshche kak by / v printsipe i *P ne to chtoby
ya pryamo tut tak umatyvayus // da ? no vot real'no ochen' ustayu [S27]
(hesitation and approximation marker(-s));
• nikto poka nichego ne mozhet vnyatnogo skazat' / vse tol'ko razvodyat
rukami / (e) i govoryat / nu / sochuvstvuyu tipa mol / *P namekayut chto(:) prosto
da / oforml... [S110] (approximator or quotational marker and quotational
marker ‘mol’, probably not the PM since it is used in written texts);
• nu smotrite / *P v poldesyatogo / tak znachit smotrite Andrey% / ya tut
pogovoril / (...) yeshchyo s lyud'mi / mne rasskazali sleduyushcheye / chto vot
eto staraya tak nazyvayemaya [S123] (hesitation and boundary marker(-s)).
        </p>
        <p>The examples above show one of the most interesting tendencies of spontaneous
speech, one that runs counter to the principle of language (and speech) economy:
language redundancy. Repeated markers also present a challenge for the annotators,
since they may be interpreted either as one marker, because they share the same function, or
as two or more separate markers, one per repeated word:
• u vas segodnya prikhod budet // *P tak / minutochku minutochku / Gul'% //
*P tak / ya sejchas pozvonyu Marine% / i vyyasnyu // delo v tom chto / k vam
sobiralas' Marina% [S19];
• *P tak tak / tak tak tak / *P kto(?) *P privetik [S117].</p>
        <p>However, the existence of multi-word markers rules out the
criterion “one word equals one PM” during the annotation. To resolve the “one or
more markers” issue, we plan to investigate the frequency of such series of PMs in the
speech corpus, which may clarify their linguistic status. At this stage of the annotation,
only minimal structures are annotated; combinations of markers
will be examined more closely later.</p>
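        <p>A frequency count of marker series could start from a sketch like the one below. It is not the project's procedure: the marker list is a small hand-picked sample from the examples above, the whitespace tokenization is a simplification, and treating any non-marker token (including a pause symbol) as the end of a chain is an assumption.</p>
        <preformat>
```python
from collections import Counter

# Illustrative one-word markers taken from the examples; not the full inventory.
MARKERS = {"tam", "koroche", "vot", "tipa", "nu", "znachit", "tak"}


def marker_chains(tokens, min_len=2):
    """Return runs of at least min_len adjacent marker tokens as tuples."""
    chains, run = [], []
    for tok in list(tokens) + [None]:  # the None sentinel flushes the final run
        if tok in MARKERS:
            run.append(tok)
        else:
            if len(run) >= min_len:
                chains.append(tuple(run))
            run = []
    return chains


# Fragments of the transcripts quoted above, split on whitespace.
utterances = [
    "pod triumfalnuyu_arku$ tam koroche // vot tipa (...) Kebern% ch... nu(:)",
    "nu i znachit tam (...) nakhozhu / diktuyu yej",
    "*P tak tak / tak tak tak / *P kto(?) *P privetik",
]

counts = Counter()
for utterance in utterances:
    counts.update(marker_chains(utterance.split()))

for chain, freq in counts.most_common():
    print(" ".join(chain), freq)
```
</preformat>
        <p>On a full corpus, chains that recur far more often than their parts would predict are candidates for the status of a single complex PM, while rare chains are more plausibly accidental sequences.</p>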
        <p>Inversion in Russian poses one more problem for the automatic annotation of
PMs:
• (e-e) eto dejstvitel'no tak... poka ne ponyal / tak kak eto mne rasskazyval
chelovek / kotoryj nichego ne ponimayet // nu vot v samom etom *N / prosto
skazal / kak eto est' // poetomu elektriki mestnyje / vot troye / s kem ya
pytalsya cherez tret'ye litso svyazatsya / vse otkazalis' / potomu chto oni
skazali tak / *V yesli sdelat' vsyo eto vser'yoz / to eto dorogo [S123].</p>
        <p>This issue is solved by maintaining a list of the possible variants of each PM, which
can even be generated automatically by combinatorial algorithms.</p>
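        <p>The combinatorial generation of word-order variants can be sketched with the standard library as follows. The function name is hypothetical, and the sketch covers word order only: inflected forms such as ‘v samom etom’ would need a separate list or a morphological step.</p>
        <preformat>
```python
from itertools import permutations


def order_variants(marker):
    """Return every word-order variant of a multi-word marker, sorted."""
    words = marker.split()
    # A set removes duplicates that arise when a marker repeats a word.
    return sorted({" ".join(p) for p in permutations(words)})


for marker in ["eto samoye", "znachit tak", "vot"]:
    print(marker, order_variants(marker))
```
</preformat>
        <p>For a two-word marker this yields both orders, e.g. ‘znachit tak’ and ‘tak znachit’; the number of variants grows factorially with the number of words, so the output should be filtered against the corpus for longer markers.</p>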
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The annotation of pragmatic markers remains a great challenge for researchers, since
it is a mainly manual process that is difficult to automate and that raises theoretical
and practical issues concerning the understanding and typology of PMs, the
definition of their functions, and the investigation of unstructured oral discourse.
The article has described the first annotation of pragmatic markers in Russian
spoken speech, including its two stages and the advantages
and disadvantages of the proposed approach to pragmatic-level analysis. The
annotation covered a pilot subcorpus, and the annotated material will be expanded. The
problems presented here allowed us to refine the guidelines for the
annotators and the list of tags so that inter-annotator agreement became
higher. We conclude that comprehensive automatic tagging of PMs in oral speech cannot be
performed for now; however, once the full list of PM variants has been obtained,
an automatic check of the annotation is necessary to catch markers missed through human
error. The fuzziness and ambiguity of spontaneous speech are significant issues for
NLP tasks, and future research may develop ways to handle the multifunctionality
of some PMs during the annotation process.</p>
      <p>Acknowledgement. This research was supported by the Russian Science
Foundation, project № 18-18-00242 “Pragmatic Markers in Russian Everyday Speech”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Leech</surname>
          </string-name>
          , G.:
          <article-title>Adding linguistic annotation</article-title>
          . Wynne, M. (ed.).
          <article-title>Developing Linguistic Corpora: a Guide to Good Practice</article-title>
          .
          <source>Oxbow Books</source>
          , Oxford (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Archer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Corpus annotation: A welcome addition or an interpretation too far? Tyrkkö</article-title>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Kilpiö</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Nevalainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Rissanen</surname>
          </string-name>
          , M. (eds.).
          <article-title>Outposts of historical corpus linguistics: from the Helsinki corpus to a proliferation of resources. Studies in variation, contacts and change in English eSeries (</article-title>
          <year>2012</year>
          ). URL: http://www.helsinki.fi/varieng/series/volumes/10/archer/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          :
          <article-title>Pragmatemy v ustnoj povsednevnoj rechi: opredelenie ponyatia i obshchaja tipologia [Pragmatems in spoken everyday speech: definition and general typology]</article-title>
          .
          <source>Vestnik Permskogo universiteta. Rossijskaja i zarubezhnaja filologia [</source>
          Perm University Herald. Russian and Foreign Philology],
          <volume>3</volume>
          (
          <issue>27</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2014</year>
          ).
          <article-title>(in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>What are discourse markers</article-title>
          ?
          <source>Journal of Pragmatics</source>
          ,
          <volume>31</volume>
          (
          <issue>7</issue>
          ),
          <fpage>931</fpage>
          -
          <lpage>952</lpage>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Commentary pragmatic markers in English</article-title>
          . Estudios ingleses de la Universidad Complutense,
          <volume>5</volume>
          ,
          <fpage>115</fpage>
          -
          <lpage>127</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Schiffrin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Discourse markers</article-title>
          . Cambridge University Press, Cambridge (
          <year>1987</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Schourup</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Discourse markers</article-title>
          .
          <source>Lingua</source>
          ,
          <volume>107</volume>
          (
          <issue>3-4</issue>
          ),
          <fpage>227</fpage>
          -
          <lpage>265</lpage>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Traugott</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The role of the development of discourse markers in a theory of grammaticalization</article-title>
          .
          <source>Paper presented at the 12th International Conference on Historical Linguistics</source>
          , University of Manchester,
          <year>August 1995</year>
          . URL: https://www.researchgate.net/publication/228691469_The_role_of_discourse_markers_in_a_theory_of_grammaticalization.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Verdonik</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rojc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stabej</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <fpage>147</fpage>
          -
          <lpage>180</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Aijmer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Pragmatic markers in spoken interlanguage</article-title>
          .
          <source>Nordic Journal of English Studies</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>173</fpage>
          -
          <lpage>190</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filyasova</surname>
          </string-name>
          , Yu. A.:
          <article-title>Discourse vs. pragmatic markers: a contrastive terminological study</article-title>
          .
          <source>In: 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM</source>
          <year>2018</year>
          , SGEM2018 Vienna ART Conference Proceedings,
          <fpage>19</fpage>
          -
          <issue>21</issue>
          <year>March</year>
          ,
          <year>2018</year>
          , vol.
          <volume>5</volume>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N. V.:</given-names>
          </string-name>
          <article-title>O vozmozhnykh kommunikativnykh pomekhakh v mezhkul'turnoj ustnoj kommunikacii [On the possible communicative barriers in intercultural oral communication]</article-title>
          .
          <source>Mir russkogo slova [The World of Russian Word]</source>
          ,
          <volume>3</volume>
          (
          <year>2018</year>
          )
          <article-title>(in print). (in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Zaides</surname>
          </string-name>
          , K. D.:
          <article-title>Metakommunikativnyje vstavki v russkoj ustnoj spontannoj rechi na rodnom i nerodnom jazyke [Meta-communicative insertions in Russian oral spontaneous speech of native speakers and foreigners]</article-title>
          .
          <source>Kommunikativnyje issledovanija [Communication Studies]</source>
          ,
          <volume>3</volume>
          (
          <issue>9</issue>
          ),
          <fpage>19</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2016</year>
          ).
          <article-title>(in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Riehakainen</surname>
            ,
            <given-names>Ye. I.</given-names>
          </string-name>
          :
          <article-title>Vzaimodejstvie kontekstnoj predskazuemosti i chastotnosti v processe vospriyatia spontannoj rechi [The Interaction between Context Predictability and Frequency in the Process of Perception of Spontaneous Speech (on the Material of the Russian Language)], doctorate thesis</article-title>
          , St. Petersburg. (
          <year>2010</year>
          ).
          <article-title>(in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Bogdanova-Beglaryan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asinovskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blinova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markasova</surname>
          </string-name>
          , Ye.,
          <string-name>
            <surname>Ryko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Zvukovoj korpus russkogo yazyka: novaja metodologia analiza ustnoj rechi [Sound Corpus of the Russian Language: a new methodology for analyzing the oral speech]</article-title>
          . In: Shumska,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Osga</surname>
          </string-name>
          ,
          <string-name>
            <surname>K</surname>
          </string-name>
          . (eds.).
          <article-title>Jazyk i metod: Russkij jazyk v lingvisticheskikh issledovaniakh XXI veka [Language and Method: The Russian Language in the Linguistic Studies of the 21st Century]</article-title>
          , vol.
          <volume>2</volume>
          , pp.
          <fpage>357</fpage>
          -
          <lpage>372</lpage>
          ,
          <string-name>
            <surname>Kraków</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>(in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blinova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martynenko</surname>
          </string-name>
          , G.:
          <article-title>Linguistic features and sociolinguistic variability in everyday spoken Russian</article-title>
          .
          <source>In: SPECOM</source>
          <year>2017</year>
          ,
          <article-title>LNAI</article-title>
          , vol.
          <volume>10458</volume>
          , pp.
          <fpage>503</fpage>
          -
          <lpage>511</lpage>
          . Springer, Cham (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <article-title>Russkij jazyk povsednevnogo obshhenija: osobennosti funkcionirovanija v raznyh social'nyh gruppah [Everyday Russian Language in Different Social Groups]. Collective monograph</article-title>
          . Bogdanova-Beglaryan, N. V. (ed.). LAJKA,
          <string-name>
            <surname>SPb</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>(in Russ</article-title>
          .).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Asinovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bogdanova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rusakova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stepanova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The ORD Speech Corpus of Russian Everyday Communication «One Speaker's Day»: creation principles and annotation</article-title>
          . In: Matoušek,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Mautner</surname>
          </string-name>
          , P. (eds.)
          <source>TSD</source>
          <year>2009</year>
          ,
          <article-title>LNAI</article-title>
          , vol.
          <volume>5729</volume>
          , pp.
          <fpage>250</fpage>
          -
          <lpage>257</lpage>
          . Springer, Berlin-Heidelberg (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blinova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martynenko</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baeva</surname>
          </string-name>
          , E.:
          <article-title>Towards a description of pragmatic markers in Russian everyday speech</article-title>
          .
          <source>In: LNAI</source>
          , vol.
          <volume>11096</volume>
          : Speech and Computer. 20th International Conference,
          <string-name>
            <surname>SPECOM</surname>
          </string-name>
          <year>2018</year>
          , Leipzig, Germany,
          <source>September 18-22</source>
          ,
          <year>2018</year>
          , Proceedings, pp.
          <fpage>42</fpage>
          -
          <lpage>48</lpage>
          . Springer, Leipzig (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blinova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martynenko</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaides</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Pragmatic markers in Russian spoken speech: an experience of systematization and annotation for the improvement of NLP tasks</article-title>
          . In: Balandin,
          <string-name>
            <surname>S.</surname>
          </string-name>
          , Salmon Cinotti,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Viola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Tyutina</surname>
          </string-name>
          , T. (eds.).
          <source>Proceedings of the 23rd Conference of Open Innovations Association FRUCT</source>
          . Bologna, Italy,
          <fpage>13</fpage>
          -16
          <source>November</source>
          <year>2018</year>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>77</lpage>
          . FRUCT Oy,
          <string-name>
            <surname>Finland</surname>
          </string-name>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Crible</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca</surname>
            ,
            <given-names>M.-J.</given-names>
          </string-name>
          :
          <article-title>Discourse markers in speech: characteristics and challenges for corpus annotation</article-title>
          .
          <source>Dialogue and Discourse</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>149</fpage>
          -
          <lpage>166</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Crible</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zufferey</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Using a unified taxonomy to annotate discourse markers in speech and writing</article-title>
          .
          In:
          <source>Proceedings of the 11th Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation</source>
          , London, UK, pp.
          <fpage>14</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Bogdanova-Beglarian</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kisloshchuk</surname>
            ,
            <given-names>A. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            ,
            <given-names>T. Ju.</given-names>
          </string-name>
          :
          <article-title>O ritmoobrazujushchej funkcii diskursivnykh jedinic [On rhythm-forming function of discourse markers]</article-title>
          .
          <source>Vestnik Permskogo universiteta. Rossijskaja i zarubezhnaja filologija</source>
          [Perm University Herald. Russian and Foreign Philology],
          <volume>2</volume>
          (
          <issue>22</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2013</year>
          ).
          (in Russ.).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>