<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Observations on the Annotation of Discourse Relational Devices in TED Talk Transcripts in Lithuanian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giedrė Valūnaitė Oleškevičienė</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deniz Zeyrek</string-name>
          <email>dezeyrek@metu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktorija Mažeikienė</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Murathan Kurfalı</string-name>
          <email>murathan.kurfali@ling.su.se</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Informatics Institute, Middle East Technical University</institution>
          ,
          <addr-line>Ankara</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Humanities, Mykolas Romeris University</institution>
          ,
          <addr-line>Vilnius</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Stockholm University, Stockholm and Middle East Technical University</institution>
          ,
          <addr-line>Ankara</addr-line>
        </aff>
      </contrib-group>
      <fpage>53</fpage>
      <lpage>58</lpage>
      <abstract>
        <p>Lithuanian researchers are working on enriching the existing corpora; they are also looking for ways to make the corpora interoperable and co-searchable through the annotation of discourse relations. One goal of the present research is to annotate discourse relations in TED talk transcripts translated into Lithuanian, thereby expanding the set of resources available for the Lithuanian language. A second goal is to compare the annotated texts cross-linguistically in order to identify translation tendencies in rendering discourse relations in Lithuanian. This, we believe, will open up a new research path in digital humanities, leading to an understanding of translation tendencies in TED talk transcripts across languages. According to our results, noteworthy translation tendencies include explicitation (a tendency to use more explicitly marked discourse relations in Lithuanian than in the original transcripts), verbatim translation of discourse connectives, and a tendency to use fewer alternative lexicalizations (a type of discourse-relational device).</p>
      </abstract>
      <kwd-group>
        <kwd>discourse</kwd>
        <kwd>parallel</kwd>
        <kwd>multilingual corpus</kwd>
        <kwd>Lithuanian</kwd>
        <kwd>annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Lithuanian researchers are working on enriching the
existing corpora and are also looking for ways to make
the corpora inter-operable and co-searchable through the
annotation of discourse relations. One aim of
the current research is to extend the available resources
and lexicons of discourse-relational devices in
Lithuanian, in cooperation with the international team of researchers
brought together by the European COST Project TextLink
(http://www.textlink.ii.metu.edu.tr/). This aim is partially
achieved by adding Lithuanian annotated texts to the
existing TED Multilingual Discourse Bank (TED-MDB),
a parallel corpus annotated at the discourse level
following the goals and principles of the Penn Discourse Treebank (PDTB)
        <xref ref-type="bibr" rid="ref16">(Zeyrek et al., 2018)</xref>
        . The second aim is to compare the
discourse-annotated Lithuanian texts with the English annotations with a
view to understanding translation tendencies. Our ultimate
goal is to perform cross-linguistic analysis and bring
this information into the domain of digital humanities. In
the rest of this paper, we describe the addition of
Lithuanian annotations to TED-MDB and discuss our first results
as far as discourse relations are concerned. This,
we believe, will serve as the basis for our ultimate goal.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Research background</title>
      <p>This section provides some general background on Lithuanian,
describes discourse connectives (DCs), and briefly outlines
the PDTB annotation scheme. It also describes the data and
presents some observations about them.</p>
      <sec id="sec-2-1">
        <title>Lithuanian</title>
        <p>Lithuanian is a very old Indo-European language. It is
a Baltic language with a conservative morphology:
it has preserved morphological features of the
proto-language, such as word declensions. It is spoken by
about 2,900,000 native speakers in Lithuania
and about 200,000 abroad.</p>
        <p>
          There are two main resources for modern Lithuanian: (a)
the 9-million-word Corpus Academicum Lithuanicum (CorALit,
http://coralit.lt), compiled by Vilnius University, which
contains academic texts from the fields of biomedical
sciences, humanities, physical sciences, social sciences,
and technological sciences; and (b) the 102-million-word
online Corpus of the Contemporary Lithuanian Language
(http://tekstynas.vdu.lt), which is of a general character and
includes publicist texts, fiction, non-fiction, administrative
literature and spoken language. However, parallel
corpora involving Lithuanian are still insufficient: currently
only one two-directional parallel corpus exists, comprising an
English-Lithuanian part (70,813 parallel sentences) and a
Lithuanian-English part (1,614 parallel sentences) (http://tekstynas.vdu.lt).
Furthermore, this corpus is not discourse-annotated. Such
scarcity of corpus resources is an obvious barrier to
machine translation
          <xref ref-type="bibr" rid="ref13">(Šveikauskienė and Telksnys, 2014)</xref>
          .
For example, the English phrase calling him a liar
is translated into Lithuanian as skambinti jam melagis (to
phone him a liar) by the Google Translate application.
Addressing such problems clearly requires corpus
development, annotation and research.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Discourse Connectives and an Outline of the Annotation Scheme</title>
        <p>
          Discourse connectives signal how the writer or speaker
would like the reader or listener to relate the ideas that are
about to be expressed to the ideas that have been expressed before.
According to
          <xref ref-type="bibr" rid="ref3">Baker (2011)</xref>
          , DCs could be used to signal
different relations and the relations could be expressed in many
ways; for example, in English, causality might be expressed
through verbs such as cause, lead to or through DCs
signaling the causality relation. Languages vary in terms of
the type of connectives preferred as well as their frequency.
Since the DCs signal the relations between pieces of
information, they are related to the structuring of information
and provide an insight into the whole logic of discourse
          <xref ref-type="bibr" rid="ref12">(Smith and Frawley, 1983)</xref>
          .
        </p>
        <p>
          The literature suggests that some languages tend to express
discourse relations (DRs) through complex structures while
others prefer to use simpler structures and mark discourse
relations explicitly, as for example, the difference between
English and Arabic illustrates
          <xref ref-type="bibr" rid="ref7">(Holes, 1984)</xref>
          . The author
finds that while English prefers to present information in
smaller pieces and to signal the relations
between them, Arabic prefers to group information into large
discourse chunks. The question thus arises of how
translators deal with DRs when faced with a multitude of
explicit DCs in the source text or, conversely, how they render
DRs when the source text contains only a limited number of
connectives. Given that connectives deal with the logic of
the text and are related to text interpretation,
aligning the patterns of DCs with the specifics and text type
of the target language is a
complicated process. Translators have two choices: for
the sake of a smooth and clear translation, they can insert
additional DCs even where none are used in the
original text, i.e. resort to explicitation, or they can
translate the explicit DCs of the original text verbatim,
even though the resulting translation might sound foreign in the
target language. In practice, translators choose something
in between or use a bit of both techniques
          <xref ref-type="bibr" rid="ref3">(Baker, 2011)</xref>
          .
The PDTB is a 2-million-word corpus manually
annotated for discourse-level information
          <xref ref-type="bibr" rid="ref11">(Prasad et al., 2014)</xref>
          .
The annotation scheme mainly includes explicit and
implicit DCs, alternative lexicalizations, entity relations, no
relations, and their binary arguments, called Arg1 and
Arg2. Senses are assigned to all DRs except entity
relations and no relations. PDTB’s annotation approach is
theory-neutral and lexically grounded. Theory-neutral
means that the annotation is not based on a
specific discourse theory. Lexically grounded
means that annotator judgments are effectively elicited
for both explicit and implicit DRs, i.e. even in cases where
the relation has no explicit marker.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>The Data</title>
        <p>
          Our data comprise the Lithuanian TED talk transcripts corresponding to the
original English texts included in TED-MDB (Table 1).
TED-MDB is built on the PDTB 3.0 relation
hierarchy
          <xref ref-type="bibr" rid="ref14">(Webber et al., 2016)</xref>
          . The PDTB is chosen mainly
because it has been used reliably to annotate discourse in
other languages, e.g. Turkish
          <xref ref-type="bibr" rid="ref15">(Zeyrek et al., 2013)</xref>
          ,
Arabic
          <xref ref-type="bibr" rid="ref1">(Al-Saif and Markert, 2010)</xref>
          , Chinese
          <xref ref-type="bibr" rid="ref17">(Zhou and Xue,
2012)</xref>
          , and Hindi
          <xref ref-type="bibr" rid="ref10">(Oza et al., 2009)</xref>
          . The corpus already
includes transcripts in six languages: Turkish, English,
Polish, German, Russian and Portuguese. As in the TED-MDB
project, the Lithuanian transcripts are retrieved from the WIT3
website
          <xref ref-type="bibr" rid="ref4">(Cettolo et al., 2012)</xref>
          and annotated for DRs. The
annotations are saved in annotation files corresponding to
the raw texts. These are simple text files in which each annotated relation
is stored as a series of fields, such as sense, type, and argument
spans, delimited by the pipe symbol (|), as explained in
          <xref ref-type="bibr" rid="ref8">Lee
et al. (2016)</xref>
          .
        </p>
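        <p>The pipe-delimited layout described above can be illustrated with a minimal Python sketch. The field names, their order, the span notation and the sample line are all assumptions made for illustration only; the actual field inventory of the annotation files is the one documented by Lee et al. (2016), and the helper names parse_span and parse_relation are hypothetical.</p>
        <preformat>
```python
# Hypothetical field order, for illustration only; the real annotation
# files contain more fields, in the order documented for the annotator tool.
FIELDS = ["type", "connective", "sense", "arg1_span", "arg2_span"]


def parse_span(text):
    """Turn '10..25;30..41' into [(10, 25), (30, 41)]; empty text gives []."""
    spans = []
    for part in text.split(";"):
        if part:
            start, end = part.split("..")
            spans.append((int(start), int(end)))
    return spans


def parse_relation(line):
    """Split one pipe-delimited line into a dictionary of named fields."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    record = dict(zip(FIELDS, values))
    record["arg1_span"] = parse_span(record["arg1_span"])
    record["arg2_span"] = parse_span(record["arg2_span"])
    return record


# Invented sample line; the character offsets are not taken from the corpus.
sample = "Implicit|Bet|Comparison.Contrast|0..118|119..147"
relation = parse_relation(sample)
print(relation["type"], relation["sense"], relation["arg2_span"])
```
        </preformat>
        <p>Keeping the field list in one place means that only FIELDS needs to change once the real column order is taken from the tool documentation.</p>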
        <p>
          Both the TED website and the WIT3 website are open
resources, which makes them attractive for research, as they offer
numerous advantages: subtitles are available in a
substantial number of languages, and the topics cover a wide range
of fields, making the data applicable in
multiple domains
          <xref ref-type="bibr" rid="ref4">(Cettolo et al., 2012)</xref>
          . However, the data
also have certain disadvantages. Firstly, the talks are
translated by (named) volunteers, which does not necessarily
ensure a high-quality translation. The data are also limited
as regards the use of parallel transcripts for DC research
and for translation research. For example, the collection of TED
talks is unidirectional, so it cannot be used to
illustrate differences between translation directions.
There are also other issues to deal with, such as subtitling,
which is a specific type of translation
          <xref ref-type="bibr" rid="ref9">(Lefer and Grabar,
2015)</xref>
          , and the genre of TED talks, which is a mix of
spoken and written language. Finally, the variety of TED talk
speakers (native and non-native speakers, or speakers of
various regional varieties of English) might be another issue to
consider. Despite such issues, given the scarcity of
parallel texts involving Lithuanian and the limited research on
Lithuanian DCs, we chose to annotate the TED talk
transcripts for DRs and examine the translation issues involved.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Annotation Procedures in Lithuanian</title>
      <p>
        In Lithuanian, explicit DCs include expressions from four
grammatical classes: subordinating conjunctions, e.g. kai,
kol, nes, kadangi (when, while, because, since);
coordinating conjunctions, e.g. ir, bei, o, tačiau (and, but, or, however);
sentential relatives, e.g. tam kad, tuo metu kai (so that, at the
time when); and discourse adverbials, e.g. faktiškai,
galiausiai (actually, eventually). The main task is to determine whether
these words and phrases function as explicit DCs, since they can
also have other functions. As in the PDTB, five types of
relations are identified and annotated: explicit relations,
implicit relations, alternative lexicalizations, entity relations,
and no relations. The argument annotation of explicit DCs
and alternative lexicalizations follows the rule that the
argument which is syntactically bound to the DC is
marked as Arg2; the other argument is annotated as Arg1.
As in TED-MDB, adverbials called “discourse markers”
        <xref ref-type="bibr" rid="ref6">(Hirschberg and Litman, 1987)</xref>
        are not annotated as they
signal the organizational structure of the discourse rather
than relating two arguments semantically. For example,
Lithuanian dabar and its English equivalent (now) in the
examples below serve to signal discourse organizational
structure, so such cases were not annotated.
      </p>
      <p>1. Dabar kaip matote įtampa apie kurią girdėjome
San Fransiske apie susirūpinimą dėl būsto kainų ir
gyventojų išstūmimo ir technologijų kompanijų,
kurios atneša daug turto ir įsikuria, yra tikra.</p>
      <p>2. Now you can see, though, that the tensions that we’ve
heard about in San Francisco in terms of people
being concerned about gentrification and all the new tech
companies that are bringing new wealth and
settlement into the city are real.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Title/Speaker</th><th>Word count Eng./Lith.</th></tr>
          </thead>
          <tbody>
            <tr><td>The investment logic for sustainability (Chris McKnett)</td><td>1,614 (1,345)</td></tr>
            <tr><td>Embrace the near win (Sarah Lewis)</td><td>1,772 (1,362)</td></tr>
            <tr><td>A glimpse of life on the road (Kitra Cahana)</td><td>694 (512)</td></tr>
            <tr><td>Social maps that reveal a city’s intersections and separations (Dave Troy)</td><td>1,053 (678)</td></tr>
            <tr><td>Total</td><td>5,133 (3,897)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>No relation (NoRel) is annotated when the reader infers no DR
between adjacent sentences (see examples 9 and 10).</p>
      <p>According to the PDTB annotation guidelines, when annotating
implicit DRs, the annotator inserts a DC that best
expresses the inferred relation between two adjacent
sentences. We adopt this procedure, as in the Lithuanian
example 3 and its English equivalent in 4. In all the examples,
Arg1 is shown in italics and Arg2 in boldface.</p>
      <p>3. Ji tokie sudėtingi ir gali atrodyti mums tolimi, kad
galime būti linkę daryti štai ką: slėpti galvą smėlyje
ir negalvoti apie tai. [Implicit=Bet] Jei tik galite,
priešinkitės tam. (Implicit) (Comparison: Contrast)</p>
      <p>4. ...bury our heads in the sand and not think about
it. [Implicit=But] Resist this, if you can. (Implicit)
(Comparison: Contrast)</p>
      <p>Alternative lexicalization (AltLex) covers cases of
inferred DRs between adjacent clauses where inserting an
explicit DC would be redundant, because
the relation is already expressed by some alternatively
lexicalized non-connective expression, e.g.</p>
      <p>5. Sėkmė mus motyvuoja, bet beveik pasiekta pergalė
skatina mus leistis į nuolatinius ieškojimus. [Vieną iš
ryškiausių to pavyzdžių pastebime], kai žvelgiame į
skirtumą tarp olimpinio sidabro laimėtojų ir
bronzos laimėtojų rungtynėms pasibaigus. (AltLex)
(Expansion: Instantiation)</p>
      <p>6. Success motivates us, but a near win can propel us in
an ongoing quest. [One of the most vivid examples of
this comes] when we look at the difference between
Olympic silver medalists and bronze medalists after
a competition. (AltLex) (Expansion:
Instantiation)</p>
      <sec id="sec-3-1">
        <p>Entity relations (EntRel) are annotated between adjacent
sentences when an entity in one argument is described
further in the other argument, as in 7 and its English version in
8.</p>
        <p>7. Jie turėtų įvertinti ir tuos efektyvumo rodiklius,
kuriuos vadiname ASV: aplinkosauga, socialiniai
klausimai ir valdymas. Aplinkosauga apima energijos
vartojimą, prieigą prie vandens, atliekų tvarkymą ir
taršą ir ekonomišką išteklių naudojimą. (EntRel)</p>
        <p>8. Investors should also look at performance metrics in
what we call ESG: environment, social and
governance. Environment includes energy consumption,
water availability, waste and pollution, just making
efficient uses of resource. (EntRel)</p>
        <p>9. Tai 4 milijardai viduriniosios klasės žmonių, kuriems
reikia maisto, energijos ir vandens. Dabar jūs
turbūt klausiate savęs: gal tai tik pavieniai atvejai.
(NoRel)</p>
        <p>10. That’s four billion middle class people demanding
food, energy and water. Now, you may be asking
yourself, are these just isolated cases. (NoRel)</p>
        <p>TED-MDB adds a new top-level category to the PDTB 3.0
relation hierarchy, called hypophora. This category aims to
capture rhetorical question-response pairs, where the
question is asked and answered by the speaker. TED-MDB
annotates hypophora as a case of AltLex anchored by the
question word. Where possible, the additional sense of the
Q/R pair may be added.</p>
        <p>As in TED-MDB, in Lithuanian we annotate the question
as Arg2 and the answer as Arg1. We treat the question
as Arg2 because the AltLex is part of the question. The
question word (either the wh-word or ar, a question
particle used in yes/no questions, which can also serve as
an explicit DC in Lithuanian) is selected as the AltLex since it
marks the DR holding between the question and the answer.
Example 11 and its equivalent in 12 show ar used as an explicit DC:</p>
        <p>11. Niekas nepasikeis, [ar] mes bandysime pakeisti, [ar]
tu nieko nebandysi (Explicit) (Expansion:
Disjunction)</p>
        <p>12. Nothing is going to change [either] we try to change
something [or] you don’t try anything. (Explicit)
(Expansion: Disjunction)</p>
        <p>In the following pairs of examples, we provide more cases
of how hypophora is annotated in Lithuanian and English.
Lithuanian Q/R pairs are annotated for a primary sense and
tagged with hypophora as the secondary sense.
        </p>
        <p>13. [Ar] įmonės, atsižvelgiančios į tvarumą, išties
finansiškai sėkmingos? galintis nustebinti atsakymas yra
“taip” (Explicit) (AltLex: Ar; Expansion:
Level-of-detail: Arg1-as-detail; Hypophora).</p>
        <p>14. [Do] companies that take sustainability into
account really do well financially? The answer that
may surprise you is yes. (AltLex: Do) (Hypophora)</p>
        <p>15. [Kodėl] kas nors apskritai rinktųsi tokį gyvenimą
- Atsakymas į šį klausimą gali skirtis, kaip skiriasi ir
žmonės sutinkami kelyje, bet keliautojai dažnai atsako
vienu žodžiu: laisvė. (Explicit) (AltLex: Kodėl;
Contingency: Cause: Reason; Hypophora).</p>
        <p>16. [Why] anyone would choose a life like this, under
the thumb of discriminatory laws, eating out of
trash cans, sleeping under bridges, picking up
seasonal jobs here and there. The answer to such a
question is as varied as the people that take to the
road, but travelers often respond with a single word:
freedom. (AltLex: Why) (Hypophora)</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Intra- and Inter-Annotator Agreement</title>
      <p>
        The stability of the annotation scheme is evaluated by both
intra- and inter-annotator agreement. One transcript
(Text ID 1978), which comprises approximately 25% of
the Lithuanian section of the data, is reannotated by the
primary annotator about 2 months after the first annotation,
and it is annotated independently by the secondary
annotator (cf. Table 2 for the distribution of the annotated,
reannotated and independently annotated DR types).<sup>1</sup> We
measured the F1 score, which evaluates agreement between the
annotators on the existence of a DR between the same
discourse units. To measure agreement on the types and
senses of these DRs, we calculated Cohen’s Kappa
        <xref ref-type="bibr" rid="ref5">(Cohen, 1960)</xref>
        , which is known to be a robust method for
evaluating agreement on categorical items, as it takes chance
agreement into account. In this preliminary evaluation
exercise, we reached very high scores on both measures: the
F1 scores for intra- and inter-annotator agreement are 0.933
and 0.944, respectively. The Kappa values for intra- and
inter-annotator type agreement are 0.974 and 0.991,
respectively; the Kappa values for intra- and inter-annotator sense
agreement are 0.967 and 0.989, respectively.
      </p>
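      <p>The two agreement measures can be sketched in Python as follows. This is only an illustration under stated assumptions, not the evaluation script used in the study: each annotation is assumed to be a mapping from a pair of discourse-unit identifiers to a relation type, F1 is computed over the unit pairs that each annotator linked by a DR, and Kappa is computed over the labels of the pairs found by both. The toy data and the function names f1_agreement and cohen_kappa are invented.</p>
      <preformat>
```python
from collections import Counter


def f1_agreement(units_a, units_b):
    """F1 over the sets of discourse-unit pairs each annotator linked by a DR."""
    a, b = set(units_a), set(units_b)
    shared = len(a.intersection(b))
    if shared == 0:
        return 0.0
    precision = shared / len(b)
    recall = shared / len(a)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two annotators' label proportions.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention used here when both use a single identical label
    return (observed - expected) / (1 - expected)


# Invented toy annotations: (unit pair) to relation type.
rel_a = {(1, 2): "Explicit", (2, 3): "Implicit", (3, 4): "EntRel", (4, 5): "Explicit"}
rel_b = {(1, 2): "Explicit", (2, 3): "Explicit", (3, 4): "EntRel", (5, 6): "NoRel"}

f1 = f1_agreement(rel_a, rel_b)
common = sorted(set(rel_a).intersection(rel_b))
kappa = cohen_kappa([rel_a[u] for u in common], [rel_b[u] for u in common])
print(f1, kappa)
```
      </preformat>
      <p>Restricting Kappa to the relations identified by both annotators mirrors the two-step description above, in which F1 covers the existence of a relation and Kappa covers its type or sense.</p>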
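      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th rowspan="2">Relation Type</th><th colspan="2">Primary annotator</th><th rowspan="2">Secondary annotator</th></tr>
            <tr><th>1st annot</th><th>2nd annot</th></tr>
          </thead>
          <tbody>
            <tr><td>AltLex</td><td>-</td><td>2</td><td></td></tr>
            <tr><td>NoRel</td><td>15</td><td>15</td><td></td></tr>
            <tr><td>Explicit</td><td>105</td><td>107</td><td></td></tr>
            <tr><td>Implicit</td><td>48</td><td>53</td><td></td></tr>
            <tr><td>EntRel</td><td>28</td><td>30</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>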
    </sec>
    <sec id="sec-5">
      <title>Research Findings</title>
      <p>In this section, we consider the full set of annotated
texts in English and Lithuanian and present the frequencies
of the annotated DR types (Table 3) as well as the frequencies
of the annotated top-level senses (Table 4). We then discuss
the results.</p>
      <p>
        In Table 3, the low frequency of AltLex annotations in
Lithuanian may reflect the translators’ choices when translating the DCs:
it appears that the translators tended to render DCs by the
variants provided by dictionaries rather than by AltLexes,
e.g. kai (when), kol (while), nes (because), kadangi (since), etc.
This resonates with Baker’s
        <xref ref-type="bibr" rid="ref3">(Baker, 2011)</xref>
        observation
that translators might choose to align the patterns of DCs
with the target language.
      </p>
      <p>
        <sup>1</sup> The primary and the secondary annotators are the first and the
third authors of the study.
      </p>
      <p>
        Another interesting feature is that there are more
explicit DRs in the Lithuanian transcripts than in the
English versions. This might be explained by the translators’
effort to render the implicit DRs of the English texts explicitly
in Lithuanian, which is in line with
explicitation, as observed by
        <xref ref-type="bibr" rid="ref2">Baker (1996)</xref>
        . For example:</p>
      <p>17. ... that’s okay, right. [Implicit=But] We
want more. (Implicit) (Comparison: Concession:
Arg2_as_denier)</p>
      <p>18. Nebogai, tiesa. [Bet] mes norim daugiau. (Explicit)
(Comparison: Concession: Arg2_as_denier)</p>
      <p>However, there are also cases where explicit DCs are
rendered implicitly, which might lead to the loss of the
sense annotated in the original text. For example:</p>
      <p>19. ... only looking at race doesn’t really contribute to our
development of diversity. [So] if we’re trying to use
diversity as a way to tackle some of our more
intractable problems, we need to start to think about
diversity in a new way. (Explicit) (Contingency:
Cause: Result)</p>
      <p>20. ... žiūrėti tik į rasę nepadeda bandant prisidėti
prie įvairumo vystymo. [Implicit=Taigi]
Bandome įvairumą naudoti sprendžiant kai kurias
sudėtingesnes problemas, turime pradėti kitaip
galvoti apie įvairumą. (Implicit) (Contingency:
Cause: Result)</p>
      <p>21. [If] we’re trying to use diversity as a way to tackle
some of our more intractable problems, we need to
start to think about diversity in a new way. (Explicit)
(Contingency: Condition: Arg2_as_condition)</p>
      <p>22. Bandome įvairumą naudoti sprendžiant kai kurias
sudėtingesnes problemas, [Implicit=todėl] turime
pradėti kitaip galvoti apie įvairumą. (Implicit)
(Contingency: Cause: Result)</p>
      <p>Examples 19-20 and 21-22 show that the translator chose
not to render the explicit DCs so and if. However, while
the sense of ‘result’ can still be inferred in 20, in
22 we observe a loss of meaning: the sense of
‘condition’ is lost entirely.
      </p>
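      <p>The shifts illustrated in examples 17-22 can be tallied with a short Python sketch. It assumes that relations have already been aligned across the two languages, which the annotation files themselves do not provide; the three pairs below are the relation types from those examples only, and the function name label_shift and the label “implicitation” are our own illustrative choices.</p>
      <preformat>
```python
from collections import Counter

# (English type, Lithuanian type) for the aligned pairs in examples 17-22.
aligned_pairs = [
    ("Implicit", "Explicit"),  # examples 17 and 18
    ("Explicit", "Implicit"),  # examples 19 and 20
    ("Explicit", "Implicit"),  # examples 21 and 22
]


def label_shift(source_type, target_type):
    """Name the change in relation type from the source to the translation."""
    if source_type == target_type:
        return "unchanged"
    if source_type == "Implicit" and target_type == "Explicit":
        return "explicitation"
    if source_type == "Explicit" and target_type == "Implicit":
        return "implicitation"
    return "other shift"


shift_counts = Counter(label_shift(src, tgt) for src, tgt in aligned_pairs)
print(shift_counts)
```
      </preformat>
      <p>Applied to a fully aligned corpus, such a tally would show how much of the surplus of explicit DRs in Lithuanian comes from explicitation of English implicit relations.</p>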
      <p>Finally, the annotation of EntRels also revealed some
interesting cases. We observed that in some Lithuanian
translations, an EntRel holds between two loosely related sentences,
as in 23, while the source English text has just one
sentence and thus lacks two separate arguments (see 24):</p>
      <p>23. Tad pasakysiu kai ką, kas gali jus nustebinti:
galios balansas, galintis išties paveikti tvarumą, yra
institucinių investuotojų rankose. Tai tokie didieji
investuotojai kaip pensijų fondai, kiti fondai ir
labdaros fondai. (EntRel)</p>
      <p>24. And here’s something that may surprise you: the
balance of power to really influence sustainability rests
with institutional investors, the large investors like
pension funds, foundations and endowment.</p>
      <p>Concerning the frequencies of the top-level senses of DRs,
the distribution seems to be approximately equal for both
languages as indicated in Table 4.</p>
    </sec>
    <sec id="sec-6">
      <title>Summary, Conclusions and Outlook</title>
      <p>
        The research findings presented here represent our initial
observations and reveal certain tendencies in rendering the
discourse of English TED talks in Lithuanian. Our focus
has been on how DRs are expressed in Lithuanian
transcripts. We observed that there are more explicit DRs in
the Lithuanian transcripts, which might be explained by the
translators’ efforts to render the implicit DRs explicitly;
this is in line with the observations of
        <xref ref-type="bibr" rid="ref3">Baker (2011)</xref>
        . On
the other hand, we noticed that the rendering of explicit
DCs implicitly might lead to the loss of the sense annotated
in the original text. Such choices of the translator could
obscure the meaning of the original, and could be explained
by the requirements of synchronization during transcript
translation. These might be the effect of the issues
discussed by
        <xref ref-type="bibr" rid="ref9">Lefer and Grabar (2015)</xref>
        who identify subtitling
as a specific type of translation. The annotation of entity
relations also reveals interesting cases, such as the translation
of a single English sentence into two loosely related
arguments in the Lithuanian EntRel version. Finally, it should
be kept in mind that there could be some stylistic
preferences of the translators, e.g. some translators might want to
use more explicit connectives, some less. The investigation
of individual translators’ choices could be a specific further
research topic.
      </p>
      <p>In the future, by annotating more of the Lithuanian
transcripts of the English texts in TED-MDB, we hope to reveal
and specify more translation tendencies. By exploring
the transcripts further, we also expect to find out which
translation strategies (direct translation, transposition, etc.)
the translators prefer and what this may
add to the research field of digital humanities.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This research is funded by the European Social Fund under
measure No. 09.3.3-LMT-K-712 “Development of Competences
of Scientists, other Researchers and Students through
Practical Research Activities”. For training in
annotation and for generating research ideas, we acknowledge
the support of the STSM grants of TextLink COST Action
IS1312.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Al-Saif</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Markert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic</article-title>
          .
          <source>In LREC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>Corpus-based translation studies: The challenges that lie ahead</article-title>
          .
          <source>Benjamins Translation Library</source>
          ,
          <volume>18</volume>
          :
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>In Other Words: A Coursebook on Translation. Routledge.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cettolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Federico</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>WIT3: Web Inventory of Transcribed and Translated Talks</article-title>
          .
          <source>In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT)</source>
          , pages
          <fpage>261</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1960</year>
          ).
          <article-title>A coefficient of agreement for nominal scales</article-title>
          .
          <source>Educational and Psychological Measurement</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Hirschberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Litman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>1987</year>
          ).
          <article-title>Now let's talk about now: Identifying cue phrases intonationally</article-title>
          .
          <source>In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>163</fpage>
          -
          <lpage>171</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Holes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>1984</year>
          ).
          <article-title>Textual approximation in the teaching of academic writing to Arab students: A contrastive approach</article-title>
          .
          <source>English for Specific Purposes in the Arab World</source>
          , pages
          <fpage>228</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B. L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Annotating discourse relations with the PDTB Annotator</article-title>
          .
          <source>In COLING (Demos)</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Lefer</surname>
            ,
            <given-names>M.-A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Grabar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Super-creative and over-bureaucratic: A cross-genre corpus-based study on the use and translation of evaluative prefixation in TED talks and EU parliamentary debates</article-title>
          .
          <source>Across Languages and Cultures</source>
          ,
          <volume>16</volume>
          (
          <issue>2</issue>
          ):
          <fpage>187</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Oza</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolachina</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>The Hindi Discourse Relation Bank</article-title>
          .
          <source>In Proc. of the 3rd Linguistic Annotation Workshop</source>
          , pages
          <fpage>158</fpage>
          -
          <lpage>161</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Reflections on the Penn Discourse Treebank, comparable corpora, and complementary annotation</article-title>
          .
          <source>Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R. N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Frawley</surname>
            ,
            <given-names>W. J.</given-names>
          </string-name>
          (
          <year>1983</year>
          ).
          <article-title>Conjunctive cohesion in four English genres</article-title>
          .
          <source>Text-Interdisciplinary Journal for the Study of Discourse</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>347</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Šveikauskienė</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Telksnys</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Accuracy of the parsing of Lithuanian simple sentences</article-title>
          .
          <source>Information Technology and Control</source>
          ,
          <volume>43</volume>
          (
          <issue>4</issue>
          ):
          <fpage>402</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>A discourse-annotated corpus of conjoined VPs</article-title>
          .
          <source>In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)</source>
          , pages
          <fpage>22</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Zeyrek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demirşahin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sevdik-Çallı</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Çakıcı</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Turkish Discourse Bank: Porting a discourse annotation style to a morphologically rich language</article-title>
          .
          <source>Dialogue and Discourse</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>174</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Zeyrek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kurfalı</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Multilingual extension of PDTB-style annotation: The case of TED Multilingual Discourse Bank</article-title>
          .
          <source>In LREC 2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>PDTB-style discourse annotation of Chinese text</article-title>
          .
          <source>In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>77</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>