<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Assessing the Use of Terminology in Phrase-Based Statistical Machine Translation for Academic Course Catalogues Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Randy Scansani</string-name>
          <email>randy.scansani@unibo.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcello Federico</string-name>
          <email>federico@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luisa Bentivogli</string-name>
          <email>bentivo@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bologna</institution>
          ,
          <addr-line>Forlì</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <volume>300</volume>
      <issue>000</issue>
      <abstract>
        <p>English. In this contribution we describe an approach to evaluating the use of terminology in a phrase-based machine translation system that translates course unit descriptions from Italian into English. This genre is prominent among those that universities need to translate in European countries where English is not a native language. Two MT engines are trained on an in-domain bilingual corpus and a subset of the Europarl corpus, and one of them is enhanced by adding a bilingual termbase to its training data. The overall performance of the systems is assessed through the BLEU score, whereas the f-score is used to focus the evaluation on term translation. Furthermore, a manual analysis of the terms is carried out. Results suggest that in some cases, despite the simplistic approach used to inject terms into the MT system, the termbase was able to bias the word choice of the engine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Italian abstract (English translation)</title>
      <p>This paper describes a method for evaluating
the use of terminology in a PBSMT system that
translates course unit descriptions
from Italian into English. Translating
this genre of texts is essential
for universities in European countries where
English is not an official language. Two
MT systems are trained on an
in-domain corpus and a subset of the
Europarl corpus. A bilingual glossary is added
to one of the two systems. The
overall performance of the systems
is evaluated with the BLEU score,
while the f-score is used to evaluate
term translation specifically. A manual
analysis of the terms was also
carried out. The results show that,
despite the elementary method used to
insert the terms into the MT system, the
termbase was in some cases able to
influence the choice of terms in the output.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        Availability of course unit descriptions or course
catalogues in multiple languages has started to
play a key role for universities especially after
the Bologna process
        <xref ref-type="bibr" rid="ref6">(European Commission et al.,
2015)</xref>
        and the resulting growth in student
mobility. These texts aim at providing students with all
the relevant information regarding contents,
prerequisites, learning outcomes, etc.
      </p>
      <p>Since course unit descriptions have to be drafted
in large quantities on a yearly basis, universities
would benefit from the use of machine
translation (MT). Indeed, the importance of developing
MT tools in this domain is further attested by two
previous projects funded by the EU Commission,
i.e. TraMOOC<sup>1</sup> and Bologna Translation Service<sup>2</sup>.
The former differs from the present work since it
does not focus on academic courses, while the
latter does not seem to have undergone substantial
development after 2013 and, moreover,
does not include the Italian-English language
combination.</p>
      <p>
        Automatically producing multilingual versions
of course unit descriptions poses a number of
challenges. A first major issue for MT systems is the
scarcity of high-quality human-translated
parallel texts of course unit descriptions. Also,
descriptions feature not only terms that are
typical of institutional academic communication, but
also expressions that belong to specific disciplines
        <xref ref-type="bibr" rid="ref7">(Ferraresi, 2017)</xref>
        . This makes it cumbersome to
choose the right resources and the most effective
method to add them to the MT engine.
      </p>
      <p><sup>1</sup> Translation for Massive Open Online Courses: http://tramooc.eu/</p>
      <p><sup>2</sup> http://www.bologna-translation.eu</p>
      <p>For this study, we chose to concentrate on
course units belonging to the disciplinary domain
of exact sciences, since Italian degree programmes
whose course units belong to this domain translate
their contents into English more often than other
programmes.</p>
      <p>A phrase-based statistical machine translation
system (PBSMT) was used to translate course unit
descriptions from Italian into English. We trained
one engine on a subset of the Europarl corpus and
on a small in-domain corpus including course unit
descriptions and degree programs (see sect. 3.1)
belonging to the domain of the exact sciences.
Then, we enriched the training data set with a
bilingual terminology database belonging to the
educational domain (see sect. 3.2) and built a new
engine. To assess the overall performance of the
two systems we automatically evaluated them with
the BLEU score. We then focused on the
evaluation of terminology translation, by computing the
f-score on the list of termbase entries occurring
both in the system outputs and in the reference
translation (see sect. 4). Finally, to gather more
information on term translation, a manual analysis
was carried out (see sect. 5).</p>
    </sec>
    <sec id="sec-3">
      <title>2 Previous work</title>
      <p>
        A number of approaches have already been
developed to use in-domain resources like corpora
and terminology in statistical machine translation
(SMT), indirectly tackling the domain-adaptation
challenge for MT. For example, the WMT 2007
shared task was focused on domain adaptation
in a scenario in which a small in-domain corpus
is available and has to be integrated with large
generic corpora
        <xref ref-type="bibr" rid="ref10 ref5">(Koehn and Schroeder, 2007;
Civera and Juan, 2007)</xref>
        . Recently, the work by
Štajner et al. (2016) showed that an
English-Portuguese PBSMT system in the IT domain
achieved the best results when trained on a large
generic corpus and in-domain terminology.
      </p>
      <p>
        For French-English in the military domain,
Langlais (2002) reported on improvements of
the WER score after using existing
terminological resources as constraints to reduce the
search space. For the same language combination,
Bouamor et al. (2012) used pairs of multi-word
expressions (MWEs) extracted from the Europarl corpus
as one of the training resources, yet observed a gain
of only 0.3% BLEU points
        <xref ref-type="bibr" rid="ref12">(Papineni et al., 2002)</xref>
        .
      </p>
      <p>Other experiments have focused on how to
insert terms into an MT system without having to stop
or re-train it. These dynamic methods suit the
purpose of the present paper, as they have also been
tested on Italian-English. Arcan et al. (2014b) injected
bilingual terms into an SMT system dynamically,
observing an improvement of up to 15% BLEU
points for English-Italian in the medical and IT
domains. For the same domains and
languages (in both directions), Arcan et al. (2014a)
developed an architecture to identify terminology
in a source text and translate it using Wikipedia
as a resource. The terms obtained were then
dynamically added to the SMT system. This study
resulted in an improvement of up to 13% BLEU
points.</p>
      <p>We have seen that results for the languages we
are working on are encouraging, but since they are
strongly influenced by several factors – i.e. the
domain and the injection method – an experiment
on academic institutional texts is required in
order to test the influence of bilingual terminology
resources on the output.</p>
    </sec>
    <sec id="sec-4">
      <title>3 Experimental Setup</title>
      <sec id="sec-4-1">
        <title>3.1 Corpora</title>
        <p>
          A subset of 300,000 sentence pairs was extracted
from the Europarl Italian-English bilingual
corpus
          <xref ref-type="bibr" rid="ref8">(Koehn, 2005)</xref>
          . Limiting the number of
sentence pairs of the generic corpus was necessary
due to limitations of the computational resources
available. Then, bilingual corpora belonging to
the academic domain were needed as development
and evaluation data sets and to enhance the
training data set. One course unit description corpus
was available thanks to the CODE project<sup>3</sup>. After
removing texts not belonging to the exact
science domain, we merged the corpus with two
other smaller corpora made of course unit
descriptions. We then extracted 3,500 sentence pairs to
use as a development set.
        </p>
        <p>Relying only on course unit descriptions to train
our engines could have led to over-fitting of
the models. Moreover, high-quality parallel course
unit descriptions are often difficult to find. To
overcome these two issues we added a small
number of degree program descriptions to our
in-domain corpus. Finally, a fourth small corpus of course
unit descriptions was built to be used as the
evaluation data set. All the details regarding the
sentence pairs and tokens are provided in Table 1.</p>
        <p><sup>3</sup> CODE is a project aimed at building corpora and
tools to support translation of course unit descriptions
into English and drafting of these texts in English as
a lingua franca. http://code.sslmit.unibo.it/doku.php</p>
        <p>The terminology database was created by merging
three different IATE (InterActive Terminology for
Europe)<sup>4</sup> termbases for both languages and adding
to them the terms extracted from the fifth volume
of the Eurydice<sup>5</sup> glossaries. More specifically, the
three IATE termbases were: Education,
Teaching, Organization of teaching.</p>
        <p>To verify the relevance of our termbase with
respect to the training data we measured its
coverage. Since the terms in the termbase are in their
base form, in order to obtain a more accurate
estimate we lemmatised<sup>6</sup> the training sets before
calculating the overlap between the two resources.</p>
        <p>As Table 2 shows, 24.08% of the
termbase entries also occur in the source side of the
two training corpora, and 29.19% in the target
side, meaning that the two resources complement
each other well.</p>
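        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th></th><th>It</th></tr>
            </thead>
            <tbody>
              <tr><td>Europarl lemmas</td><td>7,848,936</td></tr>
              <tr><td>In-domain lemmas</td><td>441,030</td></tr>
              <tr><td>Termbase entries</td><td>4,142</td></tr>
              <tr><td>Europarl overlap</td><td>23.03%</td></tr>
              <tr><td>In-domain overlap</td><td>27.52%</td></tr>
              <tr><td>Total overlap</td><td>24.08%</td></tr>
            </tbody>
          </table>
        </table-wrap>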
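        <p>A coverage measure of the kind reported in Table 2 could be computed along the following lines. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the corpus is assumed to be already lemmatised and tokenised, and an entry counts as covered when its lemma sequence occurs contiguously in at least one sentence. The exact matching procedure behind Table 2 is not described in the text.</p>
        <preformat>
```python
def ngrams(tokens, max_n):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def termbase_coverage(entries, lemmatised_sentences):
    """Percentage of distinct termbase entries found in a lemmatised corpus.

    entries: iterable of base-form terms, possibly multi-word (assumed format).
    lemmatised_sentences: iterable of lists of lemmas (assumed format).
    """
    # Represent each entry as a tuple of lowercased tokens; skip blank entries.
    keyed = {tuple(e.lower().split()) for e in entries if e.strip()}
    if not keyed:
        return 0.0
    max_n = max(len(k) for k in keyed)
    seen = set()
    for sentence in lemmatised_sentences:
        lowered = [tok.lower() for tok in sentence]
        for gram in ngrams(lowered, max_n):
            if gram in keyed:
                seen.add(gram)
    return 100.0 * len(seen) / len(keyed)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the corpora in the paper.
    toy_entries = ["esame scritto", "docente", "voto finale", "prova orale"]
    toy_corpus = [
        ["il", "docente", "valutare", "il", "esame", "scritto"],
        ["il", "prova", "orale"],
    ]
    print(termbase_coverage(toy_entries, toy_corpus))
```
        </preformat>
        <p>Matching on lemma n-grams rather than on surface strings mirrors the motivation given above: termbase entries are in base form, so inflected corpus tokens would otherwise be missed.</p>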
        <sec id="sec-4-1-1">
          <p><sup>4</sup> http://iate.europa.eu/</p>
          <p><sup>5</sup> http://eacea.ec.europa.eu/education/eurydice/</p>
          <p><sup>6</sup> Lemmatisation was performed using the
TreeTagger: https://goo.gl/JjHMcZ</p>
          <p>
            We tested the performance of a PBSMT system
trained on the resources described in sections 3.1
and 3.2. The system used to build the engines
for this experiment is the open-source ModernMT
(MMT)<sup>7</sup>
            <xref ref-type="bibr" rid="ref3">(Bertoldi et al., 2017)</xref>
            . Two engines were
built in MMT:
          </p>
          <p>One engine trained on the subset of Europarl
plus our in-domain corpus.</p>
          <p>One engine trained on the subset of Europarl
plus our in-domain corpus and the
terminology database.</p>
          <p>Both engines were tuned on our development set
and evaluated on the test set (see sect. 3.1).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Experimental results</title>
      <p>
        To provide information on the overall translation
quality of our PBSMT engines, we calculated the
BLEU scores
        <xref ref-type="bibr" rid="ref12">(Papineni et al., 2002)</xref>
        obtained on
the test set. Table 3 shows the results for both
engines, where the engine without terminology is
referred to as w/o terms and the one with
terminology is referred to as w/ terms.
      </p>
      <p>Furthermore, we evaluated the systems focusing
on their performance on terminology translation.
For this purpose, we relied on the f-score. In more
detail, for both engines we counted the
English termbase entries appearing in the system
output and in the reference translation. From
these figures we computed precision,
recall and f-score. Results are reported in Table
4.</p>
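      <p>The term-level evaluation described above can be sketched as follows. The sketch rests on assumptions that the text does not confirm: the function names are hypothetical, terms are matched as contiguous lowercased token sequences, and a term produced by the system counts as correct only if it also occurs in the reference for the same sentence, with counts clipped to the reference count.</p>
      <preformat>
```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def count_terms(sentence, termbase):
    """Count how often each termbase entry occurs in one sentence."""
    tokens = tokenize(sentence)
    counts = Counter()
    for term in termbase:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        if n == 0:
            continue  # ignore empty entries
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                counts[term] += 1
    return counts


def term_prf(hypotheses, references, termbase):
    """Return (precision, recall, f_score) over termbase entries."""
    matched = produced = expected = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = count_terms(hyp, termbase)
        ref_counts = count_terms(ref, termbase)
        produced += sum(hyp_counts.values())
        expected += sum(ref_counts.values())
        # A produced term is credited at most as often as the reference has it.
        matched += sum(min(c, ref_counts[t]) for t, c in hyp_counts.items())
    precision = matched / produced if produced else 0.0
    recall = matched / expected if expected else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Sentences borrowed from Table 5; the three-entry termbase is a toy subset.
    termbase = ["written examination", "preparation", "teacher"]
    hyps = [
        "The preparation of the student will be evaluated in a written examination.",
        "Every teacher.",
    ]
    refs = [
        "Student preparation shall be evaluated by a 3 hrs written examination.",
        "Each lecturer.",
    ]
    print(term_prf(hyps, refs, termbase))
```
      </preformat>
      <p>Under this matching scheme a correct translation that uses a different term from the reference (teacher against lecturer in the example) lowers precision, which is one reason why such scores alone may not reflect term translation quality.</p>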
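      <sec id="sec-5-1">
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Engine</th><th>BLEU</th></tr>
            </thead>
            <tbody>
              <tr><td>w/o terms</td><td>25.92</td></tr>
              <tr><td>w/ terms</td><td>26.00</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-2">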
        <p>The figures in Tables 3 and 4 show that adding
our termbase to the training data set does not
affect the output in a substantial way. While
according to the BLEU score the w/ terms engine
slightly outperforms the w/o terms engine, the
f-score – indicating performance on term translation
– is marginally higher for the w/o terms system.</p>
        <p>Focusing on the usage of terminology, a number
of observations can be made. As regards the
distribution of termbase entries in the test set, which
contains 3,465 sentence pairs, the number of output
and reference sentences containing at least one term
is fairly low:
945 (27.30%) for the reference text, 866 (24.99%)
for the w/o terms output and 870 (25.10%) for the
w/ terms output.</p>
        <p>Considering the terms found in the two
outputs, we observe that their number differs
by only 23 units (ca. 2% of the number of terms in
the outputs). Also, the number of overlapping
terms is very high, i.e. 882 terms (out of 1,061
for the engine w/o terms and out of 1,083 for the
engine w/ terms). As a matter of fact, the
six most frequent terms in the systems’ outputs are the
same – course, oral, ability, lecture, technology
and teacher – and cover approximately half of
the total number of extracted terms for both
outputs.</p>
        <p>We then compared the English termbase entries
appearing in the target side of the test set to those
appearing in the training set. Each of the 78 terms
occurring at least once in the test set
(corresponding to 1,133 total occurrences, as reported in
Table 4) also occurs in the training set,
60 of them in its in-domain component.</p>
        <p>However, even though our training data cover
all the terms present in the test data,
and despite the high overlap between the terms
produced by the two engines, a
considerable number of terms still differ. We
thus cannot exclude an influence of the termbase
on the word choice of the w/ terms system. For this
reason, an in-depth analysis of the differing terms
produced by the two engines was carried out.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Manual Evaluation</title>
      <p>The analysis of the sentences where the termbase
entries used by the two engines differed showed
that in some cases the termbase forced the system
to use its target term even if a different
translation (sometimes also correct) was present in the
training corpora. Some examples are reported in
Table 5. For the source terms prova orale
(Example 1) and esame scritto (Example 2), the
engine w/ terms used oral examination and written
examination, while the one w/o terms used
oral exam and written exam; only the variants
with examination are in the termbase.
Moreover, Example 2 also includes the termbase word
preparazione, which is translated as preparation
by the engine w/ terms, while it is not translated at
all by the engine w/o terms.</p>
      <p>Another interesting example is the translation
of the source word docente (Example 3), where
the termbase corrected a wrong translation. The
Italian term was wrongly translated as lecture
by the engine w/o terminology, and as teacher
(which is the right translation for this text) by the
engine w/ terminology.</p>
      <p>In Example 4, the Italian sentence contained the
termbase entry voto finale, which was translated
with final vote by the engine w/o terms and with
the termbase MWE final mark by the w/ terms
engine. Also in this case the termbase corrected a
mistake, since vote is not the correct translation of
voto in this context.</p>
      <p>The comparison between the two engines’
outputs shows that, even though our training data
covered all the terms present in the test
set, the termbase influenced the MT output of the
engine w/ terms by biasing the weights assigned to a
specific translation.</p>
      <p>Such results have to be judged taking into
account the preliminary nature of this study, which aimed at
understanding the practical implications of using
terminology in PBSMT and therefore exploited
a simplistic approach to inject terms. As a matter
of fact, we found that some of the termbase
entries occurring in the reference – e.g.
certification, instructor, text book, educational material –
were also not used in the output of the system w/ terms,
and this is probably due to the limitations of our
method. The terms instructor, text book and
educational material did not occur in the w/o terms
output either, while certification did.</p>
      <table-wrap>
        <label>Table 5</label>
        <table>
          <tbody>
            <tr><td>1</td><td>SRC</td><td>La prova orale si svolgerà sugli argomenti del programma del corso.</td></tr>
            <tr><td></td><td>REF</td><td>The oral verification will be on the topics of the lectures.</td></tr>
            <tr><td></td><td>W/O TERMS</td><td>The oral exam will take place on the program of the course.</td></tr>
            <tr><td></td><td>W/ TERMS</td><td>The oral examination will take place on the program of the course.</td></tr>
            <tr><td>2</td><td>SRC</td><td>La preparazione dello studente sarà valutata in un esame scritto.</td></tr>
            <tr><td></td><td>REF</td><td>Student preparation shall be evaluated by a 3 hrs written examination.</td></tr>
            <tr><td></td><td>W/O TERMS</td><td>The student will be evaluated in a written exam.</td></tr>
            <tr><td></td><td>W/ TERMS</td><td>The preparation of the student will be evaluated in a written examination.</td></tr>
            <tr><td>3</td><td>SRC</td><td>Ogni docente titolare</td></tr>
            <tr><td></td><td>REF</td><td>Each lecturer.</td></tr>
            <tr><td></td><td>W/O TERMS</td><td>Every lecture.</td></tr>
            <tr><td></td><td>W/ TERMS</td><td>Every teacher.</td></tr>
            <tr><td>4</td><td>SRC</td><td>In tal caso il voto finale terrà conto anche della prova orale.</td></tr>
            <tr><td></td><td>REF</td><td>In this case the final score will be based also on the oral part.</td></tr>
            <tr><td></td><td>W/O TERMS</td><td>In this case the final vote will take account the oral test.</td></tr>
            <tr><td></td><td>W/ TERMS</td><td>In this case the final mark will be based also the oral test.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To sum up, what emerges is that using
terminology in PBSMT to translate course catalogues, and
more specifically course unit descriptions, can
influence the MT output. In our case, since the
improvements were measured against the output of
the w/o terms engine (which may well be
correct even when it uses terms different from those
included in the termbase), the metric results were
not informative enough and a manual analysis of
the terms had to be carried out.</p>
    </sec>
    <sec id="sec-7">
      <title>6 Conclusion and further work</title>
      <p>This paper has described a preliminary analysis
aimed at assessing the use of in-domain
terminology in PBSMT in the institutional academic
domain, and more precisely for the translation of
course unit descriptions from Italian into English.
Following the results of the present experiment
and given its preliminary nature, we are planning
to carry out further work in this field.</p>
      <p>In section 4 we have seen that the institutional
academic terms contained in our test data also
appeared in the training data, thus limiting the
impact of terminology on the output. However,
course catalogues and course unit descriptions
also include terms belonging to specific disciplines
(see sect. 1). In future work we are
therefore planning to focus not only on academic
terminology but also on disciplinary terminology,
testing its impact on the output of an MT engine
translating course unit descriptions.</p>
      <p>After this first experiment on the
widely used PBSMT architecture, in future work we are
planning to exploit neural machine translation
(NMT). In particular, our goal is to develop an
NMT engine able to handle terminology correctly
in this text domain, in order to investigate its
effect on the post-editor’s work. For this reason, a
termbase focused on the institutional academic
domain, e.g. the UCL-K.U.Leuven University
Terminology Database<sup>8</sup> or the Innsbrucker Termbank
2.0<sup>9</sup>, could be used to select an adequate
benchmark for the development and evaluation of an MT
engine with a high degree of accuracy in the
translation of terms.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors would like to thank Silvia
Bernardini, Marcello Soffritti and Adriano Ferraresi from
the University of Bologna for their advice on
terminology and institutional academic
communication, and Mauro Cettolo from FBK for help with
ModernMT. The usual disclaimers apply.</p>
      <sec id="sec-8-1">
        <p><sup>8</sup> https://goo.gl/huoevR</p>
        <p><sup>9</sup> https://goo.gl/W2GH5h</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Mihael</given-names>
            <surname>Arcan</surname>
          </string-name>
          , Claudio Giuliano, Marco Turchi, and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          . 2014b.
          <article-title>Identification of bilingual terms from monolingual documents for statistical machine translation</article-title>
          .
          <source>In Proceedings of the 4th International Workshop on Computational Terminology</source>
          . Dublin, Ireland, pages
          <fpage>22</fpage>
          -
          <lpage>31</lpage>
          . http://www.aclweb.org/anthology/W14-4803.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Mihael</given-names>
            <surname>Arcan</surname>
          </string-name>
          , Marco Turchi, Sara Tonelli, and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          . 2014a.
          <article-title>Enhancing statistical machine translation with bilingual terminology in a CAT environment</article-title>
          . In Yaser Al-Onaizan and Michel Simard, editors,
          <source>Proceedings of AMTA 2014</source>
          . Vancouver, BC.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Bertoldi</surname>
          </string-name>
          , Roldano Cattoni, Mauro Cettolo, Amin Farajian, Marcello Federico, Davide Caroselli, Luca Mastrostefano, Andrea Rossi, Marco Trombetti, Ulrich Germann, and
          <string-name>
            <given-names>David</given-names>
            <surname>Madl</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>MMT: New open source MT for the translation industry</article-title>
          . In
          <source>Proceedings of the 20th Annual Conference of the European Association for Machine Translation</source>
          . Prague, pages
          <fpage>86</fpage>
          -
          <lpage>91</lpage>
          . https://ufal.mff.cuni.cz/eamt2017/user-project-product-papers/papers/user/EAMT2017_paper_88.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Dhouha</given-names>
            <surname>Bouamor</surname>
          </string-name>
          , Nasredine Semmar, and
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Identifying bilingual multi-word expressions for statistical machine translation</article-title>
          .
          In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors,
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012)</source>
          . European Language Resources Association (ELRA), Istanbul, Turkey, pages
          <fpage>674</fpage>
          -
          <lpage>679</lpage>
          . ACL Anthology Identifier: L12-1527. http://www.lrec-conf.org/proceedings/lrec2012/pdf/886_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Civera</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alfons</given-names>
            <surname>Juan</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Domain adaptation in statistical machine translation with mixture modelling</article-title>
          .
          In
          <source>Proceedings of the Second Workshop on Statistical Machine Translation</source>
          . Association for Computational Linguistics, Prague, Czech Republic, pages
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          . http://www.aclweb.org/anthology/W/W07/W07-0222.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          European Commission, EACEA, and Eurydice.
          <year>2015</year>
          .
          <source>The European Higher Education Area in 2015: Bologna Process Implementation Report</source>
          . Luxembourg: Publications Office of the European Union.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Adriano</given-names>
            <surname>Ferraresi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Terminology in European university settings. The case of course unit descriptions</article-title>
          . In Paola Faini, editor,
          <source>Terminological Approaches in the European Context</source>
          . Cambridge Scholars Publishing, Newcastle upon Tyne
          , pages
          <fpage>20</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Europarl: A Parallel Corpus for Statistical Machine Translation</article-title>
          . In
          <source>Conference Proceedings: the tenth Machine Translation Summit</source>
          . AAMT, Phuket, Thailand, pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . http://mtarchive.info/MTS-2005-Koehn.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Josh</given-names>
            <surname>Schroeder</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Experiments in domain adaptation for statistical machine translation</article-title>
          .
          In
          <source>Proceedings of the Second Workshop on Statistical Machine Translation</source>
          . Association for Computational Linguistics, Prague, StatMT '07, pages
          <fpage>224</fpage>
          -
          <lpage>227</lpage>
          . http://dl.acm.org/citation.cfm?id=1626355.1626388.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Philippe</given-names>
            <surname>Langlais</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Improving a general-purpose statistical translation engine by terminological lexicons</article-title>
          . In
          <source>COLING-02 on COMPUTERM 2002: Second International Workshop on Computational Terminology - Volume 14</source>
          . Association for Computational Linguistics, Stroudsburg, PA, USA, COMPUTERM '02, pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . https://doi.org/10.3115/1118771.1118776.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Kishore</given-names>
            <surname>Papineni</surname>
          </string-name>
          , Salim Roukos, Todd Ward, and
          <string-name>
            <given-names>Wei-Jing</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Bleu: A method for automatic evaluation of machine translation</article-title>
          . In
          <source>Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          . Association for Computational Linguistics, Philadelphia, Pennsylvania, ACL '02, pages
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          . https://doi.org/10.3115/1073083.1073135.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Sanja</given-names>
            <surname>Štajner</surname>
          </string-name>
          , Andreia Querido, Nuno Rendeiro, João António Rodrigues, and
          <string-name>
            <given-names>António</given-names>
            <surname>Branco</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Use of domain-specific language resources in machine translation</article-title>
          . In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors,
          <source>Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          . European Language Resources Association (ELRA), Paris, France, pages
          <fpage>592</fpage>
          -
          <lpage>598</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>