<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Automatic Post-Editing across Domains</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rajen Chatterjee</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gebremedhen Gebremelak</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Negri</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Turchi</string-name>
        </contrib>
        <aff>Fondazione Bruno Kessler, Italy <email>{chatterjee,gebremelak,negri,turchi}@fbk.eu</email></aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>English. Recent advances in automatic
post-editing (APE) have shown that it is
possible to automatically correct
systematic errors made by machine translation
systems. However, most of the current
APE techniques have only been tested
in controlled batch environments, where
training and test data are sampled from
the same distribution and the training set
is fully available. In this paper, we
propose an online APE system based on an
instance selection mechanism that is able
to efficiently work with a stream of data
points belonging to different domains. Our
results on a mix of two datasets show that
our system is able to: i) outperform
state-of-the-art online APE solutions and ii)
significantly improve the quality of rough MT
output.</p>
      <p>Italiano. Recenti miglioramenti dei
sistemi automatici di post-editing hanno
dimostrato la loro capacità di correggere
errori ricorrenti commessi dalla traduzione
automatica. Spesso, tuttavia, tali
sistemi sono stati valutati in condizioni
controllate dove i dati di training/test
sono selezionati dalla stessa distribuzione
e l’insieme di training è interamente
disponibile. Questo articolo propone un
sistema di post-editing online, basato su
tecniche di selezione dei dati, capace di
trattare sequenze di dati appartenenti a
diversi dominii. I risultati su un insieme
di dati misti mostrano che il sistema è in
grado di ottenere risultati migliori rispetto
i) allo stato dell’arte e ii) al sistema di
traduzione.</p>
      <p>
        Nowadays, machine translation (MT) is a core
element in the computer-assisted translation (CAT)
framework
        <xref ref-type="bibr" rid="ref15">(Federico et al., 2014)</xref>
        . The motivation
for integrating MT in the CAT framework lies in
its capability to provide useful suggestions for
unseen segments, thus increasing translators'
productivity. However, it has been observed that MT is
often prone to systematic errors that human
post-editing has to correct before publication. The
by-product of this “translation as post-editing”
process is an increasing amount of parallel data
consisting of MT output on one side and its corrected
version on the other side. Besides being used to
improve the MT system itself
        <xref ref-type="bibr" rid="ref2">(Bentivogli et al.,
2016)</xref>
        , this data can be leveraged to develop
automatic MT quality estimation tools
        <xref ref-type="bibr" rid="ref22 ref31 ref4 ref5 ref6">(Mehdad et
al., 2012; Turchi et al., 2013; C. de Souza et al.,
2013; C. de Souza et al., 2014; C. de Souza et al.,
2015)</xref>
        and automatic post-editing (APE) systems
        <xref ref-type="bibr" rid="ref7 ref7 ref8 ref8 ref9">(Chatterjee et al., 2015b; Chatterjee et al., 2015a;
Chatterjee et al., 2016)</xref>
        . The APE components
explored in this paper should be capable not only
of spotting recurring MT errors, but also of correcting
them. Thus, integrating an APE system inside the
CAT framework can further improve the quality
of the suggested segments, reduce the workload of
human post-editors and increase the productivity
of the translation industry. In the last decade, many
studies on APE have shown that the quality of
machine-translated text can be improved
significantly by post-processing the translations with an
APE system
        <xref ref-type="bibr" rid="ref1 ref13 ref25 ref27 ref30 ref7 ref8">(Simard et al., 2007; Dugast et al.,
2007; Terumasa, 2007; Pilevar, 2011; Béchara et
al., 2011; Chatterjee et al., 2015b)</xref>
        . These systems
mainly follow the phrase-based machine
translation approach, where the MT outputs (optionally
with the source sentence) are used as the source
language corpus and the post-edits are used as the
target language corpus. Although these standard
approaches showed promising results, they lack
the ability to continuously update their inner
models by incorporating human feedback from a
stream of data. To address this problem, several
online systems have been proposed in MT, but
only a few of them have been applied to the APE
scenario
        <xref ref-type="bibr" rid="ref20 ref21 ref26 ref3 ref31">(Simard and Foster, 2013; Lagarda et al.,
2015)</xref>
        , and only in controlled working environments
where they are trained and evaluated on
homogeneous/coherent data sets.
      </p>
      <p>In this paper, we propose a novel online APE
system that is able to efficiently leverage data from
different domains.<sup>1</sup> Our system is based on an
instance selection technique that retrieves
the most relevant training instances from a pool of
multi-domain data for each segment to post-edit.
The selected data is then used to train and tune the
APE system on the fly. The relevance of a training
sample is measured by a similarity score that takes
into account the context of the segment to be
post-edited. This technique allows our online APE
system to decide whether it has the
right knowledge to post-edit a sentence or whether it
is safer to keep the MT output untouched, avoiding
possible damage. The results of our experiments
over the combination of two data sets show that
our approach is robust enough to work in a
multi-domain environment and to generate reliable
post-edits with significantly better performance than a
state-of-the-art online APE system.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Online translation systems</title>
      <p>
        Online translation systems aim to incorporate
human post-editing feedback (or the corrected
version of the MT output) into their models in
real time, as soon as it becomes available. This
feedback helps the system to learn from the mistakes
made in past translations and to avoid repeating
them in future translations. This continuous
learning capability will eventually improve the quality
of the translations and consequently increase the
productivity of the translators/post-editors
        <xref ref-type="bibr" rid="ref29">(Tatsumi, 2009)</xref>
        working with MT suggestions in a
CAT environment. The basic workflow of an
online translation system repeats the
following steps: i) the system receives an
input segment; ii) the input segment is translated
and provided to the post-editor to fix any errors
in it; and iii) the human post-edited version of
the translation is incorporated back into the
system by updating the underlying models
and parameters step by step.<fn><label>1</label><p>A domain is made of segments belonging to the same
text genre whose MT outputs are generated by the same MT
system.</p></fn> In the APE context, the input
is a machine-translated segment (optionally with
its corresponding source segment), which is
processed by the online APE system to fix errors
and then verified by the post-editors. Several
online translation systems have been proposed over
the years
        <xref ref-type="bibr" rid="ref12 ref12 ref17 ref21 ref21 ref23 ref26 ref3 ref3 ref31 ref32">(Hardt and Elming, 2010; Bertoldi et
al., 2013; Mathur et al., 2013; Simard and
Foster, 2013; Ortiz-Martínez and Casacuberta, 2014;
Denkowski et al., 2014; Wuebker et al., 2015)</xref>
        .
      </p>
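      <p>The three-step workflow described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the class OnlineAPE, its exact-match memory, and the function run_stream are hypothetical names introduced here, and the toy memory stands in for the statistical models that a real online system would update.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Iterable, List, Tuple

# One stream item: (source segment, MT output, human post-edit).
Instance = Tuple[str, str, str]


class OnlineAPE:
    """Toy online post-editor (hypothetical): memorises exact corrections."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}  # MT segment -> last human post-edit

    def post_edit(self, src: str, mt: str) -> str:
        # Step ii: propose a correction, or keep the MT output unchanged.
        return self.memory.get(mt, mt)

    def update(self, src: str, mt: str, pe: str) -> None:
        # Step iii: incorporate the human post-edit into the model.
        self.memory[mt] = pe


def run_stream(system: OnlineAPE, stream: Iterable[Instance]) -> List[str]:
    """Process a stream so that each output relies only on past feedback."""
    outputs = []
    for src, mt, pe in stream:                      # step i: receive input
        outputs.append(system.post_edit(src, mt))   # step ii: post-edit
        system.update(src, mt, pe)                  # step iii: learn
    return outputs
```
]]></preformat>
      <p>The order of the two calls inside the loop is the important design point: the post-edit of a segment becomes available to the system only after that segment has been processed.</p>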
      <p>
        The state-of-the-art online APE system is the
Thot toolkit
        <xref ref-type="bibr" rid="ref12 ref23">(Ortiz-Martínez and Casacuberta,
2014)</xref>
        , which was originally developed to
support fully automatic and interactive statistical
machine translation and was later used in the APE task
        <xref ref-type="bibr" rid="ref20">(Lagarda et al., 2015)</xref>
        . To update its inner models
with user feedback, Thot maintains a set of sufficient statistics
and updates them incrementally. For the
language model, the n-gram counts alone
constitute the sufficient statistics. To
update the translation model, an incremental
version of the EM algorithm is first used to obtain word
alignments, and phrase-pair counts are then
extracted to update the sufficient statistics. Other
features, such as the source/target phrase-length models
and the distortion model, are implemented by means
of geometric distributions with fixed parameters.
However, Thot differs from our approach in that
it does not embed any technique for selecting
the most relevant training data. In the long run,
when data points from different domains are
continuously processed, such a system tends to become
more and more generic, which may be useless
or even harmful when automatically post-editing
domain-specific segments.
      </p>
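      <p>To make the notion of incrementally updated sufficient statistics concrete, the following minimal sketch keeps n-gram counts that grow with every new post-edit. It is not Thot's implementation: the class IncrementalNgramCounts and its unsmoothed relative-frequency estimate are assumptions made for illustration only.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams of the given order in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class IncrementalNgramCounts:
    """Hypothetical language-model statistics updated one sentence at a time."""

    def __init__(self, order: int = 3) -> None:
        self.order = order
        self.counts: Counter = Counter()

    def update(self, sentence: str) -> None:
        # Only the counts change; no retraining over past data is needed.
        tokens = sentence.split()
        for n in range(1, self.order + 1):
            self.counts.update(ngrams(tokens, n))

    def prob(self, ngram: Sequence[str]) -> float:
        # Unsmoothed relative frequency: count(ngram) / count(history).
        ngram = tuple(ngram)
        if len(ngram) == 1:
            total = sum(c for g, c in self.counts.items() if len(g) == 1)
            return self.counts[ngram] / total if total else 0.0
        history = self.counts[ngram[:-1]]
        return self.counts[ngram] / history if history else 0.0
```
]]></preformat>
      <p>Because every sentence is added to the same counts regardless of its origin, a model of this kind drifts towards the overall mix of the stream, which is the behaviour that the paragraph above describes as becoming more and more generic.</p>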
    </sec>
    <sec id="sec-3">
      <title>3 Instance Selection for online APE system</title>
      <p>To preserve all the knowledge gained in the online
learning process and, at the same time, to be able to
apply specific post-editing rules when needed, we
propose an instance selection technique for online
APE that retrieves specific data
points whose context is similar to the segment to
be post-edited. These data points are then used to
build reliable APE models. When there are no
reliable data points in the knowledge base, the MT
output is kept untouched, as opposed to
existing APE systems, which tend to always translate
the given input segment regardless of the
reliability of the applicable correction rules.</p>
      <p>
        Our proposed algorithm emulates an online
APE system and assumes that the following
data are available to run the online experiments: i) source (src);
ii) MT output (mt); and iii) human post-edits (pe)
of the MT output. At the beginning, the knowledge
base of our online APE system is empty; it
is updated whenever an instance (a tuple
containing parallel segments from all the above-mentioned
documents) is processed. When the system
receives an input (src, mt), the most relevant
training instances are retrieved from the pool of multi-domain data
stored in our knowledge base. The
similarity between the training instances and the
input segment is measured by a score based on the
term frequency-inverse document frequency
(tf-idf), generally used in information retrieval. The
larger the number of words in common between
the training and the input sentences, the higher
the score. In our system, these scores are
computed using the Lucene library.<sup>2</sup> Only those
training instances whose similarity score is above a
certain threshold (decided over a held-out
development set) are used to build: i) a trigram local
language model over the target side of the training
corpus with the IRSTLM toolkit
        <xref ref-type="bibr" rid="ref14">(Federico et al.,
2008)</xref>
        ; ii) the translation and reordering models
using the Moses toolkit
        <xref ref-type="bibr" rid="ref18">(Koehn et al., 2007)</xref>
        . The word alignment of each sentence pair is
computed using the incremental GIZA++ software.<sup>3</sup>
The log-linear model parameters are optimized
over a part of the selected instances. To obtain
reliably tuned weights and a fast optimization
process, multiple instances of MIRA
        <xref ref-type="bibr" rid="ref10">(Chiang, 2012)</xref>
        are run in parallel on three small development sets
randomly selected from the retrieved sentences.
The obtained weights are then averaged. If a
minimum number of retrieved sentences is not reached,
the optimization step is skipped, because too
few sentences might not yield reliable weights. In
this case, the weights computed for the previous
input segment are used. The tuned weights and
the models built on all the data are then used to
post-edit the input sentence.
      </p>
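      <p>The selection step can be illustrated with a small, self-contained sketch. The system described above relies on Lucene for scoring; the code below instead uses a plain tf-idf cosine similarity as a stand-in, so its scores and the threshold of 0.5 are not comparable to Lucene scores. The names KnowledgeBase, cosine, post_edit and the train_and_decode callback are hypothetical; the callback represents the on-the-fly training, tuning and decoding step.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Instance = Tuple[str, str, str]  # (src, mt, pe)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values()))
    norm *= math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical store of past instances with tf-idf retrieval."""

    def __init__(self) -> None:
        self.instances: List[Instance] = []
        self.doc_freq: Counter = Counter()

    def add(self, src: str, mt: str, pe: str) -> None:
        self.instances.append((src, mt, pe))
        self.doc_freq.update(set(mt.split()))

    def _vector(self, text: str) -> Dict[str, float]:
        n_docs = len(self.instances)
        tf = Counter(text.split())
        # Smoothed idf (an assumption) so that weights stay positive.
        return {w: c * math.log(1 + n_docs / (1 + self.doc_freq[w]))
                for w, c in tf.items()}

    def select(self, mt: str, threshold: float) -> List[Instance]:
        query = self._vector(mt)
        scored = [(cosine(query, self._vector(inst[1])), inst)
                  for inst in self.instances]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [inst for score, inst in scored if score > threshold]


def post_edit(kb: KnowledgeBase, mt: str,
              train_and_decode: Callable[[List[Instance], str], str],
              threshold: float = 0.5) -> str:
    selected = kb.select(mt, threshold)
    if not selected:
        return mt  # no reliable data: keep the MT output untouched
    return train_and_decode(selected, mt)
```
]]></preformat>
      <p>The early return is the part that distinguishes this scheme from a system that always produces a new translation: when nothing in the knowledge base is similar enough, the MT output is passed through unchanged.</p>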
      <p>In a real translation workflow, the automatically
post-edited segment is then passed to the human translator, who
creates the final post-edited segment. Once the post-edit is
available, it is added to the knowledge base along
with the source and the mt sentences. In our
experiments, we emulate the post-edit of the
APE segment with the post-edit of the mt output.<fn><label>2</label><p>https://lucene.apache.org/</p></fn><fn><label>3</label><p>https://code.google.com/archive/p/inc-giza-pp/</p></fn></p>
    </sec>
    <sec id="sec-4">
      <title>4 Experimental setup</title>
      <p>
        Data. To examine the performance of the online
APE systems in a multi-domain translation
environment, we select two data sets for the
English-German language pair, both belonging to information
technology (IT). Although they come from the
same category (IT), they vary in
terms of vocabulary coverage, MT errors, and
post-editing style. The two data sets are,
respectively, a subset of the Autodesk Post-Editing Data
corpus and the resources used at the second round
of the APE shared task at the first Conference on
Machine Translation (WMT2016).<sup>4</sup> The data sets
are pre-processed to obtain a joint representation
that links each source word with an MT word
(mt#src). This representation was proposed
in the context-aware APE approach by
        <xref ref-type="bibr" rid="ref1">(Béchara et
al., 2011)</xref>
        and leverages the source information to
disambiguate post-editing rules. Recently,
        <xref ref-type="bibr" rid="ref7 ref8">(Chatterjee et al., 2015b)</xref>
        also confirmed that this approach
works better than translating from raw MT
segments over multiple language pairs. The
joint representation is used as the source corpus to train
all the APE systems reported in this paper, and it is
obtained by first aligning the words of source (src)
and MT (mt) segments using MGIZA++
        <xref ref-type="bibr" rid="ref14 ref16">(Gao and
Vogel, 2008)</xref>
        , and then each mt word is
concatenated with its corresponding src words.
      </p>
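      <p>The construction of the joint representation can be sketched as follows, assuming that the word alignment is already available as pairs of (source index, MT index). The function name joint_representation, the NULL placeholder for unaligned MT words and the underscore used to join several source words are assumptions introduced for this example; the text above specifies only the mt#src form.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Iterable, List, Sequence, Tuple


def joint_representation(src_tokens: Sequence[str],
                         mt_tokens: Sequence[str],
                         alignment: Iterable[Tuple[int, int]],
                         null_token: str = "NULL",
                         joiner: str = "_") -> List[str]:
    """Concatenate each MT word with its aligned source words (mt#src)."""
    aligned: Dict[int, List[str]] = {}
    # Sorting keeps the source words of each MT word in source order.
    for src_idx, mt_idx in sorted(alignment):
        aligned.setdefault(mt_idx, []).append(src_tokens[src_idx])

    joint = []
    for mt_idx, mt_word in enumerate(mt_tokens):
        src_words = aligned.get(mt_idx, [null_token])
        joint.append(mt_word + "#" + joiner.join(src_words))
    return joint
```
]]></preformat>
      <p>For instance, with the source "click the button", the MT output "klicken Sie die Taste" and the alignment (0, 0), (1, 2), (2, 3), the unaligned word "Sie" is paired with the placeholder while the other three words carry their source counterparts.</p>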
      <p>The Autodesk training and development sets
consist of 12,238 and 1,948 segments,
respectively, while the WMT2016 data contains 12,000
and 1,000 segments. To measure the diversity of
the two data sets, we compute the vocabulary
overlap between the two joint representations. This is
done both within each data set (splitting the
training data into two halves) and across them. As
expected, in the first case the vocabulary overlap
is much larger (&gt; 40%) than in the second one
(about 15%); this indicates that the two data sets are
quite different and that little information can be shared.</p>
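      <p>The exact definition of vocabulary overlap is not given in the text. As one plausible reading, the sketch below computes it as the size of the shared vocabulary divided by the size of the combined vocabulary (the Jaccard index); both this definition and the function names are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Iterable, Set


def vocabulary(segments: Iterable[str]) -> Set[str]:
    """Collect the set of whitespace-separated tokens in a corpus."""
    return {token for segment in segments for token in segment.split()}


def vocabulary_overlap(corpus_a: Iterable[str],
                       corpus_b: Iterable[str]) -> float:
    """Shared vocabulary over combined vocabulary (assumed definition)."""
    vocab_a, vocab_b = vocabulary(corpus_a), vocabulary(corpus_b)
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0
```
]]></preformat>
      <p>Applied to the two halves of one training set and then to the two different data sets, a measure of this kind yields the within-set and cross-set figures that the paragraph above compares.</p>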
      <p>
        To emulate the multi-domain scenario, the two
training data sets are first merged and then
shuffled. The same strategy is also used for the
development sets. This represents the situation in
which an APE system serves two CAT tools that
process documents from two domains and the
sequence of points is random. Our approach and the
competitors are run on all the shuffled training data
and evaluated on the second half (12,100 points).<fn><label>4</label><p>http://www.statmt.org/wmt16/ape-task.html</p></fn>
      </p>
      <p>
        Evaluation metrics. The performance of the
different APE systems is evaluated using the
Translation Error Rate (TER)
        <xref ref-type="bibr" rid="ref28">(Snover et al., 2006)</xref>
        , BLEU
        <xref ref-type="bibr" rid="ref24">(Papineni et al., 2002)</xref>
        and the precision
        <xref ref-type="bibr" rid="ref7 ref8">(Chatterjee et al., 2015a)</xref>
        . TER and BLEU measure
the similarity between the MT outputs and their
references by looking at the word/n-gram
overlaps, while precision is the ratio of the number of
sentences an APE system improves (with respect to
the MT output) to the number of sentences it
modifies.<sup>5</sup> Larger values indicate that the APE
system is able to improve the quality of most of the
sentences it changes. The statistical significance
test for BLEU is computed using the paired
bootstrap resampling technique
        <xref ref-type="bibr" rid="ref19">(Koehn, 2004)</xref>
        , and for
TER using the stratified approximate
randomization technique
        <xref ref-type="bibr" rid="ref11">(Clark et al., 2011)</xref>
        .
      </p>
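      <p>The precision measure can be written down directly from its definition. The following sketch assumes that sentence-level TER scores are available for both the MT output and the APE output; a sentence counts as modified when the two scores differ and as improved when the APE score is lower. The function name ape_precision and the choice of returning 0.0 when nothing is modified are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Sequence


def ape_precision(mt_ter: Sequence[float], ape_ter: Sequence[float]) -> float:
    """Improved sentences divided by modified sentences.

    mt_ter and ape_ter hold sentence-level TER scores (lower is better)
    for the same test sentences, in the same order.
    """
    modified = [(mt, ape) for mt, ape in zip(mt_ter, ape_ter) if ape != mt]
    if not modified:
        return 0.0  # assumed convention when no sentence is changed
    improved = sum(1 for mt, ape in modified if ape < mt)
    return improved / len(modified)
```
]]></preformat>
      <p>Unlike TER and BLEU, this measure ignores untouched sentences, so it rewards a system that modifies a segment only when the change is likely to help.</p>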
      <p>
        Terms of comparison. We evaluate our online
learning approach against the output produced by
the MT system, the batch APE system that
follows the approach proposed in
        <xref ref-type="bibr" rid="ref7 ref8">(Chatterjee et al.,
2015b)</xref>
        , and the Thot toolkit.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments and Results</title>
      <p>The main goal of this research is to examine the
performance of online APE methods in a
multi-domain scenario, where the APE system receives
a stream of data coming from different domains.
The parameters of our approach (i.e. the similarity
score threshold and the minimum number of selected
sentences) are optimised with a grid search
strategy. We set the threshold value to 1 and the
minimum number of selected sentences to 20. The
results of all the systems are reported in Table 1.</p>
      <p>The batch APE system, which is trained only on
the first half of the data, is able to slightly improve
the performance of the MT system, but it damages
most of the sentences it changes (precision below
45%). Although Thot can learn from all the
data, it is interesting to note that it does not
significantly improve over the MT system and the batch
APE system. This suggests that using all the data
without considering the peculiarities of each
domain does not allow an APE system to efficiently
learn reliable correction rules and to improve
machine translation quality. Moreover, these
results also show that little information can be shared
between the two data sets. This is expected,
considering the limited overlap between the two corpora.<fn><label>5</label><p>For each sentence in the test set, if the TER score of
the APE system differs from that of the baseline, the sentence is considered
modified.</p></fn></p>
      <p>[Table 1: results for MT, Batch APE, Thot, and our approach.]</p>
      <p>Our approach provides significant
improvements in BLEU, TER and precision over all the
competitors. In particular, it obtains an improvement of more than
one TER and BLEU point, and an increase of more
than 20 precision points, over the best
APE system (the Thot toolkit). Such gains
confirm that the instance selection mechanism allows
our APE system to identify domain-specific data
and to leverage it for extracting reliable correction
rules. Further analysis of the performance of the
online systems revealed that our approach
modifies fewer segments than Thot, because it
builds a model only if it finds relevant data,
leaving the MT segment untouched otherwise. These
untouched MT segments, when modified by Thot,
often deteriorate. This suggests that the
output obtained with our solution has a higher
potential for being useful to human translators. Such
usefulness comes not only in terms of a more
pleasant post-editing activity, but also in terms of
the time savings yielded by overall better suggestions.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>We addressed the problem of building a robust
online APE system that is able to efficiently work on
a stream of data points belonging to different
domains. In this condition, our APE system has shown its
capability to continuously adapt to the dynamics
of diverse data processed in real time. In
particular, the instance selection mechanism allows our
APE method to reduce the number of wrong
modifications, which results in significant improvements
in precision over the state-of-the-art online APE
system, thus making it a viable solution to be
deployed in a real-world CAT framework.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the
EC-funded H2020 project QT21 (grant agreement no.
645452).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          Hanna Béchara, Yanjun Ma, and Josef van Genabith.
          <year>2011</year>
          .
          <article-title>Statistical post-editing for a statistical mt system</article-title>
          .
          <source>In Proceedings of the XIII MT Summit</source>
          , pages
          <fpage>308</fpage>
          -
          <lpage>315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Luisa</given-names>
            <surname>Bentivogli</surname>
          </string-name>
          , Nicola Bertoldi, Mauro Cettolo, Marcello Federico, Matteo Negri, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>On the evaluation of adaptive machine translation for human post-editing</article-title>
          .
          <source>IEEE/ACM Trans. Audio, Speech &amp; Language Processing</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ):
          <fpage>388</fpage>
          -
          <lpage>399</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Bertoldi</surname>
          </string-name>
          , Mauro Cettolo, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Cache-based online adaptation for machine translation enhanced computer assisted translation</article-title>
          .
          <source>Proceedings of the XIV MT Summit</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>José G. C. de Souza</surname>
            , Christian Buck, Marco Turchi, and
            <given-names>Matteo</given-names>
          </string-name>
          <string-name>
            <surname>Negri</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>FBK-UEdin Participation to the WMT13 Quality Estimation Shared Task</article-title>
          .
          <source>In Proc. of the Eighth Workshop on Statistical Machine Translation</source>
          , pages
          <fpage>352</fpage>
          -
          <lpage>358</lpage>
          , Sofia, Bulgaria. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>José G. C. de Souza</surname>
          </string-name>
          ,
          <article-title>Jesús González-</article-title>
          <string-name>
            <surname>Rubio</surname>
            ,
            <given-names>Christian</given-names>
          </string-name>
          <string-name>
            <surname>Buck</surname>
            , Marco Turchi, and
            <given-names>Matteo</given-names>
          </string-name>
          <string-name>
            <surname>Negri</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>FBK-UPV-UEdin Participation in the WMT14 Quality Estimation Shared-task</article-title>
          .
          <source>In Proc. of the Ninth Workshop on Statistical Machine Translation</source>
          , pages
          <fpage>322</fpage>
          -
          <lpage>328</lpage>
          , Baltimore, Maryland, USA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>José G. C. de Souza</surname>
            , Matteo Negri, Elisa Ricci, and
            <given-names>Marco</given-names>
          </string-name>
          <string-name>
            <surname>Turchi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Online multitask learning for machine translation quality estimation</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          , pages
          <fpage>219</fpage>
          -
          <lpage>228</lpage>
          , Beijing, China, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Rajen</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          , Marco Turchi, and
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Negri</surname>
          </string-name>
          .
          <year>2015a</year>
          .
          <article-title>The fbk participation in the wmt15 automatic post-editing shared task</article-title>
          .
          <source>In Proceedings of the Tenth Workshop on Statistical Machine Translation</source>
          , pages
          <fpage>210</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Rajen</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          , Marion Weller, Matteo Negri, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          . 2015b.
          <article-title>Exploring the planet of the apes: a comparative study of state-of-the-art methods for mt automatic post-editing</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>156</fpage>
          -
          <lpage>161</lpage>
          ,
          <year>July</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Rajen</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          , José G. C. de Souza, Matteo Negri, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The fbk participation in the wmt 2016 automatic post-editing shared task</article-title>
          .
          <source>In Proceedings of the First Conference on Machine Translation</source>
          , pages
          <fpage>745</fpage>
          -
          <lpage>750</lpage>
          , Berlin, Germany, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Chiang</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Hope and fear for discriminative training of statistical translation models</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>13</volume>
          (Apr):
          <fpage>1159</fpage>
          -
          <lpage>1187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Jonathan H Clark</surname>
          </string-name>
          , Chris Dyer,
          <source>Alon Lavie, and Noah A Smith</source>
          .
          <year>2011</year>
          .
          <article-title>Better hypothesis testing for statistical machine translation: Controlling for optimizer instability</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>176</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Denkowski</surname>
          </string-name>
          , Chris Dyer, and
          <string-name>
            <given-names>Alon</given-names>
            <surname>Lavie</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning from post-editing: Online model adaptation for statistical machine translation</article-title>
          .
          <source>In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>395</fpage>
          -
          <lpage>404</lpage>
          , April.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>Loïc Dugast, Jean Senellart</article-title>
          , and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Statistical post-editing on systran's rule-based translation system</article-title>
          .
          <source>In Proceedings of the Second Workshop on Statistical Machine Translation</source>
          , pages
          <fpage>220</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          , Nicola Bertoldi, and
          <string-name>
            <given-names>Mauro</given-names>
            <surname>Cettolo</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Irstlm: an open source toolkit for handling large scale language models</article-title>
          .
          <source>In Proceedings of Interspeech</source>
          , pages
          <fpage>1618</fpage>
          -
          <lpage>1621</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          , Nicola Bertoldi, Mauro Cettolo, Matteo Negri, Marco Turchi, Marco Trombetti, Alessandro Cattelan, Antonio Farina, Domenico Lupinetti, Andrea Martines, Alberto Massidda, Holger Schwenk, Loïc Barrault, Frederic Blain, Philipp Koehn,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Buck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ulrich</given-names>
            <surname>Germann</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The MateCat tool</article-title>
          .
          <source>In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>132</lpage>
          , Dublin, Ireland, August.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Qin</given-names>
            <surname>Gao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Vogel</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Parallel implementations of word alignment tool</article-title>
          .
          <source>In Proceedings of Software Engineering, Testing, and Quality Assurance for Natural Language Processing</source>
          , pages
          <fpage>49</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Hardt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jakob</given-names>
            <surname>Elming</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Incremental re-training for post-editing SMT</article-title>
          .
          <source>In Proceedings of AMTA</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          , Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al.
          <year>2007</year>
          .
          <article-title>Moses: Open source toolkit for statistical machine translation</article-title>
          .
          <source>In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. System Demonstrations</source>
          , pages
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Statistical significance tests for machine translation evaluation</article-title>
          .
          <source>In Proceedings of EMNLP</source>
          , pages
          <fpage>388</fpage>
          -
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          Antonio L Lagarda, Daniel Ortiz-Martínez, Vicent Alabau, and Francisco Casacuberta.
          <year>2015</year>
          .
          <article-title>Translating without in-domain corpus: Machine translation post-editing with online learning techniques</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>109</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Prashant</given-names>
            <surname>Mathur</surname>
          </string-name>
          , Mauro Cettolo, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Online learning approaches in computer assisted translation</article-title>
          .
          <source>In Proceedings of the Eighth Workshop on Statistical Machine Translation, ACL</source>
          , pages
          <fpage>301</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Mehdad</surname>
          </string-name>
          , Matteo Negri, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Match without a Referee: Evaluating MT Adequacy without Reference Translations</article-title>
          .
          <source>In Proceedings of the Machine Translation Workshop (WMT2012)</source>
          , pages
          <fpage>171</fpage>
          -
          <lpage>180</lpage>
          , Montréal, Canada, June.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Ortiz-Martínez</surname>
          </string-name>
          and Francisco Casacuberta.
          <year>2014</year>
          .
          <article-title>The new Thot toolkit for fully-automatic and interactive statistical machine translation</article-title>
          .
          <source>In 14th Annual Meeting of the European Association for Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Kishore</given-names>
            <surname>Papineni</surname>
          </string-name>
          , Salim Roukos, Todd Ward, and
          <string-name>
            <given-names>Wei-Jing</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Bleu: a method for automatic evaluation of machine translation</article-title>
          .
          <source>In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Abdol Hamid</given-names>
            <surname>Pilevar</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Using statistical postediting to improve the output of rule-based machine translation system</article-title>
          .
          <source>IJCSC</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Michel</given-names>
            <surname>Simard</surname>
          </string-name>
          and
          <string-name>
            <given-names>George</given-names>
            <surname>Foster</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>PEPr: Post-edit propagation using phrase-based statistical machine translation</article-title>
          .
          <source>In Proceedings of the XIV MT Summit</source>
          , pages
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Michel</given-names>
            <surname>Simard</surname>
          </string-name>
          , Cyril Goutte, and
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Isabelle</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Statistical Phrase-Based Post-Editing</article-title>
          .
          <source>In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>508</fpage>
          -
          <lpage>515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Snover</surname>
          </string-name>
          , Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and
          <string-name>
            <given-names>John</given-names>
            <surname>Makhoul</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A study of translation edit rate with targeted human annotation</article-title>
          .
          <source>In Proceedings of AMTA</source>
          , pages
          <fpage>223</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Midori</given-names>
            <surname>Tatsumi</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Correlation between automatic evaluation metric scores, post-editing speed, and some other factors</article-title>
          .
          <source>In Proceedings of the XII MT Summit</source>
          , pages
          <fpage>332</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Terumasa</given-names>
            <surname>Ehara</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Rule based machine translation combined with statistical post editor for Japanese to English patent translation</article-title>
          .
          <source>In Proceedings of the XI MT Summit</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          , Matteo Negri, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Coping with the Subjectivity of Human Judgements in MT Quality Estimation</article-title>
          .
          <source>In Proc. of the Eighth Workshop on Statistical Machine Translation</source>
          , pages
          <fpage>240</fpage>
          -
          <lpage>251</lpage>
          , Sofia, Bulgaria. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>Joern</given-names>
            <surname>Wuebker</surname>
          </string-name>
          , Spence Green, and
          <string-name>
            <given-names>John</given-names>
            <surname>DeNero</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Hierarchical incremental adaptation for statistical machine translation</article-title>
          .
          <source>In Proceedings of EMNLP</source>
          , pages
          <fpage>1059</fpage>
          -
          <lpage>1065</lpage>
          , September.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>