V. Kůrková et al. (Eds.): ITAT 2014 with selected papers from Znalosti 2014, CEUR Workshop Proceedings Vol. 1214, pp. 1–6
http://ceur-ws.org/Vol-1214, Series ISSN 1613-0073, © 2014 P. Barančíková, A. Tamchyna



     Machine Translation within One Language as a Paraphrasing Technique

                                                  Petra Barančíková, Aleš Tamchyna

                                             Institute of Formal and Applied Linguistics
                                   Charles University in Prague, Faculty of Mathematics and Physics
                                           Malostranské náměstí 25, Prague, Czech Republic
                                          {barancikova,tamchyna}@ufal.mff.cuni.cz

Abstract: We present a method for improving machine translation (MT) evaluation by targeted paraphrasing of reference sentences. For this purpose, we employ MT systems themselves and adapt them to translate within a single language. We describe this attempt on two types of MT systems – phrase-based and rule-based. Initially, we experiment with the freely available SMT system Moses. We create translation models from two available sources of Czech paraphrases – Czech WordNet and the Meteor Paraphrase tables. We extend Moses with a new feature that makes the translation targeted. However, the results of this method are inconclusive. In view of the errors appearing in the new paraphrased sentences, we propose another solution – targeted paraphrasing using parts of a rule-based translation system included in the NLP framework Treex.

1    Introduction

In this paper, we examine the possibility of improving the accuracy of metrics for automatic evaluation of MT systems by means of machine translation itself.

The first metric correlating well with human judgment was BLEU [20], and it still remains the most common metric for MT evaluation, even though other, better-performing metrics exist [15].

BLEU is computed from the number of phrase overlaps between the translated sentence and the corresponding reference sentences, i.e., translations made by a human translator. However, the standard practice is to use only one reference sentence, and BLEU then tends to perform badly. As there are many possible translations of a single sentence, even a perfectly correct machine translation might get a low score because synonyms and paraphrase expressions are disregarded. This is especially true for morphologically rich languages such as Czech [7].

We aim to achieve higher accuracy of MT evaluation by targeted paraphrasing of reference sentences, i.e., creating a new synthetic reference sentence that is still correct and keeps the meaning of the original sentence, but at the same time is closer in wording to the MT output (hypothesis).

There is a close resemblance between translation and paraphrasing. Both attempt to preserve the meaning of a sentence, the former between two languages and the latter within one language by different word choice [16]. However, there are many more tools for MT than for paraphrasing. Therefore, it seems only natural to attempt to adjust some MT tools to translate within a single language for targeted paraphrasing.

2    Related Work

In [3], a significant improvement in the correlation of BLEU with human judgment was achieved by targeted paraphrasing of Czech reference sentences. However, the best results were acquired using a simple greedy algorithm for one-word paraphrase substitution, which does not allow word-order changes or other alterations of the reference sentence. Grammatical correctness was achieved by applying Depfix [22], an automatic post-editing system originally designed for improving the quality of phrase-based English-to-Czech machine translation outputs.

[13] used lexical substitution and contextual evaluation to improve the accuracy of Chinese-to-English MT evaluation. In [16], targeted paraphrasing via SMT is used to improve SMT itself during the parameter optimization phase of machine translation: correct hypotheses are no longer needlessly penalized for not having wording similar to the corresponding reference sentence.

There are MT evaluation metrics which utilize paraphrasing to improve the accuracy of MT evaluation ([24], [27]). Only one of them – METEOR [10] – is available for the Czech language. However, its paraphrase tables are so noisy that they actually harm the performance of the metric [2], as it can reward mistranslated and even untranslated words.

3    Data

We perform our experiments on data from the English-to-Czech translation task of WMT12 [8]. The data set contains 13 files with Czech outputs of MT systems and one file with the corresponding reference sentences.

The human evaluation of system outputs is available as a relative ranking of the performance of five systems per sentence. We compute the absolute score of each MT system by the "> others" method [6]: it is computed as wins / (wins + losses). We refer to this score as human judgment from now on.

We use two available sources of Czech paraphrases – the Czech WordNet 1.9 PDT [19] and the Czech Meteor Paraphrase Tables [9]. Czech WordNet 1.9 PDT contains high-quality lemmatized paraphrases, but it is too small for our purposes.
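The "> others" score described above can be reproduced with a short sketch. The input format (one dictionary of system ranks per sentence, lower rank is better) is an assumption for illustration; tied comparisons are skipped, as only wins and losses enter the score.

```python
# A sketch of the "> others" score [6]: for each system, the fraction
# of pairwise comparisons it wins, i.e. wins / (wins + losses).
# The input format ({system: rank} per sentence) is an assumption;
# ties contribute neither a win nor a loss.
from collections import defaultdict
from itertools import combinations

def score_systems(rankings):
    wins, losses = defaultdict(int), defaultdict(int)
    for ranking in rankings:
        for (sys_a, r_a), (sys_b, r_b) in combinations(ranking.items(), 2):
            if r_a < r_b:
                wins[sys_a] += 1
                losses[sys_b] += 1
            elif r_b < r_a:
                wins[sys_b] += 1
                losses[sys_a] += 1
    return {s: wins[s] / (wins[s] + losses[s])
            for s in set(wins) | set(losses)}

rankings = [{"A": 1, "B": 2, "C": 2}, {"A": 2, "B": 1, "C": 3}]
print(score_systems(rankings))  # A: 0.75, B: 0.667, C: 0.0
```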


On the other hand, the Czech Meteor Paraphrase Tables are large but very noisy. For example, the following pairs are listed as paraphrases: na poloostrově (on a peninsula) – šimpanzím mlékem (with chimpanzee milk), gates – vrata (gates), or 1873 – pijavice (a leech). We attempt to reduce the noise in the following way:

    1. We keep only pairs consisting of single words, since we were not successful in reducing the noise effectively for the multi-word paraphrases [3].

    2. We perform morphological analysis using Morče [25] and replace the word forms with their lemmas.

    3. We keep only pairs of different lemmas.

    4. We discard pairs of words that differ in their parts of speech.

    5. We discard pairs of words that contain an unknown word (typically a foreign word).

The last two rules have a single exception – paraphrases consisting of a numeral and the corresponding digits, e.g., osmnáct (eighteen) and 18.¹ Such paraphrases are very common in the data.

This way we reduce almost 700k pairs of paraphrases to only 32k pairs of lemmas. All the examples of incorrect paraphrases above were removed. We refer to this new lemmatized paraphrase table as filtered Meteor.

4    Moses

Moses [14] is a freely available statistical machine translation engine. In a nutshell, statistical machine translation involves the following phases: creating language and translation models, parameter tuning, and decoding. We use Moses in the phrase-based setting.

A language model is responsible for correct word order and the grammatical correctness of the translated sentence. A translation model (phrase table) supplies all possible translations of a word or a phrase. The models are assigned weights which are learned during the parameter tuning phase.

During the decoding phase, all these models are combined to maximize ∑_i λ_i φ_i(f̄, ē), where λ_i is the weight of the sub-model φ_i, and f̄ and ē are the hypothesis and the source sentence, respectively. In our case, we want to make a reference sentence closer to the corresponding machine translation output – ē is the reference sentence and f̄ is the new synthetic reference.

On its own, this setting could create paraphrases, but they would be just random paraphrases of the reference sentence – their similarity in wording to our original hypotheses would not be guaranteed. Therefore, we also add a new feature for targeted paraphrasing to Moses.

4.1   Language model

We create the language model (LM) using the SRILM toolkit [26] on the Czech side of the Czech-English parallel corpus CzEng [5].

4.2   Phrase models

Each entry in a Moses phrase table contains a phrase, its translation, several feature scores (translation probability, lexical weight, etc.), and optionally also the word alignment within the phrase and the frequencies of the phrases in the training data. Phrase tables are learned automatically from large parallel data. As we do not have any large corpora of Czech-Czech parallel data, we create the following two "fake" translation models for paraphrasing from our paraphrase tables.

   • Enhanced Meteor tables
     This table was created from the Czech Meteor Paraphrase table, which was constructed via pivoting [1]. The pivot method is an inexpensive way of acquiring paraphrases from large parallel corpora. It is based on the assumption that two phrases that share a meaning may have the same translation in a foreign language [11].
     Each paraphrase pair comes with a pivoting score, which we adopt as a feature in our phrase table. However, this score turns out to be even worse than random selection [3], so we do not expect it to get a high weight in tuning.
     For that reason, we add our own paraphrase scores, acquired by distributional semantics. Distributional semantics assumes that two phrases are semantically similar if their contextual representations are similar [17].
     We collect all contexts (words in a window of limited size) in which Meteor paraphrases occur in the Czech National Corpus [28] and then measure context similarity (cosine distance, taking into account the number of word occurrences) for each pair of paraphrases. We add six scores for each pair of paraphrases, according to the size of the context window used (1-3 words) and whether word order played a role in the context.

   • One-word paraphrase table
     We first create a set of all words from the Czech side of CzEng appearing at least five times, to exclude rare words and possible typos. We also add all words appearing in the MT outputs and the reference sentences. Morphological analysis of the words was then performed using Morče.
     For every word x from this set, we add to this translation table every pair of words that fulfills at least one of the following requirements:

    ¹ Osmnáct has the part of speech C, which is designated for numerals; 18 is marked with X, meaning it is an unknown word for the morphological analyzer.


                   setting         reference sentence used                                            correlation    avg. BLEU
                  Baseline         original reference sentence, no paraphrasing                          0.75           12.8
              Paraphrased          paraphrased by Moses using MERT-learned weights                       0.50           15.8
                  LM+0.2           paraphrased by Moses with LM weight increased by 0.2                  0.24            9.1
                  LM+0.4           paraphrased by Moses with LM weight increased by 0.4                  0.22            6.7

Table 1: Description of the basic settings and the results – Pearson's correlation of BLEU with human judgment and the
average BLEU scores.
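The Pearson correlation reported in Tables 1 and 2 can be computed directly; below is a minimal pure-Python sketch (in practice a library routine such as scipy.stats.pearsonr would be used).

```python
# Minimal Pearson correlation coefficient, as used to compare BLEU
# with human judgment in Tables 1 and 2 (pure-Python sketch).
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear data, ~1.0
```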


                           Source                        Paclík claims he would dare to manage the association.
Baseline                               Paclík tvrdí, že by si na vedení asociace troufl.
                                                             Paclík claims he would dare to lead the association.
                           Hypothesis                              Paclík tvrdí, že by se odvážil k řízení komory.
                                                 Paclík claims he would find the courage to control the chamber.
                           Paraphrased                          Paclík tvrdí, že by se na řízení organizace troufl.
                                                       *Paclík claims he would dare to control the organization.
                           LM+0.2                                Paclík tvrdí, že by si troufl na řízení ekonomiky.
                                                            Paclík claims he would dare to control the economy.
                           LM+0.4                                        Říká se, že Paclík si troufl na řídící rady.
                                                              They say that Paclík ventured to governing boards.

Figure 1: Example of targeted paraphrasing. The hypothesis is grammatically correct and has a very similar meaning to
the source sentence. The new reference is closer in wording to the hypothesis, but there is an error in word choice. The
sentences created with increased weights of the language model are both grammatically correct, but they have lost the
original meaning.



           – x, x (not every word should be paraphrased)
           – x, y, if the lemma of x is the lemma of y (a word might have different morphology in the paraphrased sentence)
           – x, y, if the lemmas of x and y are paraphrases according to Czech WordNet 1.9 PDT
           – x, y, if the lemmas of x and y are paraphrases according to the filtered Meteor

       These categories constitute the first four scores in the phrase table. A pair of words gets the score e if it falls in a given category, and 1 (e⁰) otherwise.² This phrase table contains more than 1,100k pairs of words.
       We add another score expressing the POS tag similarity between the two words. It is computed as e^(1/(a+1)), where a is the minimal Hamming distance between the tags of the words. This score should reflect how morphologically distant the paraphrases are.

4.3    Feature for targeted paraphrasing

In order to steer the MT decoder (translation engine) in the direction of the hypotheses, we implemented an additional feature for Moses which measures the overlap with the hypothesis. In order to keep its computation tractable during search, the overlap is defined simply as the number of words from the hypothesis confirmed by the reference translation.

Integration into the beam search algorithm used in phrase-based decoding requires us to keep track of the feature state (i.e., the reference words covered) to allow for correct hypothesis recombination. We also implemented an estimator of the future phrase score, defined as the number of reference translation words covered by the given phrase. Our code is included in Moses.³

4.4    Parameter tuning

We use minimum error rate training (MERT) [18] to find the optimal weights for our models. MERT sets the weights to maximize the translation quality, which is measured with BLEU. We employ the reference sentences and the highest-rated MT outputs as the parallel data for tuning.

This method, however, turned out not to be optimal for our setting. Our feature for targeted paraphrasing naturally obtains the highest weight, as it provides an oracle guide towards the hypothesis. Other important models, e.g., the language model, get comparably very small weights. The paraphrased sentences tend to be closer to the hypothesis, but not grammatically correct. Therefore, we experiment with increasing the weight of the language model manually.

      ² Phrase-table scores are considered log-probabilities.
      ³ https://github.com/moses-smt/mosesdecoder/
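The POS-tag similarity score e^(1/(a+1)) from Section 4.2 can be sketched as follows; fixed-length positional tags and the example tag strings are assumptions for illustration.

```python
# Sketch of the POS-tag similarity score e^(1/(a+1)) (Section 4.2),
# where a is the Hamming distance between positional morphological
# tags. The example tag strings are hypothetical.
from math import exp

def hamming(tag_a, tag_b):
    # Number of positions where the two tags differ.
    return sum(x != y for x, y in zip(tag_a, tag_b))

def tag_similarity(tag_a, tag_b):
    return exp(1.0 / (hamming(tag_a, tag_b) + 1))

print(tag_similarity("NNFS1", "NNFS1"))  # identical tags score e^1
print(tag_similarity("NNFS1", "VB-S3"))  # distant tags approach e^0 = 1
```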


                          setting       reference sentence used                     correlation    avg. BLEU
                         Lexical        Only one-word paraphrase table                 0.56           15.1
               Lexical & LM+0.2         Lexical and LM weight increased by 0.2         0.33            9.5
                       Monotone         Lexical and monotone translation               0.61           18.1

            Table 2: Additional settings and the results – Pearson’s correlation and the average BLEU scores.
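The targeted-paraphrasing feature of Section 4.3 reduces to counting candidate words confirmed by the MT hypothesis. A minimal sketch, assuming simple multiset counting and ignoring the beam-search state bookkeeping that the real Moses feature needs:

```python
# Sketch of the targeted-paraphrasing feature (Section 4.3): score a
# candidate reference by the number of its words also present in the
# MT hypothesis. Multiset counting is an assumption; the real feature
# also tracks covered words across the search state.
from collections import Counter

def overlap_score(candidate_words, hypothesis_words):
    hyp = Counter(hypothesis_words)
    return sum(min(c, hyp[w]) for w, c in Counter(candidate_words).items())

# Sentences from Figure 1 (lowercased, punctuation stripped):
hyp = "paclík tvrdí že by se odvážil k řízení komory".split()
cand = "paclík tvrdí že by se na řízení organizace troufl".split()
print(overlap_score(cand, hyp))  # 6 words confirmed by the hypothesis
```

Because this count ignores word order entirely, scrambled but word-identical candidates score just as well, which is exactly the weakness discussed in Section 5.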



5    Results

We compare four different basic settings; the results are presented in Table 1 as the Pearson correlation coefficient of BLEU and the human judgment. A visualization of the results is shown in Figure 2. The baseline score is not exceeded by any of our paraphrasing methods, in contrast to our previous results ([2], [3]).

There are several reasons for the clear decrease in correlation with paraphrased references. The references generated by the Paraphrased setting, while obtaining a significantly higher BLEU score, were mostly ungrammatical and reduced the correlation of our metric.

The small weight of the language model seems to be the problem, but increasing it brings even more chaos: it creates sentences which are fluent and grammatically correct but often wholly unrelated to the source sentence.

This shows that our paraphrase-table noise filtering was by no means sufficient and there is still a lot of noise in our phrase tables. Furthermore, the MT output might be far from a correct sentence – given the high weight of the targeted paraphrasing feature, we essentially transform the correct reference sentences into the incorrect hypotheses at all costs, using our noisy phrase tables.

Our targeting feature is also not ideal – it ignores word order and operates only on the word level (it does not model phrases). Ungrammatical translations with scrambled word order are considered perfectly fine as long as the translation contains the same words as the reference. So while the feature does provide a kind of oracle, it does not guarantee reaching the best possible translation in terms of BLEU score, let alone a grammatical translation.

Another problem is illustrated by the very small weights assigned to our translation models. In fact, the highest weight was assigned to the tag similarity feature. This shows that our model features (the Meteor score and the distributional similarity scores) fail to distinguish good paraphrases from the noise.

The combination of the noise in the translation tables and the boosted language model then caused the decoder to prefer the paraphrase that is most common according to the language model and has a similar tag.

Figure 1 shows an example of our paraphrasing method. The hypothesis is grammatically correct and has a very similar meaning to the reference sentence. The new paraphrased reference is slightly closer in wording to the hypothesis, but it contains an error caused by a bad word choice. The boosted language model reduces such errors; however, the meaning of the sentences is shifted. In the LM+0.4 setting, they also differ a lot in wording from both the hypothesis and the reference sentence.

Based on these poor results, we decided to experiment with three more settings (see Table 2). We omit the Enhanced Meteor tables, as they brought most of the noise into the translation. One of the common errors of the Paraphrased setting is scrambled word order (often, punctuation appeared in the middle of the sentences). We attempt to fix that by using monotone translation (i.e., by disabling reordering).

These constraints improve the correlation with human judgment. However, they still do not overcome the baseline results.

6    Conclusion

We experiment with paraphrasing using the phrase-based machine translation system Moses. We show that it is a universal tool that can be used for purposes other than machine translation itself. Within Moses, we introduced a new feature for targeted paraphrasing and artificial phrase tables for paraphrasing.

However, our results are inconclusive and the correlation with human judgment drops. This is caused mainly by the high amount of noise in our translation tables and by a poorly balanced trade-off between the paraphrasing and the language model.

7    Future Work

Based on our results, Moses does not seem to be the optimal tool for our task, at least until we have better paraphrase tables at our disposal. A new paraphrase database, PPDB [12] for the Czech language, should be released soon.

Furthermore, there may be a better solution than a phrase-based translation system, namely Treex [21], a highly modular NLP software framework. Treex was developed for TectoMT, a rule-based machine translation system that operates on the deep syntactic layer.

Treex implements the stratificational approach to language adopted from the Functional Generative Description theory [23] and its later extension in the Prague Dependency Treebank [4]. It represents sentences on four layers: the word layer, the morphological layer, the shallow-syntax layer, and the deep-syntax (tectogrammatical) layer.




[Figure 2: four scatter plots of BLEU (y-axis, 2 to 20) against human judgment (x-axis, 0.4 to 0.6), one panel per setting: Baseline, Paraphrased, LM+0.2 and LM+0.4.]

Figure 2: Visualization of BLEU and human judgment for the four basic settings. We add the linear regression lines to
better demonstrate the linear correlation.



We can transfer both the hypothesis and the reference sentence to the morphological layer, where we can extract the lemmas that appear in only one of the sentences. After filtering according to our paraphrase tables, these represent candidates for substitution. Furthermore, we are able to transfer the reference sentence to the tectogrammatical layer, where we can replace individual lemmas with their paraphrases from the hypothesis and the corresponding grammatemes. Then we transfer the altered reference sentence back to the word layer.

This approach should easily overcome some of the problems that appear when paraphrasing with Moses. First of all, we only compare two sentences, so there is less room for the noise to interfere. There is also highly developed machinery for avoiding ungrammatical sentences. We can change only the parts of the sentence that depend on the changed word, thus keeping the rest of the sentence correct and creating more conservative reference sentences.

8    Acknowledgment

We would like to thank Ondřej Bojar for his helpful suggestions and technical advice within the NPFL101 class. This research was supported by the following grants: 1356213 of the Grant Agency of the Charles University, SVV project number 260 104, and FP7-ICT-2011-7-288487 (MosesCore). This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).

References

 [1] Colin Bannard and Chris Callison-Burch. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pages 597–604, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.

 [2] Petra Barančíková. Parmesan: Meteor without Paraphrases with Paraphrased References. In Proceedings of the Ninth Workshop on Statistical Machine Translation, WMT '14, Stroudsburg, PA, USA, 2014. Association for Computational Linguistics.

 [3] Petra Barančíková, Rudolf Rosa, and Aleš Tamchyna. Improving Evaluation of English-Czech MT through Paraphrasing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA). ACL Anthology Identifier: L14-1711.

 [4] Eduard Bejček, Eva Hajičová, Jan Hajič, Pavlína Jínová, Václava Kettnerová, Veronika Kolářová, Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Magda Ševčíková, Jan Štěpánek, and Šárka Zikánová. Prague Dependency Treebank 3.0, 2013.

 [5] Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, and Aleš Tamchyna. The Joy


     of Parallelism with CzEng 1.0. In Proc. of LREC, pages 3921–3928. ELRA, 2012.

 [6] Ondřej Bojar, Miloš Ercegovčević, Martin Popel, and Omar F. Zaidan. A Grain of Salt for the WMT Manual Evaluation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, WMT '11, pages 1–11, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

 [7] Ondřej Bojar, Kamil Kos, and David Mareček. Tackling Sparse Data Issue in Machine Translation Evaluation. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort '10, pages 86–91, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

 [8] Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. Findings of the 2012 Workshop on Statistical Machine Translation. In Seventh Workshop on Statistical Machine Translation, pages 10–51, Montréal, Canada, 2012.

 [9] Michael Denkowski and Alon Lavie. METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support For Five Target Languages. In Proceedings of the ACL 2010 Joint Workshop on Statistical Machine Translation and Metrics MATR, 2010.

[10] Michael Denkowski and Alon Lavie. Meteor Universal: Language Specific Translation Evaluation for Any Target Language. In Proceedings of the EACL 2014 Workshop on

     Computer Science, University of Maryland College Park, 2010.

[17] George A. Miller and Walter G. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28, 1991.

[18] Franz Josef Och. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL '03, pages 160–167, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.

[19] Karel Pala and Pavel Smrž. Building Czech WordNet. Romanian Journal of Information Science and Technology, 7:79–88, 2004.

[20] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.

[21] Martin Popel and Zdeněk Žabokrtský. TectoMT: Modular NLP Framework. In Proceedings of the 7th International Conference on Advances in Natural Language Processing, IceTAL'10, pages 293–304, Berlin, Heidelberg, 2010. Springer-Verlag.

[22] Rudolf Rosa, David Mareček, and Ondřej Dušek. DEPFIX: A System for Automatic Correction of Czech MT Outputs.
     Statistical Machine Translation, 2014.                              In Proceedings of the Seventh Workshop on Statistical Ma-
[11] Helge Dyvik. Translations as semantic mirrors: from paral-          chine Translation, WMT ’12, pages 362–368, Stroudsburg,
     lel corpus to wordnet. In Proceedings of the Workshop Mul-          PA, USA, 2012. Association for Computational Linguis-
     tilinguality in the lexicon II at the 13th biennial European        tics.
     Conference on Artificial Intelligence (ECAI’98), pages 24–     [23] Petr Sgall. Generativní popis jazyka a česká deklinace.
     44, Brighton, UK, 1998.                                             Number v. 6 in Generativní popis jazyka a česká deklinace.
[12] Juri Ganitkevitch and Chris Callison-Burch. The Multilin-           Academia, 1967.
     gual Paraphrase Database. In Proceedings of the Ninth In-      [24] Matthew G. Snover, Nitin Madnani, Bonnie Dorr, and
     ternational Conference on Language Resources and Evalu-             Richard Schwartz. TER-Plus: Paraphrase, Semantic, and
     ation (LREC’14), Reykjavik, Iceland, may 2014. European             Alignment Enhancements to Translation Edit Rate. Ma-
     Language Resources Association (ELRA).                              chine Translation, 23(2-3):117–127, September 2009.
[13] David Kauchak and Regina Barzilay. Paraphrasing for            [25] Drahomíra Spoustová, Jan Hajič, Jan Votrubec, Pavel Kr-
     Automatic Evaluation. In Proceedings of the main con-               bec, and Pavel Květoň. The Best of Two Worlds: Coop-
     ference on Human Language Technology Conference of                  eration of Statistical and Rule-Based Taggers for Czech.
     the North American Chapter of the Association of Com-               In Proceedings of the Workshop on Balto-Slavonic Natu-
     putational Linguistics, HLT-NAACL ’06, pages 455–462,               ral Language Processing, ACL 2007, pages 67–74, Praha,
     Stroudsburg, PA, USA, 2006. Association for Computa-                2007.
     tional Linguistics.                                            [26] Andreas Stolcke. SRILM - An Extensible Language Mod-
[14] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris                   eling Toolkit. pages 901–904, 2002.
     Callison-Burch, Marcello Federico, Nicola Bertoldi,            [27] Liang Zhou, Chin yew Lin, Dragos Stefan Munteanu, and
     Brooke Cowan, Wade Shen, Christine Moran, Richard                   Eduard Hovy. PARAEVAL: Using paraphrases to evaluate
     Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and          summaries automatically. In IN: PROCEEDINGS OF HLT-
     Evan Herbst. Moses: Open Source Toolkit for Statistical             NAACL, pages 447–454, 2006.
     Machine Translation. In Proceedings of the 45th Annual
                                                                    [28] Ústav Českého národního korpusu FF UK. Český národní
     Meeting of the ACL on Interactive Poster and Demonstra-
                                                                         korpus - SYN2010. Praha 2010. Available at WWW:
     tion Sessions, ACL ’07, pages 177–180, Stroudsburg, PA,
                                                                         http://www.korpus.cz.
     USA, 2007. Association for Computational Linguistics.
[15] Matouš Macháček and Ondřej Bojar. Results of the
     WMT13 Metrics Shared Task. In Proceedings of the Eighth
     Workshop on Statistical Machine Translation, pages 45–51,
     Sofia, Bulgaria, August 2013. Association for Computa-
     tional Linguistics.
[16] Nitin Madnani. The Circle of Meaning: From Transla-
     tion to Paraphrasing and Back. PhD thesis, Department of