<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ecient Identification of Formal Analogies</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>RALI/DIRO Universit ́e de Montr ́eal Montr ́eal, Canada</institution>
          ,
          <addr-line>H3C 3J7</addr-line>
        </aff>
      </contrib-group>
      <fpage>77</fpage>
      <lpage>86</lpage>
      <abstract>
        <p>Analogical learning is a lazy learning mechanism which maps input forms (e.g. strings) to output ones, thanks to analogies identified in a training material. It has proven e↵ective in a number of Natural Language Processing (NLP) tasks such as machine translation. One challenge with this approach is the identification of analogies in the training material. In this study, we revisit an o✏ine algorithm that has been proposed for enumerating analogies in a corpus. We extend it in order to scale to larger datasets, as well as to deal with new forms. On a task of translating unfrequent words of Wikipedia, we observe that our approach is much more ecient at identifying analogies than previously published methods, and that the resulting engine competes with a state-of-the-art phrase-based statistical machine translation (SMT) system.</p>
      </abstract>
      <kwd-group>
        <kwd>Analogical Learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Formal Analogy</kwd>
        <kwd>Machine Translation</kwd>
        <kwd>Rare Word Translation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Mapping forms of an input space into forms of an output one has many
applications in NLP including machine translation, transliteration,
grapheme-tophoneme conversion, etc. Most solutions to such a canonical problem consist in
learning a mapping function from the input space into the output one, making
use of training examples, that is, pairs of input and output forms. This function
typically learns correspondences between sequences of symbols in both spaces
and uses a so-called decoder for producing an output to an unknown input form.</p>
      <p>Analogical learning is a lazy learning technique that relies on the concept of
proportional analogy (or analogy for short) for producing a solution. An analogy,
noted [x : y :: z : t], is a 4-tuple of entities which reads “x is to y as z is
to t”, as in [silane : silicon :: methane : carbon]. In this work, we concentrate
on formal analogies, that is, analogies at the formal or graphemic level, as in
[weak : weaker :: clean : cleaner].</p>
      <p>Given a training set of pairs of input and output forms {(x, x0)}, and an
unknown input form u, analogical learning produces output entities u0 by searching
input elements in the training set that define with u an analogy [x : y :: z : u].
Those analogies are assumed to carry over the output space; that is, u0 should
be a solution of a so-called analogical equations [x0 : y0 :: z0 :?] built thanks to
the training material. This learning paradigm has been tested in a number of
NLP tasks, including grapheme-to-phoneme conversion [14], machine translation
[9, 6], transliteration [2, 5], unsupervised morphology acquisition [12], as well as
parsing [10, 1].</p>
      <p>Building an analogical device requires in the first place to identify (input)
analogies for a given form u. This is a key challenge where a brute force approach
(enumerating all 3-tuples of forms and checking those that define with u an
analogy) is an operation cubic in the number of forms in the input space. A
more practical solution has been proposed in [7], but this is essentially an online
method, which means that time response at test time (which matters in practice)
is impacted.</p>
      <p>In this paper, we revisit and adapt an o✏ine algorithm presented in [8] which
enumerates all the formal analogies in a training set. We show that this algorithm
is only applicable to small datasets (a few thousand forms) and propose a simple,
yet e↵ective heuristic to scale to much larger datasets. We further adapt this
algorithm to the task of identifying analogies involving a form unseen at training
time. We evaluate our proposal on a task of translating rare words in English
Wikipedia into French.</p>
      <p>In Section 2, we discuss the issue of eciently identifying analogies, and
propose our solution to the problem. We present the datasets we used in Section 3
and report in Section 4 some calibration experiments we conducted that show the
e↵ectiveness of our approach. We applied it to translate rare words in Wikipedia
and our results are reported in Section 5. We conclude our work in Section 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Identifying Analogies</title>
      <p>We first introduce our notations. A form w is a sequence of symbols (string) over
an alphabet A = {a1, . . . , a|A|} on which an arbitrary order is assumed. We note
|w|c the number of symbols c in w. The Parikh vector of a form w, noted ⇡ (w)
is defined as a function ⇡ : A? ! N|A| given by: ⇡ (w) = (|w|a1 , . . . , |w|a|A| ). We
use the notation [x : y :: z : t] to denote a formal analogy. An analogical equation
is an analogy where the last form is missing and we note [x : y :: z :?] the set of
its solutions.1 Last, we call I, the set of input forms available at training time.
2.1</p>
      <sec id="sec-2-1">
        <title>Online Searching</title>
        <p>
          A brute force enumeration of analogies is cubic in the size of I and this is
manageable for tiny input spaces (a few hundred forms) only. In [7], the authors
developed a strategy for speeding up the search procedure, which main idea is
to exploit this property of formal analogies [10]:
[x : y :: z : u] ) |x|c + |u|c = |y|c + |z|c 8 c 2 A
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
1 Analogical equation solvers typically produce several solutions to an equation.
        </p>
        <p>
          They describe a data structure which maps any vector v to the set of forms in
I that have v as a Parikh vector: fI (v) = {w 2 I : ⇡ (w) = v}; and an algorithm
for computing ⌧ I,u(x), the set of pairs of forms in the input space that verify the
count constraints imposed by the forms x and u in the right side of Equation 1:
⌧ I,u(x) = {(y, z) 2 I2 : ⇡ (x) + ⇡ (u) = ⇡ (y) + ⇡ (z)}
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
        </p>
        <p>Their search strategy consists in computing ⌧ u(x) for all forms x in I, then
checking that the resulting tuples (x, y, z, u) are true analogies.2 For input spaces
of more than several thousand elements, it it necessary to sample the forms x,
thus risking to miss useful analogies.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>O✏ine Searching</title>
        <p>In [8], Lepage describes an algorithm for enumerating all the analogies in a set of
forms. A tree-structure is proposed which encodes the set ⌃ of all the di↵erences
of 2 Parikh vectors — we call them signatures — one might find in this set of
forms, and which associates to each signature the set of pairs of forms — we call
them stems — that have this di↵erence. The details of the data structure and
the algorithm for building it are rather intricate, but for the sake of our study,
it suces to think at this data structure as a function g : N|A| ! I2 given by
gI ( ) = {(x, y) 2 I2 : ⇡ (x) ⇡ (y) = }. Any pair of stems in gI ( ) verifies the
right hand side of Equation 1 and are therefore candidate analogies. We qualify
a signature as productive, if at least two of its stems define an analogy. We
note (x) y the fact that ⇡ (x) ⇡ (y) = .</p>
        <p>
          Figure 1 illustrates two productive signatures and a few of their stems. Only
symbols with a non null count are represented. For instance, the signature =
e(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).y(-1).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) “encodes” how adjectives in English may be derived from
nouns. We have (alternity) alternative because we can transform the first
form into the second one by removing symbol y and adding (in a specific order
not prescribed by the signature) symbols e, a and v . Note that some stems may
be characterized as noisy, as for instance (tiny ,native) since native is not an
adjective derived from tiny .
        </p>
        <p>
          e(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).y(-1).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
(fixity,fixative)
(lubricity,lubricative)
(alternity,alternative)
(tiny,native)
i(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).t(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
(purge,purgative)
(compare,comparative)
(administer,administrative)
(cure,curative)
2 Analogy checking is carried out by the polynomial algorithm described in [11].
        </p>
        <p>There are at least two issues with the o✏ine strategy proposed by Lepage.
The first one is scaling. As we shall see in Section 4, on a 8Gb memory computer,
we found the construction of g practical only for input spaces of around 5 000
forms. The second issue is that in applications such as machine translation, we
are interested in treating new (unknown) forms, and this requires to run the
algorithm each time a new form has to be treated. We describe in the sequel our
solutions to both issues.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Scaling to larger Datasets</title>
        <p>We tried two strategies to the scaling issue. A first one restrains the signatures
acquired, for instance by limiting the number of symbols of the alphabet on
which two forms of a stem can disagree. A limit of 3 would for instance reject
the two signatures exemplified in Figure 1, since they both involve 4 symbols.
We show in Section 4 that this heuristic allows the approach of Lepage to scale
nicely to larger datasets. We found however in the application we study here
(machine translation) that by doing so, we loose useful signatures, therefore
useful analogies. Also, by relaxing the constraints on the signatures considered,
we rapidly face scaling issue.</p>
        <p>Thus, we tested a simple yet practical two-pass approach: we first collect the
signatures observed on a subset of the input forms. In a second pass, we consider
the entire input space and collect stems concerned only by those signatures. Of
course, a mix of both strategies can be considered, and we found such a mix
adequate in our translation experiments (see Section 5).
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Dealing with new Forms</title>
        <p>In many applications of interest, we want to identify analogies [x : y :: z : u],
for a form u unseen at training time. We propose an online strategy for doing
so which assumes the existence of a precomputed function g, that is, a set of
signatures ⌃ and their associated stems computed on the training set. We collect
all the forms z in I that can be transformed by one signature into u. Formally,
we compute:</p>
        <p>
          SI (u) = {( , z) : 2 ⌃ , z 2 I, (z) u} (
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
        </p>
        <p>For any pair ( , z) in SI (u), gI ( ) gathers the stems (x, y) in I such that
tuples (x, y, z, u) verify the right hand side of Equation 1. This is illustrated
in Figure 2 where for instance (probity , probative, adversity , adversative) is a
(candidate) tuple identified for the form adversative unseen at training time.</p>
        <p>
          Thanks to the data structure involved in Lepage’s algorithm, computing
SI (u) is time ecient 3 and most of the online computation for dealing with
a new form is spent to check among the candidates, those that are analogies.
3 A few milliseconds in our experiments. The description of the computation would
require too many new notations. Conceptually, we compute the intersection of two
rational languages: one that recognizes all forms in I, the other that recognizes all
the forms reachable from u by any signature in ⌃ .
S(adversative) = {
(adversity, e(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).y(-1).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )), (adverse, i(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).t(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ))
(derivative, i(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).a(-1).s(-1)), (revised, a(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ).t(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).v(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )), . . . }
We downloaded the June 2013 Wikipedia dump in English (4 262 946 articles)
and French (1 398 932 articles). We concentrated on unfrequent English words,
that is, words seen at most 25 times.4 Those words are typically less easy to
translate by automatic methods, and are more dicult to find in bilingual lexicons,
parallel or comparable ressources. Many rare words in Wikipedia are named
entities (persons, places, etc.) that are not necessarily interesting from a translation
perspective (at least for the English-French language pair).
        </p>
        <p>We intersected the less frequent words in Wikipedia with a large in-house
bilingual lexicon (107 799 entries), which led us to 35 209 English words with
at least one (but possibly several) reference translations. We split them in two
disjoint parts: a training set of 32 245 English words (80 311 English-French
pairs), and a test set of 1000 English words seen less than 25 times in English
Wikipedia and their translations (1 219 pairs).</p>
        <p>Some English words are close to their translation (e.g. hexametric/hexam`etre),
while others are not, as unloose which translates — according to our bilingual
lexicon — into desserrer, d´etendre, and relˆacher.
4</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Identifying Analogies in a Corpus</title>
      <p>We first applied our implementation of Lepage’s algorithm for identifying
analogies in a set of forms I. We soon realized that the number of signatures found
by Lepage’s algorithm increases very fast with the number of forms considered
in the input space. Table 1 (left part) reports the main characteristics of the
algorithm as a function of the number of words considered. On a 8Gb
memory computer, we could only treat 5000 words (line 3). The data structure we
built for encoding the function gI described in section 2.2 contains over 166
millions nodes, and no less than 11 967 571 signatures were collected. Most of them
(96.4%) involve only one stem, which means that no analogy would be identified.
We found only 9 856 (0.08%) productive signatures, leading to 762 259 candidate
analogies; 15 814 (2.1%) of them being formal analogies.
4 6.8 million words (92%) in English Wikipedia occur less than 26 times.</p>
      <p>Having a two-pass strategy allows to visit the full training set, keeping fixed
the productive signatures identified in the first pass. The impact of this is
characterized in the right part of Table 1. For the largest configuration, the number
of analogies identified increases from 15 814 to over 4.7 million ones. Most of
the time of the second pass is spent for checking that pairs of stems are formal
analogies. It is also worth noting that the second pass has a much smaller
memory footprint (155k nodes compared to almost 12M nodes in the first pass), since
only productive signatures are encoded.</p>
      <p>In order to digest more of the training material while collecting the set of
signatures during the first pass, we must apply some filters. We explored two
simple ones. First, we control the size of the signatures, that is the number
of symbols in the alphabet on which two forms can disagree in their number
of occurrences. We also impose a limitation on the maximum di↵erence in the
count of one symbol in a stem.</p>
      <p>Table 2 reports statistics when limiting to 4 the number of symbols in the
signature, and to 3 the maximum di↵erence between counts. 5 Comparing with
Table 1, we observe that filtering drastically reduces computation time. For
5 Relaxing constraints, increases the number of signatures and analogies identified,
but at the price of an increased computation time.
instance, on a corpus of 5000 forms (line 2), the processing time drops from 510s
to 8s, leading to less analogies in the first pass (8 412 against 15 814). But since
the full corpus can be parsed, this leads to configurations that can identify a
larger set of signatures, thus more analogies in the end (5.3 versus 4.4 million
analogies).
5</p>
    </sec>
    <sec id="sec-4">
      <title>Translation Experiments</title>
      <p>We discussed so far the problem of identifying analogies in an input space. In
this section we compare a number of analogical engines for translating words
unseen at training time. For this, we project thanks to the training set each
input analogy into a target equation that we solve. The solutions generated are
simply sorted in decreasing order of frequency (a solution may be produced by
several equations). This is illustrated in Figure 3. This is very similar to the
approach described in [5] with the notable exception that we aggregate solutions
by frequency, instead of resorting to a classifier for distinguishing good from
bad solutions. Our main motivation is to compare strategies for identifying
input analogies, and training a classifier would have been too cumbersome here.
Nevertheless, we also compare analogical devices to a strong baseline which is a
phrase-based translation engine trained at the character level.
[probit´e : probatoire :: adversit´e : ?] adversatoire, . . .
[multiplicit´e : multiplicatif :: adversit´e : ?] adversatif , . . .
[lucre : lucratif :: adverse : ?] adversatif , . . .
[conserve : conservateur :: d´eficitaire : ?] d´eficitairateur, . . .
. (adversatif ,257) (adversante,73) (contrairateur,72) (d´eficitairateur,71) (d´efavorablateur,65) . . .
Each translation engine we tested was asked to produce a ranked list of
translations. We measure the precision and recall at rank 1 and 100. Given a test set
{(ei, ri)i2 [1,N]} of pairs of source words ei and their reference translations ri (a
term may have several translations), and ci a ranked list of candidate translations
for each ei, those metrics are computed as ( the Kronecker symbol):
nk = Pi (|ri \ cik| &gt; 0)</p>
      <p>The performance of the translation engines we compared is measured for two
settings, one call Trans in which no filtering of the candidate list is performed,
and one called Align in which we filtered in the only candidates that belong to
the French vocabulary in Wikipedia.6 This second setting evaluates alignment
technology, since the task is not about generating new words, but identifying the
words in the target language that are most likely translations of a test form. It
should be noted that correct translations may be filtered out in this scenario.
5.2</p>
      <sec id="sec-4-1">
        <title>Statistical Machine Translation</title>
        <p>In a nutshell, SMT seeks to find the optimal translation eˆ of a sentence f using
a log-linear combination of models (hi), including a language model p(e) which
scores how likely a hypothesis is in the target language, and a translation model
p(f |e) which predicts the likelihood that two sentences are translations:
eˆ = argmax p(f |e) ⇥ p(e) ⇡ argmax exp
e e</p>
        <p>
          X ihi(e, f )
i
!
(
          <xref ref-type="bibr" rid="ref5">5</xref>
          )
        </p>
        <p>We trained such a system at the character level, very similarly to the approach
described in [3]. Such a system has been massively used as a key component of
tasks such as transliteration (see for instance the NEWS evaluation campaign),
or machine translation for closely related language pairs [13].</p>
        <p>We used the Moses toolkit [4] for training and testing the SMT engine. The
trigram language model7 has been trained on the target part of the training
material. We tune the coecient ( i) given to each model embedded in Equation 5
on a development corpus of 500 entries of our bilingual lexicon (disjoint from the
training set) using the MERT method implemented in Moses. This development
corpus is not exploited by analogical learning.8
5.3</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>The performance of some of the translations we tested are reported in Table 3
for two settings aforementioned (Trans and Align). We varied the size of the
training material used for collecting signatures (see Section 2.3). We observe
overall (bottom part), that it is preferable to consume as much as possible of the
training material. Di↵erences between the 20k and the 30k variants are however
not very marked, therefore, we did not train larger configurations.</p>
        <p>We implemented and tested a number of variants of the approach described in
[7]. We report (line LY ) the best configuration we obtained in terms of accuracy.
It compares globally to the largest configurations of our approach; the 30k variant
being at a slight advantage. More importantly, the time required for identifying
6 We counted 2 942 067 di↵erent tokens in French Wikipedia.
7 We tried other n-gram orders without better results.
8 To be more fair to the analogical approach, we could have added this corpus to its
training set, since it does not perform any tuning.
analogies at runtime is 0.03s per test form for the 20k configuration, compared to
7 seconds for LY, a clear advantage of the batch procedure of Lepage we adapted.
We tested faster variants of [7], but they lead to much lower performance. It is
worth noting that LY identifies slightly more analogies than the variants of our
proposed strategy. That this does not translate into better performance does
indicate that the signature information used by our solution is useful.</p>
        <p>We observe that the statistical translation engine delivers better performance
at rank 1. The analogical devices seem to deliver more diversified translations,
since their precision and recall at rank 100 outperform those of the former
system. Recall that in this study, we did not attempt to classify good from bad
candidates, which has been showed for instance in [5] to produce better results
than the aggregation by frequency we conducted here.</p>
        <p>Results on the Align setting overall confirm the observations just made and
reinforce the fact that the analogical device is producing more diversified
translations than the statistical engine. In fact, for around 60% of our test forms, the
analogical device could produce a useful translation, while this rate culminates
at 49% for the statistical engine. Because of this, it is likely that combining the
statistical and the analogical engines will lead to better performance overall. To
illustrate this, we tried one simplistic combination in the Align setting in which
Moses is trusted whenever a solution is proposed, and the analogical engine (the
20k variant) otherwise. This increases accuracy (R@1) to 50.9%, an absolute gain
of over 5 points over the SMT engine, and 8 points over the best analogical one.</p>
        <p>LY</p>
        <p>SMT
SMT + 20k
23.6
27.7
27.0
28.1
We proposed an adaptation of an algorithm proposed in [8] for listing all the
analogies in a corpus. We investigated a way to reduce the computation time
and memory footprint of this algorithm. We also proposed an extension of this
algorithm for listing — online — analogies for an unseen form.</p>
        <p>We applied our solution to the task of translating rare words in Wikipedia.
Our solution leads to slightly better performance than the approach described
in [7] at an order of magnitude faster running time. We also show that while a
statistical engine does deliver better translation at rank 1, a simple combination
of both systems systematically outperform each system individually.
Acknowledgments. This work has been funded by the Natural Sciences and
Engineering Research Council of Canada.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Hassena</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Apprentissage analogique par analogie de structures d'arbres</article-title>
          .
          <source>Ph.D. thesis</source>
          , Univ. de
          <string-name>
            <surname>Rennes</surname>
            <given-names>I</given-names>
          </string-name>
          , France (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dandapat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morrissey</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naskar</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Somers</surname>
          </string-name>
          , H.:
          <article-title>Mitigating problems in analogy-based ebmt with smt and vice versa: a case study with named entity transliteration</article-title>
          .
          <source>In: PACLIC. Sendai</source>
          ,
          <string-name>
            <surname>Japan</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Finch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sumita</surname>
          </string-name>
          , E.:
          <article-title>Transliteration Using a Phrase-based Statistical Machine Translation System to Re-score the Output of a Joint Multigram Model</article-title>
          .
          <source>In: 2nd Named Entities Workshop (NEWS'10)</source>
          . pp.
          <fpage>48</fpage>
          -
          <lpage>52</lpage>
          . Uppsala,
          <string-name>
            <surname>Sweden</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Koehn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callison-Burch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Federico</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertoldi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cowan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moran</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojar</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herbst</surname>
          </string-name>
          , E.:
          <article-title>Moses: Open Source Toolkit for Statistical Machine Translation</article-title>
          .
          <source>In: 45th ACL</source>
          . pp.
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          (
          <year>2007</year>
          ), interactive Poster and Demonstration Sessions
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Langlais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Mapping source to target strings without alignment by analogical learning: A case study with transliteration</article-title>
          .
          <source>In: 51st ACL</source>
          . pp.
          <fpage>684</fpage>
          -
          <lpage>689</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Langlais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Translating Unknown Words by Analogical Learning</article-title>
          .
          <source>In: EMNLP</source>
          . pp.
          <fpage>877</fpage>
          -
          <lpage>886</lpage>
          . Prague, Czech Republic (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Langlais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yvon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Scaling up analogical learning</article-title>
          .
          <source>In: 22nd COLING</source>
          . pp.
          <fpage>51</fpage>
          -
          <lpage>54</lpage>
          . Manchester, United
          <string-name>
            <surname>Kingdom</surname>
          </string-name>
          (
          <year>Aug 2008</year>
          ), poster
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Analogies between binary images: Application to chinese characters</article-title>
          . In: Prade,
          <string-name>
            <surname>H.</surname>
          </string-name>
          , Richard, G. (eds.) Computational Approaches to Analogical Reasoning: Current Trends, pp.
          <fpage>25</fpage>
          -
          <lpage>57</lpage>
          . Studies in Computational Intelligence, Springer-Verlag (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoual</surname>
          </string-name>
          , E.:
          <article-title>Purest ever example-based machine translation: Detailed presentation and assesment</article-title>
          .
          <source>Mach. Translat</source>
          <volume>19</volume>
          ,
          <fpage>25</fpage>
          -
          <lpage>252</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <article-title>Shin-ichi, A.: Saussurian analogy: A theoretical account and its application</article-title>
          .
          <source>In: 7th COLING</source>
          . pp.
          <fpage>717</fpage>
          -
          <lpage>722</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stroppa</surname>
          </string-name>
          , N.:
          <article-title>D´efinitions et caract´erisations de mod`eles a` base d'analogies pour l'apprentissage automatique des langues naturelles</article-title>
          .
          <source>Ph.D. thesis</source>
          , Telecom Paris, ENST, Paris, France (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stroppa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yvon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>An analogical learner for morphological analysis</article-title>
          .
          <source>In: 9th CONLL</source>
          . pp.
          <fpage>120</fpage>
          -
          <lpage>127</lpage>
          . Ann Arbor, USA (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Tiedemann</surname>
          </string-name>
          , J.:
          <article-title>Character-based psmt for closely related languages</article-title>
          .
          <source>In: Proceedings of 13th EAMT</source>
          . pp.
          <fpage>12</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Yvon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Paradigmatic cascades: a linguistically sound model of pronunciation by analogy</article-title>
          .
          <source>In: In Proceedings of 35th ACL</source>
          . pp.
          <fpage>429</fpage>
          -
          <lpage>435</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>