          Named entities from Wikipedia for machine translation⋆

                          Ondřej Hálek, Rudolf Rosa, Aleš Tamchyna, and Ondřej Bojar

                        Charles University in Prague, Faculty of Mathematics and Physics
                                   Institute of Formal and Applied Linguistics
              ohalek@centrum.cz, rur@seznam.cz, a.tamchyna@gmail.com, bojar@ufal.mff.cuni.cz

Abstract. In this paper we present our attempt to improve machine translation of named entities by using Wikipedia. We recognize named entities based on the categories of English Wikipedia articles, extract their potential translations from the corresponding Czech articles and incorporate them into a statistical machine translation system as translation options. Our results show a decrease of translation quality in terms of automatic metrics but positive results from human annotators. We conclude that this approach can lead to many errors in translation and should therefore always be combined with the standard statistical translation model and weighted appropriately.


1   Introduction

Translation of named entities (NEs) is an often overlooked problem of today's machine translation (MT). In particular, most statistical systems do not handle named entities explicitly, relying simply on the model to pick the correct translation. Since most NEs are rare in texts, statistical MT systems are incapable of producing reliable translations of them.

Moreover, many NEs are composed of ordinary words, such as the term "Rice University". In the attempt to output the most likely translation, a statistical system would translate such a collocation word by word.

In this paper, we attempt to address this problem by using Wikipedia¹ to translate NEs and present them already translated to the MT system.

1.1   Named entity translation task

The set of named entities is unbounded and there are many definitions of named entities. In our project, we work with a vague definition of a named entity as a word or group of words which, when left untranslated, still yields a valid translation (despite the fact that a "real" translation is usually better if it exists; however, in many cases it does not exist).

Translation of named entities consists of several subtasks. NEs have to be identified in the source text and their translations must be proposed. These then have to be appropriately incorporated into the sentence translation — the sentence context must match the NE and vice versa.

For the English-Czech language pair, matching NEs to the sentence context consists mainly of inflecting the NE words. For example, while "London" translates to Czech as "Londýn", in the context of a more complex NE the name has to be inflected in Czech, such as "London airport" → "Londýnské letiště" (London_adj airport).

Matching the sentence context to the named entity is needed when some information, such as grammatical gender, comes from the NE. For example, Czech verbs in the past tense have different forms for each gender — the verb "came" has to be translated as "přišel" when the subject is masculine, as "přišla" for a feminine and as "přišlo" for a neuter subject. This information needs to be taken into account in translation: "Jeffry came." → "Jeffry přišel."

1.2   Work outline

We experiment with English to Czech translation.

Named entity recognition is done in two steps. First, all potential NEs are recognized using a simple recognizer with low precision but high recall. Then, the potential named entities are confirmed or rejected — if there is an article with the corresponding title in the English Wikipedia, we try to confirm the potential NE as a true NE based on the categories of the article.

The translation of a NE is done by looking up the Czech version of the English Wikipedia article about the named entity. Its title is considered the "base translation". Other potential translations (in our case this means simply various inflected forms) are then extracted from the text of the Czech article. Each named entity found in the input text is then replaced with a set of its potential translations, from which the MT system then tries to choose the best one.

The matching of the sentence context to the NE is not handled explicitly. We rely on the target-side language model to determine the most appropriate option.

⋆ This work has been supported by the grants EuroMatrixPlus (FP7-ICT-2007-3-231720 of the EU and 7E09003 of the Czech Republic), P406/11/1499, and MSM 0021620838.
¹ http://en.wikipedia.org/
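
The pipeline outlined above can be sketched in a few lines of Python; the helper functions confirm_ne, get_czech_translations and markup_sentence are hypothetical placeholders for the steps detailed in Sections 2-5, not a description of our actual implementation.

    import re

    # Rough candidate recognition: runs of capitalized words (cf. Section 2.1).
    CANDIDATE_RE = re.compile(r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")

    def potential_nes(sentence):
        """Return maximal sequences of capitalized words as potential NEs."""
        return [m.group(0) for m in CANDIDATE_RE.finditer(sentence)]

    def translate_nes(sentence, confirm_ne, get_czech_translations, markup_sentence):
        """Annotate each confirmed NE with its Wikipedia-derived translation options.

        confirm_ne(ne)               -> bool, Wikipedia category check (Section 3)
        get_czech_translations(ne)   -> dict {Czech form: probability} (Section 4)
        markup_sentence(s, ne, opts) -> str, decoder input carrying the options (Section 5)
        """
        for ne in potential_nes(sentence):
            if confirm_ne(ne) and (options := get_czech_translations(ne)):
                sentence = markup_sentence(sentence, ne, options)
        return sentence  # passed to the MT decoder, which picks the best option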

2   Recognition of potential named entities

In our case, the goal of potential NE recognition is to find as many potential NEs as possible (i.e. we favour higher recall at the expense of precision), because the candidates for NEs are still to be confirmed or rejected in the next step. Thanks to the external world knowledge provided by Wikipedia, our task is not a typical NER scenario. NE recognition is not the focal point of our experiment, so we limit ourselves to two tools for the recognition of potential NEs: our simple named entity recognizer and the Stanford named entity recognizer.

2.1   Simple named entity recognizer

We created a simple rule-based named entity recognizer for selecting phrases suspected to be named entities. It looks for capitalized words and uses a small set of simple rules for the beginnings of sentences — most notably, the first word of a sentence is a potential NE if the following word is capitalized (except for words on a stoplist, such as "A", "From", "To", ...). Sequences of potential NEs are always merged into a single multiword potential NE.

2.2   Stanford named entity recognizer

The Stanford NER [4] is a well-known tool with a documented accuracy over 90% when analyzing named entities according to the CoNLL Shared Task [12]. However, this classification does not match our named entity definition, and we also use only a limited recognition model.²

2.3   Evaluation of named entity recognizers

To evaluate the tools we use an evaluation text consisting of 255 sentences rich in named entities, originally collected for a quiz-based evaluation task [1]. The sentences are quite evenly distributed among four topics — directions, meetings, news and quizzes.

We first performed a human annotation of NEs in the evaluation text, where two annotators marked NEs in the text according to our NE definition. The inter-annotator agreement F-measure³ was only 83%, which sets an upper bound on the value for our automatic recognizers. We then picked one annotation as a standard, according to which we compare the outputs of the NE recognition tools.

To measure the precision of a NE recognizer, we count the NEs on which the tool agrees with the standard annotation and divide it by the total number of NEs recognized by the tool. Similarly, the recall is measured as the number of NEs confirmed by the standard divided by the number of NEs in the standard.

The performance of the two aforementioned tools measured on the evaluation text is shown in Table 1.

      Recognizer     Precision  Recall  F-measure
      Simple NER       0.57      0.73     0.64
      Stanford NER     0.70      0.49     0.58

      Tab. 1. Comparison of NE recognizers.

Our Simple NER has a significantly higher recall than Stanford NER; it is actually capable of delivering most of the named entities. Its low precision is not an issue for our experiment, since in the next step we confirm the named entities using Wikipedia categories. Its F-measure is also higher than that of Stanford NER, suggesting that the Simple NER suits our NE definition better.

Since the Stanford NER results are well documented, we assume that its poor results in our experiment are mainly caused by a different NE definition and by the recognition model used — in this setup Stanford NER recognizes only people, locations and organizations, but e.g. named entities from the software class (names of programs, programming language functions etc.) are left out of the recognition.

On the other hand, with Stanford NER we are capable of correctly recognizing complex named entities, and the recall of recognition of named entities at sentence beginnings is higher than that of Simple NER.


3   Confirmation of NEs by Wikipedia

For each potential named entity we try to confirm it as a true named entity using Wikipedia categories.

First we look for the article on the English Wikipedia with a title matching the potential NE. If it does not exist, we reject the potential NE immediately.

We then get the categories of that article. For each category we search for its superior categories (several hard limits had to be introduced, because the categories do not form a tree, not even a DAG; the maximum depth of the search was set to 6).

In the end, the categories found are compared with our hand-made list of named entity categories. If at least one of the article categories or their super-categories is contained in the NE categories list, we confirm the potential NE as a true NE; otherwise we reject it.

² ner-eng-ie.crf-3-all2008-distsim — a conditional random field model that recognizes 3 NE classes (Location, Person, Organization), trained on unrestricted data, uses distributional similarity features.
³ F = 2PR / (P + R), where P stands for precision and R for recall.
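
The following is a minimal sketch of the confirmation procedure of Section 3, assuming a helper get_categories(title) that returns the categories of a Wikipedia page (one possible implementation is given after Figure 1); the category names below are only an illustrative subset of the hand-made list.

    from collections import deque

    # Illustrative subset of the hand-made list of NE-indicating categories.
    NE_CATEGORIES = {"Places", "People", "Organizations", "Companies",
                     "Software", "Transport Infrastructure"}

    MAX_DEPTH = 6  # hard limit; Wikipedia categories form neither a tree nor a DAG

    def confirm_ne(title, get_categories):
        """Return True if the article's categories, or any super-categories found
        by a breadth-first search up to MAX_DEPTH, appear in NE_CATEGORIES."""
        # A missing article yields no categories, so the potential NE is rejected.
        queue = deque((category, 0) for category in get_categories(title))
        visited = set()
        while queue:
            category, depth = queue.popleft()
            if category in NE_CATEGORIES:
                return True
            if depth < MAX_DEPTH and category not in visited:
                visited.add(category)
                queue.extend((parent, depth + 1)
                             for parent in get_categories("Category:" + category))
        return False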

http://en.wikipedia.org/w/api.php?action=query&prop=categories&redirects&clshow=!hidden&format=xml&titles=Rice_University

                        Fig. 1. Example of an XML response to a request to the Wikimedia API.
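
A possible implementation of the category lookup, using the query parameters shown in Figure 1 and assuming the standard MediaWiki API XML layout, in which category links are returned as <cl ... title="Category:..."/> elements; the User-Agent string is arbitrary.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    API_URL = "https://en.wikipedia.org/w/api.php"

    def get_categories(title):
        """Return the non-hidden categories of an English Wikipedia page (without
        the "Category:" prefix); an empty list means the page was not found."""
        params = {
            "action": "query",
            "prop": "categories",
            "redirects": "",
            "clshow": "!hidden",
            "cllimit": "max",
            "format": "xml",
            "titles": title,
        }
        url = API_URL + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(url, headers={"User-Agent": "ne-wiki-example/0.1"})
        with urllib.request.urlopen(request) as response:
            tree = ET.parse(response)
        return [cl.get("title").removeprefix("Category:") for cl in tree.iter("cl")]

    print(get_categories("Rice University"))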


The following categories are considered to indicate NEs:

  – Places
  – People
  – Organizations
  – Companies
  – Software
  – Transport Infrastructure

To get the information from Wikipedia we use the Wikimedia API [7]. Figure 1 shows an example of the API response.


4   Wikipedia translation

For each English Wikipedia article about a NE we check whether there is a corresponding Czech article (this information is provided by Wikipedia under the page section "Languages"). If there is one, we use its title as the base translation.

We then try to find all inflected forms of the base translation in the text of the Czech article to use as alternative translations.

For each word in the base translation, we trim its last three letters, keeping at least the first three letters intact. This is considered a "stem".

Then, the Czech article is fetched using the Wikimedia API and the wiki markup is stripped. We then search the article text for sequences of words with the same stems. If we find a match, we consider it an inflected form of our base translation and include it in the list of potential translations.

Finally, we estimate the probability of the various forms from their counts of occurrences.


5   Translation process

In order to utilize the retrieved translation suggestions, we had to find a way of incorporating them as additional translation options for the decoder. This can generally be done in several ways, such as by extending the parallel data, by adding new entries into the translation model (i.e. the phrase table), or by pre-processing the input data.

We use the Moses [6] decoder throughout our experiments. Input pre-processing can be realized fairly easily in Moses via XML markup of the input sentences. It is simple to incorporate alternative translations for sequences of words and even to assign a translation probability to each of the options. The markup of the input data is illustrated in Figure 2.

When scoring hypotheses, Moses uses several translation model scores, namely p(e|f), p(f|e), lex(e|f) and lex(f|e), i.e. translation probabilities in both directions (where f stands for "foreign", English in this case, and e stands for Czech) and lexical weights. The value specified in the markup (or 1 if omitted) replaces all of these scores.

Pre-processing of the input data also has the advantage of not requiring any retraining or modification of existing translation models. Fully trained MT systems can therefore be easily extended to take advantage of our method.

Moses can treat the translation suggestions as either exclusive or inclusive. If set to exclusive, only the options suggested in the input markup are considered as translation candidates. With the inclusive setting, these options are included among the suggestions from the translation model, competing with them for the highest score. Depending on the quality of the translation model and of the external translation suggestions, this setting can either improve or hurt translation performance.

When estimating the probability of our translations, we distribute the whole probability mass among them. The scores of translation suggestions provided by the translation model are typically much lower. However, the target language model usually has a significant impact on hypothesis scoring, so even if the external translation scores are set to unrealistically high values, the language model makes the "competition" with the translation model options reasonably fair.
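
The inflected-form search and probability estimation of Section 4 can be sketched as follows, assuming the Czech article text has already been fetched and stripped of wiki markup; matching a token by testing whether it starts with the trimmed stem is one possible reading of the procedure described above.

    import re
    from collections import Counter

    def stem(word):
        """Trim the last three letters, always keeping at least the first three."""
        return word[:max(3, len(word) - 3)]

    def inflected_forms(base_translation, czech_article_text):
        """Return {form: probability} for word sequences in the article whose tokens
        start with the stems of the base translation (matched case-insensitively)."""
        stems = [stem(word.lower()) for word in base_translation.split()]
        tokens = re.findall(r"\w+", czech_article_text)

        counts = Counter()
        for i in range(len(tokens) - len(stems) + 1):
            window = tokens[i:i + len(stems)]
            if all(token.lower().startswith(s) for token, s in zip(window, stems)):
                counts[" ".join(window)] += 1

        total = sum(counts.values())
        if total == 0:
            return {}
        return {form: count / total for form, count in counts.items()}

    # For the base translation "Londýn", a Czech article might yield forms such as
    # "Londýn", "Londýna", "Londýně", ... with probabilities proportional to their counts.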

       They moved to London last year.

              Fig. 2. An example of including external translation options using XML markup of input.
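
A sketch of producing such an annotated input sentence, assuming Moses is run with the -xml-input inclusive (or exclusive) switch and accepts the customary translation and prob attributes with alternatives separated by "||"; the tag name ne is arbitrary.

    from xml.sax.saxutils import quoteattr

    def markup_sentence(sentence, ne, options):
        """Wrap one occurrence of the NE in Moses-style XML input markup.

        options: dict of target-side forms and their probabilities,
                 e.g. the output of inflected_forms() above.
        """
        forms = sorted(options, key=options.get, reverse=True)
        translation = quoteattr("||".join(forms))
        prob = "||".join("%.3f" % options[form] for form in forms)
        markup = '<ne translation=%s prob="%s">%s</ne>' % (translation, prob, ne)
        return sentence.replace(ne, markup, 1)

    print(markup_sentence("They moved to London last year.", "London",
                          {"Londýn": 0.4, "Londýna": 0.2}))
    # They moved to <ne translation="Londýn||Londýna" prob="0.400||0.200">London</ne> last year.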




The default settings of common language models, such as SRILM or KenLM, as used in Moses, assign zero log-probability (i.e. a probability of 1) to unknown tokens instead of the intuitive −∞. In most cases, the training data of the language model for the target language also include the target-language part of the translation model's parallel data, so this is not an issue. However, our translation suggestions often contain tokens unseen in any data, including some noise introduced by the imperfect suffix trimming heuristic. Instead of penalizing such options, the language model promoted them, since the unknown words were ignored and therefore did not lower the overall n-gram probability (any known token has a probability < 1, scoring inevitably lower). We were able to solve this problem by setting a very low probability for unknown tokens. Perhaps a more interesting option would be to add the full texts of the Czech Wikipedia articles to the language model. This would ensure that the translation of the NE is known to the language model and would even include some plausible contexts. We leave this for future research.


6   Experimental results

We conducted a series of translation experiments, evaluating various setups of our method. We also carried out a blind manual evaluation, in which the annotators compared the outputs of two MT setups which used our method and of the baseline MT system.

6.1   Data sources

We used CzEng 0.9 [2] as the source of both parallel and monolingual data to train our MT system. CzEng is a richly annotated parallel Czech-English corpus. It contains roughly 8 million parallel sentences from a variety of domains, including European regulations (about 34% of tokens), fiction (15%), news (3%), technical texts (10%) and unofficial movie subtitles (27%). In all our experiments we used 200 thousand parallel sentences for the translation model and 5 million monolingual sentences for the target language model. We also used CzEng as the source of a separate set of 1000 sentences for tuning the model weights and another 1000 sentences for automatic evaluation.

Since manual evaluation would benefit from data rich in named entity occurrences, we used the same set of sentences as in the NER evaluation. These sentences cover quite a wide range of topics, so they seem suitable even for translation evaluation.

6.2   Tools

We used the common pipeline of popular tools for phrase-based statistical MT, namely the Moses decoder and toolkit, the SRILM language modelling tool [11], and GIZA++ [8], an open-source implementation of the IBM models, for obtaining word alignments. KenLM [5] was used instead of SRILM during decoding for its better speed and simplicity.

We used the MERT (Minimum Error Rate Training) [9] algorithm to tune the weights of the log-linear model and BLEU [10] as the de-facto standard automatic translation quality metric.

6.3   Automatic evaluation

We evaluated a small subset of possible setups; all our results are summarized in Table 2. The main goal of these experiments was to determine which components of our pipeline are actually important for achieving good results.

We began with a simple scenario, only using the titles of the articles for translation (i.e. inflected occurrences of the title were not available to the decoder) and forcing Moses to use only our suggestions when translating a NE in a sentence.

In the very first case, we also kept unknown named entities in their original form — by an unknown NE we understand an entity for which the corresponding English Wikipedia article exists and its categories imply that it is a named entity, but there is no corresponding Czech article. Since the Czech version of Wikipedia is much smaller, this case occurs quite often.

The BLEU scores in these simple scenarios confirm our expectations — in statistical machine translation, forcing or limiting translation possibilities rarely helps. More specifically, by excluding phrase table entries, we forbid the log-linear model to use potentially more adequate translations. The phrase table may well include many variants of a given named entity translation, providing more context and inherent disambiguation. This information should be used and possibly even preferred to a single translation or an enumeration of potential translations suggested by our tools (albeit probabilistically weighted). On the other hand, promoting phrase table entries too eagerly would result in undesirable translations in some cases, for example when a named entity is composed of common words.

                NEs Suggested      Regular Translations         Unknown NEs     NER          BLEU
                Only base forms    Excluded                     Preserved       Simple        25.13
                Only base forms    Excluded                     Translated      Simple        25.38
                Only base forms    Included                     Translated      Simple        25.80
                All forms          Included                     Translated      Simple        25.97
                All forms          Included                     Translated      Stanford      25.98
                                              Baseline                                       26.62

                            Tab. 2. BLEU scores of our setups and the baseline system.


It is also not surprising that keeping unknown entities untranslated hurts (automatically estimated) translation performance, as Czech tends to translate most frequent foreign names, and even NEs which are used in their original form are usually inflected in Czech. NEs that would remain completely unchanged are quite rare. Sentences with some NEs left untranslated may be more understandable, and even considered better translations in some cases, but their BLEU score is necessarily worse.

When we allowed translation model entries to compete with our suggestions, the score improved further to 25.80. The target language model was apparently able to promote options from the phrase table in spite of their low translation model scores compared to our suggestions (see Section 5).

Our translations could have been inadequate for two main reasons in this scenario:

 – Lexically incorrect translation,
 – Wrong surface form (only the title translation was used).

Adding the full list of all inflected forms of NEs along with their estimated probabilities improved the translation quality slightly, presumably because the target language model was able to determine which of our suggestions fitted best into the sentence translation.

We can therefore conclude that our approach to incorporating named entity translations works successfully — the outputs contained some direct translations of article titles, some inflected forms extracted from the article content and some phrase table entries.

Using the Stanford named entity recognizer brought no further gains. The recognizer marked a different (albeit smaller) set of NEs, but the further filtering based on Wikipedia article categories and the absence of many Czech equivalent articles made the difference negligible.

Finally, all our scenarios scored worse than the baseline in terms of BLEU. While we believe that the motivation behind our method is valid, we were not able to avoid some errors in each of the steps, which, when combined, resulted in a loss in BLEU score. A detailed analysis of the errors is provided in Section 6.5.

On the other hand, we also achieved several notable improvements in translation quality even in the CzEng test set, some of which are shown in Figure 3.

6.4   Manual evaluation

We had four annotators evaluate 255 sentences rich in named entities, using QuickJudge⁴, which randomized the input. The input sentences contained approximately 400 named entities, but the translations differed only in 78 sentences. QuickJudge automatically skips sentences with identical translations, so the annotators only saw these 78 sentences.

Three setups were evaluated: the "Baseline" unmodified Moses system, and two modifications of that system, "Translate" and "Keep unknown". The system marked as "Translate" corresponds to the best-performing setup, not using Stanford NER. "Keep unknown" is the same system, however, unknown NEs are handled differently — if a potential NE is confirmed by Wikipedia but a Czech translation does not exist, it is kept untranslated in the output.

The annotators were presented with the source English sentence and with the three translations coming from the three different setups. They then assigned marks 1, 2 and 3 to them. Ties were allowed and only the relative ranking, i.e. not the absolute values, was considered significant.

Table 3 summarizes the results. The values suggest a large number of ties — this is not surprising, since the differences between the systems were small; their outputs often differed only in one word or in the inflection of a named entity.

We find it promising that our setups won according to all annotators. The inter-annotator agreement was however surprisingly low — even though in total the annotators' preferences match, the individual sentences that contributed to the results differ greatly among them. All annotators agreed on a winner in only 25% of the sentences.

⁴ http://ufal.mff.cuni.cz/euromatrix/quickjudge/

Source     It was Nova Scotia on Wednesday.
Baseline   byl_masc to nova scotia ve středu.               (NE is left untranslated)
Our setup  to bylo_neut nové skotsko_neut ve středu.        (correct NE translation and gender agreement)

Source     In August, 1860, they returned to the Victoria Falls.
Baseline   v srpnu, 1860, se k vyjádření falls.             ("Victoria" is left out, "falls" kept untranslated)
Our setup  v srpnu, 1860, se na viktoriiny vodopády.        (correct translation extracted from Wikipedia)

     Fig. 3. Examples of translation improvements. "Our setup" denotes the best-performing setup in terms of BLEU.


Confirming our intuition, annotators usually preferred to keep unknown entities untranslated. The fact that all of the annotators speak English certainly contributed to this result; however, we believe that keeping unknown NEs in the original form is often the best solution, especially in terms of preserved information. Imagine the translation of a guidebook, for example — if an MT system correctly detects NEs and keeps unknown ones untranslated, the result is probably better than if it attempts to translate them. Thanks to the NER enhanced by Wikipedia, our system would produce more informative translations than a standard SMT system, which tends to translate NEs in various undecipherable ways.

      Annotator  Baseline  Translate  Keep unknown
          1         46        56           51
          2         38        45           54
          3         41        39           47
          4         35        43           49

      Tab. 3. Number of wins (manual annotation).

6.5   Sources of errors

In order to explain the drop in BLEU in a more detailed fashion, we examined the translation outputs and attempted to analyze the most common errors made by our best-performing setup.

Incorrect Wikipedia translation. Quite often, the Wikipedia article contained information about a different meaning of the term. When translated to Czech, the difference in meaning became apparent. For example, the default Wikipedia article on "Brussels" discusses the whole "Brussels Region", therefore the Czech translation is "Bruselský region". This name appeared several times in the test data and the default interpretation was wrong in all cases.⁵

Suffix trimming error. Suffix trimming also occasionally matched words or word sequences completely unrelated to the article name. As an example, the name of the company Nestlé matched the word "nesprávně" ("incorrectly") in the Czech article. Because this word is quite common, the language model score ensured that it appeared in the final translation. A similar example was matching "pole" ("field") in the article about Poland ("Polsko" in Czech). We decided to match case-insensitively in order to cover named entities that do not begin with a capital letter in Czech (such as "Gulf War", "válka v Zálivu").

Wrong named entity form. There are two possible causes of an error of this kind — either the Czech article did not contain the inflected form needed in the translation, or the language model failed to enforce the correct option, mainly because the NE contained words unknown to the model (never seen in the monolingual training data).

Since BLEU does not differentiate between a wrong word suffix and a completely incorrect word translation, these errors are equally severe in terms of automatic evaluation.⁶ On the other hand, human annotators consider a mis-inflected (but otherwise correct) translation to be better than a completely untranslated named entity.


7   Wikipedia translations as a separate phrase table

In order to incorporate the weighting of our translations into MERT, we also used a contrastive setup with an alternative phrase table instead of the XML markup of input sentences. The decoder was then working with two translation tables — the standard one, generated by GIZA++ from the parallel corpus, and the new one, created by our tools.

⁵ It is however noteworthy that the inflected form of this particular name was always chosen correctly.
⁶ Metrics with paraphrasing (e.g. Meteor [3]) could solve a part of the issue. Another option is to replace all words with their lemmas in the hypothesis and the reference and use a standard n-gram metric like BLEU. This would completely ignore errors in word forms, which is inadequate as well and might seem manipulated.

                    NEs Suggested        Regular Translations         Unknown NEs         NER BLEU
                    All forms (old)      Included                     Translated          Simple 27.11
                    All forms (new)      Included                     Translated          Simple 26.60
                                                   Baseline                                      26.62

              Tab. 4. BLEU scores of two setups using an alternative translation table and of the baseline system.


As is shown in Figure 4, there are two scores in our phrase table — the first one is the probability assigned by our tools (based on the number of occurrences of the form in the text of the Czech Wikipedia article) and the second one is the "penalty" for using our NE translation.⁷ It is up to MERT to estimate the weight to assign to our translations.

        London ||| Londýn ||| 0.4 2.718
        London ||| Londýna ||| 0.2 2.718

          Fig. 4. Example of phrase table entries.
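
A sketch of emitting such entries from the extracted form probabilities; the second score is the constant phrase penalty exp(1) ≈ 2.718, and the output file name is only illustrative.

    import math

    PENALTY = math.e  # the constant phrase penalty, 2.718... = exp(1)

    def phrase_table_entries(ne, options):
        """Yield phrase-table lines in the "source ||| target ||| scores" layout of Figure 4.

        options: dict of target forms and probabilities, e.g. {"Londýn": 0.4, "Londýna": 0.2}.
        """
        for form, prob in sorted(options.items(), key=lambda item: -item[1]):
            yield "%s ||| %s ||| %s %.3f" % (ne, form, prob, PENALTY)

    with open("wikipedia-phrase-table", "w", encoding="utf-8") as table:
        for entry in phrase_table_entries("London", {"Londýn": 0.4, "Londýna": 0.2}):
            print(entry, file=table)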
7.1   Results

Although the results of this experiment look promising, they have not been fully evaluated yet and are therefore only preliminary. There is an improvement in BLEU score (see Table 4), but it is not a result of better NE translation. The instability of the MERT process results in different weights for the two setups, causing the baseline translation and our experiment outputs to differ significantly in whole sentences, not only in the NE translations. Further analysis and experiments are therefore needed.

There are two results reported in Table 4 because two different versions of the inflector were used to get the inflected forms. The "old" one uses all text data from the body of the article (including e.g. external links), while the "new" one looks for the inflected forms only in the text of the article.


8   Conclusion

Our approach of automatically suggesting translations of named entities based on Wikipedia texts leads to a drop in automatic evaluation but to a slight improvement in manual evaluation of MT quality. Part of this improvement is due to not translating identified entities at all.

While some deficiencies of the proposed method of NE translation can hopefully be mitigated (poor suffix trimming and the search for various forms of target-side NEs), the incorrectness of some Wikipedia translations is not easy to solve. It is therefore questionable whether the named entity translations provided by our system should be used for all named entities, or only for entities not present (or very rare) in the training data.

We described two methods of mixing the newly proposed translations and the default translations of the MT system. We studied the XML-input method more and learned that it faces an imbalance in the scoring of hypotheses from the two sources. We also report preliminary results of the other method: alternative decoding paths, allowing the model to choose the best balance automatically. While the automatic scores for the second method increased slightly, the results are not yet stable and further analysis is needed.

In sum, we have shown that Wikipedia can serve as a valuable source of bilingual information and that there is open space for incorporating this information into machine translation. However, Wikipedia should not serve as the only source of information, and the extracted information should be confirmed, e.g. by analysis of some other monolingual data.

⁷ This penalty is used in all Moses phrase tables; it is the same for all entries and equals exp(1) = e ≈ 2.718.


References

 1. J. Berka, M. Černý, and O. Bojar: Quiz-based evaluation of machine translation. The Prague Bulletin of Mathematical Linguistics, 95, April 2011, 77–86.
 2. O. Bojar and Z. Žabokrtský: CzEng 0.9: large parallel treebank with rich annotation. The Prague Bulletin of Mathematical Linguistics, 92, 2009, 63–83.
 3. M. Denkowski and A. Lavie: METEOR-NEXT and the METEOR paraphrase tables: improved evaluation support for five target languages. In Proceedings of the ACL 2010 Joint Workshop on Statistical Machine Translation and Metrics MATR, 2010.
 4. J. R. Finkel, T. Grenager, and C. D. Manning: Incorporating non-local information into information extraction systems by Gibbs sampling. In ACL, Association for Computational Linguistics, 2005.
 5. K. Heafield: KenLM: faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, UK, July 2011. Association for Computational Linguistics.
 6. P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst: Moses: open source toolkit for statistical machine translation. In ACL, Association for Computational Linguistics, 2007.
 7. MediaWiki: MediaWiki, the free wiki engine, 2007. [Online; accessed 23-May-2011].

 8. F. J. Och and H. Ney: Improved statistical alignment models. In ACL, Hong Kong, China, October 2000, 440–447.
 9. F. J. Och: Minimum error rate training in statistical machine translation. In ACL, 2003, 160–167.
10. K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu: BLEU: a method for automatic evaluation of machine translation. In ACL, 2002, 311–318.
11. A. Stolcke: SRILM – an extensible language modeling toolkit. In Proceedings of ICSLP, 2002.
12. E. F. Tjong Kim Sang and F. De Meulder: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 – Volume 4, CONLL '03, pp. 142–147, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.