=Paper=
{{Paper
|id=Vol-1649/74
|storemode=property
|title=Czechizator – Čechizátor
|pdfUrl=https://ceur-ws.org/Vol-1649/74.pdf
|volume=Vol-1649
|authors=Rudolf Rosa
|dblpUrl=https://dblp.org/rec/conf/itat/Rosa16
}}
==Czechizator – Čechizátor==
ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 74–79, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 R. Rosa

Rudolf Rosa, Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Malostranské náměstí 25, 118 00 Prague, Czech Republic, rosa@ufal.mff.cuni.cz

Abstract: We present a lexicon-less rule-based machine translation system from English to Czech, based on a very limited amount of transformation rules. Its core is a novel translation module, implemented as a component of the TectoMT translation system, and it depends massively on the extensive pipeline of linguistic preprocessing and postprocessing within TectoMT. Its scope is naturally limited, but for specific texts, e.g. from the scientific or marketing domain, it occasionally produces sensible results.

Czechization of the abstract: Prezentujeme lexikon-lesový rule-bazovaný systém machín translace od Engliše Čecha, který bazoval na verově limitované amountu rulů transformace. Jeho kor je novelový modul translace, implementovalo jako komponent systému translace tektomtu a dependuje masivně na extensivní pipelínu lingvistické preprocesování a postprocesovat v Tektomtu. Jeho skop je naturálně limitovaná, ale pro specifické texty z například scientifické nebo marketování doménu okasionálně producuje sensibilní resulty.

===1 Introduction and Motivation===

In this work, we present Czechizator, a lexicon-less rule-based machine translation system from English to Czech.

The lexicon-less approach to machine translation has already been successfully applied to closely related languages – e.g. the Czech–Slovak machine translation system Česílko [3, 4] featured a rule-based lexicon-less transformation component for handling OOV (out-of-vocabulary) words. For transliteration, which can be thought of as a low-level translation, rule-based systems are also common. However, in this work, we decided to tackle a harder problem: to use a similar approach for a full translation between a pair of only weakly related languages, namely English and Czech.

While we believe that it is impossible to achieve high-quality or even reasonable-quality general-domain translation without a large lexicon, we attempt to investigate to what degree this is possible if the domain is somewhat special. Specifically, we target the domain of scientific texts (or, more precisely, abstracts of scientific papers), which contain a large amount of terms that tend to be rather similar even across more distant languages. In this way, we operate on a pair of languages which are typologically different but lexically close. Moreover, we crucially rely on the strong linguistic abstractions provided by the TectoMT machine translation system [15], which is designed to operate on a deep layer of language representation where typological differences between languages become quite transparent, as the meaning itself, rather than the form, is captured. Abstracting away from both lexical and typological differences in this way, a smallish set of rules and heuristics should be sufficient to obtain a competitive machine translation system.

While the main focus of our work is to test the degree to which the aforementioned hypothesis is valid, our work has practical implications as well. The number of terms used in scientific texts is enormous, many of them being rare in parallel corpora or even newly created and thus bound to constitute OOV items for machine translation systems. However, as there seems to be some regularity in the way that English terms are adapted in Czech, it should be possible to use a lexicon-less system as an additional component in a standard machine translation system to handle OOVs. It may also be beneficial in scenarios where a low-quality but light-weight translation system is preferred over a full-fledged but resource-heavy system.¹

Another use case is machine-aided translation of scientific paper abstracts, as the Czechizator output should often be a good starting point for creating the final translation by post-editing.

Before explaining the approach we used to implement the translation model, we present a set of three sample outputs of Czechizator, applied to the abstracts of two scientific papers (Table 1, Table 2) and one marketing text² (Table 3). Also, as an additional example, the abstract of this paper is provided above both in English and in its Czechization.

¹ However, TectoMT itself is rather resource-heavy even when the lexical models are omitted, so even though the component that we implemented is very light-weight, the complete system that it relies on is not – using the Czechizator model instead of the base models in TectoMT only brings a 15% speedup and a 40% RAM cut, which is probably not worth the quality drop in any realistic scenario.

² The text was obtained from https://www.accenture.com/cz-en/strategy-index
Source: Chimera is a machine translation system that combines the TectoMT deep-linguistic core with Moses phrase-based MT system. For English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platform and are open source (available from CPAN Perl repository and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

Czechization: Chimera je systém machín translace, který kombinuje díp-lingvistické kor tektomtu z fraze-bazovaného MT systému mozesu. Pro Engliše – čechová pér také uzuje systém post-korekce Depfix. Všechny komponenty runují v Unix / platformu Linuxu a jsou open-ová sourc (avélabilní z CPAN Perla repositorie a LINDAT / CLARIN repositorie). Hlavní webová stránka je https://ufal.mff.cuni.cz/tectomt. Development kurentně je suport FP projektem 7th qtlípu (http://qtleap.eu).

Reference translation: Chimera systém strojového překladu, který kombinuje hluboce lingvistické jádro TectoMT s frázovým strojovým překladačem Moses. Pro anglicko-český překlad také používá post-editovací systém Depfix. Všechny komponenty běží na platformě Unix/Linux a jsou open-source (dostupné z Perlového repozitáře CPAN a repozitáře LINDAT/CLARIN). Hlavní webová stránka je https://ufal.mff.cuni.cz/tectomt. Vývoj je momentálně podporován projektem QTLeap ze 7th FP (http://qtleap.eu).

Table 1: Abstract of a scientific paper [7], its Czechization, and a reference translation by its author.
Source: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

Czechization: Propozujeme 2 novelová architektury modelů, že komputují kontinuální reprezentace vektorů vordů od verově largových setů dat. Kvalita těchto reprezentací je mísur ve vord similarita tasku a resulty jsou kompar s previálně nejgůdovšími, performují, techniky, kteří bazovali na diferentových typech neurálních netvorků. Observujeme largové improvementy akurace v muchově lovovší komputacionální kosti, tj. takuje méně než Daie, aby se lírnovalo hajové vektory vordu kvality z dat vordů 1.6 bilionu, která setovala. Furtermorově šovujeme, že tyto vektory providují state-of-te-artovou performance na našem testu, který setoval, že mísurují syntaktické a semantické vord similarity.

Table 2: Abstract of a scientific paper [6] and its Czechization.

Source: Accenture Operations combines technology that digitizes and automates business processes, unlocks actionable insights, and delivers everything-as-a-service with our team’s deep industry, functional and technical expertise. So you can confidently chart your course to consuming your core business services on demand, accelerate innovation and speed to market. Welcome to the "as-a-service" business revolution.

Czechization: Operacions acenturu kombinuje technologii, která digitizuje a automuje procesy businosti, unlokuje akcionabilní insajty a deliveruje everyting-as-a-servicová s funkcionální a technickou expertizou dípové industrie našeho tímu. Tak konfidentně můžete chartovat svůj kours, konsumuje vaše service businosti kor na demandu, aceleratové inovaci a spídu marketu. Velkomujte „as-a-service“ revoluce businosti.

Source: Accenture Strategy shapes our clients’ future, combining deep business insight with the understanding of how technology will impact industry and business models. Our focus on issues related to digital disruption, redefining competitiveness, operating and business models as well as the workforce of the future helps our clients find future value and growth in a digital world.

Czechization: Strategie acenturu šapuje futur našich klientů, kombinuje dípovou insajt businosti s understandováním, jak technologie impaktuje a industrie businosti modely. Náš fokus na isu, kteří relovali s digitálním disrupcí, kteří redefinují kompetitivnost, operatování a businost modely, i vorkforc futur helpuje, naši klienti findují futurovou valu a grovt v digitální vorldu.

Source: Whether focused on strategies for business, technology or operations, Accenture Strategy has the people, skills and experience to effectively shape client value. We offer highly objective points of view on C-suite themes, with an emphasis on business and technology, leveraging our deep industry experience. That’s high performance, delivered.

Czechization: Vhetr fokusoval na strategie pro businost, technologie nebo operací strategii acenturu, má peoply, skily a experience, aby efektivně šapovali valu klienta. Oferujeme hajně objektivní pointy vievu na k-suitových temech s emfasí na businost a technologii, leveraguje naši dípovou experience industrie. Které je hajová performanc, který deliveroval.

Table 3: A marketing text from Accenture.com and its Czechization.

===2 Approach===

====2.1 TectoMT====

TectoMT [15, 1] is a highly modular, linguistically oriented machine translation system, featuring a deep-linguistic three-step processing pipeline of analysis, transfer, and synthesis. TectoMT is implemented in Treex [8, 13], using a representation of language based on the Functional Generative Description [11].

The first step in the translation pipeline is to perform a linguistic analysis of each source (input) sentence up to the t-layer, obtaining a deep-syntactic representation of the sentence (t-tree). On the t-layer, each full (autosemantic) word is represented by a t-node with a t-lemma and a set of linguistic t-attributes (such as functor, formeme, number, gender, deep tense) that capture the function of the word. Inflections and auxiliary words are not explicitly represented, but their functions are captured by the attributes of the t-nodes.
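The t-layer can thus be pictured as a small attribute structure per content word. The following is a minimal illustrative sketch in Python; the real implementation is the Perl-based Treex, so the class name, attribute names and example values below are our assumptions chosen to echo the description above, not the Treex API:

 from dataclasses import dataclass, field
 from typing import Dict, List
 
 @dataclass
 class TNode:
     """Illustrative stand-in for one deep-syntactic (t-layer) node.
     Only autosemantic words get a t-node; auxiliaries and inflection
     are folded into the attributes of the governing nodes."""
     t_lemma: str                      # deep lemma, e.g. "translation"
     functor: str                      # semantic role label, e.g. "PAT"
     formeme: str                      # surface-form label, e.g. "n:attr"
     sempos: str                       # semantic part of speech, e.g. "n.denot"
     grammatemes: Dict[str, str] = field(default_factory=dict)  # number, gender, deep tense, ...
     children: List["TNode"] = field(default_factory=list)
 
 # a tiny t-tree fragment for "translation system"
 system = TNode("system", functor="PAT", formeme="n:obj", sempos="n.denot",
                grammatemes={"number": "sg"})
 system.children.append(
     TNode("translation", functor="RSTR", formeme="n:attr", sempos="n.denot"))

Czechizator only touches the t_lemma values; the remaining attributes are transferred and synthesized by the standard TectoMT pipeline, as described next.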
Each source t-tree is then isomorphically transferred to a target t-tree. In the standard TectoMT setup, the t-lemma of each t-node is translated by models that have been trained on large parallel data. The other t-attributes are then transferred by a pipeline featuring both rule-based and machine-learned steps.

Finally, the target sentence is synthesized from the t-tree. This step relies heavily on a morphological generator [12], which is able to generate a word form based on the word lemma and a set of morphological feature values. For the highly flective Czech language, this is a challenging task; even though we employ a state-of-the-art generator, it is sometimes unable to generate the requested word form, especially when the lemma is unknown to the generator.

TectoMT can (and does by default) use a weighted interpolation of multiple translation models to generate translation candidates [10]. This makes it easy to replace or complement the existing models with new models, such as our Czechizator model.

====2.2 Czechizator translation model====

The Czechizator translation model attempts to Czechize each English t-lemma, unless it is marked as a named entity. To Czechize the lemma, it applies the following resources, which we manually constructed:

* a shortlist of 36 lemma translations, focusing on words that we believe to be auxiliaries rather than full words (and thus presumably should be dropped by the t-analysis and represented by t-attributes, but in fact constitute t-lemmas),³ and on cardinal numbers (which presumably should be converted to a language-independent representation by the TectoMT analysis, but are not),
* a set of 43 transformation rules based on the semantic part of speech of the t-node and the ending of its t-lemma (noun rules are provided as an example in Table 4), and
* a transliteration table, consisting of 33 transliteration rules.⁴

³ be, have, do, and, or, but, therefore, that, who, which, what, why, how, each, other, then, also, so, as, all, this, these, many, only, main, mainly

⁴ As an example, we list several of the transliteration rules here: th→t, ti→ci, ck→k, ph→f, sh→š, ch→ch, cz→č, qu→kv, igh→aj, gh→ch, gu→gv, dg→dž, w→v, c→k.

* -sion → -se
* -tion → -ce
* -ison → -ace
* -ness → -nost
* -ise → -iza
* -ize → -iza
* -em → -ém
* -er → -r
* -ty → -ta
* -is → -e
* -in → -ín
* -ine → -ín
* -ing → -ování
* -cy → -ce
* -y → -ie

Table 4: A list of ending-based transformations of noun lemmas (English ending → Czechized ending).

The transformations are generally applied sequentially, but forking is possible at some places, and so multiple alternative Czechizations may be generated; TectoMT uses a Hidden Markov Tree Model [14] (instead of a language model) to eventually select the best combination of t-lemmas (and other t-attributes). However, as the Czechizations are usually OOVs for the HMTM, typically the first candidate gets selected. The target semantic part-of-speech identifier is also generated, based on the source semantic part of speech and the t-lemma ending; this is important for the subsequent synthesis steps.

It should be noted that the current implementation of Czechizator is rather a proof of concept than an attempt at a professional translation model. If one were to follow this research path in the future, it would presumably be more appropriate to learn the regular transformations from parallel (or comparable) corpora, extracting pairs of similar words that are translations of each other and generalizing the transformation necessary to convert one into the other, as well as learning to identify the cases in which a transformation should be applied. Similar methods could be used as were applied e.g. in the semi-supervised morphological generator Flect [2].

Czechizator uses the standard TectoMT translation model interface, and can thus be easily and seamlessly plugged into the standard TectoMT pipeline, either replacing or complementing the base lexical translation models.
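To illustrate how the three manually constructed resources compose at the lemma level, here is a minimal sketch in Python. The actual module is the Perl Treex block released with the paper (see the Conclusion); the function names, the tiny rule samples drawn from Table 4 and footnote 4, and the assumed shortlist entries are illustrative simplifications that ignore the semantic part of speech, forking, and HMTM reranking:

 # A minimal sketch of lemma-level Czechization: shortlist lookup, then an
 # ending-based rewrite (subset of Table 4), then transliteration (subset of
 # footnote 4). Rule samples, names and ordering are illustrative only.
 
 SHORTLIST = {"and": "a", "or": "nebo", "but": "ale", "this": "tento"}
 
 NOUN_ENDINGS = [          # (English ending, Czechized ending), longest first
     ("sion", "se"), ("tion", "ce"), ("ness", "nost"), ("ing", "ování"),
     ("ty", "ta"), ("er", "r"), ("y", "ie"),
 ]
 
 TRANSLIT = [              # applied left to right over the whole string
     ("th", "t"), ("ti", "ci"), ("ck", "k"), ("ph", "f"), ("sh", "š"),
     ("qu", "kv"), ("igh", "aj"), ("w", "v"), ("c", "k"),
 ]
 
 def transliterate(s: str) -> str:
     for src, tgt in TRANSLIT:
         s = s.replace(src, tgt)
     return s
 
 def czechize_lemma(lemma: str) -> str:
     """Czechize one English t-lemma (named-entity check omitted here)."""
     if lemma in SHORTLIST:                       # auxiliary-like words
         return SHORTLIST[lemma]
     for eng, cze in NOUN_ENDINGS:                # rewrite a recognized ending
         if lemma.endswith(eng):
             return transliterate(lemma[: -len(eng)]) + cze
     return transliterate(lemma)                  # fall back to transliteration only
 
 print(czechize_lemma("translation"))   # -> translace
 print(czechize_lemma("similarity"))    # -> similarita

In the full system, the choice of ending rules is conditioned on the semantic part of speech of the t-node, and forking can yield several alternative Czechizations that are then ranked as described above.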
====2.3 Surrogate lemma inflection====

As Czechizator generates many weird and/or non-existent lemmas, it is an expected consequence that the morphological generator is often unable to inflect these lemmas. For this reason, we enriched the word form generation component of TectoMT⁵ with a last-resort inflection step.⁶ If the morphogenerator is unable to generate the inflection, we use a set of simple ending-based rules to find a surrogate lemma, as listed in Table 5,⁷ inflect the surrogate lemma, strip its ending, and apply it to the target lemma. We focus on endings generated by the Czechizator translation module, but we aimed for high coverage, and we successfully managed to employ the last-resort inflector even in the base TectoMT translation.

⁵ https://github.com/ufal/treex/blob/master/lib/Treex/Block/T2A/CS/GenerateWordforms.pm

⁶ https://github.com/ufal/treex/commit/363d1b18f7140e0cb687ed8deebc4ac4a1051080

⁷ Although there exists a set of commonly used lemmas to represent the basic Czech paradigms, we sometimes use a different lemma – to avoid unnecessary ambiguity, and to simplify the application of the ending to the target lemma (we avoid surrogate lemmas that exhibit changes on the root during inflection).

* -ovat → kupovat
* -ání → plavání
* -í → jarní
* -ý → mladý
* -o → město
* -e → růže
* -a → žena
* -ost → kost
* -ě → mladě
* -h, -k, -r, -d, -t, -n, -b, -f, -l, -m, -p, -s, -v, -z → svrab
* -ž, -š, -ř, -č, -c, -j, -ď, -ť, -ň → muž

Table 5: List of surrogate lemmas for given endings. The matched ending gets deleted from the target lemma, obtaining the target pseudo-stem, except for the last two cases (matching hard or soft final consonants), where even the final consonant is part of the stem.

For example, if one is to inflect the pseudo-adjective “largový” (the Czechization of “large”) for the feminine accusative, we replace it with the surrogate lemma (“mladý”) that corresponds to its ending (“-ý”), obtain its feminine accusative inflection from the morphogenerator (“mladou”), strip the matched ending from both of the lemmas, obtaining pseudo-stems (“largov”, “mlad”), strip the surrogate pseudo-stem (“mlad”) from the surrogate inflection (“mladou”) to obtain the inflection ending (“-ou”), and join the ending with the target pseudo-stem (“largov”) to obtain the target inflection (“largovou”).
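The worked example above translates into a short procedure. The Python sketch below mirrors the described steps, with the morphological generator mocked by a one-entry table (the real component is the Treex word form generator from footnote 5), the feature-string format invented for the example, and the consonant rows of Table 5 omitted:

 # Last-resort inflection via a surrogate lemma (cf. Table 5). The
 # morphological generator is mocked; in TectoMT it is the word form
 # generation block referenced in footnote 5. Names are illustrative.
 
 SURROGATES = [            # (ending, surrogate lemma), longest ending first
     ("ovat", "kupovat"), ("ání", "plavání"), ("ost", "kost"),
     ("í", "jarní"), ("ý", "mladý"), ("ě", "mladě"),
     ("o", "město"), ("e", "růže"), ("a", "žena"),
 ]
 
 MOCK_MORPHOGEN = {        # (lemma, morphological features) -> inflected form
     ("mladý", "fem.sg.acc"): "mladou",
 }
 
 def inflect_unknown(lemma: str, features: str) -> str:
     """Inflect an unknown lemma by borrowing the ending of a surrogate."""
     for ending, surrogate in SURROGATES:
         if lemma.endswith(ending):
             surrogate_form = MOCK_MORPHOGEN[(surrogate, features)]
             target_stem = lemma[: -len(ending)]                 # largový -> largov
             surrogate_stem = surrogate[: -len(ending)]          # mladý   -> mlad
             inflection = surrogate_form[len(surrogate_stem):]   # mladou  -> -ou
             return target_stem + inflection                     # -> largovou
     return lemma          # no rule matched: leave the lemma uninflected
 
 print(inflect_unknown("largový", "fem.sg.acc"))   # -> largovou

The transplanting works because the surrogate lemmas are deliberately chosen so that their paradigms do not alter the root (cf. footnote 7), so the inflection ending can be attached to the unseen pseudo-stem unchanged.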
===3 Evaluation===

====3.1 Dataset====

To automatically evaluate the translation quality by standard methods, we collected a small dataset consisting of Czech and English abstracts of scientific papers. Specifically, we collected the abstracts of papers of authors from the Institute of Formal and Applied Linguistics at Charles University in Prague, who are obliged to provide both a Czech and an English abstract for each of their publications. These are then stored in the institute’s database of publications, Biblio,⁸ and can be accessed through a regularly generated XML dump.⁹

The collected parallel corpus, aligned on the document level, i.e. on individual abstracts, contains 1,556 pairs of abstracts, totalling 121,386 words on the English side and 76,812 words on the Czech side.¹⁰ We did not perform any filtering of the data, apart from filtering out incomplete entries (missing the Czech or the English abstract) and replacing newlines and tabulators by spaces (solely for technical reasons). The dataset is publicly available [9].

⁸ http://ufal.mff.cuni.cz/biblio/

⁹ https://svn.ms.mff.cuni.cz/trac/biblio/browser/trunk/xmldump

¹⁰ The difference in the sizes is partially caused by the fact that, usually, the English abstract is the full original, and its Czech translation is often shortened considerably by the authors.

====3.2 Evaluation and discussion====

Automatic evaluation with BLEU and NIST was performed with the MTrics tool [5]. We evaluated several candidate translations: the untranslated English source texts, TectoMT with no lexical model, TectoMT with the Czechizator model, TectoMT with an interpolation of its base lexical models (the default setup of TectoMT), and TectoMT with an interpolation of Czechizator and the base lexical models.

* Untranslated source: BLEU 3.41, NIST 1.13
* No model: BLEU 2.85, NIST 1.62
* Czechizator: BLEU 3.01, NIST 2.08
* Base TectoMT: BLEU 8.75, NIST 3.62
* Base + Czechizator: BLEU 8.33, NIST 3.57

Table 6: Automatic evaluation scores on the ÚFAL abstracts dataset [9].
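The scores in Table 6 were produced with MTrics [5]. Purely as an illustration of the comparison being made (not of the tool actually used), a corpus-level BLEU/NIST evaluation of the five setups could be sketched with NLTK as follows; the file names are hypothetical placeholders for tokenized, sentence-aligned references and system outputs, and the numbers would not match MTrics exactly:

 # Illustration only: the paper used MTrics [5]; this sketch runs the same
 # kind of corpus-level BLEU/NIST comparison with NLTK.
 from nltk.translate.bleu_score import corpus_bleu
 from nltk.translate.nist_score import corpus_nist
 
 def load_tokenized(path):
     with open(path, encoding="utf-8") as f:
         return [line.split() for line in f]
 
 references = [[ref] for ref in load_tokenized("abstracts.ref.cs")]  # one reference per segment
 for setup in ("untranslated", "no_model", "czechizator", "base", "base_plus_czechizator"):
     hypotheses = load_tokenized("abstracts.%s.cs" % setup)
     bleu = corpus_bleu(references, hypotheses) * 100
     nist = corpus_nist(references, hypotheses, n=5)
     print("%-22s BLEU %5.2f  NIST %4.2f" % (setup, bleu, nist))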
While the translation quality of the Czechizator outputs is clearly well below the base TectoMT system, the results show that Czechizator does manage to produce some useful output – its scores are significantly higher than those of TectoMT with no lexical translation model. This shows that lexicon-less translation is somewhat possible in our setting, although on average it is far from competitive – at least with the current version of Czechizator, which is a rather basic proof-of-concept implementation, lacking numerous simple and obvious improvements that could easily be performed and would presumably lead to further significant increases in translation quality. However, as with many rule-based systems for natural language processing, the code complexity and especially the amount of manual tuning necessary to push the performance further and further is likely to grow very quickly.

Manual inspection of the outputs (see also the examples in the beginning of this paper) showed that the chosen domain is quite suitable for lexicon-less translation, but the proportion of autosemantic words that cannot be simply transformed from English to Czech without a lexicon is still rather high – high enough to make many of the sentences barely comprehensible. We therefore acknowledge that at least a small lexicon would be necessary to obtain reasonable translations for most sentences. On the other hand, we observed many phrases, and occasionally even whole sentences, whose Czechizations were of a rather high quality and understandable to Czech speakers with minor or no difficulties. We thus find our approach interesting and potentially promising, although we believe that the amount of work needed to bring the system to a competitive level of translation quality would be several orders of magnitude larger than that spent on creating the current system (which took less than one person-week). Still, we expect that for the given domain, developing such a rule-based system would constitute many times less work than building an open-domain system.

Thanks to the deep analysis and generation provided by TectoMT, the Czechizations tend to be rather grammatical, with words correctly inflected, even if nonsensical. Unfortunately, even grammatical errors occur rather frequently – some words are not inflected at all, some violate morphological agreement (e.g. in gender, case or number), etc. This can be explained by realizing that the complex TectoMT pipeline consists of many subcomponents, each operating with a certain precision and occasionally producing erroneous analyses. The most crucial stage seems to be syntactic parsing, which has been reported to have only approximately 85% accuracy, i.e. roughly 15% of dependency relations are assigned incorrectly; these errors typically manifest themselves as agreement errors in the Czechization output.

Evaluation of the main potential use case of Czechizator, i.e. complementing the base TectoMT translation models for OOVs (the Base + Czechizator setup), brought mixed results. There is a small deterioration in the automatic scores, and subsequent manual inspection showed that Czechizator can target OOVs only semi-successfully. It can offer a Czechization of any OOV term, which is often correct (e.g. “anafora” for English “anaphora”, “interlingvální” for “interlingual”, “hypotaktický” for “hypotactical”, or “cirkumfixální” for “circumfixal”), but sometimes the Czechization is not correct (e.g. “businost” for “business”, “hands-onový” for “hands-on”, or “kolokaty” for “collocations”). In many cases, a Czechization of the term is simply not used in practice, and is less understandable to the reader than the original English form (e.g. “kejnotový” for “keynote”, “veb-pagová” for “web-page”, “part-of-spích” for “part-of-speech”, or “kros-langvaž” for “cross-language”). Czechizator also often generates a form that is plausible but rarely or never used (although one may think that the Czechized form may become the standard Czech translation in the future) and is mostly understandable to readers (e.g. “tríbank” for “treebank”, “tvít” for “tweet”, or “kros-lingvální” for “cross-lingual” – here the base models generated a rather nonsensical “lingual kříže”). Unfortunately, it also often Czechizes named entities, even though we explicitly avoid them if they are marked by the analysis; this seems to be primarily a shortcoming (or unsuitability for this task) of the named entity recognizer used [12], which seems to favour precision over recall. Still, Czechizator can sometimes provide a better translation than the base models, even in cases where the term is not an OOV – such as the word “post-editing”, which the base models translate into a confusing “poúprava”, while Czechizator provides an acceptable translation, “post-editování”.¹¹

In general, we believe that, if appropriate attention is paid to the identified issues, such as the avoidance of named entities, Czechizator has the potential to usefully complement the base TectoMT translation models, especially in handling OOV terms.

¹¹ Other such examples include the Czechization “reimplementace” for “reimplementation” instead of “znovuprovádění”, or “post-nominální” for “post-nominal” instead of “pojmenovitý”.
===4 Conclusion===

We implemented a rule-based lexicon-less English–Czech translation model, called Czechizator, into TectoMT. The model is based on a set of simple rules, mainly following regularities in the adoption of English terms into Czech. Czechizator has been especially designed for and applied to the domain of abstracts of scientific papers, but it also provides interesting results for texts from the marketing domain.

We automatically evaluated Czechizator on a collection of abstracts of computational linguistics papers, showing inferior but promising results in comparison with the base TectoMT models; the highest observed potential is in employing Czechizator as an additional TectoMT translation model for out-of-vocabulary items.

Czechizator is released as an open-source Treex module in the main Treex repository on GitHub,¹² and it is also made available as an online demo.¹³

¹² https://github.com/ufal/treex/blob/master/lib/Treex/Tool/TranslationModel/Rulebased/Model.pm

¹³ http://ufallab.ms.mff.cuni.cz/~rosa/czechizator/input.php

===Acknowledgments===

This research was supported by the grants GAUK 1572314 and SVV 260 333. This work has been using language resources and tools developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071).

===References===

[1] Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel, and Rudolf Rosa. New language pairs in TectoMT. In Proceedings of the 10th Workshop on Machine Translation, pages 98–104, Stroudsburg, PA, USA, 2015. Association for Computational Linguistics.

[2] Ondřej Dušek and Filip Jurčíček. Training a natural language generator from unaligned data. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 451–461, Stroudsburg, PA, USA, 2015. Association for Computational Linguistics.

[3] Jan Hajič, Vladislav Kuboň, and Jan Hric. Česílko – an MT system for closely related languages. In ACL 2000, Tutorial Abstracts and Demonstration Notes, pages 7–8. ACL, ISBN 1-55860-730-7, 2000.

[4] Petr Homola and Vladislav Kuboň. Česílko 2.0, 2008.

[5] Kamil Kos. Adaptation of new machine translation metrics for Czech. Bachelor’s thesis, Charles University in Prague, 2008.

[6] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[7] Martin Popel, Roman Sudarikov, Ondřej Bojar, Rudolf Rosa, and Jan Hajič. TectoMT – a deep-linguistic core of the combined Chimera MT system. Baltic Journal of Modern Computing, 4(2):377–377, 2016.

[8] Martin Popel and Zdeněk Žabokrtský. TectoMT: Modular NLP framework. In Hrafn Loftsson, Eirikur Rögnvaldsson, and Sigrun Helgadottir, editors, Proceedings of the 7th International Conference on Advances in Natural Language Processing (IceTAL 2010), volume 6233 of Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence), pages 293–304, Berlin / Heidelberg, 2010. Springer.

[9] Rudolf Rosa. Czech and English abstracts of ÚFAL papers, 2016. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University in Prague.

[10] Rudolf Rosa, Ondřej Dušek, Michal Novák, and Martin Popel. Translation model interpolation for domain adaptation in TectoMT. In Jan Hajič and António Branco, editors, Proceedings of the 1st Deep Machine Translation Workshop, pages 89–96, Praha, Czechia, 2015. ÚFAL MFF UK.

[11] Petr Sgall, Eva Hajičová, and Jarmila Panevová. The meaning of the sentence in its semantic and pragmatic aspects. Springer, 1986.

[12] Jana Straková, Milan Straka, and Jan Hajič. Open-source tools for morphology, lemmatization, POS tagging and named entity recognition. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13–18, Baltimore, Maryland, June 2014. Association for Computational Linguistics.

[13] Zdeněk Žabokrtský. Treex – an open-source framework for natural language processing. In Markéta Lopatková, editor, ITAT, volume 788, pages 7–14, Košice, Slovakia, 2011. Univerzita Pavla Jozefa Šafárika v Košiciach.

[14] Zdeněk Žabokrtský and Martin Popel. Hidden Markov tree model in dependency-based machine translation. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 145–148, Suntec, Singapore, 2009. Association for Computational Linguistics.

[15] Zdeněk Žabokrtský, Jan Ptáček, and Petr Pajas. TectoMT: Highly modular MT system with tectogrammatics used as the transfer layer. In ACL 2008 WMT: Proceedings of the Third Workshop on Statistical Machine Translation, pages 167–170, Columbus, OH, USA, 2008. Association for Computational Linguistics.