<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A neural machine translation system for Galician from transliterated Portuguese text Un sistema de tradución neuronal para el gallego a partir de texto portugués transliterado</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">John</forename><forename type="middle">E</forename><surname>Ortega</surname></persName>
							<email>john.ortega@usc.gal</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Centro de Investigación en Tecnoloxías da Información (CITIUS)</orgName>
								<orgName type="institution">Universidad de Santiago de Compostela</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iria</forename><surname>De-Dios-Flores</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Centro de Investigación en Tecnoloxías da Información (CITIUS)</orgName>
								<orgName type="institution">Universidad de Santiago de Compostela</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">José</forename><forename type="middle">Ramom</forename><surname>Pichel</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Centro de Investigación en Tecnoloxías da Información (CITIUS)</orgName>
								<orgName type="institution">Universidad de Santiago de Compostela</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pablo</forename><surname>Gamallo</surname></persName>
							<email>pablo.gamallo@usc.gal</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Centro de Investigación en Tecnoloxías da Información (CITIUS)</orgName>
								<orgName type="institution">Universidad de Santiago de Compostela</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A neural machine translation system for Galician from transliterated Portuguese text Un sistema de tradución neuronal para el gallego a partir de texto portugués transliterado</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BEE112472573098618A0D59E726F1CF8</idno>
					<idno type="DOI">10.18653/v1/</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Galician Language</term>
					<term>Neural Machine Translation</term>
					<term>Transliteration</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present a neural machine translation (NMT) system for translating both Spanish and English into Galician (ES-GL and EN-GL). Galician is a low- to medium-resource language, closely related to Portuguese, spoken in northwestern Spain. Our NMT system is trained on large-scale synthetic ES → PT → GL and EN → PT → GL parallel corpora, created by transliterating the Portuguese side of high-quality Spanish-Portuguese (ES-PT) and English-Portuguese (EN-PT) translation memories into Galician spelling. The NMT system is made available via a public web interface at https://demos.citius.usc.es/nos_tradutor.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Several systems have been developed and compared for machine translation (MT), ranging from rule-based systems to systems based on neural networks <ref type="bibr" target="#b0">[1]</ref>. Traditionally, rule-based systems like Apertium <ref type="bibr" target="#b1">[2]</ref> are used for languages with a small amount of parallel data. That is because MT systems backed by neural networks, or neural machine translation (NMT) systems, require large amounts of data, typically on the order of millions of sentences or more <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. An interesting option for low-resource languages is zero-shot translation, that is, translating in multilingual settings between language pairs on which the NMT system has never been trained. However, as Gu et al. <ref type="bibr" target="#b4">[5]</ref> state, training zero-shot NMT models easily fails because the task is very sensitive to hyper-parameter settings. The performance of zero-shot strategies is usually lower than that of more conventional pivot-based approaches.</p><p>We describe and implement an approach inspired by previous work <ref type="bibr" target="#b5">[6]</ref> that exploits the proximity of Portuguese and Galician to overcome the lack of resources and to produce corpora for building an NMT system, similar to the low-resource NMT systems of previous work <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>, that translates both Spanish and English into Galician. First, our system takes high-quality Spanish-Portuguese (ES-PT) and English-Portuguese (EN-PT) parallel corpora and converts the target-side (Portuguese) sentences (or segments) to Galician using transliteration, the conversion of text from one language to another through spelling. 
Transliteration between Portuguese and Galician works well because of the orthographic closeness of the two languages reported in previous work <ref type="bibr" target="#b8">[9]</ref>. Second, the transliterated Galician parallel text is used to train Spanish-Galician (ES-GL) and English-Galician (EN-GL) NMT systems, in which Spanish and English are the source languages and Galician is the target language. Two neural architectures were tested: long short-term memory (LSTM) and Transformer.</p></div>
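<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the first stage described above, the following Python snippet transliterates the Portuguese side of a parallel corpus to obtain synthetic source-Galician pairs. The four spelling rules and the names RULES, transliterate and build_synthetic_corpus are illustrative assumptions made for this example; they are not taken from the transliteration tool used in this work, which relies on a much larger rule set.</p>

```python
from typing import List, Tuple

# Illustrative Portuguese-to-Galician spelling rules (an assumption for this
# sketch, not the rule set of any real tool).
RULES: List[Tuple[str, str]] = [
    ("ções", "cións"),  # informações -> informacións
    ("ção", "ción"),    # informação -> información
    ("nh", "ñ"),        # caminho -> camiño
    ("lh", "ll"),       # filho -> fillo
]


def transliterate(text: str) -> str:
    """Apply each spelling rule in order to a Portuguese string."""
    for source, target in RULES:
        text = text.replace(source, target)
    return text


def build_synthetic_corpus(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the source side and transliterate the Portuguese target side."""
    return [(src, transliterate(tgt)) for src, tgt in pairs]


# Hypothetical one-sentence ES-PT corpus used only to exercise the functions.
es_pt = [("El camino es largo", "O caminho é longo")]
es_gl = build_synthetic_corpus(es_pt)
print(es_gl)
```

<p>Only the target side is rewritten in this sketch; the source sentences are left untouched, so the sentence alignment of the original corpus is preserved.</p></div>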
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Method</head><p>Our translation strategy consists of two steps. The first step uses transliteration <ref type="bibr" target="#b9">[10]</ref> to create parallel Galician segments from the Portuguese segments in the aligned corpus, by making use of the transliteration tool port2gal<ref type="foot" target="#foot_0">1</ref>, which contains several hundred rules on characters and sequences of characters. Both the training and validation sets are transliterated, yielding a final parallel Galician corpus. Then, in the second step, the (transliterated) Galician corpus is used to train an NMT system with Spanish or English as the source language and Galician as the target language. For the first (transliteration) step, we also tested a more complex strategy that combines the PT→GL Apertium translator <ref type="bibr" target="#b1">[2]</ref>, which uses a basic bilingual dictionary to translate word by word, with the transliteration tool for those words that are not in the bilingual dictionary.</p><p>The NMT system that we use for ES-GL and EN-GL translation was created using OpenNMT <ref type="bibr" target="#b10">[11]</ref>, a generic deep learning framework for building sequence-to-sequence models for machine translation. In particular, we trained an LSTM (long short-term memory) seq2seq model as well as a Transformer model for each language pair.</p><p>For the LSTM, we used the following default training parameters: two hidden layers, 500 hidden LSTM units per layer, input feeding enabled, 13 epochs, and a batch size of 64. In addition, we changed the default learning step parameters to 100,000 training steps and 10,000 validation steps. Traditional tokenization was performed with Linguakit <ref type="bibr" target="#b11">[12]</ref>.</p><p>The Transformer implementation, described in Garg et al. <ref type="bibr" target="#b12">[13]</ref>, was configured with default training parameters: 6 layers for both the encoder and the decoder and a batch size of 4096 tokens. We also set the learning step parameters to the same values as in the LSTM configuration. In this case, we used sub-word tokenization, performed with SentencePiece <ref type="bibr" target="#b13">[14]</ref>.</p><p>Table 1: Results obtained for the two language pairs (ES-GL and EN-GL) evaluated on two different systems, LSTM and Transformer, using three quantitative measures: BLEU, TER and ChrF2. The corpus size is given in millions of sentences (M).</p></div>
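<div xmlns="http://www.tei-c.org/ns/1.0"><p>The combined strategy mentioned in this section (dictionary lookup first, transliteration for out-of-dictionary words) can be sketched in Python as follows. This is only an illustration under stated assumptions: the two dictionary entries, the two spelling rules and the names PT_GL_DICTIONARY, spelling_fallback and convert_sentence are hypothetical and do not reproduce Apertium or the transliteration tool.</p>

```python
from typing import Callable, Dict

# Toy bilingual dictionary (hypothetical entries chosen for illustration).
PT_GL_DICTIONARY: Dict[str, str] = {"janela": "xanela", "ontem": "onte"}


def spelling_fallback(word: str) -> str:
    """Tiny stand-in for a rule-based transliterator (two assumed rules)."""
    return word.replace("nh", "ñ").replace("lh", "ll")


def convert_sentence(
    sentence: str,
    dictionary: Dict[str, str] = PT_GL_DICTIONARY,
    fallback: Callable[[str], str] = spelling_fallback,
) -> str:
    """Translate word by word; transliterate words missing from the dictionary."""
    converted = []
    for word in sentence.split():
        if word in dictionary:
            converted.append(dictionary[word])
        else:
            converted.append(fallback(word))
    return " ".join(converted)


print(convert_sentence("a janela do filho"))
```

<p>The sketch splits on whitespace and matches dictionary entries case-sensitively; a realistic pipeline would need proper tokenization and case handling.</p></div>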
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Corpora</head><p>The main parallel sources we used to train the NMT system come from Opus<ref type="foot" target="#foot_1">2</ref>. In particular, we used the ES-PT and EN-PT partitions of both Europarl<ref type="foot" target="#foot_2">3</ref>, with about 2 million sentences per language, and OpenSubtitles<ref type="foot" target="#foot_3">4</ref>, containing about 30 million sentences in ES-PT and 25 million in EN-PT. The Portuguese partition was transliterated to Galician so as to build ES-GL and EN-GL parallel corpora. In addition, we added the Spanish-Galician partition of CLUVI<ref type="foot" target="#foot_4">5</ref>, containing 144 thousand sentences, to the ES-GL corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Test results</head><p>Table <ref type="table" target="#tab_0">1</ref> shows the results of the different experiments for ES-GL and EN-GL, combining the system (LSTM or Transformer) with the size of the corpus. We observe that LSTM works very well for closely related languages (ES-GL), whereas for EN-GL, a pair of distant languages, the results are slightly better with Transformer. We also observe that using the whole OpenSubtitles corpus hurts performance in ES-GL. The best results in ES-GL combine Europarl with part of OpenSubtitles and are comparable to the state of the art <ref type="bibr" target="#b14">[15]</ref>. Note that the movie and TV subtitles of OpenSubtitles are a highly valuable resource, but the quality of the resulting sentence alignments is often lower than for other parallel corpora <ref type="bibr" target="#b15">[16]</ref>. The results in Table <ref type="table" target="#tab_0">1</ref> confirm that favorable outcomes can be achieved by using transliteration between two closely related languages like Portuguese and Galician.</p></div>
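<div xmlns="http://www.tei-c.org/ns/1.0"><p>Of the three measures reported in Table 1, ChrF2 is a character n-gram F-score that weights recall more heavily than precision (beta = 2). The Python sketch below is a simplified illustration of that idea and not necessarily the scorer behind the reported figures; the names char_ngrams and simple_chrf are assumptions, and reference implementations treat whitespace and missing n-gram orders differently, so the numbers it produces are not comparable with those in the table.</p>

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Average n-gram precision and recall over orders 1..max_n, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical hypothesis/reference pair differing in a single character.
print(round(simple_chrf("o camino é longo", "o camiño é longo"), 1))
```

<p>An identical hypothesis and reference score 100, and strings with no characters in common score 0; everything else falls in between.</p></div>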
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Demonstration</head><p>Our demonstration consists of a public-facing web page<ref type="foot" target="#foot_5">6</ref> that provides Galician translations for both Spanish and English inputs. Users can test the system via an open web interface (see Figure <ref type="figure" target="#fig_0">1</ref>), where they select the language pair (ES-GL or EN-GL) and the translation system (LSTM or Transformer), then enter text and generate translations.</p><p>In our demonstration, we plan to show where our system performs well and where it does not. As an example, the sentence translated from Spanish to Galician using the LSTM system in Table <ref type="table">2</ref> is an excellent translation despite its length. Additionally, our system handles syntax well and seems to translate generally better than previous systems tested on the same domain. Nonetheless, when evaluating lexical and morphological quality, we have found that the Portuguese transliteration hurts performance; in this respect, rule-based MT systems like Apertium <ref type="bibr" target="#b1">[2]</ref> perform better.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future work</head><p>We plan to perform further work with a human in the loop to improve performance in terms of quality. This is outlined in a continuous improvement plan that involves translators in user functionality tests. For example, spelling and lexical issues such as acidente instead of accidente, which reflect differences from formal Galician, will be the first to be addressed, using newly developed heuristics as part of our future contingency plan. The aim is to create the highest-quality system possible in order to expand to language pairs involving other languages such as Russian or Chinese.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Translation using the best performing machine translation system (LSTM).</figDesc><table><row><cell>Spanish</cell><cell>Galician</cell></row><row><cell>Debemos imponer el cumplimiento de los reglamentos y velar por que se aplique el principio de que "el que contamina paga" para que se utilicen sanciones y también incentivos financieros a fin de presionar a los propietarios de los buques y las compañías petroleras y lograr que se introduzcan los procedimientos mejores.</cell><cell>Temos de impor o cumpremento dos regulamentos e celar por que o principio do poluidor-pagador sexa aplicado para que sexan utilizadas sancións e tamén incentivos financeiros a fin de exercer presión sobre os proprietarios dos navíos e das compañías petrolíferas e conseguir que os procedementos mellores sexan introducidos.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: A screen capture of the web interface.</figDesc><graphic coords="3,97.22,84.19,187.51,161.11" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>system</cell><cell>pair</cell><cell>source</cell><cell>corpus size</cell><cell>BLEU</cell><cell>TER</cell><cell>chrF2</cell></row><row><cell>lstm</cell><cell>es-gl</cell><cell>Europarl+CLUVI</cell><cell>2.35M</cell><cell>48.9</cell><cell>34.4</cell><cell>69.3</cell></row><row><cell>lstm</cell><cell>es-gl</cell><cell>Europarl+CLUVI+OpenSubt(part)</cell><cell>5M</cell><cell>51.1</cell><cell>32.8</cell><cell>70.8</cell></row><row><cell>lstm</cell><cell>es-gl</cell><cell>Europarl+CLUVI+OpenSubt</cell><cell>30M</cell><cell>46.0</cell><cell>37.2</cell><cell>66.5</cell></row><row><cell>transformer</cell><cell>es-gl</cell><cell>Europarl+CLUVI</cell><cell>2.35M</cell><cell>17.5</cell><cell>67.4</cell><cell>53.0</cell></row><row><cell>transformer</cell><cell>es-gl</cell><cell>Europarl+CLUVI+OpenSubt</cell><cell>30M</cell><cell>13.9</cell><cell>66.7</cell><cell>46.4</cell></row><row><cell>lstm</cell><cell>en-gl</cell><cell>Europarl+OpenSubt</cell><cell>27M</cell><cell>26.6</cell><cell>50.3</cell><cell>45.5</cell></row><row><cell>transformer</cell><cell>en-gl</cell><cell>Europarl+OpenSubt</cell><cell>27M</cell><cell>29.3</cell><cell>49.7</cell><cell>51.0</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/gamallo/port2gal</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://opus.nlpl.eu</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://opus.nlpl.eu/Europarl.php</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://opus.nlpl.eu/OpenSubtitles.php</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://repositori.upf.edu/handle/10230/20051</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://demos.citius.usc.es/nos_tradutor</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was funded by the project "Nós: Galician in the society and economy of artificial intelligence", an agreement between the Xunta de Galicia and the University of Santiago de Compostela, and by grant ED431G2019/04 from the Galician Ministry of Education, University and Professional Training and the European Regional Development Fund (ERDF/FEDER program).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A comparison of machine translation paradigms for use in black-box fuzzy-match repair</title>
		<author>
			<persName><forename type="first">R</forename><surname>Knowles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Koehn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing</title>
				<meeting>the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="249" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Apertium: a free/open-source platform for rule-based machine translation</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Forcada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ginestí-Rosell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nordfalk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>O'Regan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ortiz-Rojas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Pérez-Ortiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sánchez-Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ramírez-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Tyers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Translation</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="127" to="144" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.0473</idno>
		<title level="m">Neural machine translation by jointly learning to align and translate</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Koehn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Knowles</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.03872</idno>
		<title level="m">Six challenges for neural machine translation</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Improved zero-shot neural machine translation via ignoring spurious correlations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">O</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1121</idno>
		<ptr target="https://aclanthology.org/P19-1121" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1258" to="1268" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Carvalho: English-Galician SMT system from Europarl English-Portuguese parallel corpus</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R P</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gamallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>García</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento Del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="page" from="379" to="381" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Neural machine translation with a polysynthetic low resource language</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Mamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Translation</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="325" to="346" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Overcoming resistance: The normalization of an Amazonian tribal language</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Castro-Mamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Montoya Samame</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.loresmt-1.1" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages, Association for Computational Linguistics</title>
				<meeting>the 3rd Workshop on Technologies for MT of Low Resource Languages, Association for Computational Linguistics<address><addrLine>Suzhou, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A methodology to measure the diachronic language distance between three languages based on perplexity</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Pichel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gamallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Alegria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neves</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Quantitative Linguistics</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="306" to="336" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Knight</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Graehl</surname></persName>
		</author>
		<idno>arXiv preprint cmp-lg/9704003</idno>
		<title level="m">Machine transliteration</title>
				<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">OpenNMT: Open-source toolkit for neural machine translation</title>
		<author>
			<persName><forename type="first">G</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Senellart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rush</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/P17-4012" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2017, System Demonstrations</title>
				<meeting>ACL 2017, System Demonstrations<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="67" to="72" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction</title>
		<author>
			<persName><forename type="first">P</forename><surname>Gamallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Piñeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Martinez-Castaño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Pichel</surname></persName>
		</author>
		<idno type="DOI">10.1109/SNAMS.2018.8554689</idno>
	</analytic>
	<monogr>
		<title level="m">2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="239" to="244" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Garg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Peitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Nallasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paulik</surname></persName>
		</author>
		<idno>CoRR abs/1909.02074</idno>
		<ptr target="http://arxiv.org/abs/1909.02074" />
		<title level="m">Jointly learning to align and translate with transformer models</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Kudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Richardson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.06226</idno>
		<title level="m">Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Evaluating machine translation in a low-resource language combination: Spanish-Galician</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D C</forename><surname>Bayón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sánchez-Gijón</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Translator, Project and User Tracks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="30" to="35" />
		</imprint>
	</monogr>
	<note>Machine Translation Summit XVII</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kouylekov</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/L18-1275" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)</title>
				<meeting>the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
