<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multilingual Terminological Resources: Comparing Machine Translation and Corpus-Based Translation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Melania</forename><surname>Cabezas-García</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Granada</orgName>
								<address>
									<addrLine>C/ Buensuceso 11</addrLine>
									<postCode>18002</postCode>
									<settlement>Granada</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pilar</forename><surname>León-Araúz</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Granada</orgName>
								<address>
									<addrLine>C/ Buensuceso 11</addrLine>
									<postCode>18002</postCode>
									<settlement>Granada</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multilingual Terminological Resources: Comparing Machine Translation and Corpus-Based Translation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5F529443C75785C85AD2F3202A01DC3A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multiword term</term>
					<term>machine translation</term>
					<term>corpus</term>
					<term>specialized translation</term>
					<term>terminology</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Terminological resources increasingly use machine translation as a method to save time and reduce costs. With a view to enhancing the multilingual representation of multiword terms (e.g. passive stall-regulated wind turbine) in terminological resources, we describe an analysis of English-Spanish multiword term translation in various machine translation systems, paying special attention to the errors encountered. A comparison of machine translation output with the equivalents found in a comparable corpus is also presented. Even though machine translation often produces errors, it can serve as a basis for human post-editing, thus saving time and costs in terminological work. Comparable corpora, on the other hand, offer better results, but searches are more time-consuming.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With a view to expanding markets and disseminating knowledge, specialized texts generate a large volume of translations. Terminological resources should assist in this respect by including multilingual information. In this sense, machine translation is increasingly being used as a method to save time and reduce costs <ref type="bibr" target="#b0">[1]</ref>.</p><p>This paper focuses on the translation of distinctive units of scientific texts, i.e. multiword terms (e.g. passive stall-regulated wind turbine), which pose problems both to human translators and to natural language processing systems. However, multiword term machine translation has received little attention, with some exceptions such as <ref type="bibr" target="#b1">[2]</ref>. This is especially true of more complex multiword terms that have three or more constituents.</p><p>In order to enhance the multilingual representation of multiword terms in terminological resources, we carried out the following tasks: (i) we analyzed English-Spanish multiword term translation in various machine translation systems; (ii) we proposed a set of causes that may give rise to errors in multiword term machine translation; and (iii) we compared machine translation output with the equivalents that may be manually found in corpora. For this purpose, a set of English multiword terms with three, four, and five constituents related to environmental science was extracted from a specialized corpus in this field (10,228,919 words, <ref type="bibr" target="#b2">[3]</ref>). Environmental science was chosen because of the large volume of translations generated as a result of increasing environmental awareness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Machine Translation versus Corpus-Based Translation</head><p>1st International Conference on "Multilingual digital terminology today. Design, representation formats and management systems", June 16-17, Padova, Italy. EMAIL: melaniacabezas@ugr.es (M. Cabezas-García); pleon@ugr.es (P. León-Araúz). ORCID: 0000-0002-8622-1036 (M. Cabezas-García); 0000-0002-8520-2749 (P. León-Araúz)</p><p>Contrary to the traditional view of machine translation as a threat that would replace human translators, machine translation presents opportunities not only for human translators, as evidenced by the great demand for machine translation post-editing (i.e. reviewing and enhancing a machine translation), but also for terminologists. Including post-editing in the workflow adds value to machine translation, since it minimizes possible mistakes and provides quality equivalents to be included in terminological resources.</p><p>Even though training a neural machine translation system on carefully selected corpora from the specialized subject field could provide better results than using generic machine translation engines, translators usually lack user-friendly tools to train their own domain-specific engines. For this reason, the selected English multiword terms were provided without context to different generic machine translation engines: Google Translate and DeepL (neural systems), and Apertium (rule-based system).</p><p>To compare machine translations with equivalents found in corpora, parallel or comparable corpora can be used. Parallel corpora are sets of original texts aligned with their translations, thus facilitating the identification of equivalents. However, such corpora are scarce, especially in languages other than English, and generally show a marked influence of the source text on the translation. In contrast, comparable corpora are more useful. 
Since they consist of two sets of original texts of the same type and subject, they can be used to analyze native expressions in each language <ref type="bibr" target="#b3">[4]</ref>.</p><p>Therefore, a Spanish comparable corpus was used, which includes environmental texts originally written in this language (10,667,434 words). Techniques for identifying multiword term equivalents in corpora <ref type="bibr" target="#b4">[5]</ref> were employed, since translation identification in comparable corpora is not as direct as in machine translation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Translating Multiword Terms using Machine Translation and Corpora</head><p>Multiword terms pose problems both to human translators and to natural language processing systems, since their adequate translation must consider aspects such as their internal dependencies, the semantic relation between constituents, the specialization of elements, etc. <ref type="bibr" target="#b4">[5]</ref>. Many of these issues involve human intelligence, which machine translation lacks. General multiword expressions (e.g. take a seat, by and large, let's go, as soon as) have been widely explored in machine translation <ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref>. However, specialized multiword terms have received considerably less attention.</p><p>Not surprisingly, our results revealed that machine translation output varies across engines. The output often shows errors of differing nature and magnitude, which we used to establish the causes behind them and which could be used to enhance machine translation systems. These errors include: (i) the wrong identification of internal dependencies ([doubly fed] [induction generator] &gt; inducción alimentada doblemente generador, lit. *generator doubly fed induction); (ii) the wrong translation of constituents (wave turbulence interaction parameterization &gt; interacción de turbulencia ondulatoria parameterization); and (iii) the wrong identification of the internal semantic relation (wind-generated electricity &gt; viento-electricidad generada, lit. *generated wind-electricity). However, machine translation can serve as a basis for human post-editing, thus saving time and costs in terminological work.</p><p>Comparable corpora, on the other hand, offer better results, but searches are more time-consuming. 
Ideally, these different techniques should be integrated into translators' and terminologists' workflows, something that language service providers in the 2020s are bound to do. Furthermore, these results can be integrated into the training of future translators and terminologists, who will have to work in this ever-changing reality.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Acknowledgements</head><p>This research was carried out as part of projects PID2020-118369GB-I00, Transversal integration of culture into an environmental terminological knowledge base (TRANSCULTURE), funded by the Spanish Ministry of Science and Innovation; and project A-HUM-600-UGR20, Culture as a transversal module in an environmental terminological knowledge base (CULTURAMA), funded by the ERDF Operational Programme for Andalucía 2014-2020.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automatic Enrichment of Terminological Resources: the IATE RDF Example</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arcan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Montiel-Ponsoda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2018</title>
				<meeting>LREC 2018</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="930" to="937" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Improving machine translation output of German compound and multiword financial terms: a comparison with cross-linguistic data</title>
		<author>
			<persName><forename type="first">Christina</forename><surname>Valavani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christina</forename><surname>Alexandris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">K</forename><surname>Mikros</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Human-Intelligent Systems Integration</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="29" to="34" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The EcoLexicon English Corpus as an open corpus in Sketch Engine</title>
		<author>
			<persName><forename type="first">P</forename><surname>León-Araúz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>San Martín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Reimerink</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th EURALEX International Congress</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Čibej</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Gorjanc</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Kosem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Krek</surname></persName>
		</editor>
		<meeting>the 18th EURALEX International Congress<address><addrLine>Ljubljana, Euralex</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="893" to="901" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Terminology and translation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bowker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of Terminology</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Kockaert</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Steurs</surname></persName>
		</editor>
		<meeting><address><addrLine>Amsterdam, Philadelphia</addrLine></address></meeting>
		<imprint>
			<publisher>John Benjamins</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="304" to="323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Procedimiento para la traducción de términos poliléxicos con la ayuda de corpus</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cabezas-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>León-Araúz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Sistemas fraseológicos en contraste: Enfoques computacionales y de corpus</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Corpas Pastor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Bautista Zambrana</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Hidalgo Ternero</surname></persName>
		</editor>
		<meeting><address><addrLine>Comares, Granada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="203" to="230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Multiword Expressions and Machine Translation</title>
		<author>
			<persName><forename type="first">Arvi</forename><surname>Hurskainen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technical Reports in Language Technology</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1" to="18" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Report</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">When multiwords go bad in machine translation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Barreiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Orliac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Batista</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MT Summit Workshop Proceedings on Multi-word Units in Machine Translation and Translation Technology</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="26" to="33" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Multiword Expression Processing: A Survey</title>
		<author>
			<persName><forename type="first">Mathieu</forename><surname>Constant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gülşen</forename><surname>Eryiğit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Johanna</forename><surname>Monti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lonneke</forename><surname>Van Der Plas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carlos</forename><surname>Ramisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Rosner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amalia</forename><surname>Todirascu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="837" to="892" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation</title>
		<author>
			<persName><forename type="first">Sara</forename><surname>Ebrahim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Doaa</forename><surname>Hegazy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mostafa</forename><forename type="middle">Gadal-Haqq M</forename><surname>Mostafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samhaa</forename><forename type="middle">R</forename><surname>El-Beltagy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">117</biblScope>
			<biblScope unit="page" from="111" to="118" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multiword Expression aware Neural Machine Translation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zaninello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Birch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)</title>
				<meeting>the 12th Conference on Language Resources and Evaluation (LREC 2020)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3816" to="3825" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
