<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Tiresias: Bilingual Question Answering over DBpedia Abstracts through Machine Translation and BERT</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Michalis</forename><surname>Mountantonakis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">FORTH</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michalis</forename><surname>Bastakis</surname></persName>
							<email>mbastakis@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Loukas</forename><surname>Mertzanis</surname></persName>
							<email>mertzanis@ics.forth.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">FORTH</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yannis</forename><surname>Tzitzikas</surname></persName>
							<email>tzitzik@ics.forth.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">FORTH</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Tiresias: Bilingual Question Answering over DBpedia Abstracts through Machine Translation and BERT</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">DE55903D29E1D8FC0A91BE7E5754F19B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T09:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multilingual QA</term>
					<term>DBpedia Abstracts</term>
					<term>BERT</term>
					<term>Greek Language</term>
					<term>Machine Translation</term>
					<term>Deep Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>There is a strong need for multilingual solutions to Natural Language Processing tasks, such as Question Answering, since many people do not speak or read English. DBpedia abstracts, which are offered in several languages, can be exploited for this purpose: they contain valuable information that is never transformed to structured data (i.e., to triples), and for the same entity the abstracts in different languages can contain complementary information. To make it feasible to answer simple open domain factoid questions over DBpedia abstracts, we introduce Tiresias, a research prototype that supports bilingual Question Answering (QA) over DBpedia abstracts. In particular, it receives a question in either Greek or English, and produces the final answer by exploiting Named Entity Recognition models for recognizing the entity of the question, the DBpedia abstracts written in the mentioned languages for the identified entity, Machine Translation (MT) tools, and BERT QA models (pretrained on an English corpus). Concerning the evaluation, we provide experimental results about the effectiveness and efficiency of the MT and QA process using a Greek evaluation collection, and we present statistics of DBpedia abstracts and several use cases with real examples.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>It is of primary importance to provide multilingual solutions to Natural Language Processing (NLP) tasks, such as Question Answering (QA), given that many people worldwide do not speak or read English; only 17% of people worldwide speak English 1 . Even in popular tourist destinations, such as Greece, a high percentage of people cannot speak or read English (49% in Greece 2 ). For aiding such tasks, we believe that DBpedia abstracts <ref type="bibr" target="#b0">[1]</ref>, which are offered in several languages <ref type="bibr" target="#b1">[2]</ref>, can be exploited. Indeed, they contain valuable information that is not always transformed to structured data, whereas for the same entity the abstracts in different languages can contain complementary information. In particular, DBpedia abstracts are derived from Wikipedia, where each page is offered in one or more languages, and each language version can cover different information, especially for entities that are closely related to a specific country (and language). As an example, consider the DBpedia abstract for Xanthippe, the wife of Socrates: i) the text "She was likely much younger than Socrates, perhaps by as much as 40 years", which is part of both the English and the Greek abstract, has not been transformed to structured data (i.e., to RDF triples), whereas ii) the Greek abstract is considerably longer than the English one and contains complementary information (that is not offered in English).</p><p>For supporting bilingual factoid Question Answering (QA) over DBpedia abstracts for questions like "How many years was Xanthippe younger than Socrates?", we introduce Tiresias, a web application that follows a generic pipeline that can be adopted for any language, answering such questions over Greek and English by using as input the DBpedia abstracts in both languages. 
In particular, a) Tiresias receives a question in English or Greek, b) it recognizes the main entity in the question by using Named Entity Recognition (NER) tools, such as DBpedia Spotlight <ref type="bibr" target="#b2">[3]</ref>, c) it retrieves the abstract of the entity in both English and Greek through a SPARQL query, d) it translates any Greek input (question, context) into English through Machine Translation (MT) tools (e.g., Bing), e) it produces the final answer in English through BERT QA models (pretrained on an English corpus) by using the DBpedia abstracts as context, and f) it can translate the answer back into Greek. Moreover, it also supports QA given a specific context a priori, i.e., without using NER and DBpedia abstracts.</p><p>Regarding the Research Questions, we would like to investigate a) how effective and efficient the proposed pipeline is, and b) whether, by using the proposed approach, we can answer questions expressed in English from a Greek text (i.e., abstract), and vice versa. Concerning our contribution, Tiresias, which is available at https://demos.isl.ics.forth.gr/tiresias, is the first online system offering open domain QA in Greek by using MT and BERT models, and QA in English by exploiting texts in Greek. The web application offers a rich configuration for performing bilingual QA over DBpedia abstracts by combining MT and BERT models. Regarding the evaluation, we provide experimental results about the efficiency of the MT and QA process, statistics of DBpedia abstracts, and several use cases with real examples.</p><p>The rest of this paper is organized as follows: §2 discusses related work, §3 presents the process and web UI of Tiresias, §4 presents the experimental evaluation, and §5 concludes the paper.</p></div>
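Steps a)-f) above can be sketched as a small orchestration function. The following is a hypothetical Python sketch, not the actual Tiresias code: the component callables (ner, get_abstracts, translate, bert_qa) stand in for the real NER, SPARQL, MT and BERT QA services, and their names and signatures are our own assumptions.

```python
def answer_question(question, lang, ner, get_abstracts, translate, bert_qa):
    # (a) the question arrives in English ("en") or Greek ("el");
    # NER runs over English text, so a Greek question is translated first
    q_en = question if lang == "en" else translate(question, "el", "en")
    # (b) recognize the main (first) entity of the question
    entity = ner(q_en)
    # (c) retrieve the DBpedia abstracts of the entity, keyed by language
    abstracts = get_abstracts(entity)
    context_en = abstracts.get("en", "")
    # (d) translate the Greek abstract into English, if one exists
    if "el" in abstracts:
        context_en = (context_en + " " + translate(abstracts["el"], "el", "en")).strip()
    # (e) run the English-pretrained BERT QA model over the combined context
    answer_en, score = bert_qa(q_en, context_en)
    # (f) translate the answer back if the question was not in English
    answer = answer_en if lang == "en" else translate(answer_en, "en", lang)
    return answer, score
```

With dummy components plugged in, the same function answers an English question directly and routes a Greek question through the two extra translation steps.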
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Multilingualism in QA is a major challenge in the area of knowledge graphs and the semantic web <ref type="bibr" target="#b3">[4]</ref>, and there has recently been a trend in that direction. In particular, <ref type="bibr" target="#b4">[5]</ref> use machine translation tools for evaluating QA over knowledge graphs, for questions in 8 different languages that are unsupported by a KGQA system. Moreover, <ref type="bibr" target="#b5">[6]</ref> uses bilingual lexicon induction (BLI) methods and multilingual models for performing QA over DBpedia. As regards online QA systems, Platypus <ref type="bibr" target="#b6">[7]</ref> is a multilingual QA system that supports 3 languages by using Wikidata <ref type="bibr" target="#b7">[8]</ref>, whereas QAnswer <ref type="bibr" target="#b8">[9]</ref> provides multilingual QA by exploiting many knowledge bases. Finally, Wikidata is used by DeepPavlov <ref type="bibr" target="#b9">[10]</ref> for QA in two languages.</p><p>Concerning QA approaches over knowledge bases that also use the textual description of entities, <ref type="bibr" target="#b10">[11]</ref> proposed an approach that combines the textual description of entities and triples from DBpedia, for providing hybrid QA. Regarding QA systems in the Greek language, APANTISIS <ref type="bibr" target="#b11">[12]</ref> is a QA system that can be plugged into relational databases. Furthermore, there are multilingual evaluation collections over knowledge graphs, e.g., QALD-9 <ref type="bibr" target="#b12">[13]</ref>, whereas QALD-7 <ref type="bibr" target="#b13">[14]</ref> provides an evaluation collection for hybrid QA systems in English. Moreover, <ref type="bibr" target="#b14">[15]</ref> surveys approaches over QA and Linked Data, including hybrid QA tools that exploit textual information. 
Finally, concerning approaches offering multilingual QA over textual sources by using MT, <ref type="bibr" target="#b15">[16]</ref> transforms both the question and the context into English for offering QA in French and Japanese, whereas in <ref type="bibr" target="#b16">[17]</ref> a new metric was proposed for evaluating such methods.</p><p>Novelty of Tiresias. To the best of our knowledge, Tiresias is the first online system offering open domain QA in Greek (by exploiting contexts in both English and Greek), and the first application offering QA in English by exploiting the Greek version of DBpedia abstracts. Finally, it is the first approach combining MT and BERT models for QA in Greek.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The Steps of Tiresias</head><p>Here, we present the steps of the proposed approach (see the upper side of Fig. <ref type="figure" target="#fig_0">1</ref>), by also showing a running example (see the lower side of Fig. <ref type="figure" target="#fig_0">1</ref>).</p><p>Input. Tiresias receives one or more simple factoid questions in a language 𝐿 (i.e., Greek or English), and possibly a context text. When the context is not given, extra steps are needed for recognizing the question entity and retrieving the context, e.g., see Fig. <ref type="figure" target="#fig_0">1</ref>.</p><p>Configuration. The user selects the desired language, and which MT and BERT QA models to use. In particular, the language can be either Greek or English, whereas for Machine Translation two options are offered: Bing and Helsinki <ref type="bibr" target="#b17">[18]</ref>. Concerning BERT models for QA, 10 models are offered: 1) Deepset/RoBERTA, 2) DistilBERT-cased, 3) BERT-Mask-uncased, 4) Deepset/Mask-uncased, 5) DistilBERT-uncased, 6) rsvp-ai/Bertserini, 7) deepset/MiniLM, 8) BioBERT, 9) deepset/BERT-cased, and 10) BERT-Mask-cased.</p><p>Step A. Named Entity Recognition (NER). Given a question either in Greek or in English, we perform NER to identify the main entity of the question (the first recognized entity). We use a combination of the WAT <ref type="bibr" target="#b18">[19]</ref> and DBpedia Spotlight <ref type="bibr" target="#b2">[3]</ref> tools through LODsyndesisIE <ref type="bibr" target="#b19">[20]</ref>, which offers NER for the English language and returns the DBpedia link of the identified entity. An extra translation step is needed when the question is given in Greek; e.g., in Fig. <ref type="figure" target="#fig_0">1</ref> we translated the question and identified the entity "Xanthippe" and its DBpedia link.</p><p>Step B. Retrieval of Bilingual Context (DBpedia Abstract). 
Here, we send a SPARQL query to the DBpedia endpoint for retrieving the abstracts in both languages, i.e., English and Greek; the Greek abstract is then translated into English for the next step. By using only the English version there would be no need for translation; however, this is not always the ideal case. Indeed, there are 3 abstract categories: a) the abstract exists in English but not in 𝐿, e.g., for the basketball player Jayson Tatum no Greek abstract is available, b) the abstract exists in 𝐿 but not in English, i.e., many Wikipedia pages are not offered in English, e.g., see the Cretan village Ano Asites, and c) the abstract exists in both languages. In the last case, Tiresias uses both abstracts (e.g., see Fig. <ref type="figure" target="#fig_0">1</ref>), since they can contain complementary data, e.g., for Xanthippe, the Greek DBpedia abstract contains 568 words and the English one only 44 words. Thereby, a question written in English may be answerable only from a Greek context (and vice versa). More examples showing the impact of using both languages, and statistics, are given in Section 4. As we can see in Fig. <ref type="figure" target="#fig_0">1</ref>, both abstracts were retrieved for the main entity.</p><p>Step C. QA over the (translated) context through BERT QA models pretrained in English. Since we want the process to be generic, for covering more languages in the future, we decided to perform MT for translating the non-English (i.e., Greek) context into English, and then to use BERT QA models pretrained on an English corpus for performing QA over the translated input, since they are quite effective <ref type="bibr" target="#b20">[21]</ref>. Another option could be to use either i) a QA model pretrained in a language 𝐿 or ii) a multilingual BERT QA model. 
However, for i), such models are not available in many languages, including Greek, and for ii), they do not cover all languages and can be slower due to their huge training set, since they are usually trained on a very large corpus covering many languages (see also <ref type="bibr">Section 4)</ref>.</p><p>Therefore, as a preprocessing step, we use the desired MT model for translating any Greek input into English. Afterwards, we use the desired BERT QA model, which has been pretrained for the English language. In particular, the selected model receives as input the given question and the combination of the two DBpedia abstracts, and produces the answer with the highest confidence score in English, e.g., the answer "perhaps 40 years" in Fig. <ref type="figure" target="#fig_0">1</ref>.</p><p>Step D. Translate Back the Answer. The final step is to translate the answer back, if the initial question was not given in English, e.g., in Greek. In the end, the final answer, its confidence score, and extra information, such as the provenance of the answer, are returned to the user. E.g., see the final answer in Fig. <ref type="figure" target="#fig_0">1</ref>, which is presented in Greek.</p></div>
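Step C's use of the combined abstracts can be illustrated with a small helper. The sketch below is our own assumption about the shape of the call, written in the spirit of a question-answering model that returns the answer with the highest confidence score; the helper name and the dict-shaped result are not taken from the Tiresias code.

```python
def qa_over_abstracts(question_en, abstract_en, translated_greek_abstract, qa_model):
    # Step C: the model sees the English abstract followed by the translated
    # Greek one, so the answer span can come from either language version
    context = " ".join(part for part in (abstract_en, translated_greek_abstract) if part)
    # qa_model is assumed to return its highest-confidence answer,
    # e.g. a dict like {"answer": ..., "score": ...}
    return qa_model(question=question_en, context=context)
```

Note that when one of the two abstracts is missing (categories a and b above), the helper simply runs QA over the abstract that exists.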
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The Website &amp; Architecture of Tiresias Web Application</head><p>Tiresias is available at https://demos.isl.ics.forth.gr/tiresias, and it is responsive across several devices. It runs on a server with 4GB main memory, 2 cores and 500 GB disk space.</p><p>Examples and Screenshots. Some screenshots are shown in Figure <ref type="figure">2</ref>, i.e., see the configuration, the question "Which is the contract of G. Antetokounmpo", and the corresponding answer, "$228 million". The real example in the upper and the lower side is the same, but for the two supported languages, i.e., English and Greek. In both cases, we used the Helsinki model for MT and the RoBERTA model for QA, and we retrieved the correct answer. By clicking on the info button, we can retrieve more information about the answer: its provenance, which was the Greek DBpedia abstract in this example, its confidence score, a link to the corresponding DBpedia abstract, and its response time. As we can see, it was faster for the English language.</p><p>Additional Functionality. Tiresias can also receive as input the context, e.g., in Greek or English. In such a case, steps A and B are omitted, i.e., there is no need to retrieve a context from DBpedia. Moreover, it offers an error handling mechanism showing several messages that inform the users about any possible error, and buttons for providing a fast configuration. Finally, several running examples are offered for both the Greek and the English language.</p><p>Architecture. The architecture is depicted in Fig. <ref type="figure">3</ref>, e.g., for the case of having a Greek question as input. 
As we can see, after receiving the question through an AJAX request, Tiresias sends several requests to external APIs: to Bing (in case we do not use Helsinki), to LODsyndesisIE for retrieving the main entity, to the DBpedia SPARQL endpoint for retrieving the abstracts, and again to Bing for translating the Greek context and the answers. On the contrary, the Helsinki model and the BERT QA models have been downloaded locally, therefore we do not send external requests for these models. After completing all the steps, the final response is sent back to the frontend and presented to the user.</p><p>Effectiveness. Fig. <ref type="figure">4</ref> shows the top-10 models according to the percentage of their correct answers. As we can see, the best combination uses the Bing translator, i.e., the combination "Bing/BioBERT" answered 63% of the questions correctly, 15.5% partially correctly and 21.5% wrongly. For this dataset, it outperformed the multilingual BERT QA model (which has also been pretrained in Greek), although for the latter we did not perform any translation. Moreover, some other models using the Bing translator, such as "BERT-Mask-cased" and "Deepset/Mask-uncased", were quite effective. Generally, we obtained better results by using Bing; however, the Helsinki translator was quite effective in many cases.</p><p>Efficiency. Here, we present results about the efficiency of performing MT and QA with BERT, using the GreekTexts collection. Fig. <ref type="figure">5</ref> shows the efficiency of each BERT QA model for both the Helsinki and the Bing MT models, starting from the fastest one. Using Bing we needed from 8 to 13 seconds, whereas by using the Helsinki translator (for which we do not send an external request) we achieved lower execution times for each BERT model. In comparison to the multilingual "XLM RoBERTa" model, we achieved lower execution times for each combination of MT/BERT QA English models, mainly since the multilingual model has been pretrained on a huge corpus (with many languages). 
Indeed, even in the worst case, the combination of MT/BERT QA models was 2.7 seconds faster. Concerning the execution time of the translation steps (which are included in the total execution times of Fig. <ref type="figure">5</ref>), we needed on average 1.6 seconds for the translation of the context, question, and answer by using Bing, and only 0.5 seconds by using Helsinki. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Statistics for DBpedia abstracts for 7,500 entities having both an English and a Greek abstract</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Evaluation of the Other Components of Tiresias and More Results.</head><p>Concerning the effectiveness of the other components, experimental results about LODsyndesisIE are presented in <ref type="bibr" target="#b19">[20]</ref>, whereas on the webpage of each BERT QA model one can find more results about their effectiveness through the F1-score and Exact Match metrics. As regards their efficiency, the NER process of LODsyndesisIE and the retrieval of DBpedia abstracts are quite fast, i.e., usually less than 1 second is needed for retrieving this information. Finally, more experimental results and the evaluation texts are available at https://github.com/mbastakis/Tiresias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Statistics of DBpedia Abstracts in Greek and English</head><p>Since the context is derived from DBpedia abstracts, we present some relevant statistics. First, there are 61,159 DBpedia abstracts offered in both Greek and English. Table <ref type="table">1</ref> shows some statistics for a random sample of 7,500 entities having abstracts in both languages. On average, for these entities the English abstracts contain more words than the Greek ones, i.e., 121.8 versus 115.4. However, the Greek abstracts contain more words in 44% of the cases, and the English abstracts in 56% of the cases. By combining both abstracts, the average number of words is 237.2, i.e., smaller than the average number of words in the texts of the GreekTexts collection, whereas only a small percentage of entities have a long abstract (over 200 words), e.g., even in the combined case, less than half of the entities. Thereby, we expect that for entities having abstracts with fewer words than the used collection, we can achieve lower execution times (i.e., than those of Fig. <ref type="figure">5</ref>).</p></div>
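The kind of word-count statistics reported in Table 1 can be recomputed with a few lines of Python. The sketch below is our own illustration (the paper does not publish its statistics code), and the abstract pairs used in the example are toy strings, not the real 7,500-entity sample.

```python
def abstract_stats(pairs):
    # pairs: list of (english_abstract, greek_abstract) strings
    en_lens = [len(en.split()) for en, el in pairs]
    el_lens = [len(el.split()) for en, el in pairs]
    # how often the Greek abstract is the longer of the two
    greek_longer = sum(1 for e, g in zip(en_lens, el_lens) if g > e)
    n = len(pairs)
    return {
        "avg_en": sum(en_lens) / n,                 # avg words, English abstracts
        "avg_el": sum(el_lens) / n,                 # avg words, Greek abstracts
        "avg_combined": sum(e + g for e, g in zip(en_lens, el_lens)) / n,
        "pct_greek_longer": 100.0 * greek_longer / n,
    }
```

Running it over the real sample would reproduce the averages (121.8 vs. 115.4 words, 237.2 combined) and the 44%/56% split reported above.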
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Use Cases of Tiresias based on Real Examples</head><p>Here, we provide some use cases of Tiresias, with real examples that are presented in Table <ref type="table" target="#tab_2">2</ref>.</p><p>• UC1. Bilingual QA over DBpedia Entities. The proposed pipeline can be exploited for multilingual QA over any DBpedia entity, by automatically retrieving its DBpedia abstracts as a context. In its current status, Tiresias can be used for answering plain open domain factoid questions in both English and Greek, by exploiting the corresponding language versions of DBpedia abstracts. Therefore, even questions written in English whose answer is included only inside a Greek abstract (and vice versa) can be answered through Tiresias. Such real examples, where the answer is included only inside either the Greek or the English abstract, are presented in the examples with IDs 1-2 in Table <ref type="table" target="#tab_2">2</ref>. Moreover, the third example includes a question that can be answered from both the English and the Greek abstract.</p><p>• UC2. Multilingual Hybrid QA systems. The process of Tiresias can be used by multilingual hybrid QA systems that also use templates of SPARQL queries, i.e., by combining both structured data and DBpedia abstracts. Indeed, the answers of examples 1-3 exist only inside the abstracts and not in separate triples. On the contrary, the triples of an entity usually cover complementary information, e.g., information about "the mayor" and "people that were born in" Santorini island exists in separate triples but not in the English and Greek DBpedia abstracts. Therefore, it would be interesting to combine all this information for hybrid QA.</p><p>• UC3. Comparison of MT and QA models. First, the process of Tiresias can even be used as a baseline for multilingual QA, e.g., for comparing its performance against single-language or multilingual pretrained BERT models. 
Moreover, through Tiresias, one can compare the performance of different combinations of MT and QA models, e.g., for evaluating the QA process over Greek or English texts. For instance, in example 4 of Table <ref type="table" target="#tab_2">2</ref>, we can see a real case where two models provided different answers for the same question, a wrong one (Toronto) and the correct one (Montreal). Moreover, in example 5, the two translation models provided a different translation for the entity of the answer, i.e., "Angelos Charisteas". In particular, Bing provided the correct answer, whereas the Helsinki model failed to correctly translate the final answer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We presented Tiresias, a research prototype that combines popular NER, MT and BERT QA models (pretrained in English) for offering bilingual QA in Greek and English over DBpedia abstracts. We described all its steps and its architecture, we provided measurements about its effectiveness and efficiency for a Greek evaluation collection with 20 texts and 200 questions, and we presented use cases based on real examples. The human-labelled results showed that the combination of MT/BERT QA models can be exploited for QA in Greek, e.g., the best combination predicted 63% of the answers correctly.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The process of Tiresias and an example with a simple factoid question in Greek language</figDesc><graphic coords="3,91.76,84.19,411.76,136.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Screenshots-A real example of Tiresias in Greek and English language</figDesc><graphic coords="5,149.75,297.40,295.79,171.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :Figure 5 :</head><label>45</label><figDesc>Figure 4: The top-10 models with the highest percentage of correct answers (Human Labelling)</figDesc><graphic coords="7,151.14,243.75,293.00,120.87" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Real examples from Tiresias for the Use Cases, showing the impact of DBpedia Abstracts</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The backend communicates with the frontend for sending the final response (through an AJAX request) and for presenting the final answer to the user.</p><p>Github link for Code and Tutorial Video. The code (mainly written in Python and JavaScript) and guidelines for running Tiresias are available at https://github.com/mbastakis/Tiresias, and a tutorial video can be accessed through https://youtu.be/eNiD7Dvco6M.</p><p>Limitations and Possible Extensions. Concerning its limitations, since Tiresias sends several requests to external APIs, it can be affected when they do not work properly, e.g., when they are out-of-service. Moreover, some external APIs, such as Bing, offer a limited number of requests per day. Concerning possible extensions, since the selected MT tools and the DBpedia abstracts support hundreds of languages, more languages can be added. Moreover, Tiresias has been designed to enable the easy addition of new NER, MT and BERT QA models. As future work, we plan to a) evaluate Tiresias with specialized multilingual collections (e.g., by also constructing a new multilingual collection based on DBpedia abstracts), b) offer a REST API, c) extend Tiresias by supporting more languages and complex questions with many entities, d) provide a hybrid QA model, and e) use textual resources from other KBs, e.g., LODsyndesis <ref type="bibr" target="#b22">[23]</ref> and Wikidata.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>CEUR Workshop Proceedings (CEUR-WS.org) 1 https://www.babbel.com/en/magazine/how-many-people-speak-english-and-where-is-it-spoken 2 https://doublespeakdojo.com/how-common-is-spoken-english-in-greece/</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Evaluation &amp; Use Cases</head><p>In this section we provide an experimental evaluation of several components of Tiresias, mainly focusing on the effectiveness/efficiency of the combination of Machine Translation and BERT QA models (pretrained in English), whereas we also provide some statistics for DBpedia abstracts and several use cases indicating the impact of using DBpedia abstracts.</p><p>Evaluation Collection for QA over Greek Language. We have created an evaluation collection, called GreekTexts. We manually collected 20 texts for QA, derived from a Greek text bank (https://www.greek-language.gr/certification/dbs/teachers/index.html), and for each of the 20 texts we wrote 10 questions and their corresponding golden answers, i.e., in total 200 questions and answers. The texts cover a variety of subjects; each text contains on average 398.5 words, each question 7.9 words and each answer 5.6 words. An alternative choice could be to use the XQuAD collection <ref type="bibr" target="#b21">[22]</ref>, which is a translated version of SQuAD that covers the Greek language; however, the translated Greek version of its texts was not very accurate. Nevertheless, we plan to also perform an evaluation with this collection in the future.</p><p>Evaluation Metrics and Problems. The translation can result in several problematic cases, e.g., words/phrases can be replaced by synonyms, and tenses/suffixes can be changed, especially for complex languages like Greek, which has a very extensive inflection. This means that the same word can be represented with many different suffixes (a very common case for the Greek language), i.e., for denoting tenses, genders, singular and plural, and others. Therefore, in such cases it can be infeasible to evaluate the results even through the F1-score. For this reason, to provide accurate results, we decided to manually annotate the predicted answers, i.e., in total 4,200 answers. 
We divided them in 3 categories by checking the golden answer: a) Correct, if the predicted answer has exactly the same meaning with the golden one, b) Partially Correct, if the predicted answer covers either a part or the whole golden answer, however it contains additional information that are irrelevant, and c) Wrong, if both the string representation and the meaning of the predicted answer is totally irrelevant compared to the golden answer.</p></div>
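The three-way annotation scheme described above can be aggregated in a straightforward way. The following minimal sketch (with hypothetical data and function names, not part of Tiresias) shows how per-category counts and percentages could be computed over a list of manual labels:

```python
from collections import Counter

# The three manual labels defined in the annotation scheme above.
CATEGORIES = ("Correct", "Partially Correct", "Wrong")

def summarize(labels):
    """Return, for each category, its count and its percentage of all labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (counts[cat], 100.0 * counts[cat] / total) for cat in CATEGORIES}

# Hypothetical annotations for four predicted answers.
labels = ["Correct", "Wrong", "Correct", "Partially Correct"]
for cat, (n, pct) in summarize(labels).items():
    print(f"{cat}: {n} ({pct:.1f}%)")
```

Reporting the percentage of Correct and Partially Correct answers per MT/QA model combination then gives a direct basis for comparing the 20 configurations.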
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Results over the GreekTexts Collection</head><p>Concerning the experimental results, we have used 20 combinations of MT and BERT QA models, i.e., for each of the 10 BERT QA models of Section 3 we use both the Bing and Helsinki</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
	<title level="a" type="main">DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Brümmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dojchinovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<title level="m">DBpedia abstracts: a large-scale, open, multilingual NLP training corpus</title>
				<imprint>
			<publisher>LREC</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="3339" to="3343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">DBpedia spotlight: shedding light on the web of documents</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>García-Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of conference on semantic systems</title>
				<meeting>conference on semantic systems</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Survey on challenges of question answering in the semantic web</title>
		<author>
			<persName><forename type="first">K</forename><surname>Höffner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Walter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Marx</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Usbeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C. Ngonga</forename><surname>Ngomo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="895" to="920" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Can machine translation be a reasonable alternative for multilingual question answering systems over knowledge graphs?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Perevalov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM WebConf</title>
				<meeting>the ACM WebConf</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="977" to="986" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improving zero-shot cross-lingual transfer for multilingual question answering over knowledge graph</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NAACL</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="5822" to="5834" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Tanon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>De Assunçao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Caron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Suchanek</surname></persName>
		</author>
		<title level="m">Platypus - A Multilingual Question Answering Platform for Wikidata</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
		<respStmt>
			<orgName>LIP-ENS Lyon</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Wikidata: a free collaborative knowledge base</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">QAnswer: A question answering prototype bridging the gap between a considerable part of the lod cloud and end-users</title>
		<author>
			<persName><forename type="first">D</forename><surname>Diefenbach</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The World Wide Web Conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3507" to="3510" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Deeppavlov: Open-source library for dialogue systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Burtsev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2018, System Demonstrations</title>
				<meeting>ACL 2018, System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="122" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Open domain question answering over knowledge graphs using keyword search, answer type prediction, SPARQL and pre-trained neural models</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nikas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="235" to="251" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">APANTISIS: A greek question-answering system for knowledge-base exploration</title>
		<author>
			<persName><forename type="first">E</forename><surname>Marakakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kondylakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Aris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Strategic Innovative Marketing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="501" to="510" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Qald-9-plus: A multilingual dataset for question answering over DBpedia and Wikidata translated by native speakers</title>
		<author>
			<persName><forename type="first">A</forename><surname>Perevalov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 16th International Conference on Semantic Computing (ICSC)</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="229" to="234" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Usbeck</surname></persName>
		</author>
		<title level="m">Semantic web evaluation challenge</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="59" to="69" />
		</imprint>
	</monogr>
	<note>Qald-7</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A survey on question answering systems over linked data and documents</title>
		<author>
			<persName><forename type="first">E</forename><surname>Dimitrakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sgontzos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of intelligent information systems</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="233" to="259" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Eriguchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hashimoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tsuruoka</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.03275</idno>
		<title level="m">Multilingual extractive reading comprehension by runtime machine translation</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Just ask! evaluating machine translation by asking and answering questions</title>
		<author>
			<persName><forename type="first">M</forename><surname>Krubiński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ghadery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pecina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Conference on Machine Translation</title>
				<meeting>the Sixth Conference on Machine Translation</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="495" to="506" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">OPUS-MT - Building open translation services for the world</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thottingal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd Annual Conference of the European Association for Machine Translation</title>
				<meeting>the 22nd Annual Conference of the European Association for Machine Translation</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">From TagME to WAT: a new entity annotator</title>
		<author>
			<persName><forename type="first">F</forename><surname>Piccinno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ferragina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the first international workshop on Entity recognition &amp; disambiguation</title>
				<meeting>the first international workshop on Entity recognition &amp; disambiguation</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="55" to="62" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Linking entities from text to hundreds of RDF datasets for enabling large scale entity enrichment</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mountantonakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A survey on machine reading comprehension-tasks, evaluation metrics and benchmark datasets</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zeng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">7640</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">On the cross-lingual transferability of monolingual representations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Artetxe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yogatama</surname></persName>
		</author>
		<idno>CoRR abs/1910.11856</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>arXiv:1910.11856</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">LODsyndesis: global scale knowledge services</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mountantonakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Heritage</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">23</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
