<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Two-stage Semantic Answer Type Prediction for Question Answering using BERT and Class-Specificity Rewarding</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christos</forename><surname>Nikas</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Information Systems Laboratory</orgName>
								<orgName type="institution">FORTH-ICS</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pavlos</forename><surname>Fafalios</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Information Systems Laboratory</orgName>
								<orgName type="institution">FORTH-ICS</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yannis</forename><surname>Tzitzikas</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Information Systems Laboratory</orgName>
								<orgName type="institution">FORTH-ICS</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<settlement>Heraklion</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Two-stage Semantic Answer Type Prediction for Question Answering using BERT and Class-Specificity Rewarding</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">6006FCDB8B013EC3E431A30EF47D58E7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Answer type prediction is a key task in Question Answering (QA) that aims at predicting the type of the expected answer for a user query expressed in natural language. In this paper we focus on semantic answer type prediction where the candidate types come from a class hierarchy of a general-purpose ontology. We model the problem as a two-stage pipeline of sequence classification tasks (answer category prediction, answer literal/resource type prediction), each one making use of a fine-tuned BERT classifier. To cope with the harder problem of answer resource type prediction, we enrich the BERT classifier with a rewarding mechanism that favors the more specific ontology classes that are low in the class hierarchy. The results of an experimental evaluation using the DBpedia class hierarchy (∼760 classes) demonstrate a superior performance of answer category prediction (∼96% accuracy) and literal type prediction (∼99% accuracy), and a satisfactory performance of resource type prediction (∼78% lenient NDCG@5).</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Question Answering (QA) is a task in the field of Natural Language Processing and Information Retrieval that aims at automatically answering a question posed by a human in a natural language <ref type="bibr" target="#b3">[4]</ref>. An important sub-task of QA is the prediction of the type of the expected answer based only on the user question. Most existing approaches to this task consider a set of coarse-grained question types, usually fewer than 50. However, this is quite restrictive for the general case of cross-domain QA where the number of types is very large.</p><p>In this paper, we focus on a two-stage answer type prediction task where a first step aims at finding the general category of the answer (resource, literal, boolean), while a second step tries to predict the particular literal answer type (number, date, or string, if the predicted category of the first step is literal), or the particular resource class (if the predicted category of the first step is resource). We consider the case where the resource classes belong to a rich class hierarchy of an ontology containing a large number of classes (e.g., &gt;500), and model the problem as a set of sequence classification tasks, each one making use of a fine-tuned BERT model. For the more fine-grained (and thus more challenging) task of resource class prediction, we propose to enrich the BERT classifier with a rewarding mechanism that favors the more specific ontology classes that are low in the class hierarchy. Fig. <ref type="figure" target="#fig_0">1</ref> depicts this two-stage answer prediction task, the classifiers we use in each different sub-task, and the accuracy of the obtained results. 
The evaluation results using the DBpedia class hierarchy (∼760 classes) and a ground truth of 40,393 train questions for category prediction, 17,571 for resource/literal type prediction, and 4,393 test questions demonstrate the high performance of our approach. Specifically, we achieve 96.2% accuracy on answer category prediction, 99.2% accuracy on literal type prediction, and 77.7% NDCG@5 on resource type ranking.</p><p>The rest of the paper is organized as follows: §2 describes the context, §3 describes our approach, §4 reports the results of the evaluation, and finally, §5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Context and Datasets</head><p>The context of this work is the SMART (SeMantic AnsweR Type) challenge of ISWC 2020 (https://iswc2020.semanticweb.org/program/semantic-web-challenges/) <ref type="bibr" target="#b7">[8]</ref>. Given a question in natural language, the challenge is to predict the type of the answer using a set of candidates. The problem is modeled as a two-stage classification task: in the first step the task is to predict the general category of the answer (resource, literal, or boolean), while in the second step the task is to predict the particular answer type (number, date, string, or a particular resource class from a target ontology).</p><p>Two datasets are provided for this task, one using the DBpedia ontology and the other using the Wikidata ontology. Both follow the structure below: each question has (a) a question id, (b) the question text in natural language, (c) an answer category (resource/literal/boolean), and (d) an answer type. If the category is resource, answer types are ontology classes from either the DBpedia ontology (∼760 classes) or the Wikidata ontology (∼50K classes). If the category is literal, answer types are either number, date, or string. Finally, if the category is boolean, the answer type is always boolean.</p><p>An excerpt from this dataset is shown below:</p><p>[ { "id": "dbpedia_14427", "question": "What is the name of the opera based on Twelfth Night?", "category": "resource", "type": ["dbo:Opera", "dbo:MusicalWork", "dbo:Work"] }, { "id": "dbpedia_23480", "question": "Do Prince Harry and Prince William have the same parents?", "category": "boolean", "type": ["boolean"] } ]</p><p>With respect to the size of the datasets, the DBpedia dataset contains 21,964 questions (train: 17,571, test: 4,393) and the Wikidata dataset contains 22,822 questions (train: 18,251, test: 4,571). 
The DBpedia training set consists of 9,584 resource, 2,799 boolean, and 5,188 literal questions. The Wikidata training set consists of 11,683 resource, 2,139 boolean, and 4,429 literal questions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Approach</head><p>Here we describe our approach for answer type prediction: in §3.1 we provide some background, in §3.2 we describe question category prediction, in §3.3 we describe literal answer type prediction, and in §3.4 we describe resource answer type prediction. The models and code are publicly available at: https://github.com/cnikas/isl-smart-task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">BERT for Sequence Classification</head><p>BERT <ref type="bibr" target="#b2">[3]</ref>, or Bidirectional Encoder Representations from Transformers, is a language representation model based on the Transformer model architecture of <ref type="bibr" target="#b10">[11]</ref>. A pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. Because of BERT's massive success and popularity, several methods have been presented to improve BERT on its prediction metrics, by using more data and computational speed <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b11">12]</ref>, or by creating lighter and faster models that compromise on prediction metrics <ref type="bibr" target="#b9">[10]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Question Category Prediction</head><p>A question can belong to one of the following three categories: (1) boolean, (2) literal, (3) resource. Boolean questions (also referred to as Confirmation questions) only have 'yes' or 'no' as an answer (e.g. "Does the Owyhee river flow into Oregon?"). Thus, there is no further classification for this category of questions. Resource questions have a specific fact as an answer (e.g. "What is the highest mountain in Italy?") that can be described by a class in an ontology (e.g. http://dbpedia.org/ontology/Mountain). Literal questions have a literal value as an answer, which can be a number, string, or date (e.g. "Which is the cruise speed of the airbus A340?").</p><p>To detect question categories, we fine-tune a BERT model using the Huggingface PyTorch implementation<ref type="foot" target="#foot_0">2</ref>. We choose this model because we approach answer type prediction as a classification problem where each question is a sequence of words. To fine-tune BERT we used the training datasets provided for the SMART challenge (described in §2). Specifically, we used questions from both the DBpedia and the Wikidata dataset. Because the data is imbalanced across categories (13.7% boolean, 26.6% literal, 59.4% resource), we randomly sampled questions for each class so that all classes had the same number of samples.</p><p>As we will see below, this model achieves 96.2% accuracy on our test set in this prediction task.</p></div>
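<div xmlns="http://www.tei-c.org/ns/1.0"><p>The class balancing step described above can be illustrated with a minimal sketch. It assumes that every class is randomly cut down to the size of the rarest class, which appears consistent with the 14,814 questions reported for the category classifier in §4.4 (three classes of 4,938 questions each) but is not stated explicitly in the text. The function name, the label_key parameter, and the fixed seed are hypothetical and are not taken from the released code.</p>
<p>
```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, label_key="category", seed=0):
    """Randomly keep the same number of examples for every class.

    Assumption of this sketch: each class is reduced to the size of the
    rarest class. `examples` is a list of dicts shaped like the dataset
    excerpt in Section 2.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    # Size of the rarest class; every class is sampled down to this size.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced
```
</p>
<p>The same helper could be reused for the literal type classifier by passing a different label_key, provided the examples carry a single literal type label under that key.</p></div>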
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Literal Answer Type Prediction</head><p>The answer type for questions that belong in the literal category can be: 1) a date, i.e. a literal value that describes a date, 2) a number, i.e. a numeric value, or 3) a string, i.e. a text value. Due to the small number of classes (3), it is very effective to train a language model. We again use a fine-tuned BERT model to classify literal questions into one of the 3 types. Similar to question category prediction, we used questions from both the DBpedia and the Wikidata dataset and also randomly sampled questions for each class to cope with class imbalance (29.1% date, 27.3% number, 43.6% string). As we will see, the model achieves 99.2% accuracy for literal questions in our test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Resource Answer Type Prediction</head><p>The prediction of the answer type of questions in the resource category is a more fine-grained (and thus more challenging) classification problem, because of the large number of types a question can be classified to (∼760 classes on DBpedia and ∼50K classes on Wikidata). Therefore, it is not effective to train a classifier on all the ontology classes, especially for open-domain tasks.</p><p>To reduce the number of possible types for classification, we selected a subset (C) of all ontology classes, based on the number of samples of each class in the training set. This subset C contains classes that have at least k occurrences in the training set. We set k = 10 as this number provides a good trade-off between number of classes and performance.<ref type="foot" target="#foot_1">3</ref> The choice of this parameter is described more extensively in §4.2. The final number of classes in C is 88. Because we chose to train the system on a subset of all the classes, our classifier cannot handle questions with labels that are not included in this subset. To tackle this problem, we replace their labels with the labels of superclasses that belong in C. Then we fine-tuned a BERT model on them.</p><p>Since most questions in the dataset have several answer types ordered by specificity, according to the semantic hierarchy formed in the ontology, in the fine-tuning stage we use these questions multiple times, once with each of the provided types as the label. The goal is to find an answer type that is as specific as possible for the question. However, the model may classify a question to a more general answer type in the ontology. To tackle this problem, we 'reward' (inspired by <ref type="bibr" target="#b1">[2]</ref>) the predictions of the classes that lie below the top class. 
The reward of a class c is measured by the depth of the class in the hierarchy, specifically, <formula notation="TeX">\mathrm{reward}(c) = \mathrm{depth}(c) / \mathrm{depth}_{\max}</formula>, where depth(c) is the depth of c in its hierarchy, while <formula notation="TeX">\mathrm{depth}_{\max}</formula> is the maximum depth of the ontology (6 for DBpedia). This means that, after applying normalization and adding the rewards on the output of the model, the top class can be a sub-class that was originally ranked below a more general class. For example, for the question "What is the television show whose company is Playtone and written by Erik Jendresen?" the top 5 classes that the classifier predicts are: 1) Work, 2) TelevisionShow, 3) Film, 4) MusicalWork, 5) WrittenWork. Then rewards are applied to classes that are a subclass of Work. After applying the rewards, the top 5 classes are: 1) TelevisionShow, 2) Work, 3) Film, 4) Book, 5) MusicalWork. We can see that TelevisionShow is now the top prediction, which is both correct and more specific than the previous top prediction (Work).</p></div>
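<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rewarding step can be sketched as follows. This is an illustration under stated assumptions rather than the released implementation: the normalization is assumed to be a softmax over the classifier scores, the reward is added only to classes that are descendants of the top-ranked class, and the scores, parent links, and depth values in the example are hypothetical.</p>
<p>
```python
import math

DEPTH_MAX = 6  # maximum depth of the DBpedia class hierarchy, as stated in the text


def softmax(scores):
    """Normalize raw classifier scores into probabilities (assumed normalization)."""
    shift = max(scores.values())
    exps = {cls: math.exp(s - shift) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def is_descendant(cls, ancestor, parent):
    """True if `ancestor` is reached from `cls` by following parent links."""
    node = parent.get(cls)
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def rerank_with_rewards(scores, parent, depth, depth_max=DEPTH_MAX):
    """Add reward(c) = depth(c) / depth_max to the classes below the top class."""
    probs = softmax(scores)
    top = max(probs, key=probs.get)
    adjusted = {}
    for cls, prob in probs.items():
        reward = depth[cls] / depth_max if is_descendant(cls, top, parent) else 0.0
        adjusted[cls] = prob + reward
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Hypothetical scores and hierarchy excerpt, for illustration only.
scores = {"Work": 4.0, "TelevisionShow": 3.8, "Film": 1.5,
          "MusicalWork": 1.0, "WrittenWork": 0.9}
parent = {"Work": None, "TelevisionShow": "Work", "Film": "Work",
          "MusicalWork": "Work", "WrittenWork": "Work"}
depth = {"Work": 1, "TelevisionShow": 2, "Film": 2,
         "MusicalWork": 2, "WrittenWork": 2}

ranking = rerank_with_rewards(scores, parent, depth)
```
</p>
<p>With these hypothetical inputs, TelevisionShow moves above Work after the rewards are added, mirroring the behaviour described for the Playtone question. The depth convention (which class counts as depth 0) is also an assumption of the sketch.</p></div>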
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Evaluation Metrics</head><p>We report results for the following metrics:</p><list><item>Accuracy, for category prediction (the percentage of questions classified in the correct category).</item><item>Precision, for type prediction (the percentage of the questions for which the top type found by the system was one of the types provided in the test dataset, without considering type specificity).</item><item>Lenient NDCG@k (with a linear decay) <ref type="bibr" target="#b0">[1]</ref>, for resource type prediction.</item></list><p>Lenient NDCG@k, which has been introduced in <ref type="bibr" target="#b0">[1]</ref>, measures the distance <formula notation="TeX">d(t, t_q)</formula> between the predicted type t and the most specific type of the answer <formula notation="TeX">t_q</formula>. Then it converts this distance into a gain measure, with a linear decay function. The gain is calculated as <formula notation="TeX">G(t) = 1 - d(t, t_q)/6</formula>, where 6 is the maximum depth of the hierarchy. For example, for the question "Which company founded by Fusajiro Yamauchi gives service as Nintendo Network?", the top 5 classes found as the answer type by our system are: 'dbo:Company', 'dbo:Organisation', 'dbo:University', 'dbo:Agent', 'dbo:RecordLabel' (in this order). The true types specified in the dataset are: 'dbo:Company', 'dbo:Organisation', 'dbo:Agent'. The most specific of these 3 classes is 'dbo:Company', so we calculate the gain for each type found by our system using the distance from the class 'dbo:Company'. Then we compute DCG as <formula notation="TeX">\mathrm{DCG}_p = \mathrm{gain}_1 + \sum_{i=2}^{p} \frac{\mathrm{gain}_i}{\log_2 i}</formula>. We also compute the ideal DCG (iDCG) using the gains of the correct types provided in the dataset, and the normalized DCG (nDCG) as <formula notation="TeX">\mathrm{nDCG} = \mathrm{DCG} / \mathrm{iDCG}</formula>. Finally, we compute and report the average nDCG over all questions in the test dataset.</p></div>
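<div xmlns="http://www.tei-c.org/ns/1.0"><p>The computation of lenient NDCG@k for a single question can be sketched as below. The sketch follows the gain and DCG definitions given above, with one added assumption: a predicted type that does not lie on the path from the most specific gold type to the root receives zero gain. The parent links form a simplified, hypothetical excerpt of the hierarchy, and the function names are not those of the official evaluation script.</p>
<p>
```python
import math

MAX_DEPTH = 6  # maximum depth of the hierarchy, as stated in the text


def ancestors_with_distance(cls, parent):
    """Map cls and each of its ancestors to its number of edges from cls."""
    distances, node, steps = {}, cls, 0
    while node is not None:
        distances[node] = steps
        node = parent.get(node)
        steps += 1
    return distances


def gain(predicted, most_specific, parent):
    """Linear decay G(t) = 1 - d(t, t_q) / MAX_DEPTH.

    Assumption of this sketch: types off the path between the most
    specific gold type and the root get zero gain.
    """
    on_path = ancestors_with_distance(most_specific, parent)
    if predicted not in on_path:
        return 0.0
    return max(0.0, 1.0 - on_path[predicted] / MAX_DEPTH)


def dcg(gains):
    """DCG_p = gain_1 + sum over i = 2..p of gain_i / log2(i)."""
    return sum(g if i == 1 else g / math.log2(i)
               for i, g in enumerate(gains, start=1))


def lenient_ndcg_at_k(predicted, gold_types, most_specific, parent, k=5):
    """nDCG@k for one question; the ideal ranking uses the gold types."""
    gains = [gain(t, most_specific, parent) for t in predicted[:k]]
    ideal = sorted((gain(t, most_specific, parent) for t in gold_types),
                   reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Simplified, hypothetical excerpt of the hierarchy for the Nintendo example.
parent = {"dbo:Agent": None, "dbo:Organisation": "dbo:Agent",
          "dbo:Company": "dbo:Organisation",
          "dbo:University": "dbo:Organisation",
          "dbo:RecordLabel": "dbo:Company"}
predicted = ["dbo:Company", "dbo:Organisation", "dbo:University",
             "dbo:Agent", "dbo:RecordLabel"]
gold = ["dbo:Company", "dbo:Organisation", "dbo:Agent"]

score = lenient_ndcg_at_k(predicted, gold, "dbo:Company", parent, k=5)
```
</p>
<p>The reported figure would then be the mean of this per-question score over the test set. How types outside the gold path are scored is the part of the sketch most likely to differ from the official script, so that choice should be checked against it before comparing numbers.</p></div>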
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results on a split of the DBpedia training set</head><p>Initially, we had no access to the final test dataset of the SMART challenge, so we used 90% of the DBpedia training set<ref type="foot" target="#foot_3">4</ref> as our training dataset and the remaining 10% as our test dataset. For category prediction and literal type prediction we also use the questions from the training dataset for Wikidata for training the classifiers. Our approach achieved the results shown in Table <ref type="table" target="#tab_0">1</ref>. We notice a superior performance of category prediction (96.4% accuracy) and a very high performance of type prediction (83% precision and 79% lenient NDCG@5).</p><p>Running the same experiments without the rewarding mechanism, we notice a drop of around 2% in the performance (Lenient NDCG) of literal/resource type prediction.</p><p>Tuning of the k parameter. To find the optimal value for the parameter k, which is the minimum sample size required to include a class in the subset of classes included in the classifier, we evaluated our system using 4 different values: 5, 10, 30 and 50. Table <ref type="table" target="#tab_1">2</ref> shows the number of classes included in the classifier for each different value of k and the corresponding performance. We notice that the best results are obtained using k=10, while the results for all other cases are slightly worse.</p><p>Error analysis. To better understand the classification performance of category prediction, literal type prediction, and resource type prediction, we inspected their confusion matrices. The results are shown in Table <ref type="table" target="#tab_2">3</ref>. As regards category prediction, we see that our system classifies in the correct category 99% of the boolean questions, 92% of the literal questions, and 98% of the resource questions. 
For literal type classification, our system classifies in the correct type 98.4% of date questions, 99.5% of number questions, and 99.5% of string questions. We notice that, for category prediction, most errors occur between the classes literal and resource. For instance, 41 questions of literal type are misclassified as of type resource. As regards resource type prediction, the table shows the confusion matrix for the top-5 (most frequent) resource classes. We notice that there is significant confusion between the classes City and Country, as well as between the class Person and other classes. By manually inspecting several of the misclassification cases, we noticed that some of these errors occur on questions where the correct category is very ambiguous, such as the question "In what area is Fernandel buried at the Passy Cemetery?" (labeled as a literal question with type 'string', while our system classifies it as a resource question of type 'dbo:Place'), or the type provided in the dataset is wrong, e.g. the question "What did the pupil of Mencius die of ?" is labeled as a literal question with type 'date', while our system predicts that the question category is resource and 'dbo:Disease' is one of the predicted classes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Results over the final DBpedia test set</head><p>After the final test dataset was released, we evaluated our system again, using the script provided by the challenge organizers. We obtain the results shown in Table <ref type="table" target="#tab_3">4</ref> (using k=30). We notice that the results are very close to those reported for the split on the training dataset (cf. Table <ref type="table" target="#tab_0">1</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Efficiency</head><p>Fine-tuning. We fine-tuned the models on Google Colab<ref type="foot" target="#foot_4">5</ref>, a Jupyter notebook environment that runs in the cloud and offers access to GPUs. With a batch size of 32, number of epochs set to 3 and using an Nvidia Tesla K80 GPU, the time required for fine-tuning each classifier is: 49 mins and 25 secs for the resource question type classifier, using 26,259 questions, 27 mins and 51 secs for the question category classifier, using 14,814 questions, and 15 mins and 3 secs for the literal question type classifier, using 8,025 questions.</p><p>Execution. To classify a question into a category and predict its answer type, we execute the system locally on a machine with 2 cores and 8 GB of RAM, without using a GPU. While the system is running, it requires approximately 2.3 GB of RAM to load the 3 classifiers in memory. This means that the proposed approach has low main memory requirements. Moreover, this memory footprint can be further reduced if we use a smaller and lighter language model, such as DistilBERT <ref type="bibr" target="#b9">[10]</ref>, while sacrificing a small percentage of accuracy. The time required to classify a single question is less than a second (0.17 seconds on average), which is important for the application context that we have in mind (more below). To obtain the system output required to evaluate our system for the SMART challenge, we classified each one of the 4,381 questions provided in the test set sequentially. The process took 12 minutes and 24 seconds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Application Context</head><p>We plan to integrate the proposed classification models in the Question Answering module of Elas4RDF <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b4">5]</ref>, a keyword search system where users can input questions as queries and receive answers in real time according to various perspectives; one of them is the "QA perspective". Screenshots of the system for the query "Greek philosopher from Athens who is credited as one of the founders of Western philosophy" are shown in Figure <ref type="figure" target="#fig_1">2</ref>.</p><p>Moreover, the classification model presented in this paper can also be exploited in the "Schema perspective", which shows the classes of the top-ranked triples (allowing the user to refine the results as she wishes), in order to promote (or just mark) the class that corresponds to the predicted answer type. A demo of Elas4RDF over DBpedia <ref type="bibr" target="#b5">[6]</ref> is publicly accessible at: https://demos.isl.ics.forth.gr/elas4rdf/.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Concluding Remarks</head><p>We have presented an approach for semantic answer type prediction, an important sub-task of QA. The approach splits the problem into a two-stage pipeline of classification tasks: answer category prediction and answer literal/resource type prediction. We model the problem as a set of sequence classification tasks, each one making use of a fine-tuned BERT classifier. For the more fine-grained (and more challenging) problem of answer resource type prediction (since the classes can be hundreds or thousands), we have proposed the enrichment of the BERT model with a rewarding mechanism that considers the hierarchy of the ontology classes, favoring the more specific classes that are low in the class hierarchy. The evaluation results demonstrated the performance of the proposed method, achieving &gt;96% accuracy in predicting the general answer category, &gt;98% accuracy in predicting the literal type, and &gt;77% NDCG@5 in ranking the predicted resource classes.</p><p>Our results showcase that it is feasible to achieve fine-grained answer type prediction with very high precision and without expensive computations.</p><p>Issues that are worth further research include: methods for fine-tuning the parameter k that determines the minimum amount of training data needed to obtain a certain degree of performance, and evaluating the rewarding scheme on different datasets, e.g. on knowledge bases that have ontologies with deeper class hierarchies.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Two-stage answer type prediction for QA and performance of our proposed methods.</figDesc><graphic coords="2,147.74,113.00,316.81,163.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Application Context: Elas4RDF</figDesc><graphic coords="9,147.74,115.83,316.80,177.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Evaluation results</figDesc><table><row><cell/></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Results for different values of k</figDesc><table><row><cell cols="4">Value Classes NDCG@5 NDCG@10</cell></row><row><cell>5</cell><cell>180</cell><cell>0.775</cell><cell>0.765</cell></row><row><cell>10</cell><cell>151</cell><cell>0.786</cell><cell>0.778</cell></row><row><cell>30</cell><cell>79</cell><cell>0.785</cell><cell>0.772</cell></row><row><cell>50</cell><cell>55</cell><cell>0.785</cell><cell>0.748</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Confusion matrices for category, literal type, and resource type prediction (rows: predicted, columns: actual).</figDesc>
<table><head>Category prediction</head><row><cell>Predicted / Actual</cell><cell>Boolean</cell><cell>Literal</cell><cell>Resource</cell><cell>Sum</cell></row><row><cell>Boolean</cell><cell>287</cell><cell>2</cell><cell>5</cell><cell>294</cell></row><row><cell>Literal</cell><cell>1</cell><cell>497</cell><cell>13</cell><cell>511</cell></row><row><cell>Resource</cell><cell>2</cell><cell>41</cell><cell>905</cell><cell>948</cell></row><row><cell>Sum</cell><cell>290</cell><cell>540</cell><cell>923</cell><cell>1753</cell></row></table>
<table><head>Literal type prediction</head><row><cell>Predicted / Actual</cell><cell>Date</cell><cell>Number</cell><cell>String</cell><cell>Sum</cell></row><row><cell>Date</cell><cell>120</cell><cell>0</cell><cell>0</cell><cell>120</cell></row><row><cell>Number</cell><cell>2</cell><cell>182</cell><cell>1</cell><cell>185</cell></row><row><cell>String</cell><cell>0</cell><cell>1</cell><cell>191</cell><cell>192</cell></row><row><cell>Sum</cell><cell>122</cell><cell>183</cell><cell>192</cell><cell>497</cell></row></table>
<table><head>Resource type prediction (top-5 most frequent classes)</head><row><cell>Predicted / Actual</cell><cell>Person</cell><cell>City</cell><cell>Country</cell><cell>Award</cell><cell>Organisation</cell><cell>Other</cell></row><row><cell>Person</cell><cell>148</cell><cell>4</cell><cell>3</cell><cell>3</cell><cell>0</cell><cell>86</cell></row><row><cell>City</cell><cell>3</cell><cell>67</cell><cell>16</cell><cell>0</cell><cell>0</cell><cell>23</cell></row><row><cell>Country</cell><cell>4</cell><cell>2</cell><cell>42</cell><cell>0</cell><cell>0</cell><cell>17</cell></row><row><cell>Award</cell><cell>1</cell><cell>0</cell><cell>0</cell><cell>37</cell><cell>0</cell><cell>0</cell></row><row><cell>Organisation</cell><cell>1</cell><cell>2</cell><cell>5</cell><cell>1</cell><cell>32</cell><cell>42</cell></row><row><cell>Other</cell><cell>15</cell><cell>1</cell><cell>8</cell><cell>3</cell><cell>6</cell><cell>351</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Evaluation results over the final test set</figDesc><table><row><cell>Accuracy (category prediction)</cell><cell>0.962</cell></row><row><cell>Lenient NDCG@5 with linear decay (literal/resource type prediction)</cell><cell>0.777</cell></row><row><cell>Lenient NDCG@10 with linear decay (literal/resource type prediction)</cell><cell>0.762</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://huggingface.co/transformers/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">For the SMART challenge, we had submitted our outputs using k = 30. After further experiments on the training dataset, we changed this value to 10 (more in Sect. 4.2).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/smart-task/smart-dataset/tree/master/datasets/DBpedia</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://colab.research.google.com/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Hierarchical target type identification for entity-oriented queries</title>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Neumayer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st ACM international conference on Information and knowledge management</title>
				<meeting>the 21st ACM international conference on Information and knowledge management</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="2391" to="2394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="3450" to="3457" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A survey on question answering systems over linked data and documents</title>
		<author>
			<persName><forename type="first">E</forename><surname>Dimitrakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sgontzos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent Information Systems</title>
		<imprint>
			<biblScope unit="page" from="1" to="27" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Keyword Search over RDF using Document-centric Information Retrieval Systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kadilierakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papadakos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference</title>
				<imprint>
			<publisher>ESWC</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Elas4RDF: Multi-perspective triple-centered keyword search over RDF using elasticsearch</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kadilierakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Nikas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papadakos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference (ESWC) - Posters &amp; Demonstrations Track</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge</title>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gliozzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C N</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Usbeck</surname></persName>
		</author>
		<idno>CoRR/arXiv abs/2012.00555</idno>
		<ptr target="https://arxiv.org/abs/2012.00555" />
	</analytic>
	<monogr>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Keyword Search over RDF: Is a Single Perspective Enough?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nikas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kadilierakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
		<ptr target="https://www.mdpi.com/2504-2289/4/3/22" />
	</analytic>
	<monogr>
		<title level="j">Big Data and Cognitive Computing</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">22</biblScope>
			<date type="published" when="2020-08">Aug 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">XLNet: Generalized autoregressive pretraining for language understanding</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
