=Paper=
{{Paper
|id=Vol-2774/paper-07
|storemode=property
|title=Fine and Ultra-Fine Entity Type Embeddings for Question Answering
|pdfUrl=https://ceur-ws.org/Vol-2774/paper-07.pdf
|volume=Vol-2774
|authors=Sai Vallurupalli,Jennifer Sleeman,Tim Finin
|dblpUrl=https://dblp.org/rec/conf/semweb/VallurupalliSF20
}}
==Fine and Ultra-Fine Entity Type Embeddings for Question Answering==
Sai Vallurupalli, Jennifer Sleeman, and Tim Finin
University of Maryland, Baltimore County, Baltimore, MD 21250 USA
{kolli, jsleem1, finin}@umbc.edu

Abstract. We describe our system for the SeMantic AnsweR (SMART) Type prediction task 2020 for both the DBpedia and Wikidata Question Answer Type datasets. The SMART task challenge introduced fine-grained and ultra-fine entity typing to question answering by releasing two datasets for question classification using DBpedia and Wikidata classes. We propose a flexible approach for both entity types that uses paragraph vectors and word embeddings to obtain high-quality contextualized question representations. We augment the document similarity provided by paragraph vectors with semantic modeling and sentence alignment using word embeddings. For answer category prediction, we achieved a maximum accuracy of 85% for Wikidata and 88.5% for DBpedia. For answer type prediction, we achieved a maximum MRR of 40% for Wikidata and a maximum nDCG@5 of 54.8% for DBpedia.

Keywords: Word Embedding · Document Embedding · Paragraph Vectors · Fine-Grained Entity Typing · Ultra-Fine Entity Typing.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

To further research in the area of question answering using entity types, the 19th International Semantic Web Conference (ISWC) 2020 put forward the Semantic Answer Type (SMART) prediction task challenge [18]. Two new datasets were released with the goal of classifying questions into hundreds of fine and thousands of ultra-fine types using the DBpedia and Wikidata ontologies. The two SMART challenge datasets include a total of 44,786 questions, which are more varied than the short, single-sentence factoid questions of the UIUC and TREC QA datasets [14, 25]. The task is a dual classification: each question is assigned a single answer category and an unknown number of answer types for the expected answer. While the task can be considered short-text classification, several characteristics of the datasets, described below, contribute to data sparsity and make the classification challenging.

Word Frequencies and Distribution. The questions are composed mainly of simple high-frequency words, some of which are stop words such as the, and, and of. These high-frequency words are distributed uniformly among all questions: for example, all three question categories contain 90% of the top 100 words and 80% of the top 500 most frequent words.

Answer Type Label Distribution. Answer type labels are not uniformly distributed in the training set, with two-thirds of the labels having fewer than five training samples. These long-tailed distributions for both datasets are shown in Figure 1. For this two-thirds of the data, when no external resources are available for training, the classification task presents challenges similar to those of a low-resource setting, requiring data generalization and augmentation.

Fig. 1: Answer Type Label distribution - DBpedia Dataset (top) and Wikidata Dataset (bottom).
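To make the long-tailed label distribution concrete, the sketch below counts how many training questions carry each answer type label and reports the fraction of labels with fewer than five samples. It is an illustration only: the file path and the field name "type" are assumptions about the SMART JSON format and may need adjusting.

```python
import json
from collections import Counter

def label_distribution(path):
    """Count answer type labels in a SMART-style training file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        questions = json.load(f)

    counts = Counter()
    for q in questions:
        # Each question is assumed to carry a list of gold answer type labels.
        for label in (q.get("type") or []):
            counts[label] += 1

    rare = sum(1 for c in counts.values() if c < 5)
    print(f"{len(counts)} distinct labels, "
          f"{rare / len(counts):.0%} with fewer than five training samples")
    return counts

# Example (hypothetical path):
# dist = label_distribution("smarttask_dbpedia_train.json")
```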
Question Structure and Label Assignment. Sentence grammar and structure vary widely, resulting in noisy constituency and dependency parses. The gold labels in the DBpedia dataset do not always include all the labels in the hierarchy. In addition, a question can be labeled with multiple answer types, with the number of labels unknown a priori. Furthermore, the challenge evaluation depends on the correct number and ordering of these labels.

In this paper, we describe a novel and flexible approach that builds upon the seminal work on distributed representations of paragraphs, sentences, phrases and words [19, 11]. Our methodology trains a Doc2vec [11] and a Word2vec [19] model on the questions in the training datasets. These models enable the extraction of syntactic, grammatical and distributed representations from the question semantics. Using these semantic representations, nested filtering is applied to the top-N documents inferred by the trained Doc2vec model, with the overall goal of improving the accuracy of the dual classification. The completely unsupervised training of the models, and the semantic representations, leverage the unique aspects of the data to deal with data sparsity. The nested filtering, and our novel method of computing subject similarity, proved to be highly effective in handling sparse data. Our method generalizes well to both datasets: it is flexible enough to be used with both DBpedia and Wikidata class labels, and it offers the additional benefit of being able to incorporate external data for further refinement.

2 Background

Early question classification systems were rule based: a question was matched to a category based on hand-crafted rules. Rule-based systems are resource intensive and do not scale well; they break when a single question is reformulated using different words and sentence structures [14]. This led to supervised machine learning methods using feature sets and statistical techniques. SVM classifiers were used to classify semantic features such as bag-of-words, n-grams, wh-words, head words, and hypernyms extracted from WordNet [16, 7, 21]. Supervised learning requires expert knowledge to extract features and create curated datasets. Unsupervised learning addresses this with neural models, which automatically extract useful features when trained on large amounts of data. The distributed representation of words and phrases (word embeddings) [19] and the distributed representation of documents and paragraphs (paragraph vectors) [11] are unsupervised learning models called Word2vec and Doc2vec respectively. These shallow neural models are trained to represent words or documents as n-dimensional vectors (embeddings). The embeddings produced by these models are effective in a wide variety of applications [2, 6, 27, 8, 3]. In this section, we briefly provide the theoretical basis for these models.

2.1 Word Embeddings

Word embeddings are produced by a shallow neural network classifier trained to predict the surrounding words within a fixed window on either side of a word, i.e., the word's context. There are two models, depending on whether the given word predicts the context, as in the skipgram model, or the context predicts the given word, as in the continuous bag of words (CBOW) model. In the latter model, the context is represented by averaging the vectors of all the words in the context. While this averaging is thought to potentially derive better representations, in practice the skipgram model performs better on most word similarity tasks [13, 15, 26]. We use the skipgram model, which works as follows.

Every word is assigned a vector of dimension D. For a training corpus consisting of C contexts, given a word w and its context c, the model objective is to maximize the conditional log probability over the corpus:

  \operatorname{argmax}_{\theta} \sum_{(w,c) \in C} \log p(c \mid w, \theta)   (1)

If we denote the vector for w as v_w and the vector for the context as v_c, the conditional probability is represented as:

  p(c \mid w, \theta) = \frac{e^{v_c \cdot v_w}}{\sum_{c' \in C} e^{v_{c'} \cdot v_w}}   (2)

Instead of training on all contexts C, negative sampling is used to train on a subset sampled from C. While this changes the objective function (to a form similar to Eq. 3), essentially, after training on a number of words and contexts, the model increases the quantity v_c \cdot v_w for words that share contexts and decreases it for words that do not. Any two words sharing similar contexts, and any two contexts sharing many words, will end up with similar vectors, resulting in a high cosine similarity between the vectors.
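The negative-sampling objective alluded to above is not written out in the paper; for reference, the standard skip-gram negative-sampling loss from Mikolov et al. [19], which replaces the softmax of Eq. 2, has the form below, where \sigma is the logistic function and k negative contexts c_i are drawn from a noise distribution P_n. This is the textbook formulation, shown here only to make the preceding paragraph concrete.

  \log \sigma(v_c \cdot v_w) \; + \; \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(c)} \left[ \log \sigma(-v_{c_i} \cdot v_w) \right], \qquad \sigma(x) = \frac{1}{1 + e^{-x}}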
2.2 Paragraph Vectors

Paragraph vectors apply the same idea to documents: a shallow neural classifier is trained so that a document, here a question, predicts the words occurring in it. Of the two paragraph vector models, distributed memory (DM) and distributed bag of words (DBOW), we used the DBOW model. It was shown to perform better on semantic similarity tasks [10] and does not suffer from the curse of dimensionality as much as the DM model. This model is similar to the skipgram Word2vec model described above, where a document, instead of a word, is used to predict the context. The model objective is to maximize the conditional log probability p(c \mid d, \theta) for a corpus C, where v_c is the context vector and v_d is the document vector:

  \operatorname{argmax}_{\theta} \sum_{(d,c) \in C} \log \frac{1}{1 + e^{-v_c \cdot v_d}}   (3)

3 Related Work

Most related work identifies entity types for given entity mentions to improve the downstream task of question answering. We highlight a few of these methods that offer incremental improvements and describe how our method differs from these previous approaches. Sun et al. [24] used Freebase entity types to rank answer candidates for a given question. However, they used a search engine to retrieve sentences related to a question, to which they applied entity linking to extract entities; they then used the answer candidates and Freebase for entity typing. Since our method uses low-dimensional embedding models, we are able to achieve a richer understanding of the context of the questions. Dong et al. [4] used 22 different types from DBpedia to classify entity mentions in questions with two methods: they combined the context representations obtained from a multilayer perceptron model and the vector representations of entity mentions obtained using a recurrent neural network to predict type information. Yavuz et al. [29] built upon the work of [4], using type information to improve semantic parsing for question answering. Their semantic parsing component maps the natural form of a question to an abstract semantic representation by replacing entity mentions with type information, and they train a bidirectional LSTM on the abstract forms of questions to infer answer types. In recent work, Choi et al. [1] proposed a bidirectional LSTM for predicting natural language phrases describing the entity mentions in a given sentence. We believe we can achieve sufficient answer typing with our approach, which combines a word and a document embedding. Not only does our method offer contextual understanding similar to previous work, our unique combination of the two embedding models offers more flexibility and is able to work on more varied types of sentences.
4 Methodology

The Distributional Hypothesis [5] proposes grouping entities that share similar distributional properties. Such entities include "representations of how words are used in natural context" [9]. We extend this idea to the distributional representation of questions, i.e., similar question sentence representations share similar labels. For example, questions of the form "Who is the president of United States?" and "Who is the president of Canada?" share a similar sentential representation and similar distributional representations. We posit that such similar questions tend to share category and type labels.

Our method is based on a collaborative combination of both Word2vec and Doc2vec models [11, 19], designed to achieve a better contextual understanding of questions. Our trained Doc2vec model builds distributional representations of questions, and our trained Word2vec model helps with contextualizing the expected answer category and type. We train both models using the Python gensim package; training in both models is unsupervised. Both models are trained on the training questions without the gold labels; gold labels are used later, during inference. The Doc2vec model is trained with each question treated as a document (Doc2vec parameters: vector size = 300, hs = 0, negative = 5, epochs = 50; Word2vec parameters: vector size = 300, window = 10, negative = 5, iter = 50). The trained model is used to map a given test question to the document embedding space to find similar training questions.

Effective training of neural models requires a large amount of training data. With limited data, semantically similar documents tend to be noisy, and high variability in sentence structures and sizes adds to the noise. To help filter the noise, the top N similar training questions in the embedding space are filtered using syntactic and grammatical modelling applied with the Word2vec model.

We select syntactic and semantic feature words which aid in aligning questions of similar category and types. We group these words into three types: Q-words, Action words and Anchor words. Using word vectors obtained from training a Word2vec model on the question sentences, we assign a Q-word and a similarity vector to every question. For a test question, we filter the top N similarity matches obtained from the Doc2vec model into two groups. The first group consists of questions with the same Q-word as the test question. The second group consists of questions whose subject vector has a high cosine similarity with the subject vector of the test question. The first group is used to find the answer category, and the second group to find the answer types. Our approach is a dual classification framework, shown in Fig. 2, applicable to both datasets except for the step that induces a hierarchy for the gold answer type labels (used only for DBpedia).

Fig. 2: The Generalized Fine and Ultra-Fine Entity Type Embeddings Framework.
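A minimal sketch of the unsupervised training step described above, using gensim with the hyperparameters listed in parentheses; this is illustrative rather than the authors' code, and the whitespace tokenization is an assumption since the paper does not spell out its preprocessing.

```python
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_models(questions):
    """Train DBOW Doc2vec and skip-gram Word2vec models on question text.

    `questions` is a list of question strings; gold labels are not used here.
    """
    tokenized = [q.lower().split() for q in questions]

    # DBOW paragraph vectors: each question is one document (dm=0 selects DBOW).
    tagged = [TaggedDocument(words=toks, tags=[i]) for i, toks in enumerate(tokenized)]
    d2v = Doc2Vec(tagged, vector_size=300, dm=0, hs=0, negative=5, epochs=50)

    # Skip-gram word vectors (sg=1) with a 10-word window and negative sampling;
    # `epochs` is the newer gensim name for the older `iter` parameter.
    w2v = Word2Vec(tokenized, vector_size=300, sg=1, window=10, negative=5, epochs=50)
    return d2v, w2v
```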
4.1 Semantic Processing and Induced Type Hierarchy

The first step in the framework is a pre-processing step which encompasses a pipeline constructed using Stanza [20] and the induction of a DBpedia type hierarchy. The pipeline includes a tokenizer, lemmatizer, POS tagger and dependency parser. We select a few syntactic and part-of-speech (POS) tags for use in our filtering. From the dependency parse of a question we obtain the root and subject/object (nsubj, nsubj:pass, csubj, csubj:pass and obj) words, which contribute most to the answer type of a question (Action words). We select words from the Universal POS categories ADP, AUX, DET and PART, which contribute most to the basic sentential structure of a question (Anchor words). Since DBpedia types form a hierarchical tree, we induce a hierarchy from the gold types of each question; a hierarchical path helps include any types missing from the gold labels. Since the type labels used in the Wikidata dataset are not part of a hierarchy, we did not induce a hierarchy from its gold answer type labels.
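A sketch of how the Action and Anchor words described above could be pulled from a Stanza parse. The dependency relations and POS tag sets follow the paper; the function itself is illustrative, not the authors' implementation.

```python
import stanza

# Download the English models once with: stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse")

ACTION_RELS = {"root", "nsubj", "nsubj:pass", "csubj", "csubj:pass", "obj"}
ANCHOR_POS = {"ADP", "AUX", "DET", "PART"}

def action_and_anchor_words(question):
    """Return (action_words, anchor_words) per the selection criteria of Section 4.1."""
    doc = nlp(question)
    action, anchor = [], []
    for sent in doc.sentences:
        for word in sent.words:
            if word.deprel in ACTION_RELS:
                action.append(word.lemma)
            if word.upos in ANCHOR_POS:
                anchor.append(word.lemma)
    return action, anchor

# Example:
# action_and_anchor_words("Who is the president of Canada?")
```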
4.2 Linking to External Sources

An important component of this framework is identifying entity mentions and replacing them with a type hierarchy. This normalizes questions into abstract forms, helping the Doc2vec model embed similar types of questions closer together in the document embedding space. For identifying entity mentions, we collected n-grams (1 to 12) from question sentences and linked them to two external sources: pretrained wikipedia2vec word embeddings [28] and a collection of 3.6 million DBpedia entity names with their associated types, collected from DBpedia using SPARQL queries. We performed wikification, the process of identifying entity mentions in text by checking against the titles of Wikipedia entries [17]. We also performed typification, our novel contribution: similar to wikification, entity mentions with an associated type in the DBpedia hierarchy are normalized to a generic form. We build on ideas from our previous work, which used the identification of fine-grained entity types to improve entity coreference resolution [22, 23]. This normalization to a smaller set of types, instead of a large number of noun forms, abstracts the question into a more generalized form, which helps reduce some of the data sparsity. For example, after typification, the question "Who is the president of United States?" is transformed to "who is the Thing Agent Person of Thing Place PopulatedPlace Country Thing Agent Person ?", where president and united states are replaced by their hierarchical DBpedia types. This example also illustrates the noise introduced by DBpedia types: the entity united states is assigned the types Thing, Place, PopulatedPlace, Country and Person, which results in two induced type paths, of which Thing Agent Person is noise. DBpedia entities were assigned types through distant supervision, and these tend to be noisy; while the generalization helps, the noise does not. By inducing a hierarchy, we account for missing types such as Agent in this example, and reduce the type count from 5 to 2.
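A small sketch of typification as described above: entity mentions are looked up in an entity-to-types map and replaced by their induced DBpedia type paths. The `entity_types` and `parent` dictionaries are hypothetical stand-ins for the data gathered via SPARQL, and the mention matching is a naive longest-n-gram lookup rather than the authors' procedure.

```python
def type_path(dbpedia_type, parent):
    """Walk up a (hypothetical) parent map to build a path such as
    ['Thing', 'Agent', 'Person'] for the leaf type 'Person'."""
    path = [dbpedia_type]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))

def typify(question, entity_types, parent, max_ngram=12):
    """Replace entity mentions in a question with their induced type paths.
    `entity_types` maps a lower-cased surface form to its DBpedia leaf types."""
    tokens = question.lower().rstrip("?").split()
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in entity_types:
                # One induced path per leaf type; duplicate paths collapse naturally.
                paths = {" ".join(type_path(t, parent)) for t in entity_types[span]}
                out.extend(sorted(paths))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out) + " ?"

# Toy data reproducing the paper's illustration (hypothetical mappings):
# entity_types = {"president": ["Person"], "united states": ["Country", "Person"]}
# parent = {"Person": "Agent", "Agent": "Thing", "Country": "PopulatedPlace",
#           "PopulatedPlace": "Place", "Place": "Thing"}
# typify("Who is the president of United States?", entity_types, parent)
```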
4.3 Word Embeddings

Instead of relying on lexical understanding, we use similarity measures obtained from word embeddings trained on the questions to aid in filtering. The answer category is highly dependent on the question words. For example, questions starting with Is or Does always expect a boolean answer category, whereas questions starting with when expect either a number or a date, i.e., a literal answer category. To reduce noise, we assign a single lemmatized question word, a Q-word, to each question if it falls into our predefined list of question words (shown on the x-axis of Fig. 3). About 10-15% of questions do not start with a question word. For these questions, we infer a Q-word using our trained Word2vec model. For example, given the sentence "After what is marathon named and what is the current record?", since the first word is not a question word, we use our Word2vec model to infer the Q-word "what". The goal is to group questions into bins based on their Q-word; the Q-word assigned to a question is used as a filter to predict the answer category.

Fig. 3: The distribution of question words.

To study the efficacy of Q-word inference with our Word2vec model, we trained the model on question sentences from the training set. Using the model, we inferred the Q-word for two groups of questions from the test set: a) questions which start with a question word, and b) questions that do not start with a question word but contain one or more question words. During inference, the entire question is used as the context; the question word with the highest cosine similarity to the context (only question words with a frequency > 500 were considered) is assigned as the Q-word. This assignment is assumed to be correct if it matches the question word in the test question. With this model, the prediction accuracy was 80% on questions which start with a question word, and 60% on questions which merely contain one or more question words. These results were the same for both datasets (tested individually).

The answer type is highly dependent on the Action and Anchor words. We leverage the Word2vec model trained for Q-words to obtain word vectors for the Action and Anchor words. In addition to the Anchor words belonging to the selected POS categories, we added two manually constructed word categories: a list of words that refer to a date and a list of words that refer to a number. We average the word vectors of the Action and Anchor words to obtain a Subject vector for the question. This Subject vector aids in predicting answer types.
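A sketch of the Q-word inference and Subject vector construction just described, assuming the gensim Word2vec model trained earlier. The question-word list is abbreviated and the frequency threshold is applied outside this snippet; neither detail is taken from the authors' code.

```python
import numpy as np

QUESTION_WORDS = ["what", "which", "who", "when", "where", "how", "is", "does"]  # abbreviated

def infer_q_word(w2v, tokens):
    """Pick the question word most similar to the averaged question context."""
    context = [t for t in tokens if t in w2v.wv]
    if not context:
        return None
    context_vec = np.mean([w2v.wv[t] for t in context], axis=0)
    candidates = [q for q in QUESTION_WORDS if q in w2v.wv]
    sims = [np.dot(w2v.wv[q], context_vec) /
            (np.linalg.norm(w2v.wv[q]) * np.linalg.norm(context_vec))
            for q in candidates]
    return candidates[int(np.argmax(sims))]

def subject_vector(w2v, action_words, anchor_words):
    """Average the vectors of Action and Anchor words into a Subject vector."""
    words = [w for w in action_words + anchor_words if w in w2v.wv]
    return np.mean([w2v.wv[w] for w in words], axis=0) if words else None
```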
4.4 Q-Word and Subject Similarity Filtering

Q-words and subject similarity are used to filter the set of similar questions found by Doc2vec for a test question into two lists: questions with the same Q-word as the test question, and questions with a high subject similarity. Subject similarity is the cosine similarity between the subject vector of the similar question and that of the test question, weighted by a question sentence length measure. Paragraph vectors are biased towards shorter documents [10]; weighting based on sentence length counters this bias: a higher weight is assigned to questions similar in length to the test question, and a lower weight to shorter and longer questions. After filtering, we attach the gold answer category and answer type labels to the questions in these two lists. The first list is sorted by document similarity, and the second list by the length-weighted document and subject similarity. For both lists, we only consider the top 10% to reduce noise.

4.5 Inference

To infer the answer category and types, the document vector for a given test question, obtained from the Doc2vec model, is used to retrieve similar questions from the embedding space. These are filtered to produce the Q-word and subject similarity lists. The inferred category is the gold answer category of the top-ranked question in the Q-word list. Inferred types are collected from the gold answer types of the questions in the subject similarity list that belong to the inferred category. For Wikidata questions we collect up to 50 answer types, and for DBpedia up to 10, which gave the best evaluation scores. For DBpedia types, we unroll the hierarchy, listing types at the higher levels only once. Reranking with only subject similarity is applied when no types can be inferred; this can happen when action words are used in a different context. For these questions, we change the inferred category to the gold answer category of the top-ranked question in the subject similarity list, and proceed to infer answer types based on the newly inferred category.

4.6 Examples

The following examples illustrate the output of the Doc2vec model before and after filtering.

Example 1. Similar questions for a test question before filtering (generalized questions shown alongside the originals).

Test question: "What did sub-orbital spaceflight mean for the mission that the crew member Alan Shepard was a part of?" → generalized: "what did sub-orbital spaceflight mean for the mission that the crew member thing agent person astronaut was a part of ?"

Top 3 similar questions found by the Doc2vec model (before filtering):
1. "What is the human spaceflight mission that Neil Armstrong was part of?" → "what is the human spaceflight mission that thing agent person astronaut was part of ?"
2. "Gordon Cooper was the crew member for which space launch?" → "thing agent person astronaut was the crew member for which space launch?"
3. "When was Dennis Lillee a member of the Tasmanian cricket team?" → "when was thing agent person athlete cricketer a member of the tasmanian cricket team ?"

Example 2. Similar questions for a test question after filtering is applied (actual questions instead of the generalized questions are listed for readability).

Test question: For what work did Poul Anderson receive the Prometheus Award - Hall of Fame?

Top 3 similar questions found by the Doc2vec model (before filtering):
1. What is the NCL ID of Cao Xueqin?
2. For what work did W.H. Auden receive the Pulitzer Prize for Poetry?
3. For what work did François Mauriac receive the award Grand Prix du roman de l'Académie française?

Q-word filtering reduces the list to:
1. For what work did W.H. Auden receive the Pulitzer Prize for Poetry?

Subject similarity filtering reduces the list to:
1. For what work did François Mauriac receive the award Grand Prix du roman de l'Académie française?

From the unfiltered list, although question 2 appears to be similar, its generalized question differs in length from the test question.
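Putting Sections 4.4 and 4.5 together, the sketch below shows one way the dual filtering and inference could be wired up with the trained gensim models. The length-weighting formula and the structure of the `train` metadata list are assumptions (the paper only describes the intent of the weighting), and the helper functions `infer_q_word`, `subject_vector` and `action_and_anchor_words` come from the earlier sketches.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def infer(test_tokens, d2v, w2v, train, top_n=100, keep=0.10):
    """Infer (category, types) for a tokenized test question.

    `train` is assumed to be a list of dicts with keys 'tokens', 'q_word',
    'subject_vec', 'category', 'types' prepared from the training set.
    """
    test_vec = d2v.infer_vector(test_tokens)
    similar = d2v.dv.most_similar([test_vec], topn=top_n)  # (train index, doc similarity)

    q_word = infer_q_word(w2v, test_tokens)
    subj = subject_vector(w2v, *action_and_anchor_words(" ".join(test_tokens)))

    q_list, s_list = [], []
    for idx, doc_sim in similar:
        cand = train[idx]
        if cand["q_word"] == q_word:
            q_list.append((doc_sim, cand))
        # Length weight: penalize questions much shorter or longer than the
        # test question (an assumed form of the weighting described in 4.4).
        length_w = min(len(cand["tokens"]), len(test_tokens)) / max(len(cand["tokens"]), len(test_tokens))
        subj_sim = cosine(cand["subject_vec"], subj) if subj is not None else 0.0
        s_list.append((doc_sim * length_w * subj_sim, cand))

    q_list.sort(key=lambda x: -x[0])
    s_list.sort(key=lambda x: -x[0])
    q_list = q_list[: max(1, int(keep * len(q_list)))]
    s_list = s_list[: max(1, int(keep * len(s_list)))]

    # Category from the Q-word list; fall back to the subject list (reranking).
    category = q_list[0][1]["category"] if q_list else s_list[0][1]["category"]
    types = []
    for _, cand in s_list:
        if cand["category"] == category:
            for t in cand["types"]:
                if t not in types:
                    types.append(t)
    return category, types
```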
5 Experiments and Evaluation

The SMART task challenge consists of two separate datasets, one for assigning Wikidata type classes and another for assigning DBpedia type classes [12]; each dataset consists of a training set and a test set. The training set consists of natural language questions with their corresponding answer category and answer types. The number of training questions is 18,251 for Wikidata and 17,571 for DBpedia; the number of test questions is 4,571 for Wikidata and 4,381 for DBpedia.

We compared several methods to study the effects of various parameters on inference. For both datasets, we compared the results with and without the use of the external data sources. The baseline for comparison is our Doc2vec model without any additional filtering; for this baseline, the Q-word and subject similarity filtered lists are the same. We compared the performance of the filtering with three different settings, with and without using external source data: 1) using subject similarity calculated from the average word vectors of the Action words only, 2) using subject similarity calculated from the average word vectors of the Action and Anchor words, and 3) re-adjusting the answer category when no answer types can be inferred.

5.1 Results

Accuracy is used for evaluating the answer category. MRR is used for evaluating answer types from the Wikidata classes. Lenient NDCG@5 and NDCG@10 with a linear decay [17] are used for evaluating the answer types from the DBpedia classes. Accuracy, MRR and NDCG values for the various settings on both datasets are listed in Table 1.

Table 1: Prediction results for various experimental settings of the framework.
Columns: Wikidata Accuracy / MRR | DBpedia Accuracy / NDCG@5 / NDCG@10

No External Sources (baseline): .85 / .26 | .831 / .275 / .281
  1) Subject sim. using Action words: .85 / .39 | .883 / .542 / .519
  2) Subject sim. using Action & Anchor words: .85 / .39 | .881 / .535 / .514
  3) Rerank with subject sim.: .84 / .39 | .823 / .523 / .500
Using External Sources (baseline): .85 / .26 | .811 / .266 / .271
  1) Subject sim. using Action words: .85 / .38 | .870 / .531 / .508
  2) Subject sim. using Action & Anchor words: .85 / .39 | .873 / .527 / .505
  3) Rerank with subject sim.: .85 / .40 | .812 / .517 / .494
Using External Sources, after eliminating the UNKNOWN type:
  1) Subject sim. using Action words: .85 / .38 | .885 / .548 / .527
  2) Subject sim. using Action & Anchor words: .85 / .39 | .881 / .544 / .525
  3) Rerank with subject sim.: .85 / .40 | .839 / .532 / .513
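For readers who want to rescore predictions, the sketch below computes MRR and an NDCG@k over gold and predicted type lists. It uses standard binary-relevance definitions; the official SMART evaluation additionally uses a lenient, hierarchy-aware gain with linear decay, which is not reproduced here.

```python
import math

def mrr(gold, predicted):
    """Mean reciprocal rank of the first correct type over parallel lists of lists."""
    total = 0.0
    for g, p in zip(gold, predicted):
        rank = next((i + 1 for i, t in enumerate(p) if t in set(g)), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(gold)

def ndcg_at_k(gold, predicted, k=5):
    """Binary-relevance NDCG@k; the lenient hierarchy-aware gain of the
    official SMART metric is intentionally omitted in this sketch."""
    scores = []
    for g, p in zip(gold, predicted):
        gset = set(g)
        dcg = sum((1.0 if t in gset else 0.0) / math.log2(i + 2)
                  for i, t in enumerate(p[:k]))
        idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(gset))))
        scores.append(dcg / idcg if idcg else 0.0)
    return sum(scores) / len(scores)
```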
5.2 Analysis - DBpedia Dataset

The dual filtering improved the accuracy of the model by 5% to 6%. It also doubled the NDCG@5 values and almost doubled the NDCG@10 values. This shows that both our subject similarity and Q-word filtering are effective in identifying the correct answer category and types. Using Anchor words in the subject similarity calculation did not improve model performance as expected; instead, it reduced performance slightly. This could be because the Anchor words cause the model to overfit to the sentential structure.

Using external sources also did not improve model performance as expected. Upon closer examination of the predictions, we found entities consisting of long phrases, which essentially reduced a question to one or two phrases. This made it difficult to find similar sentences, creating more sparsity instead of alleviating it. In addition, for entities for which we did not find DBpedia types we used the type UNKNOWN. About 46% of the questions contained at least one entity with no DBpedia types, and 14% of the questions contained two or more such entities. The use of the UNKNOWN type reduced performance; by not replacing these entities with UNKNOWN we were able to improve performance, and we observe that the generalization with abstract forms helped.

Readjusting the answer category using the subject similarity list did not improve model performance. This implies that the answer category originally inferred with the Q-word list is a better fit than the one found through subject similarity, and shows that the Q-word is a good indicator for predicting the answer category. We believe inducing a hierarchy is useful, especially when labels in the hierarchy are missing from the gold labels. However, we suspect that having these additional type labels, missing in the dataset, resulted in a lower evaluation score.

5.3 Analysis - Wikidata Dataset

The dual filtering improved model prediction for the answer types: the MRR for predicting answer types improved by 1.5 times. This shows that subject similarity filtering is effective in identifying the correct answer types. Using Anchor words in the subject similarity did not improve model performance. This could be because the questions in this dataset are more factoid-like, with simpler sentence structure, and the sentence-length weighting already contributed to finding similar sentential structures. Sentence generalization with abstract forms (using external sources) slightly reduced performance; however, this reduction was corrected by using Anchor words. Setting entities with no DBpedia types to UNKNOWN did not negatively affect performance, as only 0.6% of the questions contained an entity with no DBpedia types. Readjusting the category using subject similarity improved MRR.

6 Conclusions and Future Work

In this work we present a novel methodology for question classification that is flexible and generalizes to other question classification datasets, using a zero-shot learning approach. We show that an unsupervised model, namely paragraph vectors, can be used effectively for question classification. Our model performs well on limited-size datasets with long-tailed label distributions. Our use of semantic modeling combined with word embeddings helps capture contextual information. Inducing a hierarchy in the gold labels for the DBpedia dataset did not improve performance, as the added missing labels were treated as extraneous labels; however, this still appears to be the best way to represent hierarchical labels. Our results show that semantic modeling can improve the performance of shallow neural models. With the use of Word2vec, we show a collaborative approach to semantic modeling and model training. This modular approach is general enough that it can be applied to any question classification dataset. Although we did see some change in performance with the use of Anchor words, more experimentation is necessary to understand their contribution; future work will investigate how we can leverage them to learn better sentential structures. In addition, the incorporation of external sources showed promise in this work, and we will explore other ways of incorporating external sources in the future.
References

1. Choi, E., Levy, O., Choi, Y., Zettlemoyer, L.: Ultra-fine entity typing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 87–96 (2018)
2. Clark, K., Manning, C.D.: Improving coreference resolution by learning entity-level distributed representations. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Volume 1. pp. 643–653 (2016)
3. Dai, A.M., Olah, C., Le, Q.V.: Document embedding with paragraph vectors. In: NIPS Deep Learning Workshop (2014)
4. Dong, L., Wei, F., Sun, H., Zhou, M., Xu, K.: A hybrid neural model for type classification of entity mentions. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
5. Harris, Z.: Mathematical Structures of Language. Interscience Tracts in Pure and Applied Mathematics (1968)
6. Hill, F., Cho, K., Korhonen, A.: Learning distributed representations of sentences from unlabelled data. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1367–1377 (2016)
7. Huang, Z., Thint, M., Qin, Z.: Question classification using head words and their hypernyms. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. pp. 927–936. Association for Computational Linguistics (2008)
8. Kim, H.K., Kim, H., Cho, S.: Bag-of-concepts: Comprehending document representation through clustering words in distributed representation. Neurocomputing 266, 336–352 (2017)
9. Landauer, T., Dumais, S.T.: A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104, 211–240 (1997)
10. Lau, J.H., Baldwin, T.: An empirical evaluation of doc2vec with practical insights into document embedding generation. pp. 78–86 (Aug 2016), https://www.aclweb.org/anthology/W16-1609
11. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning. pp. 1188–1196 (2014)
12. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2) (2015)
13. Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3, 211–225 (2015)
14. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics - Vol. 1. pp. 1–7 (2002)
15. Melamud, O., Dagan, I., Goldberger, J., Szpektor, I., Yuret, D.: Probabilistic modeling of joint-context in distributional similarity. In: CoNLL (2014)
16. Metzler, D., Croft, W.B.: Analysis of statistical question classification for fact-based questions. Information Retrieval 8(3), 481–504 (2005)
17. Mihalcea, R., Csomai, A.: Wikify! Linking documents to encyclopedic knowledge. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. pp. 233–242. CIKM '07, New York, NY, USA (2007)
18. Mihindukulasooriya, N., Dubey, M., Gliozzo, A., Lehmann, J., Ngomo, A.C.N., Usbeck, R.: SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge. CoRR/arXiv abs/2012.00555 (2020), https://arxiv.org/abs/2012.00555
19. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013)
20. Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: A Python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 101–108. Online (Jul 2020)
21. da Silva, J.P.C.G., Coheur, L., Mendes, A., Wichert, A.: From symbolic to sub-symbolic information in question classification. Artificial Intelligence Review 35, 137–154 (2010)
22. Sleeman, J., Finin, T.: Type prediction for efficient coreference resolution in heterogeneous semantic graphs. In: 2013 IEEE Seventh International Conference on Semantic Computing. pp. 78–85. IEEE (2013)
23. Sleeman, J., Finin, T., Joshi, A.: Entity type recognition for heterogeneous semantic graphs. AI Magazine 36(1), 75–86 (2015)
24. Sun, H., Ma, H., Yih, W.t., Tsai, C.T., Liu, J., Chang, M.W.: Open domain question answering via semantic enrichment. In: Proceedings of the 24th International Conference on World Wide Web. pp. 1045–1055 (2015)
25. Voorhees, E.M.: Overview of the TREC 2001 question answering track. In: Proceedings of the Tenth Text REtrieval Conference (TREC). pp. 42–51 (2001)
26. Wang, B., Wang, A., Chen, F., Wang, Y., Kuo, C.C.: Evaluating word embedding models: methods and experimental results. APSIPA Transactions on Signal and Information Processing 8 (2019). https://doi.org/10.1017/ATSIP.2019.12
27. Wang, S., Tang, J., Aggarwal, C., Liu, H.: Linked document embedding for classification. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management. pp. 115–124 (2016)
28. Yamada, I., Asai, A., Sakuma, J., Shindo, H., Takeda, H., Takefuji, Y., Matsumoto, Y.: Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia. arXiv preprint 1812.06280v3 (2020)
29. Yavuz, S., Gur, I., Su, Y., Srivatsa, M., Yan, X.: Improving semantic parsing via answer type inference. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 149–159 (2016)