ISOFT at QALD-4: Semantic similarity-based question answering system over linked data

Seonyeong Park, Hyosup Shim, and Gary Geunbae Lee
Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Gyungbuk, South Korea
{sypark322, hyosupshim, gblee}@postech.ac.kr

Abstract. We present a question answering system over linked data. We use natural language processing tools to extract slots and SPARQL templates from the question. Then, we use semantic similarity to map a natural language question to a SPARQL query. We combine important words to avoid loss of meaning, and compare the combined words with uniform resource identifiers (URIs) from a knowledgebase (KB). This process is more powerful than comparing each word individually. Using our method, the problem of mapping a phrase of a user question to URIs from a KB can be solved more easily than without it; this improves the F-measure of the system.

Keywords: Question answering, SPARQL, Semantic similarity

1 Introduction

Question answering (QA) systems (QASs) find the answers to natural language (NL) questions by analyzing huge corpora. In the big data era, this capability is increasingly necessary. Two popular types of QAS are those based on information retrieval (IR) and those based on knowledgebases (KBs). Most IR-based QASs use the following general strategy [1, 2, 3]: 1) analyze the question and translate it into queries for IR, 2) retrieve small passages that include answers from the document collection by using IR technology, and 3) extract answer candidates and select the final answer. Typically, the question analysis module involves NL processing (NLP) such as part-of-speech (POS) tagging and dependency parsing.

Recently, very large, structured, and semantically rich KBs have become available; examples include Yago [4], DBpedia [5], and Freebase [6]. As an increasing quantity of resource description framework (RDF) data is published in linked form, intuitive ways of accessing the data are becoming increasingly necessary. DBpedia forms the nucleus of the web of linked data [7]; it inter-connects large-scale RDF data sources with 2.46 billion subject-predicate-object triples. Several QASs use RDF data; examples include Aqualog [8], Feedback, Refinement and Extended VocabularY Aggregation (FREyA) [9], Template-Based SPARQL Learner (TBSL) [10] and the ontology-based QAS Pythia [11]. Most KB-based QASs use the following general strategy [8, 9, 10, 11, 12, 14]: 1) analyze the question, 2) map the entity and predicate from the user question to the words in a KB, 3) formulate and select queries, and 4) run the queries and extract answers. The strategies of KB-based QASs are not very different from those of IR-based QASs, but they require that the user's phrases be translated to words that exist in the KB.

In FALCON [1], an IR-based QAS, the question is parsed to extract the expected answer type and an ordered list of keywords that are used to retrieve relevant text passages. However, the keyword search lacks a clear specification of the relations among entities. Translating NL questions into SPARQL queries requires performing both a disambiguation task and a mapping task. DEep Answers for maNy Naturally Asked questions (DEANNA) [12] is based on an integer linear program to solve several disambiguation tasks jointly: segmentation of questions into phrases; mapping of phrases to semantic entities, classes and relations; and construction of SPARQL triple patterns.
In this paper, we aim to solve the problem of translating the NL question into a SPARQL query to search for the answer in the KB. To improve the F-measure of the system, we use linguistic information about the user question, and semantic similarity for translating the NL words into the words in the KB.

We used semantic similarity based on Explicit Semantic Analysis (ESA) [13] for mapping predicates in the user NL question to predicate uniform resource identifiers (URIs) in the KB. ESA converts target strings to semantic vectors that convey their explicit meaning as weighted vectors of Wikipedia concepts, so calculating the similarity of two vectors reveals the semantic relatedness of the two strings from which the vectors were generated. We also increased the effectiveness of mapping the NL words to URIs by concatenating additional information to the predicate.

Previous work did not apply Wikipedia-based semantic vectors to this mapping problem. FREyA uses a method that generates suggestions with string similarity and the hierarchy defined in an ontology, scores them again using string similarity algorithms, then maps them to high-scoring concepts in the ontology. PowerAqua [14] includes a linguistic component, an element mapping component (PowerMap), a triple mapping component, and a merging and ranking component. PowerMap provides automatic mapping for inter-ontology concepts and semantic relevance analysis; it calculates semantic relatedness as the distance between corresponding senses in WordNet's graph.

2 System Description

2.1 Overall system architecture

Fig. 1. Overall proposed QA system: processes are described in the text

We built our system by extending TBSL [10]. We added more question analysis techniques and ESA for measuring semantic similarity. The user question is translated into a SPARQL query in several steps (Fig. 1). We must extract the slots and the SPARQL template. The SPARQL templates correspond directly to the internal structure of the question [10]; they specify the query's "select" or "ask" clause, its filter and aggregation functions, and the number and forms of its triples. Each slot has three components: a proxy variable; the type of intended URI (class, property or resource); and the NL expression. The proxy variable is replaced with the appropriate URI after it is identified. The translation steps with examples are:

1. NL question: In which country does the Nile start?
2. Slots: <?c, class, country>, <?p, property, start>, <?y, resource, Nile>
3. SPARQL template: SELECT ?x WHERE { ?x ?p ?y . ?x rdf:type ?c }
4. Slot with URI: <res:Nile, resource, Nile>
5. SPARQL query:
   PREFIX dbo: <http://dbpedia.org/ontology/>
   PREFIX res: <http://dbpedia.org/resource/>
   SELECT DISTINCT ?x WHERE { res:Nile dbo:sourceCountry ?x . ?x rdf:type dbo:Country . }
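To make the slot and template representation concrete, the following is a minimal Python sketch. The class names (Slot, Template), the instantiate() helper, and the flipped triple direction (chosen so that the example instantiates to the final query above) are our illustrative assumptions, not the system's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slot:
    proxy: str                 # proxy variable in the template, e.g. "?p"
    slot_type: str             # "class", "property", or "resource"
    expression: str            # natural-language expression, e.g. "start"
    uri: Optional[str] = None  # filled in later, e.g. "dbo:sourceCountry"

@dataclass
class Template:
    query: str                 # SPARQL string with proxy variables still in place
    slots: List[Slot]

    def instantiate(self) -> str:
        """Replace each proxy variable with its identified URI."""
        q = self.query
        for s in self.slots:
            if s.uri is not None:
                q = q.replace(s.proxy, s.uri)
        return q

# Example for "In which country does the Nile start?"; the triple direction is
# written as "?y ?p ?x" so that the result matches the final query in the paper.
template = Template(
    query="SELECT DISTINCT ?x WHERE { ?y ?p ?x . ?x rdf:type ?c }",
    slots=[
        Slot("?c", "class", "country", "dbo:Country"),
        Slot("?p", "property", "start", "dbo:sourceCountry"),
        Slot("?y", "resource", "Nile", "res:Nile"),
    ],
)
print(template.instantiate())
# SELECT DISTINCT ?x WHERE { res:Nile dbo:sourceCountry ?x . ?x rdf:type dbo:Country }
```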
2.2 Slot extraction

We use NL analysis to extract slots. Given an NL question as input, we classify the question to detect question-type keywords (e.g., "who", "what", "when", "where", and "how"). The heuristics to extract slots and templates differ slightly among question types. After the question is classified, it is analyzed at the word, syntactic, and semantic levels. For these analyses, we used ClearNLP (http://clearnlp.wikispaces.com/), which is available on GitHub. Additionally, we used OpenNLP (https://opennlp.apache.org/) for chunking. For named entity (i.e., resource) disambiguation, we used AIDA (https://www.mpi-inf.mpg.de/yago-naga/aida/). Additionally, we used the question keywords provided by the Question Answering over Linked Data (QALD-4, http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=task1&q=4) organizers. Using the results of the NL analysis, we developed several rules to extract slots.

First, we check all words in the NL question and find the word that maximizes the appropriateness score (eq. 1); this word becomes the class (i.e., the answer type). The class is the type of the proxy variable x that the question seeks.

    score(w_i) = f_POS(w_i) + f_dep(w_i) + f_qtype(w_i),    (1)

where w_i is word i in the question; f_POS(w_i) is a pre-defined weight when the POS of w_i is NN or NNS, and 0 otherwise; f_dep(w_i) is a pre-defined weight when w_i is a dependent of the main verb, and 0 otherwise; and f_qtype(w_i) is a pre-defined weight when w_i is the head of the question-type keyword, and 0 otherwise. We determined these weights by analyzing the answer types in the QALD-4 training data and the UIUC data [15]. The following heuristic is used to extract the class for all question types. If no class is found, we assign a pre-defined class (e.g., place) as the type of the proxy variable x according to the question-type keyword (e.g., "where").

Heuristic: class extraction
  Input: S – natural language question
  Output: c – class
  Initialize the class c as null; the appropriateness score of null is 0; the number of words in the NL question is n.
  1: For each word wi in the NL question:
  2:   Calculate the appropriateness score of wi and the appropriateness score of the current c
  3:   If the appropriateness score of wi is greater than that of the current c, update c to the lemma of wi
  4: End for
  5: Return c

When a named entity or noun phrase consists of more than one word, we use a dependency parser and a chunker to concatenate the words in the noun phrase into one word before applying the heuristic rules. For example, we first use AIDA or the given keywords to detect the named entity (i.e., resource). Named entities are usually subjects or objects in NL questions. We used the dependency parser to concatenate "Walt" and "Disney" into "Walt Disney", which we regard as one word. We use AIDA and the given keywords to extract the named entity. We use the chunker to extract the noun phrases and our heuristics to extract the class, and then apply the chunker to obtain the predicate between the subject and object (Fig. 2). First, we extract entities as a subject and an object, for example, "television show" and "Walt Disney". To extract the predicate between them, we use the dependency parser to detect the head of each word. We finally extract the predicate, which is the chunk that includes the two heads. If the heads are the same word, we regard the head as the predicate. If the two heads are not in the same chunk, we concatenate the two chunks into the predicate.

Fig. 2. Example of extracting a predicate: processes are described in the text
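The predicate-extraction step of Fig. 2 can be sketched as follows. The function name, data structures, and toy example are illustrative assumptions on our part, not the system's actual implementation; the real system operates on full dependency parses and chunker output.

```python
from typing import Dict, List

def extract_predicate(subject: str, obj: str,
                      heads: Dict[str, str],
                      chunks: List[List[str]]) -> str:
    """Sketch of the predicate-extraction step described with Fig. 2.

    heads  maps each entity (or word) to its dependency head;
    chunks is the chunker output, one list of words per chunk.
    """
    subj_head = heads.get(subject)
    obj_head = heads.get(obj)

    # If both entities share the same head, that head is the predicate.
    if subj_head == obj_head:
        return subj_head

    def chunk_of(word: str) -> List[str]:
        for chunk in chunks:
            if word in chunk:
                return chunk
        return [word]

    subj_chunk, obj_chunk = chunk_of(subj_head), chunk_of(obj_head)
    if subj_chunk == obj_chunk:
        # Both heads fall in the same chunk: that chunk is the predicate.
        return " ".join(subj_chunk)
    # Heads fall in different chunks: concatenate the two chunks.
    return " ".join(subj_chunk + obj_chunk)

# Toy example for "Which television shows were created by Walt Disney?"
heads = {"television show": "created", "Walt Disney": "created"}
chunks = [["Which"], ["television show"], ["were", "created", "by"], ["Walt Disney"]]
print(extract_predicate("television show", "Walt Disney", heads, chunks))  # -> created
```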
After the QALD-4 deadline passed, we added graph-based reduction rules (Fig. 3) to extract slots.

Fig. 3. Overall steps of graph-based reduction: processes are described in the text

1. Initialize the dependency graph: We obtain a dependency graph from the result of the dependency parser. Each node carries information such as the word's surface form and lemma, and each edge represents a dependency relation.
2. Determine reduction policy tags by reduction rules: We assign one of four policy tags, remove (OMT), merge (MRG), reverse (REV), and remain (REM), to each node according to rules that check information such as the POS of the head, the POS of the dependent, the dependency label (e.g., SUB or OBJ), and the words of the dependent and the head.
3. Apply the reduction policies: We apply the four reduction policies to the initial dependency graph to reduce it.

These graph-based reduction rules improve our total system F-measure on the QALD-4 test dataset, but they still have limitations. First, devising rules to determine the tags is a difficult task. Moreover, we decide whether each node is a subject, predicate or object by its distance from the question-type keyword (e.g., "who", "what", "when", "where", and "how"). For a question such as "Give me all movies with Tom Cruise.", which contains no such keyword, the slots are difficult to extract.

2.3 SPARQL template extraction

We used lexical information from each question to extract the appropriate SPARQL template.

· Template for a Boolean question: ASK WHERE { ?x ?p ?y . }
  Using lexical information, we detect whether the question is a "wh-question". A "yes/no question" does not include question-type keywords. We regard a question as a "yes/no question" if it contains neither question-type keywords nor "list-question keywords" (e.g., "Give", "List").

· Template for a simple question: SELECT DISTINCT ?x WHERE { ?x ?p ?y . (Optional: ?x rdf:type ?c .) }
  We define the basic query template as above. Optionally, we use ?c (replaced with the class) if the class is exactly correct. Our class detection module works well for a "which" question or a "who" question. We usually include only one triple. We use lexical information (conjunctions such as "and" and relative pronouns such as "who") and a dependency parser to extract n triples, but this remains a difficult task. In some cases we must extract more than one triple to successfully translate the NL question into a SPARQL query. However, we cannot extract n triples in many cases, such as "Give me all films produced by Steven Spielberg with a budget of at least $80 million.", because the question does not explicitly contain a conjunction or relative pronoun; extracting n triples is difficult to accomplish using only heuristics.

· Template for a question that includes an aggregation function: We used the "aggregation" attribute, which indicates whether any operations beyond triple pattern matching are required to answer the question. We define the "aggregation" functions as "count", "filter" and "order", and define the modifier target as the proxy that is the target of the aggregation function. We used three types of functions, for example:
  ─ COUNT: SELECT COUNT(DISTINCT ?x) WHERE { ?x ?p ?y . }
  ─ ORDER: SELECT DISTINCT ?x WHERE { ?x ?p ?y . } ORDER BY DESC(?x) OFFSET 0 LIMIT n
  ─ FILTER: SELECT DISTINCT ?x WHERE { ?x ?p ?y . FILTER (?y < 1950) }

Some words indicate the types of functions to use; we call these words "aggregation indicators". For example, if the sentence contains "How many", we select the COUNT template and extract the target, i.e., the head of "many". If the question contains a superlative, we infer that the question requires an ORDER operation. If the question contains a comparative, we infer that the question requires a FILTER operation. We use the head-dependency relation of an aggregation indicator to detect the targets of these operations and the constant (e.g., 1950) that the filter uses.
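To illustrate how such lexical cues can drive template selection, the following is a minimal sketch. The keyword sets, POS-tag checks, and function names are illustrative assumptions rather than the exact rules used in our system.

```python
import re

QUESTION_WORDS = {"who", "what", "when", "where", "which", "how"}
LIST_KEYWORDS = {"give", "list"}
SUPERLATIVE_TAGS = {"JJS", "RBS"}   # POS tags for superlatives
COMPARATIVE_TAGS = {"JJR", "RBR"}   # POS tags for comparatives

def select_template(question: str, pos_tags: list) -> str:
    """Return a SPARQL template type from simple lexical cues (illustrative only)."""
    tokens = [t.lower() for t in re.findall(r"\w+", question)]

    if re.search(r"\bhow many\b", question, re.IGNORECASE):
        return "COUNT"
    if any(tag in SUPERLATIVE_TAGS for tag in pos_tags):
        return "ORDER"
    if any(tag in COMPARATIVE_TAGS for tag in pos_tags):
        return "FILTER"
    # Neither question-type nor list keywords: treat as a yes/no (ASK) question.
    if not (QUESTION_WORDS & set(tokens)) and not (LIST_KEYWORDS & set(tokens)):
        return "ASK"
    return "SELECT"

# Toy examples; POS tags would normally come from the tagger.
print(select_template("How many pages does War and Peace have?", []))   # COUNT
print(select_template("Which river does the Brooklyn Bridge cross?", []))  # SELECT
print(select_template("Did Tesla win a Nobel prize in physics?", []))      # ASK
```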
2.4 Mapping the proxy variable to the URI

Fig. 4. Overall steps to map the proxy variable to the URI: processes are described in the text

The process (Fig. 4) for mapping proxy variables to URIs entails the following seven steps:

1. Identify resource/class URIs from noun phrases: We use the noun phrases delivered from the slot extraction module to identify URIs from the KB; common noun phrases are regarded as class-type URIs, and proper noun phrases are regarded as resource-type URIs. We built an inverted index of all resource/class-type URIs from DBpedia; the actual mapping is then performed by searching the inverted index with variant forms of the noun phrases, generated by heuristics on capitalization and white spacing. This process works well. To perform the mapping quickly, we used Lucene (http://lucene.apache.org/).

2. Search the NL pattern: We used PATTY (http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/patty/) to map NL predicates to the words in the KB. However, in most cases, the PATTY pattern repository failed to map the predicates from the questions to the appropriate words in the KB.

3. Retrieve predicate URI candidates: We assume that the predicate URI corresponds to a word from the user's phrase; this word must be translated to words extracted from the KB. We collect predicate URI candidates occurring in triples that have a previously identified resource or class as a subject or object. The queries to retrieve predicates are structured as follows, where <resource> and <class> stand for the previously identified resource and class URIs:
   SELECT DISTINCT ?p WHERE { { <resource> ?p [] . } UNION { [] ?p <resource> . } }
   SELECT DISTINCT ?p WHERE { { ?x ?p [] . ?x rdf:type <class> . } UNION { [] ?p ?x . ?x rdf:type <class> . } }

4. Choose a string from the NL question against which to measure the semantic relatedness: Pattern matching alone is insufficient to translate predicates in the NL question to predicate URIs. The slot extraction phase delivers several strings for measuring the semantic relatedness. Each string includes a different set of classes that restrict the meaning of the predicate. Restricting the meaning of the predicate helps ESA to choose the most relevant predicates. For example, the NL question "In which country does the Nile start?" requires the predicate URI "sourceCountry" for the class type Country. Measuring the semantic relatedness using the string "start" did not assign a high score to the desired predicate URI, but using the string "start country" did. Adding class information to the predicate, such as "start country" instead of only "start", better represents the user's intention and the semantic meaning of the question.

5. Rank predicate URI candidates by semantic similarity: We calculate the semantic relatedness between each predicate URI in the KB and the string from the question.

6. Choose the n-best predicate URI candidates: We choose the n most similar predicate URIs from the KB. We determined n experimentally; in our case, n > 2 was not helpful because it decreased precision much more than it increased recall.

7. Query generation with the n-best predicates: We replace the slots with the identified URIs and complete the query. This process identifies the proxy variables that the question requires.
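Steps 4 to 6 can be illustrated with a small sketch of the ranking logic. The esa_vector() stub below stands in for the real ESA library, which returns weighted vectors of Wikipedia concepts; the candidate predicate list, the camel-case splitting, and all function names are our illustrative assumptions.

```python
import math
import re
from typing import Dict, List

def esa_vector(text: str) -> Dict[str, float]:
    """Placeholder for ESA: the real system returns a weighted vector of Wikipedia
    concepts for the input string. Stubbed with token counts so the code runs."""
    vec: Dict[str, float] = {}
    for t in re.findall(r"[a-z]+", text.lower()):
        vec[t] = vec.get(t, 0.0) + 1.0
    return vec

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_camel_case(local_name: str) -> str:
    # "sourceCountry" -> "source country"
    return re.sub(r"(?<!^)(?=[A-Z])", " ", local_name).lower()

def rank_predicates(question_string: str, candidates: List[str], n: int = 2) -> List[str]:
    """Rank predicate URI candidates by semantic relatedness to the question string
    (the predicate phrase plus restricting class words, e.g. "start country")."""
    q_vec = esa_vector(question_string)
    scored = [(cosine(q_vec, esa_vector(split_camel_case(c))), c) for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored[:n]]

# Illustrative candidate set for the Nile example.
candidates = ["sourceCountry", "length", "mouthElevation"]
print(rank_predicates("start country", candidates))  # -> ['sourceCountry', ...]
```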
3 Experiment

We used the QALD-4 test dataset for task 1 to evaluate the proposed method and used DBpedia 3.9 as the KB. We mainly compare two approaches. One (P) uses only pattern matching to identify the resource/class/predicate URIs; the other (P+ESA) combines pattern matching to identify the resource/class URIs with ESA to identify the predicate URI. The latter yielded a higher F-measure than the former. Additionally, we added the graph-based reduction rules to the latter; this P+ESA+graph approach yielded the highest F-measure. We used an open ESA library. The test dataset includes 31 simple questions, four yes/no questions and 15 questions with aggregation.

4 Results and Discussion

To evaluate the QA system, we used global precision, recall and F-measure (Table 1). Graph-based reduction reduces the parse tree by applying predefined reduction rules. P+ESA+graph yielded a higher F-measure than P+ESA. Graph-based reduction is in initial development; we will continue to improve it. We do not report results for graph-based reduction in Tables 2-4. In this section, we mainly compare our semantic similarity approach with the pattern matching approach. We usually used pattern matching (e.g., capitalization and white spacing) to map a class and a resource. However, most predicates cannot be mapped using pattern matching, so we used ESA when mapping the NL predicate to the predicate URI in the KB. We show the number of questions that were answered successfully by predicate mapping using ESA (Table 2) and pattern matching (Table 3). The proposed method of computing semantic similarity achieved higher precision and recall than the previous pattern matching approach (Tables 2, 3).

Table 1. Evaluation results for the QALD-4 test dataset.

Method           Total   SPARQL generation   Correct   Partially correct   Global recall   Global precision   Global F-measure
P                 50            0               0              0                0.00             0.00               0.00
P + ESA           50           28              10              3                0.26             0.21               0.23
P + ESA + graph   50           31              16              2                0.36             0.33               0.34

Table 2. Precision (P) and recall (R) using ESA to map the NL predicate to the appropriate predicate URI in the KB. Bold: only answers found; underlined italic: several candidate answers found (some not appropriate); normal: no answers found.