=Paper=
{{Paper
|id=Vol-1809/article-08
|storemode=property
|title=Towards a Personalized Query Answering Framework on the Web of Data
|pdfUrl=https://ceur-ws.org/Vol-1809/article-08.pdf
|volume=Vol-1809
}}
==Towards a Personalized Query Answering Framework on the Web of Data==
Towards a Personalized Query Answering Framework on the Web of Data

Enayat Rajabi (Dalhousie University, Halifax, NS, Canada, rajabi@dal.ca), Christophe Debruyne (Trinity College Dublin, Dublin 2, Ireland, debruync@scss.tcd.ie), Declan O'Sullivan (Trinity College Dublin, Dublin 2, Ireland, declan.osullivan@cs.tcd.ie)

ABSTRACT
In this paper, we argue that layering a question answering system on the Web of Data based on user preferences leads to the derivation of more knowledge from external sources and to the customisation of query results according to a user's interests. As various users may find different things relevant because of different preferences and goals, we can expect different answers to the same query. We propose a personalised question answering framework for querying Linked Data that enhances a user query with related preferences stored in the user's profile, with the aim of providing personalised answers. We also propose an extension of the QALD-5 scoring system that defines a relevancy metric measuring the similarity of query answers to a user's preferences.

CCS Concepts
• Information systems → Question answering.

Keywords
Personalisation; Linked Data; Question Answering.

1. INTRODUCTION
With the rapid growth of the Web of Data (currently more than 154 billion triples1), answering users' queries and delivering the actual results have become a key issue [1]. Retrieving appropriate answers by exploring or browsing Web content through keyword search mostly fails to exploit the internal structure of the data and to reveal its underlying semantics. The search results are expected to contain information corresponding to the keywords, and in most cases the user is left with the task of sifting through these results. Question Answering is an information retrieval technique [2] that tackles this issue by retrieving exact answers to users' questions posed in natural language. On the Web of Data [3], thanks to its standards and schemas, a question answering system is executed over a network of RDF datasets, and data discovery sometimes requires integrating several datasets. Moreover, the different kinds of datasets underlying Question Answering systems have been semantically improved from unstructured text to structured data [4].

One of the significant factors to be considered in a question answering system is the personalisation of queries and answers contingent on the user's interests and preferences, as various users may find different things relevant when searching because of differing preferences, goals and interests. Thus, users may naturally expect different answers to the same query. Typically, query personalisation [5] is the process of dynamically enhancing a query with related user preferences stored in a user profile, with the aim of providing personalised answers. To illustrate the question answering application, consider the following case:

"Bob is a high school student who performs most of his study and homework using search engines. However, he is tired of searching through the search engines' results, as it is tedious work to discover the precise answer among thousands of candidate documents. He is aware of the strength of the Web of Linked Data and decides to pose his queries against a personalised question answering system (PQALD). He registers with PQALD and creates his profile. He also specifies his preferences for search; for example, he is interested in reading fiction books and in romantic movies. Afterwards, he starts surfing the Web of Data to find the answers to his questions using PQALD. The system narrows the list of results down to answers that are close to Bob's interests and preferences. For instance, for the question 'best movies of 2016?', it prioritises romantic movies (one of Bob's interests) in the search results. PQALD also takes all of Bob's other preferences and interests into account during the search. As PQALD relies on the Web of Linked Data, it links each answer it finds to the IMDB dataset so that Bob can access more information about each movie. Bob can also specify several sets of preferences (one for his homework, another for his research, etc.) and use each one for a specific search."

Most studies in the area of question answering on the Web of Data [2][6][7][8] present approaches to retrieve information and infer knowledge over the Semantic Web using a set of ontologies, reasoning capabilities, and inference engines. Others [9][10] investigate the issues involved in designing a query language for the Semantic Web. To the best of our knowledge, query personalisation for question answering on the Web of Data has not been widely considered in studies to date. This short paper presents a personalised question answering framework with the intent of improving as well as customising search results contingent on a user's preferences and interests. The remainder of this paper is structured as follows. Section 2 outlines current studies on question answering on the Web of Data. Section 3 introduces our proposed approach, followed by conclusions and future work in Section 4.

1 http://stats.lod2.eu/ [Accessed: 28-Feb-2017]

LDOW 2017, 3 April 2017, Perth, Australia. © 2017 Copyright held by the owner/author(s).
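To make the query personalisation idea in the scenario above concrete, the rewriting step can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the profile representation (property–value pairs) and the `dbo:`/`dbr:` names are assumptions in the style of DBpedia, and the rewrite simply injects preference-derived graph patterns into the user's SPARQL query.

```python
# Sketch of query personalisation as query rewriting: enhance a SPARQL
# query with preference patterns taken from a user profile. Property and
# resource names (dbo:, dbr:) are illustrative, not a fixed vocabulary.

BASE_QUERY = """
SELECT ?movie WHERE {
  ?movie a dbo:Film ;
         dbo:releaseDate ?date .
  FILTER (year(?date) = 2016)
}
"""

def personalise(query: str, preferences: dict) -> str:
    """Append preference-derived triple patterns before the closing brace."""
    extra = []
    for prop, values in preferences.items():
        var = "?" + prop.split(":")[-1]
        extra.append(f"  ?movie {prop} {var} .")
        extra.append(f"  VALUES {var} {{ {' '.join(values)} }}")
    body, brace, tail = query.rpartition("}")
    return body + "\n".join(extra) + "\n" + brace + tail

# Bob's profile states an interest in romantic movies.
bob = {"dbo:genre": ["dbr:Romance_film"]}
print(personalise(BASE_QUERY, bob))
```

The personalised query now only matches 2016 films whose genre is among Bob's preferred genres; a softer variant could instead add an `OPTIONAL` pattern and rank matches higher rather than filtering the rest out.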
2. BACKGROUND AND RELATED WORK
The goal of question answering systems is "to allow users to ask questions in Natural Language (NL), using their own terminology, and receive a concise answer" [11]. Recent years have witnessed the transfer of question answering techniques used for the traditional Web or local systems to the development of semantic query answering systems on the Web of Linked Data [6], which take queries expressed in natural language and a given ontology as input, and return answers drawn from one or more datasets that subscribe to that ontology [1]. Most query answering systems rely on ontology-specific approaches, where the power of ontologies as a model of knowledge is directly exploited for query analysis and translation. AquaLog [8], in particular, allows users to choose an ontology and then ask natural language queries with respect to the universe of discourse covered by that ontology. It identifies ontology mappings for all the terms and relations in the triple patterns of a SPARQL query by means of string-based comparison methods and WordNet. AquaLog uses generalisation rules to learn novel associations between the natural language relations used by the users and the ontology structure. Lopez et al. [1] compared several ontology-based question answering systems in a study based on a set of criteria including degree of customization, and revealed that most semantic question answering systems (such as QuestIO [12], FreyA [13], and Querix [14]) do not support customization, whilst QACID [15] and ORAKEL [16] consider some level of domain customisation that has to be performed or supervised by domain experts. For example, QACID is based on a collection of queries from a given domain that are categorised into clusters, where each cluster, containing alternative formulations of the same query, is manually associated with SPARQL queries. None of the mentioned question answering systems, however, takes the users' interests and preferences into consideration.

With regard to query personalisation studies, Koutrika and Ioannidis [17] presented an approach to query personalisation in digital libraries over relational databases. They treated query personalisation as a query-rewriting problem and provided an algorithm that produces a personalised version of any query. They captured user preferences as query-rewriting rules with assigned weights that indicate user interest. In [18], the authors formulated the Constrained Query Personalisation (CQP) approach as a state-space search problem to build a set of personalised queries dynamically, taking the following features into account: the queries issued, the user's interest in the results, response time, and result size. Gheorghiu et al. [19] presented a hybrid preference model that combines quantitative and qualitative preferences into a unified model using an acyclic graph, called the HYPRE Graph, to personalise query results. They implemented a framework using the Neo4j graph database system and experimentally evaluated it using real data extracted from DBLP. The above-mentioned studies did not implement their approaches on the Web of Data to leverage the connectivity and availability of datasets and improve their results. However, we believe the preference model described in [20] can be utilised in the development of a query answering system for Linked Data. We will also leverage the extensive survey performed by Lopez et al. [1] in our implementation to precisely identify the strengths and weaknesses of other approaches, with the intent of designing a robust system. Moreover, to convert user questions to SPARQL, we will investigate the possibility of using text-to-SPARQL approaches such as AutoSPARQL [21], which implements an active learning approach using the Query Tree Learner (QTL) algorithm.

3. PERSONALISED QUESTION ANSWERING FRAMEWORK
Searching for information on the Web of Data requires user-friendly approaches, similar in ease to keyword-based search engines but relying on RDF. In this vein, Question Answering systems are typically proposed to retrieve the best possible answers for end users. Question Answering on Linked Data has recently been studied by researchers along with the associated challenges in the domain [12-15]. Current query personalisation systems mostly concern semi-structured or unstructured data and, to the best of our knowledge, a query personalisation approach on the Web of Data has not been considered yet. Providing an enriched knowledge base is another step toward developing a question answering system, and it can be fulfilled by linking different datasets to each other or to external knowledge on the Web.

Generally speaking, query personalisation in a question answering system falls into two categories: a) information filtering systems, wherein a stored query or set of queries comprises a user profile, based on which the system collects and distributes relevant information; and b) recommendation systems, which produce predictions, recommendations, and opinions that help a user evaluate or select a set of entities, with the system identifying other similar entities based on which recommendations or predictions are produced regarding what the user would like.

Our approach to personalising user queries falls into the first category and relies on a quantitative approach aiming at an absolute formulation of user preferences, such as "a user likes comedies very much and westerns to a lesser degree". This allows for a total ordering of results and the straightforward selection of those answers matching user preferences. We may also use query personalisation techniques that reveal implicit knowledge about the user's interests when incomplete information in the user profile prevents us from retrieving appropriate knowledge for query customisation [20]. Figure 1 outlines the main components and flows of the proposed approach, wherein we analyse the questions, customise them based on the user's preferences and profile, extract the answers from a set of linked datasets, and finally score the results as well as visualise them for users. Below we explain how we implement each phase of the proposed framework.

3.1 Question Analysis
With respect to the question analysis phase, several NLP techniques can be used to convert user questions to SPARQL. In particular, the underlying idea of AutoSPARQL [21] is an interesting solution for converting a natural language expression into a SPARQL query, which can then retrieve the answers to a question from a given triple store. Our strategy for both syntactic and semantic analysis of questions is not to implement software from scratch to convert the user question into a SPARQL query; instead, we intend to apply one of the existing approaches (e.g. AutoSPARQL, GATE2, or the approach in [22]) to select features from the question, extract and classify them, and support the transformation of the question into SPARQL. To provide support for multiple languages, we intend to follow the approach taken in QALD-4 [23] by annotating the questions with a set of keywords in an XML or RDF format. A language detection step is also appropriate for identifying the user's language and customising the results accordingly.

2 https://gate.ac.uk/ [Accessed: 28-Feb-2017]
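The feature-selection and transformation steps just described can be illustrated with a deliberately small sketch. This is a toy stand-in, not AutoSPARQL or GATE: the keyword-to-class mapping and the `dbo:` names are hypothetical, and a real system would classify features with the approaches cited above rather than a lookup table.

```python
# Toy illustration of question analysis: select features (a target class
# and an optional year) from a question and fill a SPARQL template.
# CLASS_MAP is a hypothetical stand-in for the classification step.

import re

CLASS_MAP = {"movies": "dbo:Film", "books": "dbo:Book"}

def analyse(question: str) -> dict:
    """Select features: a target class keyword and an optional year."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    cls = next((CLASS_MAP[t] for t in tokens if t in CLASS_MAP), None)
    year = next((t for t in tokens if re.fullmatch(r"(19|20)\d\d", t)), None)
    return {"class": cls, "year": year}

def to_sparql(features: dict) -> str:
    """Transform the extracted features into a SPARQL query string."""
    cls = features["class"] or "owl:Thing"   # fallback when no class matched
    query = f"SELECT ?x WHERE {{ ?x a {cls} . "
    if features["year"]:
        query += (f"?x dbo:releaseDate ?d . "
                  f"FILTER (year(?d) = {features['year']}) ")
    return query + "}"

features = analyse("best movies of 2016?")
print(to_sparql(features))
```

The resulting query for Bob's question constrains the answers to films released in 2016; the personalisation phase would then enrich it further with profile preferences.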
3.2 Query Personalisation
For the query personalisation phase, the idea is to design and implement a user preference model (based on current well-designed preference models, e.g. [20]) and customise the query according to the user's interests stated in the user profile. With the user preferences expressed in one of the Linked Data vocabularies (including but not limited to FOAF or FRAP [24]), the query analyser analyses the query (presented in SPARQL as the output of the previous step) and customises it according to the designed user preference model. The output of this phase is a new SPARQL query, which is the input of the answer extraction service.

3.3 Answer Extraction Service and Reasoning
To extract the answers from a set of linked datasets, we intend to apply a reasoning engine that uses description logic as its basic formalism and relies on one of the OWL 2 profiles (e.g. OWL 2 QL) as the ontology logic. The idea is to select a reasoner that provides completeness and decidability of the reasoning problems, offers computational guarantees, and has more efficient reasoning support than other formalisms. As we will follow a rule-based reasoning engine, a homogeneous approach will be applied to achieve a tight semantic integration, embedding rules and ontology in a common logical ground. We will also utilise either SWRL [25] or RIF [26] as the rule language of the framework in the knowledge layer of this phase, and the Jena-Pellet reasoner as our reasoning engine.

Figure 1. Personalised question answering framework

3.4 Answer Scoring and Visualisation
One of the technologies that can be applied for answer scoring uses the lexical answer type. DeepQA3, the IBM project in NLP, includes a system that takes a candidate answer along with a lexical answer type and returns a score indicating whether the candidate answer can be interpreted as an instance of the answer type. This system utilises the WordNet or DBpedia datasets to search for a hyponymy, instance-of, or synonymy link between an answer and its lexical type. We will extend this approach to discover links between the answers and the user's profile or preferences. This phase also has a visualisation service to present the final candidate answers (those best matched to the user's profile) to the user.

3.5 Evaluation
To evaluate the proposed query answering system, we intend to utilise the QALD evaluation approach [6], which provides a common evaluation benchmark and allows for an in-depth analysis of a question answering system and its progress over time. In this benchmark, the task for our system would be to return, for a given natural language question and an RDF data source, a list of entities that answer the question, where entities are either individuals identified by URIs or labels, or literals such as strings, numbers, dates, and Booleans. We will extend the benchmark to evaluate the closeness of the query results to the user's preferences or profile. In particular, QALD-4 [23] provides multilingual questions in seven different languages, which helps us to cover users' languages in their preferences and answer the corresponding questions accordingly. Moreover, QALD-5 [28] allows users to annotate hybrid questions with several attributes, including answer type and aggregation. Extending the question annotations by adding further attributes associated with the user's profile allows the question answering system to consider the user preferences in the process.

3 https://www.research.ibm.com/deepqa/deepqa.shtml [Accessed: 28-Feb-2017]
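Such an annotation extension could look like the following sketch. The element and attribute names (`profile`, `preference`) are our own invention for illustration, not the official QALD annotation schema.

```python
# Hypothetical sketch of a QALD-style question annotation extended with
# user-profile attributes; the <profile> element is our proposed extension
# and not part of the official QALD format.

import xml.etree.ElementTree as ET

def annotate(qid: str, text: str, lang: str, profile: dict) -> str:
    """Serialise a question plus profile preferences as an XML annotation."""
    q = ET.Element("question", id=qid, answertype="resource")
    ET.SubElement(q, "string", lang=lang).text = text
    prof = ET.SubElement(q, "profile")  # proposed profile extension
    for key, value in profile.items():
        ET.SubElement(prof, "preference", name=key).text = value
    return ET.tostring(q, encoding="unicode")

xml = annotate("q1", "best movies of 2016?", "en",
               {"genre": "romance", "language": "en"})
print(xml)
```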
According to this approach, to measure the overall performance on a question q, we consider the following three metrics (precision, recall, and relevance):

Recall(q) = (number of correct system answers for q) / (number of gold standard answers for q)

Precision(q) = (number of correct system answers for q) / (number of system answers for q)

Relevance(q, u) = ( Σ_{i=1}^{n} Relevance(q, a_i) ) / (number of correct system answers for q)

where Relevance(q, u) (0 <= value <= 1) is the total similarity (according to the user profile) of all the correct answers a_i to question q, as rated by user u.

Relevance(q) = ( Σ_{i=1}^{n} Relevance(q, u_i) ) / (number of rated users)

where Relevance(q) is the total relevance over all users (u_1, ..., u_n) for question q.

The overall F-measure in our approach is computed as follows:

F-Measure(q) = ( 2 × Precision(q) × Recall(q) ) / ( Precision(q) + Recall(q) ) × Relevance(q)

The gold standard answers in our system are defined as the answers that best match the user preferences.
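The per-question metrics can be transcribed directly, under our reading of the formulas above; per-answer relevance ratings are assumed to lie in [0, 1].

```python
# Direct transcription of the evaluation metrics: precision, recall, a
# per-user relevance average over the correct answers, and the overall
# F-measure weighted by relevance.

def recall(correct: int, gold: int) -> float:
    return correct / gold

def precision(correct: int, returned: int) -> float:
    return correct / returned

def relevance_for_user(answer_ratings: list, correct: int) -> float:
    # Sum of Relevance(q, a_i) over the correct answers a_i, rated by one user.
    return sum(answer_ratings) / correct

def f_measure(p: float, r: float, rel: float) -> float:
    # Harmonic mean of precision and recall, weighted by relevance.
    return (2 * p * r) / (p + r) * rel

p, r = precision(4, 5), recall(4, 8)
rel = relevance_for_user([1.0, 0.5, 0.5, 1.0], 4)
print(round(f_measure(p, r, rel), 3))  # → 0.462
```

Note that because the F-measure is multiplied by relevance, a system returning correct but preference-irrelevant answers is penalised even at perfect precision and recall.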
3.6 Evaluation Scenario
To select a set of linked datasets for evaluating and testing the proposed question answering system, we formulated a set of criteria to assess the abilities and robustness of the system: containing large-scale data, multilinguality, ontological structure, and linkability were among these criteria. Our knowledge base will be chosen from one or more of the following datasets:

- DBpedia4, the central interlinking hub of the emerging Linked Data cloud [29]. The English version of DBpedia describes around 4.6 million things, and the dataset has been linked to 41.2 million entities in YAGO categories5.
- MusicBrainz6, a collaborative open-content music dataset, which contains all of MusicBrainz's artists and albums as well as a subset of its tracks, leading to a total of around 15 million RDF triples.
- The British National Bibliography7 (BNB) dataset, which publishes books and digital objects as Linked Data by the British Library, linked to external sources including GeoNames8.
Currently, BNB includes around 3.1 million descriptions (more than 109 million triples) of books and serials published in the UK over the last 60 years.
- The WordNet [30] dataset, with hundreds of thousands of facts, which provides concepts (called synsets), each representing the sense of a set of synonymous words.

4 http://dbpedia.org [Accessed: 28-Feb-2017]
5 http://www.mpi-inf.mpg.de/yago-naga/yago/ [Accessed: 28-Feb-2017]
6 http://musicbrainz.org/ [Accessed: 28-Feb-2017]
7 http://bnb.bl.uk/ [Accessed: 28-Feb-2017]
8 http://www.geonames.org/ [Accessed: 28-Feb-2017]

The presented framework is domain independent. However, to evaluate the functionality of the system, we intend to apply the mentioned dataset(s) in an educational setting wherein students do their homework or research by answering a set of questions. First, we will provide a set of in-scope and out-of-scope test questions (e.g., 30 questions). To assess the efficiency and robustness of the proposed system, some in-scope questions will require linking datasets to be answered, while others will fall outside the scope of the knowledge base (out-of-scope). For example, for the question "list of most sold books in 2013", the system will explore more than one dataset to discover the answers. On the other hand, students will set up their profiles and specify a set of preferences that can be applied for their research purposes. For example, information such as a student's grade, language, and field of study will be specified in his/her profile, and the student's interests, such as favourite subjects, music, and books, will also be provided in the system. Figure 2 illustrates a prototype of what the final system will look like, wherein the answers have been personalised based on the user's interests and preferences on the right side of the picture. Students will select some case questions, and the system will provide them with a set of candidate answers based on their preferences and profile. Eventually, the students will be asked to rate the results; that is, we will measure the similarity between the generated answers and what the students expect to see as the output of the system. This metric will be used to evaluate the accuracy of the proposed system.

Figure 2. A prototype for the personalised query answering framework on the Web of Data

4. CONCLUSION AND FUTURE WORK
This paper described a personalised question answering framework to improve the results of a question answering system based on a user's preferences and interests. We also proposed a relevancy metric, extending the QALD-5 scoring system, to measure the similarity between the answers and the user profile. The proposed framework will be implemented on the Web of Data, where the question answering system uses a set of linked datasets, an API for converting questions to SPARQL queries, and a robust answer scoring system to obtain the results of most interest to users.

ACKNOWLEDGEMENT
This study is partially supported by Science Foundation Ireland (Grant 13/RC/2106) as part of the ADAPT Centre for Digital Content Technology Platform Research (http://www.adaptcentre.ie/) at Trinity College Dublin.

REFERENCES
[1] V. Lopez, V. Uren, M. Sabou, and E. Motta, "Is Question Answering Fit for the Semantic Web?: A Survey," Semantic Web, vol. 2, no. 2, pp. 125–155, 2011.
[2] S. Dumais, M. Banko, E. Brill, J. Lin, and A. Ng, "Web Question Answering: Is More Always Better?," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, pp. 291–298, 2002.
[3] C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee, "Linked Data on the Web (LDOW2008)," in Proceedings of the 17th International Conference on World Wide Web, New York, NY, USA, pp. 1265–1266, 2008.
[4] S. Shekarpour, K. M. Endris, A. Jaya Kumar, D. Lukovnikov, K. Singh, H. Thakkar, and C. Lange, "Question answering on linked data: Challenges and future directions," in Proceedings of the 25th International Conference Companion on World Wide Web, pp. 693–698, 2016.
[5] G. Koutrika and Y. Ioannidis, "Personalization of queries in database systems," in Proceedings of the 20th International Conference on Data Engineering, pp. 597–608, 2004.
[6] V. Lopez, C. Unger, P. Cimiano, and E. Motta, "Evaluating question answering over linked data," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 21, pp. 3–13, Aug. 2013.
[7] U. Shah, T. Finin, A. Joshi, R. S. Cost, and J. Matfield, "Information Retrieval on the Semantic Web," in Proceedings of the Eleventh International Conference on Information and Knowledge Management, New York, NY, USA, pp. 461–468, 2002.
[8] V. Lopez, M. Pasin, and E. Motta, "AquaLog: An Ontology-Portable Question Answering System for the Semantic Web," in The Semantic Web: Research and Applications, A. Gómez-Pérez and J. Euzenat, Eds. Springer Berlin Heidelberg, pp. 546–562, 2005.
[9] R. Fikes, P. Hayes, and I. Horrocks, "OWL-QL—a language for deductive query answering on the Semantic Web," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 2, no. 1, pp. 19–29, Dec. 2004.
[10] I. Horrocks and S. Tessaris, "Querying the Semantic Web: A Formal Approach," in International Semantic Web Conference, Springer Berlin Heidelberg, pp. 177–191, 2002.
[11] L. Hirschman and R. Gaizauskas, "Natural Language Question Answering: The View from Here," Natural Language Engineering, vol. 7, no. 4, pp. 275–300, Dec. 2001.
[12] V. Tablan, D. Damljanovic, and K. Bontcheva, "A Natural Language Query Interface to Structured Information," in European Semantic Web Conference (ESWC), Springer Berlin Heidelberg, pp. 361–375, 2008.
[13] D. Damljanovic, V. Tablan, and K. Bontcheva, "A Text-based Query Interface to OWL Ontologies," in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.
[14] E. Kaufmann, A. Bernstein, and R. Zumstein, "Querix: A Natural Language Interface to Query Ontologies Based on Clarification Dialogs," in 5th International Semantic Web Conference (ISWC), pp. 980–981, 2006.
[15] Ó. Ferrández, R. Izquierdo, S. Ferrández, and J. L. Vicedo, "Addressing ontology-based question answering with collections of user queries," Information Processing & Management, vol. 45, no. 2, pp. 175–188, Mar. 2009.
[16] P. Cimiano and M. Minock, "Natural Language Interfaces: What Is the Problem? – A Data-Driven Quantitative Analysis," in Proceedings of the International Conference on Application of Natural Language to Information Systems, Springer Berlin Heidelberg, pp. 192–206, 2009.
[17] G. Koutrika and Y. Ioannidis, "Rule-based query personalization in digital libraries," International Journal on Digital Libraries, vol. 4, no. 1, pp. 60–63, Aug. 2004.
[18] G. Koutrika and Y. Ioannidis, "Constrained Optimalities in Query Personalization," in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 73–84, 2005.
[19] R. Gheorghiu, A. Labrinidis, and P. K. Chrysanthis, "Unifying Qualitative and Quantitative Database Preferences to Enhance Query Personalization," in Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web, New York, NY, USA, pp. 6–8, 2015.
[20] G. Koutrika, E. Pitoura, and K. Stefanidis, "Preference-Based Query Personalization," in Advanced Query Processing, B. Catania and L. C. Jain, Eds. Springer Berlin Heidelberg, pp. 57–81, 2013.
[21] J. Lehmann and L. Bühmann, "AutoSPARQL: Let Users Query Your Knowledge Base," in Proceedings of ESWC, 2011.
[22] D. Song, F. Schilder, C. Smiley, and C. Brew, "Natural language question answering and analytics for diverse and interlinked datasets," in The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 101–105, 2015.
[23] P. Cimiano, V. Lopez, C. Unger, E. Cabrio, A.-C. N. Ngomo, and S. Walter, "Multilingual Question Answering over Linked Data (QALD-3): Lab Overview," in Information Access Evaluation. Multilinguality, Multimodality, and Visualization, pp. 321–332, 2013.
[24] L. Polo, I. Mínguez, D. Berrueta, C. Ruiz, and J. M. Gómez, "User preferences in the web of data," Semantic Web, vol. 5, no. 1, pp. 67–75, Jan. 2014.
[25] "SWRL: A Semantic Web Rule Language Combining OWL and RuleML." [Online]. Available: https://www.w3.org/Submission/SWRL/. [Accessed: 22-Aug-2016].
[26] "RIF Overview (Second Edition)." [Online]. Available: https://www.w3.org/TR/rif-overview/. [Accessed: 03-Dec-2016].
[27] C. Unger et al., "Question Answering over Linked Data (QALD-4)," presented at the Working Notes for CLEF 2014 Conference, 2014.
[28] C. Unger et al., "Question Answering over Linked Data (QALD-5)," in Working Notes of CLEF 2015 – Conference and Labs of the Evaluation Forum, vol. 1391, 2015.
[29] E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, "Analyzing broken links on the web of data: An experiment with DBpedia," Journal of the Association for Information Science and Technology, vol. 65, no. 8, pp. 1721–1727, Aug. 2014.
[30] "RDF/OWL Representation of WordNet." [Online]. Available: https://www.w3.org/TR/2006/WD-wordnet-rdf-20060619/. [Accessed: 18-Jan-2017].