=Paper=
{{Paper
|id=Vol-1963/paper607
|storemode=property
|title=QAestro Framework - Semantic Composition of QA Pipelines
|pdfUrl=https://ceur-ws.org/Vol-1963/paper607.pdf
|volume=Vol-1963
|authors=Kuldeep Singh,Ioanna Lytra,Kunwar Abhinav Aditya,Maria Esther Vidal
|dblpUrl=https://dblp.org/rec/conf/semweb/SinghLAV17
}}
==QAestro Framework - Semantic Composition of QA Pipelines==
Kuldeep Singh<sup>1,2</sup>, Ioanna Lytra<sup>1,2</sup>, Kunwar Abhinav Aditya<sup>2</sup>, Maria-Esther Vidal<sup>1,2</sup>

<sup>1</sup> Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Germany

<sup>2</sup> Institute for Applied Computer Science, University of Bonn, Germany

kuldeep.singh@iais.fraunhofer.de, {lytra,vidal}@cs.uni-bonn.de, s6kuadit@uni-bonn.de

Abstract. Many question answering systems and related components have been developed in recent years. Since question answering involves several tasks and subtasks that are common to many systems, existing components can be combined in various ways to build tailored question answering pipelines. The QAestro framework provides the tools to semantically describe question answering components and to automatically generate possible pipelines given developer requirements. We demonstrate the functionality of the QAestro framework for building question answering pipelines that include different tasks and components. Attendees will be able to semantically describe question answering pipelines and integrate them into existing frameworks. A video of the demonstration is available online<sup>3</sup>.

Keywords: Question Answering, Software Reusability, SAT solver

===1 Introduction===
Question answering (QA) systems allow users to extract useful information from linked open data sets such as DBpedia. At an abstract level, QA systems perform tasks such as question analysis, query generation, and answer generation as part of their QA pipeline [3]. A QA system developer implements these tasks either by dedicating a separate QA component to each task or by combining a few tasks into one component [3]. Hence, components that perform one QA task in a QA system can be reused in other QA systems. The Qanary ecosystem [2] supports the reusability of such QA components.
However, there is no systematic way to describe existing QA components, whether standalone or parts of other QA systems, based on their functionality, i.e., the task they perform. Therefore, with the increasing number of QA components, identifying all viable combinations of components when creating new QA pipelines requires a manual and time-consuming search in the large combinatorial space of solutions.

We demonstrate QAestro, a framework that addresses the QA pipeline composition problem by casting it as the query rewriting problem [1] and leveraging state-of-the-art SAT solvers [3]. QAestro helps QA developers to semantically describe QA components, and to express developer requirements based on these semantic descriptions; a controlled vocabulary is utilized to model QA tasks and is exploited in the description of the QA components. Attendees will be able to semantically compose QA pipelines and reuse them in frameworks like Qanary, OKBQA, or QALL-ME [3]. Moreover, they will observe how QA component descriptions can be exploited to enhance reusability.

<sup>3</sup> https://www.youtube.com/watch?v=9lhamebx7JM&feature=youtu.be

Fig. 1: The QAestro Architecture. QAestro receives as input a QA developer requirement and a set of rules describing QA components, and produces all the valid compositions that implement this requirement.

===2 The QAestro Architecture===
QAestro is built on top of MCDSAT [3], which relies on state-of-the-art SAT solvers to efficiently enumerate compositions of QA components. QAestro allows a developer to build a QA pipeline on demand; it utilizes a controlled vocabulary which encodes the properties of generic QA tasks and is used to semantically describe QA components.
Further, it enumerates the compositions of the QA components that implement a given QA developer requirement. Fig. 1 illustrates the architecture of QAestro. It accepts as input a QA developer requirement, which is a conjunctive query over the QA tasks that a developer wants to implement. Furthermore, all the mappings of existing QA components to QA tasks serve as input to the QAestro encoder, which translates both inputs into an instance of the Query Rewriting Problem (QRP) and passes it to MCDSAT. MCDSAT encodes the instance of the QRP into a CNF theory in such a way that the models of this theory correspond to solutions of the QRP. A SAT solver is used in MCDSAT to enumerate all valid query rewritings, which correspond to the models of the CNF theory. The output of QAestro is the set of valid compositions of QA components for the given developer requirement. For a detailed account of the semantic descriptions of QA components and developer requirements, and for an empirical evaluation of QAestro, the reader can refer to [3].

===3 Demonstration of Use Case===
We motivate our demonstration by considering the problem of semantic composition of a QA pipeline. Let us consider two semantically described QA components:

AIDA($y, z) :- disambiguation(x, y, z, t), question(y), disEntity(z)

StanfordNER($y, x) :- recognition(y, x), question(y), entity(x)

Fig. 2: QAestro Demo. A QA developer wants to implement Named Entity Recognition (NER) and Named Entity Disambiguation (NED) tasks together with a constraint that the NED component accepts question and entity as input.

<sup>4</sup> https://gate.d5.mpi-inf.mpg.de/webaida/

These rules state the following functionalities of the AIDA and Stanford NER components: (i) AIDA<sup>4</sup> implements the QA task of disambiguation; a question is received as input (marked with $), and a disambiguated entity is produced as output; (ii) Stanford NER<sup>5</sup> implements the QA task of entity recognition; it takes a question as input and outputs the entities spotted in the question.
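As a rough illustration of what these two rules express, the following Python sketch transcribes them into plain records and enumerates compositions by brute force. It is not how QAestro or MCDSAT work internally (they encode the problem as a CNF theory and use a SAT solver); the names `Component`, `COMPONENTS`, `compose`, and `required_inputs` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    name: str
    task: str            # controlled-vocabulary term for the QA task
    inputs: frozenset    # concepts of the arguments marked with $ in the rule head
    outputs: frozenset   # concepts of the remaining head arguments


# The two rules quoted above, transcribed by hand (assumed reading of the rules).
COMPONENTS = [
    Component("AIDA", "disambiguation",
              frozenset({"question"}), frozenset({"disEntity"})),
    Component("StanfordNER", "recognition",
              frozenset({"question"}), frozenset({"entity"})),
]


def compose(tasks, components, required_inputs=None):
    """Return every tuple of component names that covers `tasks` in order.

    `required_inputs` optionally maps a task to the concepts that the
    component chosen for that task must accept as input.
    """
    required_inputs = required_inputs or {}
    candidates = []
    for task in tasks:
        needed = frozenset(required_inputs.get(task, ()))
        candidates.append(
            [c for c in components if c.task == task and needed <= c.inputs]
        )
    # Cartesian product over the per-task candidate lists.
    return [tuple(c.name for c in combo) for combo in product(*candidates)]


unconstrained = compose(["recognition", "disambiguation"], COMPONENTS)
constrained = compose(
    ["recognition", "disambiguation"],
    COMPONENTS,
    required_inputs={"disambiguation": {"question", "entity"}},
)
print(unconstrained)  # [('StanfordNER', 'AIDA')]
print(constrained)    # []
```

With only these two components, the constrained request yields no composition, because AIDA as described accepts a question but not an entity as input. A component whose rule marks both a question and an entity as inputs would be needed to satisfy such a constraint.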
In these rules, disEntity, recognition, disambiguation, etc. are controlled vocabulary terms in QAestro. Using similar rules, we have semantically described 51 QA components from 20 QA systems. Consider a QA system developer requirement to implement two QA tasks, for instance, NER and NED. Such a requirement can be semantically described as:

QADevReq :- recognition, disambiguation

<sup>5</sup> http://nlp.stanford.edu:8080/ner/

<sup>6</sup> http://wdaqua.eu/QAestro/qasystems/

Using 51 QA components from 20 QA systems<sup>6</sup> and 30 different QA system developer requirements, we will demonstrate the following use cases:

– QA developer requirement to implement one QA task. Developers will be able to express requirements similar to the one in the previous example and implement a pipeline of one QA task. QAestro allows a developer to choose which task she wants to implement and returns the list of components that are associated with the chosen task. The semantic descriptions of the components are presented as well.

– QA developer requirement to implement two QA tasks. We further demonstrate how QAestro can semantically compose the possible combinations of components when a developer wants to implement two tasks together (e.g., named entity recognition and named entity disambiguation). This can be done with and without constraints. With the "with constraint" option, the developer has a specific requirement, e.g., "only return the components which accept a particular input in one of the tasks". For example, while implementing the NER and NED tasks together, the developer wants to know the compositions of the QA pipeline in which the NED component accepts an entity as input from the NER task. In this case, QAestro will not return all the components implementing these two tasks, but only the QA components which produce an entity as output in the NER task and the QA components which accept this entity as input in the NED task.
Hence, by demonstrating this use case, we illustrate the power of a SAT solver in QAestro to check whether the interpretation of developers' requests holds.

– QA developer requirement to implement three, four, and five QA tasks. In this use case, we demonstrate how QAestro returns valid compositions of QA components implementing three or more QA tasks based on developer requirements. For example, if a developer seeks to implement NER, NED, and query generation (i.e., the components which build a SPARQL query from disambiguated resources) together and to see the possible compositions of QA components, QAestro will return all viable compositions. If the developer has constraints on a particular task (e.g., on its input or output), QAestro respects them as well. Similarly, we demonstrate QAestro for four and five QA tasks.

All the use cases of QAestro are publicly available<sup>7</sup>.

===4 Conclusion===
We demonstrate the QAestro framework, which allows developers to semantically describe QA components as well as developer requirements, and to compose a QA pipeline from reusable QA components. We illustrate the functionality of QAestro using 51 semantically described QA components and different developer requirements. Moreover, attendees will be able to specify QA pipelines and integrate them into existing QA frameworks.

===Acknowledgement===
Parts of this work received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795, project: Answering Questions using Web Data (WDAqua).

===Bibliography===
[1] A. Y. Halevy. Answering queries using views: A survey. VLDB J., 10(4), 2001.

[2] K. Singh, A. Both, D. Diefenbach, S. Shekarpour, D. Cherix, and C. Lange. Qanary - the fast track to creating a question answering system with linked data technology. In ESWC 2016 Satellite Events.

[3] K. Singh, I. Lytra, M.-E. Vidal, D. Punjani, H. Thakkar, C. Lange, and S. Auer. QAestro – semantic-based composition of question answering pipelines. In DEXA 2017.

<sup>7</sup> http://wdaqua.eu/QAestro/