<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Open Question Answering Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edgard Marx</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Soru</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Esteves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AKSW, Institute of Computer Science, University of Leipzig</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Question answering systems are currently regarded as one of the key technologies to empower lay users to access the Web of Data. Nevertheless, they still pose hard and yet unsolved research challenges. Additionally, developing and evaluating question answering systems to address those challenges is a very complex task. We present openQA, a modular open-source platform that allows developing and evaluating question answering systems. We show the different interfaces implemented by openQA as well as the different processing steps from a question to the corresponding answer. We demonstrate how openQA can be used to implement, integrate, evaluate and instantiate question answering systems easily by combining two state-of-the-art question answering approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The goal of the demonstration will be to show how to integrate, evaluate and
instantiate a QA application using the openQA platform. In addition, we will show
how openQA implements the different processing steps from a natural-language (NL)
query to the corresponding answer. In the demonstration, we also go over some of
the implemented plug-ins and applications, in particular SINA [3] and TBSL [5].
At the end, we will discuss the benefits and disadvantages of the platform as well
as how we plan to address them in the future. In the following, we detail the core
parts of the framework that will be explicated in the demo.</p>
      <sec id="sec-1-1">
        <title>2.1 Answer Formulation</title>
        <p>The main workflow of the openQA framework is implemented in the Answer
Formulation module. It encompasses the process of formulating an answer from
a given input query (see 1 in Figure 1) and comprises three stages:
1. Interpretation: The first and crucial step of the core module is the
interpretation. Here, the framework attempts to generate a formal representation
of the (intention behind the) input question. By these means, openQA also
determines how the input question will be processed by the rest of the system.
Because the interpretation process is not trivial, there is a wide variety of
techniques that can be applied at this stage, such as tokenization,
disambiguation, internationalization, logical forms, semantic role labeling, question
reformulation, coreference resolution, relation extraction and named entity
recognition, amongst others. The interpretation stage can generate one or
more interpretations of the same input in different formats such as SPARQL,
SQL or string tokens.
2. Retrieval: After an interpretation of the given question is generated, the
retrieval stage extracts answers from sources according to the delivered
interpretation format or content. Specific interpretations can also be used for
extracting answers from sources such as web services.
3. Synthesis: Answers may be extracted from different sources. Thus, they might
be ambiguous and redundant. To alleviate these issues, the Synthesis stage
processes all information from the different retrieval sub-modules. Results that
appear multiple times are fused with an occurrence attribute that helps to
rank, cluster and estimate the confidence of the retrieved answer candidates
(see 3 - 5 in Figure 1).</p>
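        <p>The three stages above can be sketched as a minimal pipeline. All names below are illustrative stand-ins for pluggable openQA components, not the actual openQA API; the retrieval results are hard-coded for demonstration:</p>

```python
from collections import Counter

def interpreters(question):
    """Interpretation: each plug-in yields a formal representation,
    here tagged with its format (e.g., a SPARQL string or tokens)."""
    return [("sparql", "SELECT ?x WHERE { ... }"),
            ("tokens", question.lower().split())]

def retrieve(interpretation):
    """Retrieval: dispatch on the interpretation format and return
    answer candidates (hard-coded here instead of querying a source)."""
    fmt, _ = interpretation
    fake_results = {"sparql": ["Berlin", "Leipzig"], "tokens": ["Berlin"]}
    return fake_results.get(fmt, [])

def synthesize(answer_lists):
    """Synthesis: fuse duplicate answers; the occurrence count acts as
    the attribute used to rank and estimate confidence."""
    counts = Counter(a for answers in answer_lists for a in answers)
    return counts.most_common()  # [(answer, occurrences), ...], best first

ranked = synthesize(retrieve(i) for i in interpreters("Cities in Germany?"))
# "Berlin" is returned by both retrievers, so it is ranked first.
```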
        <p>
          In addition to the Answer Formulation, there are two other modules: the
(1) Service and the (2) Context. The Context module allows a more personalized
answer and contains information such as the user's location, statistics, preferences
and previous queries. It is useful, for instance, to rank answers or to determine the
language of the answer, e.g., in a system with support for internationalization.3
Thus, the same query can generate different answers in different contexts. The
Service modules are designed to easily add, extend and share common features
among the modules, e.g., a common cache structure. Both modules are accessible
by any of the components in the main workflow via dependency injection.
        </p>
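        <p>The role of the injected Context can be illustrated with a small sketch. The class and field names are assumptions chosen for illustration, not the actual openQA classes:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Context:
    # Illustrative context object: user language, location and history.
    language: str = "en"
    location: Optional[str] = None
    previous_queries: List[str] = field(default_factory=list)

class Interpreter:
    """A workflow component receiving the shared Context via
    constructor (dependency) injection."""
    def __init__(self, context: Context):
        self.context = context

    def interpret(self, question: str) -> dict:
        # The context personalizes processing: record the query history
        # and choose the answer language from the user's settings.
        self.context.previous_queries.append(question)
        return {"question": question,
                "answer_language": self.context.language}

ctx = Context(language="de", location="Leipzig")
result = Interpreter(ctx).interpret("Wo liegt Leipzig?")
# The same question in a context with language="en" would yield an
# English answer instead.
```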
        <p>We will explain the workflow of the openQA platform using examples.4</p>
        <sec id="sec-1-1-1">
          <title>2.2 Combining and Evaluating</title>
          <p>An analysis of a theoretical combination of all the participant systems in Question
Answering over Linked Data 3 (QALD-3) shows that the number of correct
answers can be improved by 87.5% [2]. Moreover, we implement two different
interpreter modules using SINA and TBSL. Our evaluation shows that the
combination of the two interpreters leads to an improvement of 11% in correctly
answered queries in QALD-4 when compared with their stand-alone versions.</p>
          <p>The openQA test suite. There are different aspects in evaluating question
answering systems, e.g., runtime, accuracy and usability. Regarding accuracy,
there are various benchmarks, e.g., SemanticSearch'105 and QALD.6
SemanticSearch'10 is based on user queries extracted from the Yahoo search log, with
an average distribution of 2.2 words per query. QALD is designed by experts
and focuses on accurately producing an answer as well as generating a formal
representation of it in the form of a SPARQL query. Therefore, the queries in
QALD are more elaborate than those in SemanticSearch'10. The openQA
platform implements a built-in test suite to evaluate question answering systems
using QALD benchmarks. The test suite enables users to measure the accuracy
and runtime performance of the system. Furthermore, to facilitate debugging and
development, it is also possible to trace the stack of each entry in the generated
answer.</p>
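          <p>Accuracy on QALD-style benchmarks is commonly measured per question as precision, recall and F-measure over answer sets. The following sketch shows the computation; it is an illustration of the metric, not the test suite's actual code:</p>

```python
def evaluate(system_answers, gold_answers):
    """Per-question precision, recall and F1 over answer sets,
    in the style of the QALD evaluation."""
    s, g = set(system_answers), set(gold_answers)
    if not s or not g:
        return (0.0, 0.0, 0.0)
    precision = len(s & g) / len(s)   # fraction of returned answers correct
    recall = len(s & g) / len(g)      # fraction of gold answers found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return (precision, recall, f1)

# A system returning {Berlin, Paris} against the gold set {Berlin}:
p, r, f = evaluate({"Berlin", "Paris"}, {"Berlin"})
```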
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>3 http://www.w3.org/standards/webdesign/i18n
4 http://openqa.aksw.org/examples.xhtml
5 http://km.aifb.kit.edu/ws/semsearch10/
6 http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/</p>
        <p>We will show how to implement, integrate and evaluate modules in the openQA
platform using examples as well as some of the available plug-ins.</p>
        <sec id="sec-1-2-1">
          <title>2.3 Instantiating</title>
          <p>The openQA platform implements a Web server that enables an easy instantiation
of QA applications (Figure 1). Two examples of such applications using openQA are
SINA, available at http://sina.aksw.org, and the openQA demo. The openQA
demo combines several openQA plug-ins, e.g., the SINA and TBSL interpreters. It
is accessible at http://search.openqa.aksw.org. As the live demo instances use
the public DBpedia endpoint, the user's experience can be affected by instabilities.
Thus, a demo video is also available at http://openqa.aksw.org. The Web
server can display the resulting answer materialized in a web page or as a
formal SPARQL query. It also implements a customized search and a plug-in
management interface as well as a built-in RESTful interface. The REST API
serializes the result in JSON format.</p>
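          <p>A client could consume such a REST interface roughly as follows. The endpoint path and the JSON response fields are assumptions for illustration only; the sample response is parsed locally rather than fetched from the live service:</p>

```python
import json
from urllib.parse import urlencode

def build_request(question, base_url="http://search.openqa.aksw.org"):
    # Hypothetical endpoint path; the documented openQA REST paths may differ.
    return base_url + "/rest/ask?" + urlencode({"q": question})

# Hypothetical JSON answer serialization, parsed from a local sample.
sample_response = json.loads(
    '{"question": "capital of Germany",'
    ' "answers": [{"value": "Berlin", "confidence": 0.9},'
    '             {"value": "Bonn", "confidence": 0.1}]}'
)
# Pick the answer candidate with the highest confidence.
best = max(sample_response["answers"], key=lambda a: a["confidence"])
```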
          <p>We will show how to instantiate and manage the openQA Web server (see
2 in Figure 1) using the available instances.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusion</title>
      <p>
        During the demo, we will present an open-source and extensible framework that
can be used to implement, integrate, evaluate and instantiate question answering
systems easily. The presented work is part of a larger agenda to define and
develop a common platform for question answering systems. The next efforts
will consist of (1) the integration of approaches targeting unstructured data, (2)
facilitating the deployment of plug-ins, as well as (3) the workflow instantiation.
Furthermore, we want to extend the implemented showcases and the number of
systems integrated in the openQA platform.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          , P. van Kleef,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <volume>5</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          , K. Ho ner, J. Lehmann, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Towards an open question answering architecture</article-title>
          .
          <source>In ICSC, SEM '14</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>60</lpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Shekarpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>SINA: Semantic interpretation of user queries for question answering on interlinked data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>30</volume>
          (
          <issue>0</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>2015</year>
          . Semantic Search.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Höffner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>LinkedGeoData: A core for a web of spatial open data</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>354</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Unger</surname>
          </string-name>
          , L. Buhmann, J.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>A.-C. N.</given-names>
          </string-name>
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Gerber</surname>
            , and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Template-based question answering over RDF data</article-title>
          .
          <source>In 21st international conference on World Wide Web</source>
          , pages
          <fpage>639</fpage>
          -
          <lpage>648</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>