<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Offline Question Answering over Linked Data using Limited Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paramjot Kaur</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Blucher</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rricha Jalota</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Moussallem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Usbeck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science Group, University of Paderborn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Standort Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Leipzig University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Question Answering over Linked Data provides concise information to the user from a natural language request instead of flooding them with documents. However, the accessibility of Linked Data resources, e.g., SPARQL endpoints, is bound to an online connection. We present OQA, the first offline Question Answering system over Linked Data for mobile devices. We built OQA with the limited resources of an Android mobile device, such as battery power, computational power, or memory consumption, in mind. Our OQA system has three main components: 1) question analysis and 2) query generation, which identify the type of the question and transform it into a semantically meaningful data structure, i.e., a SPARQL query. Finally, the 3) query execution uses a novel mobile triple store, implemented with RDF4J. Our evaluation suggests that OQA is feasible for daily use in terms of battery consumption and is able to answer domain-specific questions with up to 72% accuracy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>The goal of our offline mobile Question Answering (QA) system is to answer
a spoken or typed user question without a connection to a high-performance
server, using only the resources of a mobile device such as a smartphone. The
OQA system was built to explore research challenges in QA, e.g., workers asking
questions in steel factory buildings or tourists walking through buildings and
inner cities with weak internet coverage. Most mobile devices today are limited
in their resources w.r.t. CPU, memory, or storage. Thus, the components of a
mobile QA system have to be efficient as well as effective.</p>
<p>In this demo, we present OQA, the first offline QA algorithm, which
1) uses its own mobile Linked Data (LD) triple store, 2) employs a simple yet
effective algorithm for the transformation of natural language to SPARQL which
does not require machine learning, and 3) focuses on low resource consumption,
i.e., battery and storage. All software is publicly available on our GitHub repository4
as well as a video of the App in the README. The latest release of OQA on
Android is also available online.5</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
Despite the plethora of QA over LD approaches, none focuses on offline
capabilities or on using limited resources. Offline Question Answering connects to
QA over LD as well as to the storage of LD resources in triple stores on mobile
devices. Due to space limitations, we only present a brief overview of both
areas. Note, we use LD because its underlying graph structure is more concise
than a textual corpus and thus easier to ship and customize for particular use
cases while being semantically unambiguous. There are several high-quality, but
computationally expensive multilingual and rule-based QA approaches such as
WDAqua [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. WDAqua uses a combinatorial approach, which is computationally
expensive, to formulate SPARQL queries by leveraging the semantics of a given
underlying knowledge base. We refer the interested reader to an extensive survey
of the field [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. DeQA [
        <xref ref-type="bibr" rid="ref1">1</xref>
] is an on-device QA system that runs locally on mobile
devices, with a set of latency and memory optimizations that can be applied
without requiring any changes to the QA system. For storing LD resources on
mobile devices, various open-source solutions exist, e.g., Mobile LD, Microalign,
TriplePlace, and Androjena.6 All solutions are written in Java but lack an active
community or were last updated before 2013. Also, some have only proprietary
licenses and hence cannot be considered viable options for OQA.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The OQA system</title>
<p>OQA is based on 1) linguistic and semantic analysis of the input, and a
subsequent 2) classification of the question type to assign a template. Finally, OQA 3)
fills the template and executes it against the mobile triple store. Before
describing our system, we introduce the creation of our mobile triple store, dubbed
RDF4A. We based the development on RDF4J7, an open-source Java
framework for processing LD which has an active community. We ported
RDF4J to Android using version 1.0.3, a backport of the current RDF4J
version for Java 1.7 supported on Android. To reduce the storage footprint, OQA
creates subsets of LD datasets and uses two synchronizers to transport LD to a
mobile device. (1) Server-side synchronizer: OQA reads an RDF file as input
and decides for each triple whether it is relevant, e.g., based on the frequency of
the contained entities. That is, a triple is considered relevant if all three
entries of the triple have at least n occurrences. This mechanism can be modified in
the future, e.g., to extract only user-relevant parts of an RDF graph or to support
continuous updates. The target triples are stored as an RDF4J SailRepository,
which is then gzipped for distribution. Such offline data packages are identified
by a hash code. (2) Mobile-side synchronizer: If the mobile device is
connected to the internet, OQA downloads the offline data package and imports it
into RDF4A.</p>
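      <p>The relevance criterion of the server-side synchronizer can be sketched as follows. This is a minimal Python sketch of the frequency-based filter; OQA itself is a Java/Android application operating on RDF4J repositories, and the miniature graph and threshold below are purely illustrative:</p>

```python
from collections import Counter

def relevant_triples(triples, n):
    """Keep a triple only if all three of its entries occur at least
    n times across the whole input graph."""
    counts = Counter()
    for s, p, o in triples:
        counts.update((s, p, o))
    return [(s, p, o) for (s, p, o) in triples
            if counts[s] >= n and counts[p] >= n and counts[o] >= n]

# Hypothetical miniature graph; with n=2, only triples whose subject,
# predicate, and object all appear at least twice survive.
graph = [
    ("dbr:Leipzig_University", "dbo:foundingDate", "1409"),
    ("dbr:Leipzig_University", "dbo:city", "dbr:Leipzig"),
    ("dbr:Leipzig", "dbo:country", "dbr:Germany"),
    ("dbr:Paderborn", "dbo:country", "dbr:Germany"),
]
print(relevant_triples(graph, 2))
```

      <p>The surviving subset would then be stored as a SailRepository and gzipped into an offline data package.</p>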
      <sec id="sec-3-1">
        <title>6 QAMEL report https://tinyurl.com/QAMEL-Report 7 http://rdf4j.org/</title>
<p>We use the following question over DBpedia 2016-04 as a running
example: When was the Leipzig University founded?, cf. Figure 1. Note, to highlight
research challenges, OQA focuses on simple questions containing exactly one
binary relation, which are the most frequently used questions in voice-driven apps. Also, all
processes are designed to be multilingual and resource-efficient. To this end, we rely
on deterministic algorithms to keep the computational complexity low.</p>
        <p>Preprocessing: OQA chunks the question into individual tokens separated
by white-space and removes stop words and single-character tokens. For our
running example, we are left with "When Leipzig University founded".</p>
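        <p>The preprocessing step can be sketched as follows (a Python sketch; the stop-word list shown is a tiny illustrative subset, not OQA's actual list):</p>

```python
STOP_WORDS = frozenset({"was", "the", "is", "are", "of", "a", "an"})

def preprocess(question):
    """Split the question on white-space, strip punctuation, and drop
    stop words and single-character tokens."""
    tokens = (t.strip("?!.,") for t in question.split())
    return [t for t in tokens if len(t) > 1 and t.lower() not in STOP_WORDS]

print(preprocess("When was the Leipzig University founded?"))
```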
<p>Question Type Determination: OQA determines the type of the question
and the associated type of answer based on the question word. For example,
time questions are represented by "When", and the unknown-person question
type can be represented by "Who, What, Which". This helps OQA to reduce the
number of candidates for possible slots significantly. For our running example,
we determine that we are looking for an answer of type time and remove When
from the list of tokens. The resulting query is "Leipzig University founded".</p>
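        <p>The question type determination can be sketched as a simple look-up on the first token (a Python sketch; the mapping is an illustrative subset of the question-word assignments described above):</p>

```python
# Illustrative subset: question word -> answer type.
QUESTION_TYPES = {
    "when": "time",
    "who": "person",
    "what": "person",
    "which": "person",
}

def question_type(tokens):
    """Return the answer type derived from the question word, together
    with the remaining tokens after removing the question word."""
    qtype = QUESTION_TYPES.get(tokens[0].lower())
    if qtype is None:
        return None, tokens
    return qtype, tokens[1:]

print(question_type(["When", "Leipzig", "University", "founded"]))
```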
<p>Entity Candidates: We assume that either an entity, a literal, or a property
is missing in the binary relation. Thus, we perform a look-up in the mobile triple
store. We try to assign each token to one or several entity candidates without
using an additional dictionary, entity linking algorithm, or other computationally
expensive processes. OQA exploits the rdfs:label property using the following
query:</p>
<p>SELECT DISTINCT ?x ?z WHERE { ?x rdfs:label ?z .</p>
<p>FILTER ( regex( str(?x), ".*&lt;TOKEN&gt;.*" ) &amp;&amp; lang(?z) = 'en' ) }
This query returns all entities containing the token in their label. For the
tokens left over in our running example, "Leipzig University founded", the entity
candidate finding would generate the results in Table 1. For "founded", our query
returns only dbo:foundedBy but not dbo:foundingDate. Thus, we need to find
better property candidates in the next step.</p>
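        <p>The look-up can be emulated in memory as follows (a Python sketch: the label table stands in for the mobile triple store, and, as in the SPARQL query's FILTER, the regular expression is matched against the entity URI):</p>

```python
import re

def entity_candidates(token, labels):
    """Return all (entity, label) pairs whose URI contains the token,
    mimicking FILTER(regex(str(?x), ".*TOKEN.*")) over rdfs:label pairs."""
    pattern = re.compile(".*" + re.escape(token) + ".*")
    return [(uri, label) for uri, label in labels.items() if pattern.match(uri)]

# Hypothetical label table (entity URI -> English rdfs:label).
labels = {
    "dbr:Leipzig": "Leipzig",
    "dbr:Leipzig_University": "Leipzig University",
    "dbo:foundedBy": "founded by",
}
print(entity_candidates("Leipzig", labels))
```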
<p>Candidate Ranking: For each result from the query, a tuple t = (w, e, l)
consisting of the input token w ∈ W, the resulting entity e ∈ KB, and its
label l = label(e) is stored. OQA first ranks by the number of tokens w_i ∈ W
that found the same entity e_j, formally: rank(e_j) = |{t | t = (w_i, e_j, ·)}|.
For example, dbr:Leipzig_University has a higher priority in the question "When
was the Leipzig University founded?" than dbr:Leipzig, since it is found by both
"Leipzig" and "University". If a tie exists, we sort by the Levenshtein
distance between the label of an entity and the words it is covering. If a tie
still exists, we use the frequency of the entity in the knowledge base, i.e., its
popularity, as a third comparison criterion. After this step, there is a list of
entities sorted by priority. Note, OQA also assigns priorities to properties.</p>
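        <p>The three-stage ranking can be sketched as a single sort key (a Python sketch; the input tuples and popularity counts below are illustrative, not taken from DBpedia):</p>

```python
from collections import defaultdict

def levenshtein(a, b):
    """Dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rank_entities(tuples, popularity):
    """Sort entities by (1) number of tokens that found them, then
    (2) Levenshtein distance between label and covered words, then
    (3) popularity of the entity in the knowledge base."""
    found = defaultdict(list)  # entity -> list of (token, label)
    for token, entity, label in tuples:
        found[entity].append((token, label))
    def key(entity):
        hits = found[entity]
        covered = " ".join(token for token, _ in hits)
        label = hits[0][1]
        return (-len(hits), levenshtein(label, covered), -popularity.get(entity, 0))
    return sorted(found, key=key)

tuples = [
    ("Leipzig", "dbr:Leipzig", "Leipzig"),
    ("Leipzig", "dbr:Leipzig_University", "Leipzig University"),
    ("University", "dbr:Leipzig_University", "Leipzig University"),
]
print(rank_entities(tuples, {"dbr:Leipzig": 500, "dbr:Leipzig_University": 120}))
```

        <p>Here dbr:Leipzig_University wins on the first criterion alone, since two tokens found it; the Levenshtein and popularity criteria only matter on ties.</p>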
<p>Property Candidates and Ranking: The literal and entity values for
a subject-property or property-object pair have specific data types. Using the
question type mapping, we can significantly reduce the number of possible
answers. Table 2 shows the assignments between question types and data types
based on the http://dbpedia.org ontology. This list can be extended to cover
different LD knowledge bases and their ontologies from various domains.
[Table 2: question types mapped to data types, e.g., xsd:date, dbo:Place, xsd:nonNegativeInteger, xsd:float, xsd:double, xsd:decimal, foaf:Person, dbo:Person.]
Starting with the highest ranked entity candidate, all properties for this candidate
are retrieved from the mobile triple store. The labels of these properties are
compared via the Levenshtein distance with each word of the question. If a property
was already found during the entity candidate search phase, its
reliability value is increased proportionally. Finally, the list of entity-property pairs
is sorted according to their reliability value. By executing the SPARQL query
with the missing slot as a variable against the mobile triple store, we retrieve the
final result.</p>
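        <p>The property ranking can be sketched as follows. This Python sketch normalizes the Levenshtein distance by label length so that a longer label such as "founding date" is not penalized against a short question word; the normalization and the boost for properties already seen in the entity candidate phase are our illustrative choices, not necessarily OQA's exact reliability formula:</p>

```python
def levenshtein(a, b):
    """Dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rank_properties(question_words, properties, seen_in_entity_phase):
    """Rank (property, label) pairs of the top entity candidate by the best
    normalized Levenshtein distance between label and any question word;
    properties already found in the entity phase get a reliability boost."""
    scored = []
    for prop, label in properties:
        dist = min(levenshtein(label.lower(), w.lower()) / max(len(label), len(w))
                   for w in question_words)
        if prop in seen_in_entity_phase:
            dist /= 2.0  # illustrative proportional boost
        scored.append((dist, prop))
    return [prop for _, prop in sorted(scored)]

# Hypothetical properties of dbr:Leipzig_University.
props = [("dbo:foundingDate", "founding date"), ("dbo:city", "city")]
print(rank_properties(["Leipzig", "University", "founded"], props, set()))
```

        <p>For the running example, "founding date" is closest to the remaining token "founded", so dbo:foundingDate fills the missing slot of the SPARQL query.</p>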
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
The evaluation is twofold. First, we use a use-case-driven dataset about Cologne.
OQA was able to answer 32 out of 44 questions on the mobile device (72%
accuracy).8 Second, we include preliminary results for the battery consumption.
To test the battery consumption, we used Battery Historian9. OQA consumes
only 5.28% battery in offline mode for answering 200 questions using the
QALD-9 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] dataset on a reduced version of roughly 120 MB DBpedia data.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Summary</title>
<p>We presented OQA, an offline question answering system over Linked Data on
mobile devices. OQA is lightweight and able to work on questions with incorrect
grammar. The OQA system is easily extensible to other languages as it relies on
simple look-ups only. In the future, we plan to implement caching to further reduce
battery consumption, to evaluate the quality of OQA against well-known
benchmarks, and to further analyze the resource-consumption choke points. We also
plan to add auto-correction of incorrect grammar. By presenting this
demo at SEMANTiCS, we will collect feedback from users and discuss future
research directions to bring Linked Data-based applications to the masses.</p>
      <p>Acknowledgments This work was supported by the EuroStars project
QAMEL (no. 01QE1549C) and by the German Federal Ministry of Transport
and Digital Infrastructure (BMVI) through the project LIMBO (no. 19F2029I).</p>
      <sec id="sec-5-1">
        <title>8 https://tinyurl.com/QAMEL-Accuracy-Report</title>
        <p>9 https://developer.android.com/studio/profile/battery-historian</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          . DeQA:
          <article-title>On-device question answering</article-title>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Diefenbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Maret</surname>
          </string-name>
          .
          WDAqua-core0:
          <article-title>A Question Answering Component for the Research Community</article-title>
          . In Semantic Web Challenges, pages
          <volume>84</volume>
          -
          <fpage>89</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. K. Ho ner,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walter</surname>
          </string-name>
          , E. Marx,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          .
          <article-title>Survey on challenges of question answering in the semantic web</article-title>
          .
          <source>Semantic Web</source>
          , pages
          <volume>895</volume>
          -
          <fpage>920</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Gusmita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          .
          <article-title>9th Challenge on Question Answering over Linked Data (QALD-9)</article-title>
          .
          <source>In Joint proceedings of SemDeep-4 and NLIWOD-4</source>
          , pages
          <fpage>58</fpage>
          -
          <fpage>64</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>