<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hierarchical Expected Answer Type Classification for Question Answering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksandr Perevalov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Both</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Anhalt University of Applied Sciences, Köthen (Anhalt)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowing what a user's question is about is a crucial step in the Question Answering (QA) process. The Expected Answer Type (EAT) of a question makes it possible to significantly narrow down the search field and improve the QA quality. In this paper, we present a Web user interface (UI) and a RESTful API for hierarchical EAT classification over DBpedia. The provided functionality enables end-users to get EAT predictions for questions in 104 languages, see the confidence of the prediction, and leave feedback. In addition, the API enables researchers and developers to integrate EAT classification into their systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Expected Answer Type Classification</kwd>
        <kwd>Target Type Identification</kwd>
        <kwd>Knowledge Graph Question Answering</kwd>
        <kwd>Entity Typing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Knowledge Graph Question Answering (KGQA) systems aim to
answer entity-oriented questions. For example, when asking a question such as
"Where was Angela Merkel born?", we expect to see an entity of the type
"Place" (e.g., Hamburg). In this case, "Place" (or even better, "City") is the
expected answer type (EAT). Such types are typically organized into
hierarchical type ontologies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (e.g., the DBpedia Ontology<sup>1</sup>), depending on the particular
knowledge graph used within a QA system.
      </p>
      <p>
        Following the example question, the EAT hierarchy may look as follows:
dbo:City → dbo:Settlement → dbo:PopulatedPlace → dbo:Place<sup>2</sup>, where
the first type is the most specific one and the last is the most general one.
Recently, many research papers have demonstrated that QA systems may benefit
from EAT classification [
        <xref ref-type="bibr" rid="ref3 ref5 ref6">5,3,6</xref>
        ].
      </p>
      <p>
        In this paper, we present the Web UI and RESTful API for the hierarchical
EAT classification over DBpedia<sup>3</sup>. As we extended our previously developed
approach [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the predictions are available and might be compared for both
the "existing" and the "improved" approach. The tool supports 104 languages,
provides the prediction confidence as well as an opportunity to leave feedback
for a given prediction. The RESTful interface to the functionality enables easy
integration with other existing KGQA systems or future research.
      </p>
      <p>[Figure 1: architecture of the EAT classifier. Labels recovered from the figure: Question; Category Classifier (category value); Literal Classifier (Literal Value); Resource Classifier, in the previous implementation and the extended implementation; DBpedia; Resource Hierarchy Value.]</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>1</sup> http://mappings.dbpedia.org/server/ontology/classes/</p>
      <p><sup>2</sup> dbo is a prefix for http://dbpedia.org/ontology/</p>
      <p><sup>3</sup> https://webengineering.ins.hs-anhalt.de:41009/eat-classification</p>
    </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        The expected answer type is sometimes referred to as the target type in the context
of entity-oriented search [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. So-called Entity- and Type-Centric models were
introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to identify the target type of a question. These models are
used to rank the queries given the entity- or type-related content [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The idea
of incorporating additional context to improve answer type predictions was
proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. One of the ISWC 2020 Semantic Web challenges
addressed answer type classification (SeMantic AnsweR Type prediction
task, SMART) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It showed that transformer-based models achieve the
best results in this task [
        <xref ref-type="bibr" rid="ref11 ref8">11,8</xref>
        ]. An approach based on using external data
(e.g., KGQA datasets) was introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Recently, the authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
proposed a system for EAT prediction in a "distantly supervised fashion" (i.e.,
no manual data annotation is required); however, evaluation results were not
presented.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Approach and Implementation</title>
      <p>
        The tool builds on the approach previously developed by the authors [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
which can identify not only resource answer types (e.g., dbo:City) but
also literal (number, date, string) and boolean types. The extended approach
targets the resource answer types by predicting the most specific EAT for a
given question. After doing so, the corresponding DBpedia hierarchy is fetched
instead of independently predicting the EAT for each granularity level (see Figure
2). Hence, the extended approach differs only in the resource classifier.
      </p>
      <p>[Figure 2: the "Previous" and the "Extended" Resource Type Classifier. Labels recovered from the figure: Question text; Resource Classifier 1 to 5; type1, type2, type3, type4, type5; Hierarchy Retriever; KG; type1, ..., typen. Legend: type1 is the most specific type (e.g., dbo:City), typen is the most general type (e.g., dbo:Place).]</p>
      <p>Figure 2 shows that the previous approach performs no hierarchy
consistency check. Thus, the predicted types may belong to different
hierarchies, which is unacceptable because the prediction becomes inconsistent. In addition,
the hierarchy size is limited to five types. In contrast, the extended
approach predicts the most specific resource answer type and then fetches the rest
of the hierarchy from a KG (e.g., DBpedia) via the hierarchy retriever.
The hierarchy retriever simply executes the SPARQL query shown in Listing 1 and formats the final
output.</p>
      <preformat>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
SELECT ?sType WHERE {
  &lt;type&gt; rdfs:subClassOf* ?sType .
  FILTER (CONTAINS(STR(?sType), "dbpedia.org/ontology"))
}
# the 'type' placeholder is replaced with the predicted type</preformat>
      <p>Listing 1. Retrieving supertypes of a given answer type from DBpedia.</p>
      <p>In this case, the resource answer type hierarchy is consistent and not limited to
a specific size.</p>
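      <p>The hierarchy retriever step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the tool: the public DBpedia SPARQL endpoint URL and the names build_query, parse_types, and fetch_hierarchy are chosen here for illustration; only the query itself is taken from Listing 1.</p>
      <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint; the paper does not state which endpoint the tool uses.
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Query from Listing 1; "<type>" is the placeholder for the predicted type.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?sType WHERE {
  <type> rdfs:subClassOf* ?sType .
  FILTER (CONTAINS(STR(?sType), "dbpedia.org/ontology"))
}"""


def build_query(type_uri):
    """Replace the 'type' placeholder with the full URI of the predicted type."""
    return QUERY_TEMPLATE.replace("<type>", "<" + type_uri + ">")


def parse_types(sparql_json):
    """Extract the ?sType values from a SPARQL JSON result document."""
    return [row["sType"]["value"] for row in sparql_json["results"]["bindings"]]


def fetch_hierarchy(type_uri, endpoint=DBPEDIA_SPARQL_ENDPOINT):
    """Send the query via HTTP GET and return the predicted type with its supertypes."""
    params = urllib.parse.urlencode({"query": build_query(type_uri)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_types(json.load(response))
```
]]></preformat>
      <p>A call such as fetch_hierarchy("http://dbpedia.org/ontology/City") would return the type itself together with its supertypes, because the property path rdfs:subClassOf* also matches paths of length zero. SPARQL does not guarantee the order of the result rows, so a retriever that must output the hierarchy from the most specific to the most general type needs an additional ordering step, which is omitted in this sketch.</p>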
      <p>For training and evaluation, we used the DBpedia dataset of the SMART
Task. We reuse our previously prepared multilingual extension of the dataset<sup>4</sup>
and fine-tune the classifier using a multilingual language model<sup>5</sup> that supports 104
languages.</p>
      <p>
        The evaluation of the obtained EAT classifier demonstrated reasonable
results: (1) category prediction: Accuracy = 0.977; (2) type ranking: NDCG@5
= 0.745, NDCG@10 = 0.710 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The results are comparable to those of the 2020
SMART winner [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The final architecture of the EAT classifier is shown in
Figure 1.
      </p>
      <p><sup>4</sup> The multilingual dataset extension contains questions in 5 languages: https://github.com/Perevalov/iswc-classification</p>
      <p><sup>5</sup> https://huggingface.co/bert-base-multilingual-cased</p>
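      <p>For readers unfamiliar with the ranking metric, the following Python sketch shows the textbook form of NDCG@k. It is an illustration only and not the evaluation script of the SMART Task: a hierarchy-aware evaluation assigns a gain to each predicted type depending on its relation to the gold types, and here these gain values are simply assumed to be given as input. The function names are chosen for illustration.</p>
      <preformat><![CDATA[
```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(gains[:k], start=1))


def ndcg_at_k(predicted_gains, ideal_gains, k):
    """Normalize DCG@k of the predicted ranking by DCG@k of the best possible ranking.

    predicted_gains: gain of each predicted type, in ranked order.
    ideal_gains: gains of the gold types (any order); they are sorted here.
    """
    ideal = dcg_at_k(sorted(ideal_gains, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(predicted_gains, k) / ideal
```
]]></preformat>
      <p>With this definition, a ranking that places all gold types first in the best order scores 1.0, and a relevant type that appears only at rank 2 contributes 1 / log2(3) of its gain.</p>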
      <p>
        The Web UI of the EAT classifier is presented in Figure 3. The numbered
elements are as follows: (1) question input field, (2) switch button
that enables an additional prediction with the model from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], (3) section with
example questions, (4) results section where the asked question is listed, (5) the
prediction result and the confidence from the new model, (6) feedback buttons
(only for the new model's prediction), and (7) the prediction result as well as
the confidence from the model from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The RESTful API<sup>6</sup> of the EAT classifier has GET endpoints for both
currently provided models. Given the parameter question containing
the question's text, the service returns a dictionary with the following fields:
category (holds one of "resource", "literal", or "boolean"); answer type (if
the predicted category is not a resource, the primitive data type is stored in the array,
e.g., ["number"] or ["boolean"]; otherwise, the array holds one or more elements corresponding to
the resource hierarchy, e.g., ["dbo:Person", "dbo:Agent"]); and confidence
(a float value f ∈ [0, 1] that corresponds to the model's confidence in the prediction).</p>
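      <p>A client for this interface could look like the following Python sketch. Several details are assumptions made for illustration and should be checked against the API documentation: the endpoint path /predict is a placeholder, the JSON key answer_type is an assumed spelling of the answer type field, and the names EatPrediction, parse_prediction, and fetch_prediction are hypothetical. Only the host, the question parameter, and the described fields are taken from the text.</p>
      <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

BASE_URL = "https://webengineering.ins.hs-anhalt.de:41020"  # host of the API docs
ENDPOINT_PATH = "/predict"        # hypothetical path, not given in the paper
ANSWER_TYPE_KEY = "answer_type"   # assumed key name of the answer type field
CATEGORIES = ("resource", "literal", "boolean")


@dataclass
class EatPrediction:
    category: str
    answer_type: list
    confidence: float


def parse_prediction(payload):
    """Validate the returned dictionary against the field description in the text."""
    category = payload["category"]
    if category not in CATEGORIES:
        raise ValueError("unexpected category: %r" % (category,))
    answer_type = list(payload[ANSWER_TYPE_KEY])
    if not answer_type:
        raise ValueError("answer type array is empty")
    confidence = float(payload["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence outside [0, 1]: %r" % (confidence,))
    return EatPrediction(category, answer_type, confidence)


def fetch_prediction(question, base_url=BASE_URL, path=ENDPOINT_PATH):
    """Send a GET request with the 'question' parameter and parse the JSON answer."""
    url = base_url + path + "?" + urllib.parse.urlencode({"question": question})
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_prediction(json.load(response))
```
]]></preformat>
      <p>Validating the response on the client side keeps a downstream KGQA component from silently consuming a malformed prediction, e.g., an empty type array or a confidence value outside [0, 1].</p>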
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>In this work, we presented the Web UI and the RESTful API for retrieving EAT
predictions and validating EAT classifiers. Currently, two EAT components are
integrated. In addition to the DBpedia Ontology types (resources), the tool can
distinguish literal and boolean answer types. The EAT classifier
provides predictions for questions in up to 104 languages
and showed reasonable quality w.r.t. the SMART Task evaluation over the DBpedia
dataset.</p>
      <p><sup>6</sup> https://webengineering.ins.hs-anhalt.de:41020/docs</p>
      <p>For future work, we plan to improve the quality of the approach and
extend it to other ontologies (e.g., Wikidata) to enable comparability. We would
like to flatten the architecture of the classifier (see Figure 1) such that only one model is
used for the prediction. In addition, it is worth paying attention to the robustness
of the model w.r.t. corrupted input data (e.g., spelling mistakes).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumayer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Hierarchical target type identification for entity-oriented queries</article-title>
          .
          <source>In: Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>2391</fpage>
          –
          <lpage>2394</lpage>
          . CIKM '12,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2012</year>
          ). https://doi.org/10.1145/2396761.2398648
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dash</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Type prediction systems</article-title>
          .
          <source>CoRR</source>
          (
          <year>2021</year>
          ), https://arxiv.org/abs/2104.01207
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Garigliotti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasibi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Target type identification for entity-bearing queries</article-title>
          .
          <source>In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>845</fpage>
          –
          <lpage>848</lpage>
          . SIGIR '17,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2017</year>
          ). https://doi.org/10.1145/3077136.3080659
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Garigliotti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasibi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Identifying and exploiting target entity type information for ad hoc entity retrieval</article-title>
          .
          <source>Inf. Retr.</source>
          <volume>22</volume>
          (
          <issue>3–4</issue>
          ),
          <fpage>285</fpage>
          –
          <lpage>323</lpage>
          (Aug
          <year>2019</year>
          ). https://doi.org/10.1007/s10791-018-9346-x
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Höffner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marx</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Survey on challenges of question answering in the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <fpage>895</fpage>
          –
          <lpage>920</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kamath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Predicting and integrating expected answer types into a simple recurrent neural network model for answer sentence selection</article-title>
          .
          <source>Computación y Sistemas</source>
          <volume>23</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge</article-title>
          . CoRR/arXiv (
          <year>2020</year>
          ), https://arxiv.org/abs/2012.00555
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nikas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fafalios</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzitzikas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Two-stage semantic answer type prediction for question answering using BERT and class-specificity rewarding</article-title>
          . In: Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2774</volume>
          , pp.
          <fpage>19</fpage>
          –
          <lpage>28</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2020</year>
          ), http://ceur-ws.org/Vol-2774/paper-03.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Perevalov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Augmentation-based answer type classification of the SMART dataset</article-title>
          . In: Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2774</volume>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2020</year>
          ), http://ceur-ws.org/Vol-2774/paper-01.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Perevalov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Improving answer type classification quality through combined question answering datasets</article-title>
          .
          <source>In: Knowledge Science, Engineering and Management</source>
          . pp.
          <fpage>191</fpage>
          –
          <lpage>204</lpage>
          .
          <publisher-name>Springer International Publishing</publisher-name>
          ,
          <publisher-loc>Cham</publisher-loc>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Setty</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Semantic answer type prediction using BERT: IAI at the ISWC SMART task 2020</article-title>
          . In: Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2774</volume>
          , pp.
          <fpage>10</fpage>
          –
          <lpage>18</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2020</year>
          ), http://ceur-ws.org/Vol-2774/paper-02.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tonon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catasta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prokofyev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demartini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Contextualized ranking of entity types based on knowledge graphs</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>37–38</volume>
          ,
          <fpage>170</fpage>
          –
          <lpage>183</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1016/j.websem.2015.12.005
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>