<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KbQAS: A Knowledge-based QA System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dat Quoc Nguyen</string-name>
          <email>datnq@vnu.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dai Quoc Nguyen</string-name>
          <email>dainq@vnu.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Son Bao Pham</string-name>
          <email>sonpb@vnu.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, University of Engineering and Technology, Vietnam National University</institution>
          ,
          <addr-line>Hanoi</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent years have witnessed a new trend of building ontology-based question answering (QA) systems that make use of semantic information in the sense of the semantic web. This demo paper introduces KbQAS, a knowledge-based QA system and the first ontology-based QA system for Vietnamese, which integrates a new knowledge acquisition approach for analyzing English and Vietnamese questions. In KbQAS, the target domain is modeled as an ontology in order to leverage the techniques and latest advances of the semantic web, so that semantic markup can add meta-information that yields more precise answers to complex questions expressed in natural language. This avenue has not been actively explored for Vietnamese.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>máy_tính<sub>K50_computer_science_course</sub>) and (sinh_viên<sub>student</sub>, có_quê<sub>has_hometown</sub>, Hà_Nội<sub>Hanoi</sub>). For each ontology-tuple, the Answer extraction module finds all instances in the target ontology that satisfy it, and then generates the answer shown in Figure 1 based on the question-structure “And” and the question-class “List”.</p>
      <p>Intermediate representation of questions. The intermediate representation used in our KbQAS system consists of a question-structure and one or more query-tuples of the following format: (sub-structure, question-class, Term<sub>1</sub>, Relation, Term<sub>2</sub>, Term<sub>3</sub>). A simple question has only one query-tuple, and its question-structure is the query-tuple’s sub-structure. More complex questions, such as composite questions, consist of several sub-questions; each sub-question is represented by a separate query-tuple, and the question-structure captures how the sub-questions are composed.</p>
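      <p>The format above can be made concrete with a small Python sketch. The class and field names are our own hypothetical choices rather than identifiers from the KbQAS implementation; None stands for a missing element (written “?” in the tuples of this paper), and the example values are the intermediate representations reported in the demonstration section for a simple “how many” question and a composite “And” question.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryTuple:
    """One query-tuple; None marks a missing element (shown as "?")."""
    sub_structure: str          # e.g. "Normal", "UnknTerm", "UnknRel"
    question_class: str         # e.g. "QU-howmany", "QU-whichClass"
    term1: Optional[str]
    relation: Optional[str]
    term2: Optional[str]
    term3: Optional[str]

    def __str__(self) -> str:
        parts = [self.sub_structure, self.question_class, self.term1,
                 self.relation, self.term2, self.term3]
        return "(" + ", ".join("?" if p is None else p for p in parts) + ")"


@dataclass
class IntermediateRepresentation:
    """A question-structure plus one or more query-tuples."""
    question_structure: str
    query_tuples: List[QueryTuple] = field(default_factory=list)

    def is_simple(self) -> bool:
        # Simple question: exactly one query-tuple, whose sub-structure
        # is also the question-structure.
        return (len(self.query_tuples) == 1
                and self.query_tuples[0].sub_structure == self.question_structure)


# Values taken from the examples in the demonstration section.
simple = IntermediateRepresentation("Normal", [
    QueryTuple("Normal", "QU-howmany", "researchers", "work", "AKT project", None)])
composite = IntermediateRepresentation("And", [
    QueryTuple("UnknRel", "QU-whichClass", "projects", None, "ontologies", None),
    QueryTuple("UnknRel", "QU-whichClass", "projects", None, "semantic web", None)])

print(simple.is_simple())         # True
print(composite.is_simple())      # False
print(composite.query_tuples[1])  # (UnknRel, QU-whichClass, projects, ?, semantic web, ?)
```

      <p>Keeping the question-structure separate from the tuples lets one structure such as “And” combine any number of sub-question tuples without changing the tuple format.</p>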
      <p>
        Question analysis component. The question analysis component contains three modules: preprocessing, syntactic analysis and semantic analysis. It uses JAPE grammars in the GATE framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to specify regular expression patterns over semantic annotations for question analysis. The preprocessing and syntactic modules are responsible for identifying noun phrases, question-phrases, and the relations among noun phrases or between a noun phrase and a question-phrase in the input question. The semantic analysis module, which embodies the key innovation of the current KbQAS version, uses the noun phrase, question-phrase and relation annotations created by the two preceding modules to determine the question-structure and to produce the query-tuples that form the intermediate representation of the input question.
      </p>
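      <p>Matching patterns over annotations, rather than over raw characters, can be illustrated with the deliberately simplified Python sketch that follows. It is neither JAPE nor the GATE API: the Annotation class and the match_sequence function are hypothetical, spans are token offsets, and a pattern is just a list of annotation types that must follow one another without gaps, roughly what a JAPE left-hand side such as {QuestionPhrase}{Relation}{NounPhrase} expresses. The annotations for the example question are assumed to have been produced by the preprocessing and syntactic modules.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Simplified stand-in for a GATE annotation (hypothetical)."""
    ann_type: str                 # e.g. "QuestionPhrase", "Relation"
    start: int                    # first token offset (inclusive)
    end: int                      # end token offset (exclusive)
    features: Dict[str, str] = field(default_factory=dict)


def match_sequence(anns: List[Annotation], pattern: List[str],
                   start: int = 0) -> Optional[List[Annotation]]:
    """Match the annotation types in `pattern` back to back from `start`.

    Returns the matched annotations, or None if some type is not found
    exactly where the previous annotation ended.
    """
    matched: List[Annotation] = []
    pos = start
    for wanted in pattern:
        hit = next((a for a in anns if a.ann_type == wanted and a.start == pos), None)
        if hit is None:
            return None
        matched.append(hit)
        pos = hit.end
    return matched


tokens = "who are the researchers in semantic web research area".split()
anns = [
    Annotation("QuestionPhrase", 0, 1, {"category": "QU-who-what"}),
    Annotation("Relation", 1, 5),
    Annotation("NounPhrase", 5, 9),
]
m = match_sequence(anns, ["QuestionPhrase", "Relation", "NounPhrase"])
print([" ".join(tokens[a.start:a.end]) for a in m])
# ['who', 'are the researchers in', 'semantic web research area']
```

      <p>Real JAPE patterns additionally support operators such as optionality, repetition and alternation, feature tests and labels, none of which this sketch models.</p>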
      <p>
        In the current semantic analysis module, we propose a new knowledge acquisition approach for analyzing natural language questions that applies the Single Classification Ripple Down Rules (SCRDR) methodology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to acquire rules incrementally. An SCRDR knowledge base, in which grammar rules over semantic annotations are organized in an exception structure and new rules are added only to correct errors made by existing rules, is built to generate the intermediate representations of questions. This systematic way of creating rules addresses the difficulties that most existing rule-based question analysis approaches, such as the AquaLog system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the first KbQAS version [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], face in managing the interactions between rules and keeping them consistent. Moreover, the proposed approach makes it easy to construct a new system or to adapt an existing one to a new domain or a new language, saving human experts considerable time and effort. The experimental evaluation of our method for English and Vietnamese question analysis is detailed in our previous work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Answer retrieval component. A detailed description of this component can be found in the paper on the first KbQAS version [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In short, its Ontology mapping module maps the terms and relations in the query-tuples to concepts, instances and relations in the target ontology. The Answer extraction module then finds all instances that are associated with the mapped instances and concepts and that satisfy the ontology relations. Depending on the question-structure and question-class, the best semantic answer is returned.
      </p>
      <p>
        Evaluation. The current KbQAS performs promisingly on a wide range of Vietnamese questions, achieving accuracies of 84.1% for the question analysis component and 82.4% for the answer retrieval component.
      </p>
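      <p>The exception structure of an SCRDR knowledge base can be sketched in Python as a binary tree in which every rule has an except-child, tried when the rule is satisfied, and an if-not-child, tried when it is not; the fired rule is the last satisfied rule on the evaluation path. This is a minimal illustration of the general SCRDR technique under our own assumptions, not the KbQAS code: Node, evaluate and add_rule are hypothetical names, and the string tests used as conditions are crude stand-ins for the JAPE patterns of rules R1 and R4 in the demonstration section.</p>

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class Node:
    """One SCRDR rule: a condition, a conclusion and two child links."""
    name: str
    condition: Callable[[Any], bool]
    conclusion: Any
    except_child: Optional["Node"] = None   # tried when this rule is satisfied
    if_not_child: Optional["Node"] = None   # tried when this rule is not satisfied


def evaluate(root: Node, case: Any) -> Tuple[Node, Node, bool]:
    """Return (fired rule, last node evaluated, whether that node was satisfied).

    The root is the default rule, whose condition is assumed always true.
    """
    fired = last = root
    last_ok = True
    node: Optional[Node] = root
    while node is not None:
        last, last_ok = node, node.condition(case)
        if last_ok:
            fired, node = node, node.except_child
        else:
            node = node.if_not_child
    return fired, last, last_ok


def add_rule(root: Node, case: Any, new: Node) -> None:
    """Attach a correcting rule at the point where evaluation of `case` stopped."""
    _, last, last_ok = evaluate(root, case)
    if last_ok:
        last.except_child = new   # exception to the rule that fired
    else:
        last.if_not_child = new   # sibling tried after the last failed exception


# Toy stand-ins for rules R1 and R4; real conditions are JAPE patterns.
default = Node("R0", lambda q: True, None)          # empty conclusion
q1 = "who are the researchers in semantic web research area"
q2 = "how many researchers work on AKT project"
add_rule(default, q1, Node("R1", lambda q: len(q.split()) >= 3, "UnknTerm"))
add_rule(default, q2, Node("R4", lambda q: q.startswith("how many"), "Normal"))

for q in (q1, q2):
    rule = evaluate(default, q)[0]
    print(rule.name, rule.conclusion)
# R1 UnknTerm
# R4 Normal
```

      <p>Because a new rule is attached only where evaluation of the misclassified case stopped, it can change the conclusion only for cases that reach that point, which is how SCRDR limits unintended interactions between rules.</p>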
    </sec>
    <sec id="sec-2">
      <title>Demonstration: knowledge acquisition for question analysis</title>
      <p>In this section, we illustrate only the process of systematically constructing an SCRDR knowledge base for analyzing English questions<sup>3</sup>. In the demonstration session, however, we plan to present further illustrations of building English and Vietnamese knowledge bases for question analysis, and to provide other illustrative examples of KbQAS.</p>
      <p>The following example shows how the knowledge base building process works. Suppose we encounter the question “who are the researchers in semantic web research area?” ([QuestionPhrase: who] [Relation: are the researchers in] [NounPhrase: semantic web research area]). Starting with an empty knowledge base, the fired rule (i.e., the last satisfied rule) is the default rule<sup>4</sup>, which gives an empty conclusion. This can be corrected by adding the following exception rule to the knowledge base:</p>
      <preformat>Rule: R1
( ({QuestionPhrase}):QPhrase ({Relation}):Rel ({NounPhrase}):NPhrase ):left
--> :left.RDR1_ = {category1 = "UnknTerm"}
, :QPhrase.RDR1_QP = {} , :Rel.RDR1_Rel = {} , :NPhrase.RDR1_NP = {}
Conclusion: question-structure "UnknTerm" and query-tuple (RDR1_.category1,
RDR1_QP.QuestionPhrase.category, ?, RDR1_Rel, RDR1_NP, ?).</preformat>
      <p>
        If the condition of rule R1 matches the whole input question, a new annotation RDR1_ is created to cover the entire question, and new annotations RDR1_QP, RDR1_Rel and RDR1_NP are generated to cover its sub-phrases. Once rule R1 is fired, the matched question is deemed to have a query-tuple whose sub-structure takes the value of the feature “category1” of the RDR1_ annotation, and whose question-class takes the value of the feature “category” of the QuestionPhrase annotation spanning the same text as the RDR1_QP annotation. In addition, the query-tuple’s Relation is the string covered by RDR1_Rel, Term<sub>2</sub> is the string covered by RDR1_NP, and Term<sub>1</sub> and Term<sub>3</sub> are missing.
      </p>
      <p>
        <sup>3</sup> We used the JAPE grammars employed in AquaLog [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to detect the noun-phrase, question-phrase and relation annotations in English questions. We also reused AquaLog’s question-class definitions and took question examples from AquaLog to build the SCRDR knowledge base.
      </p>
      <p>
        <sup>4</sup> A rule is composed of a condition part and a conclusion part. A condition is a regular expression pattern over semantic annotations, written as a JAPE grammar. The conclusion contains the question-structure and the tuples of the intermediate representation, where each element of a tuple is specified by the new annotations posted when the rule’s condition is matched. The default rule typically contains a trivial condition that is always satisfied.
      </p>
      <p>Using rule R1, the knowledge base returns the correct intermediate representation for the encountered question, consisting of question-structure “UnknTerm” and query-tuple (UnknTerm, QU-who-what, ?, researchers, semantic web research area, ?).</p>
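      <p>The steps performed by rule R1 can be traced with a hypothetical Python sketch that models an annotation set as a list of dictionaries. The function post_r1_annotations mimics the right-hand side of R1 after its pattern has matched, and r1_conclusion reads the new annotations back into a query-tuple as described above. We assume the question mark has already been removed from the token list. Note also that the sketch returns the full string covered by RDR1_Rel (“are the researchers in”), whereas the representation reported above lists the relation as “researchers”; whatever further processing KbQAS applies to the relation string is not described here and is not modeled.</p>

```python
from typing import Any, Dict, List

# Hypothetical annotation set: dicts with "type", "start", "end" (token
# offsets, end exclusive) and optional "features".
Annotation = Dict[str, Any]


def find(anns: List[Annotation], ann_type: str) -> Annotation:
    """First annotation of the given type (raises StopIteration if absent)."""
    return next(a for a in anns if a["type"] == ann_type)


def post_r1_annotations(anns: List[Annotation], qp: Annotation,
                        rel: Annotation, np_: Annotation) -> None:
    """Mimic R1's right-hand side for an already matched pattern."""
    anns.append({"type": "RDR1_", "start": qp["start"], "end": np_["end"],
                 "features": {"category1": "UnknTerm"}})
    for new_type, src in (("RDR1_QP", qp), ("RDR1_Rel", rel), ("RDR1_NP", np_)):
        anns.append({"type": new_type, "start": src["start"], "end": src["end"]})


def r1_conclusion(tokens: List[str], anns: List[Annotation]) -> tuple:
    """Build R1's query-tuple; None stands for a missing element ("?")."""
    def text(a: Annotation) -> str:
        return " ".join(tokens[a["start"]:a["end"]])

    rdr1, rdr1_qp = find(anns, "RDR1_"), find(anns, "RDR1_QP")
    # The QuestionPhrase annotation spanning the same text as RDR1_QP.
    qphrase = next(a for a in anns if a["type"] == "QuestionPhrase"
                   and (a["start"], a["end"]) == (rdr1_qp["start"], rdr1_qp["end"]))
    return (rdr1["features"]["category1"],      # sub-structure
            qphrase["features"]["category"],    # question-class
            None,                               # Term1 missing
            text(find(anns, "RDR1_Rel")),       # Relation
            text(find(anns, "RDR1_NP")),        # Term2
            None)                               # Term3 missing


tokens = "who are the researchers in semantic web research area".split()
anns = [
    {"type": "QuestionPhrase", "start": 0, "end": 1,
     "features": {"category": "QU-who-what"}},
    {"type": "Relation", "start": 1, "end": 5},
    {"type": "NounPhrase", "start": 5, "end": 9},
]
qp, rel, np_ = anns[0], anns[1], anns[2]
post_r1_annotations(anns, qp, rel, np_)
print(r1_conclusion(tokens, anns))
# ('UnknTerm', 'QU-who-what', None, 'are the researchers in', 'semantic web research area', None)
```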
      <p>For the question “How many researchers work on AKT project?” ([RDR1_: [RDR1_QP: how many researchers] [RDR1_Rel: work on] [RDR1_NP: AKT project]]), rule R1 is the fired rule. However, rule R1 gives a wrong conclusion, consisting of question-structure “UnknTerm” and query-tuple (UnknTerm, QU-howmany, ?, work, AKT project, ?). We can correct the conclusion of rule R1 by adding the following exception rule R4, which imposes constraints through an additional condition:</p>
      <preformat>Rule: R4
({RDR1_}):left --> :left.RDR4_ = {category1 = "Normal"}
Condition: RDR1_QP.hasAnno == QuestionPhrase.kind == ListWhichHMany
Conclusion: question-structure "Normal" and query-tuple (RDR4_.category1,
RDR1_QP.QuestionPhrase.category, RDR1_QP, RDR1_Rel, RDR1_NP, ?).</preformat>
      <p>The additional condition of rule R4 matches an RDR1_QP annotation whose covered text contains a QuestionPhrase annotation with “ListWhichHMany” as the value of its feature kind. The extra annotation constraint hasAnno requires that the text covered by an annotation (e.g., RDR1_QP) contain the specified annotation (e.g., QuestionPhrase). Rule R4 generates the correct output, consisting of question-structure “Normal” and query-tuple (Normal, QU-howmany, researchers, work, AKT project, ?).</p>
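      <p>The hasAnno constraint amounts to a span-containment test. The following Python sketch, with hypothetical function names and annotation dictionaries, checks R4’s extra condition on the RDR1_QP annotations of the two earlier questions. The kind value for the “how many” question follows from the fact that R4 fires on it; for the “who” question the kind feature is simply omitted, which is our assumption rather than a value reported for KbQAS.</p>

```python
from typing import Any, Dict, List

# Hypothetical annotation dicts: "type", "start", "end" (token offsets)
# and optional "features".
Annotation = Dict[str, Any]


def covers(outer: Annotation, inner: Annotation) -> bool:
    """True if the span of `inner` lies within the span of `outer`."""
    return inner["start"] >= outer["start"] and outer["end"] >= inner["end"]


def has_anno(anns: List[Annotation], outer_type: str, inner_type: str,
             feature: str, value: str) -> bool:
    """Sketch of RDR1_QP.hasAnno == QuestionPhrase.kind == ListWhichHMany.

    True if some `outer_type` annotation covers an `inner_type` annotation
    whose `feature` has the given `value`.
    """
    outers = [a for a in anns if a["type"] == outer_type]
    inners = [a for a in anns if a["type"] == inner_type
              and a.get("features", {}).get(feature) == value]
    return any(covers(o, i) for o in outers for i in inners)


# "How many researchers work on AKT project?"
how_many = [
    {"type": "RDR1_QP", "start": 0, "end": 3},
    {"type": "QuestionPhrase", "start": 0, "end": 3,
     "features": {"category": "QU-howmany", "kind": "ListWhichHMany"}},
]
# "who are the researchers in semantic web research area?" (kind omitted)
who = [
    {"type": "RDR1_QP", "start": 0, "end": 1},
    {"type": "QuestionPhrase", "start": 0, "end": 1,
     "features": {"category": "QU-who-what"}},
]
for anns in (how_many, who):
    print(has_anno(anns, "RDR1_QP", "QuestionPhrase", "kind", "ListWhichHMany"))
# True
# False
```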
      <p>Turning to the question “which projects are about ontologies and the semantic web?” ([RDR4_: [RDR1_QP: which projects] [RDR1_Rel: are about] [RDR1_NP: ontologies]] [And: and] [NounPhrase: the semantic web]), rule R4 is satisfied; nevertheless, it yields an incorrect intermediate representation because the RDR4_ annotation covers only part of the question. The following exception rule R37 is added to rectify the conclusion of rule R4:</p>
      <preformat>Rule: R37
({RDR4_}{And}({NounPhrase}):NPhrase):left
--> :left.RDR37_ = {category1 = "UnknRel", category2 = "UnknRel"}
, :NPhrase.RDR37_NP = {}
Condition: RDR1_Rel.hasAnno == Relation.category == Rel-Auxiliary
Conclusion: question-structure "And" and two query-tuples (RDR37_.category1,
RDR1_QP.QuestionPhrase.category, RDR1_QP, ?, RDR1_NP, ?) and (RDR37_.category2,
RDR1_QP.QuestionPhrase.category, RDR1_QP, ?, RDR37_NP, ?).</preformat>
      <p>Rule R37 returns the correct intermediate representation for this question, with question-structure “And” and query-tuples (UnknRel, QU-whichClass, projects, ?, ontologies, ?) and (UnknRel, QU-whichClass, projects, ?, semantic web, ?).</p>
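      <p>Putting the three rules together, the knowledge base built in this section forms an exception chain of R0 (the default rule), R1, R4 and R37, each rule an exception of the previous one. The Python sketch that follows encodes that chain and fires it on the three example questions. Each question is summarised by hand as its top-level annotation types plus two feature values; this encoding, the Rule class and the condition functions are hypothetical simplifications of the JAPE conditions, and the None feature values are placeholders rather than values reported for KbQAS.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Each case summarises a question by its top-level annotation types and
# two feature values (hypothetical encoding; real rules are JAPE patterns).
Case = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Case], bool]
    conclusion: Optional[str]                  # question-structure; None = empty
    exceptions: List["Rule"] = field(default_factory=list)


def r1_cond(c: Case) -> bool:
    return c["seq"][:3] == ["QuestionPhrase", "Relation", "NounPhrase"]


def r4_cond(c: Case) -> bool:
    return c["qp_kind"] == "ListWhichHMany"


def r37_cond(c: Case) -> bool:
    return (c["seq"][3:5] == ["And", "NounPhrase"]
            and c["rel_category"] == "Rel-Auxiliary")


def fire(rule: Rule, case: Case) -> Rule:
    """Follow satisfied exceptions (tried in order) down to the fired rule."""
    for child in rule.exceptions:
        if child.condition(case):
            return fire(child, case)
    return rule


# Exception chain: R0 (default), then R1, R4 and R37.
kb = Rule("R0", lambda c: True, None, [
    Rule("R1", r1_cond, "UnknTerm", [
        Rule("R4", r4_cond, "Normal", [
            Rule("R37", r37_cond, "And")])])])

base = ["QuestionPhrase", "Relation", "NounPhrase"]
who = {"seq": base, "qp_kind": None, "rel_category": None}
how_many = {"seq": base, "qp_kind": "ListWhichHMany", "rel_category": None}
which_and = {"seq": base + ["And", "NounPhrase"],
             "qp_kind": "ListWhichHMany", "rel_category": "Rel-Auxiliary"}
for case in (who, how_many, which_and):
    r = fire(kb, case)
    print(r.name, r.conclusion)
# R1 UnknTerm
# R4 Normal
# R37 And
```

      <p>Because R37 is tried only for questions that already satisfy R1 and R4, adding it leaves the conclusions for the first two questions unchanged.</p>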
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>A Vietnamese Question Answering System</article-title>
          .
          <source>In: Proc. of KSE'09</source>
          ,
          <publisher-name>IEEE CS</publisher-name>
          (
          <year>2009</year>
          )
          <fpage>26</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications</article-title>
          .
          <source>In: Proc. of ACL'02</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Richards</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Two decades of ripple down rules research</article-title>
          .
          <source>Knowledge Engineering Review</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
          <fpage>159</fpage>
          -
          <lpage>184</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uren</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>AquaLog: An ontology-driven question answering system for organizational semantic intranets</article-title>
          .
          <source>Web Semantics</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
          <fpage>72</fpage>
          -
          <lpage>105</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>Systematic Knowledge Acquisition for Question Analysis</article-title>
          .
          <source>In: Proc. of RANLP 2011</source>
          (
          <month>September</month>
          <year>2011</year>
          )
          <fpage>406</fpage>
          -
          <lpage>412</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>