<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XB: A Large-scale Korean Knowledge Base for Question Answering Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jongmin Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youngkyoung Ham</string-name>
          <email>ykham@saltlux.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony Lee</string-name>
          <email>tony@saltlux.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Saltlux Inc.</institution>
          <addr-line>Daewoong Bldg. 689-4, Yeoksam 1 dong, Gangnam-gu, Seoul</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many studies have addressed question answering systems that can answer natural language questions. Building such a system requires diverse techniques, but it cannot be implemented without well-structured knowledge data. For this reason, we construct a large-scale knowledge base in Korean, with the goal of creating a uniquely Korean question answering system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recently, a variety of Question Answering (QA) systems have been developed,
such as IBM Watson and Apple Siri. In these systems, a user inputs a query in natural
language, and the QA system searches for the corresponding answer, often using
inferences from other related search queries, and provides the user with accurate and
relevant information. Most QA systems use a knowledge base to store knowledge
extracted from a multitude of data.</p>
      <p>
        Extremely large knowledge bases, such as YAGO[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Wikidata[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], have been
constructed using documents written in English, and their contents are well known
around the world. However, individual countries require QA systems tailored to their
own knowledge.
      </p>
      <p>For example, even though the Eulmi Incident is very significant in Korean history,
no knowledge of it is found in the English version of Wikipedia. If a question asks
when the Eulmi Incident happened, most existing knowledge resources cannot
answer it. Korean DBpedia has no structured knowledge for that question, and
Korean Wikipedia holds the information only in its text. For this reason, it was
necessary to construct a large-scale knowledge base in Korean from various
knowledge resources, with the goal of creating a uniquely Korean QA system.</p>
      <p>
        The resulting XB was constructed using the dual-spiral method[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which allows
for both automatic conversion and manual construction simultaneously. In addition,
the XB incorporates existing knowledge bases such as GeoNames[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], OpenStreetMap[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
DBpedia[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Wikidata. Knowledge in the XB is represented as
triples (subject, predicate, object). So far, approximately 200 million triples have been
constructed. Through OWL axiom inference (rdfs:subClassOf, rdfs:subPropertyOf,
owl:TransitiveProperty, owl:inverseOf, owl:disjointWith, etc.), the number of triples
increases by 0.4 billion.
      </p>
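      <p>The triple representation and axiom-based inference described above can be illustrated with a minimal sketch. It assumes a plain in-memory set of triples and only three rules (transitivity of rdfs:subClassOf, type propagation to superclasses, and owl:inverseOf); it is not the reasoner used by the XB, and every xb: identifier is a hypothetical example.</p>
      <preformat><![CDATA[
```python
# Forward-chaining sketch over (subject, predicate, object) triples.
# All xb:* identifiers are hypothetical examples, not actual XB resources.

def infer(triples):
    """Return the closure of `triples` under three simple RDFS/OWL rules."""
    closed = set(triples)
    while True:
        new = set()
        sub_class = {(s, o) for s, p, o in closed if p == "rdfs:subClassOf"}
        inverse = {(s, o) for s, p, o in closed if p == "owl:inverseOf"}
        inverse |= {(b, a) for a, b in inverse}  # owl:inverseOf is symmetric

        # Rule 1: rdfs:subClassOf is transitive.
        for a, b in sub_class:
            for c, d in sub_class:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))

        # Rule 2: an instance of a class is also an instance of its superclasses.
        for s, p, o in closed:
            if p == "rdf:type":
                for a, b in sub_class:
                    if o == a:
                        new.add((s, "rdf:type", b))

        # Rule 3: (s p o) together with (p owl:inverseOf q) implies (o q s).
        for s, p, o in closed:
            for a, b in inverse:
                if p == a:
                    new.add((o, b, s))

        if new.issubset(closed):  # fixed point: nothing new was derived
            return closed
        closed |= new


base = {
    ("xb:City", "rdfs:subClassOf", "xb:Place"),
    ("xb:Place", "rdfs:subClassOf", "xb:Thing"),
    ("xb:Seoul", "rdf:type", "xb:City"),
    ("xb:capitalOf", "owl:inverseOf", "xb:hasCapital"),
    ("xb:Seoul", "xb:capitalOf", "xb:SouthKorea"),
}
closure = infer(base)
for triple in sorted(closure - base):
    print(triple)
```
]]></preformat>
      <p>Repeating the rules until no new triple appears is what lets a small set of asserted triples grow into a larger set of inferred ones.</p>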
    </sec>
    <sec id="sec-2">
      <title>Development</title>
      <p>The XB is a large-scale knowledge base of common-sense-level knowledge for Korean QA
systems, and it uses an ontological method to express knowledge. Figure 2 shows a
simple process of our question answering scenario. A user inputs a question in natural
language form, and it is converted into a SPARQL query using various conversion
techniques. The resulting SPARQL query retrieves answers from the knowledge base.</p>
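      <p>As an illustration of the question-to-SPARQL step, the following sketch uses simple question templates. The paper does not specify its conversion techniques, so the template approach, the xb: namespace, and the xb:date and xb:place properties are all assumptions made for this example.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical namespace and properties; the real XB schema may differ.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX xb: <http://example.org/xb/>\n"
)

# Each template pairs a question pattern with the property it asks about.
TEMPLATES = [
    (re.compile(r"^when did (?:the )?(?P<entity>.+?) happen\?$", re.I), "xb:date"),
    (re.compile(r"^where did (?:the )?(?P<entity>.+?) happen\?$", re.I), "xb:place"),
]


def question_to_sparql(question):
    """Convert a question to a SPARQL query string, or None if no template fits."""
    for pattern, prop in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            # Escape double quotes so the label stays a valid SPARQL literal.
            label = match.group("entity").replace('"', '\\"')
            return (
                PREFIXES
                + "SELECT ?answer WHERE {\n"
                + f'  ?entity rdfs:label "{label}"@en ;\n'
                + f"          {prop} ?answer .\n"
                + "}"
            )
    return None


query = question_to_sparql("When did the Eulmi Incident happen?")
print(query)
```
]]></preformat>
      <p>A question that matches no template returns None, which a fuller system could route to another conversion technique.</p>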
      <p>
        The XB is built by the following procedure to support this QA scenario. To define classes,
we used the hierarchical structure of Korlex[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a Korean WordNet. Korlex is a
lexical database that structures a variety of linguistic relations among words, such as
synonymy, hypernymy and hyponymy. Classes are chosen according to how frequently
each Korlex keyword is searched, and hierarchical (superclass and subclass) relations are assigned between classes.
      </p>
      <p>Key properties are defined with reference to YAGO and DBpedia, based on how
frequently each property is used. In addition, a property is added when it is requested
or when it is identified from competency questions during construction of the
knowledge base.</p>
      <p>To build entities, the necessary knowledge is extracted from diverse knowledge
resources through rule-based automatic conversion and through manual curation
by domain experts, following the dual-spiral methodology. Default
entities come from Wikipedia pages and are extended when other resources contain
unmapped entities.</p>
      <p>The rule-based automatic conversion is a process by which the machine
distinguishes between classes and properties through mapping rules between a
predefined schema and a knowledge resource to build knowledge.</p>
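      <p>A minimal sketch of such rule-based conversion is shown below. The mapping tables, the record layout, and the xb: identifiers are hypothetical and stand in for the real mapping rules, which the paper does not list. Fields without a rule are returned separately so that they can be handed to curation.</p>
      <preformat><![CDATA[
```python
# Hypothetical mapping rules between a source resource and a predefined schema:
# source template -> class, source field -> property.
CLASS_RULES = {
    "Infobox settlement": "xb:City",
    "Infobox person": "xb:Person",
}
PROPERTY_RULES = {
    "population_total": "xb:population",
    "birth_place": "xb:birthPlace",
}


def convert(record):
    """Turn one source record into triples; also return fields with no rule."""
    subject = "xb:" + record["title"].replace(" ", "_")
    triples, unmapped = [], []

    cls = CLASS_RULES.get(record["template"])
    if cls is not None:
        triples.append((subject, "rdf:type", cls))

    for field, value in record["fields"].items():
        prop = PROPERTY_RULES.get(field)
        if prop is None:
            unmapped.append(field)  # left for manual curation
        else:
            triples.append((subject, prop, value))
    return triples, unmapped


# Dummy record with placeholder values, for illustration only.
record = {
    "title": "Sample City",
    "template": "Infobox settlement",
    "fields": {"population_total": "12345", "motto": "sample text"},
}
triples, unmapped = convert(record)
print(triples)
print(unmapped)
```
]]></preformat>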
      <p>Curation is a process in which humans additionally verify the automatically converted
knowledge or build new knowledge. For example, the main text of a Wiki
page, written in natural language, is not easily converted automatically. The results of
rule-based automatic conversion and of curation are each verified against the
other. Domains that are highly likely to be used are prioritized so that the
knowledge related to them is built first, and the core part of the knowledge is
constructed based on the Korean Wikipedia. Moreover, the knowledge base has been
enlarged with existing knowledge resources such as DBpedia, Wikidata and
GeoNames.</p>
      <p>Generally, an ontology-based knowledge base is queried with SPARQL, a standard query
language for RDF data. However, it is very difficult for a user who is not familiar
with ontologies to understand a schema correctly and to implement services
that use a QA system or a knowledge base through SPARQL. The XB therefore provides a
variety of APIs in addition to a SPARQL endpoint, so that a greater number of users can
access the XB easily. Table 2 lists the APIs supplied by the XB.</p>
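      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>API</th></tr>
          </thead>
          <tbody>
            <tr><td>/api/class</td></tr>
            <tr><td>/api/classInfo</td></tr>
            <tr><td>/api/property</td></tr>
            <tr><td>/api/propertyInfo</td></tr>
            <tr><td>/api/instance</td></tr>
            <tr><td>/api/instanceInfo</td></tr>
            <tr><td>/api/instanceTime</td></tr>
            <tr><td>/api/instanceSpace</td></tr>
            <tr><td>/api/checkType</td></tr>
            <tr><td>/api/typeRelation</td></tr>
            <tr><td>/api/timeRelation</td></tr>
            <tr><td>/api/spaceRelation</td></tr>
            <tr><td>/api/shortestPath</td></tr>
          </tbody>
        </table>
      </table-wrap>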
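      <p>A client might address these APIs as in the sketch below. Only the endpoint paths come from the list above; the host name and the query parameter names (source, target) are placeholders, because the paper does not document them. The sketch builds a request URL and does not contact any server.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlencode

# Placeholder host; the paper does not give the service address.
BASE_URL = "http://xb.example.org"

# Endpoint paths as listed in the paper.
ENDPOINTS = {
    "/api/class", "/api/classInfo", "/api/property", "/api/propertyInfo",
    "/api/instance", "/api/instanceInfo", "/api/instanceTime",
    "/api/instanceSpace", "/api/checkType", "/api/typeRelation",
    "/api/timeRelation", "/api/spaceRelation", "/api/shortestPath",
}


def build_request_url(endpoint, **params):
    """Build a GET URL for a listed endpoint; parameter names are assumptions."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # Sort parameters so the same call always yields the same URL.
    query = urlencode(sorted(params.items()))
    url = BASE_URL + endpoint
    return url + "?" + query if query else url


# Hypothetical parameters for a path query between two instances.
url = build_request_url("/api/shortestPath", source="Seoul", target="Busan")
print(url)
```
]]></preformat>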
    </sec>
    <sec id="sec-3">
      <title>Future Work</title>
      <p>In the near future, additional tools to enhance quality and quantity are expected to
be developed.</p>
      <p>The knowledge has been verified through curation work, but this approach is
limited in that a finite number of curators cannot verify all the knowledge in the
system. To solve this problem, a crowdsourcing service is being developed to
construct and verify knowledge.</p>
      <p>We are also considering whether to build massive amounts of knowledge
by automatically mapping into the knowledge base large-scale triples generated,
through language processing combined with machine learning, from knowledge or
sentences aggregated from different knowledge resources.</p>
      <p>In addition, inference rules are defined that analyze relations between pieces of
knowledge in order to generate new knowledge that does not appear explicitly in the
knowledge base.</p>
      <p>The XB has so far been built mainly from Korean-language knowledge resources.
However, as most instances have labels and types in English and are
based on Wikipedia, we believe that it would be relatively easy to extend the XB to other
languages by using the inter-language links of Wikipedia.</p>
      <p>The XB will be extended and is expected to be available to public users soon, with
a variety of practical applications.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>This work was supported by the Institute for Information &amp; communications
Technology Promotion (IITP) grant funded by the Korea government (MSIP)
(No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and
reasoning platform).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hoffart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berberich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia</article-title>
          .
          <source>Artificial Intelligence</source>
          , Vol
          <volume>194</volume>
          (
          <year>2013</year>
          )
          <fpage>28</fpage>
          -
          <lpage>61</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kyosung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Youngkyoung</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyungil</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Dual-Spiral methodology for knowledgebase constructions</article-title>
          .
          <source>International Conference on Big Data and Smart Computing</source>
          (
          <year>2016</year>
          )
          <fpage>477</fpage>
          -
          <lpage>480</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wick</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vatant</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The geonames geographical database</article-title>
          . Available from World Wide Web: http://geonames.org (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Haklay</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>OpenStreetMap: User-generated street maps</article-title>
          .
          <source>IEEE Pervasive Computing</source>
          (
          <year>2008</year>
          )
          <fpage>12</fpage>
          -
          <lpage>18</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.:
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          . Springer Berlin Heidelberg (
          <year>2007</year>
          )
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>A.-S.</given-names>
          </string-name>
          , et al.:
          <article-title>Construction of Korean Wordnet</article-title>
          .
          <source>Journal of KIISE: Software and Applications</source>
          <volume>36</volume>
          (<issue>1</issue>) (
          <year>2009</year>
          ):
          <fpage>92</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>