<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Answering Questions over RDF by Neural Machine Translating</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shujun Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Jiao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuhan Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin 300350</addr-line>
          ,
          <country>China Tianjin</country>
          <institution>Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Question Answering over Knowledge Bases (KBQA) is the task of accurately answering a natural language question over a knowledge base. Previous methods for KBQA mostly use a pipelined approach that focuses on entity linking and relation path ranking. In this paper, we present a translation-based approach that translates natural language questions into SPARQL queries. Specifically, this paper contributes to filling the gap between natural language questions and SPARQL by utilizing multiple Neural Machine Translation (NMT) models, such as RNN, CNN, and Transformer models. More importantly, we bridge the gap between NMT models and existing KBQA by combining the entity linking and relation linking technologies of KBQA with the NMT model. On this basis, we design four novel question translation approaches applicable to any NMT model, i.e., "Pure NMT", "NMT + Entity Linking", "NMT + Relation Linking", and "NMT + Entity Linking + Relation Linking". Compared to traditional KBQA systems using state-of-the-art semantic parsers, our method achieves a precision of 67.9% on the QALD-9 dataset, ranking first.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge base question answering (KBQA) is an important task in NLP with many real-world applications, for example in search engines and decision support systems. Most existing methods for KBQA use a pipelined approach: first, given a question q, an entity linking step is used to find the KB entities mentioned in q. Next, the relations or relation paths in the KB linked to the topic entities are ranked, such that the best relation or relation path matching q is selected as the one that leads to the answer entities.</p>
      <p>In view of the success of Neural Machine Translation (NMT) approaches, it comes as a surprise that very few such models have been utilized to address the question translation challenge (Question→SPARQL) in KBQA, although some NMT-based KBQA works have been proposed for answering questions over RDF.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>However, these methods did not utilize the latest Transformer model; more importantly, they did not try to associate the NMT model with the critical technologies of traditional KBQA.</p>
      <p>In order to utilize NMT models in the KBQA area, this paper presents a large-scale comparison of three distinct neural network architectures: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and the Transformer model. Further, we bridge the gap between NMT models and traditional KBQA technologies by combining NMT models with the key technologies of traditional KBQA (entity linking and relation linking) to form four NMT-based KBQA approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>Overview</title>
      <p>Figure 1 illustrates our architecture with the running example "Who developed Skype?". A Question Preprocessing module first normalizes the question (e.g., to "who developed skype"). The question is then handed to one of four variants: NMT feeds the question to the NMT model as-is; NMT-E applies entity recognition and replaces the entity mention with the marker &lt;e&gt; ("who developed &lt;e&gt;"); NMT-R applies relation recognition and replaces the relation mention with the marker &lt;p&gt; ("who &lt;p&gt; skype"); and NMT-ER applies both ("who &lt;p&gt; &lt;e&gt;"). Each variant's NMT model emits the corresponding (possibly templated) query sequence, e.g., "select distinct variable where skype developer variable", which the SPARQL Encoding Module maps back to an executable query: SELECT DISTINCT ?uri WHERE { res:Skype dbo:developer ?uri . }</p>
      <p>As shown in Figure 1, we divide these four models into two categories, namely Pure Translation (NMT) and Template-based Translation (NMT-E, NMT-R, and NMT-ER).</p>
      <p>Pure Translation: the NMT model is trained directly on pairs of questions and SPARQL query sequences.</p>
      <p>Template-based Translation: the NMT model is trained on pairs of question templates and SPARQL query templates.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <sec id="sec-3-1">
        <title>SPARQL Encoding</title>
        <p>Unlike natural language, which can be easily tokenized, SPARQL queries are internally structured, combining elements of the query language with elements from the KBs and variables. Thus, the SPARQL Encoding Module is first employed to encode each query as a sequence. Specifically, we ignore the prefixes of URIs. Brackets, wildcards, and dots are replaced by their verbal descriptions. SPARQL operators are lower-cased and represented by a specified number of tokens. These operations can be implemented as a set of replacements, and applying them turns an original SPARQL query into a final sequence containing only tokens formed of characters. An example is shown in Figure 1.</p>
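        <p>The replacement scheme above can be sketched as follows; the concrete token names (brack_open, sep_dot, var_) are illustrative assumptions, not the exact vocabulary used in our implementation.</p>

```python
# Sketch of the SPARQL Encoding Module: a fixed set of string
# replacements turns a SPARQL query into a plain token sequence.
# Token names are assumptions for illustration.
REPLACEMENTS = [
    ("{", " brack_open "),   # brackets -> verbal description
    ("}", " brack_close "),
    ("?", "var_"),           # variable marker folded into the token
    (" . ", " sep_dot "),    # triple-pattern dot -> verbal description
    (":", "_"),              # flatten prefixed names (URI prefixes ignored)
]

def encode_sparql(query: str) -> str:
    """Encode a SPARQL query as a whitespace-separated token sequence."""
    seq = query
    for old, new in REPLACEMENTS:
        seq = seq.replace(old, new)
    # lower-case operators and collapse whitespace
    return " ".join(seq.lower().split())

print(encode_sparql("SELECT DISTINCT ?uri WHERE { res:Skype dbo:developer ?uri . }"))
# -> select distinct var_uri where brack_open res_skype dbo_developer var_uri sep_dot brack_close
```

        <p>Because each replacement is invertible, the same table can be applied in reverse to decode a predicted token sequence back into an executable query.</p>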
      </sec>
      <sec id="sec-3-2">
        <title>Tested NMT models</title>
        <p>Neural Machine Translation (NMT) models are widely used in machine translation, where they achieve excellent performance. We use NMT models to translate English questions into SPARQL. First, we encode the English questions (or templates) and SPARQL queries into embedding representations. Then, we feed them to the NMT models for training. Finally, we can convert any English input question into its corresponding SPARQL query.</p>
        <p>In this poster, we compare three types of network architectures, RNN-based, CNN-based, and self-attention models, since these represented the best-performing NMT architectures in the field at the time of the experiment, not considering hybrid and ensemble methods. Encoded SPARQL queries and natural language questions are fed to the network at the word level.</p>
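        <p>As a minimal illustration of the word-level encoding, the sketch below maps token sequences to integer ids over source and target vocabularies; the special tokens and the lookup scheme are assumptions, and the learned embedding lookup itself lives inside each NMT architecture.</p>

```python
# Word-level vocabulary and id encoding for NMT input, as a sketch.
# [pad] and [unk] are assumed special tokens; the NMT model maps these
# ids to learned embedding vectors internally.

def build_vocab(sentences):
    """Assign an integer id to every distinct word-level token."""
    vocab = {"[pad]": 0, "[unk]": 1}
    for sent in sentences:
        for tok in sent.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_ids(sentence, vocab):
    """Map a sentence to its id sequence, falling back to [unk]."""
    return [vocab.get(tok, vocab["[unk]"]) for tok in sentence.split()]

src_vocab = build_vocab(["who developed skype"])
print(to_ids("who developed skype", src_vocab))   # -> [2, 3, 4]
print(to_ids("who developed whatsapp", src_vocab))  # unseen word -> [2, 3, 1]
```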
      </sec>
      <sec id="sec-3-3">
        <title>Template-based Translating</title>
        <p>Considering SPARQL as a foreign language is a novel and direct approach to the KBQA task: a question is turned into a SPARQL query by machine translation. However, it fails to accurately translate the entities and predicates of a question when the entity or relation mentions did not occur in the training set.</p>
        <p>We therefore consider learning only the structural and local semantic information shared by a question and its SPARQL query, with entities and predicates removed; that is, we translate a question template into a SPARQL query template, which we call Template-based Translation. Since no specific entity is involved and only the positional information is learned, we obtain better generality and performance.</p>
        <p>
          Template Construction: There are three main ways to preprocess the data for constructing templates: substituting entities, substituting predicates, and substituting both entities and predicates, as shown in Figure 1. In this step, we rely on existing entity linking tools [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to recognize, mask, and replace entities in the question with &lt;e_i&gt;. For the relation mentions in the question, we directly treat the verbs and adjectives of the question as relations and replace them with &lt;p_i&gt;.
        </p>
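        <p>A minimal sketch of this masking step is given below; it assumes the entity span is already known (in our pipeline it comes from an entity linking tool) and approximates relation recognition with a hand-written verb list rather than a real tagger. The angle-bracket markers are written as [e] and [p] here.</p>

```python
# Sketch of template construction by masking. The verb list and the
# marker spellings are assumptions for illustration only.
VERBS = {"developed", "wrote", "directed"}

def make_template(question, entity=None, mask_relations=False,
                  e_tok="[e]", p_tok="[p]"):
    """Replace an entity mention and/or relation mention with markers."""
    out = []
    for tok in question.lower().split():
        if mask_relations and tok in VERBS:
            out.append(p_tok)            # relation mention -> [p]
        elif entity is not None and tok == entity:
            out.append(e_tok)            # linked entity mention -> [e]
        else:
            out.append(tok)
    return " ".join(out)

print(make_template("who developed skype", entity="skype"))       # NMT-E: who developed [e]
print(make_template("who developed skype", mask_relations=True))  # NMT-R: who [p] skype
print(make_template("who developed skype", entity="skype",
                    mask_relations=True))                         # NMT-ER: who [p] [e]
```

        <p>The masked-out entity and predicate are stored alongside the template so they can be re-inserted into the translated SPARQL template.</p>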
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment and Evaluation</title>
      <sec id="sec-4-1">
        <title>Datasets and Metrics</title>
        <p>Our method is evaluated on two well-known public datasets, the Monument dataset and QALD-9. For training, validation, and testing, we split the datasets randomly in an 8:1:1 ratio.</p>
        <p>Accuracy (Acc). Acc is a metric for evaluating the query results, computed as follows:</p>
        <p>Acc = (the number of right answers) / (the number of query answers) (1)</p>
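        <p>Under our reading of Eq. (1), Acc for a single query can be computed as below; treating the answers as sets is an assumption of this sketch.</p>

```python
# Sketch of the per-query Acc metric from Eq. (1): the fraction of
# returned query answers that are right answers.
def accuracy(returned, gold):
    """Acc = |right answers| / |returned answers| for one query."""
    returned, gold = set(returned), set(gold)
    if not returned:
        return 0.0
    return len(returned & gold) / len(returned)

print(accuracy({"Skype Technologies"}, {"Skype Technologies"}))  # -> 1.0
print(accuracy({"Skype Technologies", "Microsoft"},
               {"Skype Technologies"}))                          # -> 0.5
```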
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation</title>
        <p>As shown in Table 1, on QALD-9, "Transformer+NMT-E" beats all other combinations and ranks first, with an accuracy of 0.6786, while the best result in the QALD-9 competition is gAnswer with Acc = 0.293.</p>
        <p>As shown in Table 1, on the Monument dataset, "CNN-based+NMT-ER" beats all other combinations and ranks first, with an accuracy of 0.9876. The experimental results on the two datasets show that it is feasible to translate questions into SPARQL queries by NMT alone; however, accuracy can be further improved by combining entity recognition and relation recognition.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Using natural language questions to query knowledge graphs provides an easy and natural way for ordinary users to acquire useful knowledge. Most traditional approaches perform semantic parsing by recognizing the entities and relations of the question and assembling them into a semantic query graph; however, this is very time-consuming. Thus, in this poster, we propose a translation-based method that translates natural language questions into SPARQL queries. Extensive empirical evaluations on several benchmarks demonstrate that our proposed approach is useful and promising.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Key Research and Development Program of China (2017YFC0908401) and the National Natural Science Foundation of China (61972455, 61672377). Xiaowang Zhang is supported by the Peiyang Young Scholars program of Tianjin University (2019XRX-0032).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          :
          <article-title>An Encoder-Decoder Framework Translating Natural Language to Database Queries</article-title>
          .
          <source>In Proc. of IJCAI</source>
          <year>2018</year>
          , pp.
          <fpage>3977</fpage>
          -
          <lpage>3983</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          :
          <article-title>Language to Logical Form with Neural Attention</article-title>
          .
          <source>In Proc. of ACL</source>
          <year>2016</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yarats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          :
          <article-title>Convolutional Sequence to Sequence Learning</article-title>
          .
          <source>In Proc. of ICML</source>
          <year>2017</year>
          , pp.
          <fpage>1243</fpage>
          -
          <lpage>1252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Manning</surname>
          </string-name>
          :
          <article-title>Effective Approaches to Attention-based Neural Machine Translation</article-title>
          .
          <source>In Proc. of EMNLP</source>
          <year>2015</year>
          , pp.
          <fpage>1412</fpage>
          -
          <lpage>1421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Soru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moussallem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Publio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Esteves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.B.</given-names>
            <surname>Neto</surname>
          </string-name>
          :
          <article-title>SPARQL as a Foreign Language</article-title>
          .
          <source>SEMANTICS Posters&amp;Demos</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          :
          <article-title>S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking</article-title>
          .
          <source>In Proc. of ACL</source>
          <year>2015</year>
          , pp.
          <fpage>504</fpage>
          -
          <lpage>513</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>