<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BERT-based Semantic Query Graph Extraction for Knowledge Graph Question Answering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhicheng Liang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zixuan Peng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuefeng Yang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fubang Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yunfeng Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deborah L. McGuinness</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Rensselaer Polytechnic Institute</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Zhuiyi Technology</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Answering complex questions involving multiple entities and relations remains a challenging Knowledge Graph Question Answering (KGQA) task. To extract a Semantic Query Graph (SQG), we propose a BERT-based decoder that jointly performs multiple tasks for SQG construction: entity detection, relation prediction, output variable selection, query type classification, and ordinal constraint detection. The outputs of our model can be seamlessly integrated with downstream components (e.g. entity linking) of a KGQA pipeline to construct a formal query. Our experiments show that the proposed BERT-based semantic query graph extractor achieves better performance than traditional recurrent neural network based extractors. Meanwhile, the KGQA pipeline based on our model outperforms baseline approaches on two benchmark datasets (LC-QuAD, WebQSP) containing complex questions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Semantic parsing (SP) based approaches to knowledge graph question answering (KGQA) aim at building a semantic parser that first converts natural language questions into logical forms, and then into formal queries like SPARQL that can be executed on the underlying KG to retrieve answers. For these approaches, constructing the semantic query graph (SQG) plays a vital role. For example, the SQG of the natural language query (NLQ) “What awards have been won by the executive producer of Fraggle Rock?” involves three nodes and two labeled edges, i.e. {(Fraggle Rock, dbo:executiveProducer, ?x), (?x, dbo:award, ?uri)} if represented using triples, where ?x and ?uri are free variables. The answers to this query are the grounded KG nodes for the output variable ?uri. Despite some work on abstract query graph prediction [
        <xref ref-type="bibr" rid="ref1 ref8">1, 8</xref>
        ], there is yet to be an end-to-end model that jointly performs query graph identification along with entity mention detection and relation prediction. To this end, we propose a novel BERT-based neural network to extract SQGs in an end-to-end manner for answering complex questions with multiple triple patterns. We evaluate our approach on two KGQA benchmark datasets containing complex questions. The experimental results demonstrate that our approach, using a simple pipeline built on top of the proposed SQG extractor, improves overall KGQA performance and outperforms the baseline approaches.
      </p>
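As a minimal illustration of the data structure involved (a sketch with hypothetical helper names, not the paper's code), the example SQG can be held as a list of triple patterns, from which the free variables are easy to enumerate:

```python
# Sketch: an SQG as a list of (subject, relation, object) triple patterns.
# The terms ("?x", "?uri", dbo:... predicates) follow the example in the text.
sqg = [
    ("Fraggle Rock", "dbo:executiveProducer", "?x"),
    ("?x", "dbo:award", "?uri"),
]

def free_variables(triples):
    """Collect every term that is a free variable (starts with '?'),
    preserving first-seen order."""
    out = []
    for s, _, o in triples:
        for term in (s, o):
            if term.startswith("?") and term not in out:
                out.append(term)
    return out

print(free_variables(sqg))  # ['?x', '?uri']
```

The output variable (?uri here) is one of these free variables; selecting it is one of the auxiliary tasks described later.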
      <p>[Fig. 1 (figure content): a BERT encoder feeds FFNN heads for multi-task learning, covering query type classification (SELECT, COUNT, ASK), output variable selection (?x, ?uri), and ordinal constraint detection (First, Second, Last); token-level logistic regression layers perform subject head/tail tagging over the input “[CLS] What awards have been won by the executive producer of Fraggle Rock ? [SEP] ?x ?uri [SEP]”, targeting the SPARQL query with triples (Fraggle Rock, dbo:executiveProducer, ?x) and (?x, dbo:award, ?uri).]</p>
      <sec id="sec-1-2">
        <title>Approach</title>
        <p>[Fig. 1 (continued): in stage 2, the input “[CLS] Fraggle Rock [SEP] What awards have been won by the executive producer of Fraggle Rock ? [SEP] ?x ?uri [SEP]” is re-encoded for relation tagging and object tail tagging; the SQG-Decoder extracts the triple patterns (Fraggle Rock, dbo:executiveProducer, ?x) and (?x, dbo:award, ?uri), which pass through entity linking and query construction to a SPARQL query executed against the KG.]</p>
        <p>
          An overview of our approach for KGQA is given in Fig. 1 (the answers for the running example are dbr:Emmy_Award and dbr:Disney_Legends). We propose a semantic query graph decoder (SQG-Decoder) that, given an NLQ, predicts a set of triple patterns, each of the form (Subject, Relation, Object), where Subject and Object are either text spans for entities or introduced free variables, while Relation is a grounded relation/predicate defined by the KG schema. As opposed to single-relation questions, complex questions involve multiple triple patterns that form a directed acyclic graph (DAG). SQG-Decoder uses BERT [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] as an encoder to obtain both
sentence-level and token-level contextual embeddings of the input NLQ. Specifically,
we construct the input of the BERT encoder in the format of “[CLS] NLQ [SEP]
Free variables [SEP]”. This enables free variables to interact with the entire
NLQ sentence via attention mechanisms used in BERT.
        </p>
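A minimal sketch of how such an encoder input might be assembled (tokenizer details elided; the helper name and the explicit segment ids are our illustration, not the authors' implementation):

```python
def build_encoder_input(nlq_tokens, free_vars):
    """Pack the NLQ and the free variables into one BERT-style sequence:
    [CLS] NLQ [SEP] free variables [SEP], with segment ids distinguishing
    the two parts so the variables can attend to the whole question."""
    tokens = ["[CLS]"] + nlq_tokens + ["[SEP]"] + free_vars + ["[SEP]"]
    segment_ids = [0] * (len(nlq_tokens) + 2) + [1] * (len(free_vars) + 1)
    return tokens, segment_ids

toks, segs = build_encoder_input(
    "What awards have been won by the executive producer of Fraggle Rock ?".split(),
    ["?x", "?uri"],
)
print(toks[0], toks[-1])   # [CLS] [SEP]
```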
        <p>With BERT as an encoder, we propose a novel decoding scheme to extract triple patterns given an NLQ. It predicts each triple pattern (s_j, p_j, o_j) in two stages, as shown in Fig. 1. The first stage determines the span of a subject, which denotes an entity mention in the NLQ or a free variable. Given the token-level vector representations obtained from BERT, i.e. H = {h_0, h_1, ..., h_{l-1}}, each h_i is used to predict whether the i-th token is the head/tail token of the subject span s_j, by applying a linear feed-forward layer (or stacking multiple such layers with non-linearity) with sigmoid activation to perform binary classification with logistic regression. The second stage determines the span of the object and the type of relation for the triple pattern, conditioned on the subject predicted in the first stage. To achieve this, the predicted s_j and the NLQ are concatenated (with {0, 1} masks to distinguish the two parts) and fed into the encoder again to obtain the subject-aware token-level representations H' = {h'_0, h'_1, ..., h'_{l-1}}. Specifically, the relation p_j is jointly predicted with the head token of the object mention o_j, considering that they are dependent on each other. The tail token of the object mention o_j is identified in the same way as that of the subject span s_j. In this way, the encoder layer is shared across the two stages, with task-specific output layers for subject extraction and object-relation extraction, respectively.</p>
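The head/tail tagging step can be sketched in a few lines, with toy per-token logits standing in for the learned feed-forward layer and a 0.5 threshold after the sigmoid (a hedged illustration under these assumptions, not the authors' code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_span(head_logits, tail_logits, threshold=0.5):
    """Binary head/tail classification per token: take the first token whose
    sigmoid(head logit) clears the threshold, then the first tail at or
    after it, yielding inclusive (head, tail) token indices."""
    heads = [i for i, z in enumerate(head_logits) if sigmoid(z) > threshold]
    tails = [i for i, z in enumerate(tail_logits) if sigmoid(z) > threshold]
    for h in heads:
        for t in tails:
            if t >= h:
                return (h, t)
    return None

# Toy logits for a 5-token input: head fires at token 2, tail at token 3.
head_logits = [-3.0, -2.0, 4.0, -1.0, -2.5]
tail_logits = [-3.0, -2.0, -1.0, 3.5, -2.5]
print(extract_span(head_logits, tail_logits))  # (2, 3)
```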
        <p>Note that our SQG-Decoder is capable of predicting multiple triple patterns for a query. While doing this, we need to handle overlapping text spans, since an entity mention may appear in more than one triple pattern of an NLQ; e.g. in Fig. 1 the variable ?x is involved in two triple patterns. Our solution is, instead of labeling the text span of an entity with 1 for all involved tokens and 0 otherwise, to label each token in the NLQ as follows: (1, 0) for the head of an entity mention, (0, 1) for the tail, (1, 1) if a token is both the start and the end of an entity mention, and (0, 0) otherwise.</p>
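This labeling scheme can be written out as a small function (a hypothetical helper mirroring the (head, tail) pairs just described):

```python
def span_labels(num_tokens, spans):
    """Per-token (head, tail) labels for entity spans given as inclusive
    (start, end) token indices: (1, 0) head only, (0, 1) tail only,
    (1, 1) for a single-token span where head and tail coincide,
    (0, 0) otherwise."""
    labels = [[0, 0] for _ in range(num_tokens)]
    for start, end in spans:
        labels[start][0] = 1
        labels[end][1] = 1
    return [tuple(pair) for pair in labels]

# A 6-token sentence with a two-token span (1, 2) and a one-token span (4, 4).
print(span_labels(6, [(1, 2), (4, 4)]))
# [(0, 0), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
```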
        <p>To train our SQG-Decoder, we need annotations of text spans in an NLQ that
correspond to the canonical entity names of the subject/object of a triple pattern. To
automate this process, we adopt a reverse linking algorithm mainly based on n-gram fuzzy
matching to annotate the spans of entities mentioned in the corresponding SPARQL query.
To further improve linking performance, we also leverage pre-trained word embeddings
for measuring the similarity between entity names from the KG and their surface forms
in text.</p>
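A minimal version of such reverse linking might score question n-grams against an entity name with stdlib fuzzy matching (difflib stands in here for the paper's unspecified matcher and its word-embedding similarity; the function name is ours):

```python
from difflib import SequenceMatcher

def reverse_link(question_tokens, entity_name, max_n=4):
    """Find the question n-gram most similar to a KG entity name,
    returning ((start, end) inclusive token indices, similarity)."""
    best = (None, 0.0)
    target = entity_name.lower()
    for n in range(1, max_n + 1):
        for i in range(len(question_tokens) - n + 1):
            gram = " ".join(question_tokens[i:i + n]).lower()
            score = SequenceMatcher(None, gram, target).ratio()
            if score > best[1]:
                best = ((i, i + n - 1), score)
    return best

q = "What awards have been won by the executive producer of Fraggle Rock ?".split()
span, score = reverse_link(q, "Fraggle Rock")
print(span, round(score, 2))  # (10, 11) 1.0
```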
        <p>
          Given an NLQ, the model parameters are learned by maximizing the likelihood of a
set of ground truth triple patterns. Practically, our decoding scheme can be adapted to
various neural-net-based encoders (e.g. BiLSTMs, RNNs, Transformers) from which
token-level representations of the encoder input can be obtained. To ground extracted
text spans to the KG, we employ the same entity linking tools used by the baselines in
our experiments for fair comparison. Specifically, we built an Apache Lucene Index to
retrieve entity candidates, via string similarity between their names and entity mention
queries following the baselines [
          <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
          ] for LC-QuAD, and used the entity linking results [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
for WebQSP.
        </p>
        <p>In addition to triple pattern extraction, we leverage BERT to perform three auxiliary
classification tasks that are important to constructing the query graph as follows. 1)
Output Variable Selection. Since a query graph may have multiple variables, we need
to determine which one represents the returned answer. 2) Query Type Classification.
Query type refers to the type of information that is asked upon the output variable. Here
we focus on three types of queries: SELECT, COUNT, and ASK (boolean) query. 3)
Ordinal Constraint Detection. In some cases, ordinal constraints are imposed on the
output variables. For instance, to answer the question “What is the first book Sherlock
Holmes appeared in?”, we need to sort the books in which the detective appeared by
publication date and then return the first as the answer.</p>
        <p>To perform the above three tasks, we feed the transformer output for the [CLS] token into separate classification layers. Since these tasks are related to each other, we adopt a multi-task learning strategy to train the SQG-Decoder jointly by combining the cross-entropy losses of these tasks with the binary cross-entropy losses of the principal task, i.e. triple pattern extraction.</p>
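Combining the losses can be sketched as a plain sum over toy predictions (uniform task weights here; the paper does not specify a weighting scheme, so this is an assumption):

```python
import math

def cross_entropy(probs, gold):
    """Multi-class cross-entropy for one prediction: -log p(gold)."""
    return -math.log(probs[gold])

def binary_cross_entropy(p, y):
    """Binary cross-entropy for one head/tail tagging decision."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Auxiliary tasks read off the [CLS] representation (toy softmax outputs):
aux_losses = [
    cross_entropy([0.7, 0.2, 0.1], 0),  # query type: SELECT
    cross_entropy([0.6, 0.4], 1),       # output variable: ?uri
    cross_entropy([0.9, 0.1], 0),       # ordinal constraint: none
]
# Principal task: per-token binary head/tail decisions (toy probabilities).
tagging_losses = [binary_cross_entropy(0.95, 1), binary_cross_entropy(0.1, 0)]

total = sum(aux_losses) + sum(tagging_losses)
print(round(total, 3))
```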
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Experiments &amp; Results</title>
      <p>
        Datasets. We evaluate our approach on two datasets, LC-QuAD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and WebQSP [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], both targeting complex questions and having ground-truth SPARQL annotations.
        Evaluation Metrics. We use accuracy of the answer set as the main evaluation metric, i.e. the percentage of test questions for which the predicted answer set exactly matches the ground-truth answer set. Precision, recall, and F1 scores are also reported. We use the official evaluation script for WebQSP and report the average F1 as defined by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
        Experimental Setup. Our SQG-Decoder is fine-tuned from the pre-trained language model BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We compare BERT<sub>BASE</sub> (12 layers, 768 hidden units, 12 heads) with BERT<sub>LARGE</sub> (24 layers, 1024 hidden units, 16 heads) to examine performance at different model sizes. We use the Adam optimizer with a learning rate of 2 × 10<sup>-5</sup> and a batch size of 32. The best model is selected using the development set.
        End-to-end KGQA Performance. We only compare with previous work that reported end-to-end KGQA performance, i.e. approaches starting from the raw input query without linked entities or relations given. Tables 1 &amp; 2 show that we achieve the best F1 score of 54.9 on LC-QuAD and the best accuracy of 66.0 on WebQSP, both using BERT<sub>LARGE</sub>.
        Evaluation of Triple Pattern Extraction. This evaluates the main contribution of our work. We compare the precision, recall, and F1 scores of extracted triple patterns with the SQG-Decoder built on top of two types of encoders: a Bi-LSTM encoder with GloVe embeddings, and BERT<sub>BASE</sub>, as summarized in Table 3. The results demonstrate that the latter always outperforms the former, mainly due to higher recall, while staying close in precision. This shows that our proposed SQG-Decoder can be adapted to encoder types other than transformers.
      </p>
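The answer-set accuracy metric described above amounts to exact set equality per question; a short sketch (toy predictions, not actual system output):

```python
def answer_set_accuracy(predictions, golds):
    """Fraction of questions whose predicted answer set exactly
    matches the ground-truth answer set (order-insensitive)."""
    hits = sum(1 for p, g in zip(predictions, golds) if set(p) == set(g))
    return hits / len(golds)

preds = [["dbr:Emmy_Award", "dbr:Disney_Legends"], ["dbr:Berlin"]]
golds = [["dbr:Disney_Legends", "dbr:Emmy_Award"], ["dbr:Paris"]]
print(answer_set_accuracy(preds, golds))  # 0.5
```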
      <table-wrap id="tbl3">
        <label>Table 3</label>
        <caption>
          <p>Triple pattern extraction results (precision / recall / F1) with different encoders on LC-QuAD and WebQSP.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Encoder</th><th>P</th><th>R</th><th>F1</th><th>P</th><th>R</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><td>BiLSTM Encoder + GloVe</td><td>58.7</td><td>33.5</td><td>42.6</td><td>74.6</td><td>58.8</td><td>65.8</td></tr>
            <tr><td>BERT<sub>BASE</sub> Encoder</td><td>62.8</td><td>56.1</td><td>59.2</td><td>75.3</td><td>70.9</td><td>73.0</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Error Analysis. We randomly select 100 incorrectly answered questions from the LC-QuAD test set. The major error types are: 1) Missing triple patterns (54%). Errors of this type are attributed to predicates that appear in the test set but never in the training set, as well as to data sparsity. 2) Mispredicting semantically close predicates (32%). This typically occurs when predicates have very similar semantic meanings and the NLQ is not informative enough to distinguish between them. 3) Entity span misidentification and redundant predicates, etc. (14%). Errors of this type include incorrect entity span detection, which may cause failures in entity linking, and prediction of redundant predicates.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions and Future Work</title>
      <p>We present a BERT-based Semantic Query Graph Decoder (SQG-Decoder) for answering complex natural language questions over knowledge graphs, which jointly learns multiple subtasks in an end-to-end trainable manner. Our approach substantially outperforms the baselines on two KGQA benchmarks that contain complex questions. As future work, we would like to explore answering questions with more complex temporal and spatial constraints using neural networks.</p>
      <p>Acknowledgments This work was partially supported by the DARPA MCS program
award number N660011924033 to RPI under USC-ISI West.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Formal query building with query structure prediction for complex question answering over knowledge base</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <fpage>3751</fpage>
          -
          <lpage>3758</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: NAACL</source>
          . pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Diefenbach</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maret</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Towards a question answering system over the semantic web</article-title>
          .
          <source>Semantic Web (Preprint)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maheshwari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LC-QuAD: A corpus for complex question answering over knowledge graphs</article-title>
          .
          <source>In: ISWC</source>
          . pp.
          <fpage>210</fpage>
          -
          <lpage>218</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vakulenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez Garcia</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cochez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Message passing for complex question answering over knowledge graphs</article-title>
          .
          <source>In: CIKM</source>
          . pp.
          <fpage>1431</fpage>
          -
          <lpage>1440</lpage>
          . ACM (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.t.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meek</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The value of semantic parse labeling for knowledge base question answering</article-title>
          .
          <source>In: ACL</source>
          . pp.
          <fpage>201</fpage>
          -
          <lpage>206</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>C.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improved neural relation detection for knowledge base question answering</article-title>
          .
          <source>In: ACL</source>
          . pp.
          <fpage>571</fpage>
          -
          <lpage>581</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Question answering over knowledge graphs via structural query patterns</article-title>
          . arXiv preprint arXiv:1910.09760 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>