=Paper=
{{Paper
|id=Vol-1883/paper_8
|storemode=property
|title=Natural Language Supported Relation Matching for Question Answering with Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-1883/paper_8.pdf
|volume=Vol-1883
|authors=Hongyu Li,Chenyan Xiong,Jamie Callan
|dblpUrl=https://dblp.org/rec/conf/sigir/LiXC17
}}
==Natural Language Supported Relation Matching for Question Answering with Knowledge Graphs==
Hongyu Li, Chenyan Xiong, Jamie Callan
hongyul@andrew.cmu.edu, cx@cs.cmu.edu, callan@cs.cmu.edu
Carnegie Mellon University

Abstract

This work focuses on the relation matching problem in knowledge-based question answering systems. Finding the right relation that a natural language question asks about is a key step in current knowledge-based question answering systems, and also the most difficult one, because of the mismatch between natural language questions and formal relation type definitions. In this paper, we present two approaches to this problem. The first approach directly learns a soft match between the question and the relations from the training data using neural networks. The second approach enriches the relation name with natural language support sentences generated from Wikipedia, which provide additional matches with the question. Experiments on the WebQuestions dataset demonstrate that both of our approaches improve the relation matching accuracy of a prior state-of-the-art system. Our further analysis reveals the high quality of the support sentences and suggests their rich potential in question answering and semantic parsing tasks.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: L. Dietz, C. Xiong, E. Meij (eds.): Proceedings of the First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR), Tokyo, Japan, 11-Aug-2017, published at http://ceur-ws.org

1 Introduction

If we were asked the question "Who inspired Obama?", we might first search for the entity "Barack Obama" in a knowledge base such as Freebase [4], follow the relation "influenced by", and then get the answer "Nipsey Russell". Knowledge-based question answering systems mimic this process and aim to answer natural language questions using the rich structured semantics stored in large-scale knowledge bases [3]. Prior research framed knowledge-based question answering as a semantic parsing problem: parse the natural language question into a sub-graph of entities, relations, and target nodes in the knowledge base, which leads to the answer (for example, via SPARQL queries) [2, 8]. The key challenge in this process is the relation matching step: the system needs to figure out which aspect of the entity the question is asking about and match the correct relation type. Because natural language questions and formal relation type definitions have different characteristics, a relation's name (e.g. "influenced by") may not appear in the question (e.g. "Who inspired Obama?"). A simple exact-match system fails on this vocabulary mismatch; soft matching and paraphrasing techniques are required for the relation matching task.

This work presents two approaches to the relation matching task in knowledge-based QA systems. The first approach directly learns the match between natural language questions and relation names using deep learning techniques. We build two bidirectional LSTM (Bi-LSTM) neural networks to align the question and the relation, with word embeddings and LSTM parameters learned end-to-end. The second approach enriches the representation of the relation using natural language support sentences (NLSS). We extract all Wikipedia sentences that contain the head and tail entities of a relation, and then use neural networks to select the most related (best) NLSS for the relation. The best NLSS of the relation is then matched with the question via another LSTM, and provides a richer and more natural description of the relation.

Our experiments on the WebQuestions [3] dataset demonstrate the effectiveness of the two approaches. The LSTMs by themselves perform comparably to previous relation matching techniques [7], and can be combined with them for better accuracy. The support sentences significantly enrich the representation of the relations and provide an absolute 4% improvement in recall. Our analysis further reveals the potential of natural language support sentences: although the coverage of support sentences over relations is limited by our simple support sentence finding technique, among the fact candidates that are covered (25%), support sentences influenced the majority of the affected questions, and most of the influences were positive.
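To make the lookup process above concrete, the following toy sketch follows a single relation from a linked entity in a miniature triple store. It is our illustration, not part of the paper's system: the triples are hypothetical stand-ins for Freebase, and entity linking and relation matching (Sections 2-4) are assumed to have already produced the subject and relation.

```python
# A toy knowledge base of (subject, relation, object) triples.
# These are illustrative stand-ins for Freebase facts.
TRIPLES = [
    ("Barack Obama", "influenced by", "Nipsey Russell"),
    ("Abraham Lincoln", "vice president", "Andrew Johnson"),
    ("Abraham Lincoln", "vice president", "Hannibal Hamlin"),
]

def follow_relation(subject, relation):
    """Return every object reachable from `subject` via `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

# "Who inspired Obama?": entity linking gives "Barack Obama",
# relation matching gives "influenced by".
print(follow_relation("Barack Obama", "influenced by"))  # ['Nipsey Russell']
```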
2 System Architecture

[Figure 1: The architecture of our QA system. Our main focus is the question-relation matching stage.]

The system architecture is illustrated in Figure 1. The process of answering each question can be described with the following four steps:

• Entity Linking: identify entities in the question [2].
• Candidate Answer Generation: retrieve all facts that start with the linked entities (root entities) [8].
• Semantic Parsing: match the question to the candidate facts (relation matching).
• Answer Ranking: rank the fact candidates using learning to rank [5].

In an error analysis of a previous state-of-the-art OpenQA system, Aqqu [2], on WebQuestions [3], we identified the following error categories:

• Ground truth error (30.0%): the dataset gives the wrong answer or no answer is provided.
• Relation matching error (41.96%): the wrong relation is matched by the top fact candidate.
• Entity linking error (9.82%): wrong entities or no entities are identified.
• Other (18.22%): knowledge base inconsistency, invalid matching patterns, and combinations of several error types.

Relation matching is the major challenge of the system: most of the errors the system makes are due to relation matching mistakes. This work therefore focuses on this step and follows the implementation of the Aqqu system [2] in the other stages. In the rest of this paper, we introduce how we approach the relation matching problem with neural networks and support sentences.

3 Semantic Parsing with LSTM

The goal of the semantic parsing stage is to compute a matching score between the question q and a fact candidate c. A fact candidate is a triple of subject s, relation r, and object o. Because the object mostly does not appear in the question, the semantic parser's goal is to match q with the candidate fact's subject and relation (s, r). We employ two LSTM models to perform this relation matching: LSTM-Seq and LSTM-Match.

LSTM-Seq takes the question q and the fact pair (subject s and relation r) as input. It concatenates them into one sequence q + s + r, maps the words to embeddings, uses a Bi-LSTM to map the input sequence to hidden representations, and employs a dense layer to predict the ranking score:

    f_LSTM-Seq(q, c) = Dense(Bi-LSTM(Embedding(q + s + r))).    (1)

LSTM-Match takes the question q and the relation r as input. Instead of concatenating them into one word sequence, LSTM-Match maps q and r to embeddings separately, uses two Bi-LSTMs to map q and r to their hidden representations, and computes the cosine similarity of the Bi-LSTMs' outputs:

    f_LSTM-Match(q, c) = cosine(Bi-LSTM(Embedding(q)), Bi-LSTM(Embedding(r))).    (2)

Training: The two LSTMs are trained with the standard pairwise learning-to-rank hinge loss, summed over training pairs of a better candidate c+ and a worse candidate c−:

    θ* = argmin_θ Σ_(q, c+, c−) max(0, 1 − f(q, c+) + f(q, c−)).    (3)

The parameters θ include the word embeddings, the LSTM parameters, and the dense layer's parameters. The two LSTMs and their embedding layers are trained independently, with separate parameters. The training pairs (c+, c−) are formed by their F1 scores [8]: for each fact candidate, we use the F1 score of the objects it leads to as its label (there can be more than one object connected by the same subject and relation, for example, the movies an actress has appeared in).
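The two scorers and the training objective in Equations (1)-(3) can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: tokenization, padding, and batching are omitted, and using the final hidden state as the sequence representation is our assumption, since the paper does not say how the Bi-LSTM outputs are pooled. Dimensions follow Section 5 (300-d embeddings, 32-d LSTM layers, margin-1 hinge loss).

```python
import torch.nn as nn
import torch.nn.functional as F

class LSTMSeq(nn.Module):
    """Scores the concatenated sequence q + s + r (Equation 1)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.dense = nn.Linear(2 * hidden, 1)

    def forward(self, qsr_ids):                      # (batch, seq_len) word ids
        h, _ = self.lstm(self.emb(qsr_ids))          # (batch, seq_len, 2*hidden)
        return self.dense(h[:, -1, :]).squeeze(-1)   # one score per candidate

class LSTMMatch(nn.Module):
    """Cosine similarity of question and relation encodings (Equation 2)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.q_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.r_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, q_ids, r_ids):
        q_h, _ = self.q_lstm(self.emb(q_ids))
        r_h, _ = self.r_lstm(self.emb(r_ids))
        return F.cosine_similarity(q_h[:, -1, :], r_h[:, -1, :], dim=-1)

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """The pairwise ranking objective of Equation (3)."""
    return F.relu(margin - score_pos + score_neg).mean()
```

As in the paper, the two models would be trained independently; their scores (together with CDSSM's) are later combined by RankSVM (Section 4.2).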
4 Support Relation Inference with Wikipedia Sentences

Short natural language questions and formally defined relation names lead to vocabulary mismatch problems. For instance, the question "how was pluto discovered?" should be matched with the Freebase relation "discovery technique", which is hard to distinguish from the relation "discoverer". This section introduces our approaches, which enrich the relation description with natural language support sentences (NLSS) generated from Wikipedia.

4.1 Support Sentence Generation

For each candidate fact triple (s, r, o), we extract all possible support sentences in Wikipedia that contain both the root entity s and the answer entity o. Specifically, we scan through the entire enwiki data dump and parse each Wikipedia page into a list of sentences. Wikipedia has manual annotations for entity mentions, which we map to Freebase entities. All sentences in Wikipedia that contain both s and o are added to the NLSS pool of the candidate fact triple (s, r, o). If the answer node o represents an attribute rather than an entity, then all sentences containing the root entity are included as its NLSS.
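The extraction in Section 4.1 reduces to a membership test once entity mentions are resolved. The sketch below is our illustration under that assumption: the sentence list is a hypothetical stand-in for a preprocessed enwiki dump in which each sentence already carries the set of Freebase entities it mentions (that annotation step is where the real work lies).

```python
from collections import defaultdict

# Hypothetical preprocessed dump: (sentence text, Freebase entities mentioned).
SENTENCES = [
    ("... which re-nominated President Lincoln, and nominated Andrew Johnson "
     "for the Vice Presidency.", {"m.abraham_lincoln", "m.andrew_johnson"}),
    ("Lincoln was a lawyer before entering politics.", {"m.abraham_lincoln"}),
]

def build_nlss_pool(fact_triples):
    """Collect the support sentence pool for each (s, r, o) fact triple."""
    pool = defaultdict(list)
    for s, r, o, o_is_entity in fact_triples:
        for text, entities in SENTENCES:
            # A sentence supports the fact if it mentions both the root and
            # the answer entity; if the answer is an attribute, any sentence
            # mentioning the root entity qualifies (Section 4.1).
            if s in entities and (not o_is_entity or o in entities):
                pool[(s, r, o)].append(text)
    return pool

facts = [("m.abraham_lincoln", "vice_president", "m.andrew_johnson", True)]
print(build_nlss_pool(facts))  # pools the first sentence only
```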
4.2 Relation Matching via Support Sentences

Because many sentences can mention the same subject and object entities, we first select the best support sentences for each candidate fact, and then use them to enrich the semantic parsing, which provides a 'relation expansion' effect.

Support Sentence Selection. There are two ways to select the best support sentence: (1) the one with the highest similarity to the question, and (2) the one with the highest similarity to the fact candidate. Both similarity scores are computed by the trained LSTM-Seq model.

Semantic Parsing with Support Sentences. We use another LSTM-Seq model to match the best support sentences with the question. It is trained and used in the same way as described in the last section, except that the fact candidate is replaced by its best support sentences (one per selection method), which generates two additional ranking scores. These are combined with the other semantic parsers' scores (for example, LSTM-Seq and LSTM-Match) using RankSVM [5]. The RankSVM treats the relation matching scores as ranking features and is trained using the same standard pairwise loss described in the last section. At test time, the relation matching methods (the LSTMs and the support sentence matchers) are applied first, and their scores are combined by the trained RankSVM into the final relation matching score.
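The RankSVM combination just described can be sketched with the classic pairwise-difference construction: every (better, worse) candidate pair becomes two difference vectors of opposite label, and a linear SVM on those differences learns the score weights. The snippet below is our scikit-learn illustration, not the authors' setup (they use the RankSVM of [5], and LinearSVC's default squared hinge differs slightly from the standard hinge loss); the feature values are made up.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(feature_pairs):
    """Turn (better, worse) feature pairs into difference vectors and labels."""
    X, y = [], []
    for better, worse in feature_pairs:
        diff = np.asarray(better) - np.asarray(worse)
        X.extend([diff, -diff])
        y.extend([1, -1])
    return np.array(X), np.array(y)

# Per-candidate features: [LSTM-Seq, LSTM-Match, CDSSM, NLSS score, ...].
pairs = [([0.9, 0.7, 0.6], [0.2, 0.3, 0.4]),
         ([0.8, 0.5, 0.7], [0.6, 0.6, 0.1])]
X, y = pairwise_transform(pairs)
ranker = LinearSVC().fit(X, y)

def final_score(features):
    """Combined relation matching score: linear in the parser scores."""
    return float(np.dot(ranker.coef_[0], features))

print(final_score([0.9, 0.7, 0.6]) > final_score([0.2, 0.3, 0.4]))  # True
```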
5 Experimental Methodology

WebQuestions [3], the standard testbed of OpenQA, is used. It contains 5810 questions with crowd-sourced answers from Freebase. The official train-test split is used: 3778 (70%) training questions and 2032 (30%) testing questions. Evaluation metrics include precision, recall, and F1. We also report top-k accuracy for k in {1, 2, 5, 10}, which evaluates whether the correct answer appears in the top k ranked results.

Table 1: Overall accuracies of the semantic parsing systems. The first column lists the semantic parsers used in the corresponding system (multiple parsers are combined by RankSVM). Top k is the number and fraction of questions whose correct answer can be found in the top k ranked facts.

Method | Recall (%) | Precision (%) | F1 | Top 1 | Top 2 | Top 5 | Top 10
CDSSM | 45.2 | 31.0 | 31.3 | 748 (36.8%) | 1047 (51.5%) | 1311 (64.5%) | 1475 (72.6%)
LSTM-Seq | 46.6 | 33.4 | 33.6 | 752 (37.0%) | 1060 (52.2%) | 1320 (65.0%) | 1488 (73.2%)
LSTM-Match | 41.7 | 30.9 | 30.5 | 707 (34.8%) | 1021 (50.2%) | 1285 (63.2%) | 1449 (71.3%)
LSTM-Seq + LSTM-Match | 51.0 | 35.7 | 36.2 | 806 (39.7%) | 1086 (53.4%) | 1363 (67.1%) | 1518 (74.7%)
LSTM-Seq + LSTM-Match + CDSSM | 53.1 | 37.2 | 37.7 | 824 (40.6%) | 1094 (53.8%) | 1399 (68.8%) | 1573 (77.4%)
All Three + Support Sentences | 57.2 | 39.6 | 38.2 | 860 (42.3%) | 1118 (55.0%) | 1412 (69.5%) | 1611 (79.3%)

Table 2: Distribution of the number of support sentences per fact candidate.

Number of Support Sentences | 0 | 1-10 | 11-100 | 101-1,000 | 1,001-10,000 | 10,001-100,000
Number of Fact Candidates | 232,368 (74.9%) | 42,167 (13.6%) | 23,760 (7.66%) | 8,395 (2.71%) | 3,310 (1.07%) | 147 (0.0474%)

Baseline: The main purpose of this work is to study the effectiveness of the LSTM parsers and the enriching effect of the support sentences, so we only compare against our implementation of the CDSSM relation matching method used in the recent state-of-the-art STAGG system [8]. The integration of our semantic parsers into the full QA system is left for future work.

Implementation details: All embedding dimensions are 300. All LSTM layers' dimensions are 32. Hinge loss is used. Training was done with mini-batches of size 64 and the RMSProp optimizer.

6 Evaluation

Table 1 shows the performance of our relation matching methods. The first column lists the semantic parsers used; the '+' sign refers to the combination of multiple parsers using RankSVM.

Effectiveness of LSTM: LSTM-Seq and LSTM-Match alone achieve performance similar to the state-of-the-art relation matching model CDSSM. Combining LSTM-Seq and LSTM-Match outperforms CDSSM by 5% absolute F1. Combining the three neural networks leads to another 2-3% relative improvement in precision and recall, and greatly improves the robustness of the system: 5% more questions are correctly answered in the top 10. This shows that LSTM-Seq and LSTM-Match cover different aspects of relation matching and can be combined with the previous state of the art.

Effectiveness of Support Sentences: Adding support sentences further improves recall from 53.1% to 57.2% and precision from 37.2% to 39.6%. Intuitively, the support sentences provide additional expressions of the candidate fact's relation and mitigate the vocabulary mismatch problem, which makes them especially effective in improving recall. Here is an example question that is answered correctly after using the support sentences:

• Question: who was vp for lincoln?
• Matched fact candidate: [Abraham Lincoln] → government/us president/vice president → Andrew Johnson, Hannibal Hamlin
• Fact candidate matched previously: [Abraham Lincoln] → person.profession → Lawyer, Politician, Statesman
• Top support sentence: Also in 1864, Brandegee was a member of the Connecticut delegation to the [[1864 Republican National Convention|National Republican Convention]] in [[Baltimore]], which re-nominated President [[Abraham Lincoln|Lincoln]], and nominated [[Andrew Johnson]] for the [[Vice President of the United States|Vice Presidency]].

From this example, we can see that the QA system seems to confuse the question "who was vp for lincoln" with "who was lincoln": without support sentences, the matched relation is "person.profession". The support sentence enriches the vocabulary of the fact candidate and increases its matching score with the question term 'vp'.

We further studied the per-question wins and losses of the support sentences. Adding support sentences gives different answers for a total of 234 test questions, with 87 wins, 33 losses, and the remaining 114 ties. The high win/loss ratio (87/33) demonstrates the advantage of the support sentences, but the small number of influenced questions (234/2032) limits the overall improvement. The reason is that most fact candidates (75%) have no support sentences at all, as shown in Table 2. Our support sentence finding method is rather simple and may miss many possible support sentences; how to align more natural language sentences with Freebase triples while maintaining their quality is an important task for future work.

Error Analysis: We also performed an error analysis of our system. Among the 58% of test questions our system answered incorrectly, 38% (771 questions) are answered with a wrong relation, 12% are due to entity linking errors, and the remaining 8% fail because either the answer or the entities cannot be found in Freebase. This distribution shows that the core problem of the system is still relation matching. Many such errors are rather subtle and would require more fine-grained language understanding. For example, in the case below the system mistakenly ranked 'place of birth' before 'nationality' although the question asks for a country.

• Question: where is jason mraz from?
• Correct fact candidate: [Jason Mraz] → people.person.nationality → United States of America (ranked 3rd)
• Matched fact candidate: [Jason Mraz] → people.person.place of birth → Mechanicsville (ranked 1st)
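For reference, the top-k numbers in Table 1 (and the per-question analysis above) can be computed as in the following small sketch; it is our illustration of the metric defined in Section 5, not the authors' evaluation code.

```python
def top_k_accuracy(ranked_correct_flags, ks=(1, 2, 5, 10)):
    """For each question, `ranked_correct_flags` holds one boolean per ranked
    fact candidate: True if that candidate leads to the correct answer."""
    n = len(ranked_correct_flags)
    return {k: sum(any(flags[:k]) for flags in ranked_correct_flags) / n
            for k in ks}

# Two toy questions: one answered at rank 1, one at rank 3.
flags = [[True, False, False], [False, False, True]]
print(top_k_accuracy(flags))  # {1: 0.5, 2: 0.5, 5: 1.0, 10: 1.0}
```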
7 Conclusions and Future Work

This work focuses on the relation matching problem in knowledge-based question answering systems. The task is challenging due to the different characteristics of natural language questions and formal relation type schemas: our manual error analysis of a prior state-of-the-art system, Aqqu, suggests that most of the errors the QA system makes are caused by relation matching mistakes. We developed two approaches to address this problem. The first uses two Bi-LSTMs to directly learn the match between the question and the candidate fact triples, while the second enriches the relation descriptions with support sentences found in Wikipedia and feeds the support sentences into the LSTM to address the vocabulary mismatch. Our experiments demonstrate that both of our approaches provide similar or better relation matching accuracy compared to the CDSSM used in a prior state-of-the-art knowledge-based QA system [8]. The LSTMs and the support sentences cover different aspects of relation matching and can be combined with prior work for better accuracy. Our analysis further demonstrates the high quality of the generated support sentences: among the questions whose fact candidates have any support sentences, adding support sentences influences the majority of them, and most of the changes are improvements.

Possible future work includes experimenting with better support sentence generation techniques, aiming to improve coverage while maintaining high quality. Another direction is to incorporate the presented relation matching methods into state-of-the-art QA systems, for example, Aqqu [2] and STAGG [8]. Building better relation matching models is another important research topic, given that relation matching is still the biggest error source in our system.

Perhaps the most interesting finding of our exploration is the promising potential of jointly utilizing the formal semantics in knowledge graphs and rich natural language support sentences. Previously, such support sentences were mostly used during the extraction of knowledge base facts, for example, in OpenIE [1] or distant supervision [6]. After the knowledge bases are constructed, only the formal entity-relation triples are kept; the downstream inference and alignment operations are performed with only the formal semantics stored in the knowledge graph. This work demonstrates the effectiveness of using natural language support sentences in one of the most important applications of knowledge graphs, and suggests their possible usage in many other knowledge graph related tasks.

References

[1] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pages 2670–2676, 2007.

[2] H. Bast and E. Haussmann. More accurate question answering on Freebase. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), pages 1431–1440, New York, NY, USA, 2015. ACM.

[3] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pages 1533–1544, 2013.

[4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pages 1247–1250, New York, NY, USA, 2008. ACM.

[5] T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pages 217–226, New York, NY, USA, 2006. ACM.

[6] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of ACL 2009, pages 1003–1011, 2009.

[7] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. A latent semantic model with convolutional-pooling structure for information retrieval. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), pages 101–110, New York, NY, USA, 2014. ACM.

[8] W. Yih, M. Chang, X. He, and J. Gao. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL 2015), pages 1321–1331, 2015.