Enriching Answers in Question Answering
              Systems using Linked Data

                  Rivindu Perera, Parma Nand, and Gisela Klette

                   School of Computer and Mathematical Sciences,
                   Auckland University of Technology, New Zealand
                       {rperera,pnand,gklette}@aut.ac.nz


        Abstract. Linked Data has emerged as the most widely used and the
        most powerful knowledge source for Question Answering (QA). Although
        Question Answering using Linked Data (QALD) fills in many gaps in the
        traditional QA models, the answers are still presented as factoids. This
        research introduces an answer presentation model for QALD by employ-
        ing Natural Language Generation (NLG) to generate natural language
        descriptions to present an informative answer. The proposed approach
        employs lexicalization, aggregation, and referring expression generation
        to build a human-like enriched answer utilizing the triples extracted from
        the entities mentioned in the question as well as the entities contained
        in the answer.


1     Introduction

Question Answering over Linked Data (QALD) offers new opportunities to tra-
ditional Question Answering (QA) systems by utilizing the massive Linked Data
cloud as an information source. At its core, QALD transforms the natural lan-
guage question to a SPARQL query and then execute it on a Linked Data re-
source to retrieve answers. These answers are then presented to the user as
factoid answers without any further enhancements [1,2].
    The RealText framework1 described in this paper enhances the bare factoid
answers by enriching them with more information and presenting them as natural
text. An enriched answer is defined as an answer which provides a description
of each of the entities contained in the question as well as in the answer to the
question. Therefore, the enriched answer supports and validates the retrieved
answer by providing background information more akin to a human generated
answer. The RealText framework generates the description by using the triples
related to the entity and application of a series of Natural Language Generation
(NLG) techniques. In high level overview, these techniques can be categorized
into lexicalization, aggregation, and Referring Expression Generation (REG),
however each of these categories contain its own set of multiple subtasks to fine
tune the final output.
1
    A video demonstration is available at https://vimeo.com/173608898
2       Rivindu Perera, Parma Nand, Gisela Klette

    The rest of the paper presents an overview description of the framework
features. Further details on some of the modules can be found in [3]. All features
presented herein will be part of the demonstration.

2     Demonstration
The objective of the demonstration will be to present the complete RealText
workflow from associating lexicalizing patterns to presenting an informative an-
swer as natural text. The demonstration will use the RealText standalone appli-
cation (for a screenshot see Fig. 1).


              Fig. 1. A screenshot of the RealText desktop application


2.1    Datasets
For the demonstration we use the factoid questions extracted from the QALD-2
test dataset2 . Since we work on the answer presentation (last step in QA) the
input data comprised of question, SPARQL query, and as well as the extracted
answer.

2.2    Workflow
The workflow comprises of three main modules; the lexicalization module which
transforms the triples to natural language sentences, Referring Expression Gen-
eration (REG) module which assigns appropriate referring expressions to the
mentions of the main entity, and aggregation module which aggregates individ-
ual sentences to form paragraphs. The final output contains the paragraphs as
well as the answer in sentence form generated using our answer sentence gener-
ation framework [4].
2
    http://qald.sebastianwalter.org/index.php?x=publications&q=2
                         Enriching Answers in Question Answering Systems          3

Lexicalization The objective of lexicalization module is to generate lexicaliza-
tion patterns and associate them with triples. The framework is composed of
four lexicalization pattern mining modules.
    Occupational Metonym Patterns utilize the -er nominal based occupational
metonyms to derive a predefined set of lexicalization patterns. For instance, a
triple with occupational metonym, director, as the predicate and a movie as a
subject. This triple can be lexicalized using a pattern such as hS?, is directed by,
O? iL . We have developed a database which contains 33 of such patterns. These
patterns are used to lexicalize a triple by matching the predicate and the core
ontology class of the subject.
    Context Free Grammar (CFG) Patterns uses the language generation capa-
bility of CFG and lexicalize the triples with past tense verb as a predicate. To
be able to use CFG pattern, the verb (in predicate) should be identified as a
verb having the frame, NP↔VP↔NP.
    Relational Patterns use the unstructured text to derive patterns. We first
pre-process text to resolve co-references and then extract relations (harg1 , rel,
arg2 iR ) using OpenIE [5]. Each relation is then aligned with triples (hsubject,
predicate, objectiT ) to extract patterns. The alignment is calculated individu-
ally for subject and object alignment using Phrasal Overlap Measure (POM)
and multiplied to get the final alignment score. Furthermore, we execute some
realization steps using dependency parsing to resolve gender and grammar mis-
matches.
    Property Patterns are predefined set of patterns which can lexicalize a given
triple with specific predicate. For example, a pattern such as hS?’s predicate, is,
O? iL will be used to lexicalize triples with predicates, population total, area
total, and postal code. There are five such patterns defined with their associated
predicates from DBpedia [6].
    We also carry out a realization phase after applying lexicalization patterns.
The realization step corrects the syntactical errors of patterns such as a pattern
does not match with the grammatical gender of the triple subject.
    Table 1 shows some results from lexicalization modules where each triple is
associated with a lexicalization pattern.

Aggregation The aggregation module first cluster the triples based on the
subject. Then within each cluster we sub-cluster the triples based on rules. The
triples within sub-clusters are then transformed to the natural language sen-
tences using associated lexicalization patterns. However, at this level we do not
substitute the subject expression (S?) of the sentence as it may need a referring
expression in the generated paragraphs. Such referring expressions are resolved
in the next phase.

Referring Expression Generation The referring expression generation mod-
ule substitutes the subject expression with appropriate pronouns and entity
names to emulate humans. In order to emulate this we change the referring
expression after two consecutive usages.
4        Rivindu Perera, Parma Nand, Gisela Klette

Table 1. Sample set of triples, lexicalization patterns, and the pattern source. S? and
O? denote subject and object respectively.

Triple                                Pattern                          Source    Score
hRubens Barrichello, birth place,     hS?, was born in, O? iL          Relational 0.8192
Sao PauloiT
hRubens Barrichello, birth date,      hS?, was born on, O? iL          Relational 0.9028
1972-05-22iT
hMount Everest, first ascent per-     hS?, was climbed by, O? iL       Relational 0.4182
son, Edmund HillaryiT
hCaptain America, creator, Joe        hS?, was created by, O? iL       Metonym -
SimoniT
hLyndon B. Johnson, successor,        hO?, succeeded, S? iL            Metonym -
Hubert HumphreyiT
hLondon,      population     total,   hS?’s population total, is, O? iL Property -
8308369iT
hCanada, largest city, TorontoiT      hlargest city in S?, is, O? iL   Property -
hSocrates,             influenced,    hS?, influenced, O? iL           CFG      -
AntisthenesiT
hIntel,   founded    by,    Robert    hS?, is founded by, O? iL        CFG       -
NoyceiT


3    Conclusion
This paper described the process of generating natural language descriptions
for QALD. The approach is mainly inspired by the NLG where triple content
is transformed to natural language paragraphs. In future we expect to extend
the framework mainly focusing on the lexicalization pattern mining module.
Furthermore, we will be looking at integration of this new approach to Intelli-
gent Personal Assistant (IPA) to provide natural descriptions when presenting
answers.


References
1. Perera, R., Nand, P.: Real text-cs - corpus based domain independent content
   selection model. In: ICTAI-2014. (2014) 599–606
2. Perera, R., Nand, P.: The role of linked data in content selection. In: PRICAI-2014.
   (2014) 573–586
3. Perera, R., Nand, P.: A Multi-strategy Approach for Lexicalizing Linked Open
   Data. CICLing (2015) 348–363
4. Perera, R., Nand, P.: Realtext-asg: A model to present answers utilizing the lin-
   guistic structure of source question. In: PACLIC-29, ACL Anthology (2015)
5. Mausam, Schmitz, M., Bart, R., Soderland, S., Etzioni, O.: Open language learning
   for information extraction. In: EMNLP, Jeju Island, ACL (jul 2012) 523–534
6. Bizer, C., Lehmann, J., Kobilarov, G.: DBpedia-A crystallization point for the Web
   of Data. Web Semantics 7(3) (2009)