<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using BabelNet in Bridging the Gap Between Natural Language Queries and Linked Data Concepts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khadija Elbedweihy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stuart N. Wrigley</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Ciravegna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ziqi Zhang</string-name>
          <email>z.zhangg@dcs.shef.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Sheffield</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many semantic search tool evaluations have reported a user preference for free natural language as a query input approach as opposed to controlled or view-based inputs. Although the flexibility offered by this approach is a significant advantage, it can also be a major difficulty. Allowing users complete freedom in the choice of terms increases the difficulty for the search tools to match terms with the underlying data. This causes either a mismatch which affects precision, or a missing match which affects recall. In this paper, we present an empirical investigation on the use of named entity recognition, word sense disambiguation, and ontology-based heuristics in an approach attempting to bridge this gap between user terms and ontology concepts, properties and entities. We use the dataset provided by the Question Answering over Linked Data (QALD-2) workshop in our analysis and tests.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic search</kwd>
        <kwd>natural language query approach</kwd>
        <kwd>word sense disambiguation</kwd>
        <kwd>evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In its broadest sense, Semantic Search describes the process of matching
document content with user intent. Semantic search tools adopt different query input
approaches, ranging from natural language (`free' NL [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and `guided' NL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ])
to view-based (forms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and graphs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) interfaces. Each of these strategies offers
the user different levels of flexibility, query language expressiveness and support
during query formulation. Evaluations have shown that users appreciate the
simplicity, flexibility, and fast query input of the free NL approach [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However,
flexibility increases the difficulty of the underlying system's task of mapping the
user's linguistic terms onto the correct ontological concepts and properties and Linked
Data entities. Core to this difficulty are polysemy (a single word with more than
one meaning) and synonymy (multiple words with the same meaning). While
the former affects precision by producing false matches and the latter
affects recall by causing true (semantic) matches to be missed [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], both of them
are usually addressed by using a word sense disambiguation (WSD) approach.
For example, in order to answer the question "How tall is ...?", the query term
tall needs to be mapped to the ontology property height (a term related to tall).
However, the term tall is polysemous and has different senses, including
(from WordNet):
- "great in vertical dimension; high in stature; tall people; tall buildings, etc."
- "too improbable to admit of belief; a tall story"
- "impressively difficult; a tall order"
Therefore, the term must be disambiguated and the right sense identified (the
first in this example), before attempting to gather related terms.
      </p>
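      <p>The overlap-based disambiguation of tall described above can be sketched as follows. This is a minimal illustration, not our full approach: the glosses are abridged from WordNet and the tokenisation is deliberately simple.

```python
# Minimal Lesk-style disambiguation: pick the sense whose gloss shares
# the most words with the query context. Glosses are abridged from
# WordNet for illustration only.

def tokenize(text):
    return {w.strip('.,;?"').lower() for w in text.split()}

def lesk(context, senses):
    """Return the sense id whose gloss overlaps the context most."""
    ctx = tokenize(context)
    best, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(ctx.intersection(tokenize(gloss)))
        if overlap > best_overlap:
            best, best_overlap = sense_id, overlap
    return best

SENSES = {
    "tall.a.01": "great in vertical dimension; high in stature; tall people; tall buildings",
    "tall.a.02": "too improbable to admit of belief; a tall story",
    "tall.a.03": "impressively difficult; a tall order",
}

query = "How tall is the Eiffel Tower building in height"
print(lesk(query, SENSES))  # the 'vertical dimension' sense wins
```

In practice the context and sense bags are much richer than single glosses, as described in Section 2.</p>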
      <p>
        An additional difficulty is Named Entity (NE) recognition and
disambiguation. NE recognition approaches include parsing and then matching to resources
in the ontology; directly recognising and disambiguating using a state of the art
(SOA) NE recogniser [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; or treating them as `normal' query terms which are
mapped to ontology concepts and resources (the approach taken by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]).
The computational cost of matching NEs to resources in a number of
ontologies can increase in proportion to the ontology size. Furthermore, NEs can refer
to multiple real world entities, thus necessitating NE disambiguation. This is
usually performed using the context of the query and the structure of the
ontology in which the term occurs. For example, in the question Which river does
the Brooklyn Bridge cross?, the terms Brooklyn Bridge and Brooklyn would be
mapped to the DBpedia resources res:Brooklyn_Bridge describing the bridge,
and res:Brooklyn describing the city in New York, respectively. The first can
be correctly selected based on an ontology structure which shows that the
property crosses connects a river to a bridge and not to a city.
      </p>
      <p>In this paper, we present a free-NL semantic search approach that bridges
the gap between the sense of the user's query terms and the underlying ontology's
concepts and properties. We use an extended-Lesk WSD approach (Section 2)
and a NE recogniser (Section 3.1), together with a set of advanced string
similarity algorithms and ontology-based heuristics, to match query terms to ontology
concepts and properties (Section 3.3). Section 4 presents the evaluation of our
approach using the dataset provided by the Question Answering over Linked
Data (QALD-2) workshop.</p>
    </sec>
    <sec id="sec-2">
      <title>Word Sense Disambiguation (WSD)</title>
      <p>
        Disambiguating polysemous words in user queries is of paramount importance
for understanding user intent and providing correct results. Indeed, identifying the
intended sense of a polysemous word is a necessity for query expansion, which is
often used by search tools to bridge the gap between user terms and ontological
concepts and properties. Some search approaches consider all the senses of a
polysemous word and use their related terms for query expansion [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However,
we believe this increases noise and irrelevant matches and, therefore, affects
precision. On the other hand, in the disambiguation approach described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a
specific WordNet synset is considered relevant only if one of its senses
(separate words in a WordNet synset) exists in the synonyms, hypernyms, hyponyms,
holonyms or meronyms of an ancestor or a descendant of the synset.
      </p>
      <sec id="sec-2-1">
        <title>1 The prefix res refers to: &lt;http://dbpedia.org/resource/&gt;</title>
        <p>2 http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&amp;q=2</p>
        <p>
          Knowledge-based WSD approaches use dictionaries and lexicons such as
WordNet to perform WSD and are often considered a middle ground between
supervised and unsupervised approaches. Although graph-based approaches have
been gaining more attention recently for their high performance [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ], in this
work we use an extended-Lesk approach for the following reasons: 1) it was
one of the highest performing knowledge-based approaches (see UPV-WSD
in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]), and [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] showed only a 2% difference when compared with the Degree
Centrality algorithm; and 2) it is a simpler approach with which to test our hypothesis of improving
the mapping between NL queries and LD ontologies using a WSD approach with
a high-coverage knowledge base (BabelNet).
        </p>
        <p>
          WordNet is the predominant resource used in such knowledge-based WSD
approaches; however, it has been argued that its fine granularity is the main
obstacle to achieving high performance in WSD [
          <xref ref-type="bibr" rid="ref12 ref16">12, 16</xref>
          ]. In light of this, we
instead used BabelNet [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] for disambiguation. BabelNet is a very large
multilingual ontology with wide coverage, obtained from the automatic
integration of WordNet and Wikipedia. We believe that Wikipedia's richness of explicit
and implicit semantic knowledge, together with the lexical coverage provided by
WordNet, as noted by [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], could be very useful for our approach and in general
for semantic search, to both expand and disambiguate users' queries.
        </p>
        <p>
          Following [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], we apply an extended-Lesk approach in which various
semantic relations are added to the context and synsets' bags to perform the
disambiguation. The selected relations provided the highest performance based on our
experiments using the SemEval-2007 coarse-grained all-words dataset3. These
relations, in order of their contribution to the WSD, are: 1) direct hyponyms;
2) second-level hypernyms; 3) synsets' glosses (only the examples part [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]);
4) the WordNet semantic relations attribute, similar to and see also; and finally,
5) glosses of synsets related to the main synset through one of the hyponym,
hypernym, or semantic relations.
        </p>
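        <p>The construction of the extended bags described above can be sketched as follows. The toy lexicon, its glosses and the relation depth are purely illustrative; the actual approach draws on BabelNet and the full relation list given above.

```python
# Sketch of extended-Lesk bag construction: a sense's bag is its own
# gloss plus words drawn from related synsets (direct hyponyms and
# hypernyms up to two levels, following the relation order in the text).
# The toy lexicon below is illustrative, not real WordNet/BabelNet data.

TOY_LEXICON = {
    "river.n.01": {
        "gloss": "a large natural stream of water",
        "hyponyms": ["nile.n.01"],
        "hypernyms": ["stream.n.01"],
    },
    "stream.n.01": {
        "gloss": "a natural body of running water",
        "hyponyms": [],
        "hypernyms": ["body_of_water.n.01"],
    },
    "body_of_water.n.01": {
        "gloss": "the part of the earth surface covered with water",
        "hyponyms": [],
        "hypernyms": [],
    },
    "nile.n.01": {
        "gloss": "the world's longest river flowing through africa",
        "hyponyms": [],
        "hypernyms": [],
    },
}

def words(text):
    return set(text.lower().split())

def extended_bag(sense_id, lexicon):
    """Gloss words of the sense, its direct hyponyms, and its
    hypernyms up to two levels."""
    entry = lexicon[sense_id]
    bag = words(entry["gloss"])
    for h in entry["hyponyms"]:            # 1) direct hyponyms
        bag |= words(lexicon[h]["gloss"])
    for h1 in entry["hypernyms"]:          # 2) first- and second-level
        bag |= words(lexicon[h1]["gloss"])  #    hypernyms
        for h2 in lexicon[h1]["hypernyms"]:
            bag |= words(lexicon[h2]["gloss"])
    return bag

bag = extended_bag("river.n.01", TOY_LEXICON)
print("africa" in bag, "surface" in bag)
```

The enlarged bags make gloss overlaps far more likely than with single glosses alone.</p>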
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Sense-aware Search</title>
      <p>
        Users exhibit a general preference for short NL queries consisting of keywords
or phrases, as opposed to full sentences, and for a random query structure with no
specific order for the query terms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In this section we describe a novel semantic
search approach in which free-form natural language queries are processed to
establish the user intent and underlying `meaning' of the query (using word sense
disambiguation; see Section 2), allowing the query terms to be more accurately
associated with the underlying dataset's concepts and properties. Our approach
consists of five stages:
1. Recognition and disambiguation of Named Entities.
2. Parsing the NL query.
3. Matching query terms with ontology concepts and properties.
4. Generation of candidate triples.
5. Integration of triples and generation of SPARQL queries.
      </p>
      <sec id="sec-3-1">
        <title>3 http://lcl.uniroma1.it/coarse-grained-aw/index.html</title>
        <sec id="sec-3-1-1">
          <title>Recognition and Disambiguation of Named Entities</title>
          <p>
            Named entities are recognised using AlchemyAPI, which had the best NE
recognition performance in a recent evaluation of SOA recognisers [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. However,
AlchemyAPI exhibits poor disambiguation performance [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]; in this paper, each NE is
disambiguated using BabelNet as described above.
          </p>
          <p>For example, for the question "In which country does the Nile start?", the
term Nile has different matches in BabelNet. These matches include:
- http://dbpedia.org/resource/Nile (singer)
- http://dbpedia.org/resource/Nile (TV series)
- http://dbpedia.org/resource/Nile (band)
- http://dbpedia.org/resource/Nile</p>
          <p>Although one could select the last URI as an exact match to the query term,
syntactic matching alone cannot guarantee the intended meaning of the term,
which is better identified using the query context. Using our WSD approach, we
would select the correct match for Nile as a river, since more overlapping terms
are found between this sense and the query (such as geography, area, culture and
continent) than the other senses.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2 Parsing and Disambiguation of the Natural Language Query</title>
          <p>
            The second step is to parse the NL query, which is done using the Stanford
parser [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. However, since we do not expect users to adhere to correct grammar
or structure in their queries, we do not use the generated parse trees; we only
use lemmatisation and part of speech (POS) tagging. Each query term is stored
with its lemma and POS tag, except for previously recognised NEs, which are not
lemmatised. Additionally, we identify the position of each term with respect to
the rest of the query and use this in a later step. For example, the question
Which software has been developed by organizations founded in California? from
the QALD-2 dataset generates the following outcome:
- software: at position 1 and POS NP
- developed: at position 2 and POS VBN
- organizations: at position 3 and POS NNS
- founded: at position 4 and POS VBN
- California: at position 5 and POS NP
          </p>
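          <p>The per-term record kept at this stage can be sketched as follows. The tags are supplied by hand here for illustration; in the pipeline they come from the Stanford parser's tagger and lemmatiser.

```python
# Sketch of the per-term record: surface form, lemma, POS tag and
# position. Previously recognised named entities keep their surface
# form rather than being lemmatised.

from dataclasses import dataclass

@dataclass
class QueryTerm:
    surface: str
    lemma: str
    pos: str
    position: int
    is_ne: bool = False  # named entities are not lemmatised

def build_terms(tagged, named_entities=()):
    terms = []
    for i, (surface, lemma, pos) in enumerate(tagged, start=1):
        ne = surface in named_entities
        terms.append(QueryTerm(surface, surface if ne else lemma, pos, i, ne))
    return terms

# Hand-tagged version of the example question from the text.
tagged = [
    ("software", "software", "NP"),
    ("developed", "develop", "VBN"),
    ("organizations", "organization", "NNS"),
    ("founded", "found", "VBN"),
    ("California", "california", "NP"),
]
terms = build_terms(tagged, named_entities={"California"})
print(terms[4].lemma)  # 'California': the NE keeps its surface form
```

The position field is what later allows the Three-Terms and Two-Terms Rules to examine consecutive terms.</p>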
          <p>Equivalent output is also generated when using keywords or phrases. At the
end of this step, any proper nouns identified by the parser which were not
recognised by AlchemyAPI as NEs are disambiguated as described in Section 2
and added to the set of recognised entities. This ensures that, for the example
used above ("In which country does the Nile start?"), the algorithm does not miss
the entity Nile if it was not recognised by AlchemyAPI.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4 http://www.alchemyapi.com</title>
        <sec id="sec-3-2-1">
          <title>3.3 Matching Query Terms with Ontology Concepts and Properties</title>
          <p>
            The terms generated from the second step (Section 3.2) are then matched to
concepts and properties in the ontologies being used. Noun phrases, nouns and
adjectives are matched with both concepts and properties, while verbs are only
matched with properties. After gathering all candidate ontology matches that are
syntactically similar to a query term, these are then ordered using two string
similarity algorithms: Jaro-Winkler [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] and Double Metaphone [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. Jaro-Winkler
depends on comparing the number and order of common characters. Similar to
Monge-Elkan [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ], which is used by [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], it gives a high score to terms which are
parts of each other. This is useful since ontology concepts and properties are
usually named in this way: for instance, the term population and the property
totalPopulation are given a high similarity score by this algorithm. An
additional advantage of this algorithm is efficiency; [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] found it to be an order
of magnitude faster than Monge-Elkan. We set the threshold for accepting a
match to 0.791, which was shown by [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ] to be the best threshold value. Double
Metaphone compares words on a phonetic basis and is, therefore, useful for
capturing similar-sounding terms.
          </p>
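          <p>For illustration, a straightforward implementation of the Jaro-Winkler measure (with the standard prefix scaling factor of 0.1) is sketched below; note how create and creator, as in the example of Section 3.3 below, score above the 0.791 acceptance threshold.

```python
# Jaro similarity: count matching characters within a sliding window,
# then penalise transpositions between the two matched sequences.

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: mismatches between the ordered matched sequences
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Boost the Jaro score by the length of the common prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("create", "creator"), 3))  # above the 0.791 threshold
```

The prefix boost is what favours ontology names that extend a query term, such as creator for create.</p>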
          <p>If a query term produces no matches, its lemma is used for matching. If no
matches are found, derivationally related forms of the query term are then
used. For example, the property creator in the question Which television shows
were created by Walt Disney? is only found after getting these forms for the term
create. After this, if no matches are found, the query term is then disambiguated
using our WSD approach and terms related to the identified synset are gathered.
These terms are used to find matches in the ontology, based on both their level
in the taxonomy (the nearer, the better) and their order of contribution to
the WSD as shown by the results of our experiments. Thus, synonyms are used
first, then the appropriate semantic relations, followed by hyponyms, and
finally hypernyms. For nouns, no semantic relations are used; for verbs,
see also is used; and finally, for adjectives, attribute and similar to are used in
that order. Indeed, the attribute relation is very useful for adjectives since, for
example, the property height is identified as an attribute for the adjective tall,
which allows answering the question "How tall is ...?". The query term is marked
as not found if no matches are found after all expansion terms have been used.
Note that we do not match superlatives or comparatives to ontology concepts
or properties; they are used in the next step to generate the appropriate triples.</p>
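          <p>The fallback chain just described can be sketched as follows. The ontology index and lookup function here are stand-ins for the real matching machinery, which also applies the similarity measures described above.

```python
# Sketch of the fallback chain for matching a query term: surface form,
# then lemma, then derivationally related forms, then WSD-driven
# expansion terms (already ordered: synonyms, relations, hyponyms,
# hypernyms). The lookup function is a stand-in.

def match_term(term, lemma, derivations, expansions, ontology_lookup):
    """Try progressively looser candidates; return (stage, matches),
    or ('not_found', []) once every expansion is exhausted."""
    stages = [
        ("surface", [term]),
        ("lemma", [lemma]),
        ("derivations", derivations),
        ("expansion", expansions),
    ]
    for stage, candidates in stages:
        for cand in candidates:
            matches = ontology_lookup(cand)
            if matches:
                return stage, matches
    return "not_found", []

# Toy ontology index: 'created' only resolves via a derivational form.
INDEX = {"creator": ["dbo:creator", "dbp:creator"]}
lookup = lambda w: INDEX.get(w, [])

print(match_term("created", "create", ["creator", "creation"], [], lookup))
# ('derivations', ['dbo:creator', 'dbp:creator'])
```

Stopping at the first stage that yields matches keeps the expansion from introducing unnecessary noise.</p>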
        </sec>
        <sec id="sec-3-2-2">
          <title>3.4 Generation of Candidate Triples</title>
          <p>After all terms have gone through the matching process, the query can be
interpreted in terms of a set of ontology concepts, properties and instances that need
to be linked together. The structure of the ontology (taxonomy of classes and
domain and range information of properties) in addition to BabelNet are used
in this step, as will be explained next.</p>
          <p>Three-Terms Rule Firstly, each triple of consecutive terms is matched
(using the information about their relative positions, as explained in Section 3.2)
against a set of templates. The intuition is to find a subject with a specified
relation to a given object. Then, the ontology matches associated with each
term are used to generate one or more candidate triples. For instance, the
question Which television shows were created by Walt Disney?, which can also be
given as keywords television show, create, Walt Disney, matches the template
concept-property-instance and generates the following triples:
?television_show &lt;dbo:creator&gt; &lt;res:Walt_Disney&gt;.
?television_show &lt;dbp:creator&gt; &lt;res:Walt_Disney&gt;.
?television_show &lt;dbo:creativeDirector&gt; &lt;res:Walt_Disney&gt;.</p>
          <p>Triples generated from the same query term are ordered according to the
similarity of the matches found in them with respect to this term. In this
example, the two properties dbo:creator and dbp:creator are ordered
before dbo:creativeDirector since they have a higher similarity score with the
query term create. Similar questions that would be matched to this template
include airports located in California and actors born in Germany. The other
templates capture the different orderings that can be found in the query, such
as instance-property-concept in the question Was Natalie Portman born in the
United States? or property-concept-instance in the question birthdays of actors
of television show Charmed. Note that in the last example, since we identify the
type of the instance Charmed as `television show', we exclude the latter during
triples generation, making it: birthdays of actors of Charmed.</p>
          <p>Two-Terms Rule Some user queries contain fewer than three pieces of
information, thus preventing the application of the Three-Terms Rule. This can
happen in three situations: 1) there is no match between the derived terms and any
three-term template; 2) a matched template did not generate candidate triples; or 3)
there are fewer than three derived terms.</p>
          <p>For example, the second situation occurs in the second part of the question
In which films directed by Garry Marshall was Julia Roberts starring?, in which
the terms Garry Marshall, Julia Roberts and starring would be matched to an
existing template but without generating candidate triples: the requirement
that the domain of the property (in this case, Film) must be the type of one
of the instances was not met. In these scenarios, we follow the same process of
template matching and triples generation for each pair of consecutive terms. For
instance, the question area code of Berlin generates the triples:
&lt;res:Berlin&gt; &lt;dbp:areaCode&gt; ?area_code.
&lt;res:Berlin&gt; &lt;dbo:areaCode&gt; ?area_code.</p>
          <p>
            Comparatives As explained earlier, superlatives and comparatives are not
matched to ontology terms but are used here to generate the appropriate triples.
For comparatives, there are four different scenarios that we found from our
analysis of the queries in datasets used by different semantic search evaluations (e.g.,
the Mooney dataset [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] and datasets used in QALD challenges). The first is when
          </p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>5 The prefix dbo refers to: &lt;http://dbpedia.org/ontology/&gt;; the prefix dbp refers</title>
        <p>to: &lt;http://dbpedia.org/property/&gt;
6 http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/
a comparative is used with a numeric datatype property such as the property
numberOfEmployees in the question phrase more than 500000 employees. This
information is known from the range of the property. In this case the following
triples are generated:
?company &lt;dbp:numEmployees&gt; ?employee.
?company &lt;dbp:numberOfEmployees&gt; ?employee.</p>
        <p>These triples are ordered according to their similarity with the original query
term (employee) and a choice is made between using the best match or all
matches depending on the priority of the algorithm (i.e., whether to favour
precision or recall). The chosen triples are then added to the following ones:
?company a &lt;dbo:Company&gt;.</p>
        <p>FILTER ( ?employee &gt; 500000)</p>
        <p>The second scenario is when a comparative is used with a concept as in the
example places with more than 2 caves. Here, we want to generate the same
triples that would be generated for places with caves and add the aggregate
restriction: GROUP BY ?place HAVING (COUNT(?cave) &gt; 2).</p>
        <p>In the third scenario, the comparative is used with an object property which,
similarly, requires an aggregate restriction. In the example countries with more
than 2 official languages, the following restriction is added to the normal triples
generated between country and official language.</p>
        <p>GROUP BY ?country HAVING (COUNT(?official_language) &gt; 2)</p>
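        <p>The restrictions added in the first three comparative scenarios can be sketched as follows; the SPARQL fragments are assembled directly as strings, and the function names are illustrative.

```python
# Sketch of the comparative restrictions: a FILTER for numeric datatype
# properties, and a GROUP BY ... HAVING aggregate for concepts and
# object properties.

def comparative_restriction(kind, var, count_var=None, op=">", value=0):
    if kind == "datatype":
        # e.g. more than 500000 employees
        return f"FILTER ( ?{var} {op} {value} )"
    if kind in ("concept", "object_property"):
        # e.g. places with more than 2 caves, or countries with more
        # than 2 official languages
        return (f"GROUP BY ?{var} "
                f"HAVING (COUNT(?{count_var}) {op} {value})")
    raise ValueError(kind)

print(comparative_restriction("datatype", "employee", value=500000))
print(comparative_restriction("concept", "place", "cave", value=2))
```

The fourth scenario additionally requires selecting the compared property itself, which is where the WSD step described next comes in.</p>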
        <p>
          The fourth and most challenging scenario can be illustrated by the question
Which mountains are higher than the Nanga Parbat?. The difficulty here is to
identify the property referred to by the comparative term (which is `elevation'
in this example), to get its value for the given instance, and then perform a
comparison with this value. While [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] tackles this challenge by generating suggestions
using the datatype properties associated with the concept and asking the user
for assistance, this can be an overhead on the user. Our algorithm tries to select
the best relation according to the query context. Firstly, all numeric datatype
properties associated with the query concept (in this case mountain) are
identified. These are: latS, longD, prominence, firstAscent, elevationM, latD,
elevation, longM, latM, prominenceM, and longS. Using our WSD approach,
each of these properties is first disambiguated to identify the synset which is
most relevant to the query context. Then, the selected synsets of all the
properties are put together and treated as different meanings of a polysemous word, in
order to have the WSD approach identify the synset most related to the query.
Using this technique, the algorithm correctly selects the property elevation
to use and then proceeds to find mountains with elevation higher than that of
the instance Nanga Parbat. In order to verify whether our WSD algorithm was
affected by the abbreviations (such as latM), we asked the same question
after replacing the abbreviations with their equivalent words (latitude for latM). We
found that it still selected elevation as the most relevant property, since it had
more overlapping terms with the query than the others.
        </p>
        <p>Indeed, it is more challenging to identify this link between the term and
the appropriate property for more generic comparatives, like larger in the query
cities larger than Cairo. Several interpretations arise, including the area of the city,
its population or its density. The ability to resolve this scenario is future work.</p>
        <p>Superlatives For superlatives, we identified two different scenarios. Either the
superlative is used with a numeric datatype property, such as in the example city with largest
population, or with a concept, as in what is the highest mountain. In the first
scenario, the normal triples between the concept city and property population
are generated, in addition to an ORDER BY clause together with a LIMIT to return
the first result.</p>
        <p>The second scenario is more challenging, similar to the last comparative
scenario explained above, and is indeed tackled using the same technique. All
numeric datatype properties of the concept are identified and the most relevant one
(identified by our WSD approach) is used in the query. Again, in this example,
the term highest is successfully mapped to the property elevation.</p>
        <sec id="sec-3-3-1">
          <title>3.5 Integration of Triples and Generation of SPARQL Queries</title>
          <p>The final stage involves generating the SPARQL query by integrating the triples
generated from the previous stages. Information about the query term position is
used to order the sets of triples originating from different query terms.
Furthermore, for triples originating from the same query term, care is taken to ensure
they are executed in the appropriate order until an answer is found (when higher
precision is preferred and thus not all matches are used).</p>
          <p>For example, in the question Which software has been developed by
organizations founded in California?, the terms in the first part (software, developed,
organizations) generate the following triples:
?software &lt;dbp:developer&gt; ?organisation.
?software a &lt;dbo:Software&gt;.
?organisation a &lt;dbo:Organisation&gt;.</p>
          <p>And the terms in the second part (organizations, founded, California)
generate the following triples:
?organisation &lt;dbp:foundation&gt; &lt;res:California&gt;.
?organisation a &lt;dbo:Organisation&gt;.</p>
          <p>To produce the final query, duplicates are removed while merging the triples,
and the SELECT and WHERE clauses are added, in addition to any aggregate
restrictions or solution modifiers required.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        This section presents a comparative evaluation of our approach using the
DBpedia test data provided by the 2nd Open Challenge on Question Answering over
Linked Data (QALD-2). Results were produced by the QALD-2 evaluation tool
(http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=evaltool&amp;q=2).
Table 1 shows the performance in terms of precision, recall and f-measure, in
addition to coverage (number of answered questions, out of 100) and the number
of correct answers (defined as P=R=F1=1.0). Our approach (SenseAware)
is compared with the QALD-2 participants [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]: SemSeK, Alexandria, MHE and
QAKiS, in addition to BELA [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which was evaluated after QALD-2 but using
the same dataset and questions. The results show SenseAware is very promising,
especially in terms of correctness: 76% of answers were `correct'. It also achieves
higher performance than the other approaches except for BELA. The latter
has higher values for P, R (and F1) since it favours these over coverage and
correctness (only 31 questions answered, of which 55% were `correct').
      </p>
      <p>After excluding out-of-scope questions (as defined by the organizers) and
any containing references to concepts and properties in ontologies other than
DBpedia, since these are not yet indexed by our approach, we had 75 questions
left. The 21 questions, out of 75, that our approach could not provide an answer
for fall into the following categories:
1. No matches were found for one or more query terms after query expansion.
2. Matches were found for all query terms but the question type is out-of-scope.
3. The question requires a higher level of reasoning than is currently provided.</p>
      <p>Examples of the first category are What did Bruce Carver die from? and
Who owns Aldi?, in which the terms die and owns should be mapped to the
properties deathcause and keyPerson, respectively. Questions that are not yet
addressed are mainly the ones which require identifying the right property to use
depending on the answer type. An example is When was the Battle of Gettysburg?,
which requires using the property date. Another example is In which films did
Julia Roberts as well as Richard Gere play?. Here, our approach could not relate
the concept films with Richard Gere. Although it is designed to maximise
the chance of linking concepts, properties and instances in the query, without
being affected by the sentence structure, this version cannot yet link a term
(films) that has already been successfully related to other terms (Julia Roberts) to an
additional term (Richard Gere). However, it can still solve complex questions
that require relating terms that are not consecutively positioned (e.g., films and
Julia Roberts) in the question In which films directed by Garry Marshall was
Julia Roberts starring?. Finally, an example of a question in the third category is
Is Frank Herbert still alive?, which requires understanding that the expression
still alive means not finding a value for the death date of the person.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        In designing an approach to answer user questions, it is usually di cult to decide
whether to favour precision or recall, since it is well known that an increase in
one commonly causes a decrease in the other. In fact, which to favour depends
not only on the users but on their speci c information need at some point. We
experienced this while designing our approach since we had to decide on the
following choices to be in favour of precision or recall:
Query Relaxation Consider the question Give me all actors starring in Last
Action Hero. This question explicitly de nes the type of entities requested as
actors which justi es querying the dataset for only this type. Hence, we would
add the triple: ?actor a &lt;dbo:Actor&gt; to restrict the results generated from:
res:Last Action Hero dbp:starring ?actor to only these who are actors.
However, the current quality of Linked Data would be a major problem with
this choice, since not all entities are typed [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], let alone typed correctly. This
causes the previous query to fail; it only succeeds in returning the required
answer after query relaxation, i.e., removing the restricting triple ?actor a
&lt;dbo:Actor&gt;. This choice is in favour of recall. It affects precision since, for the
question How many films did Leonardo DiCaprio star in?, following this
technique would also return answers that are TV series rather than films, such as
res:Parenthood (1990 TV series). Our decision was to favour precision and
keep the restriction whenever it is explicitly specified in the user's query.
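The recall-favouring relaxation step considered above can be sketched as follows. This is a minimal illustration, not our implementation; run_query is a hypothetical helper that executes a SPARQL string against the endpoint and returns a list of bindings:

```python
def answer_with_relaxation(run_query):
    # Query restricted to entities explicitly typed as dbo:Actor, as
    # requested by "Give me all actors starring in Last Action Hero".
    strict = (
        "SELECT ?actor WHERE { "
        "res:Last_Action_Hero dbp:starring ?actor . "
        "?actor a dbo:Actor }"
    )
    results = run_query(strict)
    if results:
        return results
    # Relaxation: drop the type triple, since many Linked Data entities
    # are missing (or carry incorrect) rdf:type statements.
    relaxed = (
        "SELECT ?actor WHERE { "
        "res:Last_Action_Hero dbp:starring ?actor }"
    )
    return run_query(relaxed)
```

Relaxing only after the strict query returns nothing keeps the explicit type restriction (and hence precision) whenever the data can honour it.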
Best or All Matches The decision to use only the best match found in the
ontology for a query term, or all matches whose similarity exceeds a threshold, can
affect precision and recall. For instance, the term founded in the question software
developed by organizations founded in California has several matches, including
foundation and foundationPlace. Using only the best match (foundation)
would not generate all the results and, in turn, affects the recall. On the other
hand, if these properties were not relevant to the query, this would harm the
precision. To balance precision and recall, our algorithm uses all matches while
employing a high value for the similarity threshold and performing checks against
the ontology structure to ensure that only relevant matches are used in the final query.
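The difference between the two policies can be sketched as below; the similarity scores are illustrative only, not output of our actual similarity algorithms:

```python
def select_matches(term_scores, threshold=0.9):
    """Keep every ontology match whose similarity reaches the
    threshold, rather than only the single best match."""
    return [match for match, score in term_scores if score >= threshold]

# Candidate ontology matches for the query term "founded"
# (scores are made up for illustration):
candidates = [("foundation", 0.95), ("foundationPlace", 0.92),
              ("founder", 0.70)]

best_only = max(candidates, key=lambda ms: ms[1])[0]  # "foundation" alone
all_relevant = select_matches(candidates)  # both high-scoring properties
```

With a high threshold, the all-matches policy admits foundationPlace alongside foundation, recovering answers the best-only policy would miss, while the threshold and subsequent ontology-structure checks guard precision.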
Query Expansion When a query term is not found in the ontology, query
expansion is performed to identify related terms and repeat the matching process
using these terms. However, in some scenarios this expansion might be useful to
increase the recall even for matched terms, when the query term alone is not sufficient to return all the answers.
Therefore, it would be useful to perform the expansion for all query terms even
if they had matches in the ontology. An example of this is when one of the two
terms website or homepage is used in a query and both of them have matches
in the ontology: using only one of them could affect recall for some queries. On
the other hand, the quality and relevance of expansion terms (for polysemous words)
depends fully on the WSD approach. If a wrong sense is identified for a query
term, this list will be noisy and lead to false matches. Additionally, the
disambiguation process is computationally expensive; for these reasons,
we perform query expansion only when no matches are found in the ontology for
a term, or when no results are generated using the identified matches.
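This expand-only-on-failure policy can be sketched as follows. The helpers find_matches, run_query and expand are hypothetical names standing in for the matching, querying and WSD-based expansion components respectively:

```python
def match_term(term, find_matches, run_query, expand):
    """Try direct ontology matching first; fall back to query
    expansion only when no matches (or no results) are found, since
    WSD-based expansion is expensive and can introduce noisy terms."""
    matches = find_matches(term)
    if matches:
        results = run_query(matches)
        if results:
            return results
    # Fallback: expand the term (e.g. "website" could yield "homepage")
    # and retry the matching process with each related term.
    for related in expand(term):
        matches = find_matches(related)
        if matches:
            results = run_query(matches)
            if results:
                return results
    return []
```

Expanding every term unconditionally would trade this efficiency (and, for wrongly disambiguated senses, precision) for potentially higher recall.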
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        In this paper, we have presented a NL semantic search approach to answer
user queries expressed in their own terms and format. The main steps of the
approach are: recognising and disambiguating named entities in the query;
parsing and matching the rest of the query to ontology concepts and properties;
and, finally, generating triples from these matches and integrating them to form
the final SPARQL query. Using BabelNet in our WSD approach allowed
high-performance disambiguation of polysemous query terms and therefore, when
required, produced highly relevant terms for query expansion. Together with a set
of state-of-the-art string similarity algorithms and filtering techniques to produce
accurate mappings for query terms, we have shown that our approach's performance
is competitive (especially in the number of questions answered correctly: 76%)
when evaluated using the QALD-2 dataset. We have also discussed challenges
facing our approach (which it has in common with many NL approaches). Query
terms that are very difficult to match to the ontology, or queries
requiring advanced reasoning, would indeed require a `user in the loop' to assist the
system in addressing these challenges. This agrees with our recommendation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
to combine the benefits of visualising the search space offered by view-based
query approaches (which would support the user in the above scenarios) with a
NL-input feature that would balance difficulty with speed of query formulation.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kaufmann</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>NLP-Reduce: A "naïve" but Domain-independent Natural Language Interface for Querying Ontologies</article-title>
          .
          <source>In: Proceedings of ESWC 2007</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stieler</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>PowerAqua: supporting users in querying and exploring the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Querying the Semantic Web with Ginseng: A Guided Input Natural Language Search Engine</article-title>
          .
          <source>In: Proc. of WITS 2005</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bhagdev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanfranchi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Hybrid Search: Effectively Combining Keywords and Ontology-based Searches</article-title>
          .
          <source>In: Proceedings of ESWC 2008</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Göhring</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Querying Ontologies: A Controlled English Interface for End-users</article-title>
          .
          <source>In: Proceedings of ISWC 2005</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kaufmann</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Evaluating the usability of natural language query languages and interfaces to semantic web knowledge bases</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>8</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Elbedweihy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrigley</surname>
            ,
            <given-names>S.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Evaluating Semantic Search Query Approaches with Expert and Casual Users</article-title>
          .
          <source>In: Evaluations and Experiments Track, 11th International Semantic Web Conference (ISWC)</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Using WordNet to disambiguate word senses for text retrieval</article-title>
          .
          <source>In: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval.</source>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
          </string-name>
          , R.:
          <article-title>NERD : a Framework for Evaluating Named Entity Recognition Tools in the Web of Data</article-title>
          .
          <source>In: Proceedings of ISWC 2011</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Bar, D.:
          <article-title>Evaluation of a layered approach to question answering over linked data</article-title>
          .
          <source>In: Proceedings of ISWC 2012</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uren</surname>
          </string-name>
          , V.:
          <article-title>PowerAqua: Fishing the Semantic Web</article-title>
          .
          <source>In: Proceedings of ESWC 2006</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>193</volume>
          (
          <year>2012</year>
          )
          <fpage>217</fpage>
          –
          <lpage>250</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Lacalle</surname>
            ,
            <given-names>O.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Knowledge-based WSD on specific domains: performing better than generic supervised WSD</article-title>
          .
          <source>In: Proceedings of the 21st International Joint Conference on Artificial Intelligence</source>
          .
          <source>IJCAI'09</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Litkowski</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hargraves</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>SemEval-2007 Task 07: Coarse-Grained English All-Words Task</article-title>
          .
          <source>In: Proceedings of SemEval-2007</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Knowledge-rich word sense disambiguation rivaling supervised systems</article-title>
          .
          <source>In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Véronis</surname>
          </string-name>
          , J.:
          <article-title>Word sense disambiguation: The state of the art</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>24</volume>
          (
          <year>1998</year>
          )
          <fpage>1</fpage>
          –
          <lpage>40</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Extended gloss overlaps as a measure of semantic relatedness</article-title>
          .
          <source>In: Proceedings of IJCAI 2003</source>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Buscaldi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masulli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Integrating conceptual density with wordnet domains and cald glosses for noun sense disambiguation</article-title>
          .
          <source>In: Advances in Natural Language Processing</source>
          . Volume
          <volume>3230</volume>
          . Springer Berlin Heidelberg (
          <year>2004</year>
          )
          <fpage>183</fpage>
          –
          <lpage>194</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Reichert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linckels</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meinel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Students perception of a semantic search engine</article-title>
          .
          <source>In: Proceedings of the IADIS International Conference on Cognition and Exploratory Learning in Digital Age (CELDA2005)</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Fast exact inference with a factored model for natural language parsing</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems 15 (NIPS)</source>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Winkler</surname>
          </string-name>
          , W.E.:
          <article-title>String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage</article-title>
          .
          <source>In: Proceedings of the Section on Survey Research</source>
          . (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Philips</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The Double Metaphone search algorithm</article-title>
          . C/C++
          <source>Users Journal</source>
          <volume>18</volume>
          (
          <year>2000</year>
          )
          <fpage>38</fpage>
          –
          <lpage>43</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Monge</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The field matching problem: Algorithms and applications</article-title>
          .
          <source>In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining</source>
          . (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Damljanovic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agatonovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
          </string-name>
          , H.:
          <article-title>Natural Language Interface to Ontologies: combining syntactic analysis and ontology-based lookup through the user interaction</article-title>
          .
          <source>In: Proceedings of ESWC 2010</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravikumar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fienberg</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>A comparison of string distance metrics for name-matching tasks</article-title>
          .
          <source>In: Proceedings of IJCAI-03 Workshop on Information Integration</source>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>da Silva</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stasiu</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orengo</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heuser</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          :
          <article-title>Measuring quality of similarity functions in approximate data matching</article-title>
          .
          <source>J. Informetrics</source>
          <volume>1</volume>
          (
          <year>2007</year>
          )
          <fpage>35</fpage>
          –
          <lpage>46</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>L.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.:</given-names>
          </string-name>
          <article-title>Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing</article-title>
          .
          <source>In: ECML</source>
          <year>2001</year>
          .
          <article-title>(</article-title>
          <year>2001</year>
          )
          <fpage>466</fpage>
          –
          <lpage>477</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
          </string-name>
          , R., eds.:
          <source>Proceedings of Interacting with Linked Data (ILD 2012), at ESWC 2012</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Nuzzolese</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciancarini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Type inference through the analysis of wikipedia links</article-title>
          .
          <source>In: LDOW</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>