<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Portable Controlled Natural Languages for Querying Ontologies</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Sheffield, Regent Court</institution>
          ,
          <addr-line>211 Portobello Street S1 4DP, Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural Language Interfaces (NLIs) to structured data allow users to interact with a system using written or spoken language to perform tasks that require knowledge of a formal language. Due to natural language complexity and ambiguity, such interfaces usually support a Controlled Natural Language (CNL): a subset of a natural language that includes certain vocabulary and grammar rules that have to be followed. Building vocabulary differs from one system to another, and the way this is performed significantly affects portability: portable CNLs for querying ontologies are those that can be adapted easily to new domains without sacrificing performance. In this paper we describe the approach for dynamically extending the vocabulary supported by such systems, through a dialog with the user.</p>
      </abstract>
      <kwd-group>
        <kwd>Controlled Languages</kwd>
        <kwd>ontology</kwd>
        <kwd>user interaction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Natural Language Interfaces (NLIs) for querying structured data allow users
to interact with a system using written or spoken language (e.g., English) to
perform tasks that usually require knowledge of a formal query language. Due
to Natural Language (NL) complexity and ambiguity, such interfaces usually
support a Controlled Natural Language (CNL): a subset of an NL that includes
certain vocabulary and grammar rules that have to be followed.</p>
      <p>
        Building the vocabulary for CNLs that are used for querying ontologies differs
from one system to another, and the way this is performed significantly affects
portability: portable NLIs to ontologies are those that can be adapted easily to
new domains. Although portable NLIs are considered potentially much more
useful than domain-specific systems, constructing them poses a number of
technical and theoretical problems, as many of the techniques preclude automatic
adaptation of the systems to new domains [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A recent trend in developing
NLIs for querying ontologies is to build the domain vocabulary (lexicon)
automatically from the ontology lexicalisations. However, when ontologies are
built using automated methods such as ontology learning or automatic
transition from relational databases, many ontology concepts have artificial names
and do not include human-understandable lexicalisations. For example, artificial
constructs such as a property with the local name hasEmail and no labels would
need to be preprocessed in order to be useful. Triples have the form (Subject,
Predicate, Object), and many existing tools assume that the
Predicate should be related to a verb in the sentence (or sometimes a noun for datatype
properties). However, this is often not the case, for several reasons. Firstly, the
requirement of having a unique URI means that two object properties such as
hasEmail and hasAddress cannot both be named after the verb alone, as their
URIs would then be equal under the same namespace.
In addition, it seems unlikely that users would use the verb to have when
enquiring about an email, as they would probably use questions such as What is your
email? instead of What email you have?. Informal rules which are often applied
when naming ontology concepts indirectly affect the task of extracting the
lexicon from the ontology. Many portable systems solve this problem by means
of customisation. As stated in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], manual customisation increases recall.
This is in line with the statement from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that "there is no free lunch" and that
customisation is mandatory in order to achieve reasonable performance.
      </p>
      <p>In this paper, we propose a novel approach for minimizing the customisation
of CNLs for querying ontologies without sacrificing performance when porting
them to another domain (ontology). We achieve this by modeling a
dialog with the user in order to enrich the vocabulary already extracted from
the ontology. Along the same lines, a dialog is modeled for any ambiguities
which arise from the user's question, and the user is asked to disambiguate the
specific meaning before the question is translated into the formal language.</p>
    </sec>
    <sec id="sec-2">
      <title>Context</title>
      <p>
        Several CNL systems for querying ontologies have been developed recently, which
extract the domain vocabulary (lexicon) from the ontology itself by extracting
and processing the lexicalisations such as labels and datatype property values.
Examples of such systems are ORAKEL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], AquaLog [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], QuestIO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and many
others. In the case of ORAKEL, a part of the domain-specific lexicon is created
automatically from the domain ontology, while another part is created manually
and contains mappings of subcategorisation frames (e.g., verbs and nouns) to
ontology properties.
      </p>
      <p>
        Another way to generate/enrich the lexicon for CNLs for querying ontologies
is by using the CNLs for knowledge representation, such as ACE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or CPL
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. ACE is probably the most powerful, not only because of its maturity but
also due to its many support tools, such as the OWL Verbaliser, which can be used
to generate the lexicon from an ontology which is built externally; the lexicon
can be updated/enriched by changing or adding ACE sentences. While neither
of these CNLs is tailored to a specific domain, porting them to a different
domain requires knowledge of these CNLs in order to generate/update the domain
knowledge.
      </p>
      <p>Our approach generates the initial lexicon automatically from the ontology
lexicalisations. When the user starts using the system, if a question term is not
found in the lexicon, a combination of the syntactic parse and ontology reasoning
rules is used to generate the dialog. The user is then asked to map the unknown
term to an ontology concept and, following their selection, the new term is added
to the lexicon. In addition, the lexicon carries the semantics related to
the context in which a given word appeared.</p>
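      <p>The interaction described above can be sketched in a few lines of Python. This is a minimal illustration rather than FREyA's implementation: the Lexicon class, its resolve method and the ask_user callback are hypothetical names, and the candidate concepts, which the approach derives from the syntactic parse and ontology reasoning rules, are simply passed in as a list.</p>
      <preformat>
```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key type: a question term plus the context (an ontology
# concept URI) in which the term was used.
Key = Tuple[str, str]
# Hypothetical dialog callback: receives the unknown term and the candidate
# concepts, and returns the concept chosen by the user.
AskUser = Callable[[str, List[str]], str]


class Lexicon:
    """Toy lexicon mapping (term, context) pairs to ontology concept URIs."""

    def __init__(self) -> None:
        self._entries: Dict[Key, str] = {}

    def lookup(self, term: str, context: str) -> Optional[str]:
        """Return the stored concept for term in this context, or None."""
        return self._entries.get((term.lower(), context))

    def add(self, term: str, context: str, concept: str) -> None:
        """Record that term refers to concept when used in this context."""
        self._entries[(term.lower(), context)] = concept

    def resolve(self, term: str, context: str,
                candidates: List[str], ask_user: AskUser) -> str:
        """Return the concept for term, asking the user only if it is unknown."""
        known = self.lookup(term, context)
        if known is not None:
            return known
        chosen = ask_user(term, candidates)  # the dialog step
        self.add(term, context, chosen)      # reused for later questions
        return chosen


GEO = "http://www.mooney.net/geo#"
lexicon = Lexicon()
# Stand-in for a real dialog: always pick the first suggestion.
concept = lexicon.resolve(
    "population", GEO + "City",
    [GEO + "cityPopulation", GEO + "statePopulation"],
    lambda term, candidates: candidates[0],
)
```
      </preformat>
      <p>In this sketch, a callback that always returns the first candidate stands in for running without engaging the user, whereas an interactive callback would present the candidates and return the user's choice. Because entries are keyed by both term and context, the same word can map to different concepts in different contexts.</p>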
    </sec>
    <sec id="sec-3">
      <title>Building lexicon through the user interaction</title>
      <p>Generation of the user-defined lexicon is broken down into the following steps:</p>
      <p>1. Generate the lexicon by extracting lexicalisations attached to the Ontology
Concepts<sup>1</sup>. This step includes extracting fragment identifiers, labels, and
values of datatype properties.</p>
      <p>2. Perform lexicon-based lookup. This finds the links between
question terms and the logical form in the ontology. For example, in What is the
population of New York?, New York would be identified as an Ontology
Concept (OC) referring to both geo:newYork<sup>2</sup> and geo:newYorkNY, because it
matches the label of both URIs, which is new york.</p>
      <p>3. Analyse the grammar and identify the candidate words which could
refer to an Ontology Concept. We call these Potential Ontology Concepts
(POCs). For example, for the above question, population would be identified
as a POC; however, as there is no such lexicalisation in the ontology, we do
not know to which Ontology Concept this noun refers, and therefore we ask
the user.</p>
      <p>4. Generate the dialog (if a POC cannot be mapped to the logical form
automatically) and ask the user to map the unknown term (POC) to the
specific concept in the ontology (OC). In addition, if a POC refers to more
than one OC, generate the dialog and ask the user to disambiguate. For
example, the question What is the population of New York? is ambiguous,
as it has two interpretations: the first one is the state
population of New York state and the other one is the city population of New
York city.</p>
      <p>5. Add the POC to the lexicon as a description of the OC. This
description includes the context in which the term appears so that it can
be reused in similar contexts. Figure 1 illustrates an example of mapping
population to geo:cityPopulation whenever it appears together with
New York as a city<sup>3</sup>. If the same word (population) is used together with
New York state, then it needs to be mapped to a different OC such as
geo:statePopulation.</p>
      <p><sup>1</sup> Note that we use the term Ontology Concept to refer to all types of ontology
resources such as classes, instances, properties and literals.</p>
      <p><sup>2</sup> We use geo: instead of the full namespace http://www.mooney.net/geo for brevity.</p>
      <p><sup>3</sup> Note that the dialog before this one was asking the user to disambiguate whether
New York is a city or a state.</p>
      <p>
A dynamically enriched lexicon built from the user-defined vocabulary is used
in the system called FREyA<sup>4</sup>, which serves as an NLI for querying ontologies.
FREyA translates an NL query such as What is the capital of California? into
a formal SPARQL query in order to find the answer. For more details about
FREyA see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, the lexicon can easily be used by any CNL system for
querying ontologies. Currently, its format is JSON, and for the above example
of 'mapping' population to geo:cityPopulation whenever this term is used
in combination with geo:City, the JSON would look similar to the following:
"Key: population http://www.mooney.net/geo#City", "identifier": "http://www.mooney.net/geo#cityPopulation", "function": ""
      </p>
      <p>
The field function is used to indicate whether mapping certain words to OCs
requires applying additional functions to their values (such as applying
a maximum or minimum function to the values of datatype properties of type
number). For example, in the question What is the largest state in the US?,
state is in the lexicon and refers to geo:State, but largest is not. The
dialog is modeled and the user can map largest to the maximum value of
geo:stateArea whenever it is used as a modifier of the lexicalisation of
geo:State:
"Key: largest http://www.mooney.net/geo#State", "identifier": "http://www.mooney.net/geo#stateArea", "function": "max"</p>
      <p>
Translating this JSON format into a knowledge representation such as OWL, in
a way which can be used by any CNL system, is straightforward. For example, the
format of the OWL file could be such that the ACE OWL Verbaliser generates proper
ACE sentences, so that the lexicon (content words) of ACE can be enriched.</p>
      <p><sup>4</sup> http://gate.ac.uk/freya</p>
      <p>We experimented with 250 questions from the Mooney GeoQuery dataset<sup>5</sup> and
the ontology covering the USA geography domain. The system without any
customisation (domain lexicon generated automatically from the ontology) could
automatically answer 72 questions (28.8%). For the remaining questions, FREyA
generated at most 2 dialogs. When running it in the automatic mode (the lexicon
is generated without engaging the user, by selecting the first suggestion
generated by FREyA), the precision and recall were 81.2%. Finally, by engaging
the user for up to 15 minutes, these values increased to 92.4%.</p>
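      <p>To illustrate how lexicon entries of the kind shown earlier in this section might be consumed, the sketch below stores two entries and applies the function field to a list of values. The identifier and function fields follow the examples in the text; the lowercase key field, the exact JSON layout, and the find_entry and apply_function helpers are assumptions made for illustration only, not FREyA's actual format or API.</p>
      <preformat>
```python
import json
from typing import Dict, List, Optional

GEO = "http://www.mooney.net/geo#"

# Assumed serialisation: one JSON object per lexicon entry, where the key
# joins the question term and the context concept with a single space.
RAW_LEXICON = json.dumps([
    {"key": "population " + GEO + "City",
     "identifier": GEO + "cityPopulation", "function": ""},
    {"key": "largest " + GEO + "State",
     "identifier": GEO + "stateArea", "function": "max"},
])

# Supported aggregation functions; an empty string means "no function".
FUNCTIONS = {"max": max, "min": min}


def find_entry(term: str, context: str) -> Optional[Dict[str, str]]:
    """Return the entry whose key matches term and context, if any."""
    wanted = term.lower() + " " + context
    for entry in json.loads(RAW_LEXICON):
        if entry["key"] == wanted:
            return entry
    return None


def apply_function(entry: Dict[str, str], values: List[float]) -> List[float]:
    """Reduce values with the entry's function, or return them unchanged."""
    name = entry["function"]
    if name == "":
        return list(values)
    return [FUNCTIONS[name](values)]
```
      </preformat>
      <p>With this layout, looking up largest in the context of geo:State yields the geo:stateArea entry, and apply_function keeps only the maximum of whatever area values the query returned; an entry with an empty function field leaves the values untouched.</p>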
      <p>Acknowledgments. We would like to thank A. Bernstein and E. Kaufmann
from the University of Zurich for sharing the Mooney OWL ontology with us, and
J. Mooney from the University of Texas for making this dataset publicly available.</p>
      <p><sup>5</sup> http://userweb.cs.utexas.edu/users/ml/nldata/geoquery.html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Grosz</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appelt</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>TEAM: An experiment in the design of transportable natural-language interfaces</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>32</volume>
          (
          <issue>2</issue>
          ) (
          <year>1987</year>
          )
          <fpage>173</fpage>
          –
          <lpage>243</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uren</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>AquaLog: An ontology-driven question answering system for organizational semantic intranets</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (June
          <year>2007</year>
          )
          <fpage>72</fpage>
          –
          <lpage>105</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heizmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mantel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Studer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Towards Portable Natural Language Interfaces to Knowledge Bases – the Case of the ORAKEL System</article-title>
          .
          <source>Data and Knowledge Engineering</source>
          <volume>65</volume>
          (
          <issue>2</issue>
          ) (May
          <year>2008</year>
          )
          <fpage>325</fpage>
          –
          <lpage>354</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Damljanovic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A text-based query interface to OWL ontologies</article-title>
          .
          <source>In: 6th Language Resources and Evaluation Conference (LREC)</source>
          , Marrakech, Morocco, ELRA (May
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fuchs</surname>
            ,
            <given-names>N.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaljurand</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuhn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Attempto Controlled English for Knowledge Representation</article-title>
          . In:
          <string-name>
            <surname>Baroglio</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maluszynski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchiori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaffert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , eds.:
          <source>Reasoning Web, Fourth International Summer School 2008. Number 5224 in Lecture Notes in Computer Science</source>
          , Springer (
          <year>2008</year>
          )
          <fpage>104</fpage>
          –
          <lpage>124</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojcik</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          :
          <article-title>Acquiring and Using World Knowledge Using a Restricted Subset of English</article-title>
          . In:
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markov</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , eds.:
          <source>Proceedings of the 18th International FLAIRS Conference (FLAIRS'05)</source>
          , AAAI Press (
          <year>2005</year>
          )
          <fpage>506</fpage>
          –
          <lpage>511</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Damljanovic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agatonovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction</article-title>
          .
          <source>In: Proceedings of the 7th Extended Semantic Web Conference (ESWC 2010)</source>
          . Lecture Notes in Computer Science, Heraklion, Greece, Springer-Verlag (June
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>