<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Editing OWL through generated CNL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Richard Power</string-name>
          <email>r.power@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Stevens</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donia Scott</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Rector</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing, The Open University, Milton Keynes</institution>
          ,
          <addr-line>MK7 6AA</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, The University of Manchester, Manchester</institution>
          ,
          <addr-line>M13 9PL</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Traditionally, Controlled Natural Languages (CNLs) are designed either to avoid ambiguity for human readers, or to facilitate automatic semantic analysis, so that texts can be transcoded to a knowledge representation language. CNLs of the second kind have recently been adapted to the requirements of knowledge formation in OWL for the Semantic Web. We suggest in this paper a variant approach based on automatic generation of texts in CNL (as opposed to automatic analysis), and argue that this provides the best of both worlds, allowing us to pursue human readability in addition to a precise mapping from texts to a formal language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Several research groups have proposed interfaces for the Semantic Web
based on Controlled Languages [
        <xref ref-type="bibr" rid="ref1 ref12 ref13 ref2 ref8">12, 13, 2, 8, 1</xref>
        ]. These systems are
designed for use by domain experts who wish to encode knowledge without
having to work directly in formalisms like OWL and RDF, which are
designed for computer processing and data exchange rather than for easy
comprehension by people. After some initial training, authors type in
sentences drawn from a restricted subset of English (or some other
natural language); the input is parsed and transformed into statements in
OWL or some other Semantic Web formalism.
      </p>
      <p>
        Our purpose in this paper is to introduce an alternative approach in
the same tradition. We agree with the research groups cited above that
CNL-based interfaces for editing (or viewing) ontologies and other
metadata on the Semantic Web are plausible solutions to an urgent problem.
Our distinctive proposal is to avoid any reliance on automatic
interpretation of human-authored text. The CNL texts representing the content
of a knowledge base are produced not through human authoring but
through automatic Natural Language Generation (NLG). This brings
several immediate advantages: for instance, it eliminates any possibility
of an interpretation error (by the system); it also eliminates any need to
train users to adhere to the CNL grammar. However, the idea is at first
sight paradoxical: if all texts are generated by the system, how does the
human author convey the desired content?
      </p>
      <p>
Over the last decade, our group has pioneered a method by which this
can be done. The key idea is that the author specifies content by direct
manipulation of a generated 'feedback text' which expresses both the
current content of the knowledge base and the options for extending it.
By preserving the link between constituents of the feedback text and the
formal representation of its meaning, the interface supports editing at the
level of meaning, not text. The user can select spans linked to individual
objects or events, and perform operations like classification, or assigning
values to properties, presented through menus (again expressed in the
CNL) which pop up when a span is selected. This method, which we have
called Wysiwym<sup>3</sup> or more generically Conceptual Authoring, has been
employed in several application domains [
        <xref ref-type="bibr" rid="ref10 ref3 ref4 ref5">10, 3, 5, 4</xref>
        ] and shown through
evaluation studies to be intuitive for subject-matter experts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
Two advantages of using NLG have already been mentioned: no
interpretation errors and minimal user training. A more subtle advantage is
that we can design the CNL with the sole purpose of favouring human
understanding: we no longer have to meet requirements resulting from
automatic interpretation or human authoring. An NLG system could
easily support several CNL dialects, some based on other natural languages,
allowing multilingual access to the same knowledge base. Different texts
could be generated for editing and viewing, favouring precision and ease
of manipulation in the first case, and fluency in the second.
      </p>
      <p>
However, to achieve these benefits we need to overcome a limitation of
the Conceptual Authoring systems implemented so far. Except for one
or two small experimental prototypes, they are all designed for editing
ABoxes (asserted facts about individuals) using classes from a fixed TBox
(definitions and general statements about classes)<sup>4</sup>. The purpose of this
paper is to indicate how the theoretical model underlying Conceptual
Authoring can be adapted to cover knowledge bases embracing TBox as
well as ABox [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Editing a knowledge base</title>
      <p>
        Conceptual Authoring was originally developed as a way of defining the
input for multilingual NLG, in applications where content could be
formally represented by an ABox [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The method depends on a model for
systematically aligning a generated feedback text to a graph representing
the ABox, with nodes denoting individuals and arcs denoting relations
among individuals. Events and propositions are treated as individuals, so
that for example the sentence 'a doctor arrived' might express an ABox
fragment with two individuals, one belonging to the class PastArrival
and one to the class MedicalDoctor, related by the property hasAgent:
      </p>
      <p>
        <disp-formula><tex-math><![CDATA[\mathit{PastArrival}(e_1) \;\&\; \mathit{MedicalDoctor}(e_2) \;\&\; \mathit{hasAgent}(e_1, e_2)]]></tex-math></disp-formula>
      </p>
      <p>
        <sup>3</sup> What You See Is What You Meant.
      </p>
      <p>
        <sup>4</sup> Methods have been proposed for making Wysiwym open-ended by allowing users to
add new terms for atomic classes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; these need to be extended in order to support
full TBox editing with open properties and complex classes.
      </p>
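      <p>The graph view of the ABox fragment for 'a doctor arrived' can be made concrete with a short sketch. The Python below is an illustration only, not the Wysiwym implementation: the class <monospace>ABox</monospace> and its methods <monospace>add_individual</monospace>, <monospace>relate</monospace> and <monospace>assertions</monospace> are hypothetical names introduced here. It assumes that class assertions are stored on nodes and property assertions on arcs.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ABox:
    """A toy ABox graph: nodes are individuals, arcs are property assertions."""
    classes: dict[str, str] = field(default_factory=dict)  # individual -> class name
    arcs: list[tuple[str, str, str]] = field(default_factory=list)  # (property, subject, object)

    def add_individual(self, name: str, cls: str) -> None:
        # Class assertion, e.g. PastArrival(e1)
        self.classes[name] = cls

    def relate(self, prop: str, subj: str, obj: str) -> None:
        # Property assertion, e.g. hasAgent(e1, e2); both ends must already exist
        if subj not in self.classes or obj not in self.classes:
            raise KeyError("both individuals must be declared first")
        self.arcs.append((prop, subj, obj))

    def assertions(self) -> list[str]:
        # Render the graph as a list of logical atoms, nodes first, then arcs
        atoms = [f"{cls}({ind})" for ind, cls in self.classes.items()]
        atoms += [f"{p}({s}, {o})" for p, s, o in self.arcs]
        return atoms


abox = ABox()
abox.add_individual("e1", "PastArrival")     # the arrival event
abox.add_individual("e2", "MedicalDoctor")   # the doctor
abox.relate("hasAgent", "e1", "e2")
print(" & ".join(abox.assertions()))
```
]]></preformat>
      <p>Treating the event itself as an individual is what allows the whole sentence to correspond to a single node, with its participants reached through arcs.</p>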
      <p>Editing begins by offering the user a place-holder text indicating a
constraint on the individual that embraces the whole sentence (e.g., it must
be an event or situation). When this span is clicked, a menu pops up
offering a list of options computed from the specific event/situation classes
in the TBox, also expressed in CNL. Typically these options will include
further place-holder texts constraining the participants in the event. Here
is a typical editing sequence:</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>User action</th><th>Feedback text</th></tr>
          </thead>
          <tbody>
            <tr><td>Choose New Event</td><td>[Something happened]</td></tr>
            <tr><td>Choose from pop-up menu</td><td>[Someone] shaved [someone]</td></tr>
            <tr><td>Choose from pop-up menu</td><td>A barber shaved [someone]</td></tr>
            <tr><td>Copy the barber individual</td><td>A barber shaved [someone]</td></tr>
            <tr><td>Paste on to place-holder</td><td>A barber shaved himself</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The important point to understand here is that all editing operations
are defined on the ABox graph, not on the text, which is used only as
a means of presenting the ABox and the options for updating it. The
editing process depends on a model for aligning a text to a graph, in
which text spans are mapped to nodes, and span-subspan relations are
mapped to arcs.</p>
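      <p>The editing sequence can be replayed on a toy graph to show that each operation changes the graph, while the text is simply regenerated afterwards. This is a minimal sketch under stated assumptions: <monospace>Node</monospace>, <monospace>ROLES</monospace>, <monospace>choose</monospace>, <monospace>paste</monospace> and <monospace>feedback</monospace> are hypothetical names, the <monospace>ROLES</monospace> table stands in for a TBox, and the generator is hard-wired to the shaving example rather than driven by a grammar.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """An individual in the ABox graph; cls is None while it is a place-holder."""
    constraint: str                       # what may fill this node, e.g. "event"
    cls: Optional[str] = None             # chosen class, e.g. "Shaving" or "Barber"
    arcs: dict[str, "Node"] = field(default_factory=dict)  # property -> target node


# Hypothetical TBox fragment: the roles each class introduces, with constraints
ROLES = {"Shaving": {"agent": "person", "patient": "person"}, "Barber": {}}


def choose(node: Node, cls: str) -> None:
    """Menu selection: classify a place-holder and add place-holders for its roles."""
    node.cls = cls
    for role, constraint in ROLES[cls].items():
        node.arcs[role] = Node(constraint)


def paste(parent: Node, role: str, copied: Node) -> None:
    """Copy and paste: point an arc at an existing individual (coreference)."""
    parent.arcs[role] = copied


def feedback(event: Node) -> str:
    """Regenerate the feedback text from the graph; place-holders are bracketed."""
    if event.cls is None:
        return "[Something happened]"
    agent, patient = event.arcs["agent"], event.arcs["patient"]
    subj = "A barber" if agent.cls == "Barber" else "[Someone]"
    if patient is agent:                  # same node on both arcs -> reflexive
        obj = "himself"
    else:
        obj = "a barber" if patient.cls == "Barber" else "[someone]"
    return f"{subj} shaved {obj}"


event = Node("event")
steps = [feedback(event)]                         # New Event
choose(event, "Shaving")
steps.append(feedback(event))                     # pick the event class
choose(event.arcs["agent"], "Barber")
steps.append(feedback(event))                     # pick the agent class
paste(event, "patient", event.arcs["agent"])      # copy the barber, paste on patient
steps.append(feedback(event))
for line in steps:
    print(line)
```
]]></preformat>
      <p>The reflexive 'himself' falls out of the graph: after the paste, both arcs lead to the same node, and the generator chooses the pronoun from that identity rather than from any edit to the text.</p>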
      <p>
        Can a similar process be devised for editing a knowledge base
embracing ABox and TBox? The simplest approach, in our view, is to adopt an
alignment model in which sentences denote axioms, and major sentential
constituents denote classes; editing can then be performed by selecting
any span denoting a class, and replacing it by a span denoting another
class, chosen as before from a menu of options generated by the
system. ABox assertions can be incorporated into this model by treating
individuals as enumerated classes containing a single element [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As an
illustration, consider a description logic with the following resources for
constructing classes and axioms:
      </p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Description</th><th>Syntax</th><th>OWL</th></tr>
          </thead>
          <tbody>
            <tr><td>atomic class</td><td>AName</td><td>AName</td></tr>
            <tr><td>universal class</td><td>⊤</td><td>Thing</td></tr>
            <tr><td>enumerated class</td><td>{a}</td><td>oneOf(a)</td></tr>
            <tr><td>exists restriction</td><td>∃R.C</td><td>someValuesFrom(R,C)</td></tr>
            <tr><td>class inclusion</td><td>C ⊑ D</td><td>subClassOf(C,D)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The author is allowed to add new terms to the knowledge base
(individual, class or property names), and must link them to words (drawn from
a wide-coverage lexicon) of a syntactic category constrained by the CNL.
In a very simple CNL, individuals could be expressed by proper names,
classes by count nouns, and properties by transitive verbs. A fixed CNL
grammar for expressing the description language takes care of the rest.
A knowledge base comprises a list of axioms; when the user invokes New
axiom, a trivial axiom ⊤ ⊑ ⊤ is added, and can be edited by
substitution operations on the classes. Here is an example from the People+Pets
domain [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
      </p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Logical form</th><th>Feedback text</th></tr>
          </thead>
          <tbody>
            <tr><td>⊤ ⊑ ⊤</td><td>Everything is a thing</td></tr>
            <tr><td>P ⊑ ⊤</td><td>Every pet is a thing</td></tr>
            <tr><td>P ⊑ A</td><td>Every pet is an animal</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Obviously P and A here are arbitrary labels that would be replaced
in a Semantic Web application by URIs meeting the W3C namespace
conventions. Here is a more complex sequence using both individuals
and properties:</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Logical form</th><th>Feedback text</th></tr>
          </thead>
          <tbody>
            <tr><td>⊤ ⊑ ⊤</td><td>Everything is a thing</td></tr>
            <tr><td>{m} ⊑ ⊤</td><td>Mary is a thing</td></tr>
            <tr><td>{m} ⊑ ∃R.⊤</td><td>Mary owns one or more things</td></tr>
            <tr><td>{m} ⊑ ∃R.P</td><td>Mary owns one or more pets</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In these sequences, transitions from one line to the next are made by
clicking on a word/phrase representing a class, and choosing from a menu
of permissible substitutions. For instance, starting from 'Everything is a
thing', the user might click on 'Everything' and then choose 'Every pet'
from the following pop-up menu:</p>
      <list list-type="simple">
        <list-item><p>Mary</p></list-item>
        <list-item><p>Every animal</p></list-item>
        <list-item><p>Every dog</p></list-item>
        <list-item><p>Every pet</p></list-item>
        <list-item><p>Everything that likes one or more things</p></list-item>
        <list-item><p>Everything that owns one or more things</p></list-item>
        <list-item><p>. . .</p></list-item>
      </list>
      <p>
        This method raises an obvious problem of scale: for any non-trivial
ontology, classes will have to be selected from thousands of alternatives, and
some kind of search mechanism will therefore be needed. One solution
already used in Wysiwym applications [
        <xref ref-type="bibr" rid="ref3 ref5 ref6">3, 6, 5</xref>
        ] is a menu equipped with
a text field through which users can narrow the focus by typing in some
characters from the desired word or phrase. In an ontology editor this
search mechanism could be enhanced by using the ontology itself in order
to pick options that are conceptual rather than orthographic neighbours:
for instance, on typing in 'dog' the user would obtain a focussed list
containing 'poodle' and 'pekingese' as well as 'doggerel'.
      </p>
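      <p>The alignment between axioms and sentences, and the menu search, can be illustrated with a small sketch. It is not the SWAT implementation: the class names <monospace>Top</monospace>, <monospace>Atomic</monospace>, <monospace>OneOf</monospace> and <monospace>Some</monospace>, the functions <monospace>subject</monospace>, <monospace>predicate</monospace>, <monospace>verbalise</monospace> and <monospace>narrow</monospace>, and the <monospace>LEXICON</monospace> and <monospace>SUBCLASSES</monospace> tables are all assumptions made for illustration. Pluralisation and article choice are deliberately naive, and the conceptual search only follows direct subclass links from an exact match.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Top:
    """Universal class (Thing)."""


@dataclass(frozen=True)
class Atomic:
    """Atomic class, linked to a count noun."""
    noun: str


@dataclass(frozen=True)
class OneOf:
    """Enumerated class with a single element, linked to a proper name."""
    name: str


@dataclass(frozen=True)
class Some:
    """Exists restriction; the property is linked to a transitive verb."""
    verb: str
    filler: "Cls"


Cls = Union[Top, Atomic, OneOf, Some]


def plural(c: Cls) -> str:
    """Plural noun for a class in object position (naive: appends 's')."""
    if isinstance(c, Top):
        return "things"
    if isinstance(c, Atomic):
        return c.noun + "s"
    raise ValueError("this sketch only pluralises Top and atomic classes")


def subject(c: Cls) -> str:
    """Phrase for the left-hand class of an inclusion axiom (also a menu entry)."""
    if isinstance(c, Top):
        return "Everything"
    if isinstance(c, Atomic):
        return f"Every {c.noun}"
    if isinstance(c, OneOf):
        return c.name
    return f"Everything that {c.verb} one or more {plural(c.filler)}"


def predicate(c: Cls) -> str:
    """Phrase for the right-hand class of an inclusion axiom."""
    if isinstance(c, Top):
        return "is a thing"
    if isinstance(c, Atomic):
        article = "an" if c.noun[0] in "aeiou" else "a"
        return f"is {article} {c.noun}"
    if isinstance(c, OneOf):
        return f"is {c.name}"
    return f"{c.verb} one or more {plural(c.filler)}"


def verbalise(left: Cls, right: Cls) -> str:
    """Feedback text for the axiom 'left is included in right'."""
    return f"{subject(left)} {predicate(right)}"


# Hypothetical lexicon and subclass links for the menu search
LEXICON = ["dog", "doggerel", "pekingese", "pet", "poodle"]
SUBCLASSES = {"dog": ["poodle", "pekingese"]}


def narrow(typed: str) -> list[str]:
    """Orthographic matches first, then subclasses of an exactly matching class."""
    hits = [w for w in LEXICON if w.startswith(typed)]
    hits += [w for w in SUBCLASSES.get(typed, []) if w not in hits]
    return hits


mary, pet = OneOf("Mary"), Atomic("pet")
# Each step substitutes one class in the previous axiom
steps = [(Top(), Top()), (mary, Top()),
         (mary, Some("owns", Top())), (mary, Some("owns", pet))]
for left, right in steps:
    print(verbalise(left, right))

menu = [subject(c) for c in (mary, Atomic("animal"), Atomic("dog"), pet,
                             Some("likes", Top()), Some("owns", Top()))]
print(menu)
print(narrow("dog"))
```
]]></preformat>
      <p>Because every sentence is produced from an axiom by <monospace>verbalise</monospace>, a substitution chosen from the menu can never yield text that lacks a logical form; the menu entries themselves are generated by the same <monospace>subject</monospace> function.</p>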
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>We have described a new approach to editing hybrid knowledge bases
using controlled natural language. The novelty of this approach is that
it relies entirely on natural language generation, thereby avoiding
possible pitfalls of interpretation-based methods (e.g., interpretation errors,
training effort) as well as bringing potential benefits (e.g., multilinguality,
more flexibility in designing the CNL). However, our proposals have not
yet been developed in detail or applied and evaluated for large-scale
ontologies<sup>5</sup>. It remains to be seen (a) whether the proposed editing method
is intuitive for users, (b) whether users are able (and willing) to define the
necessary lexical resources during ontology building, (c) whether such an
editing tool will scale up to large ontologies and expressive DLs like OWL
Full, and (d) whether the differences between interpretation-based and
generation-based approaches are important or merely an implementation
detail.</p>
      <p><sup>5</sup> These tasks are being addressed in SWAT (Semantic Web Authoring Tool), a joint
project between the Open University and the University of Manchester.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Raffaella Bernardi, Diego Calvanese, and
          <string-name>
            <given-names>Camilo</given-names>
            <surname>Thorne</surname>
          </string-name>
          .
          <article-title>Lite natural language</article-title>
          .
          <source>In 7th Int. Workshop on Computational Semantics (IWCS-7)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          and E. Kaufmann. GINO –
          <article-title>a guided input natural language ontology editor</article-title>
          .
          <source>In Proceedings of the 5th International Semantic Web Conference</source>
          , Athens, Georgia,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
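          <string-name>
            <given-names>Nadjet</given-names>
            <surname>Bouayad-Agha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>Power</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Donia</given-names>
            <surname>Scott</surname>
          </string-name>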
          , and Anja Belz. PILLS:
          <article-title>Multilingual generation of medical information documents with overlapping content</article-title>
          .
          <source>In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)</source>
          , pages
          <fpage>2111</fpage>
          –
          <lpage>2114</lpage>
          , Las Palmas,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Dongilli</surname>
          </string-name>
          .
          <article-title>Discourse Planning Strategies for Complex Concept Descriptions</article-title>
          .
          <source>In Proceedings of the 7th International Symposium on Natural Language Processing</source>
          , Pattaya, Chonburi, Thailand,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Piwek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cahill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Tipper</surname>
          </string-name>
          .
          <article-title>Natural Language Processing in CLIME, a Multilingual Legal Advisory System</article-title>
          .
          <source>Journal of Natural Language Engineering</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>101</fpage>
          –
          <lpage>132</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Catalina</given-names>
            <surname>Hallett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Donia</given-names>
            <surname>Scott</surname>
          </string-name>
          , and Richard Power.
          <article-title>Composing queries through conceptual authoring</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>33</volume>
          (
          <issue>1</issue>
          ):
          <fpage>105</fpage>
          –
          <lpage>133</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hielkema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mellish</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Edwards</surname>
          </string-name>
          .
          <article-title>Using WYSIWYM to create an open-ended interface for the semantic grid</article-title>
          .
          <source>In Proceedings of the 11th European Workshop on Natural Language Generation</source>
          , Schloss Dagstuhl,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kaljurand</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          .
          <article-title>Verbalizing OWL in Attempto Controlled English</article-title>
          .
          <source>In Proceedings of OWL: Experiences and Directions</source>
          , Innsbruck, Austria,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Qing</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Haarslev</surname>
          </string-name>
          .
          <article-title>OntoKBEval: A Support Tool for DL-based Evaluation of OWL Ontologies</article-title>
          .
          <source>In Proceedings of the 2006 International Workshop on OWL: Experiences and Directions</source>
          , Georgia, USA,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Scott</surname>
          </string-name>
          .
          <article-title>Multilingual authoring using feedback texts</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>1053</fpage>
          –
          <lpage>1059</lpage>
          , Montreal, Canada,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Schaerf</surname>
          </string-name>
          .
          <article-title>Reasoning with individuals in concept languages</article-title>
          .
          <source>Data and Knowledge Engineering</source>
          ,
          <volume>13</volume>
          :
          <fpage>141</fpage>
          –
          <lpage>176</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwitter</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tilbrook</surname>
          </string-name>
          .
          <article-title>Controlled natural language meets the semantic web</article-title>
          .
          <source>In Proceedings of the Australasian Language Technology Workshop</source>
          , pages
          <fpage>55</fpage>
          –
          <lpage>62</lpage>
          , Macquarie University,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pazandak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Tennant</surname>
          </string-name>
          .
          <article-title>Talk to your semantic web</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>9</volume>
          (
          <issue>6</issue>
          ):
          <fpage>75</fpage>
          –
          <lpage>78</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>