<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Community-Driven Controlled Natural Languages Evolution</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Tallinn University of Technology</institution>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ELIKO Competence Centre in Electronics-, Information- and Communications Technologies</institution>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>FocusIT Ltd</institution>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>M3D Ltd</institution>
          ,
          <country country="EE">Estonia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Controlled Natural Languages (CNLs) are usually engineered in a design-centric paradigm by a coherent project team. We can draw a parallel between such an approach and commercial software development, the engineering of genetically modified organisms, cathedral design, and the design of auxiliary languages. To our knowledge, no planned international auxiliary language has turned out to be a viable solution, in contrast to creoles and pidgins. The paper proposes a methodological vision of community-driven CNL evolution: a paradigm in which a group of volunteers develops CNLs following the principles of bazaar-style open software development methodologies. We demonstrate the application of the approach using industrial use cases elaborated for developing a Controlled Natural Estonian.</p>
      </abstract>
      <kwd-group>
        <kwd>Controlled Natural Language design</kwd>
        <kwd>use-cases</kwd>
        <kwd>community-driven</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Creole and pidgin languages are often considered simplified languages born
unconsciously out of practical situations. Sociolinguistically, an environment spontaneously
gives birth to a pidgin language [1]. A planned language, on the other hand, is created
consciously by a person (Esperanto, Volapük, Latino sine flexione, et al.) or by a
coherent project team (Interlingua de IALA, Lojban, et al.). Because a pidgin or creole
is created by a group, it does not lack a group of speakers. A planned language, in
contrast, is created by an individual or a small team; to stay alive, it has to attract
a group of speakers willing to learn and use it [ibid].</p>
      <p>After centuries of crafting planned international auxiliary languages,
and despite marketing efforts, no viable language has emerged [2]. In contrast, the
“creolization” of some pidgins has given us several lingua francas.</p>
      <p>Proceeding from the hypothesis that natural phenomena are more viable than
artificial ones, we combine this natural phenomenon with the open software development
approach and present a methodological vision of how to develop community-driven CNLs
(cf. Sec. 2). As regards the applicability of the approach in a business context, we have
started to elaborate the first motivating use cases together with industrial stakeholders
(cf. Sec. 3). These use cases will serve as a basis for the designers of a CNL, providing them
with descriptions of the sociolinguistic profiles, abilities and requirements of the users.</p>
    </sec>
    <sec id="sec-2">
      <title>Community-driven CNLs evolution: methodological vision</title>
      <p>Supporting the formation of natural diction with open software development
techniques, the methodological approach towards community-driven CNL
development is as follows:
1. Attracting the attention of organizations and projects where a CNL could
provide business value.
2. Elaborating motivating use cases. In order to map target groups and fields
of application, motivating use cases are described. The use cases will
broadly meet the requirements stated in the OpenUP process framework:
the name, brief description and purpose, actor(s), workflows and conditions
of the use case will be described [3]. As a result, it is possible to detail the
needs and requirements of specific user profiles.
3. Assembling, for every use case, an initial human-authored corpus of text
samples, a process called Corpus-Based Requirements Analysis [4].
4. Composing the sociolinguistic profiles, abilities and requirements of the
user.
5. Adjusting the properties of the CNL according to the framework proposed
by Adam Wyner et al. [5], which covers a number of generic, design, linguistic,
and relational properties.
6. Selecting reusable components (software, linguistic assets, etc.) from a
CNL repository, based on the distance between the properties of the CNL
under construction (see Step 5) and those of the available components, which
are described by the same framework.
7. Customizing these components to the needs outlined in Steps 2-5.</p>
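      <p>The component selection described above (choosing repository components by their distance to the CNL under construction) can be illustrated with a minimal sketch. The property names, the numeric scores and the choice of Euclidean distance are assumptions made purely for illustration; they are not taken from the framework in [5], which does not prescribe a particular encoding or metric here.</p>
      <preformat preformat-type="code">
```python
from math import sqrt

# Hypothetical property profile of the CNL under construction.
# Property names and scores in [0, 1] are illustrative assumptions only.
target = {"formality": 0.8, "expressiveness": 0.4, "naturalness": 0.9}

# Hypothetical repository of reusable components, described by the same properties.
repository = {
    "parser-a": {"formality": 0.9, "expressiveness": 0.5, "naturalness": 0.8},
    "lexicon-b": {"formality": 0.2, "expressiveness": 0.9, "naturalness": 0.3},
    "editor-c": {"formality": 0.7, "expressiveness": 0.4, "naturalness": 0.9},
}


def distance(a, b):
    """Euclidean distance over the union of property names (missing = 0.0)."""
    keys = set(a).union(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def rank_components(target, repository):
    """Return component names ordered from closest to farthest."""
    return sorted(repository, key=lambda name: distance(target, repository[name]))


ranking = rank_components(target, repository)
print(ranking)
```
</preformat>
      <p>In this sketch the closest components would be the first candidates for customization; a real repository might weight properties differently or use categorical rather than numeric values.</p>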
      <p>These process steps will not be carried out in a strict order, but in a flexible,
iterative manner. The process description itself will be open to improvements by the
CNL community: we will make arrangements to publish the process description using
an EPF Wiki engine [6], based on the Software &amp; Systems Process Engineering
MetaModel (SPEM 2.0) [7].</p>
      <p>The aim of such an approach is (i) to foster the exploitation of CNLs by lowering
the barrier to creating a CNL and (ii) to shorten the time to market.</p>
    </sec>
    <sec id="sec-3">
      <title>Motivating use cases</title>
      <p>The scenario, functional and non-functional requirements, context, user profiles, user
needs and other aspects of our project are explained in a manner familiar to software
developers: we use the traditional use case technique introduced by Ivar Hjalmar Jacobson,
which is widely recommended by software development methodologies, including OpenUP [3].</p>
      <sec id="sec-3-1">
        <title>UC1 – Information retrieval based on semantic similarity</title>
        <p>
          Main idea: document abstracts and (wiki) article summaries are written in a Controlled
Natural Language, enabling semantic search and article recommendation based on the
semantic distance between CNL abstracts (for an overview of semantic relatedness, similarity
and distance, see [8]; for the use of CNLs in wikis, see [9], [
          <xref ref-type="bibr" rid="ref8">10</xref>
          ]).
        </p>
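        <p>As a rough illustration of recommendation by distance between CNL abstracts, the following sketch assumes that each abstract has already been parsed into a set of subject-relation-object statements. The statements, the document identifiers and the use of Jaccard distance are assumptions made for this example; Jaccard distance is a simple stand-in for the richer semantic distance measures surveyed in [8], not the measure used in the project.</p>
        <preformat preformat-type="code">
```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def recommend(doc_id, corpus):
    """Return the id of the other document whose statements are closest to doc_id."""
    others = [d for d in corpus if d != doc_id]
    return min(others, key=lambda d: jaccard_distance(corpus[doc_id], corpus[d]))


# Hypothetical parsed abstracts: each statement is a (subject, relation, object) tuple.
doc_a = {("car", "is_a", "vehicle"), ("car", "has_part", "engine")}
doc_b = {("car", "is_a", "vehicle"), ("truck", "is_a", "vehicle")}
doc_c = {("pidgin", "is_a", "language")}
corpus = {"a": doc_a, "b": doc_b, "c": doc_c}

print(recommend("a", corpus))
```
</preformat>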
        <p>Applicability: search engines, recommendation engines, wiki engines, Electronic
Document and Records Management Systems (EDRMS).</p>
        <p>Features requested by our clients are:
• A look-ahead editor that goes beyond predicting only the next word:
pattern-based sentence authoring is needed, in which an annotator selects an intention
from a list and the system provides a sentence pattern in a CNL,
allowing the annotator to fill in words from ontologies and to extend the sentence with
subpatterns.
• The possibility of creating mixed content, i.e. a text in which the CNL part
is not spatially separated from the main, “uncontrolled” part of the document,
but rather distributed throughout the text and marked with microformats. This will
allow domain experts to start with semi-correct CNL texts and other users
who are more fluent in CNLs to iteratively improve the correctness of the CNL part,
while preserving the semantics.
• An editor with real-time named entity recognition (NER) and
disambiguation capabilities, a feature rated more useful than consistency
checks, for example.</p>
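        <p>The pattern-based sentence authoring requested above can be sketched as follows. The intentions, sentence patterns and vocabulary are hypothetical English examples invented for illustration; the actual patterns of a Controlled Natural Estonian, and the way an editor would draw slot fillers from ontologies, are not specified in this paper.</p>
        <preformat preformat-type="code">
```python
# Hypothetical intentions mapped to sentence patterns; slots are written as {name}.
PATTERNS = {
    "classify": "Every {subclass} is a {superclass}.",
    "locate": "{place} is located in {region}.",
}

# Hypothetical ontology vocabulary: the allowed fillers for each slot.
VOCABULARY = {
    "subclass": {"sedan", "truck"},
    "superclass": {"vehicle"},
    "place": {"Tallinn"},
    "region": {"Estonia"},
}


def author_sentence(intention, **slots):
    """Fill the pattern chosen by the intention, rejecting out-of-vocabulary words."""
    pattern = PATTERNS[intention]
    for slot, word in slots.items():
        if word not in VOCABULARY.get(slot, set()):
            raise ValueError(f"{word!r} is not an allowed filler for slot {slot!r}")
    # str.format raises KeyError if a slot required by the pattern was not supplied.
    return pattern.format(**slots)


print(author_sentence("classify", subclass="sedan", superclass="vehicle"))
```
</preformat>
        <p>Subpatterns could be modelled in the same way, for instance by allowing a slot to be filled with the output of another pattern.</p>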
      </sec>
      <sec id="sec-3-2">
        <title>UC2 – Tagging of digital items</title>
        <p>Main idea: Digital items are described using a Controlled Natural Language, enabling
cross-lingual information access and delivery. This has the potential to facilitate and speed up the
provision of online services centered around computer-based translation.</p>
        <p>Applicability: Blog entries, user profiles in social networks, photos, other digital
items (e.g. sources such as Europeana, http://www.europeana.eu).</p>
        <p>Storyboard: A person is looking for digitized photos of old cars in historical
archives. Typically, only a short content description has been added, in the respective mother
tongue. Writing these descriptions in a CNL makes the information widely
accessible and is much cheaper than commissioning human translations into various languages.</p>
      </sec>
      <sec id="sec-3-3">
        <title>UC3 – Localizing (open) software</title>
        <p>Purpose: User interfaces (UI) and documentation of software are available to end-users
in their mother tongue; the translation process is smooth, quick, and cost-effective.</p>
        <p>Actors: open software developers, localizers.</p>
        <p>
          Main workflow: In the development process of a software product, its UI messages
and documentation (i.e. comments, manuals) are written in a CNL (for example,
Processable Afrikaans [
          <xref ref-type="bibr" rid="ref9">11</xref>
          ]). In the translation/localization process, a pivot
pattern/architecture is used, with Interlingua de IALA as one possible candidate. In this way,
translation rules are not needed for every language pair.
        </p>
        <p>Remarks: These types of CNLs are not based on first-order logic (FOL).</p>
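        <p>The benefit of a pivot architecture can be made concrete with a small calculation. The sketch assumes that translation is directional (one rule set per ordered language pair) and that the pivot is not itself one of the target languages; both are simplifying assumptions of this example rather than statements from the use case.</p>
        <preformat preformat-type="code">
```python
def direct_translators(n):
    """Rule sets needed when every ordered pair of n languages is translated directly."""
    return n * (n - 1)


def pivot_translators(n):
    """Rule sets needed with a pivot: one into and one out of the pivot per language."""
    return 2 * n


for n in (3, 10, 24):
    print(n, direct_translators(n), pivot_translators(n))
```
</preformat>
        <p>For three languages the two approaches need the same number of rule sets (6), while for ten languages the direct approach needs 90 and the pivot approach 20.</p>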
      </sec>
      <sec id="sec-3-4">
        <title>UC4 – Communication with smart environment</title>
        <p>Purpose: Enabling easier and more flexible use of smart environment (e.g. home) automation.</p>
        <p>Actor: Person controlling automated systems (e.g. inhabitant of a smart home).</p>
        <p>Preconditions: A CNL-supported automation control unit (ACU) with voice
recognition capabilities.</p>
        <p>Main Workflow:
1. Actor asks a question or gives an order in natural speech (e.g. lower the
house's internal temperature by 2 degrees when the alarm is engaged and restore
the default settings when the alarm is disengaged).
2. ACU repeats the command in a CNL.
3. Actor confirms.
4. After the conditions are met (e.g. the alarm is engaged), ACU performs the
required actions (e.g. a signal is sent to a heating room control unit).</p>
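        <p>The workflow above can be sketched as a small confirm-then-execute loop. The class and method names are hypothetical, speech recognition and natural-language parsing are assumed to have already produced a condition and an action, and the "If ... then ..." paraphrase is only a placeholder for a real CNL rendering.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str           # event that triggers the rule
    action: str              # command to be sent to a device controller
    confirmed: bool = False  # set once the actor has approved the paraphrase

    def paraphrase(self):
        """Placeholder CNL rendering that the ACU repeats back to the actor."""
        return f"If {self.condition} then {self.action}."


class AutomationControlUnit:
    """Hypothetical ACU: stores rules and fires only the confirmed ones."""

    def __init__(self):
        self.rules = []
        self.sent = []  # log of actions sent to device controllers

    def propose(self, condition, action):
        rule = Rule(condition, action)
        self.rules.append(rule)
        return rule

    def confirm(self, rule):
        rule.confirmed = True

    def on_event(self, event):
        for rule in self.rules:
            if rule.confirmed and rule.condition == event:
                self.sent.append(rule.action)


acu = AutomationControlUnit()
rule = acu.propose("the alarm is engaged", "lower the temperature by 2 degrees")
print(rule.paraphrase())          # step 2: ACU repeats the command
acu.confirm(rule)                 # step 3: actor confirms
acu.on_event("the alarm is engaged")  # step 4: condition met, action sent
print(acu.sent)
```
</preformat>
        <p>Keeping unconfirmed rules inactive mirrors the role of the confirmation step: a misrecognized command has no effect until the actor has approved its CNL paraphrase.</p>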
        <p>Additional use cases include: management of (controlled) vocabularies;
creation and management of BPMN schemas; web-service annotation with CNL-based
SA-WSDL descriptions; and web-service composition.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>The paper presents a methodological vision of community-driven
CNL evolution: a paradigm in which a group of volunteers develops CNLs following the
principles of bazaar-style open software development methodologies. It is a
vision rather than a fully specified methodology or a description of first results.</p>
      <p>We call for:
1. Establishing a repository for collecting reusable CNL components:
use-case descriptions, software, linguistic assets, etc. The metadata
of these assets should take advantage of the framework proposed in
[5].
2. Elaborating an “open-source” process description for creating and
customizing CNLs.</p>
      <p>This research was partially supported by the target-financed theme No. EKKTT10-75
of the Estonian Ministry of Education and Research.
The authors would like to thank their industrial partners, the Estonian Land Board and the ELIKO
Competence Centre in Electronics-, Information- and Communications Technologies,
as well as four anonymous reviewers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Haitao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Creoles, Pidgins, and Planned Languages</article-title>
          . In Schubert, K. (red.):
          <source>Planned Languages: From Concept to Reality</source>
          . Brussel: Hogeschool voor Wetenschap en Kunst, p.
          <fpage>121</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Olsen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <article-title>Marketing an International Auxiliary Language: Challenges to a New Artificial Language</article-title>
          .
          <source>In Journal of Universal Language</source>
          ,
          <year>2003</year>
          -
          <volume>4</volume>
          , p.
          <fpage>75</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Building Natural Language Generation Systems</source>
          . Cambridge University Press,
          <year>1999</year>
          . ISBN 0-521-62036-8.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          (ed.).
          <source>In Proceedings of the Workshop on Controlled Natural Languages (CNL 2009)</source>
          , Marettimo Island, Italy, 8-10 June
          <year>2009</year>
          . LNCS/LNAI, vol.
          <volume>5972</volume>
          , Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>Software process engineering metamodel, Version 2.0</article-title>
          . Final Adopted Specification formal/2008-04-01. Object Management Group (OMG), April
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Budanitsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Lexical Semantic Relatedness and Its application in Natural Language Processing</article-title>
          .
          <source>Technical Report CSRG-390</source>
          , Dept. of Computer System Science, University of Toronto,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>Further Use of Controlled Natural Languages for Semantic Annotation of Wikis</article-title>
          .
          <source>In Proceedings of the 1st Semantic Authoring and Annotation Workshop at ISWC2006</source>
          , Athens, Georgia, USA,
          <year>November 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smart</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braines</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <article-title>A Controlled Natural Language Interface for Semantic Media Wiki</article-title>
          .
          <source>In Proceedings of the 3rd Annual Conference of the International Technology Alliance (ACITA'09)</source>
          , 23rd-24th September
          <year>2009</year>
          , Maryland, USA.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pretorius</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwitter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Towards Processable Afrikaans</article-title>
          . Fuchs, N.E. (ed.).
          <source>In Proceedings of the Workshop on Controlled Natural Languages (CNL 2009)</source>
          , Marettimo Island, Italy, 8-10 June
          <year>2009</year>
          . LNCS/LNAI, vol.
          <volume>5972</volume>
          , Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>