<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shape Analysis as an Aid for Grammar Induction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ife Adebara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronica Dahl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Simon Fraser University</institution>
          ,
          <addr-line>8888 University Drive, Burnaby</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <fpage>55</fpage>
      <lpage>57</lpage>
      <abstract>
        <p>Visual shapes inherent in different aspects of language processing are proving important not only for enhancing that processing itself, but also for helping to solve open problems in ways that are more economical and more intuitive than the usual statistics-based, massive-processing approaches. In this article we investigate one use of gleaning shape from input sets: as an aid for mixed-language grammar induction.</p>
      </abstract>
      <kwd-group>
        <kwd>shape implicit in errors</kwd>
        <kwd>womb grammars</kwd>
        <kwd>constraint-based parsing</kwd>
        <kwd>multilingual text</kwd>
        <kwd>grammar induction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction, Background, and The Main Problem</title>
      <p>Imagine an automatic language processing system (for a language we shall call
the source language) that can adjust its own grammar rules so that they become
those of another language (which we shall call the target language). Imagine that
for doing so, our system only needs access to a corpus of representative correct
sentences of the target language, plus access to the target language’s lexicon.</p>
      <p>
        A computational methodology, Womb Grammars or WG [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], exists for
solving precisely this grammar induction problem. It is implemented on top of CHRG
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and evolved from Property Grammars [
        <xref ref-type="bibr" rid="ref4 ref7">4,7</xref>
        ]. WGs have been useful in various
applications such as second language tutoring [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], language acquisition [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
bio-inspired computation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>WGs expect a language’s grammar to be formulated in terms of grammar
constraints (properties) between pairs of constituents of a phrase. For instance
we can define a (very simple) noun phrase pattern by saying that its allowable
constituents are determiners and nouns (constituency constraint), that each can
appear only once (unicity), that the noun is obligatory, that the determiner must
precede the noun (precedence), and so on. WGs work by observing the list of
violated properties that are output when correct sentences in the target language
are fed to the source grammar, and “correcting” that grammar so that these
properties are no longer violated.</p>
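      <p>The property-based formulation above can be illustrated with a minimal sketch. It is not the CHRG implementation of Womb Grammars; the names NP_GRAMMAR and violated_properties, and the dictionary layout, are assumptions made for illustration only. The sketch encodes the simple noun phrase pattern described in the text and lists the properties that a given phrase violates.</p>
      <preformat><![CDATA[
```python
from typing import List, Tuple

Phrase = List[Tuple[str, str]]  # (word, category) pairs

# Hypothetical toy grammar for the simple noun phrase described in the text.
NP_GRAMMAR = {
    "constituency": {"det", "noun"},    # allowable constituents
    "unicity": {"det", "noun"},         # each may appear only once
    "obligation": {"noun"},             # the noun is obligatory
    "precedence": [("det", "noun")],    # det must precede noun
}


def violated_properties(phrase: Phrase, grammar=NP_GRAMMAR) -> List[str]:
    """Return the properties of `grammar` that `phrase` violates."""
    cats = [cat for _, cat in phrase]
    violations = []

    # Constituency: every category present must be allowed.
    for cat in sorted(set(cats)):
        if cat not in grammar["constituency"]:
            violations.append(f"constituency({cat})")

    # Unicity: listed categories may occur at most once.
    for cat in sorted(grammar["unicity"]):
        if cats.count(cat) > 1:
            violations.append(f"unicity({cat})")

    # Obligation: listed categories must occur.
    for cat in sorted(grammar["obligation"]):
        if cat not in cats:
            violations.append(f"obligation({cat})")

    # Precedence: no `second` may occur before a `first`.
    for first, second in grammar["precedence"]:
        first_at = [i for i, c in enumerate(cats) if c == first]
        second_at = [i for i, c in enumerate(cats) if c == second]
        if first_at and second_at and min(second_at) < max(first_at):
            violations.append(f"precedence({first} < {second})")

    return violations
```
]]></preformat>
      <p>Under these assumptions, a phrase tagged as noun followed by det yields a single precedence violation. In the WG setting, such a list of violations over a correct target corpus is what drives the adjustment of the source grammar.</p>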
      <p>This research was supported by NSERC Discovery grant 31611024.</p>
      <p>
        With the spontaneous language mixes inherent in social media
communications across countries, it has become important to automate the processing of
mixed languages and jargons as well. A proposal for using WGs to this end was
put forward in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but an open problem remained: that of determining how
predominantly a main language’s given constructs show up versus the secondary
language’s counterpart. There was only the suggestion that it might be solved by
using some statistical analysis in a second round of parsing.
      </p>
      <p>It is our thesis here that visual clues can adequately address this problem,
together with an expert’s small amount of time and interaction with the system.
We shall present our ideas through the example of noun phrase properties.
Similar considerations apply to other types of phrases.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Our Proposed Solution</title>
      <p>For illustration purposes, let us assume that English noun phrases consistently
follow the linear precedence rule adj &lt; noun. Concretely, we propose to line up our
set of input phrases one below another and visually mark all the incorrectly (with
respect to the source grammar) ordered nouns in this set (or conversely, all
adjectives), say by writing them all in blue. If the input corpus is correct with respect
to English, the resulting blue shape will be a straight line. For a Yorùbá native
speaker, the corpus may be tinted with alternative, Yorùbá-inspired orderings,
introducing visible “scoliosis” into the resulting blue shape.</p>
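      <p>The line-up described above can be sketched as follows. This is only an illustration under stated assumptions: phrases are already tagged as (word, category) pairs, each word occupies a fixed-width slot so that positions are comparable across lines, and the function names render and noun_slot are hypothetical. Nouns are written in blue using ANSI escape codes, or in upper case when colour is switched off.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional, Tuple

Phrase = List[Tuple[str, str]]  # (word, category) pairs

BLUE, RESET = "\033[34m", "\033[0m"  # ANSI escape codes for blue text


def noun_slot(phrase: Phrase) -> Optional[int]:
    """Word slot (0-based) of the first noun, or None if there is none."""
    for i, (_, cat) in enumerate(phrase):
        if cat == "noun":
            return i
    return None


def render(phrases: List[Phrase], width: int = 10, colour: bool = True) -> List[str]:
    """Lay out phrases one below another, one fixed-width slot per word,
    marking every noun (blue if `colour`, upper case otherwise)."""
    lines = []
    for phrase in phrases:
        cells = []
        for word, cat in phrase:
            cell = word.ljust(width)
            if cat == "noun":
                cell = f"{BLUE}{cell}{RESET}" if colour else cell.upper()
            cells.append(cell)
        lines.append("".join(cells).rstrip())
    return lines


if __name__ == "__main__":
    corpus = [
        [("big", "adj"), ("house", "noun")],   # adj < noun
        [("house", "noun"), ("big", "adj")],   # hypothetical noun < adj ordering
        [("red", "adj"), ("car", "noun")],
    ]
    print("\n".join(render(corpus)))
```
]]></preformat>
      <p>Printed to a terminal, the marked nouns of the first and third phrases fall in the same column, while the second phrase (an invented example of the alternative ordering) breaks the line.</p>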
      <p>Thus, a simple visual inspection of the coloured shape formed in the set of
input phrases would give us a quick idea not only of whether the corpus exhibits
violations of the main language’s constraints, but also of how predominant the
secondary language is, for the property in question, for the user at hand. In general
terms, the more visual scoliosis, the more deviation from the norm, independently
of how the input sentences are ordered. Similarly, we can visually mark the failed
properties that WG has found across our input corpus, using a different colour
for each property: disallowed constituents (such as a verb as direct daughter of a
noun phrase) can be marked in red to bring them to the attention of the human expert,
who may then decide to include the “extraneous” category because it is ubiquitous.</p>
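      <p>The visual impression of scoliosis could also be accompanied by a rough numeric indication. The sketch below is one possible measure, not one proposed in this article: it assumes each phrase has been reduced to a single integer (for example, the word slot of the noun relative to the adjective) and reports the fraction of phrases that deviate from the most common value. The function name scoliosis is hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Optional


def scoliosis(slots: List[Optional[int]]) -> float:
    """Fraction of phrases whose marked slot differs from the most common slot.

    Phrases without a marked constituent (None) are ignored. The result is
    0.0 for a perfectly straight line and does not depend on the order in
    which the phrases are listed.
    """
    present = [s for s in slots if s is not None]
    if not present:
        return 0.0
    _, modal_count = Counter(present).most_common(1)[0]
    return 1 - modal_count / len(present)
```
]]></preformat>
      <p>Because the measure only counts how many phrases share the dominant position, reordering the input phrases leaves it unchanged, in line with the observation above.</p>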
      <p>Obligatory categories that are missing can be marked as labelled arcs across
the phrase where they are missing. Violations of uniqueness can be highlighted in
another colour, to quickly draw the expert’s eye towards a decision on whether to
delete the extra occurrence, deeming it a typo, or to adjust the grammar
in order to relax the uniqueness constraint. Visually marking two constituents
that exclude each other could quickly prompt the expert to modify the grammar
so as to accept, e.g., the Yorùbá-influenced coexistence of a determiner with a
proper name, as in “the Veronica”.</p>
      <p>Should the main language in our system strictly require a determiner for
every noun, the correct action when our English-tinted input “Lions sleep
tonight” shows up would be to relax the requirement under the stated
circumstance. Again, we must colour absence if we are to catch the expert’s eye and solicit
their input on whether to add a determiner or to adjust the grammar to include the
said relaxation condition for plural generic nouns. And again, the marking must
be done as a label on an arc that covers the entire phrase.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Concluding Remarks</title>
      <p>We have shown how some Womb Grammar parsing results can be re-expressed in
terms of shape, so that a human expert can quickly determine visually the relative
strengths of competing properties of the grammar. With this work we hope to
stimulate further research into extending grammars with visual interactive means
for adjusting them. We believe that complementing logic-based grammars with
visually driven interactions with an expert can become a very fruitful, yet less
expensive, alternative to statistical parsing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Adebara</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahl</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>T.S.:</surname>
          </string-name>
          <article-title>Completing mixed language grammars through womb grammars plus ontologies</article-title>
          .
          <source>In: Proceedings of the International Conference on Agents and Artificial Intelligence</source>
          , Lisbon, Portugal. pp.
          <fpage>292</fpage>
          -
          <lpage>297</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Becerra</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez-López</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Womb grammars as a bio-inspired model for grammar induction</article-title>
          .
          <source>In: Trends in Practical Applications of Heterogeneous Multi-Agent Systems. The PAAMS Collection</source>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Becerra Bonache</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miralles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>On second language tutoring through womb grammars</article-title>
          . In:
          <string-name>
            <surname>Rojas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joya</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabestany</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Computational Intelligence. Lecture Notes in Computer Science</source>
          , vol.
          <volume>7902</volume>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>197</lpage>
          . Springer Berlin Heidelberg (
          <year>2013</year>
          ), http://dx.doi.org/10.1007/978-3-642-38679-4_18
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Property grammars: A fully constraint-based theory</article-title>
          .
          <source>In: Proceedings of the First International Conference on Constraint Solving and Language Processing</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . CSLP'04, Springer-Verlag, Berlin, Heidelberg (
          <year>2005</year>
          ), http://dx.doi.org/10.1007/11424574_1
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Christiansen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>CHR grammars</article-title>
          .
          <source>TPLP</source>
          <volume>5</volume>
          (
          <issue>4-5</issue>
          ),
          <fpage>467</fpage>
          -
          <lpage>501</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
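          <string-name>
            <surname>Miralles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Womb grammars: Constraint solving for grammar induction</article-title>
          . In:
          <string-name>
            <surname>Sneyers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frühwirth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)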
          <source>Proceedings of the 9th Workshop on Constraint Handling Rules</source>
          . vol.
          <source>Technical Report CW 624</source>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          . Department of Computer Science, K.U. Leuven (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Directly executable constraint based grammars</article-title>
          .
          <source>In: Proc. Journées Francophones de Programmation en Logique avec Contraintes</source>
          , JFPLC 2004 (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miralles</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becerra</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>On language acquisition through womb grammars</article-title>
          .
          <source>In: 7th International Workshop on Constraint Solving and Language Processing</source>
          . pp.
          <fpage>99</fpage>
          -
          <lpage>105</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Arc diagrams: Visualizing structure in strings</article-title>
          .
          <source>In: Proceedings of the IEEE Symposium on Information Visualization</source>
          . pp.
          <fpage>110</fpage>
          -
          <lpage>116</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Strzalkowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (ed.):
          <article-title>Reversible grammar in natural language processing</article-title>
          . Springer Science + Business Media B.V. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>