<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Practical Ambiguity Detection for Context-Free Grammars</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Research Abstract</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H. J. S. Basten</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The use of unconstrained context-free grammars for generalized parsing techniques has several advantages over traditional grammar classes, but comes with the danger of undiscovered ambiguities. The ambiguity problem for these grammars is undecidable in the general case, but this does not have to be a problem in practice. Our goal is to nd ambiguity detection techniques that have su cient precision and performance to make them suitable for practical use on realistic grammars. We give a short overview of related work, and propose new directions for improvement.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Generalized parsing techniques allow the use of the entire class of context-free
grammars (CFGs) for the speci cation of the syntax of programming languages.
This has several advantages. First, it allows for modular syntax de nitions, which
simpli es grammar development and enables reuse. Second, it grants total
freedom in structuring a grammar to best t its intended use. Grammars do not
have to be squeezed into LL, LALR or LR(k) form for instance.</p>
      <p>Unfortunately, using unconstrained context-free grammars comes with the
danger of ambiguities. A grammar is ambiguous if one or more sentences in its
language have multiple parse trees. The semantics of a sentence is usually based
upon the structure of its parse tree, so an ambiguous sentence can have multiple
meanings. This often indicates a grammar bug which should be avoided.
However, in some cases a grammar is intended to contain some degree of ambiguity.
For instance in reverse engineering, where certain legacy languages can only be
disambiguated with type checking after parsing. In both cases it is important
to know the sources of ambiguity in the developed grammar, so they can be
resolved or veri ed.</p>
      <p>
        Unfortunately, detecting the (un)ambiguity of a grammar is undecidable in
the general case [
        <xref ref-type="bibr" rid="ref7">7, 10, 9</xref>
        ]. However, this does not necessarily have to be a problem
in practice. Several Ambiguity Detection Methods (ADMs) exist that approach
the problem from di erent angles, all with their own strengths and weaknesses.
Because of the undecidability of the problem there is a general tradeo between
precision and performance/termination. The challenge for all ADMs is to give the
most precise and understandable answer in the time available. The current state
of the art is not yet su ciently advanced to be practical on realistic grammars,
especially the larger ones.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Brief Overview of Related Work</title>
      <p>
        Existing ADMs can roughly be divided into two categories: exhaustive methods
and approximative ones. Methods in the rst category exhaustively search the
language of a grammar for ambiguous sentences. This so called sentence
generation is applied by [
        <xref ref-type="bibr" rid="ref1">11, 8, 13, 1</xref>
        ]. These methods are 100% accurate, but a problem
is that they never terminate if the grammar's language is of in nite size, which
usually is the case. They do produce the most precise and useful ambiguity
reports, namely ambiguous sentences and their parse trees.
      </p>
      <p>
        Approximative methods sacri ce accuracy to be able to always nish in nite
time. They search an approximation of the grammar for possible ambiguity. The
methods described in [
        <xref ref-type="bibr" rid="ref6">12, 6</xref>
        ] both apply conservative approximation to never
miss ambiguities. The downside of this is that when they do nd ambiguities, it
is hard to verify whether or not these are false positives.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] we compared the practical usability of several ADMs on a set of
grammars for real world programming languages. It turned out that the exhaustive
sentence generator AMBER [13] was the most practical due to its exact reports
and reasonable performance. However, it was still unsatisfactory to nd realistic
ambiguities in longer sentences. The approximative Noncanonical Unambiguity
test [12] had a reasonably high accuracy, but it is only able to assess the
ambiguity of a grammar as a whole. Its reports might point out sources of individual
ambiguities, but these can be hard to understand.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Solution</title>
      <p>The aim of this research is to increase the precision and performance of
ambiguity detection to a practical level. More speci cally, the idea is to increase
the performance of exhaustive searching by reducing the search space using
approximative techniques. For instance by identifying harmless production rules
in a grammar. These are rules that are certainly not used in the derivation of
any ambiguous string. Since these rules do not contribute to the ambiguity of
the grammar they can be removed before exhaustive searching is applied, which
reduces the number of sentences to generate.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] we describe a rst exploration in this direction. We propose an extension
to the approximative Noncanonical Unambiguity test that enables it to identify
harmless production rules. We implemented this ltering technique into a tool [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
which we applied on a series of real world programming language grammars
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is shown that the performance of the sentence generators AMBER and
CfgAnalyzer is indeed improved by several orders of magnitude, with only a
small ltering overhead.
      </p>
      <p>Further research will be focussed on nding more detailed detection
techniques that can identify harmless rules with more precision, as well as faster
sentence generation methods. For instance by exploring opportunities for
paralellisation. Furthermore, we like to extend existing techniques to include
commonly used disambiguation constructs, like priorities and associativities, longest
match, keyword reservation, etc.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Research Method</title>
      <p>New techniques will be validated both theoretically and experimentally. Through
formal speci cation they will be proved correct. Then, to test a technique's
suitability for practical use, a prototype implementation will be tested on a
series of realistic benchmark grammars. For instance grammars for real world
programming languages, that will be seeded with ambiguities if needed.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This abstract proposes research into ambiguity detection for context-free
grammars, to make it suitable for practical use. More speci cally, it aims at
combining approximative searching with exhaustive searching, to be able to nd real
ambiguous sentences in shorter time. First explorations in this direction show
promising results.
8. Cheung, B.S.N., Uzgalis, R.C.: Ambiguity in context-free grammars. In:
Proceedings of the 1995 ACM Symposium on Applied Computing (SAC 1995). pp. 272{276.
ACM Press, New York, NY, USA (1995), http://doi.acm.org/10.1145/315891.
315991
9. Chomsky, N., Schutzenberger, M.: The algebraic theory of context-free languages.</p>
      <p>In: Bra ort, P. (ed.) Computer Programming and Formal Systems, pp. 118{161.</p>
      <p>North-Holland, Amsterdam (1963)
10. Floyd, R.W.: On ambiguity in phrase structure languages. Communications of the</p>
      <p>ACM 5(10), 526{534 (1962)
11. Gorn, S.: Detection of generative ambiguities in context-free mechanical languages.</p>
      <p>J. ACM 10(2), 196{208 (1963), http://doi.acm.org/10.1145/321160.321168
12. Schmitz, S.: Conservative ambiguity detection in context-free grammars. In: Arge,
L., Cachin, C., Jurdzinski, T., Tarlecki, A. (eds.) ICALP'07: 34th International
Colloquium on Automata, Languages and Programming. LNCS, vol. 4596 (2007)
13. Schroer, F.W.: AMBER, an ambiguity checker for context-free grammars. Tech.
rep., compilertools.net (2001), see http://accent.compilertools.net/Amber.
html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Axelsson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heljanko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Analyzing context-free grammars using an incremental SAT solver</article-title>
          .
          <source>In: Proceedings of the 35th International Colloquium on Automata, Languages, and Programming (ICALP</source>
          <year>2008</year>
          ). LNCS, vol.
          <volume>5126</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Basten</surname>
            ,
            <given-names>H.J.S.:</given-names>
          </string-name>
          <article-title>The usability of ambiguity detection methods for context-free grammars</article-title>
          . In: Johnstone,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Vinju</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the Eigth Workshop on Language Descriptions, Tools and Applications (LDTA</source>
          <year>2008</year>
          ). ENTCS, vol.
          <volume>238</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Basten</surname>
            ,
            <given-names>H.J.S.:</given-names>
          </string-name>
          <article-title>Tracking down the origins of ambiguity in context-free grammars</article-title>
          .
          <source>In: Proceedings of the Seventh International Colloquium on Theoretical Aspects of Computing (ICTAC</source>
          <year>2010</year>
          ). Springer (
          <year>2010</year>
          ), To appear, see www.cwi.nl/~
          <article-title>basten for a preliminary version.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Basten</surname>
            ,
            <given-names>H.J.S.</given-names>
          </string-name>
          , van der Storm, T.:
          <article-title>AmbiDexter: Practical ambiguity detection, tool demonstration</article-title>
          .
          <source>In: Proceedings of the Tenth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM</source>
          <year>2010</year>
          ). IEEE (
          <year>2010</year>
          ), To appear, see www.cwi.nl/~
          <article-title>basten for a preliminary version.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Basten</surname>
            ,
            <given-names>H.J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinju</surname>
            ,
            <given-names>J.J.:</given-names>
          </string-name>
          <article-title>Faster ambiguity detection by grammar ltering</article-title>
          .
          <source>In: Proceedings of the Tenth Workshop on Language Descriptions, Tools and Applications (LDTA</source>
          <year>2010</year>
          ). ACM (
          <year>2010</year>
          ), To appear, see www.cwi.nl/~
          <article-title>basten for a preliminary version.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Brabrand</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giegerich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , M ller, A.:
          <article-title>Analyzing ambiguity of context-free grammars</article-title>
          .
          <source>Science of Computer Programming</source>
          <volume>75</volume>
          (
          <issue>3</issue>
          ),
          <volume>176</volume>
          {
          <fpage>191</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cantor</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>On the ambiguity problem of Backus systems</article-title>
          .
          <source>Journal of the ACM</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ),
          <volume>477</volume>
          {
          <fpage>479</fpage>
          (
          <year>1962</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>