<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Parsing of Context-Sensitive Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rychnovsky</string-name>
          <email>rychnov@fit</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Systems, Faculty of Information Technology. Brno University of Technology</institution>
          ,
          <addr-line>BoZet6chova1, 612 66 Brno</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>eywords: Turing Machines, Parsing of Context-Sensitive Languages, Fornral Program Verification</institution>
          ,
          <addr-line>Scattered-Context Grammars</addr-line>
        </aff>
      </contrib-group>
      <fpage>219</fpage>
      <lpage>226</lpage>
      <abstract>
        <p>This article presentssomeideasfrom parsing Context-Sensitive languages. Introduces Scattered-Context grammars and languages and describes usage of such grammars to parse CS languages. Also there are presented additional results fronr type checking and formal prograrn verification using CS parsing. Scattered context grammars Definition 1. A scatter-edco'ntertg'raTrlTna(rSCC) G is a q'uadruple(V,7, P, S), whereV is afi"niteset of symbols,TcV, S e I/\", andP is aset of production rules of the form t t A t x z A z . . . x n A n I n + r = * I 1 w 1 r 2 w 2 . . . T n . u ) n u n q 1 ** is refl,eriue,t,ransitiueclosure of +. Langu,agegeneratedbu the grammar G is definedas L(c) - {w € T*l,g +. ar}.</p>
      </abstract>
      <kwd-group>
        <kwd>( A r</kwd>
        <kwd />
        <kwd />
        <kwd />
        <kwd />
        <kwd>A</kwd>
        <kwd>- ) - ( u r</kwd>
        <kwd />
        <kwd />
        <kwd />
        <kwd />
        <kwd>u n )</kwd>
        <kwd>f r ) I</kwd>
        <kwd>Y A i</kwd>
        <kwd>A i e I / \ ?</kwd>
        <kwd>Y w i</kwd>
        <kwd>w i € V "</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>This work results from [Kol-04] where relationship betweenregulated pushdown
automata (RPDA) and T\ring machinesare discussed.We are particulary
interested in algorithms which rnay be used to obtain RPDAs from equivalent Turing
machines.Such interest arisespurely from practical reasonsbecauseRPDAs
provide easier implementation techniquesfor problems where Tbring machines are
naturally used. As a representant of such problems we study context-sensitive
extensions of programming language grammars which are usually context-free.
By introducing context-sensitive synta-x analysis into the source code parsing
processa whole classof new problems may be solved at this stage of a compiler.
Namely issueswith correct variable definitions, type checkingetc.</p>
      <p>L. Rychnovsky
Theorern 1. Let L,(,SC) be thefamily of languagesgeneratedby SC grammars
whosenumber of product'ionsthat contains two or more contert-freeproductions
(degreeoJ'co'ntert sensiti'uitu) 'isn o'r less.L(R,E) r\e'notesthe farnilg of
recursiueLyenumerablelanguages.</p>
      <p>L2(SC) - Loo(^SC:) L(RE)</p>
      <sec id="sec-1-1">
        <title>Proof. See [lvIed-}S] Lemma 1 and Theorem S.</title>
        <p>3</p>
      </sec>
      <sec id="sec-1-2">
        <title>Parsing of Context-Sensitive Languages</title>
        <p>The main goal of regulated formal systems is to extend abilities from standard
CF Ll-parsing to CS or RE families with preservation of easeof parsing.</p>
        <p>In [Kol-04] and [Rych-0s] rve can find some basic facts from theory of
regulated pushdown automata (RPDA). We figured that regulated pushdown
automata can in some casessimulate Turing machines so we could use this theory
for constructing parsersfor context-sensitivelanguagesor eventype-0 ianguages.</p>
        <p>\AIehave also demonstrated the basic problem of this concept: complexity.
Almost trivial Turing machinewas transformed to regulated pushdown automata
with aimost 6 000 rules.</p>
        <p>As rve have seenin previous lines, converting deterministic (linear bounded)
Turing machine or scattered-contextgrammar to deterministic RPDA is very
complex task. For the most simpie context-sensitive ianguages corresponding
deterministic RPDA has thousandsof rules.If we want to usethesealgorithms for
creating some practical parser for real context-sensitiveprogramming language
it rnay result in millions of rules. Therefore, we are looking for another wav
to parse context-sensitive languages.We would like to extend some
contextfree grammar of any common programming language (such as pascal, ClCff or
Java). After extending context-freegrammar to correspondingcontext-sensitive
grammar, parsing should be straightforward.
3.1</p>
        <p>KontextZAP03
As an cxample of a coniext-free language we llse a language calied ZAP}J
[ZAP-03] rvhich has ven' similar syntax to Pascal. We .rvill use follorving
program as an example program in ZAP03 language.
i n t I &amp; , b , c , d ;
s t r i n g : s ;
begin
a = L ;
b = 2 ;
cL - \,.
end</p>
        <p>Parsing rrow proceeds in the fblloiving way: star,rtingsymbol is S S' arrd
derivation will go as usual until there is DCL S' processedand (DCL, S') --)
(TYPE [ : ] [id] ID-LIST, D) rule is to applied.At this moment S' is rewritlen
to D indicating that variable [id] is dcfined.When variable [id] is risedon left resp.
right side of assignrnentD is rewritten to DL resp. DRaccording to secondresp.
thircl previously shown fragment. If variable [id] is used without being definerl
befolehand.a parseerror occursbecauseS' is not rewritten to D and S' cannot
be rewritten to DL or DR directly. When the input is parsed S is rewritten to
program code and during LL parsiug is popped out of the stack. S' is rewritten
onto D{LR}* and this is only string that remains.
s s' =+* DCL S', + TYPE [: ] [id] ID-LIST D =+* COMMANDD=+
+ [1d] CMDCOMMANDDL +* DL
If the only remaining symbol is D, it means that variable [id] was defined but
never used. If DL+ is the only remaining symbol, we know that variable [id] was
defined and used only on the left sides of assignments.Finally if there is the
only remaining DR(LR)*,w€ know that the first occurrenceof variable [id] is on
the right side of an assignmentstatement and therefore it is being read without
being set. Irr all these casesthe compiler should gerreratea warrrirrg.These and
similar probiems are usually addressedby a data-flow analysisphasecarried out
during semantic analysis.</p>
        <p>Using this algorithur wc can only process one variable at a time. But the
proposed mechanism can be easily extended to a finite number of variables by
adding new S' . . .' every time we discover a variable definition. Parsing of
described SC grammar can be implemented by pushdown automaton with finite
number of pushdowns. The first pushdorvnis ciassicLL pushdown. The second
one is variable specific and every [id] holds its own.</p>
        <p>Because original ZAP03 grammar is LL1 and using described algorithm
wasn't any rule added, KontextZAPO3 grammar has unambiguous derivations.</p>
        <p>A few examples can clear the idea. This program is well-formed according
to ZAP03 grammar, but it's semantics is not correct and parsing it as
KontextZAPO3 program should reveal this error.
1 n t r d , b , c , d ;
s t r i n g : s ;
begin
a = 1 ;
b = 2 ;
d = 1 5 ;
s = t t f o o t t ;
c l - L ,
end
begin
a = 1 ;
d = 1 5 ;
s = t t f o o " ;
a = c ;</p>
        <p>Corresponding stack to variable c will be DR wher,tlead to warning:
V a r l a b l e c r e a d b u t n o t s e t ,
Second example shows another variation
i n t i d , b , c , d ;
s t r i n g : s ;</p>
        <sec id="sec-1-2-1">
          <title>Parsingof Context-SensitiLvaenguages</title>
          <p>3.2</p>
        </sec>
        <sec id="sec-1-2-2">
          <title>Typ. checking</title>
          <p>By using scattered-contextgranrmarswe can describetype set of ianguage (INT,
STR) and type check rules directly in grammar. Almost trivial language with
type check using 5 stacks can look like this:
(r) - (pr"ogra,m) (n,ert) -- (program,)
( p r o g r a m )- - ( d c l ) ( t A p " , ,) - ( [ l n t ] ,, I M )
(program) * (fbegin]commandfendl) (tAp",,) * (lstring),, STR)
(dcl,)-, (ty'pr:.1:)dcl2) (:om'mand,D,I I'17,,) *
( d c l 2 ,^ 9 1, , 4 / 7 ., ) * ( l i d l c m d c o m m a n d ,D L , I M . I I V f )
( f i d ] i d - l i s t f ; l n e r t ,D , I N T , I ^ l T ) ( c o m m a n d ,D , S T R , , )
(dcl2,S, STR,, ) - (lid] c'mdco'rn'rnandD, L,STR, S"R)
(lid]id-list[;]nert, D, ST R, ST R) (command) -' (e)
(id-list)--- ([,]tdlistz) (cmd) --' ([:]slrnt[;])
(id-list2,S) * (lid)id-list,D) (stmt, D) - (lid,l,D R)
( z d - l i s t )- , ( E ) ( s t m t . , , , I ^ l T ) - - -( f d i g i t ] , , , , )
( n e r t ) - - - ( d . c l , ) ( s t r n t , . , , S T R ) - - -( [ s t ' r u o , l,],,, )</p>
          <p>Stacks at even positions are the variable specific stacks as mentioned in
previous chapter. The first stack is classicLL stack and the rest of stacks at odd
positions are temporary used stacks for additional information. Although the
underlying context-free grammar is not LL grammar becausethere are several
identic rules (command --* [id] cmd command), this grammar is unarnbiguous.</p>
          <p>Example of error program code can be
i n t r d , b , c , d ;
s t r i n g : s ;
begln
It is obvious, that using a siurple scattered-context extension of CF languages,
we obtain a grammar with interesting properties with respect to analysis of a
programming languagesourcecode.We provide somebasic motivation examples.
wheretu is word and lu,'l1sis the lengthof tl written as a decimalnumber.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>4 Formal Program Verification</title>
        <p>Onc of the pr&lt;-rrrrisirragpplicatiorrsof the describedcorrceptis introducing a
notion of preconditions and postcondi,tionsto the ZAP03 programrning language.
Preconditions and postconditions are essentiallysets of logical formulae, which
arc required to hold at entering resp. leaving a program or method. For
example, when computing a square root of r,, we can require a positivity of r using
precondition {r ;- 0}. A computation of siriusfunction {y : sinr} a natural
postcondition{(y &lt;- 1) A (A &gt;- -1)} arises.</p>
        <p>Statements of a programming languagesthen induce transformation rules on
precondition and postcondition sets. For example, assignmentstaternent in the
fbrni V'.- E definesa transforrnation:</p>
        <p>{ P [ E l v l ] v: : E { P } ,
whcre V is a variable, E is an cxpression,P is precondition and PItr/V] derrotes
a substitution of V for all occurrencesof E in P. Other transformation examples
may be found in [Gor-98].</p>
        <p>The purpose of introducing preconditions and postconditions into a languagc
is to be able to derive postconditions from specifiedpreconditions using
transforrnation rules irr a particular program. Such program than carries a fbrmal proof
of its correctnesswith it, which is a desirableproperty.</p>
        <p>In the ZAP}S languagewe can implement the describedconcept by
introducing two new kevwords pre and post and by extending the nrles of the language
grammar with abovementioned transformation rules. When the parsing of a
program is initiated, the precondition set is constructed using the pre declarations
and the transformation rules are applied to it as the parsing progressesthrough
the source code. When the parsing terrninates, the resulting set of transformed
preconditions is compared with the deciared postconditions. If these two sets
match, the parsedprogram is correctwith respectto the specifiedpreconditions
and postconditions.</p>
        <p>However, for practical reasonswe must define constraints on the possible
preconditions and postconditions. Postconditions must be generally derivable
from. preconditions or (as a corollary of the Godeis incompletenesstheorem)
the postconditions need not be provable from the preconditions at ali, but may
still hold. In this particular casewe have encountered a program which may be
correct, but we carrnot verif-ythis fact.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>In this article are presented some ideas from parsing of
context-sensitivelanguagesbased on scattered-contextgrammars. Very powerful cxtension of classic
parsing techniques is introduced and this context-sensitiveextension iet us
disco\rersoilrecorlnlon errors such as undefinedor unusedvariabiesin parsing tirne.
The extension suggestedin this work is far more beyond. In some conditions we
are abie to add some features to languagethat enablesus to check formal
progranr verification.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Aho 72]
          <string-name>
            <surname>Aho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
          </string-name>
          , J.:
          <source>Thc Theory of Parsing</source>
          , Translation, and
          <string-name>
            <surname>Compiling</surname>
          </string-name>
          ,
          <string-name>
            <surname>Prentice-Hall</surname>
            <given-names>INC</given-names>
          </string-name>
          .
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Gor-98] Gordon,
          <string-name>
            <surname>J. C.</surname>
          </string-name>
          \,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Programming Language 'fheory and its Implementation, r998.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Kol-04
          <string-name>
            <surname>] I(ol6i</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          : Pushdown Automata:
          <article-title>New N'Iodifications and Transformations</article-title>
          ,
          <source>Habilitation Thesis</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [lvled-00] Meduna,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kol</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          : Reguleted Pushdown Automata,
          <source>Acta Cybernetica</source>
          , Vol.
          <volume>14</volume>
          ,
          <year>2000</year>
          .
          <fpage>Pages653</fpage>
          -
          <volume>664</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Med-03] Medutta,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Fernau</surname>
          </string-name>
          , H.:
          <article-title>On the degreeof scattered context-sensitivity,Elsevier, Theoretical Computer Science290 (</article-title>
          <year>2003</year>
          ), Pages
          <fpage>2121</fpage>
          -
          <lpage>2124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Sal-73] Salomaa,
          <string-name>
            <surname>A.</surname>
          </string-name>
          : Irormal Languages,Academic Press,New York,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>[</surname>
          </string-name>
          Rych-Os] Ryc]nrovsky, L.:
          <article-title>Relation betrveen regulated pushdown autonrata and Turing mascines</article-title>
          ,
          <source>Report from semestralproject</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          IZAP-03]
          <article-title>htt p:I I www.fit. vutbr.cz/study f course sf ZAP/public I pr o ject/</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>