<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a formal model of natural language description based on restarting automata with parallel DR-structures?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marketa Lopatkova</string-name>
          <email>lopatkova@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frantisek Mraz</string-name>
          <email>frantisek.mraz@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Platek</string-name>
          <email>martin.platek@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>32</lpage>
      <abstract>
<p>We provide a formal model of a stratificational dependency approach to natural language description. The formal model is motivated by an elementary method of analysis by reduction, which serves for describing correct sentence analysis. The model is based on enhanced restarting automata that assign a set of parallel dependency structures to every reduction of an input sentence. These structures capture the correspondence of dependency trees on different layers of linguistic description, namely the layer of surface syntax and the layer of language meaning. The novelty of this contribution consists in (i) the extension of enhanced restarting automata in order to produce tree structures with several interlinked layers and (ii) the application of these automata to the stratificational description of a natural language.</p>
        <p>In [2], natural language description is modeled as a formal string-to-string translation using a suitable model of restarting automata. [3] introduces a class of enhanced restarting automata with an output consisting of a single DR-tree. Here we discuss a model which represents the stratificational layers of linguistic description in a parallel way: it is able to represent several parallel dependency structures and thus to capture relations between syntactic structures on different layers derived from RA. The novelty of this approach consists in the formal presentation of the stepwise parallel composition of tree structures on different language layers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
<sec id="sec-2">
      <title>Introduction</title>
      <p>Formal modeling of the syntactic structure of a natural language, its syntactic analysis as well as synthesis, has an important impact on an insight into the characteristic features of the language and into the needs of its explicit description.</p>
      <p>We attempt to provide a formal model for natural language description which would adequately reflect linguistic methods and make it possible to formulate and refine linguistic observations and thus deepen the understanding of the language.</p>
      <p>The proposed formal model is based on an elementary method of analysis by reduction. The analysis by reduction (RA henceforth, see [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], here Section 1.2) serves for describing correct reductions of natural language sentences (particularly for languages with free word order) on several linguistic layers (see Section 1.1).</p>
      <p>The proposed model is based on the concept of enhanced restarting automata that assign a set of dependency structures (DR-structures) to every reduction of an input sentence; DR-structures can capture a set of dependency trees representing the sentence on different layers.</p>
      <sec id="sec-2-1">
        <title>Functional Generative Description</title>
        <p>The theoretical linguistic basis for our research is provided by the Functional Generative Description (FGD in the sequel, see esp. [<xref ref-type="bibr" rid="ref4">4</xref>]). FGD is characterized by its stratificational and dependency-based approach to the language description. The stratificational approaches split the language description into layers, each layer providing a complete description of a (disambiguated) sentence and having its own vocabulary and syntax. We use the version of FGD that distinguishes four layers of description:1</p>
        <p>t-layer (tectogrammatical layer) capturing deep syntax, which comprises language meaning in a form of a dependency tree; the core concepts of this layer are dependency, valency, and topic-focus articulation, see esp. [<xref ref-type="bibr" rid="ref4">4</xref>];</p>
        <p>a-layer (analytical layer) capturing surface syntax in a form of a dependency tree (non-projective in general);</p>
        <p>m-layer (morphological layer) capturing morphology (a string of triples [word form, lemma, tag]);</p>
        <p>w-layer (word layer) capturing individual words and punctuation marks in a form of a simple string.</p>
      </sec>
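      <p>The parallel layering described above can be sketched in code. The following Python fragment is only an illustration: the class, its field names and the tag strings are our assumptions, not part of FGD or of the Prague Dependency Treebank notation.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: one "column" of the parallel representation,
# i.e. the symbols describing a single surface word on the four
# FGD layers (names and values are illustrative, not from the paper).
@dataclass
class LayerColumn:
    w: Optional[str]   # w-layer: word form or punctuation mark
    m: Optional[str]   # m-layer: "lemma.tag" flattened to a string
    a: Optional[str]   # a-layer: analytical function (surface syntax)
    t: Optional[str]   # t-layer: functor on the tectogrammatical layer
                       # (None: e.g. function words have no own t-node)

# The preposition "ve" ('in'): present on the three surface layers,
# but on the t-layer captured only as an attribute of its noun.
ve = LayerColumn(w="ve", m="v-1.RV--6", a="AuxP", t=None)

# A restored ellipsis (elided subject) exists on the t-layer only.
elided_subject = LayerColumn(w=None, m=None, a=None, t="ACT")

def surface_only(col: LayerColumn) -> bool:
    """True if the column has surface symbols but no t-layer correlate."""
    return col.t is None and col.w is not None

print(surface_only(ve))              # function word: no own t-node
print(surface_only(elided_subject))  # t-node without surface counterpart
```
      </preformat>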
<sec id="sec-2-2">
        <p>There are one-to-one correspondences between w- and m-layer (we leave aside small exceptions here) and between m- and a-layer; individual symbols of these three layers (surface layers in the sequel) reflect individual `surface' words and punctuation marks.</p>
        <p>1 We adopt the notation of the Prague Dependency Treebank [<xref ref-type="bibr" rid="ref5">5</xref>], a large corpus of Czech sentences, which uses FGD as its theoretical basis.</p>
        <p>On the other hand, individual symbols of the t-layer reflect only lexical words (so-called function words, as, e.g., prepositions and auxiliary verbs, are captured as attributes of lexical words); moreover, surface ellipses (as, e.g., the elided subject in Czech) are restored as nodes on the t-layer.</p>
        <p>Similarly as in other stratificational approaches, see e.g. [<xref ref-type="bibr" rid="ref6">6</xref>], the layers are ordered; the lowest one being the simplest w-layer, the highest being the most abstract t-layer.</p>
        <p>FGD as a dependency-based approach captures both surface and deep syntactic information in a form of dependency structures. Words (i.e., their a- and t-correlates, respectively) are represented as nodes of the respective trees, each node being a complex unit capturing the lexical, morphological and syntactic features; relations among words are represented by oriented edges [<xref ref-type="bibr" rid="ref7">7</xref>]. The dependency nature of these representations is important particularly for languages with relatively high freedom of word order; it also complies with the shift of focus to deep syntactic representation, for which the dependency approach is commonly used.</p>
        <p>The following example illustrates the description of a sentence at the four layers of FGD (slightly simplified). Such a description expresses all necessary linguistic information of a disambiguated sentence.</p>
        <p>Sídlo dnes mohla mít ve státě Texas. (residence - today - could - have - in - state - Texas) `She (= elided Sb) could reside in the state of Texas today.'</p>
        <p>Fig. 1. Parallel representation of the sample sentence on the t-, a-, m- and w-layers according to FGD.</p>
        <p>Figure 1 shows the deep syntactic tree on the t-layer, the surface non-projective syntactic tree on the a-layer, the string of triples [word form, lemma, tag] on the m-layer and the string of word forms on the w-layer. The dotted lines interconnect corresponding nodes. We focus on the non-trivial relation between a-layer and t-layer here: (i) the preposition ve `in' as well as the noun státě `state' in the a-tree are linked to the single t-symbol representing the lexical word stát `state'; (ii) similarly, both the modal verb mohla `could' and the lexical verb mít `have' are represented as the single t-node mít `to have' (information on the modal verb is preserved as the attribute `poss'); as a result, the non-projective a-tree is transformed to the projective t-tree; (iii) moreover, the subject elided in a surface sentence is restored in the t-tree (the node with the symbol starting with #PersPron), thus this node has no counterpart on the a-layer.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Basic principles of analysis by reduction</title>
<p>Analysis by reduction is based on a stepwise simplification of an analyzed sentence. It defines possible sequences of reductions (deletions) in the sentence; each step of RA is represented (i) by deleting at least one word of the input sentence, or (ii) by replacing an (in general discontinuous) substring of a sentence by a shorter substring. Consequently, it is possible to derive formal dependency relations between individual sentence members based on the possible order(s) of reductions.</p>
      <p>Using RA we analyze an input sentence (w-layer) enriched with the metalanguage information from the m-, a- and t-layer. Symbols on different layers representing a single word of an input sentence are processed simultaneously.</p>
      <p>The principles of RA can be summed up in the following observations:</p>
      <p>- The fact that a certain word (or a group of words) can be deleted implies that this word (or group of words) depends in RA on one of the words retained in the simplified sentence; the latter being called governing word(s) in RA.</p>
      <p>- Two words (or groups of words) can be deleted in an arbitrary order if and only if they are mutually independent in RA.</p>
      <p>- In order to ensure adequate modeling of natural language meaning (on the t-layer), certain groups of words have to be deleted in a single step (e.g., valency frame evoking words2 and their (valency) complementations [<xref ref-type="bibr" rid="ref1">1</xref>]); such words are said to constitute a reduction component. Even in such cases it is usual to determine governing-dependent pairs on the layer of surface syntax (a-layer). In such a case, it is necessary to define special rules for particular language phenomena.</p>
      <p>[Figure 1 contents: the sample sentence on the w-, m-, a- and t-layers; m-layer lemma.tag and a-layer functions: sídlo.NNNS4/Obj, dnes.Db/Adv, moci.VpYS/Pred, mít.Vf/Obj, v-1.RV--6/AuxP, stát.NNIS6/Adv, Texas.NNIS6/Atr, ./Z:/AuxK; t-layer functors: PRED.Frame1.poss, ACT, PAT, TWHEN, LOC, ID.]</p>
<sec id="sec-3-1">
        <p>When simplifying an input sentence, it is necessary to apply certain elementary principles assuring adequate analysis.</p>
        <p>2 A frame evoking word is a lexical word (verb, noun, adjective or adverb) that requires a set of syntactico-semantic complementations; e.g., the verb to give requires three complementations, namely actor (ACT, who gives something), patient (PAT, what is given) and addressee (ADDR, to whom something is given).</p>
        <p>After all valency complementations are reduced, the frame evoking predicate and its complementations can be processed on the t-layer. Let us describe the analysis of this component in more detail.</p>
        <p>First, a prepositional group ve+státě `in (the) state' is identified in the sentence; it is processed in a single step on the `surface' layers (its w- and m-symbols are deleted and its a-symbols are included into the a-structure as an adverbial modification of the verb mít `to have'). The whole prepositional group will be represented as a single t-node stát, see Figure 1. It will be identified as a local valency complementation LOC on the t-layer. Second, the noun sídlo `residence' is processed on the surface layers and marked as a valency complementation PAT on the t-layer. Third, the symbol for the elided subject, which is restored on the t-layer, is marked as the ACT valency complementation. All the valency complementations of the predicate are identified now; the valency frame Frame1 is saturated.</p>
        <p>Next, we focus on the modal verb mohla `(she) could'. On the a-layer, this is a governing node of the lexical verb mít; the respective edge is created, which results in a non-projective a-tree (the symbols on the surface layers are deleted). On the other hand, modal verbs are accounted functional verbs in FGD and thus represented as attributes of lexical verbs in t-trees (mít in our case, value `poss' in the respective t-node). Thus the non-projectivity is eliminated in the t-tree.</p>
        <p>Now the predicate with its complementations ACT, PAT and LOC (Frame1) can be identified as the core predicative structure on the t-layer, the relevant edges for individual valency complementations are created and the simplified sentence is accepted (in the accepting step, the final full stop is processed on the surface layers).</p>
      </sec>
<sec id="sec-3-2">
        <title>Restarting automata</title>
        <p>First, we introduce a relevant type of a simple restarting automaton, the sRL-automaton, rather informally. For technical reasons, we do it in a slightly different way than in [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
        <p>An sRL-automaton M is (in general) a nondeterministic machine with a finite-state control Q, a finite characteristic alphabet, and a head (window of size 1) that works on a flexible tape delimited by the left sentinel c and the right sentinel $ (neither c nor $ belongs to the characteristic alphabet). For an input word w over the characteristic alphabet, the initial tape inscription is cw$. To process this input, M starts in its initial state q0 with its window over the left end of the tape, scanning the left sentinel c. According to its transition relation, M performs the following operations in the individual steps:</p>
        <p>- moves to the right or to the left: shifts the head one position to the right or to the left;</p>
        <p>- dl: deletes the visited symbol and shifts the head onto its right neighbor;</p>
        <p>- wr[b]: rewrites the visited symbol by the symbol b;</p>
        <p>- pb: serves for marking (putting a pebble on) the visited item only; marked items are used as nodes in DR-trees (in any other aspect it is an empty operation, see later);</p>
        <p>- accept: halts the computation and accepts the input word.</p>
        <p>Of course, neither the left sentinel c nor the right sentinel $ may be deleted. At the right end of the tape M either halts and accepts, or it halts and rejects, or it restarts, that is, it places its window over the left end of the tape and reenters the initial state. It is required that prior to the first restart step and also between any two restart steps, M executes at least one delete operation. During each step, M can change its internal state.</p>
        <p>We can see that any finite computation of M consists of certain phases. A phase, called a cycle, starts in a restarting configuration; the window is moved along the tape by performing operations until a restart operation is performed and thus a new restarting configuration is reached. If no further restart operation is performed, each finite computation necessarily finishes in a halting configuration; such a phase is called a tail. We assume that no delete and rewrite operation is executed in a tail computation.</p>
        <p>The notation u ⊢cM v denotes a reduction performed during a cycle of M that begins with the tape inscription cu$ and ends with the tape inscription cv$; the relation ⊢cM* is the reflexive and transitive closure of ⊢cM.</p>
        <p>A string w over the characteristic alphabet is accepted by M if there is an accepting computation which starts in the restarting configuration with the tape inscription cw$ and ends by executing the accept operation. By LC(M) we denote the language consisting of all words accepted by M; we say that M recognizes (accepts) the characteristic language LC(M).</p>
        <p>Further, we will refer to an sRL-automaton M as a tuple consisting of its characteristic vocabulary (alphabet), the sentinels c and $ not belonging to this vocabulary, a finite set R(M) of restarting instructions, and a finite set A(M) of accepting meta-instructions.</p>
        <p>Remark: sRL-automata are two-way nondeterministic automata which allow checking the whole input sentence prior to any changes. This resembles a linguist who can read the whole sentence first and reduce the sentence in a correct way afterwards. We choose a nondeterministic model to enable various orders of reductions; this can serve for verification of independency between individual parts of a sentence.</p>
        <p>Similarly as in [<xref ref-type="bibr" rid="ref8">8</xref>], we use a restricted type of sRL-automata for which the number of rewrite, delete and pebble operations made per cycle is limited by a constant. Such sRL-automata can be described by (meta-)instructions, which describe the moves of the head and the changes of the states implicitly. Each cycle of M is described by a single restarting instruction of the following form:</p>
      </sec>
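      <p>The cycle/tail behaviour above can be illustrated on the classical example of the language of words a^n b^n; the encoding of instructions via regular expressions in the following sketch is our simplification, not the formal definition.</p>
      <preformat>
```python
import re

# Minimal sketch of an sRL-automaton working in cycles (illustrative
# encoding): the restarting instruction deletes one adjacent pair
# 'ab' inside the context a* ... b*; the accepting instruction
# accepts the remaining word "ab" in the tail.

def cycle(word):
    """One cycle: apply the restarting instruction once, or fail."""
    m = re.fullmatch(r"(a*)ab(b*)", word)
    return m.group(1) + m.group(2) if m else None

def accepts(word):
    """Repeat cycles until the tail can accept the remaining word."""
    while word != "ab":
        word = cycle(word)
        if word is None:
            return False
    return True

print(accepts("aaabbb"))  # two cycles, then the accepting tail
print(accepts("aabbb"))   # gets stuck: no admissible factorization
```
      </preformat>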
    </sec>
<sec id="sec-5">
      <title>Restarting automata enhanced with DR-structures</title>
<p>IR = (c·E0, [a1]o1, E1, [a2]o2, E2, ..., Es-1, [as]os, Es·$, Restart), where</p>
      <p>- E0, E1, ..., Es (s &gt; 0) are regular languages over the characteristic alphabet (usually represented by regular expressions);</p>
      <p>- o1, ..., os are operations from {dl, pb, wr[b]}, where b is a symbol of the characteristic alphabet;</p>
      <p>- the symbols a1, a2, ..., as are the symbols on which the corresponding operations o1, ..., os are executed.</p>
      <p>Let us define an auxiliary function for each operation o from {dl, pb, wr[b]}:</p>
      <p>- pb(ai) = ai,</p>
      <p>- dl(ai) = the empty word, and</p>
      <p>- wr[b](ai) = b.</p>
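      <p>The auxiliary function can be written down directly; the string encoding of the operations in the following sketch is an illustrative assumption.</p>
      <preformat>
```python
# Sketch of the auxiliary function mapping a tape symbol to its image
# under an operation: pb keeps the symbol, dl erases it, wr[b]
# replaces it by b. The operation encoding is illustrative.
def apply_op(op, symbol, b=None):
    if op == "pb":   # pebble: the symbol stays, the item is only marked
        return symbol
    if op == "dl":   # delete: the symbol is replaced by the empty word
        return ""
    if op == "wr":   # rewrite: the symbol is replaced by b
        return b
    raise ValueError(f"unknown operation: {op}")

print(apply_op("pb", "a"))         # 'a'
print(apply_op("dl", "a"))         # ''
print(apply_op("wr", "c", b="X"))  # 'X'
```
      </preformat>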
<sec id="sec-5-1">
        <title>Enhanced restarting automata</title>
        <p>In this section we introduce enhanced restarting automata, so-called sERL-automata. During their computations, these automata build structures consisting of deleted, rewritten, or marked items and of directed edges between pairs of such items.</p>
        <p>Enhanced restarting automata, sERL-automata, were formally introduced in a restricted form in [<xref ref-type="bibr" rid="ref3">3</xref>]. Their formal definition is rather long and very technical, so we prefer to use an informal description of the model and concentrate on its possible applications. In contrast to sRL-automata, a so-called DR-structure can be attached to any item of the tape of an sERL-automaton.</p>
        <p>A DR-structure is a slight generalization of the DR-trees used in [<xref ref-type="bibr" rid="ref7">7</xref>]. It is an oriented graph D = (V, Hd), where V is a finite set of nodes and Hd is a finite set of edges. A node u ∈ V is a tuple u = [i, j, a], where a is a symbol assigned to the node and i, j are natural numbers; i represents the horizontal position of the node u, j represents the vertical position of u (it is equal to 0 or to the number of nodes with the same horizontal position i from which there are oriented paths to u). An edge h of D is an ordered pair of nodes h = (u, v). We define two types of edges, oblique and vertical (defined below).</p>
        <p>When trying to execute IR starting from a tape inscription cw$, M will get stuck (and so reject) if w does not admit a factorization of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. On the other hand, if w admits factorizations of this form, then one of them is chosen nondeterministically, and cw$ is transformed (reduced) into c v0 o1(a1) v1 ... vs-1 os(as) vs $.</p>
      </sec>
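      <p>The node and edge discipline of DR-structures can be sketched as follows; the class and method names are ours, introduced only for illustration.</p>
      <preformat>
```python
# Sketch of a DR-structure: nodes are triples (i, j, a) (horizontal
# position, vertical position, symbol); edges are ordered node pairs.
class DRStructure:
    def __init__(self):
        self.nodes = set()   # elements: (i, j, a)
        self.edges = set()   # elements: ((i, j, a), (i2, j2, a2))

    def add_edge(self, u, v):
        self.nodes |= {u, v}
        self.edges.add((u, v))

    @staticmethod
    def edge_type(u, v):
        """Oblique: different horizontal positions.
        Vertical: same horizontal position, v one level above u."""
        if u[0] != v[0]:
            return "oblique"
        if v[1] == u[1] + 1:
            return "vertical"
        return "invalid"

d = DRStructure()
d.add_edge((3, 0, "a"), (4, 0, "b"))  # deleting edges are oblique
d.add_edge((7, 0, "c"), (7, 1, "X"))  # rewriting edges are vertical

print(DRStructure.edge_type((3, 0, "a"), (4, 0, "b")))
print(DRStructure.edge_type((7, 0, "c"), (7, 1, "X")))
```
      </preformat>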
<sec id="sec-5-2">
        <p>Tails of accepting computations are described by accepting instructions of the form IA = (c·E0, [a1], E1, [a2], E2, ..., Es-1, [as], Es·$, Accept), where the individual Ei are regular languages over the characteristic alphabet.</p>
        <p>For our linguistic application (i.e., modeling FGD), we consider only accepting instructions with finite Ei's.</p>
        <p>A tail performed by the instruction IA starts with the inscription cz$ on the tape; if z ∈ E0 a1 E1 ... as Es, then M accepts z (and the whole computation as well). Otherwise the computation halts with rejection. This special form of accepting instruction is introduced with regard to the future enhancements of restarting automata.</p>
        <p>The class of all sRL-automata is denoted as sRL. A crucial role in our applications is played by the following property of restarting automata. (Correctness Preserving Property) An sRL-automaton M is correctness preserving if u ∈ LC(M) and u ⊢cM* v imply that v ∈ LC(M).</p>
        <p>It is already known that all deterministic sRL-automata are correctness preserving. On the other hand, it is easy to design a nondeterministic sRL-automaton which is not correctness preserving (see [<xref ref-type="bibr" rid="ref8">8</xref>]). We consider only the correctness preserving automata in the sequel in order to model the analysis by reduction adequately.</p>
        <p>Each sERL-automaton Me is actually an sRL-automaton with enhanced instructions. An enhanced instruction is a pair Ie = (I, G) consisting of an instruction I of an sRL-automaton and an acyclic graph G representing the required structure for the symbols processed (deleted, rewritten or marked (pebbled)) during the application of the instruction I. The restrictions put on the set of edges of G are described below together with the application of an enhanced instruction Ie = (I, G), where G = (U, H), on a tape containing cw$.</p>
        <p>All the symbols on the tape are stored in so-called items, which are actually nodes of a DR-structure. I.e., a symbol x is stored in a node [i, j, x], where i is its horizontal position. Initially, the horizontal position of the n-th input symbol equals n, j = 0 and there are no edges between the items.</p>
        <p>- Oblique edge: h = (u, v), where u = [iu, ju, a], v = [iv, jv, b] and iu ≠ iv;</p>
        <p>- Vertical edge: h = (u, v), where u = [iu, ju, a], v = [iv, jv, b] and iu = iv, jv = ju + 1.</p>
        <p>We say that D = (V, Hd) is a DR-tree if the graph D is an oriented tree (with a single root, in which all maximal paths in D end).</p>
      </sec>
    </sec>
<sec id="sec-6">
      <p>Restarting instruction. If I is a restarting instruction I = (c·E0, [a1]o1, E1, [a2]o2, E2, ..., Es-1, [as]os, Es·$, Restart), then o1, ..., os ∈ {dl, pb, wr[b]} (with b from the characteristic alphabet) are the operations performed on the symbols a1, ..., as. Let oi1 = wr[bi1], ..., oir = wr[bir], for some r ≥ 0, be all rewrite operations from {o1, ..., os}.</p>
      <p>Let G = (U, H), where U = {1, 2, ..., s, i'1, i'2, ..., i'r}, and let the nodes correspond to the symbols a1, ..., as and to the symbols bi1, ..., bir, respectively. An edge (u, v) ∈ H is of one of the following forms:</p>
      <p>1. deleting: u = i corresponds to a deleted symbol ai (hence oi = dl, 1 ≤ i ≤ s) and v = j for some j ∈ U, j ≠ i. Let us note that a deleting edge is always oblique.</p>
      <p>2. rewriting: u = i corresponds to a rewritten symbol ai (hence oi = wr[bij], 1 ≤ i ≤ s, 1 ≤ j ≤ r) and v = i'j. Let us note that a rewriting edge is always vertical.</p>
      <p>Accepting instruction. If I is an accepting instruction I = (c·E0, [a1]pb, E1, [a2]pb, E2, ..., Es-1, [as]pb, Es·$, Accept), then the symbols a1, ..., as are pebbled and they correspond to the nodes in U = {1, 2, ..., s}. All edges (u, v) ∈ H are oblique and have the same properties as the deleting edges in graphs enhancing restarting instructions.</p>
      <p>Two consecutive applications of the instruction I1 from Example 1 below on the word aaabbbccc will result in the following sequence of tape contents (in the figure the tape content consists of the bold items depicted in the upper horizontal part of the picture for a particular configuration):</p>
      <p>[0,0,c][1,0,a][2,0,a][3,0,a][4,0,b][5,0,b][6,0,b][7,0,c][8,0,c][9,0,c][10,0,$]</p>
      <p>[0,0,c][1,0,a][2,0,a][5,0,b][6,0,b][7,1,X][8,0,c][9,0,c][10,0,$], with the deleted items [3,0,a] and [4,0,b] attached below the tape as nodes of the DR-structure.</p>
<p>An application of Ie = (I, G), with G = (U, H), on a word w consists in:</p>
      <p>1. Choosing a factorization of cw$ of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. If w does not admit any factorization of this form, then I cannot be applied on w.</p>
      <p>2. Rewriting the tape containing cw$ into the tape containing c v0 o1(a1) v1 ... vs-1 os(as) vs $.</p>
      <p>3. For each edge e ∈ H, a new edge is inserted into the current DR-structure.</p>
      <p>If e = (i, j) is a deleting edge, an oblique edge from the item containing the deleted ai into the item containing the symbol corresponding to j (either aj, if 1 ≤ j ≤ s, or bj, when j ∈ {i'1, ..., i'r}) is inserted.</p>
      <p>If e = (i, j) is a rewriting edge, a vertical edge from the item containing ai into the item containing bj is inserted (j ∈ {i'1, ..., i'r}) and the vertical position of the new item containing bj is set to q + 1, where q is the vertical position of the item containing ai.</p>
      <p>If there was a DR-structure attached to some of the deleted or rewritten cells, the structure is preserved and combined into a larger graph.</p>
<p>An application of an accepting enhanced instruction Ie = (I, G) on a word w consists in:</p>
      <p>1. Choosing a factorization of cw$ of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. If w does not admit any factorization of this form, then I cannot be applied on w.</p>
      <p>2. For each edge e = (i, j) in H, inserting a new oblique edge into the current DR-structure; the inserted edge starts from the item containing ai and ends in the item containing the symbol aj.</p>
      <p>If there was a DR-structure attached to some of the connected items, the structure is preserved and combined into the resulting graph.</p>
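      <p>The application of a restarting enhanced instruction can be combined into a small executable sketch. The encoding below (a regular expression whose groups mark the occurrences of a1, ..., as, tuples for operations, index pairs for the edges of G) is our toy simplification of the formal definition, tuned to mimic the first application from Example 1; the concrete edges are illustrative.</p>
      <preformat>
```python
import re

# Sketch of applying one enhanced restarting instruction (I, G) to
# a tape of items (i, j, symbol). Illustrative encoding: the regex's
# groups mark the occurrences of a1..as (the unparenthesized parts
# are the contexts E0..Es), `ops` lists the operations for a1..as,
# and `edges` are index pairs taken from the instruction's graph G.
def apply_enhanced(tape, contexts, ops, edges, structure):
    word = "".join(sym for _, _, sym in tape)
    m = re.fullmatch(contexts, word)
    if m is None:
        return None  # no admissible factorization: not applicable
    targets = [tape[m.start(k + 1)] for k in range(len(ops))]
    new_tape = []
    for item in tape:
        if item in targets:
            op = ops[targets.index(item)]
            if op[0] == "dl":    # deleted item leaves the tape ...
                continue         # ... but stays a node of the structure
            if op[0] == "wr":    # rewritten symbol: vertical pos + 1
                i, j, _ = item
                item = (i, j + 1, op[1])
        new_tape.append(item)
    for u, v in edges:           # insert the edges required by G
        structure.append((targets[u], targets[v]))
    return new_tape

# First application of an I1-like instruction to a a a b b b c c c:
tape = [(i + 1, 0, s) for i, s in enumerate("aaabbbccc")]
structure = []
ops = [("pb", None), ("dl", None), ("dl", None), ("wr", "X")]
new = apply_enhanced(tape, r"a*(a)(a)(b)[bX]*(c)c*", ops,
                     [(1, 0), (2, 3)], structure)
print(new)        # the reduced tape, as in the figure above
print(structure)  # newly inserted DR-structure edges
```
      </preformat>
      <p>With this data the reduced tape is [(1,0,a), (2,0,a), (5,0,b), (6,0,b), (7,1,X), (8,0,c), (9,0,c)], matching the second tape content shown above (sentinels omitted).</p>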
<p>Example 2: Let I2 = ((c, [a]pb, λ, [b]pb, X*, [c]pb, $, Accept), D2) be an enhanced accepting instruction with the graph D2 of the following form:</p>
      <p>[Figure: the graph D2 on the nodes 1, 2, 3.]</p>
<sec id="sec-6-1">
        <p>Example 1: Let I1 = ((c·a*, [a]pb, λ, [a]dl, λ, [b]dl, b*·X*, [c]wr[X], c*·$, Restart), D1) be an enhanced restarting instruction with the graph D1 of the following form:</p>
        <p>[Figure: the graph D1 on the nodes 1, 2, 3, 4 and 4'.]</p>
        <p>After applying I2 on the resulting DR-structure from Example 1 we obtain the following final DR-structure:</p>
        <p>[Figure: the final DR-structure over the items [2,0,a], [5,0,b], [6,0,b], [7,0,c], [8,0,c], [7,1,X], [8,1,X] and [9,0,c].]</p>
        <p>Computation of enhanced restarting automata. A (restarting) configuration C = (T, D) of a computation by an sERL-automaton Me consists of the set T of items representing the current tape content and a DR-structure D. The initial configuration C0 = (Tw, D∅) for an input word w = x1 ... xn consists of the set of items representing the initial content of the tape, Tw = {[0,0,c]} ∪ {[i,0,xi] | i = 1, ..., n} ∪ {[n+1,0,$]}, and the empty DR-structure D∅ = (∅, ∅). By an application of an enhanced instruction I, a configuration C is transformed into a new configuration C'; the tape content is changed and the DR-structure can grow.</p>
        <p>An input word w is accepted by Me if there exists a computation starting with the initial configuration for w which ends in a configuration Ca by an application of an accepting enhanced instruction. The result of the computation is the DR-structure of Ca.</p>
        <p>Similarly as for sRL-automata, we define the characteristic language of Me as LC(Me) = {w | Me accepts w}. Moreover, the DR-language of Me is the set DR(Me) = {D | D is a result of some accepting computation of Me}.</p>
      </sec>
      <sec id="sec-6-1-1">
        <title>Enhanced automata with several layers</title>
        <p>First, we introduce a technical notion of projection. The projection from an alphabet onto a sub-alphabet is the morphism that maps every symbol of the sub-alphabet to itself and every other symbol to the empty word. Similarly, we define the projection of languages, applying the morphism elementwise.</p>
        <p>Similarly as above, we introduce a projection for DR-structures. Let D be a DR-structure from a DR-language over an alphabet; the projection of D onto a sub-alphabet is the DR-structure obtained from D by removing all nodes with (at least one) symbol outside the sub-alphabet and all edges incident to those nodes. A projection may be in an obvious way extended onto a projection of a DR-language.</p>
        <p>Let a sequence of pairwise disjoint alphabets, for some j ≥ 1, be given, and let their union be the characteristic alphabet of an sERL-automaton Mj. We say that Mj is an enhanced simple sERL-automaton with j layers ((j)-sERL-automaton for short) if the following assumptions are fulfilled:</p>
        <p>- Mj is correctness preserving;</p>
        <p>- Mj is allowed to rewrite a symbol from the i-th sub-vocabulary (for 1 ≤ i ≤ j) by a symbol from the same sub-vocabulary only. We refer to the symbols from the i-th sub-vocabulary as the symbols on layer i (or i-symbols).</p>
      </sec>
      <sec id="sec-6-1-2">
        <title>FGD as a (4)-sERL-automaton</title>
        <p>Our ultimate goal is to model FGD; we consider a (4)-sERL-automaton MFD to be a suitable formal frame for this linguistic theory. The characteristic alphabet of MFD consists of four parts which correspond to the respective layers of FGD (Section 1.2). Recall that the symbols from individual layers can be quite complex.</p>
        <p>A language of layer ℓ ∈ {w, m, a, t} accepted by MFD is obtained as a projection of the characteristic language onto the vocabulary of layer ℓ, i.e., Lℓ(MFD) = Prℓ(LC(MFD)). The characteristic language LC(MFD) contains input sequences (over the w-layer vocabulary) interleaved with metalanguage information in the form of symbols from the m-, a- and t-layer vocabularies. Then, the language of correct sentences of the studied natural language is Lw = Prw(LC(MFD)). In our case, it defines the set of correct Czech sentences.</p>
        <p>Similarly, a DR-language of layer ℓ accepted by MFD is obtained as DRℓ(MFD) = Prℓ(DR(MFD)), ℓ ∈ {w, m, a, t}. Let us note that DRw(MFD) and DRm(MFD) are empty (Lw and Lm are string languages). Further, DRa(MFD) and DRt(MFD) are languages of DR-trees. Each DR-tree T ∈ DRt(MFD) is projective (with respect to its descendants); that is, for each node n of the DR-tree T all its descendants (including also the node n) constitute a contiguous sequence in the horizontal ordering of the nodes of the tree T. On the other hand, trees from DRa(MFD) can be in general non-projective.</p>
        <p>The DR-language DRt(MFD) represents the set of meaning descriptions in FGD whereas DRa(MFD) models the set of (surface) syntactic trees. Let us note that Lt(MFD) is designed as a deterministic context-free language. Readers familiar with restarting automata can see that LC(MFD) is a deterministic context-sensitive language and Lw(MFD) is a context-sensitive language.</p>
        <p>We have not mentioned so far those edges of the DR-structures from DR(MFD) which connect nodes (their symbols) on different layers. These edges serve for connecting the corresponding lexical units on different layers (see the dotted lines in Figure 1) and are obtained by applications of extended restarting meta-instructions of MFD. Here such an edge connects nodes on neighboring layers only; i.e., these edges connect w-layer nodes to m-layer nodes, m-layer nodes to a-layer nodes, a-layer nodes to t-layer nodes, and nothing else (see Figure 1).</p>
      </sec>
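      <p>A minimal sketch of the projections used above, with single-character symbols as an illustrative simplification (the function names are ours):</p>
      <preformat>
```python
# Sketch of the projection onto a sub-alphabet: a morphism erasing
# all symbols outside the sub-alphabet; on DR-structures it drops
# nodes carrying such symbols together with their incident edges.
def project_string(word, sub):
    return "".join(ch for ch in word if ch in sub)

def project_structure(nodes, edges, sub):
    kept = {n for n in nodes if n[2] in sub}
    return kept, {(u, v) for (u, v) in edges if u in kept and v in kept}

# A characteristic string: w-symbols (lowercase) interleaved with
# metalanguage symbols (uppercase) from the higher layers.
print(project_string("aMbAcT", {"a", "b", "c"}))  # -> 'abc'
```
      </preformat>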
<sec id="sec-6-2">
        <title>Concluding remarks</title>
        <p>In this paper, encouraged by [9, 10], we extend the formal model of natural language description based on FGD so that it outputs neither lists of words nor lists of symbols but the so-called reduction language and a set of complex DR-structures. We plan to study the relation between reduction languages and DR-languages more deeply in the near future.</p>
<p>The novelty of this contribution consists in (i) the extension of enhanced restarting automata in order to produce tree structures with several interlinked layers and (ii) the application of these automata to the stratificational description of a natural language. Moreover, we outline a formalization of the basic methodology of FGD in terms derived from the automata theory and from the theory of parsing schemata as well. We envisage that the proposed methodology is not FGD-specific and that a similar approach can be used to obtain a formal frame for other language descriptions, as e.g. those presented in [<xref ref-type="bibr" rid="ref6">6</xref>] and [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. M. Lopatkova, M. Platek, V. Kubon: <article-title>Modeling syntax of free word-order languages: dependency analysis by reduction</article-title>. In: Matousek, V., Mautner, P., Pavelka, T. (eds), <source>Proceedings of Text, Speech and Dialogue International Conference, TSD 2005</source>, Volume <volume>3658</volume> of LNAI, Berlin Heidelberg, Springer-Verlag, <year>2005</year>, <fpage>140</fpage>–<lpage>147</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. M. Lopatkova, M. Platek, P. Sgall: <article-title>Towards a formal model for functional generative description, analysis by reduction and restarting automata</article-title>. <source>The Prague Bulletin of Mathematical Linguistics</source>, <volume>87</volume>, <year>2007</year>, <fpage>7</fpage>–<lpage>26</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. M. Platek, F. Mraz, M. Lopatkova: <article-title>Restarting automata with structured output and functional generative description</article-title>. In: Dediu, A., Fernau, H., Martin-Vide, C. (eds), <source>Proceedings of the Fourth International Conference Language and Automata Theory and Applications, LATA 2010</source>, Berlin Heidelberg, Springer-Verlag, <year>2010</year>, <fpage>500</fpage>–<lpage>511</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. P. Sgall, E. Hajicova, J. Panevova: <article-title>The Meaning of the Sentence in Its Semantic and Pragmatic Aspects</article-title>. Reidel, Dordrecht, <year>1986</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. J. Hajic, E. Hajicova, J. Panevova, P. Sgall, P. Pajas, J. Stepanek, J. Havelka, M. Mikulova: <article-title>Prague Dependency Treebank 2.0</article-title>. Linguistic Data Consortium, Philadelphia, PA, USA, <year>2006</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. I. A. Mel'cuk: <article-title>Dependency Syntax: Theory and Practice</article-title>. State University of New York Press, Albany, <year>1988</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. T. Holan, V. Kubon, K. Oliva, M. Platek: <article-title>Two useful measures of word order complexity</article-title>. In: Polguere, A., Kahane, S. (eds), <source>Processing of Dependency-Based Grammars: Proceedings of the Workshop (COLING/ACL'98)</source>, Montreal, ACL, <year>1998</year>, <fpage>21</fpage>–<lpage>28</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. H. Messerschmidt, F. Mraz, F. Otto, M. Platek: <article-title>Correctness preservation and complexity of simple RL-automata</article-title>. In: <source>Implementation and Application of Automata</source>, Volume <volume>4094</volume> of LNCS, Berlin Heidelberg, Springer-Verlag, <year>2006</year>, <fpage>162</fpage>–<lpage>172</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. R. Gramatovici, C. Martin-Vide: <article-title>Sorted dependency insertion grammars</article-title>. <source>Theor. Comput. Sci.</source>, <volume>354</volume>(<issue>1</issue>), <year>2006</year>, <fpage>142</fpage>–<lpage>152</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. S. Bensch, F. Drewes: <article-title>Millstream systems</article-title>. <source>Report UMINF 09.21</source>, Umea University, <year>2009</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. J. Kunze: <article-title>Abhangigkeitsgrammatik</article-title>. Volume XII of Studia Grammatica, Akademie Verlag, Berlin, <year>1975</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>