<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a formal model of natural language description based on restarting automata with parallel DR-structures?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marketa Lopatkova</string-name>
          <email>lopatkova@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frantisek Mraz</string-name>
          <email>frantisek.mraz@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Platek</string-name>
          <email>martin.platek@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>32</lpage>
      <abstract>
<p>We provide a formal model of a stratificational dependency approach to natural language description. The formal model is motivated by an elementary method of analysis by reduction, which serves for describing correct sentence analysis. The model is based on enhanced restarting automata that assign a set of parallel dependency structures to every reduction of an input sentence. These structures capture the correspondence of dependency trees on different layers of linguistic description, namely the layer of surface syntax and the layer of language meaning. The novelty of this contribution consists in (i) the extension of enhanced restarting automata in order to produce tree structures with several interlinked layers and (ii) the application of these automata to the stratificational description of a natural language.</p>
        <p>In [2], natural language description is modeled as a formal string-to-string translation using a suitable model of restarting automata. [3] introduces a class of enhanced restarting automata with an output consisting of a single DR-tree. Here we discuss a model which represents the stratificational layers of linguistic description in a parallel way: it is able to represent several parallel dependency structures and thus to capture relations between syntactic structures on different layers derived from RA. The novelty of this approach consists in the formal presentation of the stepwise parallel composition of tree structures on different language layers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
<sec id="sec-2">
      <title>Introduction</title>
      <p>Formal modeling of the syntactic structure of a natural language, its syntactic analysis as well as synthesis, has an important impact on an insight into the characteristic features of the language and into the needs of its explicit description.</p>
      <p>We attempt to provide a formal model for natural language description which would adequately reflect linguistic methods and make it possible to formulate and refine linguistic observations and thus deepen the understanding of the language.</p>
      <p>The proposed formal model is based on an elementary method of analysis by reduction. The analysis by reduction (RA henceforth, see [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], here Section 1.2) serves for describing correct reductions of natural language sentences (particularly for languages with free word order) on several linguistic layers (see Section 1.1).</p>
      <p>The proposed model is based on the concept of enhanced restarting automata that assign a set of dependency structures (DR-structures) to every reduction of an input sentence; DR-structures can capture a set of dependency trees representing the sentence on different layers.</p>
      <sec id="sec-2-1">
        <title>Functional Generative Description</title>
        <p>The theoretical linguistic basis for our research is provided by the Functional Generative Description (FGD in the sequel, see esp. [<xref ref-type="bibr" rid="ref4">4</xref>]). FGD is characterized by its stratificational and dependency-based approach to the language description. The stratificational approaches split the language description into layers, each layer providing a complete description of a (disambiguated) sentence and having its own vocabulary and syntax. We use the version of FGD that distinguishes four layers of description:1</p>
        <p>t-layer (tectogrammatical layer) capturing deep syntax, which comprises language meaning in a form of a dependency tree; the core concepts of this layer are dependency, valency, and topic-focus articulation, see esp. [<xref ref-type="bibr" rid="ref4">4</xref>];</p>
        <p>a-layer (analytical layer) capturing surface syntax in a form of a dependency tree (non-projective in general);</p>
        <p>m-layer (morphological layer) capturing morphology (a string of triples [word form, lemma, tag]);</p>
        <p>w-layer (word layer) capturing individual words and punctuation marks in a form of a simple string.</p>
      </sec>
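      <p>The parallel layering described above can be sketched in code. The following Python fragment is only an illustration: the class, its field names and the tag strings are our assumptions, not part of FGD or of the Prague Dependency Treebank notation.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: one "column" of the parallel representation,
# i.e. the symbols describing a single surface word on the four
# FGD layers (names and values are illustrative, not from the paper).
@dataclass
class LayerColumn:
    w: Optional[str]   # w-layer: word form or punctuation mark
    m: Optional[str]   # m-layer: "lemma.tag" flattened to a string
    a: Optional[str]   # a-layer: analytical function (surface syntax)
    t: Optional[str]   # t-layer: functor on the tectogrammatical layer
                       # (None: e.g. function words have no own t-node)

# The preposition "ve" ('in'): present on the three surface layers,
# but on the t-layer captured only as an attribute of its noun.
ve = LayerColumn(w="ve", m="v-1.RV--6", a="AuxP", t=None)

# A restored ellipsis (elided subject) exists on the t-layer only.
elided_subject = LayerColumn(w=None, m=None, a=None, t="ACT")

def surface_only(col: LayerColumn) -> bool:
    """True if the column has surface symbols but no t-layer correlate."""
    return col.t is None and col.w is not None

print(surface_only(ve))              # function word: no own t-node
print(surface_only(elided_subject))  # t-node without surface counterpart
```
      </preformat>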
<sec id="sec-2-2">
        <p>There are one-to-one correspondences between w- and m-layer (we leave aside small exceptions here) and between m- and a-layer; individual symbols of these three layers (surface layers in the sequel) reflect individual `surface' words and punctuation marks.</p>
        <p>1 We adopt the notation of the Prague Dependency Treebank [<xref ref-type="bibr" rid="ref5">5</xref>], a large corpus of Czech sentences, which uses FGD as its theoretical basis.</p>
        <p>On the other hand, individual symbols of the t-layer reflect only lexical words (so-called function words, as, e.g., prepositions and auxiliary verbs, are captured as attributes of lexical words); moreover, surface ellipses (as, e.g., the elided subject in Czech) are restored as nodes on the t-layer.</p>
        <p>Similarly as in other stratificational approaches, see e.g. [<xref ref-type="bibr" rid="ref6">6</xref>], the layers are ordered; the lowest one being the simplest w-layer, the highest being the most abstract t-layer.</p>
        <p>FGD as a dependency-based approach captures both surface and deep syntactic information in a form of dependency structures. Words (i.e., their a- and t-correlates, respectively) are represented as nodes of the respective trees, each node being a complex unit capturing the lexical, morphological and syntactic features; relations among words are represented by oriented edges [<xref ref-type="bibr" rid="ref7">7</xref>]. The dependency nature of these representations is important particularly for languages with relatively high freedom of word order; it also complies with the shift of focus to deep syntactic representation, for which the dependency approach is commonly used.</p>
        <p>The following example illustrates the description of a sentence at the four layers of FGD (slightly simplified). Such a description expresses all necessary linguistic information of a disambiguated sentence.</p>
        <p>Sídlo dnes mohla mít ve státě Texas. (residence - today - could - have - in - state - Texas) `She (= elided Sb) could reside in the state of Texas today.'</p>
        <p>Fig. 1. Parallel representation of the sample sentence on the t-, a-, m- and w-layers according to FGD.</p>
        <p>Figure 1 shows the deep syntactic tree on the t-layer, the surface non-projective syntactic tree on the a-layer, the string of triples [word form, lemma, tag] on the m-layer and the string of word forms on the w-layer. The dotted lines interconnect corresponding nodes. We focus on the non-trivial relation between a-layer and t-layer here: (i) the preposition ve `in' as well as the noun státě `state' in the a-tree are linked to the single t-symbol representing the lexical word stát `state'; (ii) similarly, both the modal verb mohla `could' and the lexical verb mít `have' are represented as the single t-node mít `to have' (information on the modal verb is preserved as the attribute `poss'); as a result, the non-projective a-tree is transformed to the projective t-tree; (iii) moreover, the subject elided in a surface sentence is restored in the t-tree (the node with the symbol starting with #PersPron), thus this node has no counterpart on the a-layer.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Basic principles of analysis by reduction</title>
<p>Analysis by reduction is based on a stepwise simplification of an analyzed sentence. It defines possible sequences of reductions (deletions) in the sentence; each step of RA is represented (i) by deleting at least one word of the input sentence, or (ii) by replacing an (in general discontinuous) substring of a sentence by a shorter substring. Consequently, it is possible to derive formal dependency relations between individual sentence members based on the possible order(s) of reductions.</p>
      <p>Using RA we analyze an input sentence (w-layer) enriched with the metalanguage information from the m-, a- and t-layer. Symbols on different layers representing a single word of an input sentence are processed simultaneously.</p>
      <p>The principles of RA can be summed up in the following observations:</p>
      <p>- The fact that a certain word (or a group of words) can be deleted implies that this word (or group of words) depends in RA on one of the words retained in the simplified sentence; the latter being called governing word(s) in RA.</p>
      <p>- Two words (or groups of words) can be deleted in an arbitrary order if and only if they are mutually independent in RA.</p>
      <p>- In order to ensure adequate modeling of natural language meaning (on the t-layer), certain groups of words have to be deleted in a single step (e.g., valency frame evoking words2 and their (valency) complementations [<xref ref-type="bibr" rid="ref1">1</xref>]); such words are said to constitute a reduction component. Even in such cases it is usual to determine governing-dependent pairs on the layer of surface syntax (a-layer). In such a case, it is necessary to define special rules for particular language phenomena.</p>
      <p>[Figure 1 contents: the sample sentence on the w-, m-, a- and t-layers; m-layer lemma.tag and a-layer functions: sídlo.NNNS4/Obj, dnes.Db/Adv, moci.VpYS/Pred, mít.Vf/Obj, v-1.RV--6/AuxP, stát.NNIS6/Adv, Texas.NNIS6/Atr, ./Z:/AuxK; t-layer functors: PRED.Frame1.poss, ACT, PAT, TWHEN, LOC, ID.]</p>
<sec id="sec-3-1">
        <p>When simplifying an input sentence, it is necessary to apply certain elementary principles assuring adequate analysis.</p>
        <p>2 A frame evoking word is a lexical word (verb, noun, adjective or adverb) that requires a set of syntactico-semantic complementations; e.g., the verb to give requires three complementations, namely actor (ACT, who gives something), patient (PAT, what is given) and addressee (ADDR, to whom something is given).</p>
        <p>After all valency complementations are reduced, the frame evoking predicate and its complementations can be processed on the t-layer. Let us describe the analysis of this component in more detail.</p>
        <p>First, a prepositional group ve+státě `in (the) state' is identified in the sentence; it is processed in a single step on the `surface' layers (its w- and m-symbols are deleted and its a-symbols are included into the a-structure as an adverbial modification of the verb mít `to have'). The whole prepositional group will be represented as a single t-node stát, see Figure 1. It will be identified as a local valency complementation LOC on the t-layer. Second, the noun sídlo `residence' is processed on the surface layers and marked as a valency complementation PAT on the t-layer. Third, the symbol for the elided subject, which is restored on the t-layer, is marked as the ACT valency complementation. All the valency complementations of the predicate are identified now; the valency frame Frame1 is saturated.</p>
        <p>Next, we focus on the modal verb mohla `(she) could'. On the a-layer, this is a governing node of the lexical verb mít; the respective edge is created, which results in a non-projective a-tree (the symbols on the surface layers are deleted). On the other hand, modal verbs are accounted functional verbs in FGD and thus represented as attributes of lexical verbs in t-trees (mít in our case, value `poss' in the respective t-node). Thus the non-projectivity is eliminated in the t-tree.</p>
        <p>Now the predicate with its complementations ACT, PAT and LOC (Frame1) can be identified as the core predicative structure on the t-layer, the relevant edges for individual valency complementations are created and the simplified sentence is accepted (in the accepting step, the final full stop is processed on the surface layers).</p>
      </sec>
<sec id="sec-3-2">
        <title>Restarting automata</title>
        <p>First, we introduce a relevant type of a simple restarting automaton, the sRL-automaton, rather informally. For technical reasons, we do it in a slightly different way than in [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
        <p>An sRL-automaton M is (in general) a nondeterministic machine with a finite-state control Q, a finite characteristic alphabet, and a head (window of size 1) that works on a flexible tape delimited by the left sentinel c and the right sentinel $ (neither c nor $ belongs to the characteristic alphabet). For an input word w over the characteristic alphabet, the initial tape inscription is cw$. To process this input, M starts in its initial state q0 with its window over the left end of the tape, scanning the left sentinel c. According to its transition relation, M performs the following operations in the individual steps:</p>
        <p>- moves to the right or to the left: shifts the head one position to the right or to the left;</p>
        <p>- dl: deletes the visited symbol and shifts the head onto its right neighbor;</p>
        <p>- wr[b]: rewrites the visited symbol by the symbol b;</p>
        <p>- pb: serves for marking (putting a pebble on) the visited item only; marked items are used as nodes in DR-trees (in any other aspect it is an empty operation, see later);</p>
        <p>- accept: halts the computation and accepts the input word.</p>
        <p>Of course, neither the left sentinel c nor the right sentinel $ may be deleted. At the right end of the tape M either halts and accepts, or it halts and rejects, or it restarts, that is, it places its window over the left end of the tape and reenters the initial state. It is required that prior to the first restart step and also between any two restart steps, M executes at least one delete operation. During each step, M can change its internal state.</p>
        <p>We can see that any finite computation of M consists of certain phases. A phase, called a cycle, starts in a restarting configuration; the window is moved along the tape by performing operations until a restart operation is performed and thus a new restarting configuration is reached. If no further restart operation is performed, each finite computation necessarily finishes in a halting configuration; such a phase is called a tail. We assume that no delete and rewrite operation is executed in a tail computation.</p>
        <p>The notation u ⊢cM v denotes a reduction performed during a cycle of M that begins with the tape inscription cu$ and ends with the tape inscription cv$; the relation ⊢cM* is the reflexive and transitive closure of ⊢cM.</p>
        <p>A string w over the characteristic alphabet is accepted by M if there is an accepting computation which starts in the restarting configuration with the tape inscription cw$ and ends by executing the accept operation. By LC(M) we denote the language consisting of all words accepted by M; we say that M recognizes (accepts) the characteristic language LC(M).</p>
        <p>Further, we will refer to an sRL-automaton M as a tuple consisting of its characteristic vocabulary (alphabet), the sentinels c and $ not belonging to this vocabulary, a finite set R(M) of restarting instructions, and a finite set A(M) of accepting meta-instructions.</p>
        <p>Remark: sRL-automata are two-way nondeterministic automata which allow checking the whole input sentence prior to any changes. This resembles a linguist who can read the whole sentence first and reduce the sentence in a correct way afterwards. We choose a nondeterministic model to enable various orders of reductions; this can serve for verification of independency between individual parts of a sentence.</p>
        <p>Similarly as in [<xref ref-type="bibr" rid="ref8">8</xref>], we use a restricted type of sRL-automata for which the number of rewrite, delete and pebble operations made per cycle is limited by a constant. Such sRL-automata can be described by (meta-)instructions, which describe the moves of the head and the changes of the states implicitly. Each cycle of M is described by a single restarting instruction of the following form:</p>
      </sec>
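      <p>The cycle/tail behaviour above can be illustrated on the classical example of the language of words a^n b^n; the encoding of instructions via regular expressions in the following sketch is our simplification, not the formal definition.</p>
      <preformat>
```python
import re

# Minimal sketch of an sRL-automaton working in cycles (illustrative
# encoding): the restarting instruction deletes one adjacent pair
# 'ab' inside the context a* ... b*; the accepting instruction
# accepts the remaining word "ab" in the tail.

def cycle(word):
    """One cycle: apply the restarting instruction once, or fail."""
    m = re.fullmatch(r"(a*)ab(b*)", word)
    return m.group(1) + m.group(2) if m else None

def accepts(word):
    """Repeat cycles until the tail can accept the remaining word."""
    while word != "ab":
        word = cycle(word)
        if word is None:
            return False
    return True

print(accepts("aaabbb"))  # two cycles, then the accepting tail
print(accepts("aabbb"))   # gets stuck: no admissible factorization
```
      </preformat>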
    </sec>
<sec id="sec-5">
      <title>Restarting automata enhanced with DR-structures</title>
<p>IR = (c·E0, [a1]o1, E1, [a2]o2, E2, ..., Es-1, [as]os, Es·$, Restart), where</p>
      <p>- E0, E1, ..., Es (s &gt; 0) are regular languages over the characteristic alphabet (usually represented by regular expressions);</p>
      <p>- o1, ..., os are operations from {dl, pb, wr[b]}, where b is a symbol of the characteristic alphabet;</p>
      <p>- the symbols a1, a2, ..., as are the symbols on which the corresponding operations o1, ..., os are executed.</p>
      <p>Let us define an auxiliary function for each operation o from {dl, pb, wr[b]}:</p>
      <p>- pb(ai) = ai,</p>
      <p>- dl(ai) = the empty word, and</p>
      <p>- wr[b](ai) = b.</p>
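      <p>The auxiliary function can be written down directly; the string encoding of the operations in the following sketch is an illustrative assumption.</p>
      <preformat>
```python
# Sketch of the auxiliary function mapping a tape symbol to its image
# under an operation: pb keeps the symbol, dl erases it, wr[b]
# replaces it by b. The operation encoding is illustrative.
def apply_op(op, symbol, b=None):
    if op == "pb":   # pebble: the symbol stays, the item is only marked
        return symbol
    if op == "dl":   # delete: the symbol is replaced by the empty word
        return ""
    if op == "wr":   # rewrite: the symbol is replaced by b
        return b
    raise ValueError(f"unknown operation: {op}")

print(apply_op("pb", "a"))         # 'a'
print(apply_op("dl", "a"))         # ''
print(apply_op("wr", "c", b="X"))  # 'X'
```
      </preformat>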
<sec id="sec-5-1">
        <title>Enhanced restarting automata</title>
        <p>In this section we introduce enhanced restarting automata, so-called sERL-automata. During their computations, these automata build structures consisting of deleted, rewritten, or marked items and of directed edges between pairs of such items.</p>
        <p>Enhanced restarting automata, sERL-automata, were formally introduced in a restricted form in [<xref ref-type="bibr" rid="ref3">3</xref>]. Their formal definition is rather long and very technical, so we prefer to use an informal description of the model and concentrate on its possible applications. In contrast to sRL-automata, a so-called DR-structure can be attached to any item of the tape of an sERL-automaton.</p>
        <p>A DR-structure is a slight generalization of the DR-trees used in [<xref ref-type="bibr" rid="ref7">7</xref>]. It is an oriented graph D = (V, Hd), where V is a finite set of nodes and Hd is a finite set of edges. A node u ∈ V is a tuple u = [i, j, a], where a is a symbol assigned to the node and i, j are natural numbers; i represents the horizontal position of the node u, j represents the vertical position of u (it is equal to 0 or to the number of nodes with the same horizontal position i from which there are oriented paths to u). An edge h of D is an ordered pair of nodes h = (u, v). We define two types of edges, oblique and vertical (defined below).</p>
        <p>When trying to execute IR starting from a tape inscription cw$, M will get stuck (and so reject) if w does not admit a factorization of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. On the other hand, if w admits factorizations of this form, then one of them is chosen nondeterministically, and cw$ is transformed (reduced) into c v0 o1(a1) v1 ... vs-1 os(as) vs $.</p>
      </sec>
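      <p>The node and edge discipline of DR-structures can be sketched as follows; the class and method names are ours, introduced only for illustration.</p>
      <preformat>
```python
# Sketch of a DR-structure: nodes are triples (i, j, a) (horizontal
# position, vertical position, symbol); edges are ordered node pairs.
class DRStructure:
    def __init__(self):
        self.nodes = set()   # elements: (i, j, a)
        self.edges = set()   # elements: ((i, j, a), (i2, j2, a2))

    def add_edge(self, u, v):
        self.nodes |= {u, v}
        self.edges.add((u, v))

    @staticmethod
    def edge_type(u, v):
        """Oblique: different horizontal positions.
        Vertical: same horizontal position, v one level above u."""
        if u[0] != v[0]:
            return "oblique"
        if v[1] == u[1] + 1:
            return "vertical"
        return "invalid"

d = DRStructure()
d.add_edge((3, 0, "a"), (4, 0, "b"))  # deleting edges are oblique
d.add_edge((7, 0, "c"), (7, 1, "X"))  # rewriting edges are vertical

print(DRStructure.edge_type((3, 0, "a"), (4, 0, "b")))
print(DRStructure.edge_type((7, 0, "c"), (7, 1, "X")))
```
      </preformat>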
<sec id="sec-5-2">
        <p>Tails of accepting computations are described by accepting instructions of the form IA = (c·E0, [a1], E1, [a2], E2, ..., Es-1, [as], Es·$, Accept), where the individual Ei are regular languages over the characteristic alphabet.</p>
        <p>For our linguistic application (i.e., modeling FGD), we consider only accepting instructions with finite Ei's.</p>
        <p>A tail performed by the instruction IA starts with the inscription cz$ on the tape; if z ∈ E0 a1 E1 ... as Es, then M accepts z (and the whole computation as well). Otherwise the computation halts with rejection. This special form of accepting instruction is introduced with regard to the future enhancements of restarting automata.</p>
        <p>The class of all sRL-automata is denoted as sRL. A crucial role in our applications is played by the following property of restarting automata. (Correctness Preserving Property) An sRL-automaton M is correctness preserving if u ∈ LC(M) and u ⊢cM* v imply that v ∈ LC(M).</p>
        <p>It is already known that all deterministic sRL-automata are correctness preserving. On the other hand, it is easy to design a nondeterministic sRL-automaton which is not correctness preserving (see [<xref ref-type="bibr" rid="ref8">8</xref>]). We consider only the correctness preserving automata in the sequel in order to model the analysis by reduction adequately.</p>
        <p>Each sERL-automaton Me is actually an sRL-automaton with enhanced instructions. An enhanced instruction is a pair Ie = (I, G) consisting of an instruction I of an sRL-automaton and an acyclic graph G representing the required structure for the symbols processed (deleted, rewritten or marked (pebbled)) during the application of the instruction I. The restrictions put on the set of edges of G are described below together with the application of an enhanced instruction Ie = (I, G), where G = (U, H), on a tape containing cw$.</p>
        <p>All the symbols on the tape are stored in so-called items, which are actually nodes of a DR-structure. I.e., a symbol x is stored in a node [i, j, x], where i is its horizontal position. Initially, the horizontal position of the n-th input symbol equals n, j = 0 and there are no edges between the items.</p>
        <p>- Oblique edge: h = (u, v), where u = [iu, ju, a], v = [iv, jv, b] and iu ≠ iv;</p>
        <p>- Vertical edge: h = (u, v), where u = [iu, ju, a], v = [iv, jv, b] and iu = iv, jv = ju + 1.</p>
        <p>We say that D = (V, Hd) is a DR-tree if the graph D is an oriented tree (with a single root, in which all maximal paths in D end).</p>
      </sec>
    </sec>
<sec id="sec-6">
      <p>Restarting instruction. If I is a restarting instruction I = (c·E0, [a1]o1, E1, [a2]o2, E2, ..., Es-1, [as]os, Es·$, Restart), then o1, ..., os ∈ {dl, pb, wr[b]} (with b from the characteristic alphabet) are the operations performed on the symbols a1, ..., as. Let oi1 = wr[bi1], ..., oir = wr[bir], for some r ≥ 0, be all rewrite operations from {o1, ..., os}.</p>
      <p>Let G = (U, H), where U = {1, 2, ..., s, i'1, i'2, ..., i'r}, and let the nodes correspond to the symbols a1, ..., as and to the symbols bi1, ..., bir, respectively. An edge (u, v) ∈ H is of one of the following forms:</p>
      <p>1. deleting: u = i corresponds to a deleted symbol ai (hence oi = dl, 1 ≤ i ≤ s) and v = j for some j ∈ U, j ≠ i. Let us note that a deleting edge is always oblique.</p>
      <p>2. rewriting: u = i corresponds to a rewritten symbol ai (hence oi = wr[bij], 1 ≤ i ≤ s, 1 ≤ j ≤ r) and v = i'j. Let us note that a rewriting edge is always vertical.</p>
      <p>Accepting instruction. If I is an accepting instruction I = (c·E0, [a1]pb, E1, [a2]pb, E2, ..., Es-1, [as]pb, Es·$, Accept), then the symbols a1, ..., as are pebbled and they correspond to the nodes in U = {1, 2, ..., s}. All edges (u, v) ∈ H are oblique and have the same properties as the deleting edges in graphs enhancing restarting instructions.</p>
      <p>Two consecutive applications of the instruction I1 from Example 1 below on the word aaabbbccc will result in the following sequence of tape contents (in the figure the tape content consists of the bold items depicted in the upper horizontal part of the picture for a particular configuration):</p>
      <p>[0,0,c][1,0,a][2,0,a][3,0,a][4,0,b][5,0,b][6,0,b][7,0,c][8,0,c][9,0,c][10,0,$]</p>
      <p>[0,0,c][1,0,a][2,0,a][5,0,b][6,0,b][7,1,X][8,0,c][9,0,c][10,0,$], with the deleted items [3,0,a] and [4,0,b] attached below the tape as nodes of the DR-structure.</p>
<p>An application of Ie = (I, G), with G = (U, H), on a word w consists in:</p>
      <p>1. Choosing a factorization of cw$ of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. If w does not admit any factorization of this form, then I cannot be applied on w.</p>
      <p>2. Rewriting the tape containing cw$ into the tape containing c v0 o1(a1) v1 ... vs-1 os(as) vs $.</p>
      <p>3. For each edge e ∈ H, a new edge is inserted into the current DR-structure.</p>
      <p>If e = (i, j) is a deleting edge, an oblique edge from the item containing the deleted ai into the item containing the symbol corresponding to j (either aj, if 1 ≤ j ≤ s, or bj, when j ∈ {i'1, ..., i'r}) is inserted.</p>
      <p>If e = (i, j) is a rewriting edge, a vertical edge from the item containing ai into the item containing bj is inserted (j ∈ {i'1, ..., i'r}) and the vertical position of the new item containing bj is set to q + 1, where q is the vertical position of the item containing ai.</p>
      <p>If there was a DR-structure attached to some of the deleted or rewritten cells, the structure is preserved and combined into a larger graph.</p>
<p>An application of an accepting enhanced instruction Ie = (I, G) on a word w consists in:</p>
      <p>1. Choosing a factorization of cw$ of the form w = v0 a1 v1 a2 ... vs-1 as vs such that vi ∈ Ei for all i = 0, ..., s. If w does not admit any factorization of this form, then I cannot be applied on w.</p>
      <p>2. For each edge e = (i, j) in H, inserting a new oblique edge into the current DR-structure; the inserted edge starts from the item containing ai and ends in the item containing the symbol aj.</p>
      <p>If there was a DR-structure attached to some of the connected items, the structure is preserved and combined into the resulting graph.</p>
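      <p>The application of a restarting enhanced instruction can be combined into a small executable sketch. The encoding below (a regular expression whose groups mark the occurrences of a1, ..., as, tuples for operations, index pairs for the edges of G) is our toy simplification of the formal definition, tuned to mimic the first application from Example 1; the concrete edges are illustrative.</p>
      <preformat>
```python
import re

# Sketch of applying one enhanced restarting instruction (I, G) to
# a tape of items (i, j, symbol). Illustrative encoding: the regex's
# groups mark the occurrences of a1..as (the unparenthesized parts
# are the contexts E0..Es), `ops` lists the operations for a1..as,
# and `edges` are index pairs taken from the instruction's graph G.
def apply_enhanced(tape, contexts, ops, edges, structure):
    word = "".join(sym for _, _, sym in tape)
    m = re.fullmatch(contexts, word)
    if m is None:
        return None  # no admissible factorization: not applicable
    targets = [tape[m.start(k + 1)] for k in range(len(ops))]
    new_tape = []
    for item in tape:
        if item in targets:
            op = ops[targets.index(item)]
            if op[0] == "dl":    # deleted item leaves the tape ...
                continue         # ... but stays a node of the structure
            if op[0] == "wr":    # rewritten symbol: vertical pos + 1
                i, j, _ = item
                item = (i, j + 1, op[1])
        new_tape.append(item)
    for u, v in edges:           # insert the edges required by G
        structure.append((targets[u], targets[v]))
    return new_tape

# First application of an I1-like instruction to a a a b b b c c c:
tape = [(i + 1, 0, s) for i, s in enumerate("aaabbbccc")]
structure = []
ops = [("pb", None), ("dl", None), ("dl", None), ("wr", "X")]
new = apply_enhanced(tape, r"a*(a)(a)(b)[bX]*(c)c*", ops,
                     [(1, 0), (2, 3)], structure)
print(new)        # the reduced tape, as in the figure above
print(structure)  # newly inserted DR-structure edges
```
      </preformat>
      <p>With this data the reduced tape is [(1,0,a), (2,0,a), (5,0,b), (6,0,b), (7,1,X), (8,0,c), (9,0,c)], matching the second tape content shown above (sentinels omitted).</p>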
<p>Example 2: Let I2 = ((c, [a]pb, λ, [b]pb, X*, [c]pb, $, Accept), D2) be an enhanced accepting instruction with the graph D2 of the following form:</p>
      <p>[Figure: the graph D2 on the nodes 1, 2, 3.]</p>
<sec id="sec-6-1">
        <p>Example 1: Let I1 = ((c·a*, [a]pb, λ, [a]dl, λ, [b]dl, b*·X*, [c]wr[X], c*·$, Restart), D1) be an enhanced restarting instruction with the graph D1 of the following form:</p>
        <p>[Figure: the graph D1 on the nodes 1, 2, 3, 4 and 4'.]</p>
        <p>After applying I2 on the resulting DR-structure from Example 1 we obtain the following final DR-structure:</p>
        <p>[Figure: the final DR-structure over the items [2,0,a], [5,0,b], [6,0,b], [7,0,c], [8,0,c], [7,1,X], [8,1,X] and [9,0,c].]</p>
        <p>Computation of enhanced restarting automata. A (restarting) configuration C = (T, D) of a computation by an sERL-automaton Me consists of the set T of items representing the current tape content and a DR-structure D. The initial configuration C0 = (Tw, D∅) for an input word w = x1 ... xn consists of the set of items representing the initial content of the tape, Tw = {[0,0,c]} ∪ {[i,0,xi] | i = 1, ..., n} ∪ {[n+1,0,$]}, and the empty DR-structure D∅ = (∅, ∅). By an application of an enhanced instruction I, a configuration C is transformed into a new configuration C'; the tape content is changed and the DR-structure can grow.</p>
        <p>An input word w is accepted by Me if there exists a computation starting with the initial configuration for w which ends in a configuration Ca by an application of an accepting enhanced instruction. The result of the computation is the DR-structure of Ca.</p>
        <p>Similarly as for sRL-automata, we define the characteristic language of Me as LC(Me) = {w | Me accepts w}. Moreover, the DR-language of Me is the set DR(Me) = {D | D is a result of some accepting computation of Me}.</p>
      </sec>
      <sec id="sec-6-1-1">
        <title>Enhanced automata with several layers</title>
        <p>First, we introduce a technical notion of projection. The projection from an alphabet onto a sub-alphabet is the morphism that maps every symbol of the sub-alphabet to itself and every other symbol to the empty word. Similarly, we define the projection of languages, applying the morphism elementwise.</p>
        <p>Similarly as above, we introduce a projection for DR-structures. Let D be a DR-structure from a DR-language over an alphabet; the projection of D onto a sub-alphabet is the DR-structure obtained from D by removing all nodes with (at least one) symbol outside the sub-alphabet and all edges incident to those nodes. A projection may be in an obvious way extended onto a projection of a DR-language.</p>
        <p>Let a sequence of pairwise disjoint alphabets, for some j ≥ 1, be given, and let their union be the characteristic alphabet of an sERL-automaton Mj. We say that Mj is an enhanced simple sERL-automaton with j layers ((j)-sERL-automaton for short) if the following assumptions are fulfilled:</p>
        <p>- Mj is correctness preserving;</p>
        <p>- Mj is allowed to rewrite a symbol from the i-th sub-vocabulary (for 1 ≤ i ≤ j) by a symbol from the same sub-vocabulary only. We refer to the symbols from the i-th sub-vocabulary as the symbols on layer i (or i-symbols).</p>
      </sec>
      <sec id="sec-6-1-2">
        <title>FGD as a (4)-sERL-automaton</title>
        <p>Our ultimate goal is to model FGD; we consider a (4)-sERL-automaton MFD to be a suitable formal frame for this linguistic theory. The characteristic alphabet of MFD consists of four parts which correspond to the respective layers of FGD (Section 1.2). Recall that the symbols from individual layers can be quite complex.</p>
        <p>A language of layer ℓ ∈ {w, m, a, t} accepted by MFD is obtained as a projection of the characteristic language onto the vocabulary of layer ℓ, i.e., Lℓ(MFD) = Prℓ(LC(MFD)). The characteristic language LC(MFD) contains input sequences (over the w-layer vocabulary) interleaved with metalanguage information in the form of symbols from the m-, a- and t-layer vocabularies. Then, the language of correct sentences of the studied natural language is Lw = Prw(LC(MFD)). In our case, it defines the set of correct Czech sentences.</p>
        <p>Similarly, a DR-language of layer ℓ accepted by MFD is obtained as DRℓ(MFD) = Prℓ(DR(MFD)), ℓ ∈ {w, m, a, t}. Let us note that DRw(MFD) and DRm(MFD) are empty (Lw and Lm are string languages). Further, DRa(MFD) and DRt(MFD) are languages of DR-trees. Each DR-tree T ∈ DRt(MFD) is projective (with respect to its descendants); that is, for each node n of the DR-tree T all its descendants (including also the node n) constitute a contiguous sequence in the horizontal ordering of the nodes of the tree T. On the other hand, trees from DRa(MFD) can be in general non-projective.</p>
        <p>The DR-language DRt(MFD) represents the set of meaning descriptions in FGD whereas DRa(MFD) models the set of (surface) syntactic trees. Let us note that Lt(MFD) is designed as a deterministic context-free language. Readers familiar with restarting automata can see that LC(MFD) is a deterministic context-sensitive language and Lw(MFD) is a context-sensitive language.</p>
        <p>We have not mentioned so far those edges of the DR-structures from DR(MFD) which connect nodes (their symbols) on different layers. These edges serve for connecting the corresponding lexical units on different layers (see the dotted lines in Figure 1) and are obtained by applications of extended restarting meta-instructions of MFD. Here such an edge connects nodes on neighboring layers only; i.e., these edges connect w-layer nodes to m-layer nodes, m-layer nodes to a-layer nodes, a-layer nodes to t-layer nodes, and nothing else (see Figure 1).</p>
      </sec>
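      <p>A minimal sketch of the projections used above, with single-character symbols as an illustrative simplification (the function names are ours):</p>
      <preformat>
```python
# Sketch of the projection onto a sub-alphabet: a morphism erasing
# all symbols outside the sub-alphabet; on DR-structures it drops
# nodes carrying such symbols together with their incident edges.
def project_string(word, sub):
    return "".join(ch for ch in word if ch in sub)

def project_structure(nodes, edges, sub):
    kept = {n for n in nodes if n[2] in sub}
    return kept, {(u, v) for (u, v) in edges if u in kept and v in kept}

# A characteristic string: w-symbols (lowercase) interleaved with
# metalanguage symbols (uppercase) from the higher layers.
print(project_string("aMbAcT", {"a", "b", "c"}))  # -> 'abc'
```
      </preformat>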
<sec id="sec-6-2">
        <title>Concluding remarks</title>
        <p>In this paper, encouraged by [9, 10], we extend the formal model of natural language description based on FGD so that it outputs neither lists of words nor lists of symbols but the so-called reduction language and a set of complex DR-structures. We plan to study the relation between reduction languages and DR-languages more deeply in the near future.</p>
<p>The novelty of this contribution consists in (i) the extension of enhanced restarting automata in order to produce tree structures with several interlinked layers and (ii) the application of these automata to the stratificational description of a natural language. Moreover, we outline a formalization of the basic methodology of FGD in terms derived from the automata theory and from the theory of parsing schemata as well. We envisage that the proposed methodology is not FGD-specific and that a similar approach can be used to obtain a formal frame for other language descriptions, as e.g. those presented in [<xref ref-type="bibr" rid="ref6">6</xref>] and [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. M. Lopatkova, M. Platek, V. Kubon: <article-title>Modeling syntax of free word-order languages: dependency analysis by reduction</article-title>. In: Matousek, V., Mautner, P., Pavelka, T. (eds), <source>Proceedings of Text, Speech and Dialogue International Conference, TSD 2005</source>, Volume <volume>3658</volume> of LNAI, Berlin Heidelberg, Springer-Verlag, <year>2005</year>, <fpage>140</fpage>–<lpage>147</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. M. Lopatkova, M. Platek, P. Sgall: <article-title>Towards a formal model for functional generative description, analysis by reduction and restarting automata</article-title>. <source>The Prague Bulletin of Mathematical Linguistics</source>, <volume>87</volume>, <year>2007</year>, <fpage>7</fpage>–<lpage>26</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. M. Platek, F. Mraz, M. Lopatkova: <article-title>Restarting automata with structured output and functional generative description</article-title>. In: Dediu, A., Fernau, H., Martin-Vide, C. (eds), <source>Proceedings of the Fourth International Conference Language and Automata Theory and Applications, LATA 2010</source>, Berlin Heidelberg, Springer-Verlag, <year>2010</year>, <fpage>500</fpage>–<lpage>511</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. P. Sgall, E. Hajicova, J. Panevova: <article-title>The Meaning of the Sentence in Its Semantic and Pragmatic Aspects</article-title>. Reidel, Dordrecht, <year>1986</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. J. Hajic, E. Hajicova, J. Panevova, P. Sgall, P. Pajas, J. Stepanek, J. Havelka, M. Mikulova: <article-title>Prague Dependency Treebank 2.0</article-title>. Linguistic Data Consortium, Philadelphia, PA, USA, <year>2006</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. I. A. Mel'cuk: <article-title>Dependency Syntax: Theory and Practice</article-title>. State University of New York Press, Albany, <year>1988</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. T. Holan, V. Kubon, K. Oliva, M. Platek: <article-title>Two useful measures of word order complexity</article-title>. In: Polguere, A., Kahane, S. (eds), <source>Processing of Dependency-Based Grammars: Proceedings of the Workshop (COLING/ACL'98)</source>, Montreal, ACL, <year>1998</year>, <fpage>21</fpage>–<lpage>28</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. H. Messerschmidt, F. Mraz, F. Otto, M. Platek: <article-title>Correctness preservation and complexity of simple RL-automata</article-title>. In: <source>Implementation and Application of Automata</source>, Volume <volume>4094</volume> of LNCS, Berlin Heidelberg, Springer-Verlag, <year>2006</year>, <fpage>162</fpage>–<lpage>172</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. R. Gramatovici, C. Martin-Vide: <article-title>Sorted dependency insertion grammars</article-title>. <source>Theor. Comput. Sci.</source>, <volume>354</volume>(<issue>1</issue>), <year>2006</year>, <fpage>142</fpage>–<lpage>152</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. S. Bensch, F. Drewes: <article-title>Millstream systems</article-title>. <source>Report UMINF 09.21</source>, Umea University, <year>2009</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. J. Kunze: <article-title>Abhangigkeitsgrammatik</article-title>. Volume XII of Studia Grammatica, Akademie Verlag, Berlin, <year>1975</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>