Autonomous Applications – Towards a Better Data Integration Model

András Benczúr¹, Zsolt Hernáth¹, and Zoltán Porkoláb²

¹ Eötvös Loránd University, Faculty of Informatics, Dept. of Information Systems, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
abenczur@ludens.elte.hu, hernath@ullman.inf.elte.hu

² Eötvös Loránd University, Faculty of Informatics, Dept. of Programming Languages and Compilers, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
gsd@elte.hu

Abstract. One of the most important and critical parts of integrating already existing standalone applications is designing and implementing a common data model and the corresponding data access layer, which make both data sources and processed results shared and accessible across the applications in question. Even for well-architected applications or application systems, establishing a common data model and the layer that gives access to the data costs relatively large human and computer development resources. The problem of integration may be investigated from several aspects. The essence of these approaches is the same: trying to achieve application logic that is independent of the run-time environment. One such aspect is OMG's Model Driven Architecture framework [5]. The primary goals of OMG's MDA are portability, interoperability and reusability through an architectural separation of concerns: specifying the applications' logic, their operational environments, the technical aspects of their implementation details, and the mappings between them. This paper views the same problem, but with a focus on the different structural appearances of applications' data, the mappings between them, and the possible integration of such data models. We call applications autonomous if they are independent of their current run-time data access environment.
Concerning applications' autonomy, our basic idea is that the most natural medium able to carry information on the structural appearance of data and the mappings between them is the data themselves, using the Document Data Model, which is also presented here.

1 Introduction

Applied computer science and informatics now play a central role in our everyday life. The possibility of obtaining scientific, technical, business and general everyday information on-line from the World Wide Web, together with the availability of personal computers and Internet access at reasonable prices, has changed our everyday life. In professional life, beyond access to information, processing and integrating information coming from different fields and from heterogeneous data sources is more important, so information processing needs integrated application systems rather than standalone applications. Information industry vendors provide various integrated application and information processing systems; some of them offer much less than what could often be achieved by bringing together already existing and efficiently used standalone applications from the same or different technical fields. There are many existing and still efficiently used standalone applications, and for some of them it is reasonable to bring them together with others.

Figure 1 below shows two standalone applications with heterogeneous data sources; Figures 2 and 3 present two versions of an integrated system consisting of the standalone applications of Figure 1. As can be seen, the integrated system in Figure 2 does not implement a unified common shared data model. The only thing that has happened is that a common I/O layer has been established in order to provide bidirectional mapping, and even cross mapping, between data source models and application data models. In practice, the integration middleware implements an interface to the union of the applications' domains.
The integrated system in Figure 3 provides a unified common application and data source model, as well as a common unified I/O layer, for the integrated application system.

Fig. 1. Standalone applications

A significant part of the knowledge necessary to perform such conversions consists of data mappings that are only partly application-specific, yet some of them (e.g. retrieval and storage of particular pieces of data) are sometimes hard-coded, even in well-architected applications. In contrast to all of the above, we call applications autonomous if they are independent of their current run-time data access environment. The notion of applications' autonomy was in fact conceived as a generalization of an XML document based common data model, called Document Data Model, which was one of our proposals for a real industrial need: integrating the two standalone CAD applications shown in Figure 1.

Fig. 2. Traditional solution 1. – a kind of gluing together.

The two standalone applications were designed and partly implemented in heterogeneous run-time data environments, and they even established different application object models according to different standards: the CIS/2 and IFC standard object models [1–3]. Though the two standards are related to each other, they were defined for different goals. Instead of establishing a common object model and, e.g., a relational storage model with its middleware for the integrated application system, our proposal was a DOM (Document Object Model) based middleware (object cache), and an XML document based mapping between CIS/2 and IFC as well as the corresponding storage environments [4, 6]. To support the implementation of the DOM based middleware, a partial EXPRESS parser [1] was implemented that generates C++ class skeletons and the necessary type definitions from EXPRESS schema descriptions.

Fig. 3. Traditional solution 2. – another kind of gluing together.
Autonomous applications' data access from and to their current run-time data environment is modeled by applying mappings between abstract domains, whose semantics is defined by the real run-time domains and the data access engine. Concerning applications' autonomy, our basic idea is that the most natural, and probably the most competent, medium able to carry information on the semantics of abstract domains and the mappings between them is the data themselves.

Any application can in general be considered as a sequence of optional data retrieval from its run-time environment, data processing, and optional display or storage/restoration of the processed data. We would like to point out that, from this general perspective, the word data is used in a quite general sense: it may refer to values of particular domains including numbers, letters, pictures, events, processes and also processors. Accordingly, an application may formally be viewed as a 3-tuple $A(D, E, \alpha)$, where $D$ is a strictly application-specific non-persistent data model (a set of containers of data of atomic and/or constructed, and also constrained, types) called application cache memory, $E$ is called the run-time data environment, which in general implements persistent application data, and $\alpha : D \times E \mapsto E$ is a mapping called application logic. In the above model, $\alpha$ implements all the necessary non-application-specific knowledge about accessing data from and to $E$. To purge all non-application-specific knowledge concerning data access from the application logic, we develop a finer and more specific model that basically concerns mappings between two particular domains: one that models industrial, scientific or business technologies, called schema, and another that models a set of data carrier means, which together are called a medium; some of them preserve data, others make data visible in some particular form.
In our model, mappings between schemata and media are also considered as an abstract domain, called access environment.

2 Schema, Medium and Abstract Access Environment

Assume there is a finite set $T_a$ of named (i.e. uniquely identified) atomic (i.e. unstructured) types, a finite set $CS = \{\mathit{device}, \mathit{file}, \mathit{record}, \mathit{field}\}$ of what are called container sorts, and a finite set of rules by means of which complex data types and container sorts can be composed. Assume furthermore that for each atomic data type there is a value null, and that atomic data types are ordered sets, that is, two instances of the same atomic type are comparable. For any atomic data type, an instance of value null indicates that no particular value is instantiated. Whenever we speak about data types, we always think of finite subsets of infinite sets of particular values, and whenever we speak about data or container sorts, we always think of some structuring of complex values or containers. For instance, if $int \in T_a$, then $int$ is a finite subset of the set $\mathbb{Z} = \{0, +1, -1, +2, -2, \ldots\}$ of whole numbers. Staying with this example, $int$ is an unstructured or raw type. In contrast to constructed types, there is no way to refer to a "part" of an instance of a raw type. For example, there is no way to refer to the value 23 as the part of the value 123 that lies above 100. In the case of constructed types, particular parts of data instances can be referred to according to the sorts of such types.

2.1 Schema

Data types define an application's internal data model, also referred to as the application domain. For our prospective model only a minimal data type system is developed, just for illustration, as follows:

(a) members of $T_a$ are data types;
(b) if for each $j \in [1, m]$, for some natural number $m$, the $x_j$ are distinct (attribute) names and the $t_j$ are (named) data types, then $t\langle K_P \Rightarrow [x_1 : t_1, \ldots, x_m : t_m]\rangle$ is a data type named $t$, called entity. $K_P$ is a (possibly empty) subset of the set $\{x_1, \ldots, x_m\}$, keeping this order, and is called primary key constraint. Sometimes we use the notation $t.y$ to refer to an attribute named $y$ of a type named $t$;
(c) if $t$ is a data type, then $v\{t\}$ is a data type named $v$, called bag. An instance of type $\{t\}$ is a bag of instances of type $t$;
(d) given the data types $t_1, \ldots, t_m$, $u[t_1, \ldots, t_m]$ is a data type named $u$, called union. An instance of type $u$ is an instance of any one of the types $t_1, \ldots, t_m$;
(e) given a finite set of named types $T$ and a finite set of referential constraints $C$, a pair $S(T, C)$ is a data type named $S$, called schema. A referential constraint is of the form $\varphi \Leftarrow \{\psi_1, \ldots, \psi_r\}$, where $\varphi$ is a parent term and $\psi_j$, for each $j \in [1, r]$, is a child term. Given $t\langle K_{p_t} \Leftarrow [x_1 : t_1, \ldots, x_{m_t} : t_{m_t}]\rangle$, a parent term for type $t$ is $K_{p_t}$ itself, of the form $\{t.x_{i_1}, \ldots, t.x_{i_{n_t}}\}$. A child term for the parent term above, and for type $u\langle K_{p_u} \Leftarrow [y_1 : u_1, \ldots, y_{m_u} : u_{m_u}]\rangle$, is of the form $u[\varrho_{m_{u_1}} y_{m_{u_1}}, \ldots, \varrho_{m_{u_s}} y_{m_{u_s}}]$, where $\{y_{m_{u_1}}, \ldots, y_{m_{u_s}}\} \subseteq \{y_1, \ldots, y_{m_u}\}$, $s \le n_t$, and $\varrho_{m_{u_l}}$, for each $l \in [1, s]$, stands for one of the binary relations equality ($=$), element of ($\in$), subset ($\subseteq$) and superset ($\supseteq$), depending on the types $t_{m_{t_l}}$ and $u_{m_{u_l}}$. A child term is called a foreign key constraint if it refers to each attribute of the primary key constraint of the parent term;
(f) if $t$ is a data type, then $r[*t]$ is a data type named $r$, called reference to $t$.

Based on the comparability of instances of atomic data types, we assume that instances of the same constructed type are also comparable, including bags. Accordingly, if $T$ is a data type, $U$ and $V$ are instances of type $\{T\}$, and $it$ is an instance of type $T$, then expressions like $it \in U$ and $U \subseteq V$ are computable comparisons.

2.2 Medium

For container sorts, a rather strict hierarchy with device at the top is available.
Possible constructions of container sort hierarchies are as follows:

(a) a named device is a particular container sort that may be either raw (unstructured) or sorted as a finite set of named files. A device is homogeneous with respect to accessing its data;
(b) a named file is a finite set of possibly differently structured (or sorted) named records. To specify the structure of a file, XML-like regular expressions are used;
(c) a named record is a tuple of named fields. Describing a record is similar to describing an entity type, but named container sorts are used instead of named types. Just like an entity type, a record may also specify a primary key constraint in the same form, but the role of the set of attributes is played here by a set of named fields;
(d) a named field may be either raw, or of the sort of any named container sort except field, or of the sort of a reference to any named container sort except itself. There is a difference between a container sort and a reference to it: an instance of a reference to a named container sort refers to a uniquely identified instance of that container sort;
(e) if $s_1, \ldots, s_l$ are named container sorts, $s_u[s_1, \ldots, s_l]$ is a container sort named $s_u$, called union. An instance of sort $s_u$ is an instance of any one of the sorts $s_1, \ldots, s_l$;
(f) given a finite set $D$ of raw or sorted named devices and a finite set $C$ of referential constraints, a pair $M(D, C)$ is a container sort named $M$, called medium. Referential constraints here have the same form as those of schemata, but the roles of types and attributes are now played by named records and named fields, respectively;
(g) if $s$ is a named container sort, $r_s[*s]$ is a container sort named $r_s$, called reference to $s$.

2.3 Abstract Access Environment

Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, suppose that $T_S = T_P \cup T_C$ with $T_P \cap T_C = \emptyset$ holds.
A total mapping $\sigma : S \mapsto 2^M$ is a serialization of schema $S$ over the medium $M$ if the following rules are satisfied.

– The inverse mapping $\sigma^{-1} : 2^M \mapsto S$ is a partial mapping such that $\sigma^{-1}(F_M)$, for some $F_M \subset 2^M$, is defined iff there is $T \subseteq S$ such that $F_M = \bigcup_{t \in T} \sigma(t)$, and $\sigma \circ \sigma^{-1}$ on each $T \subseteq S$ is the identity.
– Let $\Pi \subset 2^{T_P}$ denote a partition over $T_P$ (i.e. $\bigcup \Pi = T_P$ and the elements of $\Pi$ are pairwise disjoint); then for each $T \in \Pi$, let $\sigma(T) = D$ be such that
  (i) $D \in D_M$ is a raw device;
  (ii) $\sigma$ defines and preserves a particular ordering over $\Pi$, and also over each $T \in \Pi$;
  (iii) for each distinct pair $U, V \in \Pi$, $\sigma(U) \neq \sigma(V)$ holds.
– Each $t \in T_C$ is mapped over named fields of $M$ under the particular sort and constraint restrictions listed below:
  (i) distinct elements of $T_a \cap T_C$ are mapped to distinct raw fields: a field that an element of $T_a$ is already mapped to becomes of that type;
  (ii) a bag type is mapped to a field of sort file, or of sort reference to a file;
  (iii) a reference to a type is mapped to a field of sort reference to an adequate sort. For set types, files are adequate sorts. For a type $\{t\} \in \Pi$, the device that $t$ is mapped to is an adequate sort. For entity types, adequate sorts are records or fields;
  (iv) for entity types, attributes of a type other than an entity type are mapped according to (i)-(iii). Attributes of some entity type are mapped as if they were a reference to an entity type, or to a field of sort reference to a record;
  (v) $\sigma$ preserves type integrity: assume that the images under $\sigma$ of the attributes of type $t\langle K_{p_t} \Leftarrow [x_1 : t_1, \ldots, x_m : t_m]\rangle$ with $K_{p_t} = \{x_{i_1}, \ldots, x_{i_n}\}$ are distributed over the records $r_1, \ldots, r_s$ with primary key constraints $K_{p_{r_1}}, \ldots, K_{p_{r_s}}$. Assume furthermore that for each $k \in [1, s]$ there exists a $c_{M_k} \in C_M$ of the form $\varphi_{r_k} \Leftarrow \{\psi_{k_1}, \ldots, \psi_{k_p}\}$, and consider the foreign key constraints $\varphi_{r_k} \Leftarrow \{\psi_{r_j} : j \in [1, s], j \neq k\}$ for each $k \in [1, s]$, where $\varphi_{r_k}$ refers to $r_k$ as parent and $\psi_{r_j}$, for each $j \in [1, s]$ with $j \neq k$, refers to $r_j$ as a child. Let $C_k$ denote the set $\{\psi_{r_j} \mid j \in [1, s], j \neq k\}$. Then, on the one hand, $\bigcup_{l=1}^{s} K_{p_{i_l}} \subseteq \{\sigma(t.x_{i_1}), \ldots, \sigma(t.x_{i_n})\}$, and on the other hand, $C_k \subseteq \{\psi_{k_1}, \ldots, \psi_{k_p}\}$ shall hold;
  (vi) $\sigma$ also preserves the schema's integrity: keeping the notation of (v), assume in addition that the images of the types $u_1, \ldots, u_r$ in the foreign key constraint $\{t.x_{i_1}, \ldots, t.x_{i_n}\} \Leftarrow \{u_1[\varrho_{u_{1_1}} y_{u_{1_1}}, \ldots, \varrho_{u_{1_n}} y_{u_{1_n}}], \ldots, u_r[\varrho_{u_{r_1}} y_{u_{r_1}}, \ldots, \varrho_{u_{r_n}} y_{u_{r_n}}]\}$ are spread over the records $R_{u_{1_1}}, \ldots, R_{u_{1_{q_1}}}, \ldots, R_{u_{r_1}}, \ldots, R_{u_{r_{q_r}}}$, and consider the foreign key constraints $\varphi_{r_k} \Leftarrow \{\psi_{r_k u_{m_j}} : m \in [1, r], j \in [1, q_m]\}$ for each $k \in [1, s]$, where $\varphi_{r_k}$, just as in (v), refers to $r_k$ as parent and $\psi_{r_k u_{m_j}}$ refers to $R_{u_{m_j}}$ as child. Then for the foreign key constraint $c_{M_k}$, for each $k \in [1, s]$, $\{\psi_{r_k u_{m_j}} : m \in [1, r], j \in [1, q_m]\} \subseteq \{\psi_{k_1}, \ldots, \psi_{k_p}\}$ shall hold.

Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, $S$ is said to be serializable over $M$ if there is some serialization $\sigma$ of schema $S$ over medium $M$. If so, a 3-tuple $\varepsilon(S, M, \sigma)$ is called an abstract access environment for $S$ and $M$.

3 Instantiation, Data Exchange, Autonomy

In the following, from the point of view of autonomy, the base abstract domains are schema, medium, and access environment. Applications' data environments are modeled as instances of the domains above.

3.1 Instantiation

An instantiation of a schema $S$ is a swarm (i.e. a bag or set) of instances of types of $S$. A schema instantiation is allowed to produce either replicas of the same instance of the same type, or no instance at all of particular types. We would like to point out that we are not concerned here with whether or not a schema instance is consistent, i.e. whether no primary or foreign key constraint is violated. This is the responsibility of the application that creates instances and operates on them.
An instantiation of a medium $M$ serves to carry, i.e. temporarily store, preserve, and display, data. A medium instance is a finite set of uniquely identified instances of named devices, each of which may be either raw or sorted. A sorted named device is a finite set of instances of named files, each of them uniquely identified on the device. Unlike schema instantiations, a medium instance is allowed to have replicas of instances (i.e. instances that carry replicas of type instances) only for the container sorts record and field. Instances of records are tuples of instances of fields.

Recall that we want application data to carry all necessary non-application-specific information; from now on we therefore assume that instances of types, and also of container sorts, carry all necessary information on their types and sorts. In addition, when we consider instances of schemata and media, they are assumed to carry the schemata's and media's integrity constraints.

Given a schema $S$, a medium $M$, and an abstract access environment $\varepsilon(S, M, \sigma)$ for $S$ and $M$, an instance over $S$ can easily be serialized over an instance over $M$ as induced by $\sigma$: unless primary key constraints are violated, each atomic piece of data (data of atomic types) of the instance over $S$ has to be mapped to a new instance of the container sort (i.e. a container of that sort) to which $\sigma$ maps the type of the atomic data. The mapping induced by $\sigma$ is denoted by $\hat{\sigma}$, and a medium instance that carries the serialized data of a schema instance is called the total serialization of that schema instance. Similarly, the inverse mapping $\sigma^{-1}$ also induces a partial mapping, denoted by $\hat{\sigma}^{-1}$, which maps particular subsets of the serialized data of an instance over a medium to subsets of an instance over a schema. For given instances $I_S$ and $I_M$ over $S$ and $M$, a 3-tuple $\Delta(I_S, I_M, \hat{\sigma})$ is considered an instantiation of $\varepsilon(S, M, \sigma)$, and is called an abstract access method between the instances $I_S$ and $I_M$. Since the instances $I_S$ and $I_M$ carry all necessary information on $S$ and $M$, the induced mappings $\hat{\sigma}$ and $\hat{\sigma}^{-1}$ are given for the abstract access method $\Delta(I_S, I_M, \hat{\sigma})$ between $I_S$ and $I_M$.

3.2 Data Exchange

Assume schema $S$ is serializable over medium $M$, and let $\varepsilon(S, M, \sigma)$ be an abstract access environment for $S$ and $M$. Let $\mathcal{I}_S$ denote the (infinite) set of all possible instances over $S$, and let $\mathcal{I}_{M_{\mathcal{I}_S}}$ denote the set of total serializations of all elements of $\mathcal{I}_S$, i.e. the set $\{\hat{\sigma}(I_S) : I_S \in \mathcal{I}_S\}$. A $K_S \in \mathcal{I}_S$ is called a key expression over $S$ if it is a set. An element of a key expression over $S$ is called a key term over $S$. A key term is considered an instance pattern, and determines a (possibly empty) subset or sub-bag of any particular $I_S \in \mathcal{I}_S$ based on some type-dependent pattern matching (e.g. in the case of atomic types, type-dependent pattern matching means checking the equality of values of the same type), in which null values are not considered. Since serializations map integrity constraints of schemata onto integrity constraints of media, the image of a key expression over $S$ by $\hat{\sigma}$ is a key expression over $\mathcal{I}_{M_{\mathcal{I}_S}}$. Given particular instances $I_S \in \mathcal{I}_S$, $I_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$ and key expressions $K_S \in \mathcal{I}_S$, $K_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$, let $I_S(K_S) \subseteq I_S$ and $I_M(K_M) \subseteq I_M$ denote the subsets that match $K_S$ and $K_M$, respectively; these are themselves particular instances over $S$ and $M$. Let $K_S \in \mathcal{I}_S$ be a key expression; the abstract access method $\Delta(I_S, I_M, \hat{\sigma})$ between $I_S$ and $I_M$, together with the key expression $K_S$, determines the particular instances $I_S - I_S(K_S) \cup \hat{\sigma}^{-1}(I_M(\hat{\sigma}(K_S))) \in \mathcal{I}_S$ and $I_M - I_M(\hat{\sigma}(K_S)) \cup \hat{\sigma}(I_S(K_S)) \in \mathcal{I}_{M_{\mathcal{I}_S}}$, called a partial input data access from $I_M$ into $I_S$ and a partial output data access from $I_S$ onto $I_M$, respectively. In the above formulas, only $K_S$ is outside the scope of $\Delta(I_S, I_M, \hat{\sigma})$, so one may associate a tuple $(K_S, \Delta(I_S, I_M, \hat{\sigma}))$ with the set $I_S - I_S(K_S) \cup \hat{\sigma}^{-1}(I_M(\hat{\sigma}(K_S)))$, and a tuple $(\Delta(I_S, I_M, \hat{\sigma}), K_S)$ with the set $I_M - I_M(\hat{\sigma}(K_S)) \cup \hat{\sigma}(I_S(K_S))$.
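
The partial input and output data accesses defined above can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions that are not part of the model: instances are flattened into sets of (name, value) atoms, the serialization is reduced to a one-to-one renaming of type names into field names, None stands for null, and the set difference is assumed to be taken before the union. The names `induce`, `select`, `partial_input` and `partial_output` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

# Assumption: an atomic piece of data is a (name, value) pair, where the name is a
# type name on the schema side and a field name on the medium side; None is null.
Atom = Tuple[str, Hashable]
Instance = FrozenSet[Atom]
Mapping = Callable[[Instance], Instance]


def induce(sigma: Dict[str, str]) -> Tuple[Mapping, Mapping]:
    """Lift a name-level serialization to instances, together with its inverse."""
    inverse = {field: type_name for type_name, field in sigma.items()}
    if len(inverse) != len(sigma):
        raise ValueError("distinct types must be mapped to distinct fields")

    def forward(inst: Instance) -> Instance:
        return frozenset((sigma[name], value) for name, value in inst)

    def backward(inst: Instance) -> Instance:
        # Partial mapping: atoms of fields outside the image of sigma are skipped.
        return frozenset((inverse[name], value) for name, value in inst if name in inverse)

    return forward, backward


def select(inst: Instance, key: Instance) -> Instance:
    """I(K): atoms of inst matched by some key term; a null value in a term matches anything."""
    return frozenset(
        (name, value)
        for name, value in inst
        if any(name == k_name and (k_value is None or k_value == value)
               for k_name, k_value in key)
    )


def partial_input(i_s, i_m, key, forward, backward):
    """(I_S - I_S(K_S)) united with the inverse image of I_M(image of K_S)."""
    return (i_s - select(i_s, key)) | backward(select(i_m, forward(key)))


def partial_output(i_s, i_m, key, forward, backward):
    """(I_M - I_M(image of K_S)) united with the image of I_S(K_S)."""
    return (i_m - select(i_m, forward(key))) | forward(select(i_s, key))
```

With a serialization such as `{"id": "f_id", "qty": "f_qty"}` and the key expression `{("id", None)}`, a partial input replaces every `id` atom of the schema instance with the `id` atoms found on the medium and leaves the `qty` atoms untouched; a partial output does the same in the opposite direction.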
3.3 Autonomy

Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, assume $S$ is serializable over $M$ and $\varepsilon(S, M, \sigma)$ is an abstract access environment for $S$ and $M$. Let $\mathcal{I}_S$ denote the set of all possible instances over schema $S$, and $\mathcal{I}_{M_{\mathcal{I}_S}}$ the set of the total serializations of all elements of $\mathcal{I}_S$. Let $\mathcal{I}_{\varepsilon(S,M,\sigma)}$ denote all abstract access method instances over $\varepsilon(S, M, \sigma)$. Let $\mathcal{I}_S^{\{\}} \subset \mathcal{I}_S$ denote the set of possible key expressions over $S$. Let $a_c$, $d_c$ and $\delta_{(a_c, d_c, \hat{\sigma})}$, where $\hat{\sigma}$ is the instance-level mapping induced by $\sigma$, denote variables of types $*S$, $*M$ and $*\varepsilon(S, M, \sigma)$, called application cache, data carrier and access interface, respectively. For particular instances $I_S \in \mathcal{I}_S$ and $I_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$, let us assume that $*a_c = I_S$ and $*d_c = I_M$ hold, and let us furthermore define $A_D$ as $A_D = (\mathcal{I}_{\varepsilon(S,M,\sigma)} \cup \{\emptyset\}) \times \mathcal{I}_S^{\{\}} \cup \mathcal{I}_S^{\{\}} \times (\mathcal{I}_{\varepsilon(S,M,\sigma)} \cup \{\emptyset\})$. An autonomous application can now be formalized as $A(a_c, d_c, \delta_{(a_c, d_c, \hat{\sigma})}, \alpha)$, where $\alpha : A_D \mapsto \mathcal{I}_S \cup \mathcal{I}_{M_{\mathcal{I}_S}} \cup \mathcal{I}_{\varepsilon(S,M,\sigma)}$ is a mapping, called application logic, such that

(i) for each $a \in A_D$ of either of the forms $(I_S, \emptyset)$ or $(\emptyset, I_S)$, $\alpha(a)$ is a mapping over $\mathcal{I}_S$, called computation, that also affects the content of the variable $a_c$, and hence that of the variable $\delta_{(a_c, d_c, \hat{\sigma})}$;
(ii) for each $i \in A_D$ of the form $(K_S, \Delta(I_S, I_M, \hat{\sigma}))$, $\alpha(i)$ is a partial input from $I_M$ into $I_S$ that performs the assignment $a_c = I_S - I_S(K_S) \cup \hat{\sigma}^{-1}(I_M(\hat{\sigma}(K_S)))$, which also changes the content of the variable $\delta_{(a_c, d_c, \hat{\sigma})}$;
(iii) finally, for each $o \in A_D$ of the form $(\Delta(I_S, I_M, \hat{\sigma}), K_S)$, $\alpha(o)$ is a partial output from $I_S$ onto $I_M$ that performs the assignment $d_c = I_M - I_M(\hat{\sigma}(K_S)) \cup \hat{\sigma}(I_S(K_S))$, which changes the content of the variable $\delta_{(a_c, d_c, \hat{\sigma})}$.

4 Real Data Environment and Computer Aided Autonomy

Using the variables application cache, data carrier, and access interface is not mere formalism that helps us establish the formalization of autonomous applications presented above. The application cache is a main memory area in process space, which may actually be accessed through a pointer.
For the data carrier the situation is the same: in any programming language there are means to open files or to connect to a database, and what they return is really a pointer to some particularly structured data instance for an application. An access interface, which is to hold an instance of an abstract access environment, may seem a bit more artificial, just like data (instances of types) that carry all necessary knowledge about their types and integrity constraints. Yet one should realize that such instances are mappings between finite sets, induced by a mapping between a finite set of types and a finite set of container sorts, and such mappings can be implemented as data. The data model we need is introduced in the next subsection and is called the Generalized Document Data Model.

4.1 Generalized Document Data Model

The data model introduced here is a generalization of the Document Data Model (DDM), which was introduced in one of our proposals for a common data model for the applications in Figure 1. DDM is a data representation in which one can naturally encode information and structuring knowledge (also outside the scope of any application logic), and so it is one possible model that serves our need to achieve autonomous applications.

The Document Data Model is a semi-structured data model in which information about structuring and typing data is separated from the real content of the data. Data in DDM are represented in the form of ordered pairs $(R, S)$, where $R$ and $S$ are called raw data and schema, respectively. Both are simple byte streams, but $S$, if given, is always an XML document or document fragment [4].
The above organization of semi-structured data has two advantages. First, the same raw data can be structured in several ways, so there may be a number of pairs in which only the schema components differ, each of them providing the adequate structuring, and occasionally the additional semantics, that the current processing phase needs; such pairs can be stored separately while storing only one instance of the raw data. Second, the data representation above makes it possible to define structured data types and primitive abstract operations on data. For instance, data types and type templates in DDM are represented by pairs like $T(\emptyset, S)$, where $S$ is non-empty [6]. To instantiate such types or templates, one should provide the raw data component for such pairs. If the raw data component itself is a kind of schema, e.g. an XML general entity, a real type is instantiated. Pairs like $V(R, \emptyset)$, where $R$ is non-empty, are typeless (i.e. void) data. Typeless data, if meaningful, can be converted to any type by providing a schema component. One has to realize that the schema components are not only for structuring the raw data: they may also describe or identify methods characteristic of the structured type represented. Changing the schema component can be considered a cast operation. Finally, pairs $D(R, S)$, where neither $R$ nor $S$ is empty, constitute typed data instances.

A step forward from DDM leads us to the Generalized Document Data Model (GDDM), where the schema component of a piece of data specifies all non-application-specific knowledge of its flow through an application, including data retrieval and storage, and where, in addition, a global schema interpreter is available. Using GDDM, knowledge about data processing is carried by the data themselves, and the knowledge is interpreted by that schema interpreter. To obtain and interpret such knowledge, the knowledge carrier medium has to be accessed.
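
The three kinds of DDM pairs and the cast operation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `DDMPair` and its methods are hypothetical, empty components are represented by `None` (or empty bytes), and the well-formedness check assumes that the schema component has a single root element.

```python
from dataclasses import dataclass, replace
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class DDMPair:
    """A DDM datum (R, S): raw bytes plus an optional XML schema component."""
    raw: Optional[bytes] = None      # R, the raw data component
    schema: Optional[bytes] = None   # S, an XML document (single root assumed)

    def __post_init__(self):
        # S, if given, must be well-formed XML; the stdlib parser raises
        # xml.etree.ElementTree.ParseError otherwise.
        if self.schema:
            ET.fromstring(self.schema)

    def kind(self) -> str:
        if self.raw and self.schema:
            return "typed"      # D(R, S): neither component is empty
        if self.schema:
            return "type"       # T(empty, S): a type or type template
        if self.raw:
            return "typeless"   # V(R, empty): void data
        return "empty"

    def instantiate(self, raw: bytes) -> "DDMPair":
        """Supply the raw data component of a type or template."""
        if self.kind() != "type":
            raise ValueError("only pairs with an empty raw component can be instantiated")
        return replace(self, raw=raw)

    def cast(self, schema: bytes) -> "DDMPair":
        """Change the schema component; the raw bytes are left untouched."""
        return replace(self, schema=schema)
```

Because the raw component is never copied or altered by `cast`, several differently structured views of the same bytes can coexist, which mirrors the first advantage of the pair representation mentioned above.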
4.2 Implementation Considerations

In the abstract model of autonomous applications introduced in Section 3, the knowledge in question is implemented by the abstract access environment. Since an abstract access environment is a serialization, i.e. a structural mapping and type serialization, it is easily implemented by an XML document. The one difficulty, which is a discomfort rather than a crux, is writing such an XML document by hand. Instantiating an abstract access environment formally causes no difficulties either when using the variables application cache and data carrier, provided they already contain instances of the application domain and the medium. Instantiating the application domain is always a natural part of any application, but instantiating a medium requires communicating with some data carrier server, including a data storage and/or display engine, e.g. some RDBMS or Web browser. The latter gives semantics, among others, to the serialization induced by $\sigma$ and to its inverse mapping, and also provides means for evaluating a subset of a medium instance using a key expression.

Implementing the model presented above is nothing else than binding our abstract data environment to a real one. By such a binding, an autonomous application is integrated into a real run-time environment. When we speak of binding, we think of early binding, that is, abstract operations invoked by the application are linked to particular operation libraries provided by the current run-time environment. Our idea is that run-time environments that support the integration of autonomous applications provide DLLs that implement instantiations of the real data carrier media available there, and also a DOM or SAX based XML document processing middleware that implements the mappings induced by $\sigma$ and $\sigma^{-1}$ between real data carrier instances and application domains.
Integrating an autonomous application into such a run-time environment requires rebuilding the DLLs in order to extend their application domain knowledge. Type conflicts within a middleware, caused by collisions of type names of different applications, can be avoided by using XML namespaces when implementing abstract access environments.

4.3 Computer Aided Autonomy

Instead of following today's fashionable and usual way, i.e. establishing a GDDM based integrated autonomous application environment and providing GDDM oriented program development tools, we believe that traditional programming languages and their development environments, extended by a computer aided implementation of the desired GDDM representation of the data environment, will be much more popular and will give much more satisfaction to the community of professional application developers. The computer aided autonomy we propose is a computer generated GDDM representation of applications' schemata, media and abstract access environments. If there is a language in which abstract domains and the serializations between them can be defined, and which can easily be embedded into any host language, then we have a language (the host language that embeds the domain definition language) that is suitable for developing autonomous applications. We would like to point out that such a language is not meant to extend the host languages' domain definition capabilities, but rather to enable them to define abstract data environments. Preprocessing the embedded statements of a program generates a pure host language text and the GDDM representation of the abstract data environment (see Figure 4).

Fig. 4. Environment for integrating autonomous applications.

We are in the process of defining a language called LORD (Lay-Out Relationship and Domain definition language), which consists of two components.
One is developed to define schemata, media, and serializations of schemata over media; the other is a macro language that provides a way to implement context-dependent macro definitions, context-dependent and context-keeping macro expansion, and what are called meta-macro definitions, whose expansions are macro or meta-macro definitions.

5 Syntactical considerations

LORD is being designed for natural embedding into arbitrary host languages; in other words, a statement that includes embedded symbols should not look strange against the host language's style. As one way to achieve this, LORD language constructions are considered invocations of macros defined in an intelligent macro language that provides means for defining meta-macros (whose expansion results in macro definitions) and context-dependent macro expansions. The macro language with these features presented here is AAML (Autonomous Applications' Macro Language).

5.1 AAML

In AAML, any host language statement that embeds AAML macro invocations (an embedding statement may contain more than one macro invocation) may generate one or more pure host language statements, depending on the corresponding macro definitions. The recursive process of such text generation is controlled by the Complementary Expansion Method (CEM). CEM is based on invocation patterns and invocation contexts. Invocation patterns control formal parameter declarations and their value settings for macro definitions, and invocation contexts control statement completion and context-dependent text generation. Macro definitions may contain complete and also incomplete statements. When a macro invocation is expanded, complete statements are expanded in the usual way: all the macro invocations within the complete (embedding) statement are recursively expanded in the order of invocation, from left to right.
Incomplete statements, however, must first be completed, and the completed statement is then expanded in the usual way.

5.2 Other approaches

There have been many other approaches either to make programming languages syntactically extensible or to integrate domain-specific concepts into the host language.

Extensible Syntax [9] was introduced for incremental syntax definition by extending a core language. Syntax extensions were placed into the host language between special syntactical delimiters. The implementation is based on LL parsing. Because the class of LL languages is not closed under union and concatenation, syntax definition is sometimes uncomfortable.

The Java Syntactic Extender (JSE) [10] is a macro system. The expressiveness of the syntax that can be introduced is limited. As in most macro systems, a macro identifier is required in invocations, and the JSE parser is not extensible. In contrast to the above, AAML does not restrict the syntax of macro names.

Bravenboer and Visser give a detailed discussion in [8] of the syntactical extension of host languages with respect to domain-specific extensions, promoting the METABORG method. METABORG is a method for providing concrete syntax, mainly for domain abstractions, to application programmers. In contrast, our goal concerning LORD is not to extend the host language but rather to integrate and assimilate domain-specific concerns that otherwise could be formulated in the language. In this way our approach rather maps a certain domain into many host languages.

6 Conclusions

Application integration requires designing and implementing a common data model and the corresponding data access layer. The information and software industry hoped to find real solutions by establishing data standards for particular groups of application fields, like IFC and CIS/2. Such standards mostly concern data exchange formats and are highly domain specific. Another approach is, e.g.,
data warehouses, which implement a common data access layer over heterogeneously structured data sources for applications. The integration of applications of possibly different domains and/or following different standards is not solved in general and still costs a large amount of human and computer resources.

Autonomous applications using the Document Data Model can easily follow structural differences of data, and, in addition, the structural information of the data is carried by the data themselves. This integration model is also capable of expressing collaboration protocols between applications processing data of possibly different domains or using different standards. Thanks to comprehensive XML technology and a generative programming philosophy (like that of LORD), computer aided integration in our model may have a better chance.

References

1. Industrial automation systems and integration - Product data representation and exchange - Part 11: Description methods: The EXPRESS language reference manual. Reference Number ISO 10303-11:1994. ISO, Switzerland (1994)
2. Industrial automation systems and integration - Product data representation and exchange - Part 22: Implementation methods: Standard Data Access Interface specification. Reference Number ISO/DIS 10303-22. ISO, Switzerland (1993)
3. CIMsteel Integration Standards Release 2 (Second Edition). http://www.cis2.org/
4. Extensible Markup Language (XML) 1.0 (Third Edition). http://www.w3.org/TR/2003/PER-xml-20031030
5. Mukerji, J., Miller, J.: Overview and guide to OMG’s architecture. http://www.omg.org/docs/omg/03-06-01.pdf
6. Abiteboul, S., Buneman, P., Suciu, D.: Data on the WEB – From Relations to Semistructured Data and XML, W3C Proposed Edited Recommendation 2003, San Francisco (2000)
7. Hernath, Z., Vinceller, Z.: Generalized Document Data Model for Integrating Autonomous Applications. In Proceedings of International Conference of Applied Informatics (ICAI’6), Eger, Hungary (2003)
8.
Bravenboer, M., Visser, E.: Concrete Syntax for Abstract Objects: Domain-Specific Language Embedding. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’04), Vancouver, Canada (2004) 365–383
9. Cardelli, L., Matthes, F., Abadi, M.: Extensible Syntax with Lexical Scoping. SRC Research Report 121, DEC Systems Research Center (1994)
10. Bachrach, J., Playford, K.: The Java Syntactic Extender. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’01), Tampa, Florida (2001) 31–42