Autonomous Applications – Towards a Better
            Data Integration Model

            András Benczúr¹, Zsolt Hernáth¹, and Zoltán Porkoláb²

                 ¹ Eötvös Loránd University, Faculty of Informatics
                   Dept. of Information Systems
                   Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
                   abenczur@ludens.elte.hu, hernath@ullman.inf.elte.hu
                 ² Eötvös Loránd University, Faculty of Informatics
                   Dept. of Programming Languages and Compilers
                   Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
                   gsd@elte.hu


      Abstract. One of the most important and critical parts of integrating
      already existing standalone applications is to design and implement a
      common data model and the corresponding data access layer, which make
      both data sources and processed results shared and accessible across
      the applications in question. Even for well-architected applications
      or application systems, establishing a common data model and the layer
      that gives access to data costs relatively large human and computer
      development resources. The problem of integration may be investigated
      from several aspects. The essence of these approaches is the same:
      trying to make the applications' logic independent of the run-time
      environment. One aspect is OMG's Model Driven Architecture framework
      [5]. The primary goals of OMG's MDA are portability, interoperability
      and reusability through architectural separation of concerns:
      specifying the applications' logic, their operational environments,
      the technical aspects of their implementation details, and the
      mappings between them. This paper views the same problem but with
      focus on the different structural appearances of applications' data,
      the mappings between them, and the possible integration of such data
      models. We call applications autonomous if they are independent of
      their all-time run-time data access environment. Concerning
      applications' autonomy, our base idea is that the most natural medium
      able to carry information on the structural appearance of data and the
      mappings between them is the data themselves, using the Document Data
      Model also presented here.


1    Introduction
Applied computer science and informatics now play a central role in our
everyday life. The possibility of getting scientific, technical, business and
quite general everyday on-line information from the World Wide Web, together
with personal computers and Internet access at reasonable prices, has changed
our everyday life. In professional life, beyond access to information,
processing and integrating information coming from different fields and from
heterogeneous data sources is more important, so information processing needs
integrated application systems rather than standalone applications.
Information industry vendors provide various integrated application and
information processing systems; some of them offer much less than what could
often be achieved by bringing together already existing and efficiently used
standalone applications from the same or different technical fields. There
are many already existing and still efficiently used standalone applications,
and for some of them it is reasonable to bring them together with others.
    Figure 1 below shows two standalone applications with heterogeneous data
sources; Figures 2 and 3 present two versions of an integrated system
consisting of the standalone applications of Figure 1. As can be seen, the
integrated system in Figure 2 does not implement a unified common shared data
model. The only thing that happened is that a common I/O layer has been
established in order to provide bidirectional mapping, and even cross mapping,
between data source models and application data models. In practice the
integration middleware implements an interface to the union of the
applications' domains. The integrated system in Figure 3 provides a unified
common application and data source model, and a common unified I/O layer as
well, for the integrated application system.


                          Fig. 1. Standalone applications


A significant part of the knowledge necessary to perform such conversions
consists of data mappings that are only partly application-specific, yet some
of them (e.g. retrieval and storage of particular pieces of data) are
sometimes hard-coded, even in well-architected applications. In contrast to
all of the above, we call applications autonomous if they are independent of
their all-time run-time data access environment. The notion of applications'
autonomy has in fact arisen as a generalization of an XML document based
common data model, called Document Data Model, which was one of our proposals
for a real industrial need of integrating the two standalone CAD applications
shown in Figure 1.


            Fig. 2. Traditional solution 1. – a kind of gluing together.


The two standalone applications have been designed and partly implemented in
heterogeneous run-time data environments; moreover, they established different
application object models according to different standards: the CIS/2 and IFC
standard object models [1–3]. Though the two standards are related to each
other, they have been defined for different goals. Instead of establishing a
common object model and, e.g., a relational storage model and its middleware
for the integrated application system, our proposal was a DOM (Document Object
Model) based middleware (object cache), and an XML document based mapping
between CIS/2 and IFC as well as the corresponding storage environments [4, 6].
To support the implementation of the DOM based middleware, a partial EXPRESS
parser [1] has been implemented that generates C++ class skeletons and the
necessary type definitions from EXPRESS schema descriptions.


         Fig. 3. Traditional solution 2. – another kind of gluing together.

Autonomous applications' data access from and to their all-time run-time data
environment is modeled by applying mappings between abstract domains, the
semantics of which is defined by real run-time domains and the data access
engine. Concerning applications' autonomy, our base idea is that the most
natural, and probably the most competent, medium able to carry information on
the semantics of abstract domains and the mappings between them is the data
themselves.
    Any application can in general be considered as a sequence of optional
data retrieval from its run-time environment, processing of data, and
optionally displaying or storing/restoring processed data. We would like to
point out that, from the general perspective above, the word data is used in a
quite general sense: it may refer to values of particular domains including
numbers, letters, pictures, events, processes and also processors.
Accordingly, an application may formally be viewed as a 3-tuple
$A(D, E, \alpha)$, where $D$ is a strictly application-specific non-persistent
data model (a set of containers of data of atomic and/or constructed and also
constrained types) called application cache memory, $E$ is called run-time
data environment, which in general implements persistent application data, and
$\alpha : D \times E \mapsto E$ is a mapping called application logic. In the
above model $\alpha$ implements all necessary non-application-specific
knowledge about accessing data from and to $E$. To purge all
non-application-specific knowledge concerning data access from the application
logic, we develop a finer and more specific model that basically concerns
mappings between two particular domains: one that models industrial,
scientific or business technologies, called schema, and another one that
models a set of data carrier means, which together are called a medium; some
of them preserve data, others make data visible in some particular form. In
our model, mappings between schemata and media are also considered as an
abstract domain, called access environment.
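
As an illustration of the 3-tuple view, the following Python sketch models an application with a cache D, a run-time data environment E, and an application logic α that maps a pair from D × E to a new E. The dictionary shapes, the class name `Application`, and the function `store_total` are assumptions made only for this example; they are not part of the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Cache = Dict[str, int]        # D: application cache memory (hypothetical shape)
Environment = Dict[str, int]  # E: run-time data environment (hypothetical shape)


@dataclass
class Application:
    cache: Cache
    env: Environment
    logic: Callable[[Cache, Environment], Environment]  # alpha: D x E -> E

    def step(self) -> Environment:
        """Apply the application logic once and keep the resulting environment."""
        self.env = self.logic(self.cache, self.env)
        return self.env


def store_total(cache: Cache, env: Environment) -> Environment:
    # The key name "total" is hard-coded here: this is the kind of
    # non-application-specific access knowledge the finer model moves
    # out of the application logic.
    new_env = dict(env)
    new_env["total"] = sum(cache.values())
    return new_env


app = Application({"a": 2, "b": 3}, {"total": 0}, store_total)
result = app.step()
```

In this sketch the storage location is baked into α, which is exactly what the schema, medium and access environment introduced next are meant to avoid.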


2    Schema, Medium and Abstract Access Environment

Assume there is a finite set $T_a$ of named (i.e. uniquely identified) atomic
(i.e. unstructured) types, a finite set $CS = \{device, file, record, field\}$
of what are called container sorts, and a finite set of rules by means of
which complex data types and container sorts can be composed. Assume
furthermore that for each atomic data type there is a value null, and that
atomic data types are ordered sets, that is, two instances of the same atomic
type are comparable. For any atomic data type, an instance of value null
indicates that no particular value is instantiated. Whenever we speak about
data types, we always think of finite subsets of infinite sets of particular
values, and whenever we speak about data or container sorts, we always think
of some structuring of complex values or containers. For instance, if
$int \in T_a$, then $int$ is a finite subset of the set
$\mathbb{Z} = \{0, +1, -1, +2, -2, \ldots\}$ of integers. Staying with this
example, $int$ is an unstructured or raw type. In contrast to constructed
types, there is no way to refer to a "part" of an instance of a raw type. For
example, there is no way to refer to the value 23 as the part of the value 123
above 100. In the case of constructed types, particular parts of data
instances can be referred to according to the sorts of such types.


2.1    Schema

Data types define an application's internal data model, also referred to as
the application domain. For our model only a minimal data type system, just
for illustration, is developed as follows:

(a) members of $T_a$ are data types;
(b) if for each $j \in [1, m]$, for some natural number $m$, the $x_j$ are
     distinct (attribute) names and the $t_j$ are (named) data types, then
     $t\langle K_P \Leftarrow [x_1 : t_1, \ldots, x_m : t_m]\rangle$ is a data
     type named $t$, called entity. $K_P$ is a (possibly empty) subset of the
     set $\{x_1, \ldots, x_m\}$, keeping this order, and is called primary key
     constraint. Sometimes we use the notation $t.y$ to refer to an attribute
     named $y$ of a type named $t$;
(c) if $t$ is a data type, then $v\{t\}$ is a data type named $v$, called bag.
     An instance of type $\{t\}$ is a bag of instances of type $t$;
(d) given the data types $t_1, \ldots, t_m$, $u[t_1, \ldots, t_m]$ is a data
     type named $u$, called union. An instance of type $u$ is an instance of
     any one of the types $t_1, \ldots, t_m$;
(e) given a finite set of named types $T$ and a finite set of referential
     constraints $C$, a pair $S(T, C)$ is a data type named $S$, called
     schema. A referential constraint is of the form
     $\varphi \Leftarrow \{\psi_1, \ldots, \psi_r\}$, where $\varphi$ is a
     parent term and $\psi_j$, for each $j \in [1, r]$, is a child term. Given
     $t\langle K_{p_t} \Leftarrow [x_1 : t_1, \ldots, x_{m_t} : t_{m_t}]\rangle$,
     a parent term for type $t$ is $K_{p_t}$ itself, of the form
     $\{t.x_{i_1}, \ldots, t.x_{i_{n_t}}\}$. A child term for the parent term
     above, and for type
     $u\langle K_{p_u} \Leftarrow [y_1 : u_1, \ldots, y_{m_u} : u_{m_u}]\rangle$,
     is of the form
     $u[\varrho_{m_{u_1}} y_{m_{u_1}}, \ldots, \varrho_{m_{u_s}} y_{m_{u_s}}]$,
     where $\{y_{m_{u_1}}, \ldots, y_{m_{u_s}}\} \subseteq \{y_1, \ldots, y_{m_u}\}$,
     $s \le n_t$, and $\varrho_{m_{u_l}}$, for each $l \in [1, s]$, stands for
     one of the binary relations equality ($=$), element of ($\in$), subset
     ($\subseteq$) and superset ($\supseteq$), depending on the types
     $t_{m_{t_l}}$ and $u_{m_{u_l}}$. A child term is called a foreign key
     constraint if it refers to each attribute of the primary key constraint
     of the parent term;
(f) if $t$ is a data type, $r[*t]$ is a data type named $r$, called reference
     to $t$.

Based on the comparability of instances of atomic data types, we assume that
instances of the same constructed type are also comparable, including the case
of bags. Accordingly, if $T$ is a data type, $U$ and $V$ are instances of type
$\{T\}$, and $it$ is an instance of type $T$, then expressions like $it \in U$
and $U \subseteq V$ are computable comparisons.
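
A minimal Python sketch of two of the constructions above, entity types with a primary key constraint and bags with computable comparisons, may make the rules more tangible. The class and function names are hypothetical, and reading bag inclusion as a multiplicity-wise comparison is an assumption of this sketch rather than something the type system prescribes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entity:
    """Rule (b): a named tuple of typed attributes with a primary key."""
    name: str
    attributes: Tuple[Tuple[str, str], ...]  # (attribute name, type name) pairs
    primary_key: Tuple[str, ...] = ()        # K_P, possibly empty

    def __post_init__(self):
        names = [attr for attr, _ in self.attributes]
        if len(set(names)) != len(names):
            raise ValueError("attribute names must be distinct")
        if not set(self.primary_key) <= set(names):
            raise ValueError("primary key must be a subset of the attributes")


@dataclass(frozen=True)
class Bag:
    """Rule (c): v{t}, a bag of instances of the element type."""
    name: str
    element: str


def bag_contains(bag: Counter, item) -> bool:
    """it in U for a bag instance U."""
    return bag[item] > 0


def bag_subset(u: Counter, v: Counter) -> bool:
    """U subset of V: each element occurs in V at least as often as in U."""
    return all(v[item] >= count for item, count in u.items())


beam = Entity("beam", (("id", "int"), ("length", "int")), primary_key=("id",))
lengths = Bag("lengths", "int")
U = Counter([3, 3, 5])
V = Counter([3, 3, 3, 5, 7])
```

Here `bag_contains(U, 3)` and `bag_subset(U, V)` correspond to the expressions $it \in U$ and $U \subseteq V$ for bag instances of an integer element type.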


2.2    Medium

For container sorts, a rather strict hierarchy with device at the top is
available. Possible constructions of container sort hierarchies are as follows:

(a) a named device is a particular container sort that may be either raw
     (unstructured) or sorted as a finite set of named files. A device is
     homogeneous concerning access to its data;
(b) a named file is a finite set of possibly differently structured (or
     sorted) named records. To specify the structure of a file, XML-like
     regular expressions are to be used;
(c) a named record is a tuple of named fields. Describing a record is similar
     to describing an entity type, but named container sorts are used instead
     of named types. Just like an entity type, a record may also specify a
     primary key constraint in the same form, but the role of the set of
     attributes is played by a set of named fields here;
(d) a named field may be either raw, or of the sort of any named container
     sort except field, or of the sort reference to any named container sort
     except itself. There is a difference between a container sort and a
     reference to it: an instance of a reference to a named container sort
     refers to a uniquely identified instance of that container sort;
(e) if $s_1, \ldots, s_l$ are named container sorts, $s_u[s_1, \ldots, s_l]$
     is a container sort named $s_u$, called union. An instance of sort $s_u$
     is an instance of any one of the sorts $s_1, \ldots, s_l$;
(f) given a finite set $D$ of raw or sorted named devices and a finite set $C$
     of referential constraints, a pair $M(D, C)$ is a container sort named
     $M$, called medium. Referential constraints here have the same form as
     those of schemata, but the roles of types and attributes are now played
     by named records and named fields, respectively;
(g) if $s$ is a named container sort, $r_s[*s]$ is a container sort named
     $r_s$, called reference to $s$.
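
The container sort hierarchy can be sketched in Python as nested data classes. This is only an illustration under simplifying assumptions: sorts are encoded as plain strings, unions and referential constraints are omitted, and all names (`Field`, `Record`, `File`, `Device`, the sample containers) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Field:
    name: str
    sort: str = "raw"             # "raw", "device", "file", "record" or "ref"
    target: Optional[str] = None  # name of the referenced container if sort == "ref"

    def __post_init__(self):
        if self.sort == "field":
            raise ValueError("a field cannot be of sort field")
        if self.sort == "ref" and self.target == self.name:
            raise ValueError("a field cannot be a reference to itself")


@dataclass
class Record:
    name: str
    fields: List[Field]
    primary_key: Tuple[str, ...] = ()

    def __post_init__(self):
        if not set(self.primary_key) <= {f.name for f in self.fields}:
            raise ValueError("primary key must consist of fields of the record")


@dataclass
class File:
    name: str
    records: List[Record]


@dataclass
class Device:
    name: str
    files: Optional[List[File]] = None  # None models a raw device

    @property
    def is_raw(self) -> bool:
        return self.files is None


beam_record = Record(
    "BEAM",
    [Field("ID"), Field("LENGTH"), Field("OWNER", sort="ref", target="PROJECT")],
    primary_key=("ID",),
)
sorted_device = Device("db", [File("beams", [beam_record])])
raw_device = Device("blobstore")
```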

2.3    Abstract Access Environment
Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, suppose that
$T_S = T_P \cup T_C$ with $T_P \cap T_C = \emptyset$ holds. A total mapping
$\sigma : S \mapsto 2^M$ is a serialization of schema $S$ over the medium $M$
if the following rules are satisfied.
 – The inverse mapping $\sigma^{-1} : 2^M \mapsto S$ is a partial mapping such
   that $\sigma^{-1}(F_M)$ for some $F_M \subset 2^M$ is defined iff there is
   $T \subseteq S$ such that $F_M = \bigcup_{t \in T} \sigma(t)$, and
   $\sigma \circ \sigma^{-1}$ on each $T \subseteq S$ is the identity.
 – Let $\Pi \subset 2^{T_P}$ denote a partition over $T_P$ (i.e.
   $\bigcup \Pi = T_P$ and the elements of $\Pi$ are pairwise disjoint); then
   for each $T \in \Pi$, let $\sigma(T) = D$ be such that
   (i) $D \in D_M$ is a raw device;
   (ii) $\sigma$ defines and preserves a particular ordering over $\Pi$, and
        also over each $T \in \Pi$;
   (iii) for each distinct pair $U, V \in \Pi$, $\sigma(U) \neq \sigma(V)$
        holds.

 – Each $t \in T_C$ is mapped over named fields of $M$ under particular sort
   and constraint restrictions listed below:
   (i) distinct elements of $T_a \cap T_C$ are mapped to distinct raw fields:
       a field that an element of $T_a$ is already mapped to becomes of that
       type;
   (ii) a bag type is mapped to a field of sort file, or of sort reference to
        a file;
   (iii) a reference to a type is mapped to a field of sort reference to an
        adequate sort. For set types, files are adequate sorts. For a type
        $\{t\} \in \Pi$, the device that $t$ is mapped to is an adequate sort.
        For entity types, adequate sorts are records or fields;
   (iv) for entity types, attributes of a type different from an entity type
        are mapped according to (i)-(iii). Attributes of some entity type are
        mapped as if they were references to an entity type, or to a field of
        sort reference to a record;
   (v) $\sigma$ preserves type integrity: assume that the images by $\sigma$
        of the attributes of type
        $t\langle K_{p_t} \Leftarrow [x_1 : t_1, \ldots, x_m : t_m]\rangle$
        with $K_{p_t} = \{x_{i_1}, \ldots, x_{i_n}\}$ are distributed over the
        records $r_1, \ldots, r_s$ with primary key constraints
        $K_{p_{r_1}}, \ldots, K_{p_{r_s}}$. Assume furthermore that for each
        $k \in [1, s]$ there exists a $c_{M_k} \in C_M$ of the form
        $\varphi_{r_k} \Leftarrow \{\psi_{k_1}, \ldots, \psi_{k_p}\}$, and
        consider the foreign key constraints
        $\varphi_{r_k} \Leftarrow \{\psi_{r_j} : j \in [1, s], j \neq k\}$ for
        each $k \in [1, s]$, where $\varphi_{r_k}$ refers to $r_k$ as parent,
        and $\psi_{r_j}$, for each $j \in [1, s]$ with $j \neq k$, refers to
        $r_j$ as a child; let $C_k$ denote the set
        $\{\psi_{r_j} \mid j \in [1, s], j \neq k\}$. Then, on the one hand,
        $\bigcup_{l=1}^{s} K_{p_{i_l}} \subseteq \{\sigma(t.x_{i_1}), \ldots, \sigma(t.x_{i_n})\}$,
        and on the other hand,
        $C_k \subseteq \{\psi_{k_1}, \ldots, \psi_{k_p}\}$ shall hold.
   (vi) $\sigma$ also preserves the schema's integrity: keeping the notation
        of (v), assume in addition that the images of the types
        $u_1, \ldots, u_r$ in the foreign key constraint
        $\{t.x_{i_1}, \ldots, t.x_{i_n}\} \Leftarrow \{u_1[\varrho_{u_{11}} y_{u_{11}}, \ldots, \varrho_{u_{1n}} y_{u_{1n}}], \ldots, u_r[\varrho_{u_{r1}} y_{u_{r1}}, \ldots, \varrho_{u_{rn}} y_{u_{rn}}]\}$
        are spread over the records
        $R_{u_{11}}, \ldots, R_{u_{1q_1}}, \ldots, R_{u_{r1}}, \ldots, R_{u_{rq_r}}$,
        and consider the foreign key constraints
        $\varphi_{r_k} \Leftarrow \{\psi_{r_k u_{mj}} : m \in [1, r], j \in [1, q_m]\}$
        for each $k \in [1, s]$, where $\varphi_{r_k}$, just like in (v),
        refers to $r_k$ as parent, and $\psi_{r_k u_{mj}}$ refers to
        $R_{u_{mj}}$ as child. Then for the foreign key constraint $c_{M_k}$,
        for each $k \in [1, s]$,
        $\{\psi_{r_k u_{mj}} : m \in [1, r], j \in [1, q_m]\} \subseteq \{\psi_{k_1}, \ldots, \psi_{k_p}\}$
        shall hold.

Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, $S$ is said to be
serializable over $M$ if there is some serialization $\sigma$ of schema $S$
over medium $M$. If so, a 3-tuple $\varepsilon(S, M, \sigma)$ is called an
abstract access environment for $S$ and $M$.
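
To give a feeling for how such rules could be checked mechanically, the following Python sketch verifies only a small subset of them: totality of the mapping, the requirement that each block of the partition is mapped to one raw device with distinct blocks going to distinct devices, and the requirement that distinct atomic types are mapped to distinct raw fields. Representing the mapping as a dictionary from type names to sets of container names, and mapping every type of a block to the block's device, are modelling assumptions of this sketch; the function name and all sample names are hypothetical.

```python
def check_serialization(types, sigma, partition, raw_devices, atomic_tc, raw_fields):
    """Check a subset of the serialization rules; return True if they hold."""
    # Totality: every type of the schema has an image.
    if any(t not in sigma for t in types):
        return False
    # Each block of the partition maps to exactly one raw device,
    # and distinct blocks map to distinct devices.
    block_devices = []
    for block in partition:
        images = {frozenset(sigma[t]) for t in block}
        if len(images) != 1:
            return False
        (image,) = images
        if len(image) != 1 or not image <= raw_devices:
            return False
        block_devices.append(image)
    if len(set(block_devices)) != len(block_devices):
        return False
    # Distinct atomic types of T_C map to distinct raw fields.
    atomic_images = [frozenset(sigma[a]) for a in atomic_tc]
    if any(len(img) != 1 or not img <= raw_fields for img in atomic_images):
        return False
    return len(set(atomic_images)) == len(atomic_images)


sigma = {
    "blob": {"dev0"}, "image": {"dev0"},  # one block of T_P -> raw device dev0
    "log": {"dev1"},                      # another block -> raw device dev1
    "int": {"f_int"}, "str": {"f_str"},   # atomic types of T_C -> raw fields
}
common = dict(
    types=set(sigma),
    partition=[{"blob", "image"}, {"log"}],
    raw_devices={"dev0", "dev1"},
    atomic_tc={"int", "str"},
    raw_fields={"f_int", "f_str"},
)
ok = check_serialization(sigma=sigma, **common)
clash = check_serialization(sigma={**sigma, "log": {"dev0"}}, **common)
```

The second call fails because two distinct blocks of the partition would share the device `dev0`.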


3     Instantiation, Data Exchange, Autonomy

In the following, from the point of view of autonomy, the base abstract
domains are schema, medium, and access environment. Applications' data
environments are modeled as instances of the domains above.


3.1     Instantiation

An instantiation of a schema $S$ is a swarm (i.e. a bag or set) of instances
of types of $S$. A schema instantiation is allowed to produce either replicas
of the same instance of the same type or no instance at all of particular
types. We would like to point out that we are not concerned here with whether
or not a schema instance is consistent, i.e. whether no primary or foreign key
constraint is violated. That is the responsibility of the application that
creates instances and operates on them.
    An instantiation of a medium $M$ is to carry, i.e. temporarily store,
preserve, and display data. A medium instance is a finite set of uniquely
identified instances of named devices, each of which may be either raw or
sorted. A sorted named device is a finite set of instances of named files,
each of them uniquely identified on the device. Unlike schema instantiations,
a medium instance is allowed to have replicas of instances (i.e. instances
that carry replicas of type instances) only for the container sorts record and
field. Instances of records are tuples of instances of fields.
    Recall that we want application data to carry all necessary
non-application-specific information, so from now on we assume that instances
of types and also of container sorts carry all necessary information on their
types and sorts. In addition, when we consider instances of schemata and
media, they are assumed to carry the schemata's and media's integrity
constraints. Given a schema $S$, a medium $M$, and an abstract access
environment $\varepsilon(S, M, \sigma)$ for $S$ and $M$, an instance over $S$
can easily be serialized over an instance over $M$ as induced by $\sigma$:
unless primary key constraints are violated, each atomic piece of data (data
of atomic types) of the instance over $S$ has to be mapped to a new instance
of the container sort (i.e. a container of that sort) that the type of the
atomic data is mapped to by $\sigma$. The mapping induced by $\sigma$ is
denoted by $\bar{\sigma}$, and a medium instance that carries the serialized
data of a schema instance is called the total serialization of that schema
instance. Similarly, the inverse mapping $\sigma^{-1}$ also induces a partial
mapping, denoted by $\bar{\sigma}^{-1}$, which maps particular subsets of the
serialized data of an instance over a medium to subsets of an instance over a
schema. For given instances $I_S$ and $I_M$ over $S$ and $M$, a 3-tuple
$\Delta(I_S, I_M, \bar{\sigma})$ is considered as an instantiation of
$\varepsilon(S, M, \sigma)$, and is called the abstract access method between
the instances $I_S$ and $I_M$. Since the instances $I_S$ and $I_M$ carry all
necessary information on $S$ and $M$, the mappings $\bar{\sigma}$ and
$\bar{\sigma}^{-1}$ are given for the abstract access method
$\Delta(I_S, I_M, \bar{\sigma})$ between $I_S$ and $I_M$.
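
The induced instance-level mapping and its inverse can be illustrated by a small Python sketch. It assumes, purely for illustration, that the type-level serialization is given as a dictionary from qualified attribute names to (record, field) pairs and that an entity instance is a dictionary of atomic values; the functions `serialize` and `deserialize` and all sample names are hypothetical.

```python
def serialize(instance, type_name, sigma):
    """Place each atomic value into the container its attribute is mapped to."""
    records = {}
    for attr, value in instance.items():
        record, fld = sigma[f"{type_name}.{attr}"]
        records.setdefault(record, {})[fld] = value
    return records


def deserialize(records, type_name, sigma):
    """Partial inverse: collect the values of one type back from the records."""
    instance = {}
    for key, (record, fld) in sigma.items():
        owner, attr = key.split(".", 1)
        if owner == type_name and fld in records.get(record, {}):
            instance[attr] = records[record][fld]
    return instance


# Hypothetical type-level mapping: the attributes of one entity type are
# distributed over two records of the medium.
sigma = {
    "beam.id": ("BEAM_MAIN", "ID"),
    "beam.label": ("BEAM_MAIN", "LABEL"),
    "beam.length": ("BEAM_GEOM", "LEN"),
}
beam = {"id": 7, "label": "B7", "length": 1200}
stored = serialize(beam, "beam", sigma)
restored = deserialize(stored, "beam", sigma)
```

Note that neither function contains any knowledge of the particular type: everything specific is read from the mapping, which is itself just data.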

3.2   Data Exchange
Assume schema $S$ is serializable over medium $M$, and let
$\varepsilon(S, M, \sigma)$ be an abstract access environment for $S$ and $M$.
Let $\mathcal{I}_S$ denote the (infinite) set of all possible instances over
$S$, and let $\mathcal{I}_{M_{\mathcal{I}_S}}$ denote the set of total
serializations of all elements of $\mathcal{I}_S$, i.e. the set
$\{\bar{\sigma}(I_S) : I_S \in \mathcal{I}_S\}$. A $K_S \in \mathcal{I}_S$ is
called a key expression over $S$ if it is a set. An element of a key
expression over $S$ is called a key term over $S$. A key term is considered as
an instance pattern, and determines a (possibly empty) subset or sub-bag of
any particular $I_S \in \mathcal{I}_S$ based on some type-dependent pattern
matching (e.g. in the case of atomic types, type-dependent pattern matching
checks equality of values of the same type), in which null values are not
considered. Since serializations map integrity constraints of schemata onto
integrity constraints of media, the image of a key expression over $S$ by
$\bar{\sigma}$ is a key expression over $\mathcal{I}_{M_{\mathcal{I}_S}}$.
Given particular instances $I_S \in \mathcal{I}_S$,
$I_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$ and key expressions
$K_S \in \mathcal{I}_S$, $K_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$, let
$I_S(K_S) \subseteq I_S$ and $I_M(K_M) \subseteq I_M$ denote the subsets that
match $K_S$ and $K_M$, respectively, which are themselves also particular
instances over $S$ and $M$. Let $K_S \in \mathcal{I}_S$ be a key expression;
the abstract access method $\Delta(I_S, I_M, \bar{\sigma})$ between $I_S$ and
$I_M$ together with the key expression $K_S$ determines the particular
instances
$I_S - I_S(K_S) \cup \bar{\sigma}^{-1}(I_M(\bar{\sigma}(K_S))) \in \mathcal{I}_S$
and
$I_M - I_M(\bar{\sigma}(K_S)) \cup \bar{\sigma}(I_S(K_S)) \in \mathcal{I}_{M_{\mathcal{I}_S}}$,
called a partial input data access from $I_M$ into $I_S$ and a partial output
data access from $I_S$ onto $I_M$, respectively. In the above formulas only
$K_S$ is out of the scope of $\Delta(I_S, I_M, \bar{\sigma})$, so one may
associate a tuple $(K_S, \Delta(I_S, I_M, \bar{\sigma}))$ with the set
$I_S - I_S(K_S) \cup \bar{\sigma}^{-1}(I_M(\bar{\sigma}(K_S)))$, and a tuple
$(\Delta(I_S, I_M, \bar{\sigma}), K_S)$ with the set
$I_M - I_M(\bar{\sigma}(K_S)) \cup \bar{\sigma}(I_S(K_S))$.
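
The two data access formulas can be tried out with ordinary Python sets. The sketch below makes several assumptions that are not part of the model: instances are sets of flat tuples, key terms are tuples in which `None` plays the role of a null value that is ignored in matching, the induced mapping simply attaches field names to values, and the difference is evaluated before the union. All identifiers are hypothetical.

```python
FIELDS = ("ID", "NAME")  # hypothetical field names on the medium side


def sigma_bar(t):
    """Induced serialization of one instance (or key term) onto the medium."""
    return tuple(zip(FIELDS, t))


def sigma_bar_inv(r):
    """Inverse of the induced serialization for one medium-side record."""
    return tuple(value for _, value in r)


def matches_s(t, pattern):
    """Schema-side matching: null (None) components of the pattern are ignored."""
    return all(p is None or p == v for v, p in zip(t, pattern))


def matches_m(r, pattern):
    """Medium-side matching on the field values."""
    return all(p is None or p == v for (_, v), (_, p) in zip(r, pattern))


def select_s(i_s, k_s):
    return {t for t in i_s if any(matches_s(t, k) for k in k_s)}


def select_m(i_m, k_m):
    return {r for r in i_m if any(matches_m(r, k) for k in k_m)}


def partial_input(i_s, i_m, k_s):
    """(I_S - I_S(K_S)) union inverse image of the matching part of I_M."""
    k_m = {sigma_bar(k) for k in k_s}
    return (i_s - select_s(i_s, k_s)) | {sigma_bar_inv(r) for r in select_m(i_m, k_m)}


def partial_output(i_s, i_m, k_s):
    """(I_M - I_M(image of K_S)) union image of the matching part of I_S."""
    k_m = {sigma_bar(k) for k in k_s}
    return (i_m - select_m(i_m, k_m)) | {sigma_bar(t) for t in select_s(i_s, k_s)}


i_s = {(1, "old"), (2, "keep")}
i_m = {sigma_bar((1, "new")), sigma_bar((3, "other"))}
k_s = {(1, None)}  # key expression: everything whose first component is 1

new_cache = partial_input(i_s, i_m, k_s)
new_carrier = partial_output(i_s, i_m, k_s)
```

With these sample values the partial input replaces the cached instance with key 1 by the one found on the medium, while the partial output does the opposite on the medium side.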

3.3   Autonomy
Given a schema $S(T_S, C_S)$ and a medium $M(D_M, C_M)$, assume $S$ is
serializable over $M$ and $\varepsilon(S, M, \sigma)$ is an abstract access
environment for $S$ and $M$. Let $\mathcal{I}_S$ denote the set of all
possible instances over schema $S$, and let $\mathcal{I}_{M_{\mathcal{I}_S}}$
be the set of the total serializations of all elements of $\mathcal{I}_S$. Let
$\mathcal{I}_{\varepsilon(S,M,\sigma)}$ denote all abstract access method
instances over $\varepsilon(S, M, \sigma)$. Let
$\mathcal{I}_S^{\{\}} \subset \mathcal{I}_S$ denote the set of possible key
expressions over $S$. Let $a_c$, $d_c$, $\delta_{(a_c, d_c, \bar{\sigma})}$
denote variables of types $*S$, $*M$ and $*\varepsilon(S, M, \sigma)$, called
application cache, data carrier and access interface, respectively. For
particular instances $I_S \in \mathcal{I}_S$ and
$I_M \in \mathcal{I}_{M_{\mathcal{I}_S}}$ let us assume that $*a_c = I_S$ and
$*d_c = I_M$ hold, and let us furthermore define $A_D$ as
$A_D = (\mathcal{I}_{\varepsilon(S,M,\sigma)} \cup \{\emptyset\}) \times \mathcal{I}_S^{\{\}} \cup \mathcal{I}_S^{\{\}} \times (\mathcal{I}_{\varepsilon(S,M,\sigma)} \cup \{\emptyset\})$.
An autonomous application can now be formalized as
$A(a_c, d_c, \delta_{(a_c, d_c, \bar{\sigma})}, \alpha)$, where
$\alpha : A_D \mapsto \mathcal{I}_S \cup \mathcal{I}_{M_{\mathcal{I}_S}} \cup \mathcal{I}_{\varepsilon(S,M,\sigma)}$
is a mapping, called application logic, such that
(i) for each $a \in A_D$ of any of the forms $(I_S, \emptyset)$ or
     $(\emptyset, I_S)$, $\alpha(a)$ is a mapping over $\mathcal{I}_S$, called
     computation, that also affects the content of variable $a_c$ and thereby
     that of variable $\delta_{(a_c, d_c, \bar{\sigma})}$;
(ii) for each $i \in A_D$ of the form $(K_S, \Delta(I_S, I_M, \bar{\sigma}))$,
     $\alpha(i)$ is a partial input from $I_M$ into $I_S$ that performs the
     assignment
     $a_c = I_S - I_S(K_S) \cup \bar{\sigma}^{-1}(I_M(\bar{\sigma}(K_S)))$,
     which also changes the content of variable
     $\delta_{(a_c, d_c, \bar{\sigma})}$;
(iii) last, for each $o$ of the form $(\Delta(I_S, I_M, \bar{\sigma}), K_S)$,
     $\alpha(o)$ is a partial output from $I_S$ onto $I_M$ that performs the
     assignment $d_c = I_M - I_M(\bar{\sigma}(K_S)) \cup \bar{\sigma}(I_S(K_S))$,
     which changes the content of variable $\delta_{(a_c, d_c, \bar{\sigma})}$.


4     Real Data Environment and Computer Aided
      Autonomy
The use of the variables application cache, data carrier, and access interface
is not mere formalism that helps us to establish the formalization of
autonomous applications presented above. The application cache is a main
memory area in the process space, which may actually be accessed through a
pointer. For the data carrier the situation is the same: any programming
language provides means to open files or to connect to a database, and what
one obtains is really a pointer to some particularly structured data instance
for an application. Considering an access interface that is to hold an
instance of an abstract access environment may seem a bit more artificial,
just like data (instances of types) that carry all necessary knowledge about
their types and integrity constraints. Yet one should realize that such
instances are mappings between finite sets induced by a mapping between a
finite set of types and a finite set of container sorts, which can be
implemented as data. The data model we need is introduced in the next
subsection and is called the Generalized Document Data Model.

4.1   Generalized Document Data Model
The data model introduced here is a generalization of the Document Data Model
(DDM), which was introduced as one of our proposals for a common data model
for the applications in Figure 1. DDM is a data representation in which one
can naturally encode information and structuring knowledge (also outside the
scope of any application logic), and so it is one possible model serving our
need to achieve autonomous applications. Document Data Model is a
semi-structured data model, where information about structuring and typing
data is separated from the real content of data.
    Data in DDM are represented in the form of ordered pairs $(R, S)$. $R$ and
$S$ are called raw data and schema, respectively. Both are simple byte
streams, but $S$, if given, is always an XML document or document fragment
[4]. This organization of semi-structured data has two advantages. First, the
same raw data can be structured in several ways, so there may be a number of
pairs in which only the schema components differ, each providing the adequate
structuring, and occasionally the additional semantics, that the current
processing phase needs; these schemata can be stored separately while only one
instance of the raw data is stored. Second, the data representation makes it
possible to define structured data types and primitive abstract operations on
data.
    For instance, data types and type templates in DDM are represented by
pairs like $T(\emptyset, S)$, where $S$ is non-empty [6]. To instantiate such
types or templates one should provide the raw data component for such pairs.
If the raw data component itself is a kind of schema, e.g. an XML general
entity, a real type is instantiated.
    Pairs like $V(R, \emptyset)$, where $R$ is non-empty, are typeless (i.e.
void) data. Typeless data, if meaningful, can be converted to any type by
providing a schema component. One has to realize that the schema components
are not only for structuring the raw data. The schema components may also
describe or identify methods characteristic of the structured type
represented. Changing the schema component can be considered as a cast
operation.
    Last, pairs $D(R, S)$, where neither $R$ nor $S$ is empty, constitute
typed data instances.
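
The three kinds of DDM pairs and the cast operation can be sketched in Python as follows. The sketch assumes that an empty component is represented by `None`, that the schema component is held as a string, and that checking well-formedness with the standard `xml.etree.ElementTree` parser is an acceptable stand-in for "is an XML document or fragment"; the class `DDMPair` and the sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DDMPair:
    raw: Optional[bytes]   # R: raw data, None if empty
    schema: Optional[str]  # S: XML document or fragment, None if empty

    def __post_init__(self):
        if self.schema is not None:
            ET.fromstring(self.schema)  # raises ET.ParseError if not well-formed

    @property
    def kind(self) -> str:
        if self.raw is None and self.schema is not None:
            return "type"      # T(empty, S): a type or type template
        if self.raw is not None and self.schema is None:
            return "typeless"  # V(R, empty): void data
        if self.raw is not None and self.schema is not None:
            return "typed"     # D(R, S): a typed data instance
        return "empty"

    def cast(self, schema: str) -> "DDMPair":
        """Change only the schema component; the raw data stay untouched."""
        return replace(self, schema=schema)


POINT = "<point><x/><y/></point>"  # hypothetical schema fragment
point_type = DDMPair(None, POINT)
void_data = DDMPair(b"\x01\x02", None)
typed_data = void_data.cast(POINT)
```

Because `cast` returns a new pair that shares the same raw bytes, several differently structured views of one raw data instance can coexist, which is the first advantage mentioned above.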
    A step forward from DDM leads us to the Generalized Document Data Model
(GDDM), where the schema component of a piece of data specifies all
non-application-specific knowledge of its flow through an application,
including data retrieval and storage, and in addition a global schema
interpreter is available. Using GDDM, knowledge about data processing is
carried by the data themselves, and the knowledge is interpreted by the global
schema interpreter. To get and interpret such knowledge, the knowledge carrier
medium has to be accessed.

4.2   Implementation Considerations
In the abstract model of autonomous applications introduced in Section 3.3,
the knowledge in question is implemented by the abstract access environment.
Since an abstract access environment is a serialization, i.e. a structural
mapping and type serialization, it is easily implemented by an XML document.
One difficulty, more a discomfort than a crux, is implementing such an XML
document by hand. Instantiating an abstract access environment formally causes
no difficulties either when using the variables application cache and data
carrier, provided they already contain instances of the application domain and
the medium. Instantiating the application domain is always a natural part of
any application, but instantiating a medium requires communicating with some
data carrier server including a data storage and/or display engine, e.g. some
RDBMS or Web browser. The latter gives semantics to, among others, the induced
serialization $\bar{\sigma}$ and its inverse mapping, and also provides means
for the evaluation of a subset of a medium instance using a key expression.
    Implementing the model presented above is nothing else than binding our
abstract data environment to a real one. By such a binding an autonomous
application is integrated into a real run-time environment. When we speak of
binding, we think of early binding, that is, abstract operations invoked by
the application are linked to particular operation libraries provided by the
all-time run-time environment. Our idea is that run-time environments that
support the integration of autonomous applications provide DLLs that implement
instantiations of the real data carrier media available there, and also a DOM
or SAX based XML document processing middleware that implements the operations
$\bar{\sigma}$ and $\bar{\sigma}^{-1}$ between real data carrier instances and
application domains. Integrating an autonomous application into such a
run-time environment requires rebuilding the DLLs in order to extend their
application domain knowledge. Type conflicts within a middleware caused by
collisions of type names of different applications can be avoided by using XML
namespaces when implementing abstract access environments.
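
How XML namespaces keep equally named types of different applications apart can be shown with Python's standard `xml.etree.ElementTree` module. The document structure, the element and attribute names, and the namespace URIs below are invented for this example and do not reflect the actual format of an abstract access environment.

```python
import xml.etree.ElementTree as ET

# Hypothetical access environment fragment: both applications define a type
# named Beam, distinguished only by their namespaces.
DOC = """<env xmlns:cis="urn:example:cis2" xmlns:ifc="urn:example:ifc">
  <cis:Beam field="CIS_BEAM"/>
  <ifc:Beam field="IFC_BEAM"/>
</env>"""

NS = {"cis": "urn:example:cis2", "ifc": "urn:example:ifc"}

root = ET.fromstring(DOC)
cis_beam = root.find("cis:Beam", NS)
ifc_beam = root.find("ifc:Beam", NS)

# The parser stores the namespace URI in the tag, so the two types never clash.
cis_tag = cis_beam.tag
ifc_tag = ifc_beam.tag
```

A DOM or SAX based middleware would resolve type names in the same way, by the pair of namespace URI and local name rather than by the local name alone.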

4.3   Computer Aided Autonomy
Instead of following a today’s fashionable and usual way, i.e. establishing a
GDDM based integrated autonomous application environment with providing
GDDM oriented program development tools, we belive that traditional program-
ming languages and their development environment extended by computer aided
implementation of a desired GDDM representation of data environment will be
much more popular, and will result much more satisfaction for the community
of professional application developers. The computer aided autonomy we pro-
pose is a computer generated GDDM representation of applications’ schemata,
media and abstract access environment. If there is a language on which abstract
                                                                             161

domains and serializations between them can be defined, and that can easily be
embedded into any host language, then we have a language (the host language
that embeds the domain definition language) that is suitable for developing
autonomous applications. We would like to emphasize that such a language is not
meant to extend host languages' domain definition capabilities, but rather to
enable them to define abstract data environments. Preprocessing the embedded
statements of a program generates a pure host language text and the GDDM
representation of the abstract data environment (see Figure 4).
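
The preprocessing step described above can be sketched as follows. This is
only an illustration under strong assumptions: the %%domain marker, the
field:type notation and the XML element names are hypothetical and do not
reflect LORD syntax or the actual GDDM representation.

```python
import xml.etree.ElementTree as ET

MARKER = "%%domain"  # hypothetical marker for an embedded statement

def preprocess(source):
    """Split a mixed source into pure host text and an XML description."""
    host_lines = []
    env = ET.Element("environment")  # stand-in for the GDDM representation
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(MARKER):
            # Assumed form: %%domain Name field:type field:type ...
            name, *fields = stripped[len(MARKER):].split()
            domain = ET.SubElement(env, "domain", name=name)
            for field in fields:
                fname, ftype = field.split(":")
                ET.SubElement(domain, "field", name=fname, type=ftype)
        else:
            # Everything else is passed through as host language text.
            host_lines.append(line)
    return "\n".join(host_lines), ET.tostring(env, encoding="unicode")

mixed = "%%domain Point x:float y:float\nprint('host code')"
host_text, env_xml = preprocess(mixed)
```

Here host_text contains only the host language line, while env_xml carries the
description of the declared domain, mirroring the two outputs of the
preprocessor.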


           Fig. 4. Environment for integrating autonomous applications.


   We are in the process of defining a language, called LORD (Lay-Out
Relationship and Domain definition language), that consists of two components.
The first is designed to define schemata, media, and serializations of schemata
over media. The other is a macro language that provides a way to implement
context-dependent macro definitions, context-dependent and context-keeping
macro expansion, and so-called meta-macro definitions, whose expansions are
themselves macro or meta-macro definitions.


5   Syntactical considerations

LORD is being designed for natural embedding into arbitrary host languages; in
other words, a statement including embedded symbols should not look foreign to
the host language's style. One way to achieve this is to consider LORD language
constructs as invocations of macros defined in an intelligent macro language
that provides means for defining meta-macros (whose expansion results in macro
definitions) and context-dependent macro expansions. The macro language with
these features presented here is AAML (Autonomous Applications' Macro
Language).

5.1   AAML

In AAML, any host language statement that embeds AAML macro invocations (an
embedding statement may contain more than one macro invocation) may generate
one or more pure host language statements, depending on the corresponding macro
definitions. The recursive process of such text generation is controlled by the
Complementary Expansion Method (CEM). CEM is based on invocation patterns and
invocation-contexts. Invocation patterns control formal parameter declarations
and their value settings for macro definitions, and invocation-contexts control
statement completion and context-dependent text generation.
    Macro definitions may contain complete and also incomplete statements. When
a macro invocation is expanded, complete statements are expanded in the usual
way: all the macro invocations within the complete (embedding) statement are
recursively expanded in the order of invocation, from left to right. Incomplete
statements, however, must first be completed, and the completed statement is
then expanded in the usual way.
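
The usual, left-to-right recursive expansion of complete statements can be
sketched as below. The @name invocation syntax, the dictionary of macro bodies
and the depth limit are assumptions made for this illustration; the sketch
does not model invocation patterns, invocation-contexts or the completion of
incomplete statements, and is not AAML itself.

```python
import re

INVOCATION = re.compile(r"@(\w+)")  # hypothetical invocation syntax: @name

def expand(statement, macros, depth=0):
    """Expand macro invocations left to right, recursing into each body."""
    if depth > 50:
        raise RecursionError("macro expansion does not terminate")

    def replace(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # not a macro: leave host text untouched
        # The body may itself contain invocations, so expand it recursively.
        return expand(macros[name], macros, depth + 1)

    # re.sub visits the matches in order of appearance, i.e. left to right.
    return INVOCATION.sub(replace, statement)

macros = {
    "open": "f = open(@path)",
    "path": "'data.xml'",
}
result = expand("@open; use(f)", macros)
```

The invocation @open is replaced by its body, and the nested invocation @path
inside that body is expanded in turn, so a single embedding statement yields
pure host language text.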


5.2   Other approaches

There have been many other approaches either to make programming languages
syntactically extensible or to integrate domain-specific concepts into the
host language.
   Extensible Syntax [9] was introduced for incremental syntax definition by
extending a core language. Syntax extensions are placed into the host language
between special syntactical delimiters. The implementation is based on LL
parsing. Because the class of LL languages is not closed under union and
concatenation, defining syntax is sometimes inconvenient.
   The Java Syntactic Extender (JSE) [10] is a macro system. The expressiveness
of the syntax that can be introduced is limited. As in most macro systems, a
macro identifier is required in invocations, and the JSE parser is not
extensible. In contrast, AAML does not restrict the syntax of macro names.
   Bravenboer and Visser give a detailed discussion in [8] of the syntactical
extension of host languages with respect to domain-specific extensions,
advocating the METABORG method. METABORG is a method that provides concrete
syntax, mainly for domain abstractions, to application programmers. In
contrast, our goal concerning LORD is not to extend the host language but
rather to integrate and assimilate domain-specific concerns that could
otherwise be formulated in the language. In this way, our approach aims rather
to map a certain domain into many host languages.


6     Conclusions

Application integration requires designing and implementing a common data model
and the corresponding data access layer. The information and software industry
hoped to find real solutions by establishing data standards for particular groups
of application fields, like IFC and CIS/2. Such standards mostly concern data
exchange formats and are highly domain specific. Another approach is, e.g.,
data warehouses, which implement a common data access layer over
heterogeneously structured data sources for applications.
    The integration of applications that belong to possibly different domains
and/or follow different standards is not solved in general and still costs a
large amount of human and computer resources.
    Autonomous applications using the Document Data Model can easily follow
structural differences of data, and, in addition, the structural information
of the data is carried by the data themselves. This integration model is also
capable of expressing collaboration protocols between applications processing
data of possibly different domains or using different standards. Thanks to
comprehensive XML technology and a generative programming philosophy (like
that of LORD), computer aided integration in our model may have a better
chance.


References
 1. Industrial automation systems and integration - Product data representation and
    exchange - Part 11: Description methods: The EXPRESS language reference manual,
    Reference Number ISO 10303-11:1994. ISO, Switzerland (1994)
 2. Industrial automation systems and integration - Product data representation and
    exchange - Part 22: Implementation methods: Standard Data Access Interface spec-
    ification, Reference Number ISO/DIS 10303-22. ISO, Switzerland (1993)
 3. CIMsteel Integration Standards Release 2 (Second Edition) http://www.cis2.org/
 4. Extensible Markup Language (XML) 1.0 (Third Edition), W3C Proposed Edited
    Recommendation (2003) http://www.w3.org/TR/2003/PER-xml-20031030
 5. Mukerji, J., Miller, J.: Overview and guide to OMG’s architecture
    http://www.omg.org/docs/omg/03-06-01.pdf
 6. Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: From Relations to
    Semistructured Data and XML. Morgan Kaufmann, San Francisco (2000)
 7. Hernath, Z., Vinceller, Z.: Generalized Document Data Model for Integrating
    Autonomous Applications. In Proceedings of International Conference of Applied
    Informatics (ICAI’6), Eger, Hungary (2003)
 8. Bravenboer, M., Visser, E.: Concrete Syntax for Abstract Objects: Domain-Specific
    Language Embedding. In Proceedings of Object-Oriented Programming, Systems,
    Languages, and Applications (OOPSLA’04), Vancouver, Canada (2004) 365–383
 9. Cardelli, L., Matthes, F., Abadi, M.: Extensible Syntax with Lexical Scoping. SRC
    Research Report 121, DEC Systems Research Center (1994)
10. Bachrach, J., Playford, K.: The Java Syntactic Extender. In Proceedings of Object-
    Oriented Programming, Languages, Systems, and Applications (OOPSLA’01),
    Tampa, Florida (2001) 31–42