=Paper= {{Paper |id=Vol-1/paper-6 |storemode=property |title=What's in a federation? Extending data dictionaries with knowledge representation techniques |pdfUrl=https://ceur-ws.org/Vol-1/benn-long.pdf |volume=Vol-1 |authors=W. Benn }} ==What's in a federation? Extending data dictionaries with knowledge representation techniques== https://ceur-ws.org/Vol-1/benn-long.pdf
What’s in a Federation?
Extending Data Dictionaries with Knowledge Representation Techniques

Wolfgang Benn
Chemnitz University of Technology • Management of Data
P.O. Box 964 • D-09009 Chemnitz
benn@informatik.tu-chemnitz.de



1. Introduction

Databases and knowledge representation languages take rather different views of data. Knowledge representation languages describe a universe of discourse in a taxonomy and allow a user to ask epistemic questions about the relationships between concepts and roles. However, neither data structures, data locations, nor any information about the existence or availability of data can be found in a taxonomy -- not even if it includes an assertion that describes a particular data item.

Relational databases provide users with schemata. Schemata describe in detail the data structures of sets of persistent data items. Data dictionaries, included in these systems, report on the existence and availability of data. These tools, however, do not provide the entity view: relationships between entities are merely implicit, and no question about the universe of discourse behind a schema will get an answer.

Object-oriented databases provide users with class hierarchies as schemata. They support the entity view -- is-a as well as part-of relationships are explicit. Nevertheless, information about the universe of discourse is not given either.

In a federation of systems -- databases and applications, for instance -- the situation gets worse. Databases may be heterogeneous in their modeling technique: some will follow the object-oriented paradigm, while the majority certainly follows the relational paradigm. How does a user get to know what data is available in a federation if he wants to build a new application? How does that user get to know how he may access a particular data item? How does he know that the selected data item is semantically correct in the context of his application?

If he can access a federated data dictionary, it will provide him with the technical information about the data in a common data model -- similar to the global conceptual schema of a distributed database. If such a tool does not exist, the user must read all available schemata from all available federation components (i.e., he must know all languages, data models, and dialects that the local components of the federation individually use).

In the remainder of this paper we briefly introduce a module that coordinates a federation of systems and hosts a central data dictionary. It is this module that we will extend to provide users with an entity view of the information available in a federation. We introduce the logical architecture of a prototypical implementation of this module in section 2 and describe some extensions that we made in section 3. In section 4 we specify some ideas for the mentioned extension; we conclude in section 5 and list the literature in section 6.

2. The Federal System Manager

The Federal System Manager (FSM) is a module that coordinates a federation of autonomous systems. These systems can be applications or services such as databases, which may link to the FSM to form a federation for some particular tasks. Afterwards they can leave the federation and run again as autonomous systems. This idea is rather similar to the concept of multi-agent systems.

The FSM performs at least three tasks. First, it runs a protocol that enables the linkage process and guarantees the components a negotiation of autonomy aspects when they want to join or leave the federation. Second, it must provide a uniform view of all information that is available to applications of the federation through a so-called Common Data Model (CDM). Third, it must support an exchange of information, i.e., data types and data itself, between members of the federation. We will detail these tasks and concentrate on the second one.

Compared with the Common Object Request Broker Architecture (CORBA) [1], the FSM is an object broker that regards databases as service-providing objects and applications as clients that request these services. Commonly known services of database components are storage, retrieval, update, etc.

Moreover, the FSM is an object itself! It provides services like data and type exchange. It contains a Federal Data Dictionary (FDD) that allows a user to retrieve the information contents of the actual federation under several aspects. It is our aim to extend this Federal Data Dictionary with knowledge representation techniques to support users in their retrieval better than before.

2.1. The FSM Prototype

The currently implemented FSM prototype has its roots in an ESPRIT project that finished in 1991 [2,3,4,5,6]. The prototype mainly follows the reference architecture for interoperable systems given in [7] and includes a repository according to the Information Resource Dictionary Standard IRDS [8].

This standard defines a four-layer architecture with (top down)
• a meta-meta layer that describes the model of the meta layer descriptions -- in our case the Common Data Model of the FSM, a framework based on the Abstract Data Type (ADT) idea --,
• a meta layer where we find the description of schemata -- in our case a description of the data models of the federation components --,
• a schema layer where the data descriptions are located -- in our case the data types that are defined in schemata of databases or in type declarations of applications --, and
• an application data layer where we finally find the application data itself.

The Meta-Meta Layer

To enable the description of schema descriptions we implemented a common data model.

In the literature we found many different approaches to implementing a CDM -- the approach most often used, however, was the object-oriented one. Thus, we asked ourselves which core idea of the object-oriented paradigm makes it suitable for a CDM. We found that it probably is the idea of Abstract Data Types.

Thus, we implemented a framework, which is actually not a real data model but a tool box [2]. It allows a user to describe the structure and semantics of those elements that he uses to describe a schema, similar to the ADT concept (see next paragraph).

The CDM that we implemented is very similar to the Interface Definition Language (IDL) of the CORBA specification [1] -- because their purposes are rather similar. IDL is a language that describes object services in an intermediate way, and the CDM describes entities (application objects) in an intermediate way.

An IDL description is mapped into a real programming language and the object services are available for all programs written in this programming language. Application objects described in our CDM are (under certain conditions) transformable into all data models that are represented in the FSM.

The Meta Layer

For the meta layer we extended the IRD standard. If the FSM supports an exchange of data between components, it must be able to transform data between the different individual data descriptions. These descriptions follow type or schema declarations, which use data model elements. Thus, our meta layer has to include a suitable subset of the component data model for each involved component. Moreover, it must include rules that guide the transformation of entities between these data model subsets.

However, the description of a data model subset is somewhat more complex than the description of a schema. While a schema merely consists of data structures, a data model usually includes data types and data type semantics. The meta layer of our FSM includes both (the assignment of a set of operations to a data type, which makes up the type's semantics in the data model of a component, is currently under implementation).

To enable the exchange of data and schema information between components, the system administrator of each federation component defines the relevant structural part of his component data model types with the CDM types and assigns some procedures that make up the semantics of these data types. He inserts the necessary data model knowledge into the meta layer using the meta-meta layer elements.

For instance, for an object-oriented data model the administrator defines the structural parts of the concept CLASS and assigns at least one particular routine that performs inheritance as in his individual data model.

This information is provided through an interface, the so-called Data-Model-Profile. It is an ASCII file with a particular syntax that is parsed. The information is then kept in a knowledge base -- the FSM Meta Knowledge Base.

The Schema Layer

Databases, as components of a federation, use database schemata. Applications use data type definitions to declare their application types.

The FSM reads these schemata and declarations and interprets the used data types through the information
of the meta layer. Application entities are transformed into entities of the CDM and then -- for storage purposes -- transformed into entities of a database data model.

The entity information in CDM format is stored in the Federal Data Dictionary (FDD) for retrieval purposes.

The Application Layer

Finally, the data that comes from applications is stored in databases that have joined the federation, that are represented through meta-information in the Meta Knowledge Base, and that are willing to perform the storage process after a negotiation of their autonomy rights.

Of course, the data is not stored as CDM-typed data but is typed according to the data model of the involved database system. The interpretation of binary data runs the same way as the transformation of type information: it goes from the data model of the application to the CDM and from the CDM to the database data model, and vice versa.

3. Extensions of the FSM Prototype

Since 1991 the FSM prototype has been extended through the work of several students.

The Federal Data Dictionary of the prototype contained information about data type declarations, the types of application entities, and the structure of these entities; access rights were included as well. It did not include any technical information about the availability of entities or schemata.

We extended the FDD, and it now contains technical information about the federation components. The meta layer includes information about the technical system that hosts the application or the database system. The schema layer includes information about the technical availability of entities [9].

The lack of a docking mechanism and of a protocol to negotiate autonomy was another problem of the original FSM prototype. It was a static system with two applications, a database system, and the FSM, with hard-wired mechanisms to read data type declarations -- database schemata could not be read, nor was it possible to link another database system with the FSM.

Now we have implemented a link mechanism that generalizes the old one [10]. We now use an FSM-Bind module that binds a component -- either a database system or an application -- if it includes our FSM-Bind-Agent.

The FSM-Bind-Agent acts as a client to the FSM-Bind module, which is the server, and performs the link process between FSM and component. It runs an implemented protocol for start-up and shut-down situations and uses the Remote Procedure Call (RPC) technique.

After linkage the FSM-Bind-Agent passes control to a so-called FSM-Agent, which performs the information exchange and the retrieval of schema information via the Remote Data Access (RDA) protocol.

What is still missing is a user-friendly retrieval facility that completes the Federal Data Dictionary. We describe our ideas in the next section.

3.1. Extensions of the FDD

Data dictionaries offer technical information to users -- and exactly this can be expected from our Federal Data Dictionary as it is currently implemented. If a user wants to build a new application, he looks into the FDD and looks up some data structures that he wants to re-use. Then he includes the chosen data structures in his new schema (the FSM provides some commands to do so) and runs his application.

This user is unable to check whether his new schema violates the semantic integrity of the universe of discourse of the actual federation, because he cannot ask the FDD to present the semantic relations between entities.

We wish to provide such a user with an extended Federal Data Dictionary, which shows the contents of a federation at various levels of abstraction. If this extended data dictionary has a graphical interface, the user can use a mouse to switch easily between levels. Which are these levels?

Taxonomy Level

The highest level presented should be a taxonomy of the universe of discourse. It could be the union of all schemata (and maybe data type declarations of applications) of local database components, which we previously transformed into the abstraction level of a concept language. This level would represent the data of a particular federation without any technical details. Here the user could look up the real-world context of an entity and might ask questions about the relationships between entities. It is the level that KL-ONE-like languages usually offer to users with their T-Box.

Concept languages distinguish between terminological knowledge (T-Box) and assertional knowledge (A-Box). The task that we have to perform is to abstract the technical information from schemata and data type
declarations to concepts of concept languages. In [11] we find a theoretical basis that allows us to express database schemata with concept languages.

Moreover, the authors show that classification is then available for entities of schemata -- and we found that the implementation of a classifier is, surprisingly, supported by an algorithm that we use within the FSM to detect data type intersections for types from different data models. This algorithm closely follows the steps of a classification of concepts.

In any case, if we make the is-a and part-of relations of entities from schemata explicit and suppress the technical information, then we can ask questions about a schema similar to the questions about a taxonomy.

The implementation of this level may use intermediate language representations that follow the idea of attributed trees. This model allows us to determine the degree of entity detail that we want to present by cutting the tree at a certain level. The information above the cut is presented as a concept. The rest is hidden until requests from other levels of our retrieval interface force it to become visible.

Evidently, we face some open questions if we want to extend a data dictionary with knowledge representation features:

How do we find a way to reconstruct the entity view from relational schemata with normalized relations? Any automatic evaluation of foreign keys -- the only data model construct that can be used to express sub-part relationships, set inclusions, and entity inclusions within the relational data model -- ultimately depends on the support of a human. A machine can only hypothesize is-a relations between entities. Thus, our entity re-constructor cannot be a completely automatic component. It has to include a dialogue component to keep in touch with a human expert, but it may be a component that is able to learn.

Schema Level

On a second level, the schema level, the user should have a detailed view with access to the more technical details of entities: which attributes make up an entity, where the information resides within the federation, and whether and when it is accessible to him.

This level is comparable to an extended Entity-Relationship level where we added attributes about data distribution and data availability to the usual representation of entities, attributes, and relationships.

We realize this view by an FDD retrieval, because our directory includes the structure information of entities in a neutral representation and the information about the availability of these entities.

Syntax Level

Finally, the user may get what he always got from databases: the pure schema information. If he asks for this, he will get an excerpt of a schema of one or more particular local components of the federation -- and he should decide himself whether he would like to receive this information in the format of a common data model or in the individual format of the involved local federation components.

4. First Steps toward the Taxonomy Level

Concerning the integration of abstract schema representations into one taxonomy, we did some preliminary work and evaluated an idea published in [12]. It proposed the assignment of fuzzy values to relationships to determine the is-a of an entity.

We took this idea and tried to use probability values for the integration of different schemata into one -- to simulate the situation that arises if we have to integrate abstracted schemata from components into one taxonomy. It was a first attempt to cope with modeling heterogeneity.

The basic assumption behind our tests was that the insertion of knowledge into a taxonomy is an evolutionary process and that we ask "is B an A or a C" and not "how probably is B an A and a C".

We defined a value CT(Ei, Ej) for the correctness of an is-a relationship between two entities Ei and Ej in a taxonomy for the federation. Such a value is assumed to be assigned to each is-a relationship within that taxonomy. Analogously to CT, we defined CS(Ei, Ej) as a value for the correctness of an is-a relationship in a local schema.

Next we said that ST(En) and SS(En) are the sets of all super-concepts of a concept in the taxonomy and of an entity in a local schema, respectively.

Finally, we defined two functions, which were necessary to calculate the probability values during the integration process.

The first function was called INIT and initialized an initial taxonomy with the value 1 for all is-a relationships: CT(Ei, Ej) := 1.
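
A minimal Python sketch of this bookkeeping, offered as an illustration rather than as the prototype's implementation: a taxonomy is held as a mapping from is-a pairs to their values, INIT sets every value to 1, and a helper returns the super-concept set written ST(En) above. The names `init_taxonomy` and `super_concepts`, the entity D, and the restriction to direct super-concepts are assumptions made for this example.

```python
def init_taxonomy(relationships):
    """INIT: assign the value 1 to every is-a relationship (sub, super)."""
    return {(sub, sup): 1.0 for sub, sup in relationships}


def super_concepts(values, entity):
    """ST(En) or SS(En), read here as the direct super-concepts of `entity`.

    Assumption: no transitive closure is taken.
    """
    return {sup for (sub, sup) in values if sub == entity}


# Hypothetical initial taxonomy with two is-a relationships for B.
taxonomy = init_taxonomy([("B", "A"), ("B", "D")])
print(taxonomy[("B", "A")])
print(sorted(super_concepts(taxonomy, "B")))
```

The same mapping can hold the values CS of a local schema, so one helper serves for both ST and SS.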
The second function included a case statement and was called CALC. It calculated the initialized values according to the new schema. The first case, C1, was used if a relationship was found in a schema -- it corresponds to the INIT function for the taxonomy -- and set CS(Ei, Ej) := 1. We assume that the designer of the schema did good and correct work.

The second case, C2, was used if we found a relationship within the schema but not within the taxonomy. We insert the relationship into the taxonomy and give it the value CT(Ei, Ej) := CS(Ei, Ej) ÷ card(ST(Ei) ≈ SS(Ei)).

This approach seems to be correct because we cannot guarantee that the taxonomy was correctly initialized with relationships. Moreover, the insertion of a new relationship affects the probability value of another one, because there must be a reason why a particular application domain needs this new relationship. It may be that the already existing relationships do not have the importance that we expected.

Finally, there is the case C3. In this case we see a relationship within the taxonomy but miss it in a schema. We interpret that relationship as "possible but unnecessary" within this application domain and "insert" it into the schema with CS(Ei, Ej) := CT(Ei, Ej) ÷ card(ST(Ei)).

Then we made three assumptions:
a) The increase of probability of one particular relationship is given by its existence in schemata and causes a decrease of probability for those relationships that are often missing.
b) The results of calculations of the overall probability for a particular relationship are included in the taxonomy.
c) Results are calculated as the geometric mean of the two probability values from the taxonomy and from a schema.

With these assumptions and formulas we tested the integration of six schemata into a taxonomy, which was initialized with one relationship "B is-a A". Four of these schemata included the relationship "B is-a A" (we call them the A-type schemata). Two included "B is-a C" and not "B is-a A" (we call these the C-type schemata).

In a first test, we inserted a C-type schema first, and afterwards both relationships had the same value (0.71) in the taxonomy. Inserting the A-type schemata four times brought the value of the "B is-a A" relationship up to 0.98, and the value of "B is-a C" fell to 0.18 -- similar to the predicate "insignificant" or "incorrect". A final insert of a C-type schema, however, gave a new balance to both values, which was 0.69 for the "B is-a A" and 0.42 for the "B is-a C" relationship.

A second test gave surprising results: we inserted the two C-type schemata first and then the A-type schemata four times. This gave a high value to the "B is-a C" relationship first -- the balance was 0.5 for "B is-a A" and 0.84 for "B is-a C" -- and a final value of 0.96 for "B is-a A" and 0.37 for "B is-a C".

While the first test showed that the late insert of an apparently insignificant relationship makes the value system unstable, the second test showed that an early insert of the two C-type schemata prevents the alternative relationship from falling to an "insignificant" valuation.

In any case, both value calculations were highly sequence dependent, and we suspected the second assumption to be the reason. Thus we tried again without this assumption. We introduced a variable into C3: V(Ei) counts the number of schemata without a particular relationship, and the calculation C3 changed to
    CS(Ei, Ej) := 1 ÷ (V(Ei) + 1).

This did not change much, and we were left with the question: Is the insertion of knowledge really an evolutionary process, or is it correct to calculate probability values from the arithmetic mean of all values from schemata?

5. Conclusion

The proposed extended data dictionary gives a twofold benefit. First, a user who wants to build a new schema for an application in a system federation can check which entities already exist, which of them he can re-use within his application, and which ones he has to add or modify.

Second, an administrator can test the correctness of an existing schema against the universe of discourse. He can check the completeness of relations between entities by looking up the taxonomy, where he would find the collection of all relationships between entities -- and possibly a probability value for the necessity or reliability of an individual relationship.

6. Literature

[1] The Common Object Request Broker: Architecture and Specification, OMG Document Number 91.12.1, Revision 1.1, Draft
[2] W. Benn, G. Junkermann, H. Kalweit, Ch. Kortenbreer, G. Schlageter, X. Wu: The Conceptual Object Manager Document, University of Hagen, Computer Science Report Nº 99, 1990
[3] W. Benn, Ch. Kortenbreer, X. Wu: Towards Interoperability: Vertical Integration of Languages with a KBMS, GI-Fachtagung "Datenbanksysteme in Büro, Technik und Wissenschaft" (BTW 91), Springer-Verlag, 1991
[4] W. Benn: KBMS Support for Multiple Paradigm Applications, in [16]
[5] W. Benn: KBMS Support for Conceptual Modeling in AI, 3rd International Conference on Tools for Artificial Intelligence, 1991
[6] W. Benn, Ch. Kortenbreer, G. Schlageter, X. Wu: On Interoperability for KBMS Applications - The Horizontal Integration Task -, 8th Intl. Conference on Data Engineering, Phoenix, AZ, 1992
[7] A.P. Sheth, J.A. Larson: Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases, ACM Computing Surveys (1990) 3
[8] DIN 66 313, Rahmenangaben für Systeme zur Verwaltung von Informationsrecourcenverzeichnissen, DIN Deutsches Institut für Normung e.V., Berlin, 1992 (same as ISO/IEC 10 027)
[9] J. Hunstock: Erweiterung einer Wissensbasis zur Realisierung von universellem Polymorphismus in föderativen Systemen um technische Informationen autonomer Systemkomponenten (Extending the Meta-Knowledge Base of the FSM by technical information), diploma thesis, Chemnitz University of Technology, 1993
[10] M. Schöne, S. Herold: Konzeption und Implementierung eines Protokolls und zugehöriger Systemkomponenten zur Integration von Datenbanksystemen in einer Föderation (Design and implementation of a protocol for the integration of database components into a federation), diploma thesis, Chemnitz University of Technology, 1994
[11] S. Bergamaschi, C. Sartori: On taxonomic reasoning in conceptual design, ACM TODS (1992) 3
[12] P. Fankhauser, M. Kracker, E. Neuhold: Semantic vs. Structural Resemblance of Classes, ACM SIGMOD Record 20 (1991) 4
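
The value calculation of section 4 can be replayed with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used for the tests: the function and variable names are hypothetical, the operator printed as "≈" in case C2 is read as set union, ST and SS contain direct super-concepts only, and C3 is evaluated after C2 has inserted any new relationship into the taxonomy.

```python
import math


def super_concepts(values, entity):
    """Direct super-concepts of `entity` in a {(sub, super): value} mapping."""
    return {sup for (sub, sup) in values if sub == entity}


def calc(taxonomy, schema_relations):
    """CALC: integrate one schema, given as a set of (sub, super) is-a pairs."""
    # C1: every relationship found in the schema gets the value 1.
    schema = {rel: 1.0 for rel in schema_relations}
    # C2: relationship in the schema but not in the taxonomy.
    # Assumption: the garbled operator in the C2 formula is a set union.
    for sub, sup in schema_relations:
        if (sub, sup) not in taxonomy:
            supers = super_concepts(taxonomy, sub) | super_concepts(schema, sub)
            taxonomy[(sub, sup)] = schema[(sub, sup)] / len(supers)
    # C3: relationship in the taxonomy but missing in the schema.
    for (sub, sup), value in taxonomy.items():
        if (sub, sup) not in schema:
            schema[(sub, sup)] = value / len(super_concepts(taxonomy, sub))
    # Assumptions b) and c): store the geometric mean back into the taxonomy.
    for rel in taxonomy:
        taxonomy[rel] = math.sqrt(taxonomy[rel] * schema[rel])


def run(schemata):
    """INIT with "B is-a A", then insert the schemata in the given order."""
    taxonomy = {("B", "A"): 1.0}
    trace = []
    for relations in schemata:
        calc(taxonomy, relations)
        trace.append({rel: round(v, 2) for rel, v in taxonomy.items()})
    return trace


A_TYPE = {("B", "A")}   # schema containing "B is-a A"
C_TYPE = {("B", "C")}   # schema containing "B is-a C" and not "B is-a A"

first_test = run([C_TYPE] + [A_TYPE] * 4 + [C_TYPE])
second_test = run([C_TYPE] * 2 + [A_TYPE] * 4)
print(first_test[0], first_test[4], first_test[5])
print(second_test[1], second_test[5])
```

Under these assumptions the sketch returns the figures reported for the first test: 0.71 for both relationships after the first C-type schema, 0.98 and 0.18 after the four A-type schemata, and 0.69 and 0.42 at the end. For the second ordering it gives 0.5 and 0.84 after the two C-type schemata and 0.96 for "B is-a A" at the end, but 0.21 instead of the reported 0.37 for "B is-a C", so this reading of the formulas should be taken as approximate.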