=Paper=
{{Paper
|id=Vol-1/paper-6
|storemode=property
|title=What's in a federation? Extending data dictionaries with knowledge representation techniques
|pdfUrl=https://ceur-ws.org/Vol-1/benn-long.pdf
|volume=Vol-1
|authors=W. Benn
}}
==What's in a federation? Extending data dictionaries with knowledge representation techniques==
Wolfgang Benn
Chemnitz University of Technology • Management of Data
P.O. Box 964 • D-09009 Chemnitz
benn@informatik.tu-chemnitz.de
1. Introduction

Databases and knowledge representation languages have rather different views of data: knowledge representation languages describe a universe of discourse in a taxonomy and allow a user to ask epistemic questions about the relationships between concepts and roles. However, no data structures, data locations, nor any information about the existence or availability of data can be found in a taxonomy -- not even if it includes an assertion that describes a particular data item.

Relational databases provide users with schemata. Schemata describe in detail the data structures of sets of persistent data items. Data dictionaries, included in these systems, tell about data existence and its availability. However, these tools do not provide the entity view: relationships between entities are merely implicit, and no question about the universe of discourse behind a schema will get an answer.

Object-oriented databases provide users with class hierarchies as schemata. They support the entity view -- is-a as well as part-of relationships are explicit. Nevertheless, information about the universe of discourse is not given either.

In a federation of systems -- databases and applications, for instance -- the situation gets worse. Databases may be heterogeneous in their modeling technique: some will follow the object-oriented, the majority certainly follows the relational paradigm. How does a user get to know what data is available in a federation if he wants to build a new application? How does that user get to know how he may access a particular data item? How does he know that the selected data item is semantically correct concerning the context of his application?

If he can access a federated data dictionary, it will provide him with the technical information about the data in a common data model -- similar to the global conceptual schema of a distributed database. If such a tool does not exist, the user must read all available schemata from all available federation components (i.e., he must know about all languages, data models, and dialects that the local components of the federation individually use).

In the remainder of this paper we will briefly introduce a module that coordinates a federation of systems and that hosts a central data dictionary. It is this module that we will extend to provide users with an entity view upon the information available in a federation. We introduce the logical architecture of a prototypical implementation of this module in section 2 and describe some extensions that we made in section 3. In section 4 we specify some ideas of the mentioned extension, conclude in section 5 and give some literature in section 6.

2. The Federal System Manager

The Federal System Manager (FSM) is a module that coordinates a federation of autonomous systems. These systems can be applications or services like databases, which may link to the FSM to form a federation for some particular tasks. Afterwards they can leave the federation and run again as autonomous systems. This idea is rather similar to the concept of multi-agent systems.

The FSM performs a minimum of three tasks: The first one is to run a protocol that enables the linkage process and guarantees a negotiation of autonomy aspects to the components, if these want to join or leave the federation. Second, the FSM must provide a uniform view upon all information that is available to applications of the federation through a so-called Common Data Model (CDM). Third, it must support an exchange of information, i.e., data types and data itself, between members of the federation. We will detail these tasks and concentrate on the second one.

Compared with the Common Object Request Broker Architecture (CORBA) [1], the FSM is an object broker that looks at databases as service-providing objects and at applications as clients that request these services. Commonly known services from database components are storage, retrieval, update, etc.

Moreover, the FSM is an object itself! It provides services like data and type exchange. It contains a Federal Data Dictionary (FDD) that allows a user to retrieve the information contents of the current federation under several aspects. It is our aim to extend this Federal Data Dictionary with knowledge representation techniques to better support users in their retrieval than before.

2.1. The FSM Prototype

The currently implemented FSM prototype has its roots in an ESPRIT project, finished in 1991 [2,3,4,5,6]. The prototype mainly follows the reference architecture for interoperable systems given in [7] and includes a repository according to the Information Resource Dictionary Standard IRDS [8].

This standard defines a four-layer architecture with (top down)
• a meta-meta layer that describes the model of the meta layer descriptions -- which is in our case the Common Data Model of the FSM, a framework that is based on the Abstract Data Type (ADT) idea --,
• a meta layer where we find the description of schemata -- which is in our case a description of the federation components’ data models --,
• a schema layer where the data descriptions are located -- which is in our case the data types that are defined in schemata of databases or in type declarations of applications --, and
• an application data layer where we finally find the application data itself.

The Meta-Meta Layer

To enable the description of schema descriptions we implemented a common data model.

In the literature we found many different approaches to implement a CDM -- the approach most often used, however, was the object-oriented. Thus, we asked ourselves what the kernel idea of the object-oriented paradigm is that makes it suitable for a CDM. We found out that it probably is the idea of Abstract Data Types.

Thus, we implemented a framework, which is actually not a real data model but a tool box [2]. It allows a user to describe the structure and semantics of those elements, which he uses to describe a schema, similar to the ADT concept (see next paragraph).

The CDM that we implemented is very similar to the Interface Definition Language (IDL) of the CORBA specification [1] -- because its purposes are rather similar. IDL is a language that describes object services in an intermediate way, and the CDM describes entities (application objects) in an intermediate way. An IDL description is mapped into a real programming language and the object services are available for all programs written in this programming language. Application objects described in our CDM are (under certain conditions) transformable into all data models that are represented in the FSM.

The Meta Layer

An extension of the IRDS standard was made for the meta layer. If the FSM supports an exchange of data between components, it must be able to transform data between the different individual data descriptions. These descriptions follow type or schema declarations, which use data model elements. Thus, our meta layer has to include a suitable sub-set of the component data model for each involved component. Moreover, it must include some rules that guide the transformation of entities between these data model sub-sets.

However, the description of a data model sub-set is somewhat more complex than the description of a schema. While a schema merely consists of data structures, a data model usually includes data types and data type semantics. The meta layer of our FSM includes both (the assignment of a set of operations to a data type that makes up the type’s semantics in the data model of a component is currently under implementation).

To enable the exchange of data and schema information between components, the system administrator of each federation component defines the relevant structural part of his component data model types with the CDM types and assigns some procedures that make up the semantics of these data types. He inserts the necessary data model knowledge into the meta layer using the meta-meta layer elements.

For instance, from an object-oriented data model the administrator defines the structural parts of the concept CLASS and assigns at least one particular routine that performs inheritance similar to his individual data model.

This information is provided through an interface, which is the so-called Data-Model-Profile. It is an ASCII file with a particular syntax that is parsed. Then the information is kept in a knowledge base -- the FSM Meta Knowledge Base.

The Schema Layer

Databases, as components of a federation, use database schemata. Applications use data type definitions to declare their application types. The FSM reads these schemata and declarations and interprets the used data types through the information of the meta layer. Application entities are transformed
into entities of the CDM and then -- for storage purposes -- transformed into entities of a database data model.

The entity information in CDM format is stored in the Federal Data Dictionary (FDD) for retrieval purposes.

The Application Layer

Finally, the data that comes from applications is stored in databases that have joined the federation, that are represented through meta-information in the Meta Knowledge Base, and that are willing to perform the storage process after a negotiation of their autonomy rights.

Of course, the data is not stored as CDM-typed data but is typed according to the data model of the involved database system. The interpretation of binary data runs the same way as the transformation of type information: it goes from the data model of the application towards the CDM and from the CDM to the database data model, and vice versa.

3. Extensions of the FSM Prototype

Since 1991 the FSM prototype has been completed by some students’ work.

The Federal Data Dictionary of the prototype contained information about data type declarations, the types of application entities, and the structure of these entities -- access rights were included as well. It did not include any technical information about the availability of entities or schemata.

We extended the FDD and it now contains technical information about the federation components. The meta layer includes information about the technical system that hosts the application or the database system. The schema layer includes information about the technical availability of entities [9].

The lack of a docking mechanism and a protocol to negotiate autonomy was another problem of the original FSM prototype. It was a static system with two applications, a database system and the FSM with hard-wired mechanisms to read data type declarations -- database schemata could not be read, nor was it possible to link another database system with the FSM.

Now we have implemented a link mechanism that generalizes the old one [10]. We now use an FSM-Bind module that binds a component -- either a database system or an application -- if it includes our FSM-Bind-Agent.

The FSM-Bind-Agent acts as a client to the FSM-Bind module, which is the server, and performs the link process between FSM and component. It runs an implemented protocol for start-up and shut-down situations and uses the Remote Procedure Call (RPC) technique.

After linkage the FSM-Bind-Agent passes control to a so-called FSM-Agent, which performs the information exchange and the retrieval of schema information via the Remote Data Access (RDA) protocol.

What is still missing is a user-friendly retrieval facility that completes the Federal Data Dictionary. We will describe our ideas in the next section.

3.1. Extensions of the FDD

Data dictionaries offer technical information to users -- and exactly this can be expected from our Federal Data Dictionary as it is currently implemented. If a user wants to build a new application he looks into the FDD and looks up some data structures that he wants to re-use. Then he includes the chosen data structures into his new schema (the FSM provides some commands to do so) and runs his application.

This user is unable to check whether his new schema violates the semantic integrity of the universe of discourse of the current federation because he cannot ask the FDD to present him semantic relations between entities.

We wish to provide such a user with an extended Federal Data Dictionary, which shows the contents of a federation from various levels of abstraction. If this extended data dictionary has a graphic interface the user will use a mouse to easily request the change of levels. Which are these levels?

Taxonomy Level

The highest level presented should be a taxonomy upon the universe of discourse. It could be the union of all schemata (and maybe data type declarations of applications) of local database components, which we previously transformed into the abstraction level of a concept language. This level would represent the data of a particular federation without any technical details. Here the user could look up the real-world context of an entity and might ask questions about the relationships between entities. It is the level that KL-ONE-like languages usually offer to users with their T-Box.

Concept languages distinguish between terminological (T-Box) and assertional knowledge (A-Box). The task that we have to perform is to abstract the technical information from schemata and data type declarations to concepts of concept languages. In [11]
we find a theoretical basis that allows us to express database schemata with concept languages.

Moreover, the authors show that classification is then available for entities of schemata -- and we found out that the implementation of a classifier is, surprisingly, supported through an algorithm that we use within the FSM to detect data type intersections for types from different data models. This algorithm follows perfectly the above mentioned steps for a classification of concepts.

Anyway, if we make the is-a and part-of relations of entities from schemata explicit and suppress the technical information, then we can ask questions against a schema similar to the questions against a taxonomy.

The implementation of this level may use intermediate language representations that follow the idea of attributed trees. This model allows us to determine the degree of entity detail information that we want to present by cutting the tree at a certain level. The information above the cut is presented as concept. The rest is hidden until requests from other levels of our retrieval interface force it to become visible.

Apparently, we address some open questions if we want to extend a data dictionary with knowledge representation features:

How do we find a way to reconstruct the entity view from relational schemata with normalized relations? Any automatic evaluation of foreign keys -- which is the only data model construct that can be used to express sub-part relationships, set-inclusions, and entity-inclusions within the relational data model -- finally depends on the support of a human. A machine may solely hypothesize is-a relations between entities. Thus, our entity re-constructor cannot be a completely automatic component. It has to include a dialogue component to keep in touch with a human expert, but it may be a component that is able to learn.

Schema Level

On a second level, the schema level, in a detailed view, the user should have access to the more technical details of entities and should see what attributes make up an entity, where the information resides within the federation, and whether and when it is accessible for him.

This level is comparable with an extended Entity-Relationship level where we added attributes about data distribution and data availability to the usual representation of entities, attributes, and relationships.

We realize this view by an FDD retrieval, because our directory includes the structure information of entities in a neutral representation and the information about the availability of these entities.

Syntax Level

Finally, the user may get what he always got from databases: the pure schema information. If he asks for this, he will get an excerpt of a schema of one or more particular local components of the federation -- and he should decide himself whether he would like to receive this information in the format of a common data model or in the individual format of the involved local federation components.

4. First Steps toward the Taxonomy Level

Concerning the integration of abstract schema representations into one taxonomy we did some work in advance and evaluated an idea published in [12]. It proposed the assignment of fuzzy values to relationships to determine the is-a of an entity.

We took this idea and tried to use probability values for the integration of different schemata into one -- to simulate the situation that comes up if we have to integrate abstracted schemata from components into one taxonomy. It was a first guess to cope with modeling heterogeneity.

The basic assumption behind our tests was that the insert of knowledge into a taxonomy is an evolutionary process and that we ask “is B an A or a C” and not “how probably is B an A and a C”.

We defined a value CT(Ei, Ej) for the correctness of an is-a relationship between two entities Ei and Ej in a taxonomy for the federation. Such a value is assumed to be assigned to each is-a relationship within that taxonomy. Similar to CT we defined CS(Ei, Ej) as a value for the correctness of an is-a relationship in a local schema.

Next we said that ST(En) and SS(En) are the sets of all super-concepts of a concept in the taxonomy and of an entity in a local schema, respectively.

Finally, we defined two functions, which were necessary to calculate the probability values during the integration process.

The first function was called INIT and initialized an initial taxonomy with the value 1 for all is-a relationships: CT(Ei, Ej) := 1.
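
The value bookkeeping of this section, that is INIT together with the CALC cases C1 to C3 and the geometric mean that the following paragraphs describe, can be sketched in Python. This is only an illustration under stated assumptions, not the original implementation: the names `init`, `calc` and `integrate` and the dictionary representation are hypothetical, the operator printed as ≈ in case C2 is read as set union, and card(ST(Ei)) in case C3 is evaluated after C2 has extended the taxonomy.

```python
from math import sqrt

def init(relations):
    """INIT: give every is-a relationship of the initial taxonomy the value 1."""
    return {rel: 1.0 for rel in relations}

def supers(relations, entity):
    """Set of super-concepts recorded for `entity` (ST or SS in the text)."""
    return {sup for (ent, sup) in relations if ent == entity}

def calc(taxonomy, schema):
    """One CALC step: integrate a schema, given as a set of (entity, super) pairs."""
    ct = dict(taxonomy)
    # C1: a relationship found in the schema gets CS := 1.
    cs = {rel: 1.0 for rel in schema}
    # C2: in the schema but not in the taxonomy.
    # Assumption: the garbled operator between ST and SS is set union.
    for (ent, sup) in sorted(schema):
        if (ent, sup) not in ct:
            known = supers(ct, ent) | supers(schema, ent)
            ct[(ent, sup)] = cs[(ent, sup)] / len(known)
    # C3: in the taxonomy but missing in the schema.
    # Assumption: card(ST) is taken after the C2 insertions.
    for (ent, sup) in ct:
        if (ent, sup) not in cs:
            cs[(ent, sup)] = ct[(ent, sup)] / len(supers(ct, ent))
    # Assumption c): the new taxonomy value is the geometric mean of both values.
    return {rel: sqrt(ct[rel] * cs[rel]) for rel in ct}

def integrate(initial_relations, schemata):
    """Run INIT once, then CALC for each schema in the given order."""
    taxonomy = init(initial_relations)
    for schema in schemata:
        taxonomy = calc(taxonomy, schema)
    return taxonomy

if __name__ == "__main__":
    a_type = {("B", "A")}   # schema containing "B is-a A"
    c_type = {("B", "C")}   # schema containing "B is-a C" but not "B is-a A"
    order = [c_type] + [a_type] * 4 + [c_type]
    for n in (1, 5, 6):
        values = integrate({("B", "A")}, order[:n])
        print(n, {rel: round(v, 2) for rel, v in values.items()})
```

With the insertion order C, A, A, A, A, C, this sketch yields rounded values of 0.71 and 0.71 after the first schema, 0.98 and 0.18 after the four A-type schemata, and 0.69 and 0.42 at the end, which coincides with the figures reported for the first test in this section. It is not claimed to reproduce every other reported figure, so the two readings above should be treated as assumptions.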

The second function included a case statement and was called CALC. It calculated the initialized values according to the new schema. The first case, C1, was used if a relationship was found in a schema -- it corresponds with the INIT function for the taxonomy -- and set CS(Ei, Ej) := 1. We assume that the designer of the schema did good and correct work.

The second case, C2, was used if we found a relationship within the schema but not within the taxonomy. We insert the relationship into the taxonomy and give it the value CT(Ei, Ej) := CS(Ei, Ej) ÷ card(ST(Ei) ≈ SS(Ei)).

This approach seems to be correct because we cannot guarantee that the taxonomy was correctly initialized with relationships. Moreover, an insertion of a new relationship affects the probability value of another one because there must be a reason why a particular application domain needs this new relationship. It may be that the already existing relationships do not have the importance that we expected.

Finally there is the case C3. In this case we see a relationship within the taxonomy but miss it in a schema. We interpret that relationship as “possible but unnecessary” within this application domain and “insert” it into the schema with CS(Ei, Ej) := CT(Ei, Ej) ÷ card(ST(Ei)).

Then we made three assumptions:
a) The increase of probability of one particular relationship is given by its existence in schemata and causes a decrease of probability for those relationships that are often missed.
b) The results of calculations about the overall probability for a particular relationship are included into the taxonomy.
c) Results are calculated through the geometric mean of the two probability values from the taxonomy and from a schema.

With these assumptions and formulas we tested the integration of six schemata into a taxonomy, which was initialized with one relationship “B is-a A”. Four of these schemata included the relationship “B is-a A” (we call them the A-type schemata). Two included “B is-a C” and not “B is-a A” (we call these the C-type schemata).

In a first test, we inserted a C-type schema first and afterwards both relationships had the same value (0.71) in the taxonomy. A four-times insert of the A-type schemata brought the value of the “B is-a A” relationship up to 0.98 and the value of “B is-a C” fell down to 0.18 -- similar to the predicate “insignificant” or “incorrect”. A final insert of a C-type schema, however, gave a new balance to both values, which was 0.69 for the “B is-a A” and 0.42 for the “B is-a C” relationship.

A second test gave surprising results: We inserted the two C-type schemata and then four times the A-type schemata. This gave a high value to the “B is-a C” relationship first -- the balance was 0.5 for “B is-a A” and 0.84 for “B is-a C” -- and a final value of 0.96 for “B is-a A” and 0.37 for “B is-a C”.

While the first test showed that the late insert of an apparently insignificant relationship makes the value system unstable, the second test showed that an early insert of the two C-type schemata prevents the alternative relationship from falling down to an “insignificant” valuation.

Anyway, both value calculations were highly sequence dependent, and we suspected the second assumption as the reason for it. Thus we tried again without this assumption. We inserted into C3 a variable: V(Ei) counts the number of schemata without a particular relationship, and the calculation C3 changed to CS(Ei, Ej) := 1 ÷ (V(Ei) + 1).

This does not change much and we were stuck with the question: Is the insert of knowledge really an evolutionary process or is it correct to calculate probability values from the arithmetic mean of all values from schemata?

5. Conclusion

The proposed extended data dictionary gives a twofold benefit. First, a user who wants to build a new schema for an application in a system federation can check which entities already exist, which of them he can re-use within his application, and which ones he has to add or modify.

Second, an administrator can test the correctness of an existing schema against the universe of discourse. He can check the completeness of relations between entities by looking up the taxonomy, where he would find the collection of all relationships between entities -- and possibly a probability value of the necessity or reliability of an individual relationship.

6. Literature

[1] The Common Object Request Broker: Architecture and Specification, OMG Document Number 91.12.1, Revision 1.1, Draft
[2] W. Benn, G. Junkermann, H. Kalweit, Ch. Kortenbreer, G. Schlageter, X. Wu: The Conceptual Object Manager Document, University of Hagen, Computer Science Report Nº 99, 1990
[3] W. Benn, Ch. Kortenbreer, X. Wu: Towards Interoperability: Vertical Integration of Languages with a KBMS, GI-Fachtagung “Datenbanksysteme in Büro, Technik und Wissenschaft” (BTW 91), Springer-Verlag, 1991
[4] W. Benn: KBMS Support for Multiple Paradigm Applications, in [16]
[5] W. Benn: KBMS Support for Conceptual Modeling in AI, 3rd International Conference on Tools for Artificial Intelligence, 1991
[6] W. Benn, Ch. Kortenbreer, G. Schlageter, X. Wu: On Interoperability for KBMS Applications - The Horizontal Integration Task -, 8th Intl. Conference on Data Engineering, Phoenix, AZ, 1992
[7] A.P. Sheth, J.A. Larson: Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases, ACM Computing Surveys (1990) 3
[8] DIN 66 313, Rahmenangaben für Systeme zur Verwaltung von Informationsrecourcenverzeichnissen, DIN Deutsches Institut für Normung e.V., Berlin, 1992 (same as ISO/IEC 10 027)
[9] J. Hunstock: Erweiterung einer Wissensbasis zur Realisierung von universellem Polymorphismus in föderativen Systemen um technische Informationen autonomer Systemkomponenten (Extending the Meta-Knowledge Base of the FSM by technical information), thesis for diploma, Chemnitz University of Technology, 1993
[10] M. Schöne, S. Herold: Konzeption und Implementierung eines Protokolls und zugehöriger Systemkomponenten zur Integration von Datenbanksystemen in einer Föderation (Design and implementation of a protocol for the integration of database components into a federation), thesis for diploma, Chemnitz University of Technology, 1994
[11] S. Bergamaschi, C. Sartori: On taxonomic reasoning in conceptual design, ACM TODS (1992) 3
[12] P. Fankhauser, M. Kracker, E. Neuhold: Semantic vs. Structural Resemblance of Classes, ACM SIGMOD Record 20 (1991) 4