<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SEEKing Knowledge in Legacy Information Systems to Support Interoperability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joachim Hammer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Schmalz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William O'Brien</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The SEEK project (Scalable Extraction of Enterprise Knowledge) is developing methodologies to overcome the problems of assembling knowledge resident in numerous legacy information systems by enabling rapid connection to, and privacy-constrained filtering of, legacy data and applications with little programmatic setup. In this report we outline our use of data reverse engineering and code analysis techniques to automatically infer as much as possible of the schema and semantics of a legacy information system. We illustrate the approach using an example from our construction supply chain testbed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>MOTIVATION</title>
      <p>
        We are developing methodologies and algorithms to facilitate
discovery and extraction of enterprise knowledge from legacy
sources. These capabilities are being implemented in a toolkit
called SEEK (Scalable Extraction of Enterprise Knowledge).
SEEK is being developed as part of a larger, multi-disciplinary
research project to develop theory and methodologies in
support of computerized decision and negotiation support
across a network of firms (general overview in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). SEEK is
not meant as a replacement for wrapper or mediator
development toolkits. Rather, it complements existing tools by
providing input about the contents and structure of the legacy
source that has so far been supplied manually by domain
experts. This streamlines the process and makes wrapper
development scalable.
      </p>
      <p>Figure 1 illustrates the need for knowledge extraction
tools in support of wrapper development in the context of a
supply chain. There are many firms (principally, subcontractors
and suppliers), and each firm contains legacy data used to
manage internal processes. This data is also useful as input to a
project level decision support tool. However, the large number
of firms working on a project makes it likely that there will be a
high degree of physical and semantic heterogeneity in their
legacy systems. This implies practical difficulties in connecting
firms’ data and systems with enterprise-level decision support
tools. It is the role of the SEEK toolkit to help establish the
necessary connections with minimal burden on the underlying
firms, which often have limited technical expertise. The SEEK
wrappers shown in Fig. 1 are wholly owned by the firm whose
data they access and hence provide a safety layer between the
source and end user. Security can be further enhanced by
deploying the wrappers in a secure hosting infrastructure at an
ISP, for example, as shown in the figure.</p>
      <p>We note that SEEK is not intended to be a
general-purpose data extraction tool: SEEK extracts a narrow range of
data and knowledge from heterogeneous sources. Current
instantiations of SEEK are designed to extract the limited range
of information needed by project-level decision support tools
and their process models to support project optimization.</p>
      <p>[Figure 1: An extended enterprise. Subcontractors and suppliers each expose legacy data through a SEEK wrapper; the wrappers are deployed in a secure hosting infrastructure and feed the analysis tools (e.g., E-ERP) used by the coordinator/lead firm.]</p>
    </sec>
    <sec id="sec-2">
      <title>SEEK APPROACH TO KNOWLEDGE EXTRACTION</title>
      <p>SEEK applies Data Reverse Engineering (DRE) and Schema
Matching (SM) processes to legacy database(s) to produce a
source wrapper for a legacy source. The source wrapper will be
used by other components (e.g., the analysis component in
Figure 1) wishing to communicate and exchange information
with the legacy system.</p>
      <p>First, SEEK generates a detailed description of the legacy
source, including entities, relationships, application-specific
meanings of the entities and relationships, business rules, data
formatting and reporting constraints, etc. We collectively refer
to this information as enterprise knowledge. The extracted
enterprise knowledge forms a knowledge base that serves as
input for subsequent steps. In particular, DRE connects to the
underlying DBMS to extract schema information (most data
sources support some form of Call-Level Interface such as
JDBC). The schema information from the database is
semantically enhanced using clues extracted by the semantic
analyzer from available application code, business reports, and,
in the future, perhaps other electronically available information
that may encode business data such as e-mail correspondence,
corporate memos, etc. It has been our experience (through
visits with representatives from the construction and
manufacturing domains) that such application code exists and
can be made available electronically. Second, the semantically
enhanced legacy source schema must be mapped into the
domain model (DM) used by the application(s) that want(s) to
access the legacy source. This is done using a schema mapping
process that produces the mapping rules between the legacy
source schema and the application domain model. In addition
to the domain model, the schema mapper also needs access to
the domain ontology (DO) describing the model.</p>
      <p>Finally, the extracted legacy schema and the mapping
rules provide the input to the wrapper generator (not shown),
which produces the source wrapper. In this paper, we focus on
our implementation of the DRE algorithm.
3</p>
    </sec>
    <sec id="sec-4">
      <title>Data Reverse Engineering</title>
      <p>
        Data reverse engineering (DRE) is defined as the application of
analytical techniques to one or more legacy data sources to
elicit structural information (e.g., term definitions, schema
definitions) from the legacy source(s) in order to improve the
database design or produce missing schema documentation. So
far in SEEK, we are applying DRE to relational databases only.
However, since the relational model has only limited semantic
expressiveness, our DRE algorithm generates, in addition to the
schema, an E/R-like representation of the entities and
relationships that are not explicitly defined in the legacy
schema (but which exist implicitly). Our approach to data
reverse engineering for relational sources is based on existing
algorithms by Chiang [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and Petit [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, we have
improved their methodologies in several ways, most
importantly to reduce the dependency on human input and to
eliminate some of the limitations of their algorithms (e.g.,
consistent naming of key attributes, legacy schema in 3-NF).
      </p>
      <p>[Figure 2: DRE architecture. Eight steps (1 AST Generation, 2 Dictionary Extraction, 3 Code Analysis, 4 Inclusion Dependency Mining, 5 Relation Classification, 6 Attribute Classification, 7 Entity Identification, 8 Relationship Classification) plus three support components: the configurable DB Interface Module, the Knowledge Encoder, and the Metadata Repository.]</p>
      <sec id="sec-4-5">
        <title>Overview</title>
        <p>Our DRE algorithm is divided into schema extraction and
semantic analysis, which operate in interleaved fashion. An
overview of the two algorithms, which together comprise eight
steps, is shown in Figure 2. In addition to the modules that
execute each of the eight steps, the architecture in Figure 2
includes three support components. The configurable Database
Interface Module (upper right-hand corner) provides
connectivity to the underlying legacy source; note that this
component is the only source-specific component in the
architecture: in order to perform knowledge extraction from
different sources, only the interface module needs to be
changed. The Knowledge Encoder (lower right-hand corner)
represents the extracted knowledge in the form of an XML
document so that it can be shared with other components in the
SEEK architecture (e.g., the semantic matcher). The Metadata
Repository is internal to DRE and is used to store intermediate
run-time information needed by the algorithms, including user
input parameters and the abstract syntax tree for the code (e.g.,
from a previous invocation).</p>
        <p>
          We now highlight each of the eight steps and related
activities outlined in Figure 2 using an example from our
construction supply chain testbed. For a detailed description of
our algorithm, refer to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. For simplicity, we assume without
loss of generality that only the following relations
exist in the MS-Project application, which will be discovered
using DRE (for a description of the entire schema refer to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]):
MSP-Project [PROJ_ID, ...]
MSP-Availability[PROJ_ID, AVAIL_UID, ...]
MSP-Resources [PROJ_ID, RES_UID, ...]
MSP-Tasks [PROJ_ID, TASK_UID, ...]
MSP-Assignment [PROJ_ID, ASSN_UID, ...]
        </p>
        <p>In order to illustrate the code analysis and how it enhances
the schema extraction, we refer the reader to the following C
code fragment representing a simple, hypothetical interaction
with the MS Project database.</p>
        <p>char *aValue, *cValue;
int flag = 0;
int bValue = 0;
EXEC SQL SELECT A, C INTO :aValue, :cValue
    FROM Z WHERE B = :bValue;
if (strcmp(cValue, aValue) &lt; 0)  /* dates compared as strings */
    { flag = 1; }
printf("Task Start Date %s ", aValue);
printf("Task Finish Date %s ", cValue);</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Step 1: AST Generation</title>
      <p>We start by creating an Abstract Syntax Tree (AST) shown in
Figure 3. The AST will be used by the semantic analyzer for
code exploration during step 3. Our objective in AST
generation is to be able to associate “meaning” with program
variables. Format strings in input/output statements contain
semantic information that can be associated with the variables
in the input/output statement. This program variable in turn
may be associated with a column of a table in the underlying
legacy database.</p>
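      <p>As a rough sketch of this association (a hypothetical Python stand-in for our C-oriented analyzer; the function and variable names below are illustrative, not part of SEEK), one can walk a parsed syntax tree and attach the format text of each output statement to the variable it prints:</p>
      <p>
```python
import ast

# Hypothetical stand-in: Python source playing the role of the C fragment.
# printf-style output statements let us attach "meaning" to variables.
source = '''
a_value = fetch("A")
c_value = fetch("C")
print("Task Start Date %s" % a_value)
print("Task Finish Date %s" % c_value)
'''

def semantic_clues(src):
    """Map each printed variable to the text of its format string."""
    clues = {}
    for node in ast.walk(ast.parse(src)):
        # Look for print("... %s" % variable) statements.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id == "print" and node.args:
                arg = node.args[0]
                if isinstance(arg, ast.BinOp) and isinstance(arg.left, ast.Constant):
                    if isinstance(arg.right, ast.Name):
                        clues[arg.right.id] = arg.left.value.replace("%s", "").strip()
    return clues

print(semantic_clues(source))
```
      </p>
      <p>Here the variable a_value inherits the meaning "Task Start Date" from its format string, which can later be linked to a database column.</p>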
      <p>[Figure 3: Abstract syntax tree for the code fragment. The Program node has children 1 dclns, 2 embSQL, 3 if, 4 print, and 5 print; the embSQL sub-tree contains the SQLselectone node with columnlist (A, C), hostvariablelist (aValue, cValue), and the SQLAssignment B = bValue.]</p>
    </sec>
    <sec id="sec-6">
      <title>Step 2. Dictionary Extraction.</title>
      <p>The goal of step 2 is to obtain the relation and attribute names
from the legacy source. This is done by querying the data
dictionary, which is stored in the underlying database in the form of one
or more system tables. If primary key information
cannot be retrieved directly from the data dictionary, the
algorithm passes the set of candidate keys, along with
predefined “rule-out” patterns, to the code analyzer. The code
analyzer searches for these patterns in the application code and
eliminates from the candidate set those attributes that occur
in a rule-out pattern. The rule-out patterns, which are
expressed as SQL queries, occur in the application code
whenever the programmer expects to select a SET of tuples. If,
after the code analysis, not all primary keys can be identified,
the reduced set of candidate keys is presented to the user for
final primary key selection.</p>
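      <p>As an illustration, here is a minimal sketch of dictionary extraction against SQLite (an assumed stand-in source; in SEEK the queries go through the configurable database interface module, e.g., via JDBC, and the system tables differ per DBMS):</p>
      <p>
```python
import sqlite3

# Toy schema standing in for the legacy source's data dictionary.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MSP_Project (Proj_ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE MSP_Tasks (Proj_ID INTEGER, Task_UID INTEGER,
                        PRIMARY KEY (Proj_ID, Task_UID));
""")

def extract_dictionary(con):
    """Return {relation: {attribute: is_primary_key}} from the data dictionary."""
    schema = {}
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
        cols = con.execute(f"PRAGMA table_info({t})").fetchall()
        schema[t] = {c[1]: c[5] > 0 for c in cols}
    return schema

print(extract_dictionary(con))
```
      </p>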
      <p>Result. In the example DRE application, the following
relations and their attributes were obtained from the
MS Project database:</p>
    </sec>
    <sec id="sec-7">
      <title>Step 3: Code Analysis</title>
      <p>
        The objective of step 3, code analysis, is twofold: (1) augment
entities extracted in step 2 with domain semantics, and (2)
identify business rules and constraints not explicitly stored in
the database, but which may be important to the wrapper
developer or application program accessing the legacy source.
Our approach is based on program slicing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and pattern matching [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The first sub-step is pre-slicing. From the AST of the
application code, the pre-slicer identifies all the nodes
corresponding to input, output, and embedded SQL statements.
It appends the statement node name and identifier list to an
array as the AST is traversed in pre-order. For example, for the
AST in Figure 3, the array contains the information
depicted in Table 1. The identifiers that occur in this data
structure maintained by the pre-slicer form the set of slicing
variables.</p>
      <p>The code slicer and analyzer, which represent steps two
and three respectively, are executed once for each slicing
variable identified by the pre-slicer. In the above example, the
slicing variables that occur in SQL and output statements are
aValue and cValue. The direction of slicing is fixed as
backwards or forwards depending on whether the variable in
question is part of an output (backwards) or input (forwards)
statement. The slicing criterion is the exact statement (SQL or
input or output) node that corresponds to the slicing variable.</p>
      <p>During the code-slicing sub-step, we traverse the AST for the
source code and retain only those nodes that have an
occurrence of the slicing variable in their sub-tree. This results in a
reduced AST, which is shown in Fig. 4.</p>
      <p>[Figure 4: Reduced AST for slicing variable aValue, retaining the dclns, embSQL, if, and print nodes.]</p>
      <p>During the analysis sub-step, our algorithm extracts the
information shown in Table 2 while traversing the reduced
AST in pre-order:
1. If a dcln node is encountered, the data type of the identifier
can be learned.
2. embSQL nodes contain the mapping information from the identifier
name to the corresponding column name and table name in the
database.
3. printf/scanf nodes contain the mapping information from
the text string to the identifier. In other words, we can
extract the ‘meaning’ of the identifier from the text string.</p>
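      <p>The slicing idea can be sketched over the pre-slicer's flattened statement array (an illustrative simplification, with names of our own choosing; the actual slicer operates on the AST itself):</p>
      <p>
```python
# Each entry: (statement node kind, identifiers occurring in its sub-tree),
# mirroring the array built by the pre-slicer for the C fragment.
statements = [
    ("dclns",  {"aValue", "cValue", "flag", "bValue"}),
    ("embSQL", {"aValue", "cValue", "bValue"}),
    ("if",     {"cValue", "aValue", "flag"}),
    ("print",  {"aValue"}),
    ("print",  {"cValue"}),
]

def slice_for(var, stmts):
    """Retain only statements whose sub-tree mentions the slicing variable."""
    return [kind for kind, idents in stmts if var in idents]

print(slice_for("aValue", statements))
```
      </p>
      <p>For the slicing variable aValue, the retained nodes (dclns, embSQL, if, print) correspond to the reduced AST of Fig. 4.</p>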
      <p>The results of the analysis sub-step are appended to a result
report file. After the code slicer and analyzer have been
invoked on every slicing variable identified by the pre-slicer,
the results report file is presented to the user, who can then
decide whether to perform further analysis based on the
information extracted so far. If the user decides not to perform
further analysis, code analysis passes control to the inclusion
dependency detection module.</p>
      <p>It is important to note that we identify enterprise
knowledge by matching templates against code fragments in
the AST. So far, we have developed patterns for discovering
business rules, which are encoded in loop structures and/or
conditional statements, and mathematical formulae, which are
encoded in loop structures and/or assignment statements. Note that
the occurrence of an assignment statement by itself does not
necessarily indicate the presence of a mathematical formula,
but the likelihood increases significantly if the statement
contains one of the “slicing variables.”</p>
      <sec id="sec-7-1">
        <title>Step 4. Discovering Inclusion Dependencies.</title>
        <p>After extraction of the relational schema in step 2, the goal of
step 4 is to identify constraints to help classify the extracted
relations, which represent both the real-world entities and the
relationships among them. This is done using inclusion
dependencies (INDs), which indicate the existence of
interrelational constraints including class/subclass relationships.</p>
        <p>Let A and B be two relations, and X and Y be attributes or
a set of attributes of A and B respectively. An inclusion
dependency A.X &lt;&lt; B.Y denotes that a set of values appearing
in A.X is a subset of B.Y. Inclusion dependencies are
discovered by examining all possible subset relationships
between any two relations A and B in the legacy source.</p>
        <p>Without additional input from the domain expert,
inclusion dependencies can be identified in an exhaustive
manner as follows: for each pair of relations A and B in the
legacy source schema, compare the values for each non-key
attribute combination X in B with the values of each candidate
key attribute combination Y in A (note that X and Y may be
single attributes). An inclusion dependency B.X&lt;&lt;A.Y may be
present if:
1. X and Y have the same number of attributes;
2. X and Y are pairwise domain compatible;
3. B.X ⊆ A.Y.</p>
        <p>In order to check the subset criterion (3), we have designed
the following generalized SQL query templates, which are
instantiated for each pair of relations and attribute
combinations and run against the legacy source:
C1 = SELECT count(*) FROM R1
     WHERE U NOT IN (SELECT V FROM R2);
C2 = SELECT count(*) FROM R2
     WHERE V NOT IN (SELECT U FROM R1);</p>
        <p>If C1 is zero, we can deduce that there may exist an
inclusion dependency R1.U &lt;&lt; R2.V; likewise, if C2 is zero
there may exist an inclusion dependency R2.V &lt;&lt; R1.U. Note
that it is possible for both C1 and C2 to be zero. In that case,
we can conclude that the two sets of attributes U and V are
equal.</p>
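        <p>A minimal sketch of how the two count-query templates can be instantiated and evaluated (here against an assumed SQLite stand-in with toy relations R1 and R2; table and column names are illustrative):</p>
        <p>
```python
import sqlite3

# Toy instance: every R1.U value appears in R2.V, but not vice versa.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE R1 (U INTEGER);
CREATE TABLE R2 (V INTEGER);
INSERT INTO R1 VALUES (1), (2);
INSERT INTO R2 VALUES (1), (2), (3);
""")

c1 = con.execute(
    "SELECT count(*) FROM R1 WHERE U NOT IN (SELECT V FROM R2)").fetchone()[0]
c2 = con.execute(
    "SELECT count(*) FROM R2 WHERE V NOT IN (SELECT U FROM R1)").fetchone()[0]

# c1 == 0 suggests R1.U is contained in R2.V (a candidate inclusion
# dependency); c2 above zero rules out the reverse direction.
print(c1, c2)
```
      </p>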
        <p>The worst-case complexity of this exhaustive search,
given N tables and M attributes per table (NM total attributes),
is O(N<sup>2</sup>M<sup>2</sup>). However, we reduce the search space in those
cases where we can identify equi-join queries in the application
code (during semantic analysis). Each equi-join query allows us
to deduce the existence of one or more inclusion dependencies
in the underlying schema. In addition, using the results of the
corresponding count queries we can also determine the
“direction” of the dependencies. This allows us to limit our
exhaustive searching to only those relations not mentioned in
the extracted queries.</p>
        <p>Result: Inclusion dependencies are as follows:
1 MSP_Assignment[Task_uid,Proj_ID] &lt;&lt; MSP_Tasks [Task_uid,Proj_ID]
2 MSP_Assignment[Res_uid,Proj_ID] &lt;&lt; MSP_Resources[Res_uid,Proj_ID]
3 MSP_Availability [Res_uid,Proj_ID] &lt;&lt; MSP_Resources [Res_uid,Proj_ID]
4 MSP_Resources [Proj_ID] &lt;&lt; MSP_Project [Proj_ID]
5 MSP_Tasks [Proj_ID] &lt;&lt; MSP_Project [Proj_ID]
6 MSP_Assignment [Proj_ID] &lt;&lt; MSP_Project [Proj_ID]
7 MSP_Availability [Proj_ID] &lt;&lt; MSP_Project [Proj_ID]</p>
        <p>The last two inclusion dependencies are removed since
they are implicitly contained in the inclusion dependencies
listed in lines 2, 3 and 4 using the transitivity relationship.</p>
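        <p>This transitivity-based pruning can be sketched as a transitive reduction over relation-level dependencies (our simplification: it tracks only which relation's Proj_ID is contained in which; full IND reasoning must track attribute lists):</p>
        <p>
```python
# Relation-level view of the discovered inclusion dependencies above.
deps = {
    ("MSP_Assignment", "MSP_Tasks"),
    ("MSP_Assignment", "MSP_Resources"),
    ("MSP_Availability", "MSP_Resources"),
    ("MSP_Resources", "MSP_Project"),
    ("MSP_Tasks", "MSP_Project"),
    ("MSP_Assignment", "MSP_Project"),
    ("MSP_Availability", "MSP_Project"),
}

def transitive_reduction(edges):
    """Drop any edge (a, c) implied by some chain a -> b -> c."""
    implied = {(a, c) for a, b1 in edges for b2, c in edges if b1 == b2}
    return edges - implied

pruned = transitive_reduction(deps)
print(sorted(pruned))
```
      </p>
      <p>The two dependencies on MSP_Project from MSP_Assignment and MSP_Availability are dropped, matching the pruning described above.</p>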
      </sec>
      <sec id="sec-7-2">
        <title>Step 5. Classification of the Relations.</title>
        <p>When reverse-engineering a relational schema, it is important
to understand that, due to the limited expressiveness of the
relational model, all real-world entities are represented as
relations irrespective of their types and roles in the model. The
goal of this step is to identify the different “types” of relations,
some of which correspond to actual real-world entities while
others represent relationships among them.</p>
        <p>In this step all the relations in the database are classified
into one of four types: strong, regular, weak, or specific.
Identifying the different relations is done using the primary key
information obtained in step 2 and the inclusion dependencies
from step 4. Intuitively, a strong entity-relation represents a
real-world entity whose members can be identified exclusively
through its own properties. A weak entity-relation represents an
entity that has no properties of its own that can be used to
identify its members. In the relational model, the primary keys of
weak entity-relations usually contain primary key attributes
from other (strong) entity-relations. Both regular and specific
relations are relations that represent relationships between two
entities in the real world (rather than the entities themselves).
However, there are instances when not all of the entities
participating in an (n-ary) relationship are present in the
database schema (e.g., one or more of the relations were
deleted as part of the normal database schema evolution
process). While reverse engineering the database, we identify
such relationships as specific relations.</p>
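        <p>A minimal sketch of this classification (our simplification, driven by the primary keys from step 2 and the key attributes each relation inherits via the inclusion dependencies of step 4; the inherited sets are written out by hand here):</p>
        <p>
```python
# Primary keys obtained in step 2.
pks = {
    "MSP_Project":      {"Proj_ID"},
    "MSP_Tasks":        {"Proj_ID", "Task_UID"},
    "MSP_Resources":    {"Proj_ID", "Res_UID"},
    "MSP_Availability": {"Proj_ID", "Avail_UID"},
    "MSP_Assignment":   {"Proj_ID", "Task_UID", "Res_UID"},
}
# Key attributes each relation inherits from others, per the INDs of step 4.
inherited = {
    "MSP_Project":      set(),
    "MSP_Tasks":        {"Proj_ID"},
    "MSP_Resources":    {"Proj_ID"},
    "MSP_Availability": {"Proj_ID"},
    "MSP_Assignment":   {"Proj_ID", "Task_UID", "Res_UID"},
}

def classify(rel):
    if not inherited[rel]:
        return "strong"          # key is entirely its own
    if inherited[rel] == pks[rel]:
        return "regular"         # key wholly borrowed: a relationship relation
    return "weak"                # key partly borrowed from an owner

print({r: classify(r) for r in pks})
```
      </p>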
      </sec>
      <sec id="sec-7-3">
        <title>Result:</title>
        <p>Strong entities: MSP_Project
Weak entities: MSP_Resources, MSP_Tasks, MSP_Availability
Regular relationship: MSP_Assignment</p>
      </sec>
      <sec id="sec-7-4">
        <title>Step 6. Classification of the Attributes.</title>
        <p>We classify attributes as (a) PK or FK (from DRE-1 or
DRE-2), (b) Dangling or General, or (c) Non-Key (rest).</p>
        <p>Result: Table 3 illustrates the attributes obtained from the example
legacy source.</p>
      </sec>
      <sec id="sec-7-4b">
        <title>Step 7. Entity Identification.</title>
        <p>Result: The following entities were identified:</p>
        <p>Strong entities:</p>
        <p>MSP_Project with Proj_ID as its key.</p>
        <p>Weak entities:
MSP_Tasks with Task_uid as key and MSP_Project as its owner.
MSP_Resources with Res_uid as key and MSP_Project as its owner.
MSP_Availability with Avail_uid as key and MSP_Resources as its owner.</p>
      </sec>
      <sec id="sec-7-5">
        <title>Step 8. Identify Relationship Types.</title>
        <p>The inclusion dependencies discovered in step 4 form the basis
for determining the relationship types among the entities
identified above. This is a two-step process:
1. Identify relationships present as relations in the relational
database. The relation types (regular and specific) obtained
from the classification of relations (step 5) are converted
into relationships. The participating entity types are derived
from the inclusion dependencies. For completeness of the
extracted schema, we may decide to create a new entity
when conceptualizing a specific relation. The cardinality
between the participating entities is M:N.
2. Identify relationships among the entity types (strong and
weak) that were not present as relations in the relational
database, via the following classification.
• IS-A relationships can be identified using the PKAs of
strong entity-relations and the inclusion dependencies
among PKAs. The cardinality of the IS-A relationship
between the corresponding strong entities is 1:1.
• Dependent relationships: For each weak entity type, the
owner is determined by examining the inclusion
dependencies involving the corresponding weak
entity-relation. The cardinality of the dependent relationship
between the owner and the weak entity is 1:N.
• Aggregate relationships: If the foreign key in any of the
regular and specific relations refers to the PKA of one
of the strong entity-relations, an aggregate relationship
is identified. The cardinality is either 1:1 or 1:N.
• Other binary relationships: Other binary relationships
are identified from the FKAs not used in identifying the
above relationships. If the foreign key contains unique
values, the cardinality is 1:1; otherwise it is 1:N.</p>
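        <p>The two-step process above can be sketched as follows (a simplification of our own that derives relationship types and cardinalities from the step 5 classification and the pruned inclusion dependencies of step 4):</p>
        <p>
```python
# Classification from step 5.
classification = {
    "MSP_Project": "strong", "MSP_Tasks": "weak",
    "MSP_Resources": "weak", "MSP_Availability": "weak",
    "MSP_Assignment": "regular",
}
# Pruned inclusion dependencies from step 4: (dependent, referenced).
inds = [
    ("MSP_Assignment", "MSP_Tasks"),
    ("MSP_Assignment", "MSP_Resources"),
    ("MSP_Availability", "MSP_Resources"),
    ("MSP_Resources", "MSP_Project"),
    ("MSP_Tasks", "MSP_Project"),
]

relationships = []
for dep, ref in inds:
    if classification[dep] == "regular":
        # A relationship relation links its referenced entities M:N.
        relationships.append((dep, ref, "M:N"))
    elif classification[dep] == "weak":
        # A weak entity hangs off its owner with cardinality 1:N.
        relationships.append((ref, dep, "1:N"))

print(relationships)
```
      </p>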
      </sec>
      <sec id="sec-7-6">
        <title>Result:</title>
        <p>We discovered 1:N binary relationships between the following
entity types:
Between MSP_Project and MSP_Tasks
Between MSP_Project and MSP_Resources
Between MSP_Resources and MSP_Availability</p>
        <p>Since two inclusion dependencies involving
MSP_Assignment exist (i.e., between Task and
Assignment and between Resource and Assignment),
there is no need to define a new entity. Thus,
MSP_Assignment becomes an M:N relationship between
MSP_Tasks and MSP_Resources.</p>
        <p>At the end of Step 8, DRE has extracted the following
schema information from the legacy database:
• Names and classification of all entities and attributes.
• Primary and foreign keys.
• Data types.
• Simple constraints (e.g., unique) and explicit assertions.
• Relationships and their cardinalities.
• Business rules.</p>
        <p>A conceptual overview of the extracted schema is
represented by the entity-relationship diagram shown in Figure
5 (business rules not shown), which is an accurate
representation of the information encoded in the original MS
Project schema.</p>
        <p>[Figure 5: Entity-relationship diagram of the extracted schema. MSP_PROJECTS (Proj_ID) Has MSP_TASKS (Task_UID) with cardinality 1:N; MSP_PROJECTS Have MSP_RESOURCES (Res_UID) with cardinality 1:N; MSP_TASKS and MSP_RESOURCES are related M:N via MSP_ASSIGN (Use); MSP_RESOURCES owns MSP_AVAILABILITY (Avail_UID) with cardinality 1:N.]</p>
        <p>We have manually tested our approach for a number of
scenarios and domains (including construction, manufacturing
and health care) to validate our knowledge extraction algorithm
and to estimate how much user input is required. In addition,
we have also conducted experiments using nine different
database applications that were created by students during
course projects. The experimental results so far are
encouraging: the DRE algorithm was able to reverse engineer
all of the sample legacy sources encountered so far. When
coupled with semantic analysis, human input is reduced
compared to existing methods. Instead, the user is presented
with clues and guidelines that lead to the augmentation of the
schema with additional semantic knowledge.</p>
        <p>The SEEK prototype is being extended using sample data
from a large building construction project on the University of
Florida campus in cooperation with the manager, Centex
Rooney Inc., and several subcontractors or suppliers. This data
testbed will support much more rigorous testing of the SEEK
toolkit. Other plans for the SEEK toolkit are:
• Develop a formal representation for the extracted
knowledge.
• Develop a matching tool capable of producing mappings
between two semantically related yet structurally different
schemas. Currently, schema matching is performed
manually, which is a tedious, error-prone, and expensive
process.
• Integrate SEEK with a wrapper development toolkit to
determine if the extracted knowledge is sufficiently rich
semantically to support compilation of legacy source
wrappers for our construction testbed.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This material is based upon work supported by the National
Science Foundation under grant numbers CMS-0075407 and
CMS-0122193. The authors also thank Dr. Raymond Issa for
his valuable comments and feedback on a draft of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Chiang</surname>
          </string-name>
          , “
          <article-title>A knowledge-based system for performing reverse engineering of relational database</article-title>
          ,”
          <source>Decision Support Systems</source>
          ,
          <volume>13</volume>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>312</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. H. L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Barron</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. C.</given-names>
            <surname>Storey</surname>
          </string-name>
          , “
          <article-title>Reverse engineering of relational databases: Extraction of an EER model from a relational database</article-title>
          ,”
          <source>Data and Knowledge Engineering</source>
          ,
          <volume>12</volume>
          :1, pp.
          <fpage>107</fpage>
          -
          <lpage>142</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmalz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. O</given-names>
            <surname>'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shekar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Haldavnekar</surname>
          </string-name>
          , “
          <article-title>Knowledge Extraction in the SEEK Project</article-title>
          ,” Technical Report TR-0214, University of Florida, Gainesville, FL 32611-6120,
          <year>June 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Horwitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Reps</surname>
          </string-name>
          , “
          <article-title>The use of program dependence graphs in software engineering</article-title>
          ,”
          <source>in Proceedings of the Fourteenth International Conference on Software Engineering</source>
          , Melbourne, Australia,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          Microsoft Corp.,
          <source>“Microsoft Project 2000 Database Design Diagram”</source>
          , http://www.microsoft.com/office/project/prk/2 000/Download/VisioHTM/P9_dbd_frame.htm.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. O</given-names>
            <surname>'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Issa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Schmalz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geunes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. X.</given-names>
            <surname>Bai</surname>
          </string-name>
          , “SEEK: Accomplishing Enterprise Information Integration Across Heterogeneous Sources,”
          <source>ITCON - Journal of Information Technology in Construction</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Prakash</surname>
          </string-name>
          , “
          <article-title>A Framework for Source Code Search Using Program Patterns</article-title>
          ,”
          <source>Software Engineering</source>
          ,
          <volume>20</volume>
          :6, pp.
          <fpage>463</fpage>
          -
          <lpage>475</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Petit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toumani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Boulicaut</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kouloumdjian</surname>
          </string-name>
          , “Towards the Reverse Engineering of Denormalized Relational Databases,”
          <source>in Proceedings of the Twelfth International Conference on Data Engineering (ICDE)</source>
          , New Orleans, LA, pp.
          <fpage>218</fpage>
          -
          <lpage>227</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>