<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <fpage>78</fpage>
      <lpage>87</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>a class room experiment. Section 4 shows how the structural query approach
ponents of the framework and their application in the domain of ER models
models are abstract enough to deliver information about the stored data
presents a realization of this approach based on a framework for general model
trol of retrieval behavior is explained in section 5. Before we summarize and
nal ER models of Deep Web information systems and provides a front end
generation of search engines. A keyword description of script pages (the
enconclude this paper, section 6 gives an overview of other approaches to search
the information provided by keywords relies on the relations between them
try points to the Deep Web) can only describe very vague the Deep Web
an information of the Deep Web and therefore not available for the index
It is interesting to see that the structural information (lost when a user’s
question is transformed to a set of keywords) is actually used to manage the
information of the Deep Web: Most of the sites oering dynamic W eb pages
in the next section. Section 3 evaluates the structural retrieval approach in
models (ER models [4]) accurately describe the content and relations. ER
without using the data itself. Imagine a search engine that indexes the
internot deliver a link to a travel agency, because "Frankfurt" and "Hawaii" is
and are not expressed. A search engine for Deep Web information should use
to formulate queries in a structural manner like ER modelling. This paper
retrieval [12]. The paper is structured as follows: We briey describe the
comcan be combined with the necessary keyword-based retrieval. Dynamic
conkeywords like "igh t", "from", "Frankfurt", "to", "Hawaii" would probably
this structural information, too.
information and the real content of a Web site. Additionally the main part of
for Deep Web information.
is used to transmit user input to the Web site. Search engines feeded with
store their data within relational databases for which Entity-Relationship</p>
      <p>Transaction</p>
      <p>(1,1)
performs</p>
      <p>(0,N)</p>
      <p>Account (1,1) has (0,N) Customer
of a one-to-one mapping between ER models and SSMs following the rules
As we can assume w.l.o.g. that there are no isolated nodes in a SSM, this is
in a relational database and provides algorithms to specify the similarity
bethe mappings we implemented graphical user interfaces used to state queries
Figure 1 shows a simple ER model and its corresponding SSM. The mapping
tween SSMs. This similarity exploits the adjacency structure of SSM graphs.
nodes of the same type. The number q expresses the number of edges in G
and to visualize the stored SSMs as natural ER models.</p>
      <p>Mappings are always one to one. The matching quality is a rational number
( (v); (w)) 2 The matching quality realized by the mapping is dened E0.
entities and dependencies between entities and relationships become nodes
rules enforce that SSMs always decompose into two bipartite graphs. Apart of
0
by the maximum over all matching qualities of mappings from G onto G .</p>
      <p>Given two graphs G = (V; E) and = (V we look at partial mappings G0 0; E0),
between 0 and 1 with a value of 1 indicating that there exists a mapping
beon the second layer with types derived from the cardinalities of the relations.
the case if and only if both graphs are isomorphic.
that are implicitly mapped onto edges in i.e. edges (v; w) 2 E such that G0,
layer. Relationships become nodes of the third layer and the relations between
of [12]: Dieren t kind of entities become dieren t kind of nodes of the rst
tween the library graph and the query graph that matches exactly all edges.
of the nodes from G onto the nodes of which only map nodes onto G0
We extended the framework to the eld of ER models by the denition
The server part of Firestorm remains merely unchanged. It stores SSMs
as d( ) = 2q=(jEj + The similarity between two graphs is then dened jE0j).</p>
      <p>Relationship Relationship
performs has
polynomial for a xed n umber of nodes on layer 2, and a reasonable algorithm
the exact algorithm has to be applied. Further details of the algorithms are
solves for each of them the matching problems on layers 1 and 3 to optimality
is N P -hard. But the three-layer structure of SSMs supports the design of
heuristic and exact retrieval algorithms as well as a fast lter. The exact
alThe authors of [11] showed that nding a mapping of optimal quality
by using algorithms for weighted bipartite matching. The exact algorithm is
up the retrieval time tremendously by reducing the set of graphs to which
for small numbers of nodes on layer 2. Filter and heuristic algorithms speed
explained in [11].
gorithm enumerates all feasible mappings of nodes from the second layer and
proof is out of sight.
of corresponding ER. Modelling is an art and one may argue that e.g. the
evaluated the solutions of the students and assigned credit points according
models. The restrictiveness in structure and available means to create ER
made it possible to evaluate whether high similarity on the SSMs (respectively
positive answer would promise to apply the structural retrieval approach for
models somehow limits the modelling freedom but a mathematical quality
ilarities with maximal credit points.
similarity between dieren t ER models to the graph similarity between their
ships so that we can concentrate only on the structure. Some sta mem bers
main of ER models. This means that the system can report the similarity
models for three dieren t real world cases. These cases are formulated as
natural language text. The text implies the naming of entities and
relationFirestorm is used by students to create and manage their ER models.</p>
      <p>University of Tubingen (Germany) is on database and information systems.
same real world situation can be modelled in very dieren t ways leading to
These master models were used as queries to the system with the
corbetween two dieren t ER models. The focus of our working group at the
responding student models building the library. The system computed the
similarities were converted into credit points by simply multiplying the
simto how well the ER models represent the real world cases. For each exercise a
dieren t SSMs. The structural retrieval approach reduces the detection of
This section will show how well the structures of SSMs capture the semantics
sta mem ber created a master model representing the expected solution. This
Part of a database course is to learn how to create ER models that describe
SSMs but the retrieval quality depends on semantic similarity between ER
semantics of the modelled real world cases, expressed by a high grade. A
the domain of ER models.</p>
      <p>As explained in the previous section Firestorm was extended to the
do(graph) similarity between each student model and the master model. The
The course contains three exercises that require students to design ER
an aspect of the real world to be managed with a database. In this course
the corresponding ER models) reects a certain similarity according to the
models (as long as the context is clear).
the sta mem bers to the credit points assigned by Firestorm. The dierences
between SSMs can be used to determine semantic similarity between ER
in credit point assignment are marginal so it seems that graph similarity
The results are encouraging. Figure 3 relates the credit points assigned by
a users chooses the third tree level for the second layer the algorithms can
and specializes the types along the tree paths. Figure 4 shows the subtree of
cardinality distinction ((0,1),(1,1),(0,N),(1,N) and (N,M)).
connections that represent two dieren t kinds of connection cardinalities in
root type the second level denes the t ypes of all nodes on the second layer.
words which subtypes can be matched to each other. This means e.g. that if
should be used by the algorithms for the similarity computation, in other
Assume that the SSM of the query model and the SSMs of all library
dieren t types. We dene a t ype tree that starts with a generic root type and
the second layer SSM node types for the domain of ER models. Following the
The control functionality relies on a type tree of SSM nodes. As we know
(min; max) notation. The fourth level of the tree captures the most specic
models only contain nodes of the most specic t ype tree levels. Before a user
triggers a retrieval action the user species for each layer which tree level
from section 2 the elements of ER models are mapped onto SSM nodes of
third level of the tree distinguishes between (*,N) connections and (*,1)
These nodes represent connections between entities and relationships. The
in databases exist nearly as long as the Web exists. There are a number of
interface for simple text or boolean queries. The users can adjust the search
contains the addresses of about 103000 databases organized in a directory
structural dependencies between dieren t information units captures much
The LII [9] is a free search engine with an annotated, searchable subject
Also from BrightPlanet is the free search engine CompletePlanet [2]. It
and boolean operators are the means to state queries.</p>
      <p>Searching for information that is hidden in the Deep Web is not a new
search engines that index those sites and query them according to some user
Neither of the presented search engines use the database structure of
retrieval approach.
ple text input.
resemble much the other mentioned directory-based search engines.
strategies and result presentation to its own preferences.</p>
      <p>Query interfaces require simple text and boolean queries. They are easier to
structure with categories and subcategories. The search interface allows
sim[3] from BrightPlanet. It searches among 2200 databases and provides a query
input. We look at some of those engines and relate them to the structural
directory of about 9000 Web resources. The resources are selected and
evalumore semantics and therefore allows a more precise search.
task. Sites providing information generated by scripts out of data stored
ated by librarians for their usefulness to users of public libraries. Simple text
One of the most popular commercial Deep Web search engines is LexiBot
Deep Web information systems. They describe the content of those systems
use than our query interface that is based on ER models. But especially the
The search engines of IntelliSeek (ProFusion [8] and InvisibleWeb [7])
with keywords and organize them (by hand or automated) within a directory.
6. A. M. Georion. An in troduction to structured modeling. Management Science,
33(5):547{588, 1987.
5. S. Decker. The Semantic Web Community Portal, 2001.
complex models. SIGMOD Record, 29(4):55{63, 2000.</p>
      <p>Proceedings of the 1th Conference on Very Large Databases, Morgan Kaufman
3. BrightPlanet. LexiBot, 2001. http://www.lexibot.com/.
2. BrightPlanet. CompletePlanet, 2000. http://www.completeplanet.com/.
www.semanticweb.org/.
1. P. A. Bernstein, A. Y. Halevy, and R. Pottinger. A vision of management of
4. P. P. S. Chen. The entity-relationship model | toward a unied view of data.
pubs. (Los Altos CA), Kerr (ed), pp.173, 1975.
unfamiliar with ER modelling but it is a standardized and powerful way
among the individual elements. The management of RDF denitions can
according to recall and precision.
dynamic control of retrieval behavior based on type trees provides the users
Inuenced by the special requirements of the Web we describe a technique
to combine structural and keyword-based retrieval. A functionality for the
with means to adjust the system with respect to the individual preferences
our system to provide Web users a promising way to search for information
benet a lot b y an RDF extension of Firestorm.
to express dependencies between information objects. The simple graphical
retrieval approach has yet to prove its applicability under the real world
information. The semantic description is mostly done following the principles
user interface should allow users with basic knowledge to specify their own
to describe the desired information in structural correlation. Many users are
up many applications that require tools to handle semantic descriptions of
the general model retrieval framework Firestorm to the domain of ER models.
sites need to register with the system. Most of the target sites store their data
Users have to state their queries to the system by specifying ER models
that provide information contained in the Deep Web. Therefore we extend
ER models.
conditions of the Web. Therefore we hope that many Web sites register to
within relational databases. If they do not have ER models that describe their
of RDF [15]. RDF denitions resemble graphs that show the dependencies
Another drawback of the approach is the absence of ER models that Web
Apart of the sandbox evaluation presented in section 3 the structural
of the Deep Web.
databases a tool could probably create a model out of the database schema.</p>
      <p>Looking into the future the ongoing hype of the Semantic Web [5] brings
This paper describes a structural retrieval approach to search for Web sites</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>