=Paper=
{{Paper
|id=Vol-55/paper-3
|storemode=property
|title=WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment
|pdfUrl=https://ceur-ws.org/Vol-55/frank.pdf
|volume=Vol-55
|dblpUrl=https://dblp.org/rec/conf/sww/FrankSNYL02
}}
==WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment==
WebScripter: World-Wide Grass-roots Ontology
Translation via Implicit End-User Alignment
Research Paper Category
Martin Frank, Pedro Szekely, Robert Neches, Baoshi Yan, Juan Lopez
Distributed Scalable Systems Division
Information Sciences Institute
University of Southern California
{frank,szekely,rneches,baoshi,juan}@isi.edu
ABSTRACT

Ontologies define hierarchies of classes and attributes; they are meta-data: data about data. XML Schema and RDF Schema are both (lightweight) ontology definition languages in that sense. In the “traditional” approach to ontology engineering, experts add new data by carefully analyzing others’ ontologies and fitting their new concepts into the existing hierarchy. In the emerging “Semantic Web” approach to ontology engineering, ordinary users may not look at anyone’s ontology before creating theirs – instead, they may simply define a new local schema from scratch that addresses their immediate needs, without worrying how their data may some day integrate with others’.

This paper describes an approach and implemented system for translating between the countless mini-ontologies that the Semantic Web approach yields. In this approach, ordinary users graphically align data from multiple sources in a simple spreadsheet-like view without having to know anything about ontologies or even taxonomies. The resulting web of equivalency statements can then be mined to help other users find related ontologies and data, and to automatically align the data with theirs.

Categories and Subject Descriptors
H.1.2 [Information Systems]: User/Machine Systems – Human information processing; H.3.3 [Information Systems]: Information Search and Retrieval – Information filtering, Relevance feedback; H.3.5 [Information Systems]: Online Information Services – Data sharing, Web-based services

General Terms
Collaborative filtering, recommender systems, social information filtering, ontology alignment, ontology translation

Keywords
Meta-data, DAML, RDF Schema, RDF, XML Schema

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission by the authors.
Semantic Web Workshop 2002 Hawaii, USA
Copyright by the authors.

1. INTRODUCTION

Imagine that you work for an emergency preparedness agency and that you were just handed the job of constructing and maintaining a list of public health experts employed by U.S. universities.

Doing this manually on the (non-semantic) Web would be a monumental effort, both in terms of the initial effort and in the continuous effort to keep the list up to date. The only options are to either do the job completely manually in a text file or spreadsheet (quickly outdated), or to write wrapper software specific to each university’s Web pages that extracts the experts (the wrappers constantly break as universities change their Web page designs).

Now let us presume that all universities list their personnel in a Semantic Web [10] format, such as RDF Schema [1]. This improves on the current situation (because you don’t have to work instance by instance but rather concept by concept) but your job is still rather monumental because the sources will likely use a myriad of different ontologies.

We have a vision and partial implementation addressing this problem by (a) making it easy for individual users to graphically align the attributes of two separate externally-defined concepts, and (b) making it easy to re-use the alignment work of others.

2. OVERVIEW

Figure 1 depicts a number of home pages, marked up with DAML information about the authors, located somewhere in the world. The DAML instances in these Web pages are organized according to one or more ontologies, such as an ISI ontology of people, a Stanford ontology of people, and a Karlsruhe ontology of people. The challenge is to produce a report incorporating all of that information with minimal effort.

At a high level, the WebScripter concept is that users extract content from apparently ordinary Web pages and paste that content into what looks like an ordinary spreadsheet (lower left corner of Figure 1).

What users implicitly do in WebScripter - without expending extra effort - is to build up an articulation ontology containing equivalency statements. For example, this articulation ontology expresses that the attribute that ISI calls “fullname” is the same as the one Stanford calls “has-name”; and that the object Karlsruhe calls “Mitarbeiter”, Stanford calls “Person”, and ISI calls “Member” are the same for the purposes of this report (lower right corner of Figure 1).

Figure 1: Working With Multiple Data Sources In Multiple Ontologies. (Labels in the figure: Stanford, ISI, Karlsruhe; WebScripter; fullname = has-name; Mitarbeiter = Person = Member.)

We believe that in the long run, this articulation ontology will be more valuable than the data the users happened to obtain when they constructed the original report. Its equivalency information reduces the amount of work future WebScripter users have to perform when working in the same domain (footnote 1). Thus, in some sense, you don’t just use the Semantic Web when you use WebScripter, you help build it as you go along.

Footnote 1: Who benefits depends on your willingness to share that information, of course - it could be the world, your organization, your workgroup, or just yourself.

3. VISIONARY EXAMPLE

This section presents a detailed step-by-step vision of what WebScripter (and a future Semantic Web) could be; we will later present a step-by-step example of what our current implementation can already do (with existing RDF(S) data on the Web produced by others). In this example, the application is to quickly produce a self-updating list of faculty at U.S. universities that are public health experts, listing their specialization. The user starts WebScripter and types the names of several universities into the first column. At the point shown in Figure 2, truly nothing is known about these hand-typed values.

Figure 2: Visionary Example: The user types as-yet-unrecognized example values.

After the user selects “classify” from a menu, WebScripter uses a list of well-known indices to find an existing taxonomy that matches all of the typed phrases (note that commercial search engines do not have to be DAMLized to be useful to our reasoning here). Yahoo:UniversitiesAndColleges and Lycos:Universities both apply. The universities now appear underlined because they are recognized by the system - double-clicking on them brings up their web pages. The system then fetches their DAML-enabled Web pages in the background, and computes a minimal covering set of declared DAML IS-A types that cover all the universities. In our example, all of the current universities declare themselves to be instances of the World-Wide Web Consortium’s W3C:University concept (Figure 3).

Figure 3: Visionary Example: The system determines data sources and a classification.

The user now selects “find more” from the menu bar. The system will fetch every entity that the two known indices point to (several hundred). It simultaneously performs a different type of analysis: Which are the RDF(S) subclass-of types that are declared by more than 10% of the entities (result: U.N.:University, UsPostalService:Recipient, W3C:University, and IRS:NonProfitInstitution) [this is a recall test]? Of these, which apply to less than 1% of nearby categories of the same index (remaining result: W3C:University and U.N.:University) [this is a precision test]? The latter one is now automatically treated as an alternatively valid type, and WebScripter will include every entity declaring itself to be one of these types, thereby finding institutions not yet listed by the well-known indices. Note that there are no duplicate universities in this column (such as “UCLA” and “UC Los Angeles”). The challenge, of course, is to be able to determine that they are “different”, as they subscribe to different DAML ontologies. One possibility is that any Semantic Web description of an entity existing on the Web contains its normalized HTTP URL in a standardized attribute, which can serve as a simple unique id for comparisons across ontologies (first choice for disambiguation in our example). Another possibility is that they contain
(possibly composite) keys that point into popular external ontologies, for the same reason (“companies in this ontology are uniquely identified by their UsTreas:IRS:TaxPayerId”), (“universities are identical if they point to the same UsPostalService:UsStreetAddress”). This gets the user to the state of Figure 4.

Figure 4: Visionary Example: The system auto-completes the user-provided values.

In this vision of a future Semantic Web, the user has to know little to get a lot of leverage out of the existing semantic information: (1) The user did not have to do anything but type out some university names that came to mind – he or she didn’t have to understand an ontological query language or the notion of an ontology or even a taxonomy for that matter - yet the result is perfectly ontologically typed. (2) Very little DAML has to be in place for this to work: for this particular example, two external DAML ontologies of existing non-DAML university web sites should be sufficient for the inferencing of this example. (3) Data from two different ontologies can be seamlessly integrated without the need for pre-merging/translation between the ontologies.

In this example, the user now demonstrates that she wants to extract the nationality of the universities, in the following manner (Figure 5). She double-clicks on USC, which brings up a Web browser to the (hypothetically) DAML-enabled USC home page. The user then clicks on “Maps & Directions”, and copies and pastes “United States” from that page, which is not just plain text but carries its embedded DAML type (footnote 2).

Footnote 2: Note that one would not have to internally instrument a Web browser to achieve this level of integration – one could know which page the user is looking at through a proxy Web server and receive the copied HTML+DAML out of the window system’s paste buffer.

Figure 5: Visionary Example: Combining HTML navigation with embedded DAML semantics.

In response, the system now fills in all those cells that use the same underlying W3C university definition, by inferring the ontological path from university to country and applying it to all other instances of this ontology. In our particular case, the user is best served by now doing the same for the UN-based university entry “Stanford” (not shown) because there are only two ontologies involved (footnote 3). As a result, all missing countries in the second column are now filled in. The user selects a United States cell in the second column and invokes “filter by” from the right-click menu, checks “United States”, and clicks OK, which removes Oxford and all other foreign-university rows. Performing a number of substantially similar steps, the user can navigate to the universities’ chemistry, biology, and medical departments, from the department to the faculty, from the faculty to their research interests, and filter by a particular research area, resulting in the table shown in Figure 6. (As before, bold entries were provided or demonstrated by the user; but we are no longer underlining recognized cells below for readability.)

Footnote 3: If there are more than two, WebScripter could attempt to produce a generalized “fuzzy” script that will work for all remaining university ontologies given two (or more) examples (“extract the attribute whose name contains Country or Nation in the top-level concept or in a sub-concept called Location or Address”).

Figure 6: Visionary Example: The end result in this fictitious example.

In the end, what users want is a report containing the information that serves their immediate needs. In our approach, users build a report in steps, by manipulating the data it contains so far to refine it and to add more. This is a qualitatively easier task than working with a query, which is an inherently more abstract specification. In our approach, a final report may contain dozens or hundreds of single-step scripts that operate on DAML markup. The equivalent query could be enormously complicated (perhaps several pages long), but users never have to see it with this approach.

Now that this hypothetical WebScripter report is defined, its data can be refreshed at any time, and it itself can become the source for further Web scripting as it carries all its DAML within the generated HTML report.

4. IMPLEMENTED EXAMPLE

In our initial implementation we have focused on making it easy for ordinary (non-programming, never heard of ontologies) users to construct reports from multi-ontology DAML data. This section first describes a step-by-step walkthrough of using WebScripter as implemented to combine DAML personnel data from different organizations on the Web. It then describes how the resulting implicit ontology alignment data benefits other users in constructing similar reports.

4.1 Constructing a first report from scratch

Imagine that you work for the government DAML program office, and that your job is to maintain a list of personnel funded by that program, and let’s assume that all of the contractors provide their personnel data in some DAML format. The first task is to find the URLs where the various DAML resides. BBN’s crawled ontology library comes closest to a Yahoo-style portal for DAML content [2]. This site contains a registry for DAML content root files, which a crawler uses as starting points to find more DAML files. Teknowledge built a DAML search engine for that ontology library which is a good starting place for finding DAML content [3].

In this example, the DAML sources can be found by querying the Teknowledge search engine for the terms “Person”, “Employee”, and “Staff” (which will return a large number of hits of non-DAML contractors), or alternatively they can be found by collecting the regular project Web pages and personal home pages of the DAML contractors (because they embed DAML content inside the HTML pages).

For the sake of this example, we started WebScripter and loaded DAML from just the Stanford Database and Knowledge Systems groups by copying and pasting the URLs of their DAML pages into WebScripter’s “Add DAML” dialog box. WebScripter then displays the class hierarchy of that DAML, intermixing the concepts from the two separate ontologies. The user can browse the content by selecting classes, which displays all of their (local and inherited) attributes as columns, and their data instances as rows.

In this example, we started a new report by (1) choosing “New Report” from a menu, (2) selecting Person in the class hierarchy, and (3) selecting three columns of Person to include in the report. The latter is done by selecting a cell in the data display for Person and choosing “Add as new column” from the right-click menu, once each for the Has-Full-Name, Has-Phone-Number, and Has-Email-Address columns. The resulting WebScripter display is shown in Figure 7. (Note that the first of the four columns, the DAML instance identifier column, was automatically inserted when the first column was added to the report. The column is hidden from the generated report Web page by default.)

Figure 7: Implemented Example: Initial report of Stanford KSL personnel.

In this example, we will now add and align data from a different research group using a different ontology. This is done by (1) selecting PhDStudent in the class hierarchy to display its instance data, (2) selecting a cell in the “name” column of that instance data and choosing “Add to column 1” from the right-click menu, and (3) repeating the second step for the “phone” and “email” columns. Figure 8 shows the combined data from the two groups.

Figure 8: Implemented Example: Adding and aligning Stanford Database personnel.

This in a nutshell is how WebScripter looks to the users. This report can then be published in various formats, including as a plain Web page that color-codes its content based on where it came from; Figure 9 shows a snapshot of a large DAML personnel report that loads data from more than 30 different sources [4]. The largest such report we have generated is 3.3MB, taking 8.7MB of DAML input, and running for about 45 seconds.

The Web page embeds the WebScripter report definition, thus it can be re-run at any time and will then possibly show more people (presuming their DAMLized Web pages can be found by following just one link from the two group Web pages, and presuming their DAML instance data follows one of the two ontologies).

There are a large number of WebScripter features that we will not discuss here – such as un-loading DAML sources, deleting columns, re-arranging columns, filtering rows, and sorting by multiple criteria, and so on – because they are what you would expect from any DAML report generator. Instead, we’ll focus on the generated DAML equivalency statements shown in Table 1.

These statements can be automatically published on a Web site and registered as a new DAML content root in BBN’s DAML content library. Consequently, you can then
make use of the equivalency statements by selecting the “Extended with Equivalence” option in Teknowledge’s DAML search engine (note that it can take up to 24 hours for the statements to make it into BBN’s cache and then up to another week from there into Teknowledge’s search engine cache). Concretely, if you for example now query for all instances of person (“?x type Person”) in the first ontology in that fashion you will now also retrieve PhDStudent instances from the second ontology.

Figure 9: Snapshot of (a fragment of) a large-size WebScripter DAML people report.

Table 1: Implemented Example: Resulting DAML equivalency statements.

4.2 Constructing a second report using the alignment data

We have also implemented an initial use of the WebScripter-generated equivalency statements in WebScripter itself: if you start it with the insert-equivalents flag it will automatically add and align any classes that it has sameClassAs and sameInstanceAs data for. It reads these equivalency statements from a fixed location on our Web site to which you can contribute more via the “Easy Publish” menu in WebScripter.

Let’s assume that a second user comes along later whose job it is to maintain a list of researchers with Semantic Web expertise, plus their email addresses and home pages. She starts WebScripter in the same way as above, selects for example PhDStudent and adds “name” as the first column in her report. At that point, WebScripter will not only add all instances of Person, but also automatically align their names into the column. Similarly, when then selecting the email address for either Person or PhDStudent and saying “Add as new column”, WebScripter will fill in the email addresses for the other ontology as well. This will not happen after she adds Has-Home-Page as a new column (as there is no existing equivalency data) so that she has to manually select homepage and say “Add to column”. (However, if she is willing to share her alignment data via the “Easy Publish” option, future users do not have to align this column by hand either.)

5. THOUGHTS ON INCENTIVIZING PRODUCERS

As of the time of writing, one issue we encountered is that there is not that much interesting, continuously updated RDF(S), much less DAML, available on the Web today (footnote 4). What made the original Web take off was that there was an immediate incentive for producers to use the technology because it was an easy way to publish information. We currently see no strong motivation for producers to put work into putting out RDF(S) in addition to their regular HTML pages, but there is at least a compelling intra-organizational benefit in using RDF(S) and WebScripter to generate regular HTML pages by pulling RDF from various pages within the organization.

Footnote 4: The notable exceptions are headline exchange files such as slashdot.org/slashdot.rdf.

To be more concrete, once a DAML-enabled document is published on the Web, WebScripter makes it easy to access and republish portions of it as part of a larger report – an effort savings for federated information providers who currently need to maintain the same information in multiple places. For example, professors routinely publish a list of their publications on their home page. Departments publish a list of all publications, and project pages publish a list of project-related publications from the project members. Today, someone has to manually construct these pages (presuming these federated organizations are not so tightly integrated that they maintain a shared database or other common structured information source, of course). When an author publishes a new paper or makes a correction on an existing one, he or she has to either manually update the other pages, or coordinate with the appropriate people to have all the other lists updated. WebScripter can eliminate the additional work: authors only need to mark up their personal paper publication with DAML, and the reports for the department and project-specific pages will automatically pick up the new publication (e.g. every night). WebScripter eliminates overhead not only for the organization, but also for the individual producing the information, who no longer needs to coordinate the redistribution effort. WebScripter can also enhance the flexibility and value of Web sites with large amounts of information by publishing skeleton WebScripter reports that visitors can refine to obtain customized reports. Thus, we are cautiously optimistic that WebScripter may help with the adoption of RDF(S)/DAML on the producer side as well.
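
As a rough illustration of the equivalence-extended query from Section 4.1 (asking for “?x type Person” and also receiving PhDStudent instances), the following minimal Python sketch treats sameClassAs statements as undirected edges and expands a class query over everything reachable from the queried class. It is not WebScripter’s or Teknowledge’s implementation, and it does not use DAML syntax; the prefixes `ksl:` and `db:`, the instance names, and both function names are hypothetical and chosen only for this example.

```python
from collections import deque

# Hypothetical equivalency statements, one (class, class) pair per
# sameClassAs statement; the "ksl:" and "db:" prefixes are made up.
SAME_CLASS_AS = [("ksl:Person", "db:PhDStudent")]

# Hypothetical instance data: instance identifier -> declared class.
INSTANCE_TYPES = {
    "ksl:alice": "ksl:Person",
    "db:bob": "db:PhDStudent",
    "db:printer7": "db:Machine",
}


def equivalent_classes(start, same_class_as):
    """Map each class equivalent to `start` to its distance in statements.

    Equivalence is treated as symmetric and transitive, so this is a
    breadth-first search over an undirected graph of sameClassAs pairs.
    """
    neighbors = {}
    for left, right in same_class_as:
        neighbors.setdefault(left, set()).add(right)
        neighbors.setdefault(right, set()).add(left)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbors.get(current, ()):
            if other not in hops:
                hops[other] = hops[current] + 1
                queue.append(other)
    return hops


def instances_of(cls, instance_types, same_class_as):
    """Answer '?x type cls', extended with all known class equivalences."""
    classes = equivalent_classes(cls, same_class_as)
    return sorted(
        instance
        for instance, declared in instance_types.items()
        if declared in classes
    )


if __name__ == "__main__":
    print(instances_of("ksl:Person", INSTANCE_TYPES, SAME_CLASS_AS))
    print(instances_of("ksl:Person", INSTANCE_TYPES, []))
```

Passing an empty statement list corresponds to querying without the equivalence extension: only instances declared directly in the queried class are returned.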
6. THOUGHTS ON END-USER CONTROL OVER AUTO-ALIGNMENT

You can currently run WebScripter either in an “ignore all equivalencies” mode or in an “auto-insert all known equivalencies” mode, neither of which is ideal of course. In particular, the latter may quickly become impractical if a large number of people share alignment data, even if they are not ill-intentioned. This is either because they made an honest mistake (they aligned homepages from one ontology with email addresses from another and did not notice) or because they had a different type of equivalency in mind when they authored their report (graduate research assistants are the same as machines in the sense that they cost the project money to support, but that may then cause machines to auto-appear in a report of someone else trying to author a personnel list). We see the following potential solutions (which are not mutually exclusive).

• Centralized Human Editors. One possibility is for an organization to appoint an “alignment czar”. The job of such a czar would be to periodically validate the equivalency data contributed by organization members into a staging area. If approved, equivalency files are then moved to that organization’s official equivalency data area. Cautious organization members can then exclusively make use of the approved equivalency data while adventurous ones are free to use staging data or external data. Obviously, any use of explicit human effort is associated with costs; however, one attraction of this model is that the “alignment czar” does not nearly need the technical sophistication of an “ontology librarian” and can possibly be a clerical worker given a specialized graphical application.

• Social Filtering. Another approach would be to keep track of the authors of equivalency statements as well as the users of equivalency statements (neither of which we currently do); this would enable users to say “I want to use the same equivalency data that Jim and Chris are using” (this is a nicely implicit way to limit equivalencies to e.g. the accounting context if they are co-workers in accounting, without having to more formally define the context, which is a more abstract and difficult task). This would also allow cautious users to express “I am willing to use any DAML equivalency file that at least 10 others are using” (which addresses the erroneous-alignment problem but not the context mismatch problem).

• Fine-Grained Control in the User Interface. Finally, it would be nice to have a compact display of the available equivalency information. This display would show a row of information about the available equivalency information and give the user a checkbox for incorporating or ignoring each. Table 2 sketches a preliminary design for deciding which sameClassAs statements to use. (This sketch assumes that we store much more fine-grained information in the equivalency files than we currently do.)
The first column shows the human-given label of the class that is being declared as equivalent to the one the user added by hand. The second column indicates the level of indirection - 1 if the equivalency file directly states that the class the user just added by hand is the same as the class shown, 2 or more if the equivalence was inferred by transitive closure. The third column contains the Uniform Resource Locator for the equivalency file. The fourth column shows the name of the author of the WebScripter report that implied the equivalencies. The fifth column contains the number of additional rows that would be inserted into the user’s report if she incorporated the equivalency. The sixth column indicates when the report that resulted in the equivalency statements was authored. The last column sums up how many other users already made use of the equivalency statement in their reports.

Class | Hops | Origin | Author | Rows | Date | Users
Person | 1 | stanford.e... | Smith | 235 | 10/6/02 | 12
Employee | 1 | stanford.e... | Smith | 57 | 10/6/02 | 6
Staff | 1 | stanford.e... | Smith | 697 | 10/6/02 | 0
Member | 2 | www.isi.e... | Chen | 15 | 3/4/01 | 17
Person | 2 | cmu.edu/... | Miller | 973 | 12/7/01 | 4
Member | 2 | cmu.edu/... | Miller | 107 | 12/7/01 | 9

Table 2: Sketch of a graphical user interface.

7. THOUGHTS ON OTHER OPEN QUESTIONS

Addressing a number of other issues would also help in making DAML and WebScripter use take off.

• How do ordinary users find good original Semantic Web content? WebScripter does not address this problem: once you have found some, it can point you to related content that others may have by using an equivalency-aware DAML search engine such as Teknowledge’s DAML Semantic Search Service [3]. There are no Yahoo-style portals for DAML content yet to our knowledge. There are, however, at least two RDF crawlers – one from BBN [2] and one from the University of Karlsruhe [5] – that could help in building such a portal.

• What does it really mean for two classes or two attributes to be “the same”? The current DAML equivalence statements allow users to say that x is equivalent to y. We likely need a replacement construct that allows users to express that x is equivalent to y in the sense of (or context of) z. We will try to influence the DAML language definition in that direction (but admittedly aren’t quite sure ourselves how to model z). The most difficult problem we see is in the end-user interface for stating these more complex equivalencies.

8. RELATED WORK

WebScripter’s approach to ontology alignment is extreme: terms from different ontologies are always assumed to mean different things by default, and all ontology mapping is done by humans (implicitly, by putting them into the same column of a report).

This is similar in spirit to Gio Wiederhold’s mediation approach to ontology interoperation [18], which also assumes that terms from different ontologies never mean the same thing unless committees of integration experts say they do. WebScripter pushes that concept to the brink by replacing the experts with ordinary users that may not even be aware
of their implicit ontology alignment contributions. (Note, however, that we cannot yet prove that this collective alignment data is indeed a useful source for automatic ontology alignment on an Internet scale – we lack sufficient data from distributed WebScripter use to make that claim.)

The ONION system [15] takes a semi-automated approach to ontology interoperation: the system guesses likely matches between terms of two separately conceived ontologies, and a human expert knowledgeable about the semantics of both ontologies then verifies the inferences, using a graphical user interface. ONION’s guessing analyzes the schema information using relationships with semantics known to the system in advance (subclass-of, part-of, attribute-of, instance-of, value-of); in WebScripter human users rely purely on the data instances to decide what collates and what doesn’t (because they are just not expert enough to analyze the abstractions). That being said, incorporating ONION-style alignment guessing into WebScripter would clearly be beneficial presuming the rate of correct guesses is sufficiently high.

OBSERVER [14], SIMS [9], TSIMMIS [11] and the Information Manifold [13] are all systems for querying multiple data sources of different schemata in a uniform way; however, to our knowledge they all rely on human experts to devise the ontological mappings between the sources. This is because they mediate between structured dynamic data sources (such as SQL/ODBC sources) without run-time human involvement, where a higher level of precision is required to make the interoperation work. In contrast, WebScripter is targeted towards mediating between different ontologies in static RDF-based Web pages with run-time human involvement, where the need for precision in the translation is naturally lower.

9. EVALUATION AND CONCLUSIONS

WebScripter has turned out to be a valuable practical tool even for the simple single-ontology case where there is only one schema but the instance data is distributed over many Web pages. For example, the Distributed Scalable Systems Division at ISI automatically pulls together its people page from many different DAMLized Web pages: some information is maintained by individuals themselves (such as their research interests), other information is maintained by the division director (such as project assignments), and some information is maintained at the institute level (such as office assignments); this relieved the administrative assistant from manually maintaining everyone’s interests [6]. WebScripter has also been used externally, for example to maintain a Semantic Web tools list [7]. You can download WebScripter from [8].

However, the most exciting application of WebScripter, as a world-wide collaborative ontology translation tool, is confined to experimental use by ourselves at this point. This is more due to a lack of widespread interesting RDF(S) content than it is due to any limitation of WebScripter itself. Nevertheless, we are excited about this new approach to global knowledge sharing, may it be achieved by a future version of WebScripter or a similar tool or tools. The key difference we see between “traditional” ontology translation and our approach is that non-experts perform all of the translation - but potentially on a global scale, leveraging each others’ work.

10. ACKNOWLEDGMENTS

We gratefully acknowledge DARPA DAML program funding for WebScripter under contract number F30602-00-2-0576. The first author would also like to acknowledge AFOSR funding under grant number F49620-01-1-0341.

11. REFERENCES

[1] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/.
[2] http://www.daml.org/crawler/.
[3] http://reliant.teknowledge.com/DAML.
[4] http://www.isi.edu/webscripter/daml-personnel.gen.html.
[5] http://ontobroker.semanticweb.org/rdfcrawl.
[6] http://www.isi.edu/divisions/div2/. Click on People.
[7] http://tools.semanticweb.org.
[8] http://www.isi.edu/webscripter.
[9] Y. Arens, C. Knoblock, and W.-M. Shen. Query reformulation for dynamic information integration. Intelligent Information Systems, 6(2-3):99–130, 1996.
[10] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.
[11] H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, V. Vassalos, and J. Widom. The TSIMMIS approach to mediation: data models and languages. Intelligent Information Systems, 8(2):117–32, 1997.
[12] E. Hovy. Combining and standardizing large-scale, practical ontologies for machine translation and other uses. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC), 1998.
[13] A. Levy, D. Srivastava, and T. Kirk. Data model and query evaluation in global information systems. Intelligent Information Systems, 5(2):121–43, 1995.
[14] E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth. OBSERVER: an approach for query processing in global information systems based on interoperation across pre-existing ontologies. Distributed and Parallel Databases, 8(2):223–71, 2000.
[15] P. Mitra and G. Wiederhold. An algebra for semantic interoperability of information sources. In 2nd Annual IEEE International Symposium on Bioinformatics and Bioengineering, pages 174–82, Bethesda, MD, USA, November 4-6 2001.
[16] P. Mitra, G. Wiederhold, and M. Kersten. A graph-oriented model for articulation of ontology interdependencies. In Advances in Database Technology - EDBT 2000. 7th International Conference on Extending Database Technology, Lecture Notes in Computer Science, pages 86–100, Konstanz, Germany, March 27-31 2000.
[17] N. F. Noy and M. A. Musen. PROMPT: Algorithm and tool for automated ontology merging and alignment. In 17th National Conference on AI, 2000.
[18] G. Wiederhold. Interoperation, mediation, and ontologies. In International Symposium on Fifth Generation Computer Systems, Workshop on Heterogeneous Cooperative Knowledge-Bases, volume W3, pages 33–48. ICOT, Tokyo, Japan, December 1994.