=Paper=
{{Paper
|id=Vol-55/paper-3
|storemode=property
|title=WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment
|pdfUrl=https://ceur-ws.org/Vol-55/frank.pdf
|volume=Vol-55
|dblpUrl=https://dblp.org/rec/conf/sww/FrankSNYL02
}}
==WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment==
<pdf width="1500px">https://ceur-ws.org/Vol-55/frank.pdf</pdf>
<pre>
                WebScripter: World-Wide Grass-roots Ontology
                 Translation via Implicit End-User Alignment
                                                          Research Paper Category
                      Martin Frank, Pedro Szekely, Robert Neches, Baoshi Yan, Juan Lopez
                                                         Distributed Scalable Systems Division
                                                             Information Sciences Institute
                                                            University of Southern California
                                             {frank,szekely,rneches,baoshi,juan}@isi.edu


ABSTRACT                                                                             Keywords
Ontologies define hierarchies of classes and attributes; they                        Meta-data, DAML, RDF Schema, RDF, XML Schema
are meta-data: data about data. XML Schema and RDF
Schema are both (lightweight) ontology definition languages                          1.   INTRODUCTION
in that sense. In the “traditional” approach to ontology
engineering, experts add new data by carefully analyzing                               Imagine that you work for an emergency preparedness
others’ ontologies and fitting their new concepts into the                           agency and that you were just handed the job of construct-
existing hierarchy. In the emerging “Semantic Web” ap-                               ing and maintaining a list of public health experts employed
proach to ontology engineering, ordinary users may not look                          by U.S. universities.
at anyone’s ontology before creating theirs – instead, they                            Doing this manually on the (non-semantic) Web would be
may simply define a new local schema from scratch that ad-                           a monumental effort, both in terms of the initial effort and
dresses their immediate needs, without worrying how their                            in the continuous effort to keep the list up to date. The
data may some day integrate with others’.                                            only options are to either do the job completely manually
  This paper describes an approach and implemented sys-                              in a text file or spreadsheet (quickly outdated), or to write
tem for translating between the countless mini-ontologies                            wrapper software specific for each university’s Web pages
that the Semantic Web approach yields. In this approach,                             that extracts the experts (the wrappers constantly break as
ordinary users graphically align data from multiple sources                          universities change their Web page designs).
in a simple spreadsheet-like view without having to know                               Now let us presume that all universities list their personnel
anything about ontologies or even taxonomies. The result-                            in a Semantic Web [10] format, such as RDF Schema [1].
ing web of equivalency statements can then be mined to                               This improves on the current situation (because you don’t
help other users find related ontologies and data, and to                            have to work instance by instance but rather concept by
automatically align the data with theirs.                                            concept) but your job is still rather monumental because
                                                                                     the sources will likely use a myriad of different ontologies.
                                                                                       We have a vision and partial implementation addressing
Categories and Subject Descriptors                                                   this problem by (a) making it easy for individual users to
H.1.2 [Information Systems]: User/Machine Systems—                                   graphically align the attributes of two separate externally-
Human information processing; H.3.3 [Information Sys-                                defined concepts, and (b) making it easy to re-use the align-
tems]: Information Search and Retrieval—Information fil-                             ment work of others.
tering, Relevance feedback ; H.3.5 [Information Systems]:
Online Information Services—Data sharing, Web-based ser-                             2.   OVERVIEW
vices
                                                                                        Figure 1 depicts a number of home pages, marked up with
                                                                                     DAML information about the authors, located somewhere
General Terms                                                                        in the world. The DAML instances in these Web pages are
Collaborative filtering, recommender systems, social infor-                          organized according to one or more ontologies, such as an
mation filtering, ontology alignment, ontology translation                           ISI ontology of people, a Stanford ontology of people, and a
                                                                                     Karlsruhe ontology of people. The challenge is to produce
                                                                                     a report incorporating all of that information with minimal
                                                                                     effort.
Permission to make digital or hard copies of all or part of this work for               At a high level, the WebScripter concept is that users ex-
personal or classroom use is granted without fee provided that copies are            tract content from apparently ordinary Web pages and paste
not made or distributed for profit or commercial advantage and that copies           that content into what looks like an ordinary spreadsheet
bear this notice and the full citation on the first page. To copy otherwise, to      (lower left corner of Figure 1).
republish, to post on servers or to redistribute to lists, requires prior specific      What users implicitly do in WebScripter - without ex-
permission by the authors.
Semantic Web Workshop 2002 Hawaii, USA                                               pending extra effort - is to build up an articulation ontology
Copyright by the authors.                                                            containing equivalency statements. For example, this artic-

[Figure 1 labels: Stanford, ISI, Karlsruhe; WebScripter; fullname = has-name; Mitarbeiter = Person = Member; ...]
Figure 1: Working With Multiple Data Sources In Multiple Ontologies.

Figure 2: Visionary Example: The user types as-yet-unrecognized example values.

Figure 3: Visionary Example: The system determines data sources and a classification.

ulation ontology expresses that the attribute that ISI calls
“fullname” is the same as the one Stanford calls “has-name”;         ful to our reasoning here). Yahoo:UniversitiesAndColleges
and that the object Karlsruhe calls “Mitarbeiter” Stanford           and Lycos:Universities both apply. The universities now
calls “Person” and ISI calls “Member” are the same for the           appear underlined because they are recognized by the sys-
purposes of this report (lower right corner of Figure 1).            tem - double-clicking on them brings up their web pages.
  We believe that in the long run, this articulation ontology        The system then fetches their DAML-enabled Web pages
will be more valuable than the data the users happened to            in the background, and computes a minimal covering set
obtain when they constructed the original report. Its equiv-         of declared DAML IS-A types that cover all the univer-
alency information reduces the amount of work future Web-            sities. In our example, all of the current universities de-
Scripter users have to perform when working in the same              clare to be instances of the World-Wide Web Consortium’s
domain.1 Thus, in some sense, you don’t just use the Se-             W3C:University concept (Figure 3).
mantic Web when you use WebScripter, you help build it as               The user now selects “find more” from the menu bar. The
you go along.                                                        system will fetch every entity that the two known indices
                                                                     point to (several hundred). It simultaneously performs a
3.   VISIONARY EXAMPLE                                               different type of analysis: Which are the RDF(S) subclass-
                                                                     of types that are declared by more than 10% of the entities
   This section presents a detailed step-by-step vision of what      (result: U.N.:University, UsPostalService:Recipient, W3C:
WebScripter (and a future Semantic Web) could be; we will            University, and IRS:NonProfitInstitution) [this is a recall
later present a step-by-step example of what our current im-         test]? Of these, which apply to less than 1% of nearby cate-
plementation can already do (with existing RDF(S) data on            gories of the same index (remaining result: W3C:University
the Web produced by others). In this example, the applica-           and U.N.:University) [this is a precision test]. The latter
tion is to quickly produce a self-updating list of faculty at        one is now automatically treated as an alternatively valid
U.S. universities that are public health experts, listing their      type, and WebScripter will include every entity declaring
specialization. The user starts WebScripter and types the            to be one of these types, thereby finding institutions not
names of several universities into the first column. At the          yet listed by the well-known indices. Note that there are
point shown in Figure 2, truly nothing is known about these          no duplicate universities in this column (such as “UCLA”
hand-typed values.                                                   and “UC Los Angeles”). The challenge, of course, is to be
   After the user selects “classify” from a menu, WebScripter        able to determine that they are “different”, as they sub-
uses a list of well-known indices to find an existing taxonomy       scribe to different DAML ontologies. One possibility is that
that matches all of the typed phrases (note that commer-             any Semantic Web description of an entity existing on the
cial search engines do not have to be DAMLized to be use-            Web contain its normalized HTTP URL in a standardized
1
 Who benefits depends on your willingness to share that              attribute, which can serve as a simple unique id for com-
information, of course - it could be the world, your organi-         parisons across ontologies (first choice for disambiguation
zation, your workgroup, or just yourself.                            in our example). Another possibility is that they contain
Figure 4: Visionary Example: The system auto-completes the user-provided values.
Figure 5: Visionary Example: Combining HTML navigation with embedded DAML semantics.
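
The "find more" step of the visionary example combines a recall test (types declared by more than 10% of the fetched entities) with a precision test (types that apply to less than 1% of nearby categories). A minimal Python sketch of that filter follows. The function name, the data layout (one set of declared type names per entity), and the reading of "nearby categories" as a pool of entities from neighboring index categories are all assumptions made for illustration; the sample data is invented and only reuses the type names mentioned in the text.

```python
from collections import Counter


def candidate_types(entities, nearby, min_recall=0.10, max_nearby=0.01):
    """Return types that pass both tests (hypothetical helper).

    entities: list of sets of type names declared by the fetched entities.
    nearby:   list of sets of type names declared by entities in nearby
              categories of the same index (an assumed interpretation).
    """
    declared = Counter(t for types in entities for t in types)
    elsewhere = Counter(t for types in nearby for t in types)
    keep = []
    for type_name, count in declared.items():
        # Recall test: declared by more than min_recall of the entities.
        recall_ok = count / len(entities) > min_recall
        # Precision test: applies to less than max_nearby of nearby entities.
        precision_ok = elsewhere[type_name] / max(len(nearby), 1) < max_nearby
        if recall_ok and precision_ok:
            keep.append(type_name)
    return sorted(keep)


# Invented sample data: ten universities and 200 entities from nearby categories.
entities = (
    [{"W3C:University", "UsPostalService:Recipient", "IRS:NonProfitInstitution"}] * 6
    + [{"U.N.:University", "UsPostalService:Recipient"}] * 4
)
nearby = (
    [{"UsPostalService:Recipient", "IRS:NonProfitInstitution"}] * 199
    + [{"W3C:University"}]
)

print(candidate_types(entities, nearby))
# prints ['U.N.:University', 'W3C:University']
```

With this sample data all four types pass the recall test, and the two generic ones (postal recipient, non-profit institution) are dropped by the precision test because they are just as common in the nearby categories.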


                                                                   ing countries in the second column are now filled in. The
(possibly composite) keys that point into popular external         user selects a United States cell in the second column and
ontologies, for the same reason (“companies in this ontology       invokes “filter by” from the right-click menu, checks “United
are uniquely identified by their UsTreas:IRS:TaxPayerId”),         States”, and clicks OK, which removes Oxford and all other
(“universities are identical if they point to the same Us-         foreign-university rows. Performing a number of substan-
PostalService:UsStreetAddress”). This gets the user to the         tially similar steps, the user can navigate to the universities’
state of Figure 4.                                                 chemistry, biology, and medical departments, from the de-
   In this vision of a future Semantic Web, the user has to        partment to the faculty, from the faculty to their research
know little to get a lot of leverage out of the existing seman-    interests, and filter by a particular research area, resulting
tic information: (1) The user did not have to do anything          in the table shown in Figure 6. (As before, bold entries were
but type out some university names that came to mind – he          provided or demonstrated by the user; but we are no longer
or she didn’t have to understand an ontological query lan-         underlining recognized cells below for readability).
guage or the notion of an ontology or even a taxonomy for             In the end, what users want is a report containing the
that matter - yet the result is perfectly ontologically typed.     information that serves their immediate needs. In our ap-
(2) Very little DAML has to be in place for this to work: for      proach, users build a report in steps, by manipulating the
this particular example, two external DAML ontologies of           data it contains so far to refine it and to add more. This
existing non-DAML university web sites should be sufficient        is a qualitatively easier task than working with a query,
for the inferencing of this example. (3) Data from two dif-        which is an inherently more abstract specification. In our
ferent ontologies can be seamlessly integrated without the         approach, a final report may contain dozens or hundreds
need for pre-merging/translation between the ontologies.           of single-step scripts that operate on DAML markup. The
   In this example, the user now demonstrates that she wants       equivalent query could be enormously complicated (perhaps
to extract the nationality of the universities, in the following   several pages long), but users never have to see it with this
manner (Figure 5). She double-clicks on USC, which brings          approach.
up a Web browser to the (hypothetically) DAML-enabled                 Now that this hypothetical WebScripter report is defined,
USC home page. The user then clicks on “Maps & Direc-              its data can be refreshed at any time, and it itself can be-
tions”, and copies and pastes “United States” from that            come the source for further Web scripting as it carries all its
page, which is not just plain text but carries its embedded        DAML within the generated HTML report.
DAML type.2
   In response, the system now fills in all those cells that use
the same underlying W3C university definition, by inferring        4.    IMPLEMENTED EXAMPLE
the ontological path from university to country and applying         In our initial implementation we have focused on mak-
it to all other instances of this ontology. In our particular      ing it easy for ordinary (non-programming, never heard of
case, the user is best served by now doing the same for the        ontologies) users to construct reports from multi-ontology
UN-based university entry “Stanford” (not shown) because           DAML data. This section first describes a step-by-step
there are only two ontologies involved.3 As a result, all miss-    walkthrough of using WebScripter as implemented to com-
                                                                   bine DAML personnel data from different organizations on
2
  Note that one would not have to internally instrument a          the Web. It then describes how the resulting implicit on-
Web browser to achieve this level of integration – one could       tology alignment data benefits other users in constructing
know which page the user is looking at through a proxy             similar reports.
Web server and receive the copied HTML+DAML out of
the window system’s paste buffer.                                  4.1    Constructing a first report from scratch
3
  If there are more than two, WebScripter could attempt to           Imagine that you work for the government DAML pro-
produce a generalized “fuzzy” script that will work for all
remaining university ontologies given two (or more) exam-          gram office, and that your job is to maintain a list of per-
ples (“extract the attribute whose name contains Country           sonnel funded by that program, and let’s assume that all of
or Nation in the top-level concept or in a sub-concept called      the contractors provide their personnel data in some DAML
Location or Address”).                                             format. The first task is to find the URLs where the vari-
                       Figure 6: Visionary Example: The end result in this fictitious example.


ous DAML resides. BBN’s crawled ontology library comes
closest to a Yahoo-style portal for DAML content [2]. This
site contains a registry for DAML content root files, which
a crawler uses as starting points to find more DAML files.
Teknowledge built a DAML search engine for that ontol-
ogy library which is a good starting place for finding DAML
content [3].
   In this example, the DAML sources can be found by query-
ing the Teknowledge search engine for the terms “Person”,
“Employee”, and “Staff” (which will return a large number
of hits of non-DAML contractors), or alternatively they can be
found by collecting the regular project Web pages and per-        Figure 7: Implemented Example: Initial report of
sonal home pages of the DAML contractors (because they            Stanford KSL personnel.
embed DAML content inside the HTML pages).
   For the sake of this example, we started WebScripter and
loaded DAML from just the Stanford Database and Knowl-
edge Systems groups by copying and pasting the URLs of
their DAML pages into WebScripter’s “Add DAML” dia-
log box. WebScripter then displays the class hierarchy of
that DAML, intermixing the concepts from the two sepa-
rate ontologies. The user can browse the content by select-
ing classes, which displays all of their (local and inherited)
attributes as columns, and their data instances as rows.
   In this example, we started a new report by (1) choos-
ing “New Report” from a menu, (2) selecting Person in
the class hierarchy, and (3) selecting three columns of Per-
son to include in the report. The latter is done by se-           Figure 8: Implemented Example: Adding and align-
lecting a cell in the data display for Person and choos-          ing Stanford Database personnel.
ing “Add as new column” from the right-click menu, once
each for the Has-Full-Name, Has-Phone-Number, and Has-
Email-Address columns. The resulting WebScripter display          different sources [4]. The largest such report we have gener-
is shown in Figure 7. (Note that the first of the four columns,   ated is 3.3MB, taking 8.7MB of DAML input, and running
the DAML instance identifier column, was automatically in-        for about 45 seconds.
serted when the first column was added to the report. The            The Web page embeds the WebScripter report definition,
column is hidden from the generated report Web page by            thus it can be re-run at any time and will then possibly show
default.)                                                         more people (presuming their DAMLized Web pages can be
   In this example, we will now add and align data from a         found by following just one link from the two group Web
different research group using a different ontology. This is      pages, and presuming their DAML instance data follows one
done by (1) selecting PhDStudent in the class hierarchy to        of the two ontologies).
display its instance data, (2) selecting a cell in the “name”        There are a large number of WebScripter features that we
column of that instance data and choosing “Add to column          will not discuss here – such as un-loading DAML sources,
1” from the right click menu, and (3) repeating the second        deleting columns, re-arranging columns, filtering rows, and
step for the “phone” and “email” columns. Figure 8 shows          sorting by multiple criteria, and so on – because they are
the combined data from the two groups.                            what you would expect from any DAML report generator.
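
The alignment steps above can be sketched in a few lines of Python to show how "Add to column" actions implicitly yield equivalency pairs. This is only an illustration: the `Report` class, its method names, and its column numbering are hypothetical and are not WebScripter's actual data structures. The property names are the ones used in this walkthrough.

```python
KSL = "http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#"
SWRC = "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#"


class Report:
    """Toy report model (hypothetical, not WebScripter's internal design)."""

    def __init__(self):
        # Each column is the list of property URIs the user placed in it.
        self.columns = []

    def add_as_new_column(self, prop):
        """Mimics "Add as new column"; returns the 1-based column number."""
        self.columns.append([prop])
        return len(self.columns)

    def add_to_column(self, number, prop):
        """Mimics "Add to column N": aligns prop with an existing column."""
        self.columns[number - 1].append(prop)

    def equivalences(self):
        """Pair each column's first property with every property aligned to it."""
        return [(col[0], other) for col in self.columns for other in col[1:]]


report = Report()
for ksl_name, swrc_name in [("Has-Full-Name", "name"),
                            ("Has-Phone-Number", "phone"),
                            ("Has-Email-Address", "email")]:
    column = report.add_as_new_column(KSL + ksl_name)
    report.add_to_column(column, SWRC + swrc_name)

for left, right in report.equivalences():
    print(left, "samePropertyAs", right)
```

The user only ever edits columns; the equivalency pairs fall out as a by-product, which is the sense in which the alignment is implicit.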
   This in a nutshell is how WebScripter looks to the users.      Instead, we’ll focus on the generated DAML equivalency
This report can then be published in various formats, includ-     statements shown in Table 1.
ing as a plain Web page that color-codes its content based           These statements can be automatically published on a
on where it came from; Figure 9 shows a snapshot of a large       Web site and registered as a new DAML content root in
DAML personnel report that loads data from more than 30           BBN’s DAML content library. Consequently, you can then
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:daml="http://www.daml.org/2001/03/daml+oil#">

<rdfs:Class
  rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#PERSON">
  <daml:sameClassAs rdf:resource=
  "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#PhDStudent"/>
</rdfs:Class>

<rdfs:Property
  rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Full-Name">
  <daml:samePropertyAs rdf:resource=
  "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#name"/>
</rdfs:Property>
<rdfs:Property
  rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Phone-Number">
  <daml:samePropertyAs rdf:resource=
  "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#phone"/>
</rdfs:Property>
<rdfs:Property
  rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Email-Address">
  <daml:samePropertyAs rdf:resource=
  "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#email"/>
</rdfs:Property>

</rdf:RDF>
Table 1: Implemented Example: Resulting DAML equivalency statements.

Figure 9: Snapshot of (a fragment of) a large-size WebScripter DAML people report.
                                                                                           there is not that much interesting, continuously updated
                                                                                           RDF(S), much less DAML, available on the Web today.4
make use of the equivalency statements by selecting the “Ex-                               What made the original Web take off was that there was
tended with Equivalence” option in Teknowledge’s DAML                                      an immediate incentive for producers to use the technology
search engine (note that it can take up to 24 hours for the                                because it was an easy way to publish information. We cur-
statements to make it into BBN’s cache and then up to                                      rently see no strong motivation for producers to put work
another week from there into Teknowledge’s search engine                                   into putting out RDF(S) in addition to their regular HTML
cache). Concretely, if you for example now query for all                                   pages, but there is at least a compelling intra-organizational
instances of person (“?x type Person”) in the first ontol-                                 benefit in using RDF(S) and WebScripter to generate regu-
ogy in that fashion you will now also retrieve PhDStudent                                  lar HTML pages by pulling RDF from various pages within
instances from the second ontology.                                                        the organization.
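
The effect of an equivalence-extended query such as "?x type Person" can be illustrated with a short Python sketch. It is not how the Teknowledge search engine or WebScripter is implemented: the function names are hypothetical, the instance identifiers are invented, and the embedded document is a single statement from Table 1 with the rdfs prefix declared. The breadth-first search also records how many equivalence statements link two terms, since equivalences can be chained by transitive closure.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DAML = "http://www.daml.org/2001/03/daml+oil#"
KSL = "http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#"
SWRC = "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#"

# One equivalency statement in the style of Table 1.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:daml="{DAML}">
  <rdfs:Class rdf:about="{KSL}PERSON">
    <daml:sameClassAs rdf:resource="{SWRC}PhDStudent"/>
  </rdfs:Class>
</rdf:RDF>"""


def read_equivalences(xml_text):
    """Build an undirected adjacency map from sameClassAs/samePropertyAs."""
    same = (f"{{{DAML}}}sameClassAs", f"{{{DAML}}}samePropertyAs")
    graph = {}
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        for child in node:
            if child.tag in same:
                other = child.get(f"{{{RDF}}}resource")
                graph.setdefault(subject, set()).add(other)
                graph.setdefault(other, set()).add(subject)
    return graph


def equivalents(graph, start):
    """Breadth-first search: map each equivalent term to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for other in graph.get(term, ()):
            if other not in hops:
                hops[other] = hops[term] + 1
                queue.append(other)
    return hops


def instances_of(cls, typed_instances, graph):
    """Answer "?x type cls", extended with all equivalent classes."""
    wanted = equivalents(graph, cls)
    return sorted(x for x, c in typed_instances if c in wanted)


# Invented instance data: (instance id, declared class).
DATA = [("ksl:alice", KSL + "PERSON"), ("swrc:bob", SWRC + "PhDStudent")]
GRAPH = read_equivalences(DOC)

print(instances_of(KSL + "PERSON", DATA, GRAPH))
# prints ['ksl:alice', 'swrc:bob']
```

Treating the statements as an undirected graph means the same file serves queries phrased in either ontology.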
                                                                                              To be more concrete, once a DAML-enabled document is
4.2          Constructing a second report using the                                        published on the Web, WebScripter makes it easy to access
             alignment data                                                                and republish portions of it as part of a larger report – an
   We have also implemented an initial use of the WebScripter-                             effort savings for federated information providers who cur-
generated equivalency statements in WebScripter itself: if                                 rently need to maintain the same information in multiple
you start it with the insert-equivalents flag it will automati-                            places. For example, professors routinely publish a list of
cally add and align any classes that it has sameClassAs and                                their publications on their home page. Departments pub-
sameInstanceAs data for. It reads these equivalency state-                                 lish a list of all publications, and project pages publish a
ments from a fixed location on our Web site to which you                                   list of project-related publications from the project mem-
can contribute more via the “Easy Publish” menu in Web-                                    bers. Today, someone has to manually construct these pages
Scripter.                                                                                  (presuming these federated organizations are not so tightly
   Let’s assume that a second user comes along later whose                                 integrated that they maintain a shared database or other
job it is to maintain a list of researchers with Semantic Web                              common structured information source, of course). When
expertise, plus their email addresses and home pages. She                                  an author publishes a new paper or makes a correction on
starts WebScripter in the same way as above, selects for ex-                               an existing one, he or she has to either manually update the
ample PhDStudent and adds “name” as the first column in                                    other pages, or coordinate with the appropriate people to
her report. At that point, WebScripter will not only add                                   have all the other lists updated. WebScripter can eliminate
all instances of Person, but also automatically align their                                the additional work: authors only need to mark up their
names into the column. Similarly, when then selecting the                                  personal paper publication with DAML, and the reports
email address for either Person or PhDStudent and saying                                   for the department and project-specific pages will automat-
“Add as new column” WebScripter will fill in the email ad-                                 ically pick up the new publication (e.g. every night). Web-
dresses for the other ontology as well. This will not happen                               Scripter eliminates overhead not only for the organization,
after she adds Has-Home-Page as a new column (as there is                                  but also for the individual producing the information, who
no existing equivalency data) so that she has to manually                                  no longer needs to coordinate the redistribution effort. Web-
select homepage and say “Add to column”. (However, if she                                  Scripter can also enhance the flexibility and value of Web
is willing to share her alignment data via the “Easy Pub-                                  sites with large amounts of information by publishing skele-
lish” option future users do not have to align this column                                 ton WebScripter reports that visitors can refine to obtain
by hand either.)                                                                           customized reports. Thus, we are cautiously optimistic that
                                                                                           WebScripter may help with the adoption of RDF(S)/DAML
5.      THOUGHTS ON INCENTIVIZING PRO-                                                     on the producer side as well.
        DUCERS                                                                             4
                                                                                             The notable exceptions are headline exchange files such as
     As of the time of writing, one issue we encountered is that                           slashdot.org/slashdot.rdf.
Class      Hops   Origin          Author   Rows   Date      Users
Person     1      stanford.e...   Smith    235    10/6/02   12
Employee   1      stanford.e...   Smith    57     10/6/02   6
Staff      1      stanford.e...   Smith    697    10/6/02   0
Member     2      www.isi.e...    Chen     15     3/4/01    17
Person     2      cmu.edu/...     Miller   973    12/7/01   4
Member     2      cmu.edu/...     Miller   107    12/7/01   9

Table 2: Sketch of a graphical user interface.

6.    THOUGHTS ON END-USER CONTROL
      OVER AUTO-ALIGNMENT

   You can currently run WebScripter either in an “ignore all
equivalencies” mode or in an “auto-insert all known equiv-
alencies” mode, neither of which is ideal of course. In par-
ticular, the latter may quickly become impractical if a large
number of people share alignment data, even if they are
not ill-intentioned. This is either because they made an hon-
est mistake (they aligned homepages from one ontology with                  states that the class the user just added by hand is
email addresses from another and did not notice) or because                 the same as the class shown, 2 or more if the equiv-
they had a different type of equivalency in mind when they                  alence was inferred by transitive closure. The third
authored their report (graduate research assistants are the                 column contains the Uniform Resource Locator for the
same as machines in the sense that they cost the project                    equivalency file. The fourth column shows the name
money to support, but that may then cause machines to                       of the author of the WebScripter report that implied
auto-appear in a report of someone else trying to author                    the equivalencies. The fifth column contains the num-
a personnel list). We see the following potential solutions                 ber of additional rows inserted into the user’s report
(which are not mutually exclusive).                                         if she would incorporate the equivalency. The sixth
                                                                            column indicates when the report that resulted in the
     • Centralized Human Editors. One possibility is for an                 equivalency statements was authored. The last col-
       organization to appoint an “alignment czar”. The job                 umn sums up how many other users already made use
       of such a czar would be to periodically validate the                 of the equivalency statement in their reports.
       equivalency data contributed by organization members
       into a staging area. If approved, equivalency files are
       then moved to that organization’s official equivalency      7.      THOUGHTS ON OTHER OPEN QUES-
       data area. Cautious organization members can then
       exclusively make use of the approved equivalency data               TIONS
       while adventurous ones are free to use staging data or       Addressing a number of other issues would also help in
       external data. Obviously, any use of explicit human         making DAML and WebScripter use take off.
       effort is associated with costs; however, one attraction
       of this model is that the “alignment czar” does not               • How do ordinary users find good original Semantic
       need nearly the technical sophistication of an “ontol-              Web content? WebScripter does not address this prob-
       ogy librarian” and can possibly be a clerical worker                lem: once you have found some, it can point you to related
       given a specialized graphical application.                          content that others may have by using an equivalency-
                                                                           aware DAML search engine such as Teknowledge’s DAML
     • Social Filtering. Another approach would be to keep                 Semantic Search Service [3]. There are no Yahoo-style
       track of the authors of equivalency statements as well              portals for DAML content yet to our knowledge. There
       as the users of equivalency statements (neither of which            are, however, at least two RDF crawlers – one from
       we currently do); this would enable users to say “I                 BBN [2] and one from the University of Karlsruhe [5]
       want to use the same equivalency data that Jim and                  – that could help in building such a portal.
       Chris are using” (this is a nicely implicit way to limit
       equivalencies to e.g. the accounting context if they are          • What does it really mean for two classes or two at-
       co-workers in accounting, without having to more for-               tributes to be “the same”? The current DAML equiv-
       mally define the context, which is a more abstract and              alence statements allow users to say that x is equiva-
       difficult task). This would also allow cautious users to            lent to y. We likely need a replacement construct that
       express “I am willing to use any DAML equivalency                   allows users to express that x is equivalent to y in the
       file that at least 10 others are using” (which addresses            sense of (or context of) z. We will try to influence the
       the erroneous-alignment problem but not the context                 DAML language definition in that direction (but ad-
       mismatch problem).                                                  mittedly aren’t quite sure ourselves how to model z).
                                                                           The most difficult problem we see is in the end-user
     • Fine-Grained Control in the User Interface. Finally, it             interface for stating these more complex equivalencies.
       would be nice to have a compact display of the avail-
       able equivalency information. This display would show       8.      RELATED WORK
       a row of information about the available equivalency          WebScripter’s approach to ontology alignment is extreme:
       information and give the user a checkbox for incorpo-       terms from different ontologies are always assumed to mean
       rating or ignoring each. Table 2 sketches a preliminary     different things by default, and all ontology mapping is done
       design for deciding which sameClassAs statements to         by humans (implicitly, by putting them into the same col-
       use. (This sketch assumes that we store much more           umn of a report).
       fine-grained information in the equivalency files than        This is similar in spirit to Gio Wiederhold’s mediation ap-
       we currently do.)                                           proach to ontology interoperation [18], which also assumes
       The first column shows the human-given label of the         that terms from different ontologies never mean the same
       class that is being declared as equivalent to the one the   thing unless committees of integration experts say they are.
       user added by hand. The second column indicates the         WebScripter pushes that concept to the brink by replacing
       level of indirection - 1 if the equivalency file directly   the experts with ordinary users that may not even be aware
of their implicit ontology alignment contributions. (Note,       10.   ACKNOWLEDGMENTS
however, that we cannot yet prove that this collective align-      We gratefully acknowledge DARPA DAML program fund-
ment data is indeed a useful source for automatic ontology       ing for WebScripter under contract number F30602-00-2-
alignment on an Internet scale – we lack sufficient data from    0576. The first author would also like to acknowledge AFOSR
distributed WebScripter use to make that claim.)                 funding under grant number F49620-01-1-0341.
   The ONION system [15] takes a semi-automated approach
to ontology interoperation: the system guesses likely matches
between terms of two separately conceived ontologies, a hu-
                                                                 11.   REFERENCES
man expert knowledgeable about the semantics of both on-          [1] http://www.w3.org/TR/2000/CR-rdf-schema-
tologies then verifies the inferences, using a graphical user         20000327/.
interface. ONION’s guessing analyzes the schema informa-          [2] http://www.daml.org/crawler/.
tion using relationships with semantics known to the sys-         [3] http://reliant.teknowledge.com/DAML.
tem in advance (subclass-of, part-of, attribute-of, instance-     [4] http://www.isi.edu/webscripter/daml-
of, value-of); in WebScripter human users rely purely on              personnel.gen.html.
the data instances to decide what collates and what doesn’t       [5] http://ontobroker.semanticweb.org/rdfcrawl.
(because they are just not expert enough to analyze the           [6] http://www.isi.edu/divisions/div2/. Click on People.
abstractions). That being said, incorporating ONION-style         [7] http://tools.semanticweb.org.
alignment guessing into WebScripter would clearly be ben-
                                                                  [8] http://www.isi.edu/webscripter.
eficial presuming the rate of correct guesses is sufficiently
                                                                  [9] Y. Arens, C. Knoblock, and W.-M. Shen. Query
high.
                                                                      reformulation for dynamic information integration.
   OBSERVER [14], SIMS [9], TSIMMIS [11] and the Infor-
                                                                      Intelligent Information Systems, 6(2-3):99–130, 1996.
mation Manifold [13] are all systems for querying multiple
data sources of different schemata in a uniform way; how-        [10] T. Berners-Lee, J. Hendler, and O. Lassila. The
ever, they all rely on human experts to devise the ontolog-           semantic web. Scientific American, May 2001.
ical mappings between the sources to our knowledge. This         [11] H. Garcia-Molina, Y. Papakonstantinou, D. Quass,
is because they mediate between structured dynamic data               A. Rajaraman, Y. Sagiv, J. Ullman, V. Vassalos, and
sources (such as SQL/ODBC sources) without run-time hu-               J. Widom. The TSIMMIS approach to mediation:
man involvement where a higher level of precision is required         data models and languages. Intelligent Information
to make the interoperation work. In contrast, WebScripter             Systems, 8(2):117–32, 1997.
is targeted towards mediating between different ontologies       [12] E. Hovy. Combining and standardizing large-scale,
in static RDF-based Web pages with run-time human in-                 practical ontologies for machine translation and other
volvement, where the need for precision in the translation is         uses. In Proceedings of the First International
naturally lower.                                                      Conference on Language Resources and Evaluation
                                                                      (LREC), 1998.
                                                                 [13] A. Levy, D. Srivastava, and T. Kirk. Data model and
9.   EVALUATION AND CONCLUSIONS                                       query evaluation in global information systems.
                                                                      Intelligent Information Systems, 5(2):121–43, 1995.
   WebScripter has turned out to be a valuable practical tool
                                                                 [14] E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth.
even for the simple single-ontology case where there is only
                                                                      OBSERVER: an approach for query processing in
one schema but the instance data is distributed over many
                                                                      global information systems based on interoperation
Web pages. For example, the Distributed Scalable Systems
                                                                      across pre-existing ontologies. Distributed and Parallel
Division at ISI automatically pulls together its people page
                                                                      Databases, 8(2):223–71, 2000.
from many different DAMLized Web pages: some informa-
tion is maintained by individuals themselves (such as their      [15] P. Mitra and G. Wiederhold. An algebra for semantic
research interests), other information is maintained by the           interoperability of information sources. In 2nd Annual
division director (such as project assignments), and some in-         IEEE International Symposium on Bioinformatics and
formation is maintained at the institute level (such as office        Bioengineering, pages 174–82, Bethesda, MD, USA,
assignments); this relieved the administrative assistant from         November 4-6 2001.
manually maintaining everyone’s interests [6]. WebScripter       [16] P. Mitra, G. Wiederhold, and M. Kersten. A
has also been used externally, for example to maintain a Se-          graph-oriented model for articulation of ontology
mantic Web tools list [7]. You can download WebScripter               interdependencies. In Advances in Database
from [8].                                                             Technology - EDBT 2000. 7th International
   However, the most exciting application of WebScripter, as          Conference on Extending Database Technology,
a world-wide collaborative ontology translation tool, is con-         Lecture Notes in Computer Science, pages 86–100,
fined to experimental use by ourselves at this point. This is         Konstanz, Germany, March 27-31 2000.
more due to a lack of widespread interesting RDF(S) content      [17] N. F. Noy and M. A. Musen. PROMPT: Algorithm
than it is due to any limitation of WebScripter itself. Nev-          and tool for automated ontology merging and
ertheless, we are excited about this new approach to global           alignment. In 17th National Conference on AI, 2000.
knowledge sharing, whether it is achieved by a future version    [18] G. Wiederhold. Interoperation, mediation, and
of WebScripter or a similar tool or tools. The key difference         ontologies. In International Symposium on Fifth
we see between “traditional” ontology translation and our             Generation Computer Systems, Workshop on
approach is that non-experts perform all of the translation           Heterogeneous Cooperative Knowledge-Bases,
- but potentially on a global scale, leveraging each other’s          volume W3, pages 33–48. ICOT, Tokyo, Japan,
work.                                                                 December 1994.

</pre>