<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Martin</forename><surname>Frank</surname></persName>
							<email>frank@isi.edu</email>
						</author>
						<author>
							<persName><forename type="first">Pedro</forename><surname>Szekely</surname></persName>
							<email>szekely@isi.edu</email>
						</author>
						<author>
							<persName><forename type="first">Robert</forename><surname>Neches</surname></persName>
							<email>rneches@isi.edu</email>
						</author>
						<author>
							<persName><forename type="first">Baoshi</forename><surname>Yan</surname></persName>
							<email>baoshi@isi.edu</email>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><surname>Lopez</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Distributed Scalable Systems Division</orgName>
								<orgName type="institution">Information Sciences Institute</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">University of Southern California</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">WebScripter: World-Wide Grass-roots Ontology Translation via Implicit End-User Alignment</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E7BE4BF6538CA4F6321BEA26BB6C8059</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.1.2 [Information Systems]: User/Machine Systems-Human information processing</term>
					<term>H.3.3 [Information Systems]: Information Search and Retrieval-Information filtering, Relevance feedback</term>
					<term>H.3.5 [Information Systems]: Online Information Services-Data sharing, Web-based services</term>
					<term>Collaborative filtering, recommender systems, social information filtering, ontology alignment, ontology translation</term>
					<term>Meta-data, DAML, RDF Schema, RDF, XML Schema</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Ontologies define hierarchies of classes and attributes; they are meta-data: data about data. XML Schema and RDF Schema are both (lightweight) ontology definition languages in that sense. In the "traditional" approach to ontology engineering, experts add new data by carefully analyzing others' ontologies and fitting their new concepts into the existing hierarchy. In the emerging "Semantic Web" approach to ontology engineering, ordinary users may not look at anyone's ontology before creating theirs; instead, they may simply define a new local schema from scratch that addresses their immediate needs, without worrying how their data may some day integrate with others'.</p><p>This paper describes an approach and implemented system for translating between the countless mini-ontologies that the Semantic Web approach yields. In this approach, ordinary users graphically align data from multiple sources in a simple spreadsheet-like view without having to know anything about ontologies or even taxonomies. The resulting web of equivalency statements can then be mined to help other users find related ontologies and data, and to automatically align the data with theirs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Imagine that you work for an emergency preparedness agency and that you were just handed the job of constructing and maintaining a list of public health experts employed by U.S. universities.</p><p>Doing this manually on the (non-semantic) Web would be a monumental effort, both in terms of the initial effort and in the continuous effort to keep the list up to date. The only options are to either do the job completely manually in a text file or spreadsheet (quickly outdated), or to write wrapper software specific to each university's Web pages that extracts the experts (the wrappers constantly break as universities change their Web page designs). Now let us presume that all universities list their personnel in a Semantic Web <ref type="bibr" target="#b1">[10]</ref> format, such as RDF Schema <ref type="bibr">[1]</ref>. This improves on the current situation (because you don't have to work instance by instance but rather concept by concept) but your job is still rather monumental because the sources will likely use a myriad of different ontologies.</p><p>We have a vision and partial implementation addressing this problem by (a) making it easy for individual users to graphically align the attributes of two separate externally defined concepts, and (b) making it easy to re-use the alignment work of others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">OVERVIEW</head><p>Figure <ref type="figure" target="#fig_0">1</ref> depicts a number of home pages, marked up with DAML information about the authors, located somewhere in the world. The DAML instances in these Web pages are organized according to one or more ontologies, such as an ISI ontology of people, a Stanford ontology of people, and a Karlsruhe ontology of people. The challenge is to produce a report incorporating all of that information with minimal effort.</p><p>At a high level, the WebScripter concept is that users extract content from apparently ordinary Web pages and paste that content into what looks like an ordinary spreadsheet (lower left corner of Figure <ref type="figure" target="#fig_0">1</ref>).</p><p>What users implicitly do in WebScripter, without expending extra effort, is to build up an articulation ontology containing equivalency statements. For example, this articulation ontology expresses that the attribute that ISI calls "fullname" is the same as the one Stanford calls "has-name"; and that the object that Karlsruhe calls "Mitarbeiter", Stanford calls "Person", and ISI calls "Member" is the same for the purposes of this report (lower right corner of Figure <ref type="figure" target="#fig_0">1</ref>). We believe that in the long run, this articulation ontology will be more valuable than the data the users happened to obtain when they constructed the original report. Its equivalency information reduces the amount of work future WebScripter users have to perform when working in the same domain.<ref type="foot">1</ref> Thus, in some sense, you don't just use the Semantic Web when you use WebScripter, you help build it as you go along.</p></div>
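<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of an articulation ontology more concrete, the following is a minimal sketch (not WebScripter's actual implementation) that stores equivalency statements in a union-find structure. The class name ArticulationOntology, its methods, and the prefixed term names are hypothetical, and the sketch assumes that equivalence is transitive, which may not hold when statements were authored for different purposes.</p><p>
```python
class ArticulationOntology:
    """Union-find over ontology terms; terms in one set are treated as equivalent."""

    def __init__(self):
        self._parent = {}

    def _find(self, term):
        # Register unseen terms as their own representative.
        self._parent.setdefault(term, term)
        while self._parent[term] != term:
            # Path halving keeps later lookups short.
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def add_equivalence(self, term_a, term_b):
        """Record that a user placed term_a and term_b in the same column."""
        root_a, root_b = self._find(term_a), self._find(term_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, term):
        """Return every known term equivalent to `term`, including itself."""
        root = self._find(term)
        return {t for t in list(self._parent) if self._find(t) == root}


# Hypothetical term names mirroring the example in the text.
onto = ArticulationOntology()
onto.add_equivalence("isi:fullname", "stanford:has-name")
onto.add_equivalence("karlsruhe:Mitarbeiter", "stanford:Person")
onto.add_equivalence("stanford:Person", "isi:Member")
print(sorted(onto.equivalents("isi:Member")))
```
</p><p>Each alignment a user performs contributes one add_equivalence call; later users can then ask for all terms equivalent to the one they selected.</p></div>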
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">VISIONARY EXAMPLE</head><p>This section presents a detailed step-by-step vision of what WebScripter (and a future Semantic Web) could be; we will later present a step-by-step example of what our current implementation can already do (with existing RDF(S) data on the Web produced by others). In this example, the application is to quickly produce a self-updating list of faculty at U.S. universities who are public health experts, listing their specialization. The user starts WebScripter and types the names of several universities into the first column. At the point shown in Figure <ref type="figure" target="#fig_1">2</ref>, truly nothing is known about these hand-typed values.</p><p>After the user selects "classify" from a menu, WebScripter uses a list of well-known indices to find an existing taxonomy that matches all of the typed phrases (note that commercial search engines do not have to be DAMLized to be useful to our reasoning here). Yahoo:UniversitiesAndColleges and Lycos:Universities both apply. The universities now appear underlined because they are recognized by the system; double-clicking on them brings up their web pages. The system then fetches their DAML-enabled Web pages in the background, and computes a minimal covering set of declared DAML IS-A types that cover all the universities. In our example, all of the current universities declare themselves to be instances of the World-Wide Web Consortium's W3C:University concept (Figure <ref type="figure" target="#fig_2">3</ref>).</p><note place="foot" n="1">Who benefits depends on your willingness to share that information, of course: it could be the world, your organization, your workgroup, or just yourself.</note><p>The user now selects "find more" from the menu bar. The system will fetch every entity that the two known indices point to (several hundred). 
It simultaneously performs a different type of analysis: Which are the RDF(S) subclass-of types that are declared by more than 10% of the entities (result: U.N.:University, UsPostalService:Recipient, W3C:University, and IRS:NonProfitInstitution) [this is a recall test]? Of these, which apply to fewer than 1% of nearby categories of the same index (remaining result: W3C:University and U.N.:University) [this is a precision test]? The latter one is now automatically treated as an alternatively valid type, and WebScripter will include every entity declaring to be one of these types, thereby finding institutions not yet listed by the well-known indices. Note that there are no duplicate universities in this column (such as "UCLA" and "UC Los Angeles"). The challenge, of course, is to be able to determine that they are "different", as they subscribe to different DAML ontologies. One possibility is that any Semantic Web description of an entity existing on the Web contains its normalized HTTP URL in a standardized attribute, which can serve as a simple unique id for comparisons across ontologies (first choice for disambiguation in our example). Another possibility is that they contain (possibly composite) keys that point into popular external ontologies, for the same reason ("companies in this ontology are uniquely identified by their UsTreas:IRS:TaxPayerId"), ("universities are identical if they point to the same UsPostalService:UsStreetAddress"). 
This gets the user to the state of Figure <ref type="figure" target="#fig_3">4</ref>.</p><p>In this vision of a future Semantic Web, the user has to know little to get a lot of leverage out of the existing semantic information: (1) The user did not have to do anything but type out some university names that came to mind; he or she didn't have to understand an ontological query language or the notion of an ontology or even a taxonomy for that matter, yet the result is perfectly ontologically typed.</p><p>(2) Very little DAML has to be in place for this to work: for this particular example, two external DAML ontologies of existing non-DAML university web sites should be sufficient for the inferencing of this example. (3) Data from two different ontologies can be seamlessly integrated without the need for pre-merging/translation between the ontologies.</p><p>In this example, the user now demonstrates that she wants to extract the nationality of the universities, in the following manner (Figure <ref type="figure" target="#fig_4">5</ref>). She double-clicks on USC, which brings up a Web browser to the (hypothetically) DAML-enabled USC home page. The user then clicks on "Maps &amp; Directions", and copies and pastes "United States" from that page, which is not just plain text but carries its embedded DAML type. <ref type="foot" target="#foot_0">2</ref> In response, the system now fills in all those cells that use the same underlying W3C university definition, by inferring the ontological path from university to country and applying it to all other instances of this ontology. In our particular case, the user is best served by now doing the same for the UN-based university entry "Stanford" (not shown) because there are only two ontologies involved.<ref type="foot" target="#foot_1">3</ref> As a result, all missing countries in the second column are now filled in. 
The user selects a United States cell in the second column and invokes "filter by" from the right-click menu, checks "United States", and clicks OK, which removes Oxford and all other foreign-university rows. Performing a number of substantially similar steps, the user can navigate to the universities' chemistry, biology, and medical departments, from the department to the faculty, from the faculty to their research interests, and filter by a particular research area, resulting in the table shown in Figure <ref type="figure" target="#fig_5">6</ref>. (As before, bold entries were provided or demonstrated by the user; but we are no longer underlining recognized cells below for readability).</p><p>In the end, what users want is a report containing the information that serves their immediate needs. In our approach, users build a report in steps, by manipulating the data it contains so far to refine it and to add more. This is a qualitatively easier task than working with a query, which is an inherently more abstract specification. In our approach, a final report may contain dozens or hundreds of single-step scripts that operate on DAML markup. The equivalent query could be enormously complicated (perhaps several pages long), but users never have to see it with this approach.</p><p>Now that this hypothetical WebScripter report is defined, its data can be refreshed at any time, and it itself can become the source for further Web scripting as it carries all its DAML within the generated HTML report.</p></div>
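<div xmlns="http://www.tei-c.org/ns/1.0"><p>The recall and precision tests used in the "find more" step above can be sketched as follows. This is an illustration under stated assumptions rather than a description of implemented behavior: the function name select_alternative_types, its parameters, and the sample data are hypothetical, and the precision test is approximated by the share of entities in nearby categories that declare the same type.</p><p>
```python
def select_alternative_types(entity_types, nearby_types, min_recall=0.10, max_spill=0.01):
    """Return types declared by many listed entities but few nearby-category entities.

    entity_types: one set of declared types per entity found via the indices.
    nearby_types: one set of declared types per entity in nearby categories.
    """
    candidates = set().union(*entity_types) if entity_types else set()
    selected = []
    for type_name in sorted(candidates):
        recall = sum(type_name in types for types in entity_types) / len(entity_types)
        spill = (
            sum(type_name in types for types in nearby_types) / len(nearby_types)
            if nearby_types
            else 0.0
        )
        # Recall test: more than 10% of the listed entities declare the type.
        # Precision test: fewer than 1% of nearby-category entities declare it.
        if recall > min_recall and max_spill > spill:
            selected.append(type_name)
    return selected


# Hypothetical sample data: three listed universities and three nearby entities.
universities = [
    {"W3C:University", "IRS:NonProfitInstitution"},
    {"U.N.:University", "UsPostalService:Recipient"},
    {"W3C:University", "UsPostalService:Recipient"},
]
nearby = [{"IRS:NonProfitInstitution"}, {"UsPostalService:Recipient"}, {"Other:Thing"}]
print(select_alternative_types(universities, nearby))
```
</p><p>With this sample data, the postal and tax types pass the recall test but fail the precision test because nearby categories declare them too, leaving only the two university types.</p></div>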
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">IMPLEMENTED EXAMPLE</head><p>In our initial implementation we have focused on making it easy for ordinary (non-programming, never heard of ontologies) users to construct reports from multi-ontology DAML data. This section first describes a step-by-step walkthrough of using WebScripter as implemented to combine DAML personnel data from different organizations on the Web. It then describes how the resulting implicit ontology alignment data benefits other users in constructing similar reports.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Constructing a first report from scratch</head><p>Imagine that you work for the government DAML program office, and that your job is to maintain a list of personnel funded by that program, and let's assume that all of the contractors provide their personnel data in some DAML format. The first task is to find the URLs where the vari- In this example, the DAML sources can be found by querying the Teknowledge search engine for the terms "Person", "Employee", and "Staff" (which will return a large number of hits of non-DAML contractors), or alternatively it can be found by collecting the regular project Web pages and personal home pages of the DAML contractors (because they embed DAML content inside the HTML pages).</p><p>For the sake of this example, we started WebScripter and loaded DAML from just the Stanford Database and Knowledge Systems groups by copying and pasting the URLs of their DAML pages into WebScripter's "Add DAML" dialog box. WebScripter then displays the class hierarchy of that DAML, intermixing the concepts from the two separate ontologies. The user can browse the content by selecting classes, which displays all of their (local and inherited) attributes as columns, and their data instances as rows.</p><p>In this example, we started a new report by (1) choosing "New Report" from a menu, (2) selecting Person in the class hierarchy, and (3) selecting three columns of Person to include in the report. The latter is done by selecting a cell in the data display for Person and choosing "Add as new column" from the right-click menu, once each for the Has-Full-Name, Has-Phone-Number, and Has-Email-Address columns. The resulting WebScripter display is shown in Figure <ref type="figure" target="#fig_6">7</ref>. (Note that the first of the four columns, the DAML instance identifier column, was automatically inserted when the first column was added to the report. 
The column is hidden from the generated report Web page by default.)</p><p>In this example, we will now add and align data from a different research group using a different ontology. This is done by (1) selecting PhDStudent in the class hierarchy to display its instance data, (2) selecting a cell in the "name" column of that instance data and choosing "Add to column 1" from the right-click menu, and (3) repeating the second step for the "phone" and "email" columns. Figure <ref type="figure" target="#fig_7">8</ref> shows the combined data from the two groups.</p><p>This in a nutshell is how WebScripter looks to the users. This report can then be published in various formats, including as a plain Web page that color-codes its content based on where it came from; Figure <ref type="figure" target="#fig_8">9</ref> shows a snapshot of a large DAML personnel report that loads data from more than 30  The Web page embeds the WebScripter report definition, thus it can be re-run at any time and will then possibly show more people (presuming their DAMLized Web pages can be found by following just one link from the two group Web pages, and presuming their DAML instance data follows one of the two ontologies).</p><p>There are a large number of WebScripter features that we will not discuss here (such as unloading DAML sources, deleting columns, re-arranging columns, filtering rows, and sorting by multiple criteria) because they are what you would expect from any DAML report generator. Instead, we'll focus on the generated DAML equivalency statements shown in Table <ref type="table" target="#tab_0">1</ref>.</p><p>These statements can be automatically published on a Web site and registered as a new DAML content root in BBN's DAML content library. 
Consequently, you can then make use of the equivalency statements by selecting the "Extended with Equivalence" option in Teknowledge's DAML search engine (note that it can take up to 24 hours for the statements to make it into BBN's cache and then up to another week from there into Teknowledge's search engine cache). Concretely, if you now query in that fashion for all instances of Person ("?x type Person") in the first ontology, you will also retrieve PhDStudent instances from the second ontology.</p></div>
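<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of an equivalence-extended query such as "?x type Person" can be illustrated with a small sketch. It does not reproduce the Teknowledge search engine; the function instances_of, the prefixed class names, and the instance identifiers are hypothetical, and class equivalence is assumed to be symmetric and transitive.</p><p>
```python
def instances_of(class_name, instance_types, same_class_as):
    """Return instances whose declared class is class_name or a class equivalent to it.

    instance_types: dict mapping instance id to its declared class.
    same_class_as: iterable of (class_a, class_b) equivalence pairs.
    """
    # Collect every class reachable from class_name through equivalence pairs.
    reachable, frontier = {class_name}, [class_name]
    while frontier:
        current = frontier.pop()
        for class_a, class_b in same_class_as:
            for source, target in ((class_a, class_b), (class_b, class_a)):
                if source == current and target not in reachable:
                    reachable.add(target)
                    frontier.append(target)
    return sorted(i for i, cls in instance_types.items() if cls in reachable)


# Hypothetical instance data from two ontologies plus one equivalency statement.
instance_types = {
    "ksl:alice": "ksl:Person",
    "db:bob": "db:PhDStudent",
    "db:printer": "db:Machine",
}
same_class_as = [("ksl:Person", "db:PhDStudent")]
print(instances_of("ksl:Person", instance_types, same_class_as))
```
</p><p>Without the equivalency pair, only the first instance would be returned; with it, the PhDStudent instance from the second ontology is retrieved as well.</p></div>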
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Constructing a second report using the alignment data</head><p>We have also implemented an initial use of the WebScripter-generated equivalency statements in WebScripter itself: if you start it with the insert-equivalents flag it will automatically add and align any classes that it has sameClassAs and sameInstanceAs data for. It reads these equivalency statements from a fixed location on our Web site to which you can contribute more via the "Easy Publish" menu in WebScripter.</p><p>Let's assume that a second user comes along later whose job it is to maintain a list of researchers with Semantic Web expertise, plus their email addresses and home pages. She starts WebScripter in the same way as above, selects for example PhDStudent and adds "name" as the first column in her report. At that point, WebScripter will not only add all instances of Person, but also automatically align their names into the column. Similarly, when she then selects the email address for either Person or PhDStudent and chooses "Add as new column", WebScripter will fill in the email addresses for the other ontology as well. This will not happen after she adds Has-Home-Page as a new column (as there is no existing equivalency data), so she has to manually select homepage and choose "Add to column". (However, if she is willing to share her alignment data via the "Easy Publish" option, future users do not have to align this column by hand either.)</p></div>
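<div xmlns="http://www.tei-c.org/ns/1.0"><p>The auto-alignment behavior described in this section can be sketched as a column builder that consults previously published alignments. This is a simplified illustration, not WebScripter's code: build_column, the shape of the data dictionary, and the sample names are assumptions made for the example.</p><p>
```python
def build_column(selected, data, equivalent_attributes):
    """Build one report column from a selected (class, attribute) pair.

    data: dict mapping class name to a list of instance dicts.
    equivalent_attributes: list of sets of (class, attribute) pairs that users
    previously placed in the same column.
    """
    aligned = {selected}
    for group in equivalent_attributes:
        if selected in group:
            aligned |= group
    column = []
    for class_name, attribute in sorted(aligned):
        for instance in data.get(class_name, []):
            # Instances lacking the attribute leave an empty cell.
            column.append(instance.get(attribute, ""))
    return column


# Hypothetical data: one instance per ontology, with one known alignment.
data = {
    "Person": [{"Has-Full-Name": "A. Example", "Has-Home-Page": "http://example.org/a"}],
    "PhDStudent": [{"name": "B. Sample", "homepage": "http://example.org/b"}],
}
known = [{("Person", "Has-Full-Name"), ("PhDStudent", "name")}]
print(build_column(("PhDStudent", "name"), data, known))
print(build_column(("Person", "Has-Home-Page"), data, known))
```
</p><p>The first call pulls in names from both ontologies because an alignment exists; the second returns only the Person home page, mirroring the case where the user still has to align the column by hand.</p></div>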
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">THOUGHTS ON INCENTIVIZING PRODUCERS</head><p>As of the time of writing, one issue we encountered is that there is not that much interesting, continuously updated RDF(S), much less DAML, available on the Web today. <ref type="foot" target="#foot_2">4</ref> What made the original Web take off was that there was an immediate incentive for producers to use the technology because it was an easy way to publish information. We currently see no strong motivation for producers to put work into putting out RDF(S) in addition to their regular HTML pages, but there is at least a compelling intra-organizational benefit in using RDF(S) and WebScripter to generate regular HTML pages by pulling RDF from various pages within the organization.</p><p>To be more concrete, once a DAML-enabled document is published on the Web, WebScripter makes it easy to access and republish portions of it as part of a larger report, an effort savings for federated information providers who currently need to maintain the same information in multiple places. For example, professors routinely publish a list of their publications on their home page. Departments publish a list of all publications, and project pages publish a list of project-related publications from the project members. Today, someone has to manually construct these pages (presuming these federated organizations are not so tightly integrated that they maintain a shared database or other common structured information source, of course). When an author publishes a new paper or makes a correction on an existing one, he or she has to either manually update the other pages, or coordinate with the appropriate people to have all the other lists updated. WebScripter can eliminate the additional work: authors only need to mark up their personal paper publication with DAML, and the reports for the department and project-specific pages will automatically pick up the new publication (e.g. 
every night). WebScripter eliminates overhead not only for the organization, but also for the individual producing the information, who no longer needs to coordinate the redistribution effort. WebScripter can also enhance the flexibility and value of Web sites with large amounts of information by publishing skeleton WebScripter reports that visitors can refine to obtain customized reports. Thus, we are cautiously optimistic that WebScripter may help with the adoption of RDF(S)/DAML on the producer side as well.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">THOUGHTS ON END-USER CONTROL OVER AUTO-ALIGNMENT</head><p>You can currently run WebScripter either in an "ignore all equivalencies" mode or in an "auto-insert all known equivalencies" mode, neither of which is ideal, of course. In particular, the latter may quickly become impractical if a large number of people share alignment data, even if they are not ill-intentioned. This is either because they made an honest mistake (they aligned homepages from one ontology with email addresses from another and did not notice) or because they had a different type of equivalency in mind when they authored their report (graduate research assistants are the same as machines in the sense that they cost the project money to support, but that may then cause machines to auto-appear in a report of someone else trying to author a personnel list). We see the following potential solutions (which are not mutually exclusive).</p><p>• Centralized Human Editors. One possibility is for an organization to appoint an "alignment czar". The job of such a czar would be to periodically validate the equivalency data contributed by organization members into a staging area. If approved, equivalency files are then moved to that organization's official equivalency data area. Cautious organization members can then exclusively make use of the approved equivalency data while adventurous ones are free to use staging data or external data. Obviously, any use of explicit human effort is associated with costs; however, one attraction of this model is that the "alignment czar" does not nearly need the technical sophistication of an "ontology librarian" and can possibly be a clerical worker given a specialized graphical application.</p><p>• Social Filtering. 
Another approach would be to keep track of the authors of equivalency statements as well as the users of equivalency statements (neither of which we currently do); this would enable users to say "I want to use the same equivalency data that Jim and Chris are using" (this is a nicely implicit way to limit equivalencies to e.g. the accounting context if they are co-workers in accounting, without having to more formally define the context, which is a more abstract and difficult task). This would also allow cautious users to express "I am willing to use any DAML equivalency file that at least 10 others are using" (which addresses the erroneous-alignment problem but not the context mismatch problem).</p><p>• Fine-Grained Control in the User Interface. Finally, it would be nice to have a compact display of the available equivalency information. This display would show a row of information about the available equivalency information and give the user a checkbox for incorporating or ignoring each. </p></div>
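<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two social-filtering policies mentioned above could be expressed along the following lines. As noted in the text, neither authors nor users of equivalency statements are currently tracked, so this is purely a sketch: the function trusted_equivalency_files, the usage dictionary, and the file and user names are hypothetical, and "the data that Jim and Chris are using" is interpreted here as files used by both of them.</p><p>
```python
def trusted_equivalency_files(usage, trusted_users=(), min_users=None):
    """Pick equivalency files by who else uses them.

    usage: dict mapping an equivalency file name to the set of users using it.
    trusted_users: keep files used by every one of these users.
    min_users: keep files used by at least this many users.
    """
    selected = []
    for file_name, users in sorted(usage.items()):
        used_by_trusted = bool(trusted_users) and set(trusted_users).issubset(users)
        popular = min_users is not None and len(users) >= min_users
        if used_by_trusted or popular:
            selected.append(file_name)
    return selected


# Hypothetical usage records for three equivalency files.
usage = {
    "accounting.daml": {"jim", "chris", "pat"},
    "personnel.daml": {"jim"},
    "popular.daml": {"user%d" % i for i in range(12)},
}
print(trusted_equivalency_files(usage, trusted_users=("jim", "chris")))
print(trusted_equivalency_files(usage, min_users=10))
```
</p><p>The first policy implicitly limits equivalencies to a shared working context, while the second guards against isolated mistakes; as the text points out, the second does not address context mismatch.</p></div>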
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">THOUGHTS ON OTHER OPEN QUESTIONS</head><p>Addressing a number of other issues would also help in making DAML and WebScripter use take off.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>• How do ordinary users find good original Semantic Web content? WebScripter does not address this problem: once you have found some, it can point you to related content that others may have by using an equivalency-aware DAML search engine such as Teknowledge's DAML Semantic Search Service <ref type="bibr">[3]</ref>. There are no Yahoo-style portals for DAML content yet to our knowledge. There are, however, at least two RDF crawlers (one from BBN [2] and one from the University of Karlsruhe [5]) that could help in building such a portal.</p><p>• What does it really mean for two classes or two attributes to be "the same"? The current DAML equivalence statements allow users to say that x is equivalent to y. We likely need a replacement construct that allows users to express that x is equivalent to y in the sense of (or context of) z. We will try to influence the DAML language definition in that direction (but admittedly aren't quite sure ourselves how to model z).</p><p>The most difficult problem we see is in the end-user interface for stating these more complex equivalencies.</p></div>
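<div xmlns="http://www.tei-c.org/ns/1.0"><p>One simple way to picture context-qualified equivalence is as a triple (x, y, z) rather than a pair. The sketch below is only one possible reading, since how to model z remains an open question; the function equivalents_in_context and the context labels are hypothetical and are not part of DAML.</p><p>
```python
def equivalents_in_context(term, context, statements):
    """Return terms stated to be equivalent to `term` within `context`.

    statements: iterable of (x, y, z) triples read as "x is equivalent to y
    in the context of z".
    """
    result = set()
    for x, y, z in statements:
        if z != context:
            continue
        if x == term:
            result.add(y)
        elif y == term:
            result.add(x)
    return result


# Hypothetical statements echoing the graduate-assistant example above.
statements = [
    ("GraduateResearchAssistant", "Machine", "project-cost"),
    ("GraduateResearchAssistant", "Person", "personnel"),
]
print(equivalents_in_context("GraduateResearchAssistant", "personnel", statements))
```
</p><p>Under this reading, a personnel report would never pull in machines, because that equivalence was asserted only for the project-cost context.</p></div>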
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">RELATED WORK</head><p>WebScripter's approach to ontology alignment is extreme: terms from different ontologies are always assumed to mean different things by default, and all ontology mapping is done by humans (implicitly, by putting them into the same column of a report). This is similar in spirit to Gio Wiederhold's mediation approach to ontology interoperation <ref type="bibr" target="#b9">[18]</ref>, which also assumes that terms from different ontologies never mean the same thing unless committees of integration experts say they are. WebScripter pushes that concept to the brink by replacing the experts with ordinary users that may not even be aware of their implicit ontology alignment contributions. (Note, however, that we cannot yet prove that this collective alignment data is indeed a useful source for automatic ontology alignment on an Internet scale; we lack sufficient data from distributed WebScripter use to make that claim.)</p><p>The ONION system <ref type="bibr" target="#b6">[15]</ref> takes a semi-automated approach to ontology interoperation: the system guesses likely matches between terms of two separately conceived ontologies; a human expert knowledgeable about the semantics of both ontologies then verifies the inferences using a graphical user interface. ONION's guessing analyzes the schema information using relationships with semantics known to the system in advance (subclass-of, part-of, attribute-of, instance-of, value-of); in WebScripter human users rely purely on the data instances to decide what collates and what doesn't (because they are just not expert enough to analyze the abstractions). 
That being said, incorporating ONION-style alignment guessing into WebScripter would clearly be beneficial presuming the rate of correct guesses is sufficiently high.</p><p>OBSERVER <ref type="bibr" target="#b5">[14]</ref>, SIMS <ref type="bibr" target="#b0">[9]</ref>, TSIMMIS <ref type="bibr" target="#b2">[11]</ref> and the Information Manifold <ref type="bibr" target="#b4">[13]</ref> are all systems for querying multiple data sources of different schemata in a uniform way; however, they all rely on human experts to devise the ontological mappings between the sources to our knowledge. This is because they mediate between structured dynamic data sources (such as SQL/ODBC sources) without run-time human involvement where a higher level of precision is required to make the interoperation work. In contrast, WebScripter is targeted towards mediating between different ontologies in static RDF-based Web pages with run-time human involvement, where the need for precision in the translation is naturally lower.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9.">EVALUATION AND CONCLUSIONS</head><p>WebScripter has turned out to be a valuable practical tool even for the simple single-ontology case where there is only one schema but the instance data is distributed over many Web pages. For example, the Distributed Scalable Systems Division at ISI automatically pulls together its people page from many different DAMLized Web pages: some information is maintained by individuals themselves (such as their research interests), other information is maintained by the division director (such as project assignments), and some information is maintained at the institute level (such as office assignments); this relieved the administrative assistant from manually maintaining everyone's interests <ref type="bibr">[6]</ref>. WebScripter has also been used externally, for example to maintain a Semantic Web tools list <ref type="bibr">[7]</ref>. You can download WebScripter from <ref type="bibr">[8]</ref>.</p><p>However, the most exciting application of WebScripter, as a world-wide collaborative ontology translation tool, is confined to experimental use by ourselves at this point. This is more due to a lack of widespread interesting RDF(S) content than it is due to any limitation of WebScripter itself. Nevertheless, we are excited about this new approach to global knowledge sharing, whether it is achieved by a future version of WebScripter or by a similar tool or tools. The key difference we see between "traditional" ontology translation and our approach is that non-experts perform all of the translation, but potentially on a global scale, leveraging each other's work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Working With Multiple Data Sources In Multiple Ontologies.</figDesc><graphic coords="2,66.81,248.58,104.88,60.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Visionary Example: The user types as-yet-unrecognized example values.</figDesc><graphic coords="2,319.36,53.80,234.01,109.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Visionary Example: The system determines data sources and a classification.</figDesc><graphic coords="2,319.36,210.58,234.01,109.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Visionary Example: The system autocompletes the user-provided values.</figDesc><graphic coords="3,56.35,53.80,234.01,141.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Visionary Example: Combining HTML navigation with embedded DAML semantics.</figDesc><graphic coords="3,319.36,53.80,234.00,136.88" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Visionary Example: The end result in this fictitious example.</figDesc><graphic coords="4,124.86,53.80,359.99,108.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Implemented Example: Initial report of Stanford KSL personnel.</figDesc><graphic coords="4,319.36,202.67,234.00,106.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Implemented Example: Adding and aligning Stanford Database personnel.</figDesc><graphic coords="4,319.36,357.38,234.00,106.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Snapshot of (a fragment of) a large-size WebScripter DAML people report.</figDesc><graphic coords="5,319.36,53.80,233.99,151.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Implemented Example: Resulting DAML equivalency statements.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell>Table 2 sketches a preliminary design for deciding which sameClassAs statements to use. (This sketch assumes that we store much more fine-grained information in the equivalency files than we currently do.) The first column shows the human-given label of the class that is being declared as equivalent to the one the user added by hand. The second column indicates the level of indirection: 1 if the equivalency file directly states that the class the user just added by hand is the same as the class shown, 2 or more if the equivalence was inferred by transitive closure. The third column contains the Uniform Resource Locator for the equivalency file. The fourth column shows the name of the author of the WebScripter report that implied the equivalencies. The fifth column contains the number of additional rows that would be inserted into the user's report if she were to incorporate the equivalency. The sixth column indicates when the report that resulted in the equivalency statements was authored. The last column sums up how many other users have already made use of the equivalency statement in their reports.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Sketch of a graphical user interface.</figDesc><table /></figure>
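<div xmlns="http://www.tei-c.org/ns/1.0"><p>The level-of-indirection column described for Table 2 can be illustrated with a minimal sketch. It is not taken from the WebScripter implementation: it assumes that equivalency statements are available as plain pairs of class identifiers, that sameClassAs is treated as symmetric, and that the level is the length of the shortest chain of statements found by a breadth-first search. The function name indirection_levels and the identifiers isi:Student and cmu:GradStudent are hypothetical; ksl:PERSON and swrc:PhDStudent abbreviate the two classes aligned in Table 1.</p>
```python
from collections import deque


def indirection_levels(start, equivalences):
    """Map each class equivalent to `start` to its level of indirection.

    Level 1: an equivalency statement links the class directly to `start`.
    Level 2 or more: the equivalence follows only by transitive closure.
    """
    # Assumption: sameClassAs is symmetric, so index both directions.
    neighbours = {}
    for left, right in equivalences:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)

    # Breadth-first search yields the shortest chain of statements.
    levels = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in sorted(neighbours.get(current, ())):
            if other not in levels:
                levels[other] = levels[current] + 1
                queue.append(other)

    del levels[start]  # the user's own class is not a candidate
    return levels


# Hypothetical statements; only the first pair appears in the paper.
statements = [
    ("ksl:PERSON", "swrc:PhDStudent"),
    ("swrc:PhDStudent", "isi:Student"),
    ("isi:Student", "cmu:GradStudent"),
]
levels = indirection_levels("ksl:PERSON", statements)
for class_id, level in levels.items():
    print(class_id, level)
```
<p>Under these assumptions, a class with level 1 would be listed as directly stated, while higher levels would mark inferred equivalences that a user may want to inspect more carefully before incorporating them.</p></div>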
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">Note that one would not have to internally instrument a Web browser to achieve this level of integration: one could know which page the user is looking at through a proxy Web server and receive the copied HTML+DAML out of the window system's paste buffer.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">If there are more than two, WebScripter could attempt to produce a generalized "fuzzy" script that will work for all remaining university ontologies given two (or more) examples ("extract the attribute whose name contains Country or Nation in the top-level concept or in a sub-concept called Location or Address").</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">The notable exceptions are headline exchange files such as slashdot.org/slashdot.rdf.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10.">ACKNOWLEDGMENTS</head><p>We gratefully acknowledge DARPA DAML program funding for WebScripter under contract number F30602-00-2-0576. The first author would also like to acknowledge AFOSR funding under grant number F49620-01-1-0341.</p></div>
			</div>


			<div type="availability">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:daml="http://www.daml.org/2001/03/daml+oil#"&gt;
  &lt;!-- Class-level equivalence --&gt;
  &lt;rdfs:Class rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#PERSON"&gt;
    &lt;daml:sameClassAs rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#PhDStudent"/&gt;
  &lt;/rdfs:Class&gt;
  &lt;!-- Property-level equivalences --&gt;
  &lt;rdfs:Property rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Full-Name"&gt;
    &lt;daml:samePropertyAs rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#name"/&gt;
  &lt;/rdfs:Property&gt;
  &lt;rdfs:Property rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Phone-Number"&gt;
    &lt;daml:samePropertyAs rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#phone"/&gt;
  &lt;/rdfs:Property&gt;
  &lt;rdfs:Property rdf:about="http://ksl.stanford.edu/Projects/DAML/ksl-daml-desc.daml#Has-Email-Address"&gt;
    &lt;daml:samePropertyAs rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#email"/&gt;
  &lt;/rdfs:Property&gt;
&lt;/rdf:RDF&gt;</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Query reformulation for dynamic information integration</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Arens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Knoblock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-M</forename><surname>Shen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent Information Systems</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="99" to="130" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The semantic web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lassila</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific American</title>
		<imprint>
			<date type="published" when="2001-05">May 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The TSIMMIS approach to mediation: data models and languages</title>
		<author>
			<persName><forename type="first">H</forename><surname>Garcia-Molina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Papakonstantinou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Quass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sagiv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ullman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vassalos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Widom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent Information Systems</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="117" to="132" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Combining and standardizing large-scale, practical ontologies for machine translation and other uses</title>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First International Conference on Language Resources and Evaluation (LREC)</title>
				<meeting>the First International Conference on Language Resources and Evaluation (LREC)</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Data model and query evaluation in global information systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kirk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent Information Systems</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="121" to="143" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">OBSERVER: an approach for query processing in global information systems based on interoperation across pre-existing ontologies</title>
		<author>
			<persName><forename type="first">E</forename><surname>Mena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Illarramendi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kashyap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sheth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Distributed and Parallel Databases</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="223" to="271" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An algebra for semantic interoperability of information sources</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wiederhold</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd Annual IEEE International Symposium on Bioinformatics and Bioengineering</title>
				<meeting><address><addrLine>Bethesda, MD, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001-11">November 4-6 2001</date>
			<biblScope unit="page" from="174" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A graph-oriented model for articulation of ontology interdependencies</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wiederhold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kersten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Database Technology -EDBT 2000. 7th International Conference on Extending Database Technology</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Konstanz, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000-03">March 27-31 2000</date>
			<biblScope unit="page" from="86" to="100" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">PROMPT: Algorithm and tool for automated ontology merging and alignment</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Noy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Musen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">17th National Conference on AI</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Interoperation, mediation, and ontologies</title>
		<author>
			<persName><forename type="first">G</forename><surname>Wiederhold</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Symposium on Fifth Generation Computer Systems, Workshop on Heterogeneous Cooperative Knowledge-Bases</title>
				<meeting><address><addrLine>ICOT, Tokyo, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1994-12">December 1994</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="33" to="48" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
