MetaDesk: A Semantic Web Desktop Manager

Robert MacGregor, Sameer Maggon, Baoshi Yan
Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292, U.S.A.
{macgregor, maggon, baoshi}@isi.edu

Abstract

MetaDesk is an RDF authoring tool that facilitates entry of facts, rather than construction of ontologies. MetaDesk places no restrictions on vocabulary: users can invent terms on-the-fly, which the system converts into underlying RDF structures. Knowledge entry focuses on the creation of semantic structures that form scaffolding both for retrieving and interpreting facts. The most common hierarchic relationships turn out to be partonomies (whole/part structures) and set membership (as opposed to the traditional is-a hierarchies and class memberships). MetaDesk is also a semantic desktop that includes references to folders and documents within its knowledge base. We have found that the same semantic structures are appropriate for organizing desktop information.

Introduction

A year ago we experimented with a tool for attaching RDF metadata to Web pages that used Protégé [Eriksson 1999] as the data entry (authoring) component. The tool required that a class be selected for instantiation as a prerequisite to knowledge entry. Our experiment was a failure, for two reasons. We found that the ontology-driven paradigm resulted in creation of artificial classes (often suffixed with the term “Annotation”) that drew an artificial boundary between the objects being annotated and the metadata descriptions. Worse, it was just annoying: the effort to select a class before typing in an annotation discouraged use of the tool.¹

In response, we invented a new tool, MetaDesk, that makes RDF data authoring as quick and painless as possible. We use MetaDesk to record the kinds of metadata we generate during everyday tasks. We quickly discovered that the kinds of knowledge structures users (the authors, in this case) produced with the tool differ from the structures found in typical RDF databases. Currently, we are using MetaDesk as a personal information manager to keep track of projects, proposals, to-do lists, slides, etc., and as a launching pad for quickly bringing up specific folders and documents (like Windows shortcuts, only better organized and optionally possessing metadata annotations). Our intention is to add one or more additional knowledge sharing capabilities to MetaDesk, and then release it as a generic tool for managing information and for collaborating with other MetaDesk users.

¹ These artificial classes were created to provide domains for “annotation properties”.

Example: MetaDesk provides two metaphors for entering information: users can create “nodes” (represented internally as RDF resources) that are arranged in a hierarchy, and they can attach attribute-value pairs to nodes.

Figure 1: Recording Trip information in MetaDesk

Suppose you are planning a trip to the forthcoming ISWC conference and you need to record information about the trip in an organized fashion. Details could include flight carrier, confirmation number, hotel preferences, prices, etc. In addition, you would like the information to be represented in such a way that restructuring of the data is feasible. Storing such information in the current RDF authoring tools is a tedious process. Instead of writing the information directly in the tool, you first have to create a myriad of classes and properties such as a Trip class, a Flight class, and a Hotel class. The domain and range constraints of the properties also have to be specified. Furthermore, the ontological information is not always obvious. For example, it is difficult to name the relationship between the Trip class and the Flight class, and between the Trip class and the Hotel class. As a result, a naive user, or one in a hurry, would prefer to record such information in a text format rather than in an ontology-driven RDF authoring tool. Our tool excels in simplicity, providing an efficient data entry paradigm.

Recording the information in this example is easy and fast with MetaDesk. One can simply create a Trip node and add some child nodes to it. The child nodes could be a Flight node, a Hotel node and a Conference node. One can attach other information to individual nodes; for example, [...] resultant hierarchy is shown in Figure 1.

MetaDesk is all RDF-based: although users enter the data rather quickly without knowing anything about RDF, the created data is converted to RDF triples. Below we list the underlying RDF triples (in N3 format for readability) for the information shown in Figure 1. The "parentChild" links specify that under the "ISWC_2004_Trip" node are three nodes: the "Flight" node, the "Hiroshima_Prince_Hotel" node and the "Places_to_Visit" node. Under the "Flight" node are four other nodes representing individual connecting flights: "JAL1604", "JAL5016", "JAL5015", and "JAL1601". There are also RDF triples defining the reservation number and phone number for the hotel, etc.

    myNS:Trips rdfs:label "Trips" ;
        sew:parentChild myNS:ISWC_2004_Trip .

    myNS:ISWC_2004_Trip
        rdf:type myNS:Trip ;
        rdfs:label "ISWC 2004 Trip" ;
        sew:parentChild myNS:Hiroshima_Prince_Hotel ,
            myNS:Places_to_Visit ,
            myNS:Flight .

    myNS:Hiroshima_Prince_Hotel rdf:type myNS:Hotel ;
        rdfs:label "Hiroshima Prince Hotel" ;
        myNS:Phone_Number "81-82-256-1111" ;
        myNS:Reservation_Number "3345788" .

    myNS:Places_to_Visit
        rdf:type sew:Desktop_Folder ;
        fileNS:fullpath "C:\\Documents and Settings\\maggon\\My Documents\\Places to Visit" .

    myNS:Phone_Number rdfs:label "Phone Number" .

    myNS:Flight rdfs:label "Flight" ;
        sew:parentChild myNS:JAL5016 , myNS:JAL1604 ,
            myNS:JAL5015 , myNS:JAL1601 .

Mapping MetaDesk to RDF

MetaDesk is represented as “triples all the way down”: every link in MetaDesk maps to a triple. A new node is created by highlighting an existing node and explicitly typing the name of a child node, or by dragging something (a Web page, PDF file, Word document, etc., or another node) onto the highlighted node. MetaDesk consciously imitates the gestures, look and feel used to construct hierarchies using Windows Explorer.

If ‘P’ is a node, and ‘C’ is one of its children, the link between them is represented by a triple of the form (P, R, C), where ‘R’ is either ‘parentChild’ or one of its subproperties. The ‘parentChild’ relationship is roughly definable as the most-general, directed structural relationship. As such it subsumes more specific relations such as whole/part, class/subclass, set/set member, or folder/subfolder. We originally assumed that it should also subsume the class/instance property (the inverse of ‘rdf:type’), but when viewing children of a class, we found that we wanted to see only its subclasses, not mixed in with its instances. A node can have multiple parents (it occupies the object position of multiple ‘parentChild’ triples). A special node called ‘Heap’ exists as a catch-all: an RDF resource that does not have a parent node is considered to be “on the heap”. This is handy for operations such as tabbed search that assume that each node they display is located somewhere in the hierarchy.

Each node N has zero or more attributes, represented by triples of the form (N, R, V), where ‘R’ is not a subproperty of ‘parentChild’ (or of its inverse). There are no restrictions on what attributes can be attached to a node (i.e., violations of domain constraints may be flagged, but are not forbidden). Users are encouraged, but not required, to fill in the attribute named “type”, which denotes the property ‘rdf:type’. A future version of MetaDesk will semi-automate the filling-in of type attributes.

RDF structures in their raw format are not readable, so we want to hide all details of RDF from users, including URIs and namespaces. Hence, all non-literal names that a user sees in MetaDesk (names attached to nodes in the hierarchy, attributes, and in attribute value position) correspond to RDF ‘labels’. Underneath, each label ‘N’ maps to a URI ‘U’, and MetaDesk asserts the triple (U, rdfs:label, N). Some labels have semantics built in, e.g., “type” maps to ‘rdf:type’ and “parent class” maps to ‘rdfs:subClassOf’. By default, a label “xxx” that does not match an existing label is mapped to the URI ‘myns#xxx’, where “myns” is the URI for a user’s personal namespace.

An attribute value ‘V’ is stored as a literal (a string) if the relevant range information references a literal class (a subclass of ‘rdfs:Literal’), or as a resource if the range indicates a non-literal. If there is no range information, then the system first looks for a label matching ‘V’ and, if there is one, stores the matching resource as the value. Otherwise, ‘V’ defaults to a string, but the user can convert a literal value into a new resource (by gestures provided by MetaDesk) at any time. Values representing brand-new resources are considered a part of the “heap”.

Importing Data

Arbitrary RDF files can be dropped into a MetaDesk hierarchy, but MetaDesk will not know which new resources to treat as nodes within the hierarchy. Instead, all of these nodes are assigned to the “heap”. An exception is Class and Property resources. These are entered under the Ontology node, below either ‘owl:Thing’ or ‘sew:Attribute’ (‘sew’ is the nickname for the namespace that is internally used by MetaDesk).

Arbitrary XML files can also be dropped into a MetaDesk hierarchy. These are automatically converted into RDF, with the top-most tag forming the root resource. The ‘parentChild’ property is used to represent the relationship between tags and subtags (except when the subtag represents a literal). For example, for the following XML

    [...] America West [...]

our translator would create resources of RDF type ‘myns:Trip’, ‘myns:Hotel’, and ‘myns:Flight’, with ‘parentChild’ links from the Trip resource to the Hotel and Flight resources. Each of the three attributes is converted into the obvious RDF triple. The Flight resource is linked to the string “America West” by a triple whose property is named ‘myns:carrier’.

Interaction with Windows Applications

The primary means currently provided for interacting with desktop objects are (i) drag and drop actions to/from the desktop and (ii) launching applications by double-clicking on the nodes in the hierarchy that denote them. Windows folders are a special case: when a Windows folder is dropped into the hierarchy, the corresponding MetaDesk node can materialize additional child nodes (on demand) corresponding to the contents of the folder when the node is “opened”. Annotations attached to folders are persistent, but the ‘parentChild’ links that relate folders and subfolders are not stored persistently (to save space). Move and copy operations on folder nodes cause corresponding changes in the underlying Windows desktop hierarchy.

A complete semantic desktop should demonstrate similar levels of integration for other applications such as e-mail. Ideally, one or several commercial e-mailers could be integrated with MetaDesk. Alternatively, one could mimic Haystack [Quan 2003] and implement an entire e-mail application (as a plug-in) within MetaDesk.

Plug-ins

The MetaDesk architecture can be extended by using plug-ins to create alternate displays for the top and bottom panes to the right of the hierarchy pane. Plug-ins are associated with particular data types: when a node is highlighted, the default display plus all relevant plug-ins that correspond to the type of that node are presented as options. MetaDesk also enables users to select a default plug-in for the data type; this way MetaDesk remembers the user’s choice for the next time. We have developed a photo viewer plug-in (Figure 2) that enables users to view thumbnails of the images organized in MetaDesk. Whenever the user clicks on an Album node (a node with the rdf:type Photo_Album) in the hierarchy, the photographs are shown in the bottom pane. The user can view as well as annotate the pictures in an interactive session.

Figure 2: Photo-Plugin for displaying graphics resources

MetaDesk allows a user to choose the plug-in for any data type (or class). For example, a user might want to associate the photo viewer plug-in with nodes that have the type Photo Graphs instead of Photo Album. This makes it easy to customize MetaDesk according to personal preference. In addition to developing plug-ins for specific data types, one might consider writing a plug-in that enforces type restrictions on its input, or one that displays Protégé-like templates in place of the free-form attribute editor that comes standard in MetaDesk. Such plug-ins would enable MetaDesk to mimic more traditional Semantic Web RDF editors. Thus, MetaDesk uses these plug-in points to keep track of the user's working behavior and provide self-personalization.

Ongoing Work

Search: Currently, MetaDesk supports keyword search. When searching for a match to the keyword “xxx”, a triple (S, P, V) matches if one of S, P, or V has a label containing “xxx” as a substring, or if V is a literal value that contains “xxx”. Results may be in the form of a tabbed search, wherein each hit of the ‘tab’ key opens the hierarchy to the location of the next matching node, or the results may be placed under a newly-created search node which can further be annotated.

Ontology Alignment: Philosophically, MetaDesk runs completely against the grain by promoting “ontological promiscuity” and advocating bottom-up development of ontologies. “Promiscuity” refers to MetaDesk’s encouraging users to make up their own vocabulary. In our scheme, we first let a thousand flowers bloom, and then specify semantic mappings (alignments) that say how one user’s terminology relates to another’s. We call this “grassroots alignment”, since it empowers ordinary users to build terminologies, instead of requiring ontology experts. The current MetaDesk is missing two things: (i) “carrots” that encourage MetaDesk users to align their terminology with terms used by others and to fill in the type attribute on each node, and (ii) alignment tools that make aligning terms very simple. One example of such a carrot is a search facility that exploits alignments to increase the recall of its matches. Another is a report generator that produces denser, better organized reports when alignments are taken into account. ISI’s WebScripter [Yan 2003] report generator incorporated both a carrot and an alignment capability into a single tool. Determining whether quality ontologies can be achieved bottom-up via a sufficiently mature set of carrots and alignment tools is at this point an open question, one that we believe deserves to be tested.

Future Directions

At present, we have hypothesized that end-user alignment can compensate for the ontological promiscuity engendered by multiple MetaDesk users, enabling a community of MetaDesk users to profitably share information. This hypothesis needs to be tested. Our near-term goal is to add sharing capability, and then to distribute MetaDesk to a community of users. Our supply of “carrots” (tools that encourage end-users to align with each other’s vocabulary) is still sparse. We will find out whether we are close to having a viable sharing infrastructure, or if more incentives are needed.

MetaDesk will eventually support multiple search regimens; more sophisticated ones will trade user convenience for precision (more typing yields more precision).

Conclusion

We have introduced MetaDesk, an original RDF authoring tool. MetaDesk’s approach to RDF authoring is extreme: users immediately create metadata without defining an ontology first. Instead, it is our belief that ontologies can be created later in a bottom-up fashion, as the by-product of creating and using data, rather than as a straitjacket that inhibits the evolution of domain vocabularies. Compared with ontology-driven RDF authoring tools (SHOE Annotator [Heflin 1999], OntoMat [Handschuh 2002], SMORE [Kalyanpur 2003], Melita [Ciravegna 2002]), MetaDesk is friendlier to ordinary users, more flexible in metadata creation, and provides immediate rewards for users’ effort.

MetaDesk’s metadata authoring paradigm allows quick data entry and organization. As a result, MetaDesk is already viable as a personal information manager. MetaDesk has been extended as a usable semantic desktop application. It is integrated with an actual user desktop, allowing direct annotations on file systems and direct launching of applications from within it. MetaDesk’s simplicity in metadata creation, together with its usefulness as a semantic desktop, makes it a rewarding semantic web application.

References

F. Ciravegna, A. Dingli, D. Petrelli, and Y. Wilks. Timely and Non-Intrusive Active Document Annotation via Adaptive Information Extraction. Semantic Authoring, Annotation and Knowledge Markup, ECAI Workshop, July 2002.

H. Eriksson, R. W. Fergerson, Y. Shahar, and M. A. Musen. Automatic Generation of Ontology Editors. 12th Banff Knowledge Acquisition Workshop, 1999.

S. Handschuh and S. Staab. Authoring and Annotation of Web Pages in CREAM. WWW, May 2002.

J. Heflin, J. Hendler, and S. Luke. SHOE: A Knowledge Representation Language for Internet Applications. Technical Report CS-TR-4078 (UMIACS TR-99-71), Dept. of Computer Science, University of Maryland at College Park, 1999.

A. Kalyanpur, B. Parsia, J. Hendler, and J. Golbeck. SMORE – Semantic Markup, Ontology, and RDF Editor.

D. Quan, D. Huynh, and D. R. Karger. Haystack: A Platform for Authoring End User Semantic Web Applications. International Semantic Web Conference, Oct 2003.

B. Yan, M. Frank, P. A. Szekely, R. Neches, and J. Lopez. WebScripter: Grass-roots Ontology Alignment via End-User Report Creation. International Semantic Web Conference, Oct 2003.
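As an illustration of the rules in "Mapping MetaDesk to RDF" and of the keyword search described under "Ongoing Work", the following Python sketch may help. It is not MetaDesk's implementation: the `Store` and `Literal` classes, the `range_is_literal` parameter, and the choice to turn spaces in labels into underscores are assumptions made for this example only. It shows labels being mapped to URIs in a personal namespace, attribute values being stored as literals or resources, and a triple matching a keyword when the label of its subject, property, or value contains the keyword, or when its literal value does.

```python
from dataclasses import dataclass, field

RDFS_LABEL = "rdfs:label"
# Labels with built-in semantics, as named in the paper.
BUILT_IN = {"type": "rdf:type", "parent class": "rdfs:subClassOf"}


@dataclass(frozen=True)
class Literal:
    """A string-valued attribute value (hypothetical wrapper type)."""
    text: str


@dataclass
class Store:
    """Toy in-memory triple store; an assumption, not MetaDesk's data model."""
    namespace: str = "myns#"
    triples: set = field(default_factory=set)
    label_to_uri: dict = field(default_factory=dict)
    uri_to_label: dict = field(
        default_factory=lambda: {uri: lab for lab, uri in BUILT_IN.items()})

    def uri_for(self, label):
        """Map a user-visible label to a URI, minting 'myns#xxx' if it is new."""
        if label in BUILT_IN:
            return BUILT_IN[label]
        if label not in self.label_to_uri:
            # Assumption: spaces become underscores, as in myNS:Phone_Number.
            uri = self.namespace + label.replace(" ", "_")
            self.label_to_uri[label] = uri
            self.uri_to_label[uri] = label
            self.triples.add((uri, RDFS_LABEL, Literal(label)))
        return self.label_to_uri[label]

    def add_attribute(self, node, attr, value, range_is_literal=None):
        """Attach an attribute-value pair to a node and return the triple."""
        subject, prop = self.uri_for(node), self.uri_for(attr)
        if range_is_literal is True:        # range is a literal class
            obj = Literal(value)
        elif range_is_literal is False:     # range is a non-literal class
            obj = self.uri_for(value)
        elif value in self.label_to_uri:    # no range: reuse a matching label
            obj = self.label_to_uri[value]
        else:                               # no range, no match: plain string
            obj = Literal(value)
        triple = (subject, prop, obj)
        self.triples.add(triple)
        return triple

    def search(self, keyword):
        """Return triples whose S, P, or V label, or literal V, contains keyword."""
        def label(term):
            if isinstance(term, Literal):
                return ""
            return self.uri_to_label.get(term, "")
        return [
            (s, p, v) for (s, p, v) in self.triples
            if any(keyword in label(t) for t in (s, p, v))
            or (isinstance(v, Literal) and keyword in v.text)
        ]


store = Store()
store.add_attribute("Hiroshima Prince Hotel", "Phone Number", "81-82-256-1111")
store.add_attribute("Hiroshima Prince Hotel", "type", "Hotel", range_is_literal=False)
phone_hits = store.search("Phone")
```

Searching for "Phone" here returns both the attribute triple and the `rdfs:label` triple for the property itself, since the rule matches on any of the three positions; a real tool would presumably decide which of these to surface to the user.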
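The XML import rule from "Importing Data" can be sketched in a similar way. The XML sample below is hypothetical (the attribute names `year`, `city`, and `number` are invented for illustration, and only the Trip, Hotel, Flight, and carrier names and the string "America West" come from the text), and the function `xml_to_triples` with its numbered resource names is an assumption rather than MetaDesk's translator. The sketch treats a subtag that carries only text as a literal-valued property and every other subtag as a child resource linked by `sew:parentChild`.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical input; the attributes are invented for this example.
SAMPLE = """
<Trip year="2004">
  <Hotel city="Hiroshima"/>
  <Flight number="17">
    <carrier>America West</carrier>
  </Flight>
</Trip>
"""


def xml_to_triples(xml_text, ns="myns:"):
    """Convert an XML document into (subject, property, value) triples.

    Values are plain strings here, so resources and literals are told apart
    only by the namespace prefix; a real store would keep them distinct.
    """
    counter = itertools.count(1)
    triples = []

    def is_literal(elem):
        # A subtag "represents a literal" if it has text only:
        # no child elements and no attributes.
        has_text = (elem.text or "").strip() != ""
        return len(elem) == 0 and not elem.attrib and has_text

    def visit(elem):
        # Mint a resource name for this element (the numbering is an assumption).
        node = f"{ns}{elem.tag}_{next(counter)}"
        triples.append((node, "rdf:type", ns + elem.tag))
        for name, value in elem.attrib.items():
            triples.append((node, ns + name, value))
        for child in elem:
            if is_literal(child):
                triples.append((node, ns + child.tag, child.text.strip()))
            else:
                triples.append((node, "sew:parentChild", visit(child)))
        return node

    root = visit(ET.fromstring(xml_text))
    return root, triples


root, triples = xml_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

With this input the top-most tag becomes the root resource, the Hotel and Flight elements become its children, and the carrier element becomes a string-valued property of the Flight resource, which mirrors the behaviour the section describes.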