<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Annotation and Navigation in Semantic Wikis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eyal Oren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renaud Delbru</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Knud Möller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Max Völkel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siegfried Handschuh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DERI Galway</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Forschungszentrum Informatik</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <abstract>
        <p>Semantic Wikis allow users to semantically annotate their Wiki content. The particular annotations can differ in expressive power, simplicity, and meaning. We present an elaborate conceptual model for semantic annotations, introduce a unique and rich Wiki syntax for these annotations, and discuss how to best formally represent the augmented Wiki content. We improve existing navigation techniques to automatically construct faceted browsing for semistructured data. By utilising the Wiki annotations we provide greatly enhanced information retrieval. Further we report on our ongoing development of these techniques in our prototype SemperWiki.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Wikis are collaborative hypertext authoring environments. Wikis allow people
to collaboratively collect, describe, and author information. Since most
information in ordinary Wikis consists of natural-language texts, structured access and
information reuse are practically not possible [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Semantic Wikis allow users to make formal descriptions of resources by
annotating the pages that represent those resources. Where a regular Wiki enables
users to describe resources in natural language, a Semantic Wiki enables users
to additionally describe resources in a formal language. By adding metadata
to ordinary Wiki content, users get added benefits such as improved retrieval,
information exchange, and knowledge reuse.</p>
      <p>An ordinary Wiki should offer functionality<sup>3</sup> such as access control, binary
data management, version management, notification, and data export. In our
opinion, a Semantic Wiki should specifically address three additional questions:</p>
      <sec id="sec-1-1">
        <list list-type="order">
          <list-item><p>How to annotate content?</p></list-item>
          <list-item><p>How to formally represent content?</p></list-item>
          <list-item><p>How to navigate content?</p></list-item>
        </list>
        <p>This material is based upon works supported by the Science Foundation Ireland
under Grants No. SFI/02/CE1/I131 and SFI/04/BR/CS0694 and by the European
Commission under the Nepomuk project FP6-027705.</p>
        <p><sup>3</sup> http://en.wikipedia.org/wiki/Wiki</p>
        <p>
          Recently several Semantic Wikis have been developed, such as Platypus [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ],
WikSAR [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], Semantic MediaWiki [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and IkeWiki [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. These Wikis answer
these questions in a rather limited way: (a) they allow only simple annotations
of the current Wiki page; (b) they do not formally separate the page and the
concept that it describes; and (c) they do not fully exploit the semantic annotations
for improved navigation.
        </p>
        <p>
          In this paper we specifically address these three questions in a broader way:
in Sect. 2 we analyse Wiki annotations from a conceptual level, discuss
representation mechanisms, and current annotation support in Semantic Wikis. In Sect.
3 we offer an improved navigational model based on semantic annotation; the
navigation model is similar to e.g. Longwell<sup>4</sup> for faceted browsing of
semistructured data, but works, in contrast to existing approaches, for arbitrary datasets
with arbitrary structure. We report on our prototype implementation
SemperWiki [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] in Sect. 4; the implementation has been updated to include these new
ideas.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Annotations</title>
      <p>In the following section we discuss our first question: how to annotate Wiki
content?</p>
      <p>
        Let us first analyse what an annotation is. We annotate data all the time:
when we read a paragraph, and mark “great!” in the margin, that is an
annotation; when our text editor underlines a misspelled word, that is also an
annotation. Annotations add some information to some other information; to
annotate means “to make notes or comments” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>Another way to view annotations is metaphorically: URIs<sup>5</sup> are the “atoms”
of the Semantic Web and semantic annotations are the “molecules”. The
Semantic Web is about shared terminology, achieved through consistent use of URIs.
Annotations create relationships between URIs and build up a network of data.</p>
      <sec id="sec-2-1">
        <title>Conceptual model</title>
        <p>
          We now explore the conceptual model behind annotation in more depth. The
term “annotation” can denote both the process of annotating and the result of
that process [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Where we say “annotation” we mean the result. An annotation
attaches some data to some other data. An annotation establishes, within some
context, a (typed) relation between the annotated data and the annotating data.
        </p>
        <p><sup>4</sup> http://simile.mit.edu/longwell/</p>
        <p><sup>5</sup> http://www.w3.org/Addressing/</p>
        <p>Investigating the nature of annotation further, we can model it as a
quadruple:</p>
        <p>Definition 1 (Annotation). An annotation A is a tuple (a<sub>s</sub>, a<sub>p</sub>, a<sub>o</sub>, a<sub>c</sub>), where
a<sub>s</sub> is the subject of the annotation, the annotated data, a<sub>o</sub> is the object of the
annotation, the annotating data, a<sub>p</sub> is the predicate, the annotation relation, that
defines the type of relationship between a<sub>s</sub> and a<sub>o</sub>, and a<sub>c</sub> is the context in which
the annotation is made.</p>
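        <p>As a minimal sketch, the quadruple of Definition 1 could be written down as a small record type. The class and field names are assumptions made for illustration, and the values of the sample margin note are invented free text in the spirit of Example 1.</p>
        <preformat>
```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """An annotation as the quadruple (a_s, a_p, a_o, a_c) of Definition 1."""
    subject: str                   # a_s: the annotated data
    predicate: str                 # a_p: the annotation relation
    obj: str                       # a_o: the annotating data
    context: Optional[str] = None  # a_c: the context; None means implicit


# An informal margin note: every constituent is free text, not a formal pointer.
margin_note = Annotation(
    subject="second paragraph on the page",
    predicate="comment",
    obj="great!",
)
```
        </preformat>
        <p>Leaving the context empty mirrors the examples in this section, which are given without any explicit context.</p>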
        <p>Example 1 (Informal annotation).</p>
        <p>The annotation subject can be formal or informal. For example, when we put
a note in the margin of a paragraph, the informal convention is that the note
applies to the paragraph, but that pointer is not formally defined. If, however, we
use a formal pointer such as a URI<sup>6</sup> to point to the paragraph, then the subject
is formally specified.</p>
        <p>The annotation predicate can be formal or informal. For example, when we
put a note in the margin, the relation is not formally defined, but we may
informally derive from the context that the note is a comment, a change-request,
an approval or disapproval, etc. If we use a formal pointer to an ontological term
that indicates the relation (e.g. dc:comment) then the predicate is formally
defined.</p>
        <p>The annotation object can be formal or informal. If an object is formal we
can distinguish different levels of formality: textual, structural, or ontological.
For example, the string “This is great!” is a textual object. A budget
calculation table in the margin of a project proposal is a structural object. And an
annotation object that is not only explicitly structured but also uses ontological
terms<sup>7</sup> is an ontological object.</p>
        <p>The annotation context can be formal or informal. Context can indicate
when the annotation was made and by whom (provenance), or within what scope
the annotation is deemed valid, for example in a temporal scope (it is only valid
in 2006) or in a spatial scope (it is only valid in Western Europe). Usually context
is given informally and implicitly. If we use a formal pointer such as a URI then
the context is formally defined.</p>
        <p>
          Combining the levels of annotation subject, predicate, and object, we can
distinguish three layers in annotations: i) informal annotations, ii) formal
annotations (that have formally defined constituents and are thus machine-readable),
and iii) semantic annotations (that have formally defined constituents and use
only ontological terms). We give some simple examples for each kind of
annotation in Examples 1 (a handwritten margin annotation in a book), 2
(formally expressed in N3<sup>8</sup>) and 3 (formally expressed and using ontological terms),
respectively. All three examples are given here without any explicit context.
        </p>
        <p><sup>6</sup> One can use XPointer to point to a paragraph in a document and XPointer can be
used as a URI, as discussed in http://www.w3.org/TR/xptr-framework/#escaping.</p>
        <p><sup>7</sup> Ontological means that the terminology has a commonly understood meaning that
corresponds to a shared conceptualisation called ontology [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Whether a term is
ontological is a social matter and not a technical or formal matter. It is sometimes
mistakenly understood that using a formal ontology language makes terms
ontological. An ontology however denotes a shared (social) understanding; the ontology
language can be used to formally capture that understanding, but does not remove
the need to reach that understanding in the first place.</p>
        <p>Definition 2 (Formal annotation). A formal annotation A<sub>f</sub> is an annotation
A, where the subject a<sub>s</sub> is a URI, the predicate a<sub>p</sub> is a URI, the object a<sub>o</sub> is a
URI or a formal literal, and the context a<sub>c</sub> is a URI.
        </p>
        <p>Example 2 (Formal annotation).</p>
        <preformat>&lt;http://papers.org/minimalism#minor&gt;
    &lt;http://localhost/schema#disagree&gt;
    "that's not minor!" .</preformat>
        <p>Definition 3 (Semantic annotation). A semantic annotation A<sub>s</sub> is a formal
annotation A<sub>f</sub>, where the predicate a<sub>p</sub> and the context a<sub>c</sub> are ontological terms,
and the object a<sub>o</sub> conforms<sup>9</sup> to an ontological definition of a<sub>p</sub>.</p>
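        <p>The three layers can be illustrated with a small classifier that follows Definitions 2 and 3. This is only a sketch under stated assumptions: the URI test is a rough syntactic pattern, the set of "shared" namespaces is hypothetical (whether a term is ontological is a social matter and cannot be decided by code), and the conformance of the object to the predicate's definition is not checked at all.</p>
        <preformat>
```python
import re
from typing import NamedTuple, Optional

# Hypothetical namespaces assumed to hold commonly agreed (ontological) terms.
SHARED_NAMESPACES = ("http://example.org/ibis#", "http://example.org/context#")

# Rough syntactic test for "scheme:rest without whitespace".
URI_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:\S+$")


class Annotation(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: Optional[str] = None
    obj_is_literal: bool = False  # True if the object is a formal literal


def is_uri(value):
    return value is not None and URI_PATTERN.match(value) is not None


def is_ontological(term):
    return is_uri(term) and term.startswith(SHARED_NAMESPACES)


def classify(a):
    """Return 'informal', 'formal' or 'semantic' for an annotation."""
    formal = (
        is_uri(a.subject)
        and is_uri(a.predicate)
        and (a.obj_is_literal or is_uri(a.obj))
        and is_uri(a.context)
    )
    if not formal:
        return "informal"
    # Definition 3 also requires the object to conform to the predicate's
    # ontological definition; that check is omitted in this sketch.
    if is_ontological(a.predicate) and is_ontological(a.context):
        return "semantic"
    return "formal"
```
        </preformat>
        <p>Under these assumptions, an annotation whose predicate lives in a purely local schema is classified as formal but not semantic, which matches the distinction between Examples 2 and 3.</p>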
        <p>Example 3 (Semantic annotation).</p>
        <preformat>&lt;http://papers.org/minimalism#minor&gt;
    ibis:con
        [ rdf:type ibis:Argument ;
          rdf:label "that's not minor!" ] .</preformat>
      </sec>
      <sec id="sec-2-2">
        <title>Annotations in Wikis</title>
        <p>
          We can, similarly to [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], distinguish three levels of annotations in a Semantic
Wiki:
        </p>
        <def-list>
          <def-item>
            <term>Layout</term>
            <def><p>Annotations that describe textual formatting without additional
structural information, such as bold or italic words<sup>10</sup>.</p></def>
          </def-item>
          <def-item>
            <term>Structure</term>
            <def><p>Annotations that describe the structure of a page or of a set of
pages, such as hyperlinks (inter-page structure), headings, subheadings, and
paragraphs (internal page structure), and itemised and numbered lists.</p></def>
          </def-item>
          <def-item>
            <term>Semantics</term>
            <def><p>Annotations that relate pages or page elements to arbitrary
resources through typed ontological relations, such as categorising a page in
a taxonomy, specifying the friends of a described person, or the books of a
described author.</p></def>
          </def-item>
        </def-list>
        <p><sup>8</sup> http://www.w3.org/DesignIssues/Notation3.html</p>
        <p><sup>9</sup> The notion of “conformance” is rather weak in some ontology languages (such as
RDFS or OWL) since these are not constraint-based languages (as opposed to e.g.
database schemas). However, we use the notion of conformance to distinguish between
“good” usage of textual objects, for example to indicate the name of a person, and
“bad” usage of textual objects, for example to indicate the friends of a person.</p>
        <p><sup>10</sup> These annotations could formally be considered semantic, because they have an
explicit and shared meaning, which is used by the rendering engine.</p>
        <p>Annotations in a regular Wiki are limited to layout and structural
annotations. Semantic annotations are unique to Semantic Wikis, and are the
focus of the rest of this section.</p>
        <p>
          We now present one possible annotation syntax for semantic annotations,
namely the one used in SemperWiki [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. To simplify the annotations, we only
consider annotations that have the page on which they appear as subject. The
annotation subject is thus implicitly defined. We also limit ourselves, for
simplicity, to annotations with an implicit context. The annotations are then restricted
to defining the predicate and object, which is done by simply stating the two on
a separate line.
        </p>
        <p>
          The example page shown in Fig. 1 describes the World Wide Web
Consortium. The page includes some English text, and some annotations which state
(using the Wordnet and Semantic Web Research Community ontologies) that
the W3C is an organisation led by Tim Berners-Lee. The syntax includes
referencing using namespace abbreviations, internal Wiki pages, and full URIs; see
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] for more information.
        </p>
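        <p>The line-based syntax described above could be recognised roughly as follows. This is an illustrative sketch, not the SemperWiki parser: the regular expression, the function name, and the rule that an annotation line consists of exactly a predicate and one object token (or one double-quoted literal) are assumptions.</p>
        <preformat>
```python
import re

# Assumption: an annotation line is "predicate object", where the predicate is
# a namespace-abbreviated name (prefix:name) or a full URI, and the object is
# a single token or a double-quoted literal.
ANNOTATION_LINE = re.compile(r'^(\w[\w.+-]*:\S+)\s+("[^"]*"|\S+)$')


def parse_page(page_name, text):
    """Split Wiki text into prose lines and (subject, predicate, object) triples."""
    prose, triples = [], []
    for line in text.splitlines():
        stripped = line.strip()
        match = ANNOTATION_LINE.match(stripped)
        if match:
            predicate, obj = match.groups()
            # The subject is implicit: the page on which the line appears.
            triples.append((page_name, predicate, obj))
        elif stripped:
            prose.append(stripped)
    return prose, triples


page = """W3C
The World Wide Web Consortium (W3C) develops interoperable
technologies (specifications, guidelines, software, and
tools) to lead the Web to its full potential.
rdf:type wordnet:Organization
swrc:head http://www.w3.org/People/Berners-Lee/card#i"""

prose, triples = parse_page("W3C", page)
```
        </preformat>
        <p>With this rule, ordinary prose lines are left untouched because their first token is not of the form prefix:name, while the two annotation lines become statements about the page.</p>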
        <preformat>W3C
The World Wide Web Consortium (W3C) develops interoperable
technologies (specifications, guidelines, software, and
tools) to lead the Web to its full potential.
rdf:type wordnet:Organization
swrc:head http://www.w3.org/People/Berners-Lee/card#i</preformat>
        <p>Having defined annotations in Wikis, we now answer the second question: how
to formally represent Wiki content?</p>
        <p>RDF<sup>11</sup> is a straightforward way to represent these annotations formally, since
it has exactly the same model as our annotations. We can either use standard
RDF to represent annotations without context, or RDF quads (a
common RDF extension) for annotations with context.</p>
        <p><sup>11</sup> http://www.w3.org/RDF/</p>
        <p>RDF does pose some constraints on the constituents of triples: the subject
must be a URI or a blank node (not a literal), and the predicate must be a URI
(not a literal or blank node). If we follow these restrictions in our annotations,
RDF offers a good representation model.</p>
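        <p>The restrictions on triple constituents can be stated as a simple check. The term representation below (a kind tag plus a value) is a hypothetical one chosen for this sketch, not the data model of any particular RDF library.</p>
        <preformat>
```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str   # one of "uri", "blank", "literal"
    value: str


def valid_triple(subject, predicate, obj):
    """Check the RDF restrictions on the constituents of a triple."""
    subject_ok = subject.kind in ("uri", "blank")  # never a literal
    predicate_ok = predicate.kind == "uri"         # never a literal or blank node
    object_ok = obj.kind in ("uri", "blank", "literal")
    return subject_ok and predicate_ok and object_ok
```
        </preformat>
        <p>An annotation that uses a literal as subject, for instance, would have to be rejected or rewritten before it can be stored as RDF.</p>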
        <p>We represent pages and their annotations in RDF as follows: each page is an
RDF resource, and each annotation a property of that resource. We can represent
not only the semantic annotations in RDF but the whole Wiki content. The
(natural language) Wiki content is captured through the predicate semper:content,
the outgoing links to other pages through the predicate semper:links. Figure
2 shows the RDF graph that represents the page in Fig. 1.</p>
        <p><bold>Problem: documents vs. concepts.</bold> Because annotations can describe
concepts (the W3 Consortium) and web documents (the page about the W3
Consortium), the question arises which URI to use as the annotation subject.</p>
        <p>For example, the Wiki page in Fig. 1 also contains the statement that it
was created on January 1, 2006. But does this statement say that the document
was created in 2006 or that the subject concept of the document, i.e. the W3C,
was created in 2006? We may derive with some background information that we
mean the first, but we actually need a way to say both: we sometimes want to
make statements about a concept and sometimes about the document describing
that concept.</p>
        <p>
          This issue (often referred to as the “URI crisis”) is well-known from early
discussions on Web architecture, and has gained renewed interest in the Semantic
Web community. The problem is that it is unclear what a URI denotes (at least,
it is unclear for URIs that are URLs, but the discussion focuses primarily on http
URIs which are indeed URLs). A URL can denote a name, an abstract concept,
a web location, or a document [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The root of the problem is that the same URI
can be used to identify a subject directly (web document) or indirectly (concept
that is subject of document) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Hawke [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] suggests<sup>12</sup> disambiguating the concept and the document
syntactically by using the # symbol: http://google.com/ would denote the web
document and http://google.com/# would denote the concept. The solution is
not ideal [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] since the hash symbol is a legal URI character and can be used to
denote a document fragment, while referring to document fragments with URI
fragment identifiers is crucial for fine-grained document annotation<sup>13</sup>.
        </p>
        <p>
          <bold>Solution: locators vs. names.</bold> As Pepper remarks, “using a locator for
something that does not have a location is asking for trouble” [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The obvious
solution is to not use a locator (URL) but a non-addressable identifier<sup>14</sup> (URN)
for non-locatable things such as concepts.
        </p>
        <p>
          Unfortunately, using a URN to identify concepts violates the fundamental
Web principle that a URI should point to a location with useful information
about the thing it identifies [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. However, that could be remedied by using a
syntactical convention (mirror-URIs) to relate the document URL to the concept
URN, such as prefixing the URL with the urn: protocol handler.
        </p>
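        <p>One possible reading of the mirror-URI convention is sketched below. The exact convention (a plain "urn:" prefix in front of the document URL) and both function names are assumptions for illustration; the resulting identifiers are not claimed to belong to a registered URN namespace.</p>
        <preformat>
```python
URN_PREFIX = "urn:"


def concept_urn(page_url):
    """Name of the concept described by the document at page_url."""
    return URN_PREFIX + page_url


def describing_document(concept):
    """Location of the document that describes a concept named by a mirror URN."""
    if not concept.startswith(URN_PREFIX):
        raise ValueError("not a mirror URN: " + concept)
    return concept[len(URN_PREFIX):]
```
        </preformat>
        <p>Because the mapping is purely syntactic, a client that encounters a concept URN can still find a document with useful information about it by stripping the prefix.</p>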
        <p>To complete this solution, we need to extend our Wiki syntax in two ways:</p>
        <list list-type="order">
          <list-item><p>to distinguish annotations about a document (Wiki page) from annotations
about the concept, which we do by prefixing the annotation with the !
symbol;</p></list-item>
          <list-item><p>to relate a page to the concept it describes (in case the page describes a
concept in a different naming authority, e.g. a page on http://wikibase/W3C
that describes urn://w3.org), which we do with semper:about.</p></list-item>
        </list>
        <p>Having answered the first two questions (how to annotate and how to represent
Wiki content), we now characterise the annotation and representation in several
existing Semantic Wikis.</p>
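        <p>The two syntax extensions (the ! prefix and semper:about) determine the subject of each annotation line. A hedged sketch of that resolution step follows; the function name, the default of a "urn:"-prefixed mirror name when no semper:about line is present, and the choice to emit an explicit semper:about triple are assumptions, not a description of the SemperWiki code.</p>
        <preformat>
```python
def resolve_subjects(page_url, annotations):
    """Attach a subject to each (predicate, object) annotation line of a page.

    Lines prefixed with "!" describe the page (document) itself; all other
    lines describe the concept that the page is about.
    """
    # Assumed default: the mirror name of the page, unless semper:about is given.
    concept = "urn:" + page_url
    for predicate, obj in annotations:
        if predicate == "semper:about":
            concept = obj

    triples = [(page_url, "semper:about", concept)]
    for predicate, obj in annotations:
        if predicate == "semper:about":
            continue
        if predicate.startswith("!"):
            triples.append((page_url, predicate[1:], obj))
        else:
            triples.append((concept, predicate, obj))
    return triples
```
        </preformat>
        <p>Applied to a page that states semper:about urn://w3.org, rdf:type wordnet:Organization and !dc:date "2006/01/01", the type statement is attributed to the concept and the date to the page.</p>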
        <p><sup>12</sup> The proposal is a bit more intricate, but for our purposes this explanation suffices.</p>
        <p><sup>13</sup> See e.g. http://w3.org/TR/annotor [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].</p>
        <p><sup>14</sup> Clarification on the relation between URIs, URLs and URNs can be found at http://www.w3.org/TR/uri-clarification/.</p>
        <p>
          Annotations in Semantic Wikis are formal and possibly semantic, i.e. they are
formally defined, and possibly use ontological terms. We have selected several
dimensions to classify annotations in Semantic Wikis from the literature (we
again focus on the annotation result, not the annotation process). We have added
one new dimension to capture the important notion of annotation context:
        </p>
        <preformat>W3C
The World Wide Web Consortium (W3C) develops interoperable
technologies (specifications, guidelines, software, and
tools) to lead the Web to its full potential
semper:about urn://w3.org
rdf:type wordnet:Organization
swrc:head http://www.w3.org/People/Berners-Lee/card#i
Now we have an annotation about the page itself:
!dc:date "2006/01/01"</preformat>
        <p>(a) example page</p>
        <p>Fig. 3: RDF representation of an example page</p>
        <def-list>
          <def-item>
            <term>Subject attribution (also called “scope” [<xref ref-type="bibr" rid="ref19">19</xref>])</term>
            <def><p>Indicates the subject of the
annotation: is the subject of the annotation the same as the page on which
it appears or an arbitrary page? In a Wiki, the possible attributions are: the
page on which an annotation appears, an arbitrary page, or an anonymous
resource.</p></def>
          </def-item>
          <def-item>
            <term>Subject granularity (also called “lexical span” [<xref ref-type="bibr" rid="ref18">18</xref>])</term>
            <def><p>Indicates the granularity
of the annotation subject: e.g. is the annotation about a document, a section
inside a document, a sentence, or a word?</p></def>
          </def-item>
          <def-item>
            <term>Representation distinction (also called “instance identification vs. reference” [<xref ref-type="bibr" rid="ref3">3</xref>])</term>
            <def><p>Indicates whether the Wiki distinguishes annotations about the Wiki
page itself from annotations of the concept described on the page.</p></def>
          </def-item>
          <def-item>
            <term>Terminology reuse (also called “interoperability” [<xref ref-type="bibr" rid="ref19">19</xref>])</term>
            <def><p>Indicates whether an
annotation is self-confined with its own terminology, or whether an
annotation uses terms from existing ontologies, and is thus interoperable and
understandable for others.</p></def>
          </def-item>
          <def-item>
            <term>Object type (also called “annotation form” [<xref ref-type="bibr" rid="ref7">7</xref>])</term>
            <def><p>Indicates the type of
annotation object: is it a literal or textual object, a structural object (including a
hyperlink to another page), or an ontological object?</p></def>
          </def-item>
          <def-item>
            <term>Context</term>
            <def><p>Indicates the context of the annotation: when was it made, by whom
(provenance), and within what scope: the annotation could for example be
temporally scoped (it is only valid in 2006) or spatially scoped (it is only
valid in Western Europe).</p></def>
          </def-item>
        </def-list>
        <p>
          These dimensions can indicate the level of annotation in current Semantic
Wiki approaches. We do not provide an exhaustive evaluation, but evaluate
WikSAR [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], Semantic MediaWiki [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], IkeWiki [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and SemperWiki [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] as the
most prominent systems under ongoing development.
        </p>
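        <table-wrap>
          <label>Table 1</label>
          <caption><p>Annotations in current Semantic Wikis</p></caption>
          <table>
            <thead>
              <tr><th>Dimension</th><th>WikSAR</th><th>Semantic MediaWiki</th><th>IkeWiki</th><th>SemperWiki</th></tr>
            </thead>
            <tbody>
              <tr><td>attribution</td><td>current</td><td>current</td><td>current</td><td>current, any URI</td></tr>
              <tr><td>granularity</td><td>page</td><td>page</td><td>page</td><td>page, any fragment</td></tr>
              <tr><td>representation distinction</td><td>no</td><td>no</td><td>yes</td><td>yes</td></tr>
              <tr><td>terminology reuse</td><td>no</td><td>no</td><td>yes</td><td>yes</td></tr>
              <tr><td>object type</td><td>literal, page</td><td>literal, page</td><td>literal, page</td><td>literal, page, URI</td></tr>
              <tr><td>context</td><td>no</td><td>no</td><td>no</td><td>no</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Subject attribution Most existing Wikis only allow statements about the
current page. The subject of an annotation is never explicitly stated, but always
implicitly assumed to be the page on which the statement appears. In
SemperWiki the user can explicitly state the subject of the annotations, because
we separate the page and the thing it describes (as explained in Sect. 2.3),
and annotations can thus be attributed to arbitrary URIs.</p>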
        <p>Subject granularity Most existing Wikis only allow annotation of complete
pages, not of subsections or arbitrary parts of text, for the same reason
(implicitly) as mentioned above.</p>
        <p>Since SemperWiki allows users to attribute annotations to arbitrary URIs
one could annotate a document fragment as follows: create a Wiki page,
point it to the document fragment using an XPointer URI, and annotate the
page.</p>
        <p>Representation distinction Of the discussed Wikis only SemperWiki clearly
separates the page from the concept that it describes, and offers a syntax
that distinguishes annotations of the page from annotations of the concept.
IkeWiki also separates pages from the concepts that they describe (a concept
can be represented on multiple pages), but does not, as far as we know, offer
a syntax to manually express this distinction.</p>
        <p>Terminology reuse IkeWiki and SemperWiki allow existing terminology to be
reused in annotations (through namespace definitions or full URIs); the other Wikis
can only create annotations using internal Wiki pages and thus cannot make
use of existing terminology.</p>
        <p>Object type All discussed Wikis allow an object to be a literal or an internal
Wiki page. Of the discussed Wikis, only SemperWiki allows the object of
an annotation to be an arbitrary URI. No Semantic Wiki allows unnamed
resources (blank nodes) as objects.</p>
        <p>Context Annotation context is ignored in all existing Wikis.</p>
        <p>Summarising, we have developed a conceptual model for annotations in
general, and for semantic annotations in the context of Semantic Wikis specifically.
Given this model we have seen that current Semantic Wikis offer only limited
annotation possibilities (which is not necessarily wrong, but has now been
recognised explicitly), and do not clearly separate the page from the concept that it
describes. We have shown how SemperWiki addresses these limitations.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Navigation</title>
      <p>Having answered the first two questions, we now investigate the third question:
how to navigate Wiki content?</p>
      <p>
        When navigating an ordinary Wiki, all content is considered either a
hyperlink or some natural language text. The hyperlinks between pages can be
followed, and the full text can be searched by keyword. But if users cannot
exactly formulate their information need, an exploration technique is necessary
that helps users to discover data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In our opinion, navigating a Wiki has two phases: looking for a page, and
looking at a page. In an ordinary Wiki, exploration in both phases is limited
to predefined hyperlinks. In Semantic Wikis, the semantic annotations structure
the Wiki content, and we can use that structure to offer improved exploration
through a technique called faceted browsing [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>Existing approaches for faceted browsing rely on manually constructing the
facets for a fixed data structure. But since Wiki content can form arbitrary and
fluid structures (because users can add arbitrary annotations to pages), we need
to adjust faceted browsing to arbitrary data structures.</p>
      <p>In this section, we present our approach to automatically construct facets for
an arbitrary semi-structured dataset, independent of its structure.</p>
      <sec id="sec-3-1">
        <title>Background</title>
        <p>
          Faceted browsing is a superior exploration technique for large structured datasets
[
          <xref ref-type="bibr" rid="ref21 ref24 ref6">24,21,6</xref>
          ] based on the theory of facet analysis [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>In faceted browsing, the information space is partitioned using orthogonal
conceptual dimensions of the data (these dimensions are called facets). Each
facet has multiple restriction values; users select a restriction value to constrain
relevant items in the information space.</p>
        <p>In the Semantic Wiki, a facet corresponds to an annotation predicate a<sub>p</sub>
and a restriction value corresponds to an annotation object a<sub>o</sub>. The annotation
subject is the result (or purpose) of the faceted browsing: faceted browsing is a
search process that takes the predicate and object values as input and returns
the matching subjects.</p>
        <p>For example, a collection of art works can consist of facets (predicates) such
as type of work, time periods, artist names and geographical locations. Users can
select a certain restriction value (object) such as the 20th century to constrain
the visible collection to only some art works. Multiple constraints are applied
conjunctively.</p>
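        <p>Conjunctive restriction over predicate and object pairs can be sketched in a few lines. The triple representation and the sample art collection below are invented for illustration only.</p>
        <preformat>
```python
def browse(triples, constraints):
    """Return the subjects that satisfy every (predicate, object) constraint."""
    matches = None
    for predicate, obj in constraints:
        subjects = {s for s, p, o in triples if p == predicate and o == obj}
        # Constraints are conjunctive, so intersect the partial results.
        matches = subjects if matches is None else matches.intersection(subjects)
    if matches is None:
        # No restriction value selected yet: every subject is still visible.
        matches = {s for s, _, _ in triples}
    return matches


# Hypothetical art collection; names and values are made up.
collection = [
    ("work1", "period", "20th century"),
    ("work1", "type", "painting"),
    ("work2", "period", "20th century"),
    ("work2", "type", "sculpture"),
    ("work3", "period", "19th century"),
    ("work3", "type", "painting"),
]

twentieth_century_paintings = browse(
    collection, [("period", "20th century"), ("type", "painting")]
)
```
        </preformat>
        <p>Each additional constraint can only shrink the set of visible items, which is what makes step-by-step refinement predictable for the user.</p>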
        <p>
          Existing approaches [
          <xref ref-type="bibr" rid="ref11 ref24">24,11</xref>
          ] cannot navigate arbitrary datasets: they are
limited to manually defined facets over predefined data structures. A technique for
automatic classification of new data under existing facets has been developed
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], but requires a predefined training set of data and facets, and only works
for textual data. A technique for automatic facet construction based on lexical
dispersion has been developed [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], but is also limited to textual data.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Automatic facet extraction</title>
        <p>We combine several existing techniques to offer faceted browsing for arbitrarily
structured data. Setting up faceted browsing for a specific dataset involves two
steps: i) selecting proper facets and ii) partitioning each facet into a number of
restriction values.</p>
        <p>In most existing faceted browsers, both steps are done manually: an
administrator examines the dataset (e.g. a museum collection), selects useful facets
(e.g. time period, artist name, location), and partitions each facet into useful
restriction values: e.g. the time facet would be divided into 20 centuries, the artist
facet into 26 starting letters, and the location (hierarchically) into continent and
then countries.</p>
        <p>We focus on automation of the first step: selecting proper facets.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Facet selection</title>
        <p>
          A facet should only represent one important characteristic of the classified
entity [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. This entity corresponds to our notion of RDF resource. In RDF, each
resource is defined by one or more predicates; these predicates could be
considered as entity characteristics. Our goal is to find, among all available predicates,
those that best represent the dataset.
        </p>
        <def-list>
          <def-item>
            <term>Frequency</term>
            <def><p>A good predicate has a high occurrence frequency inside the
collection. The more distinct resources a predicate covers, the more useful it is in
dividing the information space [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].</p></def>
          </def-item>
          <def-item>
            <term>Distinguishing power</term>
            <def><p>A good predicate has a uniform value distribution (its
distinguishing power is high). A division in which the information is
distributed uniformly across all partitions enables the fastest navigation to an
item of interest.</p></def>
          </def-item>
          <def-item>
            <term>Object values</term>
            <def><p>A good predicate has a limited number of different object values
(between 2 and 20). If there are too many different objects to choose from,
then the options are difficult to display and may disturb the user.</p></def>
          </def-item>
          <def-item>
            <term>Intuition</term>
            <def><p>A good predicate reflects the scope of the information space and is
intuitive for the user. For example, a user who only knows the author of
some book will try to find it by using the facet “author”. Conversely, a user
who only knows the title of a book will try to find it using the facet “title”.</p></def>
          </def-item>
        </def-list>
        <p>We define three metrics (for the first three properties) that rank the
appropriateness of each predicate; we exclude the mathematical treatment for brevity.
Fig. 4 shows these metrics for a sample (CiteSeer) dataset. We cannot define a
metric for intuition, since we cannot properly define intuition.</p>
        <p>Fig. 4: (a) Predicate frequency, (b) Distinguishing power, (c) Object values</p>
        <p>Frequency: To measure the frequency of a predicate, we use a simple function
based on the number of distinct resources that have the predicate. For example,
in Fig. 4a we see that year and type occur frequently in the sample data.</p>
        <p>Distinguishing power: To measure the distinguishing power of a predicate we
use a simple function based on the number of distinct subjects having the same
object. If each object has the same number of distinct subjects, the score of the
predicate is highest. For example, in Fig. 4b we see that the predicate year is
not very balanced: there are more publications in later years.</p>
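        <p>Since the mathematical treatment is omitted here, the following Python sketch shows only one plausible reading of the two metrics above, not the actual functions used in the prototype. It assumes that frequency is the fraction of distinct subjects carrying a predicate, and that distinguishing power is the normalised entropy of the number of distinct subjects per object value. The sample triples and the function names frequency and balance are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("pub1", "year", "2004"), ("pub1", "type", "article"),
    ("pub2", "year", "2005"), ("pub2", "type", "article"),
    ("pub3", "year", "2005"), ("pub3", "type", "inproceedings"),
    ("pub4", "year", "2005"), ("pub4", "booktitle", "ISWC"),
]


def frequency(triples, predicate):
    """Fraction of distinct subjects that carry the predicate (assumed formula)."""
    all_subjects = {s for s, _, _ in triples}
    covered = {s for s, p, _ in triples if p == predicate}
    return len(covered) / len(all_subjects) if all_subjects else 0.0


def balance(triples, predicate):
    """Normalised entropy of distinct subjects per object value (assumed formula).

    Returns 1.0 when every object value has the same number of subjects,
    and 0.0 when the predicate has at most one object value.
    """
    subjects_by_object = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            subjects_by_object[o].add(s)
    counts = [len(subjects) for subjects in subjects_by_object.values()]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))


for predicate in ("year", "type", "booktitle"):
    print(predicate, frequency(TRIPLES, predicate), round(balance(TRIPLES, predicate), 3))
```
]]></preformat>
        <p>In this toy data, year covers every resource but is skewed towards 2005, which mirrors the imbalance observed in Fig. 4b.</p>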
        <p>
          Object values: For display and usability purposes (the user should be able
to get an overview of the options and decide on a restriction value), the number of
different object values should be approximately between 2 and 20. For example, in
Fig. 4c we see that the predicate booktitle has many different object values, and
the predicate type only a few (so the latter would be more usable).
        </p>
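        <p>The object-value criterion can be illustrated in the same spirit. The Python sketch below is an assumption, not the metric used in the prototype: it gives the full score to predicates with between 2 and 20 distinct object values, zero to predicates with fewer than 2, and a proportionally decaying score above 20. The function names and the generated data are hypothetical.</p>
        <preformat><![CDATA[
```python
def distinct_object_count(triples, predicate):
    """Number of different object values used with the predicate."""
    return len({obj for _, p, obj in triples if p == predicate})


def object_value_score(count, low=2, high=20):
    """Score 1.0 inside [low, high]; lower scores outside (assumed shape)."""
    if count < low:
        return 0.0          # a single value cannot divide the data
    if count <= high:
        return 1.0          # a comfortable number of options to display
    return high / count     # too many options: penalise proportionally


# Hypothetical data: 'type' has 3 values, 'booktitle' has 40.
triples = [(f"pub{i}", "type", ["article", "book", "thesis"][i % 3]) for i in range(40)]
triples += [(f"pub{i}", "booktitle", f"Proceedings {i}") for i in range(40)]

for predicate in ("type", "booktitle"):
    n = distinct_object_count(triples, predicate)
    print(predicate, n, object_value_score(n))
```
]]></preformat>
        <p>Under these assumptions type receives the full score and booktitle is penalised, which agrees with the reading of Fig. 4c.</p>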
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Implementation</title>
      <p>This section presents our prototype implementations of the previous ideas.</p>
      <p>
        Our open-source prototype SemperWiki (http://semperwiki.org) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] was initially developed as a
personal Wiki for knowledge management, and was therefore designed as a desktop
application. The original version of SemperWiki, shown in Fig. 5, is implemented
in Ruby (http://ruby-lang.org/), using the GTK (http://gtk.org) graphical toolkit.
      </p>
      <p>Fig. 5: SemperWiki prototype</p>
      <p>
        We are currently porting SemperWiki to a Web architecture to make it
accessible across platforms, using ActiveRDF [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Ruby on Rails (http://rubyonrails.org). The new version
of SemperWiki contains all the annotation functionality described in Sect. 2, and
clearly distinguishes between documents and concepts, as discussed in Sect. 2.3.
      </p>
      <p>Secondly, we have built a prototype that implements the automatic selection
of facets. The resulting faceted browsing interface is shown in Fig. 6; please note
that this interface is automatically generated for arbitrary data. In this dataset,
year, type, booktitle and journal are the facets (selected from the predicates),
and 1988, 1992, etc. are the facet values (annotation objects without clustering).
The prototype is implemented in Ruby and ActiveRDF, and works on arbitrary
RDF data sources through the generic RDF API of ActiveRDF.</p>
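      <p>To make the relation between predicates, facets and facet values concrete, the following is a minimal Python sketch of how a faceted view could be derived from arbitrary triples. It is an illustration only: the prototype itself is written in Ruby on top of ActiveRDF, and none of the names below (build_facets, restrict, the sample triples) are taken from it.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical sample triples; real data would come from an RDF store.
TRIPLES = [
    ("pub1", "year", "1988"), ("pub1", "type", "article"),
    ("pub2", "year", "1992"), ("pub2", "type", "article"),
    ("pub3", "year", "1992"), ("pub3", "type", "inproceedings"),
    ("pub3", "booktitle", "ISWC"),
]


def build_facets(triples, selected_predicates):
    """Group subjects by facet (predicate) and facet value (object)."""
    facets = {predicate: defaultdict(set) for predicate in selected_predicates}
    for subject, predicate, obj in triples:
        if predicate in facets:
            facets[predicate][obj].add(subject)
    return facets


def restrict(facets, restrictions):
    """Return the subjects matching every (facet, value) restriction.

    An empty restriction list yields an empty set in this sketch.
    """
    subject_sets = [facets.get(f, {}).get(v, set()) for f, v in restrictions]
    return set.intersection(*subject_sets) if subject_sets else set()


facets = build_facets(TRIPLES, ["year", "type"])
for facet, values in facets.items():
    # Show each facet with its values and the number of matching resources.
    print(facet, {value: len(subjects) for value, subjects in sorted(values.items())})
print(sorted(restrict(facets, [("year", "1992"), ("type", "article")])))
```
]]></preformat>
      <p>Because the facets are computed from whatever predicates occur in the data, no schema has to be known in advance; only the list of selected predicates changes from one dataset to another.</p>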
      <p>We have not yet done a comprehensive assessment, but an initial evaluation
on a sample CiteSeer dataset (from http://www.csd.abdn.ac.uk/~ggrimnes/swdataset.php)
looks promising: the metrics automatically select the most important predicates
(such as year, type and author) as facets.</p>
      <p>The results of our work allow us to give good answers to the three initial research
questions of this paper. We are satisfied with the overall results, but
briefly discuss some unsettled points below.</p>
      <p>First, our approach to annotation in the Semantic Wiki ignores the context of
annotations. To our knowledge, all existing annotation approaches ignore the
notion of context. More research is needed on identifying and modelling the
context of annotations.</p>
      <p>Secondly, when annotating Wiki concepts we might encounter naming
ambiguity if two people use different URNs for the same real-world concept. However, a
large-scale social system such as Wikipedia shows that naming ambiguity tends to
resolve over time (people reuse socially accepted names), especially if supported
by a popularity-based recommendation system.</p>
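      <p>A popularity-based recommendation of this kind could, for instance, suggest the names that have been used most often so far. The Python sketch below is a hypothetical illustration of that idea and does not describe an existing SemperWiki feature; the function recommend_names and the example URNs are assumptions.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def recommend_names(used_names, prefix, limit=3):
    """Suggest the most frequently used names that start with the typed prefix."""
    wanted = prefix.lower()
    counts = Counter(name for name in used_names if name.lower().startswith(wanted))
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical usage log: every entry is one annotation that used the name.
used = (["urn:example:Galway"] * 3
        + ["urn:example:GalwayCity"]
        + ["urn:example:Dublin"] * 2)

print(recommend_names(used, "urn:example:gal"))
```
]]></preformat>
      <p>Offering the most widely used name first nudges users towards reusing it, which is the social convergence effect described above.</p>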
      <p>Thirdly, the solution to the representation problem of documents vs. concepts, as
presented in Sect. 2.3, has one drawback concerning existing RDF data.
Unfortunately, the world is already full of RDF statements that do not clearly distinguish
documents and concepts, but use URLs to refer to both. Under our solution,
a URN in subject position tells us that the concept is meant, but
for a URL we cannot be sure that the document is meant: the
URL could be a “legacy” URL that does not conform to our distinction and
is (wrongly) used to identify a concept. Our solution therefore has only limited
applicability, but that is unfortunately the nature of the problem.</p>
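      <p>The asymmetry between URNs and URLs can be expressed as a small decision rule. The Python sketch below is only an illustration under the assumption that concepts are named with the urn scheme and documents with http or https URLs; the function name, the returned labels and the example identifiers are hypothetical.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlparse


def classify_subject(identifier):
    """Say what the subject of a statement may denote, based on its URI scheme."""
    scheme = urlparse(identifier).scheme.lower()
    if scheme == "urn":
        # Under the distinction of Sect. 2.3 a URN always names a concept.
        return "concept"
    if scheme in ("http", "https"):
        # A URL should name a document, but legacy data may use it for a concept.
        return "document (or legacy concept)"
    return "unknown"


print(classify_subject("urn:example:Galway"))
print(classify_subject("http://example.org/wiki/Galway"))
```
]]></preformat>
      <p>The second label cannot be made more precise from the identifier alone, which is exactly the limitation discussed above.</p>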
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>As explained in the introduction, a Semantic Wiki needs to address three
questions:</p>
      <sec id="sec-5-1">
        <title>1. How to annotate content?</title>
        <p>2. How to formally represent content?</p>
        <p>3. How to navigate content?</p>
        <p>We have developed an elaborate model of annotations and shown how
SemperWiki, as opposed to other Semantic Wikis, supports very rich annotations.
We have shown how to formally represent content, and how SemperWiki,
again as opposed to other Semantic Wikis, correctly distinguishes between documents
and concepts without limiting the possible annotations. Further, we have
presented how the existing technique of faceted browsing can be adapted to flexible
semistructured data by automatically constructing facets from the data. Finally,
we have developed metrics for facet (predicate) selection and techniques for
object clustering inside each facet.</p>
        <p>
          Faceted browsing is a superior data exploration technique [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. We have
shown how this technique can be employed for semistructured Wiki content.
The technique works for any formal annotation, without requiring conformance to a fixed
data schema; it additionally rewards semantic annotations (because
consistent use of shared terminology reduces the search space).
        </p>
        <p>We are currently extending our work in several directions. First, we are
integrating the faceted browser into the Web version of SemperWiki. Secondly,
we are developing the clustering step of the faceted browser and evaluating the
quality of the facet construction algorithm. Thirdly, we are working on a page
recommendation system that works in the second phase of Wiki navigation
and recommends pages similar or related to the current page, based on the
structure of the Wiki content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Anick</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tipirneni</surname>
          </string-name>
          .
          <article-title>Interactive document retrieval using faceted terminological feedback</article-title>
          .
          <source>In HICSS</source>
          .
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Aumueller</surname>
          </string-name>
          .
          <article-title>Semantic authoring and retrieval within a wiki</article-title>
          .
          <source>In ESWC</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          , et al.
          <article-title>The semantics of semantic annotation</article-title>
          .
          <source>In ODBASE</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Putting the Web back in Semantic Web</article-title>
          ,
          <year>2005</year>
          . Keynote presentation at ISWC 2005, http://www.w3.org/2005/Talks/1110-iswc-tbl/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Booth</surname>
          </string-name>
          .
          <article-title>Four uses of a URL: Name, concept, web location, and document instance</article-title>
          . http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>W.</given-names>
            <surname>Dakka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wood</surname>
          </string-name>
          .
          <article-title>Automatic construction of multifaceted browsing interfaces</article-title>
          .
          <source>In CIKM</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Eight Questions about Semantic Web Annotations</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          , Mar/Apr
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gruber</surname>
          </string-name>
          .
          <article-title>Towards principles for the design of ontologies used for knowledge sharing</article-title>
          . In
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Poli</surname>
          </string-name>
          , (eds.)
          <source>Formal Ontology in Conceptual Analysis and Knowledge Representation</source>
          . Kluwer Academic Publishers,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Handschuh</surname>
          </string-name>
          .
          <article-title>Creating Ontology-based Metadata by Annotation for the Semantic Web</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Karlsruhe,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hawke</surname>
          </string-name>
          .
          <source>Disambiguating RDF identifiers</source>
          ,
          <year>2002</year>
          . http://www.w3.org/2002/12/rdf-identifiers/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
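          11.
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saarela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Viljanen</surname>
          </string-name>
          .
          <article-title>Ontogator: Combining view- and ontology-based search with semantic browsing</article-title>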
          .
          <source>In Proceedings of XML Finland</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>E.</given-names>
            <surname>Oren</surname>
          </string-name>
          .
          <article-title>SemperWiki: a semantic personal Wiki</article-title>
          .
          <source>In SemDesk</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>E.</given-names>
            <surname>Oren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Breslin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>How semantics make better wikis</article-title>
          .
          <source>In WWW</source>
          .
          <year>2006</year>
          . Poster.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>E.</given-names>
            <surname>Oren</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Delbru</surname>
          </string-name>
          .
          <article-title>ActiveRDF: Object-oriented RDF in Ruby</article-title>
          .
          <source>In Scripting for Semantic Web (ESWC)</source>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>S.</given-names>
            <surname>Pepper</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schwab</surname>
          </string-name>
          .
          <article-title>Curing the web's identity crisis</article-title>
          . http://www.ontopia.net/topicmaps/materials/identitycrisis.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. N. Porter, (ed.)
          <source>Webster's Revised Unabridged Dictionary</source>
          .
          <year>1913</year>
          edn.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Ranganathan</surname>
          </string-name>
          .
          <article-title>Elements of library classification</article-title>
          . Bombay: Asia Publishing House,
          <year>1962</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>F.</given-names>
            <surname>Rinaldi</surname>
          </string-name>
          et al.
          <article-title>Multilayer annotations in Parmenides</article-title>
          .
          <source>In Proc. of the K-CAP2003 workshop on Knowledge Markup and Semantic Annotation</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>P.</given-names>
            <surname>Sazedj</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Pinto</surname>
          </string-name>
          .
          <article-title>Time to evaluate: Targeting annotation tools</article-title>
          .
          <source>In Proc. of Knowledge Markup and Semantic Annotation at ISWC</source>
          <year>2005</year>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>S.</given-names>
            <surname>Schaffert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gruber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Westenthaler</surname>
          </string-name>
          .
          <article-title>A semantic wiki for collaborative knowledge formation</article-title>
          .
          <source>In Semantics</source>
          <year>2005</year>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>V.</given-names>
            <surname>Sinha</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          .
          <article-title>Magnet: Supporting navigation in semistructured data environments</article-title>
          .
          <source>In SIGMOD</source>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tazzoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castagna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Campanini</surname>
          </string-name>
          .
          <article-title>Towards a semantic wiki wiki web</article-title>
          .
          <source>In ISWC</source>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>M.</given-names>
            <surname>Völkel</surname>
          </string-name>
          , et al.
          <article-title>Semantic Wikipedia</article-title>
          .
          <source>In WWW</source>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>K.-P.</given-names>
            <surname>Yee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swearingen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Faceted metadata for image search and browsing</article-title>
          .
          <source>In CHI</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>